Ryan Carey

Research Fellow (Oxford)


Ryan Carey is a research fellow at the Future of Humanity Institute, University of Oxford, a Statistics DPhil student with Robin Evans, and a co-founder of the Causal Incentives Working Group.

He is interested in how causal models can be used to model agents’ incentives, with applications to the safety and fairness of AI systems.

The Incentives that Shape Behaviour

SafeAI Workshop at AAAI 2019. https://arxiv.org/abs/2001.07118 Abstract: Which variables does an agent have an incentive to control with its decision, and which variables does it have an incentive to respond to? We formalise these incentives, and demonstrate unique graphical criteria for detecting them in any single decision causal influence diagram. To this […]

(When) Is Truth-telling Favored in AI Debate?

SafeAI Workshop at AAAI 2019. https://arxiv.org/abs/1911.04266 Abstract: For some problems, humans may not be able to accurately judge the goodness of AI-proposed solutions. Irving et al. (2018) propose that in such cases, we may use a debate between two AI systems to amplify the problem-solving capabilities of a human judge. We […]