This project is developing processes to ensure that AI systems are transparent, reliable and trustworthy.
As AI systems are widely deployed in real-world settings, it is critical for us to understand the mechanisms by which they take decisions, when they can be trusted to perform well, and when they may fail. This project addresses these goals in three strands.
We need to be able to understand the internal processes of AI systems. This is particularly challenging for approaches such as neural networks or genetic algorithms, which learn or evolve to carry out a task without clear mappings to chains of inference that a human can easily follow. This strand will study ways to make the reasons for an AI’s predictions or decisions interpretable.
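One simple, model-agnostic interpretability technique of the kind this strand would examine is permutation importance: shuffle one input feature at a time and measure how much the model’s accuracy drops. The sketch below is illustrative only; the toy model, data, and function names are assumptions, not part of the project:

```python
import numpy as np

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Score each feature by the accuracy lost when that feature is shuffled."""
    rng = np.random.default_rng(seed)
    base = np.mean(model(X) == y)  # baseline accuracy on unshuffled data
    importances = []
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # break this feature's link to the labels
            drops.append(base - np.mean(model(Xp) == y))
        importances.append(float(np.mean(drops)))
    return importances

# Toy model: predicts 1 when feature 0 is positive; feature 1 is ignored.
model = lambda X: (X[:, 0] > 0).astype(int)
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(int)
imp = permutation_importance(model, X, y)
# Feature 0 should score far higher than the ignored feature 1.
```

A black-box probe like this gives only a coarse ranking; the project aims at richer explanations, but the example shows the flavour of attributing a decision to its inputs.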
Real-world AIs need to perform reliably in settings that may differ markedly from their training environments, with the associated risk of unpredictable and unwanted behaviours. We seek to develop new approaches that can guarantee good performance for scalable probabilistic reasoning, even in unforeseen settings. These may include notions of learning and inference that can supply proofs of accuracy, as used in formal verification systems. Another approach is for an AI to monitor its situation dynamically and detect when its environment has changed beyond its reliability zone, triggering an alert and a shift to a fallback mode.
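The monitoring idea in the last sentence can be sketched as a simple distribution-shift detector: compare a sliding window of live inputs against a reference sample from training, using a two-sample Kolmogorov–Smirnov statistic. The window size and threshold below are illustrative assumptions; a deployed system would calibrate them (e.g. by bootstrapping on the training data):

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

class DriftMonitor:
    """Flags when a sliding window of live inputs drifts from the training data."""

    def __init__(self, reference, window=200, threshold=0.15):
        self.reference = np.asarray(reference)
        self.window = window          # how many recent inputs to compare
        self.threshold = threshold    # hypothetical trigger level
        self.buffer = []

    def observe(self, x):
        """Record one input; return True if the environment looks shifted."""
        self.buffer.append(x)
        if len(self.buffer) > self.window:
            self.buffer.pop(0)
        if len(self.buffer) < self.window:
            return False  # not enough data to judge yet
        return ks_statistic(np.array(self.buffer), self.reference) > self.threshold

rng = np.random.default_rng(0)
train = rng.normal(0, 1, 2000)          # stand-in for the training distribution
monitor = DriftMonitor(train)
in_dist = [monitor.observe(x) for x in rng.normal(0, 1, 300)]  # same distribution
shifted = [monitor.observe(x) for x in rng.normal(2, 1, 300)]  # mean has moved
```

Once the window fills with mean-shifted inputs the statistic exceeds the threshold and the monitor reports drift, at which point a real system would alert and fall back to a safe mode. This is a univariate sketch; practical monitors would track model features or confidence scores rather than raw scalars.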
Human studies indicate that a theory of mind may be essential for building empathetic trust and for reliably initiating acts of kindness. Equipping AIs to infer the beliefs and goals of other agents (such as humans) may improve human–machine collaboration; yet such cognitive insight may prove a double-edged sword, enabling deception and even manipulation. We shall explore these themes with researchers on the Agents and Persons, and Kinds of Intelligence projects, and with leading experts from psychology.