Bayesian oracles and safety bounds – Yoshua Bengio
Yoshua Bengio – Scientific Director, Mila & Full Professor, U. Montreal
Could there be safety advantages to training a Bayesian oracle whose only job is to estimate P(answer | question, data)? In what scenarios could such an AI cause catastrophic harm? Could such an oracle serve as the intelligence engine of an agent, e.g., by sampling actions that help achieve goals? What can go wrong even with a perfect estimate of the Bayesian posterior, e.g., if the true explanatory theory is a minority voice in the posterior when it comes to predicting harm? What could go wrong if the oracle is approximated by a neural network trained with amortized inference? Could the implicit optimization used to train the estimated posterior create loopholes with an optimistic bias regarding harm? And could the same Bayesian oracle provide conservative risk estimates, i.e., bounds on the probability of harm, that mitigate the imperfections of such an agent?
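To make the "minority voice" concern concrete, here is a minimal toy sketch. It is not the speaker's method. It assumes a discrete set of candidate theories, each with a posterior weight and its own prediction of P(harm | action). The theory names, the numbers, the plausibility threshold `alpha`, and both function names are hypothetical and chosen only for illustration. The sketch compares the posterior-averaged harm probability with a more cautious figure: the largest harm probability predicted by any theory whose posterior weight is at least `alpha`.

```python
def posterior_mean_harm(posterior, harm_given_theory):
    """Harm probability averaged over theories, weighted by posterior mass."""
    return sum(posterior[t] * harm_given_theory[t] for t in posterior)


def cautious_harm_bound(posterior, harm_given_theory, alpha=0.05):
    """Largest harm probability among theories with posterior mass >= alpha.

    Illustrative assumption: if no theory clears the threshold, fall back
    to 1.0, the most conservative possible value.
    """
    plausible = [t for t, p in posterior.items() if p >= alpha]
    if not plausible:
        return 1.0
    return max(harm_given_theory[t] for t in plausible)


# Hypothetical posterior in which the theory that predicts harm is a minority.
posterior = {"benign_theory_a": 0.6, "benign_theory_b": 0.3, "true_theory": 0.1}
harm_given_theory = {
    "benign_theory_a": 0.01,
    "benign_theory_b": 0.02,
    "true_theory": 0.90,
}

mean_harm = posterior_mean_harm(posterior, harm_given_theory)
bound = cautious_harm_bound(posterior, harm_given_theory, alpha=0.05)
print(f"posterior mean harm: {mean_harm:.3f}")
print(f"cautious bound:      {bound:.3f}")
```

In this toy setting the averaged estimate is dominated by the two benign theories, while the cautious figure is set by the minority theory alone. Raising `alpha` above that theory's posterior weight removes it from consideration, which illustrates how sensitive such a figure is to the choice of threshold.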
GS AI seminars
The monthly seminar series on Guaranteed Safe AI brings together researchers to advance the field of building AI with high-assurance quantitative safety guarantees.