CBL Alumni Talk: Examining Critiques in Bayesian Deep Learning – Andrew Gordon Wilson

When:
April 16, 2021 @ 4:00 pm – 5:00 pm (UTC+01:00)
Where:
https://eng-cam.zoom.us/j/82969702755?pwd=L0dIVnlwSHJHV2NGbUQ1cmxpYjIyUT09
Contact:
Elre Oldewage

Approximate inference procedures in Bayesian deep learning have become scalable and practical, often providing better accuracy and calibration than classical training, without significant computational overhead. However, several challenges to the Bayesian approach in deep learning have emerged. An empirical study found that deep ensembles, formed by re-training an architecture and ensembling the results, outperformed some approaches to approximate Bayesian inference, which raised the question of whether we should pursue ensembling instead of Bayesian methods in deep learning. It was later observed that several approximate inference approaches appear to raise the posterior to a power 1/T, with T less than 1, leading to a “cold posterior”, which was asserted to be “sharply divergent” from Bayesian principles. In the same paper, the popular Gaussian priors used in deep learning were questioned as unreasonable, supported by an experiment showing that each sample function from a prior appears to assign nearly all of CIFAR-10 to a particular class.
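
For readers unfamiliar with the term, the “cold posterior” described above can be written out as follows. This is only a sketch of the standard formulation; the symbols w (network weights) and \mathcal{D} (training data) are notation assumed here, not taken from the announcement.

```latex
% Tempered posterior at temperature T: the usual posterior raised to the power 1/T.
p_T(w \mid \mathcal{D}) \;\propto\; \bigl( p(\mathcal{D} \mid w)\, p(w) \bigr)^{1/T}

% T = 1 recovers the standard Bayesian posterior.
% T < 1 sharpens it around high-probability weights (a "cold" posterior).
```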

In this talk, we will examine these critiques, and show that (1) deep ensembles provide a better approximation of the Bayesian predictive distribution than the approximate inference procedures considered in the empirical study, and in general are a reasonable approach to approximate inference in deep learning under severe computational constraints; (2) tempering is in fact not typically required, and is also a reasonable procedure in general; (3) the example of prior functions assigning nearly all data to one class can be easily resolved by calibrating the signal variance of the Gaussian prior; (4) Gaussian priors, while imperfect like any prior, induce a prior over functions with many desirable properties when combined with a neural architecture.
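
Point (1) views a deep ensemble as a simple Monte Carlo approximation of the Bayesian predictive distribution, in which each independently trained network contributes one set of weights and the class probabilities are averaged with equal weight. A minimal sketch of that averaging step is given below, assuming NumPy; the function names, array shapes, and random logits are hypothetical placeholders, not material from the talk.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def ensemble_predictive(member_logits):
    """Average per-member class probabilities with equal weights.

    member_logits: array of shape (M, N, C) holding the logits of M
    ensemble members for N inputs and C classes (hypothetical layout).
    Returns an array of shape (N, C) whose rows sum to 1.
    """
    probs = softmax(np.asarray(member_logits, dtype=float), axis=-1)
    return probs.mean(axis=0)


# Placeholder logits standing in for the outputs of 5 trained networks
# on 4 inputs with 3 classes; real logits would come from trained models.
rng = np.random.default_rng(0)
member_logits = rng.normal(size=(5, 4, 3))
predictive = ensemble_predictive(member_logits)
```

Note that the average is taken over probabilities rather than logits, which matches the form of a predictive distribution marginalised over weights.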

A theme in this talk is that while we should be careful to scrutinize our modelling procedures, we should also apply the same critical scrutiny to the critiques, leading to a deeper and more nuanced understanding, and more successful practical innovations.

Sections 3.2, 3.3, 4-9 of https://arxiv.org/abs/2002.08791 provide good background reading for the talk.
