2017 - 2018

BAYESIAN DEEP RL FOR DIALOGUE

I did this work in the Dialogue Systems group at the University of Cambridge, under the supervision of Milica Gasic and Pawel Budzianowski. The project studied how uncertainty estimates in neural policies affect dialogue policy optimisation.

Classical Gaussian-process dialogue policies had strong sample efficiency, but the computational cost made them harder to scale. Deep Q-networks were more flexible, but their uncertainty estimates were weak. I explored whether Bayesian approximations inside neural policies could recover some of the useful exploration behaviour without paying the full Gaussian-process cost.

What I implemented

I implemented several uncertainty-estimation methods for DQN-based dialogue management in PyDial: Bayes-by-Backprop, dropout, concrete dropout, bootstrapped ensembles, and alpha-divergences. The experiments compared how these methods affected policy learning, convergence, and task success in simulated dialogue domains.
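
As a rough illustration of the Bayes-by-Backprop idea applied to action selection, here is a minimal NumPy sketch. It is not the PyDial implementation: the class `BayesianLinearQ`, its parameters, and the single linear layer are hypothetical simplifications, and the gradient-based training step is omitted. It shows only the three ingredients the method relies on: a factorised Gaussian posterior over weights, reparameterised sampling, and a KL penalty against a Gaussian prior, with Thompson-style action selection from one weight sample.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps the standard deviation positive.
    return np.logaddexp(0.0, x)


class BayesianLinearQ:
    """Hypothetical linear Q-head with a factorised Gaussian weight posterior."""

    def __init__(self, n_features, n_actions, prior_sigma=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        # Variational parameters: mean and pre-softplus scale for every weight.
        self.mu = np.zeros((n_features, n_actions))
        self.rho = np.full((n_features, n_actions), -3.0)
        self.prior_sigma = prior_sigma  # assumed zero-mean Gaussian prior

    def sigma(self):
        return softplus(self.rho)

    def sample_weights(self):
        # Reparameterisation: w = mu + sigma * eps, with eps ~ N(0, 1).
        eps = self.rng.standard_normal(self.mu.shape)
        return self.mu + self.sigma() * eps

    def kl_to_prior(self):
        # Closed-form KL(N(mu, sigma^2) || N(0, prior_sigma^2)), summed over weights.
        s, p = self.sigma(), self.prior_sigma
        kl = np.log(p / s) + (s ** 2 + self.mu ** 2) / (2.0 * p ** 2) - 0.5
        return float(np.sum(kl))

    def thompson_action(self, belief_state):
        # Draw one weight sample and act greedily with respect to it.
        q_values = belief_state @ self.sample_weights()
        return int(np.argmax(q_values))


policy = BayesianLinearQ(n_features=4, n_actions=3)
action = policy.thompson_action(np.ones(4))
```

In a full agent, the KL term would be added to the temporal-difference loss and both `mu` and `rho` would be updated by gradient descent. Sampling a fresh set of weights for each decision is what drives exploration here, in place of an epsilon-greedy schedule.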

Bayes-by-Backprop gave the strongest results among the neural approaches, reaching performance comparable to the state-of-the-art GPSARSA policy while avoiding the high computational cost of Gaussian processes.

Publications

Tegho, C., Budzianowski, P., and Gasic, M. (2018). Benchmarking Uncertainty Estimates With Deep Reinforcement Learning for Dialogue Policy Optimisation. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Best Student Paper Award.
Tegho, C., Budzianowski, P., and Gasic, M. (2017). Uncertainty Estimates for Efficient Neural Network-based Dialogue Policy Optimisation. Bayesian Deep Learning Workshop, NeurIPS.
Tegho, C. Bayes By Backprop Neural Networks for Dialogue Management. Thesis Dissertation for MPhil in Machine Learning, Speech and Language Technology, University of Cambridge.

Scholarships and Awards

Best Student Paper Award (ICASSP 2018)
Graduate Masters Scholarship from the Fonds de Recherche - Nature et Technologie Quebec

Bayesian neural networks / deep RL / dialogue systems / PyDial