Local policy search with Bayesian optimization

2021

Conference Paper

Reinforcement learning (RL) aims to find an optimal policy through interaction with an environment. Consequently, learning complex behavior requires a vast number of samples, which can be prohibitive in practice. Yet instead of systematically reasoning about and actively choosing informative samples, policy gradients for local search are often obtained from random perturbations. These random samples yield high-variance estimates and are hence sub-optimal in terms of sample complexity. Actively selecting informative samples is at the core of Bayesian optimization, which constructs a probabilistic surrogate of the objective from past samples to reason about informative subsequent ones. In this paper, we propose to join both worlds. We develop an algorithm that uses a probabilistic model of the objective function and its gradient. Based on this model, the algorithm decides where to query a noisy zeroth-order oracle to improve the gradient estimates. The result is a novel type of policy search method, which we compare to existing black-box algorithms. In extensive empirical evaluations on synthetic objectives, the comparison reveals improved sample complexity and reduced variance. Further, we highlight the benefits of active sampling on popular RL benchmarks.
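The idea of modelling the objective and its gradient jointly, then choosing queries to sharpen the gradient estimate, can be illustrated with a minimal sketch. It assumes a Gaussian process prior with a squared-exponential kernel and fixed, hand-picked hyperparameters. It also assumes an acquisition rule that picks, among random candidates near the current iterate, the query that most reduces the trace of the gradient's posterior covariance. This is one plausible reading of "decides where to query ... to improve the gradient estimates", not necessarily the authors' exact criterion. The function names (`gradient_posterior`, `gradient_uncertainty_after`, `local_search`) and all default parameters are hypothetical.

```python
import numpy as np


def se_kernel(A, B, signal_var, lengthscale):
    """Squared-exponential kernel matrix, k(a, b) = s^2 exp(-||a - b||^2 / (2 l^2))."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return signal_var * np.exp(-0.5 * sq_dists / lengthscale**2)


def gradient_posterior(x, X, y, signal_var=1.0, lengthscale=0.5, noise_var=1e-2):
    """GP posterior mean and covariance of grad f(x), given noisy values y at the rows of X.

    Assumption: a constant prior mean equal to mean(y), whose gradient is zero.
    """
    K = se_kernel(X, X, signal_var, lengthscale) + noise_var * np.eye(len(X))
    k_x = se_kernel(x[None, :], X, signal_var, lengthscale)[0]  # shape (n,)
    # cov(grad f(x), f(X_i)) = -(x - X_i) / l^2 * k(x, X_i), stacked into shape (d, n).
    K_grad = (-(x[None, :] - X) / lengthscale**2 * k_x[:, None]).T
    mean = K_grad @ np.linalg.solve(K, y - y.mean())
    # Prior covariance of the gradient of an SE-kernel GP is (s^2 / l^2) I.
    prior_cov = signal_var / lengthscale**2 * np.eye(len(x))
    cov = prior_cov - K_grad @ np.linalg.solve(K, K_grad.T)
    return mean, cov


def gradient_uncertainty_after(x, X, z, **gp_kwargs):
    """Trace of the gradient covariance at x if z were also queried.

    The GP posterior covariance does not depend on observed values,
    so the (unknown) value f(z) is not needed to score the candidate.
    """
    X_aug = np.vstack([X, z[None, :]])
    _, cov = gradient_posterior(x, X_aug, np.zeros(len(X_aug)), **gp_kwargs)
    return np.trace(cov)


def local_search(f, x0, n_iters=20, queries_per_iter=2, n_candidates=64,
                 radius=0.3, step_size=0.1, seed=0):
    """Maximize f by normalized gradient ascent on GP gradient estimates (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    X, y = [x.copy()], [f(x)]
    for _ in range(n_iters):
        for _ in range(queries_per_iter):
            # Score random candidates in a box around x; query the most informative one.
            X_arr = np.array(X)
            cands = x + rng.uniform(-radius, radius, size=(n_candidates, len(x)))
            scores = [gradient_uncertainty_after(x, X_arr, c) for c in cands]
            z = cands[int(np.argmin(scores))]
            X.append(z)
            y.append(f(z))
        grad, _ = gradient_posterior(x, np.array(X), np.array(y))
        x = x + step_size * grad / (np.linalg.norm(grad) + 1e-12)
    return x
```

For example, `local_search(lambda z: -float(np.sum(z ** 2)), [1.0, 1.0])` should move the iterate toward the maximizer at the origin. Replacing the acquisition step with plain random perturbations around `x` gives the kind of randomly sampled gradient estimate the abstract contrasts against.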

Author(s):	Müller, Sarah and von Rohr, Alexander and Trimpe, Sebastian
Book Title:	Advances in Neural Information Processing Systems 34
Volume:	25
Pages:	20708--20720
Year:	2021
Month:	December
Editors:	Ranzato, M. and Beygelzimer, A. and Dauphin, Y. and Liang, P. S. and Wortman Vaughan, J.
Publisher:	Curran Associates, Inc.

Department(s):	Intelligent Control Systems
Bibtex Type:	Conference Paper (inproceedings)
Paper Type:	Conference

Event Name:	35th Conference on Neural Information Processing Systems (NeurIPS 2021)
Event Place:	Online

Address:	Red Hook, NY
ISBN:	978-1-7138-4539-3
State:	Published
URL:	https://papers.nips.cc/paper/2021/hash/ad0f7a25211abc3889cb0f420c85e671-Abstract.html

Links:	arXiv GitHub

BibTeX
@inproceedings{muller2021local,
  title         = {Local policy search with Bayesian optimization},
  author        = {M{\"u}ller, Sarah and von Rohr, Alexander and Trimpe, Sebastian},
  booktitle     = {Advances in Neural Information Processing Systems 34},
  volume        = {25},
  pages         = {20708--20720},
  editor        = {Ranzato, M. and Beygelzimer, A. and Dauphin, Y. and Liang, P. S. and Wortman Vaughan, J.},
  publisher     = {Curran Associates, Inc.},
  address       = {Red Hook, NY},
  month         = dec,
  year          = {2021},
  url           = {https://papers.nips.cc/paper/2021/hash/ad0f7a25211abc3889cb0f420c85e671-Abstract.html},
  month_numeric = {12}
}

People

Alexander von Rohr (Alumni)

Sebastian Trimpe (Alumni)
