Publications | Intelligent Control Systems - Max Planck Institute for Intelligent Systems

89 results

2022

The Wheelbot: A Jumping Reaction Wheel Unicycle

Geist, A. R., Fiene, J., Tashiro, N., Jia, Z., Trimpe, S.

IEEE Robotics and Automation Letters, 7(4):9683-9690, IEEE, 2022 (article)

Abstract

Combining off-the-shelf components with 3D printing, the Wheelbot is a symmetric reaction wheel unicycle that can jump onto its wheels from any initial position. With non-holonomic and under-actuated dynamics, as well as two coupled unstable degrees of freedom, the Wheelbot provides a challenging platform for nonlinear and data-driven control research. This letter presents the Wheelbot's mechanical and electrical design, its estimation and control algorithms, as well as experiments demonstrating both self-erection and disturbance rejection while balancing.

link (url) DOI [BibTex]

Learning Fast and Precise Pixel-to-Torque Control: A Platform for Reproducible Research of Learning on Hardware

Bleher, S., Heim, S., Trimpe, S.

IEEE Robotics & Automation Magazine, 29(2):75-84, June 2022 (article)

DOI [BibTex]

2021

Learning-enhanced robust controller synthesis with rigorous statistical and control-theoretic guarantees

Fiedler, C., Scherer, C. W., Trimpe, S.

In 60th IEEE Conference on Decision and Control (CDC), IEEE, December 2021 (inproceedings) Accepted

Abstract

The combination of machine learning with control offers many opportunities, in particular for robust control. However, due to strong safety and reliability requirements in many real-world applications, providing rigorous statistical and control-theoretic guarantees is of utmost importance, yet difficult to achieve for learning-based control schemes. We present a general framework for learning-enhanced robust control that allows for systematic integration of prior engineering knowledge, is fully compatible with modern robust control and still comes with rigorous and practically meaningful guarantees. Building on the established Linear Fractional Representation and Integral Quadratic Constraints framework, we integrate Gaussian Process Regression as a learning component and state-of-the-art robust controller synthesis. In a concrete robust control example, our approach is demonstrated to yield improved performance with more data, while guarantees are maintained throughout.

link (url) [BibTex]

Task space adaptation via the learning of gait controllers of magnetic soft millirobots

Demir, S. O., Culha, U., Karacakol, A. C., Pena-Francesch, A., Trimpe, S., Sitti, M.

The International Journal of Robotics Research, 40(12-14):1331-1351, December 2021 (article)

DOI Project Page [BibTex]

Local policy search with Bayesian optimization

Müller, S., von Rohr, A., Trimpe, S.

In Advances in Neural Information Processing Systems 34, 25, pages: 20708-20720, (Editors: Ranzato, M. and Beygelzimer, A. and Dauphin, Y. and Liang, P. S. and Wortman Vaughan, J.), Curran Associates, Inc., Red Hook, NY, 35th Conference on Neural Information Processing Systems (NeurIPS 2021), December 2021 (inproceedings)

Abstract

Reinforcement learning (RL) aims to find an optimal policy by interaction with an environment. Consequently, learning complex behavior requires a vast number of samples, which can be prohibitive in practice. Nevertheless, instead of systematically reasoning and actively choosing informative samples, policy gradients for local search are often obtained from random perturbations. These random samples yield high variance estimates and hence are sub-optimal in terms of sample complexity. Actively selecting informative samples is at the core of Bayesian optimization, which constructs a probabilistic surrogate of the objective from past samples to reason about informative subsequent ones. In this paper, we propose to join both worlds. We develop an algorithm utilizing a probabilistic model of the objective function and its gradient. Based on the model, the algorithm decides where to query a noisy zeroth-order oracle to improve the gradient estimates. The resulting algorithm is a novel type of policy search method, which we compare to existing black-box algorithms. The comparison reveals improved sample complexity and reduced variance in extensive empirical evaluations on synthetic objectives. Further, we highlight the benefits of active sampling on popular RL benchmarks.

arXiv GitHub link (url) [BibTex]

Using Physics Knowledge for Learning Rigid-Body Forward Dynamics with Gaussian Process Force Priors

Rath, L., Geist, A. R., Trimpe, S.

In Proceedings of the 5th Conference on Robot Learning, 164, pages: 101-111, Proceedings of Machine Learning Research, (Editors: Faust, Aleksandra and Hsu, David and Neumann, Gerhard), PMLR, 5th Conference on Robot Learning (CoRL 2021), November 2021 (inproceedings)

link (url) [BibTex]

Models for Data-Efficient Reinforcement Learning on Real-World Applications

Doerr, A.

University of Stuttgart, Stuttgart, October 2021 (phdthesis)

DOI [BibTex]

GoSafe: Globally Optimal Safe Robot Learning

Baumann, D., Marco, A., Turchetta, M., Trimpe, S.

In 2021 IEEE International Conference on Robotics and Automation (ICRA 2021), pages: 4452-4458, IEEE, Piscataway, NJ, IEEE International Conference on Robotics and Automation (ICRA 2021), October 2021 (inproceedings)

DOI [BibTex]

Probabilistic robust linear quadratic regulators with Gaussian processes

von Rohr, A., Neumann-Brosig, M., Trimpe, S.

Proceedings of the 3rd Conference on Learning for Dynamics and Control, pages: 324-335, Proceedings of Machine Learning Research (PMLR), Vol. 144, (Editors: Jadbabaie, Ali and Lygeros, John and Pappas, George J. and Parrilo, Pablo A. and Recht, Benjamin and Tomlin, Claire J. and Zeilinger, Melanie N.), PMLR, Brookline, MA 02446, 3rd Annual Conference on Learning for Dynamics and Control (L4DC), June 2021 (conference)

Abstract

Probabilistic models such as Gaussian processes (GPs) are powerful tools to learn unknown dynamical systems from data for subsequent use in control design. While learning-based control has the potential to yield superior performance in demanding applications, robustness to uncertainty remains an important challenge. Since Bayesian methods quantify uncertainty of the learning results, it is natural to incorporate these uncertainties in a robust design. In contrast to most state-of-the-art approaches that consider worst-case estimates, we leverage the learning methods’ posterior distribution in the controller synthesis. The result is a more informed and thus efficient trade-off between performance and robustness. We present a novel controller synthesis for linearized GP dynamics that yields robust controllers with respect to a probabilistic stability margin. The formulation is based on a recently proposed algorithm for linear quadratic control synthesis, which we extend by giving probabilistic robustness guarantees in the form of credibility bounds for the system’s stability. Comparisons to existing methods based on worst-case and certainty-equivalence designs reveal superior performance and robustness properties of the proposed method.

link (url) DOI [BibTex]

On exploration requirements for learning safety constraints

Massiani, P., Heim, S., Trimpe, S.

In Proceedings of the 3rd Conference on Learning for Dynamics and Control, pages: 905-916, Proceedings of Machine Learning Research (PMLR), Vol. 144, (Editors: Jadbabaie, Ali and Lygeros, John and Pappas, George J. and Parrilo, Pablo A. and Recht, Benjamin and Tomlin, Claire J. and Zeilinger, Melanie), PMLR, 3rd Annual Conference on Learning for Dynamics and Control (L4DC), June 2021 (inproceedings)

Abstract

Enforcing safety for dynamical systems is challenging, since it requires constraint satisfaction along trajectory predictions. Equivalent control constraints can be computed in the form of sets that enforce positive invariance, and can thus guarantee safety in feedback controllers without predictions. However, these constraints are cumbersome to compute from models, and it is not yet well established how to infer constraints from data. In this paper, we shed light on the key objects involved in learning control constraints from data in a model-free setting. In particular, we discuss the family of constraints that enforce safety in the context of a nominal control policy, and expose that these constraints do not need to be accurate everywhere. They only need to correctly exclude a subset of the state-actions that would cause failure, which we call the critical set.

link (url) [BibTex]

Structured learning of rigid-body dynamics: A survey and unified view from a robotics perspective

Geist, A. R., Trimpe, S.

GAMM-Mitteilungen, 44(2):e202100009, Special Issue: Scientific Machine Learning, 2021 (article)

Abstract

Accurate models of mechanical system dynamics are often critical for model-based control and reinforcement learning. Fully data-driven dynamics models promise to ease the process of modeling and analysis, but require considerable amounts of data for training and often do not generalize well to unseen parts of the state space. Combining data-driven modeling with prior analytical knowledge is an attractive alternative as the inclusion of structural knowledge into a regression model improves the model's data efficiency and physical integrity. In this article, we survey supervised regression models that combine rigid-body mechanics with data-driven modeling techniques. We analyze the different latent functions (such as kinetic energy or dissipative forces) and operators (such as differential operators and projection matrices) underlying common descriptions of rigid-body mechanics. Based on this analysis, we provide a unified view on the combination of data-driven regression models, such as neural networks and Gaussian processes, with analytical model priors. Furthermore, we review and discuss key techniques for designing structured models such as automatic differentiation.

DOI [BibTex]

Practical and Rigorous Uncertainty Bounds for Gaussian Process Regression

Fiedler, C., Scherer, C. W., Trimpe, S.

In The Thirty-Fifth AAAI Conference on Artificial Intelligence, the Thirty-Third Conference on Innovative Applications of Artificial Intelligence, the Eleventh Symposium on Educational Advances in Artificial Intelligence, 8, pages: 7439-7447, AAAI Press, Palo Alto, CA, Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2021), Thirty-Third Conference on Innovative Applications of Artificial Intelligence (IAAI 2021), Eleventh Symposium on Educational Advances in Artificial Intelligence (EAAI 2021), May 2021 (inproceedings)

Abstract

Gaussian Process regression is a popular nonparametric regression method based on Bayesian principles that provides uncertainty estimates for its predictions. However, these estimates are of a Bayesian nature, whereas for some important applications, like learning-based control with safety guarantees, frequentist uncertainty bounds are required. Although such rigorous bounds are available for Gaussian Processes, they are too conservative to be useful in applications. This often leads practitioners to replace these bounds with heuristics, thus breaking all theoretical guarantees. To address this problem, we introduce new uncertainty bounds that are rigorous, yet practically useful at the same time. In particular, the bounds can be explicitly evaluated and are much less conservative than state-of-the-art results. Furthermore, we show that certain model misspecifications lead to only graceful degradation. We demonstrate these advantages and the usefulness of our results for learning-based control with numerical examples.

link (url) [BibTex]

A little damping goes a long way

Heim, S., Millard, M., Le Mouel, C., Badri-Spröwitz, A.

In Integrative and Comparative Biology, 61(Supplement 1):E367-E367, Oxford University Press, Society for Integrative and Comparative Biology Annual Meeting (SICB Annual Meeting 2021), March 2021 (inproceedings)

link (url) DOI [BibTex]

Robot Learning with Crash Constraints

Marco, A., Baumann, D., Khadiv, M., Hennig, P., Righetti, L., Trimpe, S.

IEEE Robotics and Automation Letters, 6(2):1439-1446, IEEE, February 2021 (article)

Abstract

In the past decade, numerous machine learning algorithms have been shown to successfully learn optimal policies to control real robotic systems. However, it is common to encounter failing behaviors as the learning loop progresses. Specifically, in robot applications where failing is undesired but not catastrophic, many algorithms struggle with leveraging data obtained from failures. This is usually caused by (i) the failed experiment ending prematurely, or (ii) the acquired data being scarce or corrupted. Both complicate the design of proper reward functions to penalize failures. In this paper, we propose a framework that addresses those issues. We consider failing behaviors as those that violate a constraint and address the problem of learning with crash constraints, where no data is obtained upon constraint violation. The no-data case is addressed by a novel GP model (GPCR) for the constraint that combines discrete events (failure/success) with continuous observations (only obtained upon success). We demonstrate the effectiveness of our framework on simulated benchmarks and on a real jumping quadruped, where the constraint threshold is unknown a priori. Experimental data is collected, by means of constrained Bayesian optimization, directly on the real robot. Our results outperform manual tuning, and GPCR proves useful for estimating the constraint threshold.

link (url) DOI [BibTex]

Wireless Control for Smart Manufacturing: Recent Approaches and Open Challenges

Baumann, D., Mager, F., Wetzker, U., Thiele, L., Zimmerling, M., Trimpe, S.

Proceedings of the IEEE, 109(4):441-467, 2021 (article)

arXiv DOI [BibTex]

Learning Event-triggered Control from Data through Joint Optimization

Funk, N., Baumann, D., Berenz, V., Trimpe, S.

IFAC Journal of Systems and Control, 16, pages: 100144, 2021 (article)

Abstract

We present a framework for model-free learning of event-triggered control strategies. Event-triggered methods aim to achieve high control performance while only closing the feedback loop when needed. This enables resource savings, e.g., network bandwidth if control commands are sent via communication networks, as in networked control systems. Event-triggered controllers consist of a communication policy, determining when to communicate, and a control policy, deciding what to communicate. It is essential to jointly optimize the two policies since individual optimization does not necessarily yield the overall optimal solution. To address this need for joint optimization, we propose a novel algorithm based on hierarchical reinforcement learning. The resulting algorithm is shown to accomplish high-performance control in line with resource savings and scales seamlessly to nonlinear and high-dimensional systems. The method’s applicability to real-world scenarios is demonstrated through experiments on a six degrees of freedom real-time controlled manipulator. Further, we propose an approach towards evaluating the stability of the learned neural network policies.

arXiv link (url) DOI [BibTex]

Event-triggered Learning for Linear Quadratic Control

Schlüter, H., Solowjow, F., Trimpe, S.

IEEE Transactions on Automatic Control, 66(10):4485-4498, 2021 (article)

arXiv DOI [BibTex]

Controller Design via Experimental Exploration With Robustness Guarantees

Holicki, T., Scherer, C. W., Trimpe, J. S.

IEEE Control Systems Letters, 5(2):641-646, 2021 (article)

DOI [BibTex]

2020

Learning and Control Strategies for Cyber-physical Systems: From Wireless Control over Deep Reinforcement Learning to Causal Identification

Baumann, D.

KTH Royal Institute of Technology, Stockholm, Sweden, December 2020 (phdthesis)

PDF link (url) [BibTex]

A Learnable Safety Measure

Heim, S., von Rohr, A., Trimpe, S., Badri-Spröwitz, A.

Proceedings of the Conference on Robot Learning, 100, pages: 627-639, Proceedings of Machine Learning Research, (Editors: Kaelbling, Leslie Pack and Kragic, Danica and Sugiura, Komei), PMLR, Conference on Robot Learning, October 2020 (article)

arXiv [BibTex]

A little damping goes a long way: a simulation study of how damping influences task-level stability in running

Heim, S., Millard, M., Le Mouel, C., Badri-Spröwitz, A.

Biology Letters, 16(9):20200467, September 2020 (article)

Abstract

It is currently unclear if damping plays a functional role in legged locomotion, and simple models often do not include damping terms. We present a new model with a damping term that is isolated from other parameters: that is, the damping term can be adjusted without retuning other model parameters for nominal motion. We systematically compare how increased damping affects stability in the face of unexpected ground-height perturbations. Unlike most studies, we focus on task-level stability: instead of observing whether trajectories converge towards a nominal limit-cycle, we quantify the ability to avoid falls using a recently developed mathematical measure. This measure allows trajectories to be compared quantitatively instead of only being separated into a binary classification of ‘stable’ or ‘unstable’. Our simulation study shows that increased damping contributes significantly to task-level stability; however, this benefit quickly plateaus after only a small amount of damping. These results suggest that the low intrinsic damping values observed experimentally may have stability benefits and are not simply minimized for energetic reasons. All Python code and data needed to generate our results are available open source.

link (url) DOI Project Page [BibTex]

Bayesian Optimization in Robot Learning - Automatic Controller Tuning and Sample-Efficient Methods

Marco-Valle, A.

Eberhard Karls Universität Tübingen, Tübingen, July 2020 (phdthesis)

Abstract

The problem of designing controllers to regulate dynamical systems has been studied by engineers over the past millennia. Ever since, suboptimal performance has lingered in many closed loops as an unavoidable side effect of manually tuning the parameters of the controllers. Nowadays, industrial settings remain skeptical about data-driven methods that allow one to automatically learn controller parameters. In the context of robotics, machine learning (ML) keeps growing in influence as a means of increasing autonomy and adaptability, for example to help automate controller tuning. However, data-hungry ML methods, such as standard reinforcement learning, require a large number of experimental samples, which is prohibitive in robotics, as hardware can deteriorate and break. This brings about the following question: Can manual controller tuning, in robotics, be automated by using data-efficient machine learning techniques? In this thesis, we tackle the question above by exploring Bayesian optimization (BO), a data-efficient ML framework, to reduce the human effort and side effects of manual controller tuning, while retaining a low number of experimental samples. We focus this work on robotic systems, providing thorough theoretical results that aim to increase data-efficiency, as well as demonstrations on real robots. Specifically, we present four main contributions. We first consider using BO to replace manual tuning in robotic platforms. To this end, we parametrize the design weights of a linear quadratic regulator (LQR) and learn its parameters using an information-efficient BO algorithm. This algorithm uses Gaussian processes (GPs) to model the unknown performance objective. The GP model is used by BO to suggest controller parameters that are expected to increase the information about the optimal parameters, measured as a gain in entropy.
The resulting “automatic LQR tuning” framework is demonstrated on two robotic platforms: a robot arm balancing an inverted pole and a humanoid robot performing a squatting task. In both cases, an existing controller is automatically improved in a handful of experiments without human intervention. BO compensates for data scarcity by means of the GP, which is a probabilistic model that encodes prior assumptions about the unknown performance objective. Usually, incorrect or non-informed assumptions have negative consequences, such as a higher number of robot experiments, poor tuning performance or reduced sample-efficiency. The second to fourth contributions presented herein attempt to alleviate this issue. The second contribution proposes to include the robot simulator in the learning loop as an additional information source for automatic controller tuning. While doing a real robot experiment generally entails high associated costs (e.g., it requires preparation and takes time), simulations are cheaper to obtain (e.g., they can be computed faster). However, because the simulator is an imperfect model of the robot, its information is biased and could have negative repercussions on the learning performance. To address this problem, we propose “simu-vs-real”, a principled multi-fidelity BO algorithm that trades off cheap, but inaccurate information from simulations with expensive and accurate physical experiments in a cost-effective manner. The resulting algorithm is demonstrated on a cart-pole system, where simulations and real experiments are alternated, thus sparing many real evaluations. The third contribution explores how to adapt the expressiveness of the probabilistic prior to the control problem at hand. To this end, the mathematical structure of LQR controllers is leveraged and embedded into the GP, by means of the kernel function. Specifically, we propose two different “LQR kernel” designs that retain the flexibility of Bayesian nonparametric learning.
Simulated results indicate that the LQR kernel outperforms non-informed kernel choices when used for controller learning with BO. Finally, the fourth contribution specifically addresses the problem of handling controller failures, which are typically unavoidable in practice while learning from data, especially if non-conservative solutions are expected. Although controller failures are generally problematic (e.g., the robot has to be emergency-stopped), they are also a rich information source about what should be avoided. We propose “failures-aware excursion search”, a novel algorithm for Bayesian optimization under black-box constraints, where failures are limited in number. Our results in numerical benchmarks indicate that by allowing a confined number of failures, better optima are revealed as compared with state-of-the-art methods. The first contribution of this thesis, “automatic LQR tuning”, is among the first to apply BO to real robots. While it demonstrated automatic controller learning from few experimental samples, it also revealed several important challenges, such as the need for higher sample-efficiency, which opened relevant research directions that we addressed through several methodological contributions. In summary, we proposed “simu-vs-real”, a novel BO algorithm that includes the simulator as an additional information source, an “LQR kernel” design that learns faster than standard choices and “failures-aware excursion search”, a new BO algorithm for constrained black-box optimization problems, where the number of failures is limited.

Repository (Universitätsbibliothek) - University of Tübingen PDF DOI [BibTex]