To quantify the effect of natural dynamics on the policy learning problem, we formalize our work using the concept of viability. Traditional control starts by defining a target. Viability analysis instead starts by defining a set of failures. The viable set (also known as the viability kernel) is then the set of all states from which there exists a sequence of actions that keeps the system inside this set, and therefore avoids failure, for all time.
Finding the viable set provides no information on convergence or optimality. However, it also requires no objective, no reward function, and not even a policy parameterization. This allows us to begin quantifying how robust a system design is to failure before designing or learning the actual control policy.
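To illustrate that only the dynamics and the failure set are needed, here is a minimal Python sketch. It assumes a finite, deterministic, discretized system. It uses the standard fixed-point iteration: start from all non-failure states, then repeatedly prune any state from which no action leads back into the current candidate set, until nothing changes. The function name `viability_kernel` is hypothetical. The two 1-D toy "designs" (x' = 2x + a and x' = x + a on an integer grid, failing when |x| > 5) are also illustrative assumptions, not the systems studied here.

```python
from typing import Callable, Hashable, Iterable, Set


def viability_kernel(
    states: Iterable[Hashable],
    actions: Iterable[Hashable],
    step: Callable[[Hashable, Hashable], Hashable],
    is_failure: Callable[[Hashable], bool],
) -> Set[Hashable]:
    """Largest set of non-failure states from which some action sequence
    avoids failure forever (finite, deterministic systems only)."""
    actions = list(actions)
    # Start from every state that is not itself a failure.
    viable = {s for s in states if not is_failure(s)}
    while True:
        # Keep a state only if at least one action leads back into the
        # current candidate set; successors off the grid count as non-viable.
        pruned = {s for s in viable if any(step(s, a) in viable for a in actions)}
        if pruned == viable:  # fixed point reached: this is the kernel
            return viable
        viable = pruned


# Hypothetical toy designs on an integer grid: x' = gain * x + a.
STATES = range(-10, 11)
ACTIONS = (-1, 0, 1)


def fails(x: int) -> bool:
    # Hypothetical failure set: leaving the interval [-5, 5].
    return abs(x) > 5


unstable = viability_kernel(STATES, ACTIONS, lambda x, a: 2 * x + a, fails)
marginal = viability_kernel(STATES, ACTIONS, lambda x, a: x + a, fails)
print(sorted(unstable))  # [-1, 0, 1]
print(len(marginal))     # 11
```

Even in this toy setting, the unstable natural dynamics shrink the viable set from 11 grid states to 3. No reward or controller has been specified at that point.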
Our current results indicate that systems which are more robust to noise in action space are also more amenable to learning control policies and allow more flexibility. Because the viable set does not depend on the controller, it also allows us to compare different designs of the mechanical system, as well as different low-level controllers such as reflexes [ ].
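One way to make robustness to action noise concrete is to require that a state remain viable for every perturbation in a bounded set added to the chosen action. This is a worst-case (robust) variant of the same pruning iteration. The sketch below is only an illustration under that assumption. The noise set, the toy gains, the failure bound, and all names (`robust_viability_kernel`, `make_linear_step`) are hypothetical.

```python
from typing import Callable, Iterable, Set


def robust_viability_kernel(
    states: Iterable[int],
    actions: Iterable[int],
    noise: Iterable[int],
    step: Callable[[int, int], int],
    is_failure: Callable[[int], bool],
) -> Set[int]:
    """States from which some nominal action keeps the system in the set
    for *every* assumed perturbation w added to that action."""
    actions, noise = list(actions), list(noise)
    viable = {s for s in states if not is_failure(s)}
    while True:
        # A state survives if one action stays viable under all noise values.
        pruned = {
            s
            for s in viable
            if any(all(step(s, a + w) in viable for w in noise) for a in actions)
        }
        if pruned == viable:  # fixed point reached
            return viable
        viable = pruned


def make_linear_step(gain: int) -> Callable[[int, int], int]:
    # Hypothetical 1-D design: x' = gain * x + applied action.
    def step(x: int, a: int) -> int:
        return gain * x + a
    return step


STATES = range(-10, 11)
ACTIONS = (-1, 0, 1)
NOISE = (-1, 0, 1)  # hypothetical bounded action noise


def fails(x: int) -> bool:
    return abs(x) > 5


results = {}
for name, gain in (("unstable", 2), ("marginal", 1)):
    step = make_linear_step(gain)
    nominal = robust_viability_kernel(STATES, ACTIONS, (0,), step, fails)
    noisy = robust_viability_kernel(STATES, ACTIONS, NOISE, step, fails)
    results[name] = (len(nominal), len(noisy))

print(results)  # {'unstable': (3, 0), 'marginal': (11, 11)}
```

Under these assumed settings, the marginally stable design keeps all 11 of its viable states when action noise is added. The unstable design's viable set collapses from 3 states to none. This gives a controller-independent way to rank the two designs.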