Compliant robotics have seen successful applications in energy efficient locomotion and cyclic manipulation. However, exploitation of variable physical impedance for energy efficient sequential movements has not been extensively addressed. This work employs a hierarchical approach to encapsulate low-level optimal control for sub-movement generation into an outer loop of iterative policy improvement, thereby leveraging the benefits of both optimal control and reinforcement learning. The framework enables optimizing efficiency trade-off for minimal energy expenses in a model-free manner, by taking account of cost function weighting, variable impedance exploitation, and transition timing – which are associated with the skill of compliance. The effectiveness of the proposed method is evaluated using two consecutive reaching tasks on a variable impedance actuator. The results demonstrate significant energy saving by improving the skill of compliance, with an electrical consumption reduction of about 30% measured in a physical robot experiment.