Chen, H. and Allgöwer, F. (1998). A quasi-infinite horizon nonlinear model predictive control scheme with guaranteed stability. Automatica, 34(10), 1205–1217.
Gros, S. and Zanon, M. (2019a). Towards safe reinforcement learning using NMPC and policy gradients: Part I - stochastic case. arXiv preprint arXiv:1906.04057.
Gros, S. and Zanon, M. (2019b). Towards safe reinforcement learning using NMPC and policy gradients: Part II - deterministic case. arXiv preprint arXiv:1906.04034.
Gros, S., Zanon, M., and Bemporad, A. (2020). Safe reinforcement learning via projection on a safe set: How to achieve optimality? arXiv preprint arXiv:2004.00915.
Kim, J.W., Park, B.J., Yoo, H., Oh, T.H., Lee, J.H., and Lee, J.M. (2020). A model-based deep reinforcement learning method applied to finite-horizon optimal control of nonlinear control-affine system. Journal of Process Control, 87, 166–178.
Lucia, S., Finkler, T., and Engell, S. (2013). Multi-stage nonlinear model predictive control applied to a semi-batch polymerization reactor under uncertainty. Journal of Process Control, 23(9), 1306–1319.
Mayne, D.Q., Kerrigan, E.C., Van Wyk, E., and Falugi, P. (2011). Tube-based robust nonlinear model predictive control. International Journal of Robust and Nonlinear Control, 21(11), 1341–1353.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
Schulman, J., Moritz, P., Levine, S., Jordan, M., and Abbeel, P. (2015). High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438.
Shin, J., Badgwell, T.A., Liu, K.H., and Lee, J.H. (2019). Reinforcement learning – overview of recent progress and implications for process control. Computers & Chemical Engineering, 127, 282–294.
Spielberg, S.P.K., Gopaluni, R.B., and Loewen, P.D. (2017). Deep reinforcement learning approaches for process control. In 6th International Symposium on Advanced Control of Industrial Processes (AdCONIP), 201–206.
Srinivasan, B., Bonvin, D., Visser, E., and Palanki, S. (2003). Dynamic optimization of batch processes: II. Role of measurements in handling uncertainty. Computers & Chemical Engineering, 27(1), 27–44.
Sutton, R.S., Barto, A.G., and Williams, R.J. (1992). Reinforcement learning is direct adaptive optimal control. IEEE Control Systems Magazine, 12(2), 19–22.
Sutton, R.S., McAllester, D.A., Singh, S.P., and Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems, 1057–1063.
Szepesvári, C. (2010). Algorithms for reinforcement learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 4(1), 1–103.
Wabersich, K.P. and Zeilinger, M.N. (2018). Safe exploration of nonlinear dynamical systems: A predictive safety filter for reinforcement learning. arXiv preprint arXiv:1812.05506.
Zanon, M. and Gros, S. (2019). Safe reinforcement learning using robust MPC. arXiv preprint arXiv:1906.04005.