Abbeel, P. and Ng, A.Y. (2004). Apprenticeship learning via inverse reinforcement learning. In Proceedings of the Twenty-First International Conference on Machine Learning, 1.
Achiam, J., Held, D., Tamar, A., and Abbeel, P. (2017). Constrained policy optimization. arXiv preprint arXiv:1705.10528.
Altman, E. (1999). Constrained Markov decision processes, volume 7. CRC Press.
Bradford, E., Imsland, L., Zhang, D., and del Rio Chanona, E.A. (2020). Stochastic data-driven model predictive control using Gaussian processes. Computers & Chemical Engineering, 139, 106844.
Chowdhary, G., Liu, M., Grande, R., Walsh, T., How, J., and Carin, L. (2014). Off-policy reinforcement learning with Gaussian processes. IEEE/CAA Journal of Automatica Sinica, 1(3), 227–238.
Engstrom, L., Ilyas, A., Santurkar, S., Tsipras, D., Janoos, F., Rudolph, L., and Madry, A. (2020). Implementation matters in deep policy gradients: A case study on PPO and TRPO. arXiv preprint arXiv:2005.12729.
Ge, Y., Zhu, F., Ling, X., and Liu, Q. (2019). Safe Q-learning method based on constrained Markov decision processes. IEEE Access, 7, 165007–165017.
Kelley, C.T. (1995). Iterative methods for linear and nonlinear equations. SIAM.
Lee, J.M. and Lee, J.H. (2005). Approximate dynamic programming-based approaches for input–output data-driven control of nonlinear processes. Automatica, 41(7), 1281–1288.
Lin, L.J. (1993). Reinforcement learning for robots using neural networks. Technical report, Carnegie Mellon University, School of Computer Science, Pittsburgh, PA.
Liu, Y., Ding, J., and Liu, X. (2019). IPO: Interior-point policy optimization under constraints. URL http://arxiv.org/abs/1910.09615.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
Ng, A.Y., Harada, D., and Russell, S. (1999). Policy invariance under reward transformations: Theory and application to reward shaping. In ICML, volume 99, 278–287.
Petsagkourakis, P., Sandoval, I.O., Bradford, E., Galvanin, F., Zhang, D., and del Rio-Chanona, E.A. (2020a). Chance constrained policy optimization for process control and optimization. arXiv preprint arXiv:2008.00030.
Petsagkourakis, P., Sandoval, I.O., Bradford, E., Zhang, D., and del Rio-Chanona, E.A. (2020b). Reinforcement learning for batch bioprocess optimization. Computers & Chemical Engineering, 133, 106649.
Pyeatt, L.D., Howe, A.E., et al. (2001). Decision tree function approximation in reinforcement learning. In Proceedings of the Third International Symposium on Adaptive Systems: Evolutionary Computation and Probabilistic Graphical Models, volume 2, 70–77. Cuba.
Rafiei, M. and Ricardez-Sandoval, L.A. (2018). Stochastic back-off approach for integration of design and control under uncertainty. Industrial & Engineering Chemistry Research, 57(12), 4351–4365.
Shin, J., Badgwell, T.A., Liu, K.H., and Lee, J.H. (2019). Reinforcement learning – Overview of recent progress and implications for process control. Computers & Chemical Engineering, 127, 282–294.
Singh, V. and Kodamana, H. (2020). Reinforcement learning based control of batch polymerisation processes. IFAC-PapersOnLine, 53(1), 667–672.
Slowik, A. and Kwasnicka, H. (2020). Evolutionary algorithms and their applications to engineering problems. Neural Computing and Applications, 1–17.
Spielberg, S., Tulsyan, A., Lawrence, N.P., Loewen, P.D., and Bhushan Gopaluni, R. (2019). Toward self-driving processes: A deep reinforcement learning approach to control. AIChE Journal, 65(10), e16689.
Tessler, C., Mankowitz, D.J., and Mannor, S. (2018). Reward constrained policy optimization. arXiv preprint arXiv:1805.11074.
Treloar, N.J., Fedorec, A.J., Ingalls, B., and Barnes, C.P. (2020). Deep reinforcement learning for the control of microbial co-cultures in bioreactors. PLOS Computational Biology, 16(4), e1007783.
Wang, Z., Li, H.X., and Chen, C. (2019). Incremental reinforcement learning in continuous spaces via policy relaxation and importance weighting. IEEE Transactions on Neural Networks and Learning Systems.
Watkins, C.J. and Dayan, P. (1992). Q-learning. Machine Learning, 8(3–4), 279–292.