Abbeel, P. and Ng, A.Y. (2004). Apprenticeship learning
via inverse reinforcement learning. In Proceedings of
the twenty-first international conference on Machine
learning, 1.
Achiam, J., Held, D., Tamar, A., and Abbeel, P.
(2017). Constrained policy optimization. arXiv preprint
arXiv:1705.10528.
Altman, E. (1999). Constrained Markov decision processes, volume 7. CRC Press.
Bradford, E., Imsland, L., Zhang, D., and del
Rio Chanona, E.A. (2020). Stochastic data-driven
model predictive control using Gaussian processes.
Computers & Chemical Engineering, 139, 106844.
Chowdhary, G., Liu, M., Grande, R., Walsh, T., How,
J., and Carin, L. (2014). Off-policy reinforcement
learning with Gaussian processes. IEEE/CAA Journal
of Automatica Sinica, 1(3), 227–238.
Engstrom, L., Ilyas, A., Santurkar, S., Tsipras, D., Janoos,
F., Rudolph, L., and Madry, A. (2020). Implementation
matters in deep policy gradients: A case study on PPO
and TRPO. arXiv preprint arXiv:2005.12729.
Ge, Y., Zhu, F., Ling, X., and Liu, Q. (2019). Safe Q-learning method based on constrained Markov decision
processes. IEEE Access, 7, 165007–165017.
Kelley, C.T. (1995). Iterative methods for linear and
nonlinear equations. SIAM.
Lee, J.M. and Lee, J.H. (2005). Approximate dynamic
programming-based approaches for input–output data-driven control of nonlinear processes. Automatica, 41(7),
1281–1288.
Lin, L.J. (1993). Reinforcement learning for robots using
neural networks. Technical report, Carnegie-Mellon
Univ Pittsburgh PA School of Computer Science.
Liu, Y., Ding, J., and Liu, X. (2019). IPO: Interior-point Policy Optimization under Constraints. URL
http://arxiv.org/abs/1910.09615.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A.,
Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013).
Playing Atari with deep reinforcement learning. arXiv
preprint arXiv:1312.5602.
Ng, A.Y., Harada, D., and Russell, S. (1999). Policy
invariance under reward transformations: Theory and
application to reward shaping. In ICML, volume 99,
278–287.
Petsagkourakis, P., Sandoval, I.O., Bradford, E., Galvanin,
F., Zhang, D., and del Rio-Chanona, E.A. (2020a).
Chance constrained policy optimization for process control and optimization. arXiv preprint arXiv:2008.00030.
Petsagkourakis, P., Sandoval, I.O., Bradford, E., Zhang,
D., and del Rio-Chanona, E.A. (2020b). Reinforcement
learning for batch bioprocess optimization. Computers
& Chemical Engineering, 133, 106649.
Pyeatt, L.D., Howe, A.E., et al. (2001). Decision tree
function approximation in reinforcement learning. In
Proceedings of the third international symposium on
adaptive systems: evolutionary computation and probabilistic graphical models, volume 2, 70–77. Cuba.
Rafiei, M. and Ricardez-Sandoval, L.A. (2018). Stochastic
back-off approach for integration of design and control
under uncertainty. Industrial & Engineering Chemistry
Research, 57(12), 4351–4365.
Shin, J., Badgwell, T.A., Liu, K.H., and Lee, J.H. (2019).
Reinforcement learning–overview of recent progress and
implications for process control. Computers & Chemical
Engineering, 127, 282–294.
Singh, V. and Kodamana, H. (2020). Reinforcement learning based control of batch polymerisation processes.
IFAC-PapersOnLine, 53(1), 667–672.
Slowik, A. and Kwasnicka, H. (2020). Evolutionary algorithms and their applications to engineering problems.
Neural Computing and Applications, 1–17.
Spielberg, S., Tulsyan, A., Lawrence, N.P., Loewen, P.D.,
and Bhushan Gopaluni, R. (2019). Toward self-driving
processes: A deep reinforcement learning approach to
control. AIChE Journal, 65(10), e16689.
Tessler, C., Mankowitz, D.J., and Mannor, S. (2018).
Reward constrained policy optimization. arXiv preprint
arXiv:1805.11074.
Treloar, N.J., Fedorec, A.J., Ingalls, B., and Barnes, C.P.
(2020). Deep reinforcement learning for the control of
microbial co-cultures in bioreactors. PLOS Computational Biology, 16(4), e1007783.
Wang, Z., Li, H.X., and Chen, C. (2019). Incremental
reinforcement learning in continuous spaces via policy
relaxation and importance weighting. IEEE Transactions on Neural Networks and Learning Systems.
Watkins, C.J. and Dayan, P. (1992). Q-learning. Machine
learning, 8(3-4), 279–292.