TY - JOUR
T1 - Cognitive Optimal-Setting Control of AIoT Industrial Applications with Deep Reinforcement Learning
AU - Lai, Ying Hsun
AU - Wu, Tung Cheng
AU - Lai, Chin Feng
AU - Yang, Laurence Tianruo
AU - Zhou, Xiaokang
N1 - Funding Information:
Manuscript received December 15, 2019; revised March 13, 2020; accepted March 18, 2020. Date of publication April 20, 2020; date of current version November 20, 2020. This work was supported in part by the Ministry of Science and Technology of the Republic of China under contract number MOST 108-2511-H-143-001. Paper no. TII-19-5300. (Corresponding author: Laurence Tianruo Yang.) Ying-Hsun Lai is with the Department of Computer Science and Information Engineering, National Taitung University, Taitung 95092, Taiwan (e-mail: yhlai@nttu.edu.tw).
PY - 2021/3
Y1 - 2021/3
N2 - For industrial applications of the artificial intelligence of things, mechanical control usually affects the overall product output and production schedule. Recently, more and more engineers have applied deep reinforcement learning methods to mechanical control to improve company profits. However, during the training stage of deep reinforcement learning, overfitting often occurs, which results in accidental control and increases the risk of overcontrol. To address this problem, this article proposes an expected advantage learning method for moderating the maximum value of expectation-based deep reinforcement learning for industrial applications. With the proposed tanh softmax policy, the tanh function replaces the sigmoid function as the activation value of the softmax function, allowing the proposed expectation-based method to successfully reduce value overfitting in cognitive computing. In the experiments, the performance of the Deep Q Network algorithm, the advantage learning (AL) algorithm, and the proposed expected advantage learning method was evaluated in every episode using four criteria: total score, total steps, average score, and highest score. Compared with the AL algorithm, the total score of the proposed expected advantage learning method increased by 6% for the same number of training episodes. This shows that the action probability distribution of the proposed expected advantage learning method outperforms the traditional softmax strategy for the optimal-setting control of industrial applications.
AB - For industrial applications of the artificial intelligence of things, mechanical control usually affects the overall product output and production schedule. Recently, more and more engineers have applied deep reinforcement learning methods to mechanical control to improve company profits. However, during the training stage of deep reinforcement learning, overfitting often occurs, which results in accidental control and increases the risk of overcontrol. To address this problem, this article proposes an expected advantage learning method for moderating the maximum value of expectation-based deep reinforcement learning for industrial applications. With the proposed tanh softmax policy, the tanh function replaces the sigmoid function as the activation value of the softmax function, allowing the proposed expectation-based method to successfully reduce value overfitting in cognitive computing. In the experiments, the performance of the Deep Q Network algorithm, the advantage learning (AL) algorithm, and the proposed expected advantage learning method was evaluated in every episode using four criteria: total score, total steps, average score, and highest score. Compared with the AL algorithm, the total score of the proposed expected advantage learning method increased by 6% for the same number of training episodes. This shows that the action probability distribution of the proposed expected advantage learning method outperforms the traditional softmax strategy for the optimal-setting control of industrial applications.
UR - http://www.scopus.com/inward/record.url?scp=85087909165&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85087909165&partnerID=8YFLogxK
U2 - 10.1109/TII.2020.2986501
DO - 10.1109/TII.2020.2986501
M3 - Article
AN - SCOPUS:85087909165
VL - 17
SP - 2116
EP - 2123
JO - IEEE Transactions on Industrial Informatics
JF - IEEE Transactions on Industrial Informatics
SN - 1551-3203
IS - 3
M1 - 9072609
ER -
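
The abstract describes a tanh softmax policy in which the tanh function replaces the sigmoid as the activation applied to value estimates before the softmax. The sketch below (outside the RIS record, not part of the paper) illustrates one plausible reading of that idea: because tanh bounds activations to (-1, 1), a single overfit Q-value cannot dominate the action probability distribution. The function name, temperature parameter, and example Q-values are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def tanh_softmax_policy(q_values, temperature=1.0):
    """Illustrative sketch (assumed interpretation, not the paper's code):
    squash Q-values with tanh before the softmax so that extreme value
    estimates are bounded to (-1, 1), moderating the maximum value.
    The temperature parameter is an added assumption."""
    z = np.tanh(np.asarray(q_values, dtype=float))
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp((z - z.max()) / temperature)
    return e / e.sum()

# Example: a hugely overfit Q-value (100.0) no longer collapses the
# distribution onto one action, since tanh(100) is capped near 1.
probs = tanh_softmax_policy([100.0, 1.0, 0.5])
```

With a plain softmax over the raw Q-values, the first action would receive essentially all the probability mass; after the tanh squashing, the distribution stays spread across actions, which matches the abstract's claim of reduced value overfitting.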