For industrial applications of the artificial intelligence of things, mechanical control usually affects the overall product output and production schedule. Recently, more and more engineers have applied the deep reinforcement learning method to mechanical control to improve the company's profit. However, the problem of deep reinforcement learning training stage is that overfitting often occurs, which results in accidental control and increases the risk of overcontrol. In order to address this problem, in this article, an expected advantage learning method is proposed for moderating the maximum value of expectation-based deep reinforcement learning for industrial applications. With the tanh softmax policy of the softmax function, we replace the sigmod function with the tanh function as the softmax function activation value. It makes it so that the proposed expectation-based method can successfully decrease the value overfitting in cognitive computing. In the experimental results, the performance of the Deep Q Network algorithm, advantage learning algorithm, and propose expected advantage learning method were evaluated in every episodes with the four criteria: the total score, total step, average score, and highest score. Comparing with the AL algorithm, the total score of the proposed expected advantage learning method is increased by 6% in the same number of trainings. This shows that the action probability distribution of the proposed expected advantage learning method has better performance than the traditional soft-max strategy for the optimal setting control of industrial applications.
All Science Journal Classification (ASJC) codes
- Control and Systems Engineering
- Information Systems
- Computer Science Applications
- Electrical and Electronic Engineering