TY - JOUR
T1 - Backward Q-learning
T2 - The combination of Sarsa algorithm and Q-learning
AU - Wang, Yin Hao
AU - Li, Tzuu Hseng S.
AU - Lin, Chih Jui
N1 - Funding Information:
This research was supported in part by the Ministry of Education, Taiwan, ROC, under the Aim for the Top University Project to the National Cheng Kung University (NCKU). This work was also supported by the National Science Council of the Republic of China under Grant NSC101-2221-E-006-193-MY3.
PY - 2013
Y1 - 2013
AB - Reinforcement learning (RL) has been applied to many fields and applications, but the trade-off between exploration and exploitation in the action selection policy remains a dilemma. The best-known RL algorithms are Q-learning and Sarsa, and they possess different characteristics: generally speaking, the Sarsa algorithm converges faster, while the Q-learning algorithm achieves better final performance. However, Sarsa is easily trapped in local minima, and Q-learning needs a longer time to learn. Most of the literature has investigated the action selection policy. Instead of studying an action selection strategy, this paper focuses on how to combine Q-learning with the Sarsa algorithm, and presents a new method, called backward Q-learning, which can be incorporated into both the Sarsa algorithm and Q-learning. The backward Q-learning algorithm directly tunes the Q-values, and the Q-values in turn indirectly affect the action selection policy. The proposed RL algorithms can therefore enhance learning speed and improve final performance. Finally, three experiments, cliff walking, mountain car, and cart-pole balancing control, are used to verify the feasibility and effectiveness of the proposed scheme. All the simulations show that the backward Q-learning based RL algorithms outperform the well-known Q-learning and Sarsa algorithms.
UR - http://www.scopus.com/inward/record.url?scp=84888336880&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84888336880&partnerID=8YFLogxK
U2 - 10.1016/j.engappai.2013.06.016
DO - 10.1016/j.engappai.2013.06.016
M3 - Article
AN - SCOPUS:84888336880
SN - 0952-1976
VL - 26
SP - 2184
EP - 2193
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
IS - 9
ER -