Study of Backward Q-learning Fuzzy Reinforcement Learning and EP-Like Particle Swarm Optimization and Their Control Applications

  • 王 英豪

Student thesis: Doctoral Thesis


In this dissertation, we propose four novel algorithms: backward Q-learning, fuzzy Q-learning based on backward Q-learning (FQLBQ), a hybrid of genetic algorithm and fuzzy Sarsa learning (HGAFSL), and evolutionary-programming-like particle swarm optimization (EPSO). First, we focus on how to combine Q-learning with the Sarsa algorithm and present a new method called backward Q-learning, which can be implemented with the Sarsa algorithm or another reinforcement learning (RL) algorithm. Backward Q-learning directly tunes the Q-values, and the Q-values then indirectly affect the action selection policy. Second, backward Q-learning is integrated with fuzzy Q-learning (FQL). FQL is applied to tune and learn the consequent part of the fuzzy control system, and backward Q-learning is employed to increase the learning speed of FQL. Third, we offer HGAFSL to quickly tune the consequent part of the fuzzy rules and to overcome the drawback that the conventional GA chooses the crossover point randomly. While the fitness of each individual is evaluated, fuzzy Sarsa learning (FSL) simultaneously computes the Q-value of every gene. The Q-value is regarded as predictive information that helps the GA distinguish better genes from worse ones within an individual or a population. Hence, the crossover operation selects multiple crossover points and multiple parents by roulette wheel selection (RWS) according to the Q-values instead of by random choice. Finally, a fast color information setup based on EPSO for a manipulator control system is examined. The first step for a manipulator to grasp color objects and place them in the correct location is to correctly identify their colors in the RGB or the corresponding HSV (hue, saturation, value) color model. The commonly used method for determining the thresholds of the HSV range is manual tuning, but finding the best boundary to segment the color image this way is time-consuming. Therefore, we propose a new method that learns the color information through semi-automatic learning. The watershed algorithm incorporates user interactions to segment the color image and obtain the target image. The target image is then compared with the original image to build a lookup table (LUT) of color information, in which the three HSV thresholds are learned by EPSO methods. The EPSO methods can not only rapidly learn the thresholds to segment a color image but can also escape local minima.
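
As an illustration of the Q-value-guided crossover described in the abstract, the following is a minimal sketch of roulette wheel selection over per-gene Q-values. It is not the dissertation's implementation: the function name `roulette_wheel_select`, the shifting of Q-values to make them non-negative, and the assumption that a higher Q-value makes a gene position more likely to be picked are all hypothetical choices made for this example.

```python
import random


def roulette_wheel_select(q_values, k, rng=None):
    """Pick up to k distinct indices, each draw weighted by a shifted Q-value.

    Assumption (not from the source): higher Q-value means higher
    selection probability, and Q-values are shifted so weights are positive.
    """
    rng = rng or random.Random()
    low = min(q_values)
    # Shift so the smallest Q-value gets a tiny positive weight.
    weights = [q - low + 1e-9 for q in q_values]
    candidates = list(range(len(q_values)))
    chosen = []
    for _ in range(min(k, len(candidates))):
        total = sum(weights[i] for i in candidates)
        r = rng.uniform(0.0, total)  # spin the wheel
        acc = 0.0
        for i in candidates:
            acc += weights[i]
            if r <= acc:
                pick = i
                break
        else:
            # Guard against floating-point round-off at the wheel's end.
            pick = candidates[-1]
        chosen.append(pick)
        candidates.remove(pick)  # sample without replacement
    return sorted(chosen)


# Hypothetical per-gene Q-values for one individual with six genes.
gene_q = [0.10, 0.80, 0.30, 0.95, 0.20, 0.60]
crossover_points = roulette_wheel_select(gene_q, k=2, rng=random.Random(0))
print(crossover_points)
```

The same routine could be applied to per-individual scores to choose multiple parents; how the dissertation actually maps Q-values to selection weights is not stated in the abstract.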
Date of Award: 2014 Jan 27
Original language: English
Supervisor: Tzuu-Hseng S. Li
