
Category Archives: Reinforcement Learning

This month’s focus is on three topics: RRL code, a model-based RL video, and incomplete data structures. The RRL code is my Prolog implementation. The model-based RL video is Michael Littman’s video lecture tutorial from NIPS 2009, which is available from VideoLectures.net. Finally, I read Chapter 15, Incomplete Data Structures, of the Sterling and Shapiro (1994) book. Each of these topics has its own blog entry.

On VideoLectures.net I have started watching An Introduction to Statistical Relational Learning by Dr. Lise Getoor and Policy Gradient Reinforcement Learning by Dr. Douglas Aberdeen. From the NIPS 2009 conference, I watched Bootstrapping from Game Tree Search and then read the accompanying article.


I recently watched Michael Littman’s model-based RL video tutorial on VideoLectures.net. The tutorial was very good: I learned not only about model-based RL, but also about value-based RL (Q-values) and model-free RL. It starts with the taxi problem and covers Markov Decision Processes (MDPs), Dynamic Bayes Nets (DBNs), model-based Bayesian RL, and various generic solutions and algorithms. Littman illustrates several implementations of the taxi problem. The one that caught my interest was the object representation, which reduced the number of states visited by the agent to 143. In my opinion, this object representation is similar to the relational representation, which has also been shown to reduce the number of states visited by the learning agent.
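
The tutorial itself does not come with code, so the following is only a minimal sketch of the basic model-based idea, under my own assumptions: a small discrete problem, experience recorded as (s, a, r, s') tuples, a maximum-likelihood model built by counting, and value iteration to plan in that learned model. The function names, the discount factor and the toy states A, B and G are hypothetical illustrations, not anything taken from Littman’s lectures.

```python
from collections import defaultdict


def estimate_model(experience):
    """Maximum-likelihood model from observed (s, a, r, s') tuples (illustrative)."""
    counts = defaultdict(lambda: defaultdict(int))  # counts[(s, a)][s'] = times observed
    reward_sum = defaultdict(float)                 # total reward observed for (s, a)
    for s, a, r, s2 in experience:
        counts[(s, a)][s2] += 1
        reward_sum[(s, a)] += r
    P, R = {}, {}
    for sa, nexts in counts.items():
        n = sum(nexts.values())
        P[sa] = {s2: c / n for s2, c in nexts.items()}  # estimated P(s' | s, a)
        R[sa] = reward_sum[sa] / n                      # estimated mean reward
    return P, R


def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Plan in the learned model; states with no recorded action are treated as terminal."""
    states = {s for (s, _) in P} | {s2 for nexts in P.values() for s2 in nexts}
    actions = defaultdict(list)
    for (s, a) in P:
        actions[s].append(a)
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if not actions[s]:
                continue  # terminal (or never tried): value stays 0
            best = max(R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())
                       for a in actions[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V
```

For instance, if the action "right" in state A was observed twice leading to B and once leading back to A, "stay" in A always returns to A, and "right" in B reaches G with reward 1, the estimated model gives P(B | A, right) = 2/3, and with γ = 0.9 value iteration yields V(B) = 1 and V(A) = 6/7 ≈ 0.857.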

According to Michael Littman, value-based RL is currently the most popular area of research. The video consists of two parts lasting just under two hours in total. The tutorial is available on VideoLectures.net along with other reinforcement learning videos.

In my quest to discover whether temporal difference (TD) methods can be used in games, I found the article Learning to Play Games Using Temporal Difference Methods (Wiering, Patist, Mannen 2005). The authors used TD methods with neural networks as function approximators to learn evaluation functions for Backgammon, Chess, and Draughts. In addition, the paper compared three ways of training the evaluation function: self-play, learning from an expert, and learning from a database of human master games. The authors concluded that the agents trained by self-play or by learning from an expert reached better evaluation functions in the neural network than the agent that learned by observing games stored in a database. In a way, this article is also a demonstration of transfer learning. Finally, the authors acknowledge other function approximation approaches, such as support vector machines and gradient descent methods.
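
The paper trains neural networks, but the TD idea itself fits in a few lines. As a much smaller illustration (my own sketch, not the authors’ setup), here is tabular TD(0) prediction on a classic five-state random walk, where a table stands in for the neural network and the true value of state k is k/6. The step size that decays as N(s)^-0.6, the episode count and the seed are assumptions chosen so the sketch converges quickly.

```python
import random


def td0_random_walk(episodes=20000, gamma=1.0, seed=0):
    """Tabular TD(0) prediction on a 5-state random walk (illustrative sketch).

    Non-terminal states are 1..5, terminals are 0 and 6, every episode starts
    in state 3, and reward 1 is given only when the walk exits on the right.
    """
    rng = random.Random(seed)
    V = [0.0] * 7       # V[0] and V[6] are terminal and stay 0
    visits = [0] * 7    # N(s): number of updates made to each state
    for _ in range(episodes):
        s = 3
        while s not in (0, 6):
            s2 = s + rng.choice((-1, 1))
            r = 1.0 if s2 == 6 else 0.0
            visits[s] += 1
            alpha = visits[s] ** -0.6  # assumed decaying step size
            # TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')
            V[s] += alpha * (r + gamma * V[s2] - V[s])
            s = s2
    return V[1:6]
```

The estimates should end up close to 1/6, 2/6, ..., 5/6; in the game setting the same target, r + γV(s'), is used to train the network’s evaluation of board positions.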

My main focus this month has been researching topics in reinforcement learning, non-deterministic programming, agent implementations, and finally Prolog and LISP programming techniques. First, my article on Q-learning shares my recent insights into the algorithm. My purpose was to gain a better understanding of Q-learning so that I can implement it in my blocks world environment. Read my blog entry on Q-learning.
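
For reference, the heart of one-step Q-learning is the update Q(s, a) ← Q(s, a) + α[r + γ max over a' of Q(s', a') − Q(s, a)]. Below is a minimal tabular sketch, assuming a hypothetical five-cell corridor (start in cell 0, reward 1 for reaching cell 4) rather than my blocks world. The corridor, the ε-greedy exploration and the parameter values are illustrative choices of mine, not details from Watkins’s thesis.

```python
import random
from collections import defaultdict


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical corridor; the goal is the last cell."""
    rng = random.Random(seed)
    actions = (-1, +1)          # move left or move right
    Q = defaultdict(float)      # Q[(state, action)], unseen pairs default to 0.0
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # epsilon-greedy action selection, ties broken at random
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                best = max(Q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if Q[(s, b)] == best])
            s2 = min(max(s + a, 0), goal)  # walls at both ends of the corridor
            r = 1.0 if s2 == goal else 0.0
            # off-policy target: uses the max over next actions, not the action taken next
            target = r if s2 == goal else r + gamma * max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```

After training, the greedy policy moves right in every non-goal cell, and Q(s, right) approaches γ^(3 − s).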

Next, I have been studying Prolog techniques. For example, in the Sterling and Shapiro book, I read about non-determinism as a programming technique in Prolog. In the generate-and-test approach to logic programming, one goal generates a candidate solution X and another goal tests it; if the test fails, Prolog backtracks into the generator for the next candidate. Read more in my blog entry on non-deterministic programming.
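
In Prolog the generator and the test are simply two goals in one clause body, and backtracking supplies the next candidate automatically. The sketch below imitates that structure in Python with a generator, using the n-queens puzzle (a standard generate-and-test example) as an assumed test case; the function names are my own and this is not code from the book.

```python
from itertools import permutations


def safe(queens):
    """Test: no two queens share a diagonal.

    queens[i] is the row of the queen in column i; rows and columns are
    already distinct because candidates are permutations.
    """
    n = len(queens)
    return all(abs(queens[i] - queens[j]) != j - i
               for i in range(n) for j in range(i + 1, n))


def queens(n):
    """Generate every permutation as a candidate, and keep only those that pass the test."""
    for candidate in permutations(range(1, n + 1)):
        if safe(candidate):
            yield candidate
```

Asking for `next(queens(4))` returns the first solution found, `(2, 4, 1, 3)`, much like the first answer to a Prolog query; iterating further plays the role of backtracking for more solutions.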


I spent some time searching Google Scholar for relevant articles on Q-learning. I found the technical note by Watkins and Dayan demonstrating the convergence of Q-learning. What I was really looking for, though, was the original work containing the Q-learning algorithm. So I searched for Watkins and found his website. Its publications section contains an electronic (though not original) copy of his thesis, which introduced Q-learning. Although I have read other articles that use Q-learning, I wanted to read the original source because I want to learn the algorithm and implement it in a programming language such as LISP, Prolog, or possibly Java.

In addition, the reinforcement learning chapter of the AIMA book contains a Q-learning agent. I will need some time to review Norvig’s LISP implementation. I have not yet checked whether the new AIMA Java source code includes implementations of the reinforcement learning agents.

During this month I focused on three items: completing my blocks world planning agent and its environment, transfer learning, and reinforcement learning. First, as part of my ongoing work toward understanding RRL, I finally completed the blocks world planning agent using SWI-Prolog v5.8.1. The test_environment clause is currently set to move three blocks. The planner agent uses depth-first search to find the correct plan. It takes nine steps to complete the operation. My next step is to study [Sutton and Barto 1998]. The combination of Relational Learning (or Inductive Logic Programming, ILP) with Reinforcement Learning was suggested by Kaelbling and Sutton in 1997, which led to Relational Reinforcement Learning (RRL).
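
To show the idea of a depth-first blocks world planner, here is a small sketch in Python rather than a translation of my SWI-Prolog code. The representation is an assumption: a state is a set of (block, support) pairs, where the support is another block or "table", and a move takes a clear block onto the table or onto another clear block. The names `moves` and `dfs_plan` are hypothetical. Because plain depth-first search returns the first plan it finds, the plan is not guaranteed to be the shortest one.

```python
def clear(state, x):
    """A block is clear when no other block sits on it."""
    return all(support != x for (_, support) in state)


def moves(state):
    """Yield (action, next_state) for every legal move of a clear block.

    Assumed representation: state is a frozenset of (block, support) pairs,
    and an action is the triple (block, from_support, to_support).
    """
    blocks = [b for (b, _) in state]
    for b, below in sorted(state):
        if not clear(state, b):
            continue
        targets = ["table"] + [c for c in blocks if c != b and clear(state, c)]
        for t in targets:
            if t == below:
                continue  # moving onto the current support changes nothing
            yield (b, below, t), (state - {(b, below)}) | {(b, t)}


def dfs_plan(start, goal):
    """Depth-first search with a visited set; returns a list of actions or None."""
    goal = frozenset(goal)
    stack = [(frozenset(start), [])]
    visited = set()
    while stack:
        state, plan = stack.pop()
        if state == goal:
            return plan
        if state in visited:
            continue
        visited.add(state)
        for action, nxt in moves(state):
            if nxt not in visited:
                stack.append((nxt, plan + [action]))
    return None
```

For example, starting with C on A and B on the table, the goal {A on B, B on C, C on table} must be fully specified as a set of pairs; the returned list of (block, from, to) moves can then be replayed step by step by an agent.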


The month of November was very productive. I began the month by reviewing the RRL paper. From there I reviewed the blocks world and its Prolog planner in [Luger and Stubblefield 1993]. The next step was to take the algorithm in Chapter 2, Figure 2.14 of [Russell and Norvig 1995] and create an agent framework in Prolog. After reviewing [Covington, Nute, and Vellino 1997], I was able to apply advanced Prolog tips to the agent framework. My initial objective is to have a functional blocks world planner within an agent framework. The basic planner agent has been created and returns an action. However, the planner agent needs to generate a complete plan and then return each action from that plan in turn. This needs further development; in other words, it is a work in progress. The final blocks world planner agent will be posted upon completion.
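
The missing piece described above, generating a whole plan once and then handing back one action per call, can be sketched as follows. This is a minimal Python illustration under my own assumptions, not a transcription of the AIMA figure or of my Prolog framework: the class name, the `planner` callback (percept in, list of actions out) and the "noop" action are all hypothetical.

```python
from collections import deque


class PlanExecutingAgent:
    """Calls a planner only when it has no plan, then returns one action per percept."""

    def __init__(self, planner):
        self.planner = planner  # hypothetical callback: percept -> list of actions (or None)
        self.plan = deque()     # remaining actions of the current plan

    def __call__(self, percept):
        if not self.plan:
            # Replan only when the previous plan has been used up
            self.plan = deque(self.planner(percept) or [])
        # "noop" is an assumed placeholder when the planner finds no plan
        return self.plan.popleft() if self.plan else "noop"
```

The agent keeps its plan as internal state between calls, so the environment loop can stay unchanged: it still passes a percept and receives a single action each step.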


This month, I continued my studies in Relational Reinforcement Learning by reviewing the article Towards Informed Reinforcement Learning from the proceedings of the 2004 Machine Learning workshop on Relational Reinforcement Learning. In short, the article argues that an agent with limited information about its environment can still find an optimal policy and reach a goal or set of goal states. The reported experiments seem to suggest that this type of exploration is possible. According to a Google Scholar search, 11 subsequent articles cite this one. In the RRL arena, my goal is to repeat the blocks world experiment reported in the Relational Reinforcement Learning article by Dzeroski, De Raedt, and Blockeel.


This month marks my first year using WordPress to host my blog. It has been a great journey so far, and hopefully the coming year will be even better.

The month of September has been busy. Unfortunately, I did not write any blog entries due to my busy schedule. However, from a statistical point of view, my blog had its second-highest total number of views (362 in total). I also covered various topics this month: Probability and Statistics, Bayesian Inference, Reinforcement Learning, and finally a LISP review.

I started reviewing the Reinforcement Learning book (Sutton and Barto 1998), reading Part I and part of Part II so far. I also downloaded the LISP code associated with the book and ran the tic-tac-toe program in my test environment.


During the month of August, I focused on Relational Reinforcement Learning (RRL), a field that combines Relational Learning and Reinforcement Learning. Please read my blog entry on Relational Reinforcement Learning. Afterwards, I read a number of articles on RRL by different authors, including those from the RRL workshop at the ICML 2004 conference. So far, my preference is the research of Dr. Eduardo F. Morales.

I posted my review and comments on Part I of the book The Art of Prolog (Sterling and Shapiro 1994). Very few books on Logic Programming have been published in recent years. Since the mid-1990s, much of the effort in Logic Programming has gone into Inductive Logic Programming and Relational Learning, and Statistical Relational Learning has emerged as a new field of research.
