Category Archives: Machine Learning

This month’s focus is RRL code, a model-based RL video, and incomplete data structures.  The RRL code is my Prolog implementation.  The model-based RL video is Michael Littman’s video lecture from NIPS 2009, which is available from video lectures dot net.  Finally, I read Chapter 15, on incomplete data structures, of the Sterling and Shapiro (1994) book.  Each of these topics has its own blog entry.

On Video Lectures I have started watching An Introduction to Statistical Relational Learning by Dr. Lise Getoor and Policy Gradient Reinforcement Learning by Dr. Douglas Aberdeen.  From the NIPS 2009 conference, I watched Bootstrapping from Game Tree Search and then read the accompanying article.

I recently watched the model-based RL video tutorial by Michael Littman on Video Lectures. The tutorial was very good: I learned not only about model-based RL, but also about value-based (Q-value) and model-free RL.  The tutorial starts with the taxi cab problem and covers Markov Decision Processes, Dynamic Bayes Nets (DBN), model-based Bayesian RL, and various generic solutions and algorithms.  Michael Littman illustrates several implementations of the taxi cab problem.  One solution that caught my interest was the object representation, which reduced the number of states visited by the agent to 143.  In my opinion, this object representation is similar to the relational representation, which also demonstrated a reduction in the number of states visited by the learning agent.

According to Michael Littman, value-based RL is the most popular area of research. The video consists of two parts lasting under two hours in total. You can find the tutorial here along with other reinforcement learning videos.

I reviewed the Prolog code segments in the Relational Reinforcement Learning articles [Dzeroski et al. 1998; Dzeroski et al. 2001].  By definition, the state is a list of ground terms.  The pre/2 predicate checks that the preconditions of an action are met, in a similar fashion to a STRIPS implementation: given a state, pre/2 produces a move/2 term.  The delta/3 predicate produces the next state given the current state and the move term.  My objective is to take these Prolog code segments and implement them in SWI-Prolog.
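To make the roles of the two predicates concrete, here is a minimal sketch written in Python rather than Prolog. It is not the code from the articles: the tuple encoding of facts, the `FLOOR` constant, and the function names `pre` and `delta` (borrowed from the predicate names) are my own assumptions for a blocks world.

```python
FLOOR = "floor"  # assumed name for the table; not taken from the articles


def pre(state):
    """Yield every ("move", x, y) whose preconditions hold in `state`.

    `state` is a frozenset of ground facts: ("on", x, y) or ("clear", x).
    A block can move only if it is clear, and only onto another clear
    block or onto the floor.
    """
    clear = sorted(f[1] for f in state if f[0] == "clear")
    on = {f[1]: f[2] for f in state if f[0] == "on"}
    for x in clear:
        for y in clear:
            if x != y:
                yield ("move", x, y)
        if on[x] != FLOOR:  # moving floor-to-floor would change nothing
            yield ("move", x, FLOOR)


def delta(state, move):
    """Return the next state after applying `move` to `state`."""
    _, x, y = move
    below = next(f[2] for f in state if f[0] == "on" and f[1] == x)
    new = set(state)
    new.discard(("on", x, below))
    new.add(("on", x, y))
    if below != FLOOR:
        new.add(("clear", below))   # the block x was sitting on is now clear
    if y != FLOOR:
        new.discard(("clear", y))   # the destination block is now covered
    return frozenset(new)
```

With a on b and both b and c on the floor, `pre` offers three moves (a onto c, a onto the floor, c onto a), and `delta` applied to the second one leaves all three blocks clear on the floor.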

In my quest to discover whether temporal difference methods can be used in games, I found the article Learning to Play Games Using Temporal Difference Methods (Wiering, Patist, Mannen 2005).  The authors used TD methods with a neural network for function approximation to evaluate positions in Backgammon, Chess, and Draughts.  In addition, the paper demonstrated three ways of learning the evaluation function: self-play, learning from an expert, and a database of human master games.  In the end, the authors concluded that an agent learning from an expert or from self-play was able to reach its best evaluation function in the neural net, compared to an agent that learned from observing games stored in a database.  In a way, this article was a demonstration of Transfer Learning.  Finally, the authors were aware of other function approximation methodologies such as support vector machines and gradient descent.

This month I finished reading the thesis Using Patterns and Plans to Solve Problems and Control Search (Wilkins 1979).  It discusses PARADISE, a program that uses patterns, generates plans, and controls search for tactically sharp middlegame positions in chess.  According to the author, PARADISE uses the blackboard architecture with 28 knowledge sources.  PARADISE uses knowledge to find either a gain in material or checkmate.  Another objective of the application is to use knowledge to control search by eliminating useless lines.  The author uses Reinfeld’s book as the source of the positions used to test PARADISE.  In the end, PARADISE solved 97 percent of the positions.

My main focus this month has been on researching topics in reinforcement learning, non-deterministic programming, agent implementations, and finally Prolog and LISP programming techniques.  First, my article on Q-learning shares my recent insights into the algorithm.  My purpose is to gain a better understanding of Q-learning so that I can implement it in my blocks world environment. Read my blog entry on Q-learning.

Next, I have been studying Prolog techniques.  For example, in the Sterling and Shapiro book, I was reading about non-determinism in Prolog as a programming technique.  The generate-and-test approach to logic programming generates a candidate solution X and then tests it.  Read more in my blog entry on non-deterministic programming.

As I continue my reading of The Art of Prolog, Chapter 14 focuses on non-deterministic programming as a programming technique in Prolog.  The generate-and-test approach to logic programming generates a candidate solution X and then tests it.  I used a similar technique in my knight’s tour implementation.  The authors implement a blocks world using a depth-first search to find path solutions without regard to whether a path or state has already been visited.  The solution takes 20 steps to get from the initial state to the final state.  The authors offer additional Prolog clauses that guide the search by avoiding already visited states, which leads to a three-step solution.
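The idea of avoiding already visited states can be sketched as follows. This is a Python illustration under my own assumptions, not the authors' Prolog program: a state is a frozenset of stacks, each stack a tuple of block names from bottom to top, and `successors` and `dfs` are hypothetical helper names.

```python
def successors(state):
    """Yield every state reachable by moving one top block."""
    stacks = sorted(state)  # sorted for a reproducible search order
    for i, src in enumerate(stacks):
        block, rest = src[-1], src[:-1]
        others = stacks[:i] + stacks[i + 1:]
        remainder = [rest] if rest else []
        if rest:  # put the block on the table as a new one-block stack
            yield frozenset(others + remainder + [(block,)])
        for j, dst in enumerate(others):  # or put it on top of another stack
            kept = [s for k, s in enumerate(others) if k != j]
            yield frozenset(kept + remainder + [dst + (block,)])


def dfs(state, goal, visited):
    """Depth-first search that never re-enters a state in `visited`."""
    if state == goal:
        return [state]
    visited.add(state)
    for nxt in successors(state):
        if nxt not in visited:
            path = dfs(nxt, goal, visited)
            if path:
                return [state] + path
    return None
```

For example, `dfs(frozenset({("a", "b", "c")}), frozenset({("c", "b", "a")}), set())` returns a list of states from the initial stack to the reversed one with no state repeated. Depth-first search still does not guarantee the shortest path; the visited set only stops it from looping or retracing.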

I spent time researching Q-learning in Google Scholar, looking for relevant articles. I found the technical report from Watkins and Dayan demonstrating the convergence of Q-learning, but I was actually searching for the original source of the Q-learning algorithm.  So, I did a search for Watkins and found his website.  The publication section contains an electronic, though not original, copy of his thesis, which introduces Q-learning.  Although I have read other articles using Q-learning, I wanted to read the original source on the subject because I want to learn the algorithm and implement it in a programming language such as LISP or Prolog, or possibly Java.
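For reference, the core of tabular Q-learning is the one-step update Q(s,a) ← Q(s,a) + α·(r + γ·max over a' of Q(s',a') − Q(s,a)). The Python sketch below is my own minimal rendering of that rule, not code from the thesis or from AIMA; the function names, the dictionary-based table, and the default values for `alpha` and `gamma` are assumptions for illustration.

```python
import random
from collections import defaultdict


def q_update(Q, s, a, r, s2, actions, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update for the transition (s, a, r, s2).

    Q maps (state, action) pairs to values; unseen pairs default to 0.
    """
    best_next = max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


def epsilon_greedy(Q, s, actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])


def new_table():
    """Create an empty Q-table in which every entry starts at 0."""
    return defaultdict(float)
```

An agent would call `epsilon_greedy` to choose an action, observe the reward and next state from its environment, and then call `q_update`. The environment itself (for instance a blocks world) is left out here.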

In addition, the reinforcement learning chapter of the AIMA book contains a Q-learning agent.  I will need some time to review the LISP implementation by Norvig.  I have not checked whether the new AIMA Java source code contains an implementation of the reinforcement learning agents.

Once again I am attempting to finish books such as The Art of Prolog.  I am also working through An Introduction to Bayesian Inference and Decision.  On Google Scholar, I found an article on Temporal Relational Reinforcement Learning, which in turn led to other articles of interest.

From the first book, I completed Section 2, in which the authors cover the core of the Prolog programming language.  Techniques such as accumulators and tail recursion are discussed.  The section ends with programming style and debugging techniques.

This month, I continued working through books that I had started, in particular An Introduction to Bayesian Inference and Decision.  I finished the Chapter 2 exercises and continued into Chapter 3, completing 18 of 60 exercises.  Chapter 3 starts with discrete random variables, expectation, variance, the probability mass function, the continuous distribution function, joint probability distributions, conditional probabilities, and the Laws of Expectation.  The next section introduces terms such as prior probabilities, likelihood, and posterior probabilities as an introduction to Bayesian inference.  Basically, Bayes’ Theorem is a mechanism for updating probabilities when new information becomes available.  As additional information or samples are collected, the prior probability distributions are updated to generate revised probability distributions.
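The updating mechanism can be shown with a small discrete example. The sketch below is my own illustration in Python, not an exercise from the book: the two coin hypotheses, their names, and the function `bayes_update` are assumptions, and exact fractions are used so the arithmetic is easy to check by hand.

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Return the posterior over hypotheses given one observation.

    prior:      dict mapping hypothesis -> P(hypothesis)
    likelihood: dict mapping hypothesis -> P(observation | hypothesis)
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(observation)
    return {h: p / evidence for h, p in unnormalized.items()}


# Two hypothetical coins: a fair one and one that lands heads 4 times in 5.
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
p_heads = {"fair": Fraction(1, 2), "biased": Fraction(4, 5)}

after_one = bayes_update(prior, p_heads)      # observe one head
after_two = bayes_update(after_one, p_heads)  # observe a second head
```

After one head the posterior for the biased coin is 8/13, and after a second head it rises to 64/89. Each posterior becomes the prior for the next observation, which is the revision process described above.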
