RLLib: Lightweight On or Off Policy Reinforcement Learning Library, version 2.0 (mloss.org, Fri, 25 Apr 2014 02:58:32 -0000)<html><h1>RLLib</h1> <h2>(C++ Template Library to Predict, Control, Learn Behaviors, and Represent Learnable Knowledge using On/Off Policy Reinforcement Learning)</h2> <p>RLLib is a lightweight C++ template library that implements <code>incremental</code>, <code>standard</code>, and <code>gradient temporal-difference</code> learning algorithms in Reinforcement Learning. It is optimized for robotic applications and embedded devices that operate under fast duty cycles (e.g., &lt; 30 ms). RLLib has been tested and evaluated on RoboCup 3D soccer simulation agents, physical NAO V4 humanoid robots, and Tiva C series LaunchPad microcontrollers to predict, control, learn behaviors, and represent learnable knowledge. The implementation of RLLib is inspired by the RLPark API, a library of temporal-difference learning algorithms written in Java. 
</p> <h2>Features</h2> <ul> <li> <strong>Off-policy prediction algorithms</strong>: </li> <li> <code>GTD(lambda)</code> </li> <li> <code>GQ(lambda)</code> </li> <li> <strong>Off-policy control algorithms</strong>: </li> <li> <code>Greedy-GQ(lambda)</code> </li> <li> <code>Softmax-GQ(lambda)</code> </li> <li> <code>Off-PAC</code> (can also be used in the on-policy setting) </li> <li> <strong>On-policy algorithms</strong>: </li> <li> <code>TD(lambda)</code> </li> <li> <code>TD(lambda)AlphaBound</code> </li> <li> <code>TD(lambda)True</code> </li> <li> <code>Sarsa(lambda)</code> </li> <li> <code>Sarsa(lambda)AlphaBound</code> </li> <li> <code>Sarsa(lambda)True</code> </li> <li> <code>Sarsa(lambda)Expected</code> </li> <li> <code>Actor-Critic</code> (continuous actions, discrete actions, discounted reward settings, average reward settings, and so on) </li> <li> <strong>Supervised learning algorithms</strong>: </li> <li> <code>Adaline</code> </li> <li> <code>IDBD</code> </li> <li> <code>KI</code> </li> <li> <code>SemiLinearIDBD</code> </li> <li> <code>Autostep</code> </li> <li> <strong>Policies</strong>: <code>Random</code>, <code>RandomX%Bias</code>, <code>Greedy</code>, <code>Epsilon-greedy</code>, <code>Boltzmann</code>, <code>Normal</code>, <code>Softmax</code> </li> <li> <strong>Dot product</strong>: an efficient implementation of the dot product for tile-coding-based feature representations (with culling traces). </li> <li> <strong>Benchmarking environments</strong>: <code>Mountain Car</code>, <code>Mountain Car 3D</code>, <code>Swinging Pendulum</code>, <code>Continuous Grid World</code>, <code>Bicycle</code>, <code>Cart Pole</code>, <code>Acrobot</code>, <code>Non-Markov Pole Balancing</code>, <code>Helicopter</code> </li> <li> <strong>Optimization</strong>: optimized for very fast duty cycles (e.g., with culling traces, RLLib has been tested on <code>the RoboCup 3D simulator agent</code> and on <code>the NAO V4 (cognition thread)</code>). 
</li> <li> <strong>Usage</strong>: the algorithm interface closely follows RLPark, so the learning curve is short. </li> <li> <strong>Examples</strong>: a plethora of examples demonstrates on-policy and off-policy control experiments. </li> <li> <strong>Visualization</strong>: we provide a Qt4-based application to visualize benchmark problems. </li> </ul> <h2>Extension</h2> <p>RLLib has been extended to the Tiva C Series EK-TM4C123GXL LaunchPad and the Tiva C Series TM4C129 Connected LaunchPad microcontrollers. </p> <p>Tiva C series LaunchPad microcontrollers: <a href=""></a> </p> <h2>Demo</h2> <p><a href=""><img src="" alt="Off-PAC ContinuousGridworld"/></a> <a href=""><img src="" alt="AverageRewardActorCritic SwingPendulum (Continuous Actions)"/></a> </p> <h2>Usage</h2> <p>RLLib is a C++ template library. The header files are located in the <code>src</code> directory. To access the algorithms, simply add this directory to your include path, e.g., <code>-I./src</code>. </p> <p>To access the control algorithms: </p> <pre><code>#include "ControlAlgorithm.h" </code></pre><p>To access the prediction algorithms: </p> <pre><code>#include "PredictorAlgorithm.h" </code></pre><p>To access the supervised learning algorithms: </p> <pre><code>#include "SupervisedAlgorithm.h" </code></pre><p>RLLib uses the namespace: </p> <pre><code>using namespace RLLib; </code></pre> <h2>Testing</h2> <p>RLLib provides a flexible testing framework. Follow these steps to quickly write a test case. 
</p> <ul> <li> To access the testing framework: <code>#include "HeaderTest.h"</code> </li> </ul> <pre><code>#include "HeaderTest.h"

RLLIB_TEST(YourTest)

class YourTestTest: public YourTestBase
{
  public:
    YourTestTest() {}
    virtual ~YourTestTest() {}
    void run();

  private:
    void testYourMethod();
};

void YourTestTest::testYourMethod() {/* Your test code */}

void YourTestTest::run() { testYourMethod(); }
</code></pre> <ul> <li> Add <code>YourTest</code> to the <code>test/test.cfg</code> file. </li> <li> You can use <code>@YourTest</code> to execute only <code>YourTest</code>. For example, to execute only the MountainCar test cases, use <code>@MountainCarTest</code>. </li> </ul> <h2>Test Configuration</h2> <p>The test cases are executed using: </p> <ul> <li><p>64-bit machines: </p> <ul> <li> ./configure_m64 </li> <li> make </li> <li> ./RLLibTest </li> </ul> </li> <li><p>32-bit machines: </p> <ul> <li> ./configure_m32 </li> <li> make </li> <li> ./RLLibTest </li> </ul> </li> <li><p>Debugging: </p> <ul> <li> ./configure_debug </li> <li> make </li> <li> ./RLLibTest </li> </ul> </li> </ul> <h2>Visualization</h2> <p>RLLib provides a <a href="">Qt4.8</a>-based tool named <code>RLLibViz</code> to visualize Reinforcement Learning problems and algorithms. Currently, RLLibViz visualizes the following problems and algorithms: </p> <ul> <li><p>On-policy: </p> <ul> <li> the SwingPendulum problem with continuous actions, using the AverageRewardActorCritic algorithm. </li> </ul> </li> <li><p>Off-policy: </p> <ul> <li> the ContinuousGridworld and MountainCar problems with discrete actions, using the Off-PAC algorithm. </li> </ul> </li> <li><p>To run the visualization tool, you need Qt4.8 installed on your system. 
</p> </li> <li><p>To install and run RLLibViz: </p> <ul> <li> Change directory to <code>visualization/RLLibViz</code> </li> <li> ./configure </li> <li> ./RLLibVizSwingPendulum </li> <li> ./RLLibVizContinuousGridworld </li> <li> ./RLLibVizMountainCar </li> </ul> </li> </ul> <h2>Documentation</h2> <ul> <li> <a href=""></a> </li> <li> <a href=""></a> </li> </ul> <h2>Publication</h2> <p><a href="">Dynamic Role Assignment using General Value Functions</a> </p> <h2>Contact</h2> <p> Saminda Abeyruwan ( </p></html>