RLPy 1.3a (http://mloss.org, Thu, 28 Aug 2014 14:34:35 -0000)
<html><p>RLPy is a framework for running sequential decision-making experiments in Python. It takes a fine-grained view of learning agents, breaking them into modular components and providing a library of implementations for each. RLPy also provides a wide variety of problem domains for testing these agents, listed below. </p> <p><strong>Parallelization:</strong> Experiments scale easily by running in parallel on multiple cores of a single machine (the user only specifies the number of cores) or on an HTCondor computing cluster. </p> <p><strong>Hyperparameter Optimization:</strong> Built-in support for optimizing hyperparameters with state-of-the-art methods (via the <em>hyperopt</em> package); the user only specifies the parameters and their bounds. </p> <p><strong>Code Profiling:</strong> Built-in profiling support makes it easy to identify performance bottlenecks: a color-coded call graph of execution reveals slow functions. </p> <p><strong>Plotting:</strong> The user specifies the experimental configuration and the number of runs needed for statistical significance. The plotting tool then requires only the quantities to appear on the graph; runs of the same configuration are automatically associated and averaged, and multiple configurations can be plotted simultaneously with confidence intervals. 
</p> <p><strong>Learning Agent Components:</strong> </p> <p>Value Function Representations: </p> <ul> <li> Bellman Error Basis Functions </li> <li> Fourier Basis Functions </li> <li> Incremental Feature Dependency Discovery (iFDD) </li> <li> Orthogonal Matching Pursuit - Temporal Difference (OMP-TD) </li> <li> Radial Basis Functions </li> <li> Tabular </li> <li> Tile Coding </li> </ul> <p>Exploration Policies: </p> <ul> <li> Epsilon-Greedy </li> <li> Fixed </li> <li> Gibbs </li> <li> Uniform Random </li> </ul> <p>Learning Algorithms: </p> <ul> <li> Greedy-GQ </li> <li> Least-Squares Policy Iteration </li> <li> Natural Actor-Critic </li> <li> Policy Iteration </li> <li> Q-Learning </li> <li> SARSA </li> <li> Trajectory-based Value Iteration </li> <li> Value Iteration </li> </ul> <p><strong>Problem Domains:</strong> </p> <ul> <li> Acrobot </li> <li> Bicycle Balancing </li> <li> BlocksWorld </li> <li> CartPole Balancing (2-state or 4-state) </li> <li> CartPole Swingup (2-state or 4-state) </li> <li> Fifty-State ChainMDP </li> <li> FlipBoard </li> <li> GridWorld </li> <li> HIV Treatment </li> <li> Helicopter Hovering </li> <li> Intruder Monitoring </li> <li> MountainCar </li> <li> MultiTrack </li> <li> Pac-Man </li> <li> Persistent Search and Track ("PST") </li> <li> Pinball </li> <li> PuddleWorld </li> <li> RC Car </li> <li> Swimmer </li> <li> System Administrator </li> </ul></html>
Authors: Alborz Geramifard, Robert H. Klein, Christoph Dann, William Dabney, Jonathan P. How
Tags: learning library, modular, parallelizable
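The components listed above plug together in a standard agent-environment loop: a value-function representation, an exploration policy, and a learning algorithm interact with a problem domain. As a hedged, self-contained sketch of that loop (plain Python on a toy chain MDP, not RLPy's own API), here is tabular Q-Learning with epsilon-greedy exploration:

```python
import random

# Toy 5-state chain MDP, a stand-in for one of the simpler domains above
# (e.g. a short ChainMDP): action 0 moves left, action 1 moves right, and
# reaching the rightmost state ends the episode with reward 1.
N_STATES, N_ACTIONS = 5, 2

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    done = s2 == N_STATES - 1
    return s2, reward, done

# Tabular value-function representation: one Q-value per (state, action).
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # illustrative hyperparameters

def egreedy(s):
    # Epsilon-greedy exploration policy over the tabular Q-values,
    # breaking ties among greedy actions uniformly at random.
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    best = max(Q[s])
    return random.choice([a for a in range(N_ACTIONS) if Q[s][a] == best])

random.seed(0)
for _ in range(500):  # episodes
    s, done = 0, False
    while not done:
        a = egreedy(s)
        s2, reward, done = step(s, a)
        # Q-Learning update: bootstrap from the greedy successor value.
        target = reward + (0.0 if done else gamma * max(Q[s2]))
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2

# After training, the greedy policy should prefer moving right (action 1)
# in every non-terminal state.
print([max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES - 1)])
```

The modular structure the announcement describes means each of these three pieces is swappable independently: the same loop could use a Fourier or tile-coding representation in place of the table, or SARSA's on-policy target in place of the Q-Learning update.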