--- /dev/null
+++ b/examples/legacy/train.arm.ipynb
@@ -0,0 +1,188 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Learning how to move a human arm\n",
+    "\n",
+    "In this tutorial we show how to train a basic biomechanical model using `keras-rl`.\n",
+    "\n",
+    "## Installation\n",
+    "\n",
+    "To make it work, follow the instructions at\n",
+    "https://github.com/stanfordnmbl/osim-rl#getting-started\n",
+    "i.e. run\n",
+    "\n",
+    "    conda create -n opensim-rl -c kidzik opensim python=3.6.1\n",
+    "    activate opensim-rl  # on Linux/macOS: source activate opensim-rl\n",
+    "    pip install git+https://github.com/stanfordnmbl/osim-rl.git\n",
+    "\n",
+    "Then run\n",
+    "\n",
+    "    pip install keras tensorflow keras-rl jupyter\n",
+    "    git clone https://github.com/stanfordnmbl/osim-rl.git\n",
+    "    cd osim-rl\n",
+    "\n",
+    "Once Jupyter is installed, type\n",
+    "\n",
+    "    jupyter notebook\n",
+    "\n",
+    "This should open Jupyter in your browser. Navigate to this notebook, i.e. to the file `examples/legacy/train.arm.ipynb`.\n",
+    "\n",
+    "## Preparing the environment\n",
+    "\n",
+    "The following two cells load the necessary libraries and create a simulation environment."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "# Keras libraries\n",
+    "from keras.optimizers import Adam\n",
+    "\n",
+    "# Local helper functions that build the actor and critic networks\n",
+    "from helpers import policy_nn, q_nn\n",
+    "\n",
+    "# keras-rl: the DDPG agent, its replay memory and the exploration noise\n",
+    "from rl.agents import DDPGAgent\n",
+    "from rl.memory import SequentialMemory\n",
+    "from rl.random import OrnsteinUhlenbeckProcess"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Load the arm environment (True enables visualization)\n",
+    "from osim.env import Arm2DEnv\n",
+    "env = Arm2DEnv(True)"
+   ]
+  },
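+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Before building any networks, it can help to sanity-check the simulator with a random policy. The next cell is an optional sketch; it assumes the environment exposes the usual Gym-style interface (`reset`, `step` returning `(observation, reward, done, info)`, and `action_space.sample`), which osim-rl environments follow, and simply applies random muscle excitations for a few steps."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Optional sanity check: drive the arm with random muscle excitations to make\n",
+    "# sure the simulator is wired up correctly before spending time on training.\n",
+    "observation = env.reset()\n",
+    "total_reward = 0.0\n",
+    "for _ in range(10):\n",
+    "    action = env.action_space.sample()  # a random excitation for every muscle\n",
+    "    observation, reward, done, info = env.step(action)\n",
+    "    total_reward += reward\n",
+    "    if done:  # the episode may end early\n",
+    "        break\n",
+    "print('Total reward after 10 random steps:', total_reward)"
+   ]
+  },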
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Creating the actor and the critic\n",
+    "\n",
+    "The actor serves as a brain for controlling muscles. The critic is our approximation of how well the actor is performing at achieving the goal."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Create the networks for DDPG\n",
+    "# First, we build a very simple actor model.\n",
+    "actor = policy_nn(env.observation_space.shape[0], env.action_space.shape[0], hidden_layers=3, hidden_size=32)\n",
+    "actor.summary()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# q_nn returns the critic model (qfunc[0]) together with its action input\n",
+    "# tensor (qfunc[1]), which the DDPG agent needs separately.\n",
+    "qfunc = q_nn(env.observation_space.shape[0], env.action_space.shape[0], hidden_layers=3, hidden_size=64)\n",
+    "qfunc[0].summary()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Train the actor and the critic\n",
+    "\n",
+    "We will now run the `keras-rl` implementation of the DDPG algorithm, which trains both networks."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Set up the agent for training\n",
+    "memory = SequentialMemory(limit=100000, window_length=1)\n",
+    "random_process = OrnsteinUhlenbeckProcess(theta=.15, mu=0., sigma=.2, size=env.action_space.shape)\n",
+    "agent = DDPGAgent(nb_actions=env.action_space.shape[0], actor=actor, critic=qfunc[0], critic_action_input=qfunc[1],\n",
+    "                  memory=memory, nb_steps_warmup_critic=100, nb_steps_warmup_actor=100,\n",
+    "                  random_process=random_process, gamma=.99, target_model_update=1e-3,\n",
+    "                  delta_clip=1.)\n",
+    "agent.compile(Adam(lr=.001, clipnorm=1.), metrics=['mae'])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "# Okay, now it's time to learn something! We keep visualize=False because rendering\n",
+    "# slows down training quite a lot. You can always safely abort the training\n",
+    "# prematurely by interrupting the kernel.\n",
+    "agent.fit(env, nb_steps=2000, visualize=False, verbose=0, nb_max_episode_steps=200, log_interval=10000)\n",
+    "# After training is done, we can save the final weights, e.g.\n",
+    "# agent.save_weights('ddpg_arm_weights.h5f', overwrite=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Evaluate the results\n",
+    "\n",
+    "Check how our trained 'brain' performs. You can also load a pretrained model, which should perform better; it was trained in exactly the same way, just with a larger number of steps (the `nb_steps` parameter of `agent.fit`)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# To evaluate pretrained weights instead, load them first, e.g.\n",
+    "# agent.load_weights('ddpg_arm_weights.h5f')\n",
+    "# Finally, evaluate our algorithm for 2 episodes.\n",
+    "agent.test(env, nb_episodes=2, visualize=False, nb_max_episode_steps=1000)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.1"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}