{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Learning how to move a human arm\n",
"\n",
"In this tutorial we will show how to train a basic biomechanical model using `keras-rl`.\n",
"\n",
"## Installation\n",
"\n",
"To make it work, follow the instructions in\n",
"https://github.com/stanfordnmbl/osim-rl#getting-started\n",
"i.e. run\n",
"\n",
" conda create -n opensim-rl -c kidzik opensim python=3.6.1\n",
" activate opensim-rl\n",
" pip install git+https://github.com/stanfordnmbl/osim-rl.git\n",
"\n",
"Then run\n",
"\n",
" pip install keras tensorflow keras-rl jupyter\n",
" git clone https://github.com/stanfordnmbl/osim-rl.git\n",
" cd osim-rl\n",
" \n",
"follow the instructions and once jupyter is installed and type\n",
"\n",
" jupyter notebook\n",
"\n",
"This should open the browser with jupyter. Navigate to this notebook, i.e. to the file `examples/train.arm.ipynb`.\n",
"\n",
"## Preparing the environment\n",
"\n",
"The following two blocks load necessary libraries and create a simulator environment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import osim\n",
"import numpy as np\n",
"import sys\n",
"\n",
"# Keras libraries \n",
"from keras.optimizers import Adam\n",
"\n",
"import numpy as np\n",
"from helpers import *\n",
"\n",
"from rl.agents import DDPGAgent\n",
"from rl.memory import SequentialMemory\n",
"from rl.random import OrnsteinUhlenbeckProcess\n",
"\n",
"from keras.optimizers import RMSprop\n",
"\n",
"import argparse\n",
"import math"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load arm environment\n",
"from osim.env import Arm2DEnv\n",
"env = Arm2DEnv(True)"
]
},
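{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before building any networks, it can help to sanity-check the environment. The cell below is an illustrative sketch: it assumes the standard gym-style interface (`reset`, `step`, and `Box` spaces) that `osim-rl` environments follow, and simply applies a few random muscle activations."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sanity check: inspect the spaces and apply a few random actions.\n",
"# Assumes the gym-style API: reset() returns an observation and\n",
"# step(action) returns (observation, reward, done, info).\n",
"print('Observation size:', env.observation_space.shape[0])\n",
"print('Action size:', env.action_space.shape[0])\n",
"\n",
"observation = env.reset()\n",
"for _ in range(5):\n",
"    action = env.action_space.sample()\n",
"    observation, reward, done, info = env.step(action)\n",
"    print('reward:', reward)"
]
},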
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating the actor and the critic\n",
"\n",
"The actor serves as a brain for controlling muscles. The critic is our approximation of how good is the brain performing for achieving the goal"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create networks for DDPG\n",
"# Next, we build a very simple model.\n",
"actor = policy_nn(env.observation_space.shape[0], env.action_space.shape[0], hidden_layers = 3, hidden_size = 32)\n",
"print(actor.summary())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"qfunc = q_nn(env.observation_space.shape[0], env.action_space.shape[0], hidden_layers = 3, hidden_size = 64)\n",
"print(qfunc[0].summary())"
]
},
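{
"cell_type": "markdown",
"metadata": {},
"source": [
"`policy_nn` and `q_nn` are small network builders defined in `helpers.py` next to this notebook. As a rough illustration of what such builders typically produce, the cell below sketches a minimal actor and critic in plain Keras, following the usual `keras-rl` DDPG layout (a flattened observation input for `window_length=1`, and a critic that takes both the observation and the action). The names `sketch_actor`/`sketch_critic` and the layer sizes are assumptions for illustration only; they are not the implementation used above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch only -- the networks actually used here come from helpers.policy_nn / helpers.q_nn.\n",
"from keras.layers import Input, Dense, Flatten, concatenate\n",
"from keras.models import Model\n",
"\n",
"def sketch_actor(obs_size, act_size, hidden_size=32):\n",
"    # Maps a (window_length=1) observation to muscle activations in [0, 1].\n",
"    obs_input = Input(shape=(1, obs_size))\n",
"    x = Flatten()(obs_input)\n",
"    x = Dense(hidden_size, activation='relu')(x)\n",
"    x = Dense(hidden_size, activation='relu')(x)\n",
"    action_output = Dense(act_size, activation='sigmoid')(x)\n",
"    return Model(inputs=obs_input, outputs=action_output)\n",
"\n",
"def sketch_critic(obs_size, act_size, hidden_size=64):\n",
"    # Scores an (observation, action) pair with a single Q-value.\n",
"    obs_input = Input(shape=(1, obs_size))\n",
"    act_input = Input(shape=(act_size,))\n",
"    x = concatenate([Flatten()(obs_input), act_input])\n",
"    x = Dense(hidden_size, activation='relu')(x)\n",
"    x = Dense(hidden_size, activation='relu')(x)\n",
"    q_output = Dense(1, activation='linear')(x)\n",
"    # Return the action input as well, since DDPGAgent needs critic_action_input.\n",
"    return Model(inputs=[obs_input, act_input], outputs=q_output), act_input"
]
},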
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train the actor and the critic\n",
"\n",
"We will now run `keras-rl` implementation of the DDPG algorithm which trains both networks."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Set up the agent for training\n",
"memory = SequentialMemory(limit=100000, window_length=1)\n",
"random_process = OrnsteinUhlenbeckProcess(theta=.15, mu=0., sigma=.2, size=env.action_space.shape)\n",
"agent = DDPGAgent(nb_actions=env.action_space.shape[0], actor=actor, critic=qfunc[0], critic_action_input=qfunc[1],\n",
" memory=memory, nb_steps_warmup_critic=100, nb_steps_warmup_actor=100,\n",
" random_process=random_process, gamma=.99, target_model_update=1e-3,\n",
" delta_clip=1.)\n",
"agent.compile(Adam(lr=.001, clipnorm=1.), metrics=['mae'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# Okay, now it's time to learn something! We visualize the training here for show, but this\n",
"# slows down training quite a lot. You can always safely abort the training prematurely by\n",
"# stopping the notebook\n",
"agent.fit(env, nb_steps=2000, visualize=False, verbose=0, nb_max_episode_steps=200, log_interval=10000)\n",
"# After training is done, we save the final weights.\n",
"# agent.save_weights(args.model, overwrite=True)"
]
},
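{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to keep the result, `keras-rl` agents can write their weights to disk with `save_weights`. The file name below is just an example path, not a file provided with the repository."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Save the trained weights (the file name is an arbitrary example).\n",
"agent.save_weights('ddpg_arm_weights.h5f', overwrite=True)"
]
},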
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Evaluate the results\n",
"Check how our trained 'brain' performs. Below we will also load a pretrained model (on the larger number of episodes), which should perform better. It was trained exactly the same way, just with a larger number of steps (parameter `nb_steps` in `agent.fit`."
]
},
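{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# If you saved weights earlier (or have weights from a longer run), load them back\n",
"# before evaluating. The file name is the same illustrative example used above;\n",
"# uncomment and adjust the path as needed.\n",
"# agent.load_weights('ddpg_arm_weights.h5f')"
]
},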
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# agent.load_weights(args.model)\n",
"# Finally, evaluate our algorithm for 2 episodes.\n",
"agent.test(env, nb_episodes=2, visualize=False, nb_max_episode_steps=1000)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}