--- /dev/null
+++ b/examples/legacy/train.arm.ipynb
@@ -0,0 +1,188 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Learning how to move a human arm\n",
+    "\n",
+    "In this tutorial we show how to train a basic biomechanical model using `keras-rl`.\n",
+    "\n",
+    "## Installation\n",
+    "\n",
+    "To make it work, follow the instructions at\n",
+    "https://github.com/stanfordnmbl/osim-rl#getting-started\n",
+    "i.e. run\n",
+    "\n",
+    "    conda create -n opensim-rl -c kidzik opensim python=3.6.1\n",
+    "    activate opensim-rl  # on Linux/macOS: source activate opensim-rl\n",
+    "    pip install git+https://github.com/stanfordnmbl/osim-rl.git\n",
+    "\n",
+    "Then run\n",
+    "\n",
+    "    pip install keras tensorflow keras-rl jupyter\n",
+    "    git clone https://github.com/stanfordnmbl/osim-rl.git\n",
+    "    cd osim-rl\n",
+    "\n",
+    "Once Jupyter is installed, type\n",
+    "\n",
+    "    jupyter notebook\n",
+    "\n",
+    "This should open Jupyter in your browser. Navigate to this notebook, i.e. to the file `examples/legacy/train.arm.ipynb`.\n",
+    "\n",
+    "## Preparing the environment\n",
+    "\n",
+    "The following two cells load the necessary libraries and create a simulation environment."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "# Keras libraries\n",
+    "from keras.optimizers import Adam\n",
+    "\n",
+    "# Local helper functions that build the actor and critic networks\n",
+    "from helpers import policy_nn, q_nn\n",
+    "\n",
+    "# keras-rl: the DDPG agent, its replay memory and the exploration noise\n",
+    "from rl.agents import DDPGAgent\n",
+    "from rl.memory import SequentialMemory\n",
+    "from rl.random import OrnsteinUhlenbeckProcess"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Load the arm environment (True enables visualization)\n",
+    "from osim.env import Arm2DEnv\n",
+    "env = Arm2DEnv(True)"
+   ]
+  },
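+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Before building any networks, it can help to sanity-check the simulator with a random policy. The next cell is an optional sketch; it assumes the environment exposes the usual Gym-style interface (`reset`, `step` returning `(observation, reward, done, info)`, and `action_space.sample`), which osim-rl environments follow, and simply applies random muscle excitations for a few steps."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Optional sanity check: drive the arm with random muscle excitations to make\n",
+    "# sure the simulator is wired up correctly before spending time on training.\n",
+    "observation = env.reset()\n",
+    "total_reward = 0.0\n",
+    "for _ in range(10):\n",
+    "    action = env.action_space.sample()  # a random excitation for every muscle\n",
+    "    observation, reward, done, info = env.step(action)\n",
+    "    total_reward += reward\n",
+    "    if done:  # the episode may end early\n",
+    "        break\n",
+    "print('Total reward after 10 random steps:', total_reward)"
+   ]
+  },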
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Creating the actor and the critic\n",
+    "\n",
+    "The actor serves as a brain for controlling muscles. The critic is our approximation of how well the actor is performing at achieving the goal."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Create the networks for DDPG\n",
+    "# First, we build a very simple actor model.\n",
+    "actor = policy_nn(env.observation_space.shape[0], env.action_space.shape[0], hidden_layers=3, hidden_size=32)\n",
+    "actor.summary()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# q_nn returns the critic model (qfunc[0]) together with its action input\n",
+    "# tensor (qfunc[1]), which the DDPG agent needs separately.\n",
+    "qfunc = q_nn(env.observation_space.shape[0], env.action_space.shape[0], hidden_layers=3, hidden_size=64)\n",
+    "qfunc[0].summary()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Train the actor and the critic\n",
+    "\n",
+    "We will now run the `keras-rl` implementation of the DDPG algorithm, which trains both networks."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Set up the agent for training\n",
+    "memory = SequentialMemory(limit=100000, window_length=1)\n",
+    "random_process = OrnsteinUhlenbeckProcess(theta=.15, mu=0., sigma=.2, size=env.action_space.shape)\n",
+    "agent = DDPGAgent(nb_actions=env.action_space.shape[0], actor=actor, critic=qfunc[0], critic_action_input=qfunc[1],\n",
+    "                  memory=memory, nb_steps_warmup_critic=100, nb_steps_warmup_actor=100,\n",
+    "                  random_process=random_process, gamma=.99, target_model_update=1e-3,\n",
+    "                  delta_clip=1.)\n",
+    "agent.compile(Adam(lr=.001, clipnorm=1.), metrics=['mae'])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "# Okay, now it's time to learn something! We keep visualize=False because rendering\n",
+    "# slows down training quite a lot. You can always safely abort the training\n",
+    "# prematurely by interrupting the kernel.\n",
+    "agent.fit(env, nb_steps=2000, visualize=False, verbose=0, nb_max_episode_steps=200, log_interval=10000)\n",
+    "# After training is done, we can save the final weights, e.g.\n",
+    "# agent.save_weights('ddpg_arm_weights.h5f', overwrite=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Evaluate the results\n",
+    "\n",
+    "Check how our trained 'brain' performs. You can also load a pretrained model, which should perform better; it was trained in exactly the same way, just with a larger number of steps (the `nb_steps` parameter of `agent.fit`)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# To evaluate pretrained weights instead, load them first, e.g.\n",
+    "# agent.load_weights('ddpg_arm_weights.h5f')\n",
+    "# Finally, evaluate our algorithm for 2 episodes.\n",
+    "agent.test(env, nb_episodes=2, visualize=False, nb_max_episode_steps=1000)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.1"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}