# NIPS2017: Learning to run

This repository contains software required for participation in the NIPS 2017 Challenge: Learning to Run. See more details about the challenge [here](https://www.crowdai.org/challenges/nips-2017-learning-to-run). **Please read about the latest changes and the logistics of the second round [here](https://github.com/stanfordnmbl/osim-rl/tree/master/docs) (last update November 6th).**

In this competition, you are tasked with developing a controller to enable a physiologically-based human model to navigate a complex obstacle course as quickly as possible. You are provided with a human musculoskeletal model and a physics-based simulation environment where you can synthesize physically and physiologically accurate motion. Potential obstacles include external obstacles like steps, or a slippery floor, along with internal obstacles like muscle weakness or motor noise. You are scored based on the distance you travel through the obstacle course in a set amount of time.

![HUMAN environment](https://github.com/kidzik/osim-rl/blob/master/demo/training.gif)

To model physics and biomechanics we use [OpenSim](https://github.com/opensim-org/opensim-core) - a biomechanical physics environment for musculoskeletal simulations.

## Getting started

**Anaconda** is required to run our simulations. Anaconda will create a virtual environment with all the necessary libraries, to avoid conflicts with libraries in your operating system. You can get Anaconda from https://www.continuum.io/downloads. In the following instructions we assume that Anaconda is successfully installed.

We support Windows, Linux, and Mac OSX (all in 64-bit). To install our simulator, you first need to create a conda environment with the OpenSim package.

On **Windows**, open a command prompt and type:

    conda create -n opensim-rl -c kidzik opensim git python=2.7
    activate opensim-rl

On **Linux/OSX**, run:

    conda create -n opensim-rl -c kidzik opensim git python=2.7
    source activate opensim-rl

These commands will create a virtual environment on your computer with the necessary simulation libraries installed. Next, you need to install our python reinforcement learning environment. Type (on all platforms):

    conda install -c conda-forge lapack git
    pip install git+https://github.com/stanfordnmbl/osim-rl.git

If the command `python -c "import opensim"` runs smoothly, you are done! Otherwise, please refer to our [FAQ](#frequently-asked-questions) section.

Note that `source activate opensim-rl` activates the anaconda virtual environment. You need to type it every time you open a new terminal.

## Basic usage

To execute 200 iterations of the simulation, enter the `python` interpreter and run the following:
```python
from osim.env import RunEnv

env = RunEnv(visualize=True)
observation = env.reset(difficulty = 0)
for i in range(200):
    observation, reward, done, info = env.step(env.action_space.sample())
```
![Random walk](https://github.com/stanfordnmbl/osim-rl/blob/master/demo/random.gif)

The function `env.action_space.sample()` returns a random vector for muscle activations, so, in this example, muscles are activated randomly (red indicates an active muscle and blue an inactive muscle). Clearly with this technique we won't go too far.

Your goal is to construct a controller, i.e. a function from the state space (current positions, velocities and accelerations of joints) to action space (muscle excitations), that will enable the model to travel as far as possible in a fixed amount of time. Suppose you trained a neural network mapping observations (the current state of the model) to actions (muscle excitations), i.e. you have a function `action = my_controller(observation)`, then
```python
# ...
total_reward = 0.0
for i in range(200):
    # make a step given by the controller and record the state and the reward
    observation, reward, done, info = env.step(my_controller(observation))
    total_reward += reward
    if done:
        break

# Your reward is
print("Total reward %f" % total_reward)
```
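
If you do not have a trained network yet, you can still run the loop above with a placeholder. The controller below is purely illustrative (it ignores the observation and excites every muscle at a constant low level) and is not part of the osim-rl package:

```python
# Purely illustrative placeholder: ignore the observation and excite
# all 18 muscles at a constant 5% of their maximum excitation.
def my_controller(observation):
    return [0.05] * 18
```
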
There are many ways to construct the function `my_controller(observation)`. We will show how to do it with a DDPG (Deep Deterministic Policy Gradients) algorithm, using `keras-rl`. If you already have experience with training reinforcement learning models, you can skip the next section and go to [evaluation](#evaluation).

## Training your first model

Below we present how to train a basic controller using [keras-rl](https://github.com/matthiasplappert/keras-rl). First, you need to install extra packages:

    conda install keras -c conda-forge
    pip install git+https://github.com/matthiasplappert/keras-rl.git
    git clone http://github.com/stanfordnmbl/osim-rl.git

`keras-rl` is an excellent package compatible with [OpenAI Gym](https://gym.openai.com/), which allows you to quickly build your first models!

Go to the `scripts` subdirectory of this repository:

    cd osim-rl/scripts

There are two scripts:
* `example.py` for training (and testing) an agent using the DDPG algorithm.
* `submit.py` for submitting the result to [crowdAI.org](https://www.crowdai.org/challenges/nips-2017-learning-to-run).

### Training

To train an example agent with DDPG, run:

    python example.py --visualize --train --model sample

### Test

To test the trained model (i.e., walk as far as possible), run:

    python example.py --visualize --test --model sample
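
The `example.py` script wires the environment into a DDPG agent for you. If you prefer to assemble the agent yourself, the sketch below shows one way to do it with `keras-rl`; the network sizes, hyperparameters, and the weights file name are illustrative choices, not the settings used in `example.py`:

```python
from keras.models import Sequential, Model
from keras.layers import Dense, Flatten, Input, concatenate
from keras.optimizers import Adam

from rl.agents import DDPGAgent
from rl.memory import SequentialMemory
from rl.random import OrnsteinUhlenbeckProcess

from osim.env import RunEnv

env = RunEnv(visualize=False)
nb_actions = env.action_space.shape[0]  # 18 muscle excitations

# Actor: maps an observation to muscle excitations in [0, 1].
actor = Sequential()
actor.add(Flatten(input_shape=(1,) + env.observation_space.shape))
actor.add(Dense(32, activation='relu'))
actor.add(Dense(32, activation='relu'))
actor.add(Dense(nb_actions, activation='sigmoid'))

# Critic: maps an (action, observation) pair to a scalar Q-value.
action_input = Input(shape=(nb_actions,), name='action_input')
observation_input = Input(shape=(1,) + env.observation_space.shape, name='observation_input')
x = concatenate([action_input, Flatten()(observation_input)])
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(1, activation='linear')(x)
critic = Model(inputs=[action_input, observation_input], outputs=x)

# Replay memory and exploration noise.
memory = SequentialMemory(limit=100000, window_length=1)
random_process = OrnsteinUhlenbeckProcess(theta=0.15, mu=0.0, sigma=0.2, size=nb_actions)

agent = DDPGAgent(nb_actions=nb_actions, actor=actor, critic=critic,
                  critic_action_input=action_input, memory=memory,
                  nb_steps_warmup_critic=100, nb_steps_warmup_actor=100,
                  random_process=random_process, gamma=0.99, target_model_update=1e-3)
agent.compile(Adam(lr=0.001, clipnorm=1.0), metrics=['mae'])

agent.fit(env, nb_steps=10000, visualize=False, verbose=1)
agent.save_weights('ddpg_sample_weights.h5f', overwrite=True)
```

After training, the agent can be evaluated with `agent.test(env, nb_episodes=1)`; `example.py` follows the same pattern with its own settings.
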
### Moving forward

Note that it will take a while to train this model. You can find many tutorials, frameworks and lessons on-line. We particularly recommend:

Tutorials & Courses on Reinforcement Learning:
* [Berkeley Deep RL course by Sergey Levine](http://rll.berkeley.edu/deeprlcourse/)
* [Intro to RL on Karpathy's blog](http://karpathy.github.io/2016/05/31/rl/)
* [Intro to RL by Tambet Matiisen](https://www.nervanasys.com/demystifying-deep-reinforcement-learning/)
* [Deep RL course of David Silver](https://www.youtube.com/watch?v=2pWv7GOvuf0&list=PLHOg3HfW_teiYiq8yndRVwQ95LLPVUDJe)
* [A comprehensive list of deep RL resources](https://github.com/dennybritz/reinforcement-learning)

Frameworks and implementations of algorithms:
* [RLLAB](https://github.com/openai/rllab)
* [modular_rl](https://github.com/joschu/modular_rl)
* [keras-rl](https://github.com/matthiasplappert/keras-rl)

OpenSim and Biomechanics:
* [OpenSim Documentation](http://simtk-confluence.stanford.edu:8080/display/OpenSim/OpenSim+Documentation)
* [Muscle models](http://simtk-confluence.stanford.edu:8080/display/OpenSim/First-Order+Activation+Dynamics)
* [Publication describing OpenSim](http://nmbl.stanford.edu/publications/pdf/Delp2007.pdf)
* [Publication describing Simbody (multibody dynamics engine)](http://ac.els-cdn.com/S2210983811000241/1-s2.0-S2210983811000241-main.pdf?_tid=c22ea7d2-50ba-11e7-9f69-00000aacb361&acdnat=1497415051_124f3094c7fec3c60165f5d544a184f4)

This list is *by no means* exhaustive. If you find some resources particularly well suited to this tutorial, please let us know!

## Evaluation

Your task is to build a function `f` which takes the current state `observation` (a 41-dimensional vector) and returns the muscle excitations `action` (an 18-dimensional vector) in a way that maximizes the reward.

The trial ends either if the pelvis of the model goes below `0.65` meters or if you reach `1000` iterations (corresponding to `10` seconds in the virtual environment). Your total reward is the position of the pelvis on the `x` axis after the last iteration minus a penalty for using ligament forces. Ligaments are tissues which prevent your joints from bending too much - overusing these tissues leads to injuries, so we want to avoid it. The penalty in the total reward is equal to the sum of forces generated by ligaments over the trial, divided by `10,000,000`.

After each iteration you get a reward equal to the change in the `x` position of the pelvis during this iteration minus the magnitude of the ligament forces used in that iteration.
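
To illustrate the arithmetic only (the grader computes this internally; the function and argument names below are hypothetical):

```python
def total_reward(pelvis_x, ligament_forces):
    """pelvis_x: pelvis x position after each iteration;
    ligament_forces: summed magnitude of ligament forces in each iteration."""
    distance = pelvis_x[-1]                      # pelvis x position after the last iteration
    penalty = sum(ligament_forces) / 10000000.0  # ligament penalty as described above
    return distance - penalty
```
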
You can test your model on your local machine. For submission, you will need to interact with the remote environment: [crowdAI](https://www.crowdai.org/challenges/nips-2017-learning-to-run) sends you the current `observation` and you need to send back the action you take in the given state. You will be evaluated at three different levels of difficulty. For details, please refer to [Details of the environment](#details-of-the-environment).

### Submission

Assuming your controller is trained and is represented as a function `my_controller(observation)` returning an `action`, you can submit it to [crowdAI](https://www.crowdai.org/challenges/nips-2017-learning-to-run) through interaction with an environment there:

```python
import opensim as osim
from osim.http.client import Client
from osim.env import RunEnv

# Settings
remote_base = "http://grader.crowdai.org:1729"
crowdai_token = "[YOUR_CROWD_AI_TOKEN_HERE]"

client = Client(remote_base)

# Create environment
observation = client.env_create(crowdai_token)

# IMPLEMENTATION OF YOUR CONTROLLER
# my_controller = ... (for example the one trained in keras_rl)

while True:
    [observation, reward, done, info] = client.env_step(my_controller(observation), True)
    print(observation)
    if done:
        observation = client.env_reset()
        if not observation:
            break

client.submit()
```

In place of `[YOUR_CROWD_AI_TOKEN_HERE]`, put the token from your profile page on the [crowdai.org](http://crowdai.org/) website.

Note that during the submission, the environment will get restarted. Since the environment is stochastic, you will need to submit three trials -- this way we make sure that your model is robust.

### Rules

In order to avoid overfitting to the training environment, the top participants (those who obtained 15.0 points or more) will be asked to resubmit their solutions in the second round of the challenge. Environments in the second round will have the same structure but **10 obstacles** and different seeds. In each submission, there will be **10 simulations**. Each participant will have a limit of **3 submissions**. The final ranking will be based on the results from the second round.

Additional rules:
* You are not allowed to use external datasets (e.g., kinematics of people walking).
* Organizers reserve the right to modify challenge rules as required.

## Details of the environment

In order to create an environment, use:
```python
from osim.env import RunEnv

env = RunEnv(visualize=True)
```

Parameters:

* `visualize` - turn the visualizer on and off.

### Methods of `RunEnv`

#### `reset(difficulty = 2, seed = None)`

Restart the environment with a given `difficulty` level and a `seed`.

* `difficulty` - `0` - no obstacles, `1` - 3 randomly positioned obstacles (balls fixed in the ground), `2` - same as `1` but the strength of the psoas muscles (the muscles that help bend the hip joint in the model) also varies. The muscle strength is set to z * 100%, where z is a normal variable with mean 1 and standard deviation 0.1.
* `seed` - starting seed for the random number generator. If the seed is `None`, generation from the previous seed is continued.

Your solution will be graded in the environment with `difficulty = 2`, yet it might be easier to train your model with `difficulty = 0` first and then retrain with a higher difficulty.
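
For example, a minimal sketch (the seed value here is arbitrary):

```python
from osim.env import RunEnv

env = RunEnv(visualize=False)

# Start simple: no obstacles, default psoas strength.
observation = env.reset(difficulty=0)

# Later, retrain on the graded setting, fixing the seed for reproducibility.
observation = env.reset(difficulty=2, seed=42)
```
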
#### `step(action)`

Make one iteration of the simulation.

* `action` - a list of length `18` of continuous values in `[0,1]` corresponding to excitation of muscles.

The function returns:

* `observation` - a list of length `41` of real values corresponding to the current state of the model. Variables are explained in the section "Physics and biomechanics of the model".

* `reward` - reward gained in the last iteration. The reward is computed as a change in position of the pelvis along the x axis minus the penalty for the use of ligaments. See the "Physics and biomechanics of the model" section for details.

* `done` - indicates if the move was the last step of the environment. This happens if either `1000` iterations were reached or the pelvis height is below `0.65` meters.

* `info` - for compatibility with OpenAI Gym, currently not used.
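
For instance, if your policy produces unbounded outputs, clip them into the valid range before stepping (a sketch, assuming `env` was created as above; `np.random.randn` merely stands in for a policy output):

```python
import numpy as np

raw_action = np.random.randn(18)                 # stand-in for an unbounded policy output
action = np.clip(raw_action, 0.0, 1.0).tolist()  # excitations must lie in [0, 1]
observation, reward, done, info = env.step(action)
```
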
### Physics and biomechanics of the model

The model is implemented in [OpenSim](https://github.com/opensim-org/opensim-core)[1], which relies on the [Simbody](https://github.com/simbody/simbody) physics engine. Note that, given recent successes in model-free reinforcement learning, expertise in biomechanics is not required to successfully compete in this challenge.

To summarize briefly, the agent is a musculoskeletal model that includes body segments for each leg, a pelvis segment, and a single segment to represent the upper half of the body (trunk, head, arms). The segments are connected with joints (e.g., knee and hip) and the motion of these joints is controlled by the excitation of muscles. The muscles in the model have complex paths (e.g., muscles can cross more than one joint and there are redundant muscles). The muscle actuators themselves are also highly nonlinear. For example, there is a first-order differential equation that relates the electrical signal the nervous system sends to a muscle (the excitation) to the activation of the muscle (which describes how much force a muscle will actually generate given the muscle's current force-generating capacity); a simplified sketch of these activation dynamics is given after the list below. Given the musculoskeletal structure of bones, joints, and muscles, at each step of the simulation (corresponding to 0.01 seconds), the engine:
* computes activations of muscles from the excitations vector provided to the `step()` function,
* actuates muscles according to these activations,
* computes torques generated due to muscle activations,
* computes forces caused by contacting the ground,
* computes velocities and positions of joints and bodies,
* generates a new state based on forces, velocities, and positions of joints.
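
As a rough illustration of the excitation-to-activation step only (not the exact model used in OpenSim, which is more detailed; see [2]), first-order activation dynamics can be sketched as:

```python
def activation_step(activation, excitation, dt=0.01, tau_act=0.01, tau_deact=0.04):
    """One Euler step of simplified first-order activation dynamics.
    The time constants are illustrative; activation rises faster than it decays."""
    tau = tau_act if excitation > activation else tau_deact
    return activation + dt * (excitation - activation) / tau
```
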
In each action, the following 18 muscles are actuated (9 per leg):
* hamstrings,
* biceps femoris,
* gluteus maximus,
* iliopsoas,
* rectus femoris,
* vastus,
* gastrocnemius,
* soleus,
* tibialis anterior.

The action vector corresponds to these muscles in the same order (9 muscles of the right leg first, then 9 muscles of the left leg).

The observation contains 41 values (one way to index them is sketched after this list):
* position of the pelvis (rotation, x, y)
* velocity of the pelvis (rotation, x, y)
* rotation of each ankle, knee and hip (6 values)
* angular velocity of each ankle, knee and hip (6 values)
* position of the center of mass (2 values)
* velocity of the center of mass (2 values)
* positions (x, y) of head, pelvis, torso, left and right toes, left and right talus (14 values)
* strength of left and right psoas: 1 for `difficulty < 2`, otherwise a random normal variable with mean 1 and standard deviation 0.1 fixed for the entire simulation
* next obstacle: x distance from the pelvis, y position of the center relative to the ground, radius.
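
If you want to address these values by name, a helper like the one below can be handy. The index ranges are inferred from the order of the list above; they are not an official part of the osim-rl API, so verify them against your own observations before relying on them.

```python
def split_observation(obs):
    """Split the 41-value observation into named groups (indices inferred from the list above)."""
    return {
        "pelvis_pos":     obs[0:3],    # rotation, x, y
        "pelvis_vel":     obs[3:6],    # rotation, x, y
        "joint_angles":   obs[6:12],   # ankles, knees, hips
        "joint_vel":      obs[12:18],  # ankles, knees, hips
        "com_pos":        obs[18:20],  # x, y
        "com_vel":        obs[20:22],  # x, y
        "body_pos":       obs[22:36],  # head, pelvis, torso, toes, tali (x, y each)
        "psoas_strength": obs[36:38],  # left, right
        "next_obstacle":  obs[38:41],  # x distance from pelvis, y of center, radius
    }
```
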
For more details on the simulation framework, please refer to [1]. For more specific information about the muscle model we use, please refer to [2] or to the [OpenSim documentation](http://simtk-confluence.stanford.edu:8080/display/OpenSim/Muscle+Model+Theory+and+Publications).

[1] Delp, Scott L., et al. *"OpenSim: open-source software to create and analyze dynamic simulations of movement."* IEEE Transactions on Biomedical Engineering 54.11 (2007): 1940-1950.

[2] Thelen, D.G. *"Adjustment of muscle mechanics model parameters to simulate dynamic contractions in older adults."* ASME Journal of Biomechanical Engineering 125 (2003): 70–77.

## Frequently Asked Questions

**I'm getting 'version GLIBCXX_3.4.21 not defined in file libstdc++.so.6 with link time reference' error**

If you are getting this error:

    ImportError: /opensim-rl/lib/python2.7/site-packages/opensim/libSimTKcommon.so.3.6:
      symbol _ZTVNSt7__cxx1119basic_istringstreamIcSt11char_traitsIcESaIcEEE, version
      GLIBCXX_3.4.21 not defined in file libstdc++.so.6 with link time reference

Try `conda install libgcc`.

**Can I use languages other than python?**

Yes, you just need to set up your own python grader and interact with it: https://github.com/kidzik/osim-rl-grader. Find more details in the [OpenAI http client](https://github.com/openai/gym-http-api).

**Do you have a docker container?**

Yes, you can use https://hub.docker.com/r/stanfordnmbl/opensim-rl/

Note that connecting a display to a Docker container can be tricky and it is system-dependent. Nevertheless, for training your models the display is not necessary -- the Docker container can be handy for using multiple machines.
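
For example, `docker pull stanfordnmbl/opensim-rl` should fetch the image (the image name here is inferred from the hub URL above).
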
**Some libraries are missing. What is required to run the environment?**

Most of the required libraries exist by default in major operating system distributions or are automatically downloaded by the conda environment. Yet, sometimes things are still missing. The minimal set of dependencies under Linux can be installed with

    sudo apt install libquadmath0 libglu1-mesa libglu1-mesa-dev libsm6 libxi-dev libxmu-dev liblapack-dev

Please try to find equivalent libraries for your OS and let us know -- we will put them here.

**Why are there no energy constraints?**

Please refer to the issue https://github.com/stanfordnmbl/osim-rl/issues/34.

**I have some memory leaks, what can I do?**

Please refer to https://github.com/stanfordnmbl/osim-rl/issues/10 and https://github.com/stanfordnmbl/osim-rl/issues/58.

**I see only a python3 environment for Linux. How do I install the Windows environment?**

Please refer to https://github.com/stanfordnmbl/osim-rl/issues/29.

**How do I visualize observations when running simulations on the server?**

Please refer to https://github.com/stanfordnmbl/osim-rl/issues/59.

**I still have more questions, how can I contact you?**

For questions related to the challenge, please use [the challenge forum](https://www.crowdai.org/challenges/nips-2017-learning-to-run/topics).
For issues and problems related to the installation process or to the implementation of the simulation environment, feel free to create an [issue on GitHub](https://github.com/stanfordnmbl/osim-rl/issues).

## Credits

This challenge would not be possible without:
* [OpenSim](https://github.com/opensim-org/opensim-core)
* [National Center for Simulation in Rehabilitation Research](http://opensim.stanford.edu/)
* [Mobilize Center](http://mobilize.stanford.edu/)
* [CrowdAI](http://crowdai.org/)
* [OpenAI gym](https://gym.openai.com/)
* [OpenAI http client](https://github.com/openai/gym-http-api)
* [keras-rl](https://github.com/matthiasplappert/keras-rl)
* and many other teams, individuals and projects