--- a/docs/_posts/2017-10-02-osim-rl-1.4.1-released.markdown
+++ b/docs/_posts/2017-10-02-osim-rl-1.4.1-released.markdown
@@ -0,0 +1,31 @@
+---
+title: 'Version 1.4.1 released'
+date: 2017-10-02 11:56:01 -0500
+author: kidzik
+version: 1.4.1
+categories: [release]
+---
+
+After discussing the way the reward function is computed ([issue #43](https://github.com/stanfordnmbl/osim-rl/issues/43)), we decided to update the environment further. Up to version 1.3, the reward received at every step was the total distance travelled from the starting point minus the ligament forces. As a result, the total reward was the cumulative sum of those distances over all steps (or a discrete integral of position over time) minus the total sum of ligament forces.
+
+Since this reward is unconventional in reinforcement learning, we changed the reward at each step to the distance increment between two consecutive steps minus the ligament forces (see the sketch at the end of this post). As a result, the total reward is the total distance travelled minus the ligament forces.
+
+To switch to the new environment, update the `osim-rl` scripts with the following command:
+
+    pip install git+https://github.com/stanfordnmbl/osim-rl.git -U
+
+Note that this changes the order of magnitude of the total reward from ~1000 to ~10 (it is now measured in meters travelled). The change does not affect the API of observations and actions. Moreover, the two measures are strongly correlated, and a good model in the old version should perform well in the current version.
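+
+For illustration, here is a minimal sketch of the change in Python, using hypothetical names (`pelvis_x`, `ligament_penalty`) rather than the actual osim-rl internals:
+
+    def reward_v13(pelvis_x, start_x, ligament_penalty):
+        # Up to v1.3: the per-step reward was the total distance
+        # travelled so far, minus the ligament forces.
+        return (pelvis_x - start_x) - ligament_penalty
+
+    def reward_v141(pelvis_x, prev_pelvis_x, ligament_penalty):
+        # From v1.4.1: the per-step reward is the distance gained
+        # since the previous step, minus the ligament forces.
+        return (pelvis_x - prev_pelvis_x) - ligament_penalty
+
+Summing the new per-step rewards over an episode telescopes to the total distance travelled minus the accumulated ligament forces, which is why episode totals drop from roughly 1000 to roughly 10.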