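import matplotlib.pyplot as plt

# Plot the linearly decaying epsilon over 50,000 update steps.
s = EGreedyLinearStrategy()
plt.plot([s._epsilon_update() for _ in range(50000)])
plt.title('Epsilon-Greedy linearly decaying epsilon value')
plt.xticks(rotation=45)
plt.show()

# Plot the exponentially decaying epsilon over 50,000 update steps.
s = EGreedyExpStrategy()
plt.plot([s._epsilon_update() for _ in range(50000)])
plt.title('Epsilon-Greedy exponentially decaying epsilon value')
plt.xticks(rotation=45)
plt.show()

# Plot the linearly decaying SoftMax temperature over 50,000 update steps.
s = SoftMaxStrategy()
plt.plot([s._update_temp() for _ in range(50000)])
plt.title('SoftMax linearly decaying temperature value')
plt.xticks(rotation=45)
plt.show()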
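
The strategy classes plotted above are defined elsewhere, so the shape of their schedules is not visible here. As a rough illustration only, the sketch below shows what a linear and an exponential decay schedule could look like. The function names `linear_epsilon` and `exponential_epsilon` and every default (`init_epsilon`, `min_epsilon`, `decay_steps`, `decay_rate`) are hypothetical and are not taken from `EGreedyLinearStrategy` or `EGreedyExpStrategy`.

```python
def linear_epsilon(step, init_epsilon=1.0, min_epsilon=0.1, decay_steps=20000):
    """Hypothetical linear schedule (assumed defaults, not the notebook's).

    Interpolates from init_epsilon to min_epsilon over decay_steps,
    then holds at min_epsilon.
    """
    fraction = min(step / decay_steps, 1.0)
    return init_epsilon + fraction * (min_epsilon - init_epsilon)


def exponential_epsilon(step, init_epsilon=1.0, min_epsilon=0.1, decay_rate=0.9997):
    """Hypothetical exponential schedule (assumed defaults, not the notebook's).

    The gap above min_epsilon shrinks by a constant factor each step,
    so epsilon approaches min_epsilon without going below it.
    """
    return min_epsilon + (init_epsilon - min_epsilon) * decay_rate ** step


# Values at a few steps; these lists could be passed to plt.plot as above.
linear_values = [linear_epsilon(t) for t in range(0, 50000, 5000)]
exponential_values = [exponential_epsilon(t) for t in range(0, 50000, 5000)]
```

Under these assumptions the linear schedule reaches its floor at a fixed step and stays flat afterwards, while the exponential one drops quickly early on and then flattens gradually.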
el 00:00:01, ep 0000, ts 000016, ar 10 016.0±000.0, 100 016.0±000.0, ex 100 0.4±0.0, ev 019.0±000.0
el 00:01:02, ep 0167, ts 016307, ar 10 244.5±071.2, 100 139.5±083.8, ex 100 0.3±0.1, ev 298.9±099.1
el 00:02:03, ep 0209, ts 034129, ar 10 454.2±106.8, 100 281.0±159.7, ex 100 0.2±0.1, ev 384.8±117.8
el 00:03:03, ep 0245, ts 050529, ar 10 440.8±103.7, 100 388.3±147.7, ex 100 0.2±0.0, ev 458.1±085.3
el 00:03:24, ep 0257, ts 055793, ar 10 458.9±083.1, 100 419.3±130.8, ex 100 0.2±0.0, ev 477.7±068.2
--> reached_goal_mean_reward ✓
Training complete.
Final evaluation score 500.00±0.00 in 175.96s training time, 220.58s wall-clock time.
el 00:00:00, ep 0000, ts 000034, ar 10 034.0±000.0, 100 034.0±000.0, ex 100 0.6±0.0, ev 008.0±000.0
el 00:01:00, ep 0161, ts 016773, ar 10 288.4±130.7, 100 149.3±118.1, ex 100 0.3±0.1, ev 290.7±113.8
el 00:02:01, ep 0212, ts 034365, ar 10 458.2±085.1, 100 286.4±135.4, ex 100 0.2±0.1, ev 384.4±119.0
el 00:03:03, ep 0248, ts 051154, ar 10 500.0±000.0, 100 381.8±131.7, ex 100 0.2±0.0, ev 449.1±090.9
el 00:03:35, ep 0267, ts 059611, ar 10 430.9±148.7, 100 410.9±127.3, ex 100 0.2±0.0, ev 475.5±064.4
--> reached_goal_mean_reward ✓
Training complete.
Final evaluation score 500.00±0.00 in 187.33s training time, 231.06s wall-clock time.
el 00:00:00, ep 0000, ts 000012, ar 10 012.0±000.0, 100 012.0±000.0, ex 100 0.6±0.0, ev 010.0±000.0
el 00:01:00, ep 0161, ts 016947, ar 10 241.3±090.4, 100 151.3±104.1, ex 100 0.3±0.1, ev 284.7±092.7
el 00:02:00, ep 0211, ts 034385, ar 10 421.6±156.1, 100 285.7±140.3, ex 100 0.2±0.0, ev 365.2±120.5
el 00:03:02, ep 0247, ts 051165, ar 10 468.9±093.3, 100 373.4±150.3, ex 100 0.2±0.0, ev 432.6±111.0
el 00:03:40, ep 0268, ts 060955, ar 10 474.5±071.6, 100 423.5±128.1, ex 100 0.2±0.0, ev 478.3±064.0
--> reached_goal_mean_reward ✓
Training complete.
Final evaluation score 500.00±0.00 in 192.58s training time, 236.86s wall-clock time.
el 00:00:00, ep 0000, ts 000012, ar 10 012.0±000.0, 100 012.0±000.0, ex 100 0.6±0.0, ev 012.0±000.0
el 00:01:01, ep 0179, ts 016714, ar 10 329.5±139.7, 100 142.5±105.5, ex 100 0.3±0.1, ev 250.7±097.5
el 00:02:02, ep 0219, ts 034188, ar 10 464.7±105.9, 100 283.3±173.5, ex 100 0.2±0.1, ev 368.3±131.1
el 00:03:04, ep 0254, ts 050476, ar 10 426.7±133.5, 100 398.1±149.5, ex 100 0.2±0.0, ev 441.3±098.8
el 00:03:36, ep 0271, ts 058449, ar 10 470.9±087.3, 100 446.0±113.1, ex 100 0.2±0.0, ev 475.3±066.1
--> reached_goal_mean_reward ✓
Training complete.
Final evaluation score 488.33±51.75 in 190.46s training time, 232.66s wall-clock time.
el 00:00:00, ep 0000, ts 000044, ar 10 044.0±000.0, 100 044.0±000.0, ex 100 0.6±0.0, ev 009.0±000.0
el 00:01:00, ep 0158, ts 016269, ar 10 298.9±075.9, 100 148.5±117.7, ex 100 0.3±0.1, ev 301.5±114.7
el 00:02:00, ep 0218, ts 033489, ar 10 401.9±114.2, 100 271.9±115.6, ex 100 0.2±0.0, ev 346.2±101.7
el 00:03:00, ep 0293, ts 048735, ar 10 127.4±034.9, 100 234.6±150.4, ex 100 0.2±0.0, ev 287.6±157.1
el 00:04:01, ep 0383, ts 063261, ar 10 192.7±172.1, 100 158.0±126.9, ex 100 0.2±0.1, ev 213.0±157.0
el 00:05:02, ep 0495, ts 077925, ar 10 465.6±103.2, 100 103.8±163.3, ex 100 0.2±0.1, ev 150.3±202.5
el 00:06:03, ep 0530, ts 093857, ar 10 481.7±054.9, 100 237.5±225.7, ex 100 0.2±0.1, ev 284.0±234.1
el 00:07:04, ep 0565, ts 109795, ar 10 465.5±091.5, 100 386.5±184.9, ex 100 0.2±0.0, ev 448.1±145.2
el 00:07:15, ep 0571, ts 112795, ar 10 500.0±000.0, 100 414.6±163.3, ex 100 0.2±0.0, ev 475.4±102.0
--> reached_goal_mean_reward ✓
Training complete.
Final evaluation score 499.98±0.14 in 384.84s training time, 451.81s wall-clock time.
el 00:00:00, ep 0000, ts 000016, ar 10 016.0±000.0, 100 016.0±000.0, ex 100 0.4±0.0, ev 019.0±000.0
el 00:01:00, ep 0144, ts 014010, ar 10 261.8±099.7, 100 127.3±107.8, ex 100 0.3±0.1, ev 307.9±117.8
el 00:02:01, ep 0193, ts 029087, ar 10 361.6±157.5, 100 254.7±116.1, ex 100 0.2±0.1, ev 383.2±100.0
el 00:03:02, ep 0229, ts 043657, ar 10 457.4±105.5, 100 338.5±128.9, ex 100 0.2±0.0, ev 431.9±090.2
el 00:04:03, ep 0260, ts 057328, ar 10 480.5±039.5, 100 391.2±131.8, ex 100 0.2±0.0, ev 462.7±073.3
el 00:04:26, ep 0272, ts 062544, ar 10 421.6±121.3, 100 405.1±132.6, ex 100 0.2±0.0, ev 476.1±060.2
--> reached_goal_mean_reward ✓
Training complete.
Final evaluation score 500.00±0.00 in 236.34s training time, 281.77s wall-clock time.
el 00:00:00, ep 0000, ts 000034, ar 10 034.0±000.0, 100 034.0±000.0, ex 100 0.6±0.0, ev 008.0±000.0
el 00:01:00, ep 0141, ts 014154, ar 10 237.8±082.6, 100 130.4±103.6, ex 100 0.3±0.1, ev 289.9±112.8
el 00:02:01, ep 0201, ts 029248, ar 10 365.2±094.4, 100 239.8±093.4, ex 100 0.2±0.1, ev 347.4±109.3
el 00:03:03, ep 0233, ts 043959, ar 10 429.6±141.5, 100 317.2±139.5, ex 100 0.2±0.0, ev 392.3±115.7
el 00:04:03, ep 0264, ts 057512, ar 10 448.0±140.4, 100 386.8±143.5, ex 100 0.2±0.0, ev 452.9±092.4
el 00:04:25, ep 0275, ts 062370, ar 10 435.8±137.2, 100 410.1±139.4, ex 100 0.2±0.0, ev 475.8±067.8
--> reached_goal_mean_reward ✓
Training complete.
Final evaluation score 500.00±0.00 in 236.32s training time, 280.46s wall-clock time.
el 00:00:00, ep 0000, ts 000012, ar 10 012.0±000.0, 100 012.0±000.0, ex 100 0.6±0.0, ev 010.0±000.0
el 00:01:01, ep 0152, ts 014310, ar 10 335.6±117.5, 100 129.8±111.8, ex 100 0.3±0.1, ev 271.8±098.0
el 00:02:02, ep 0201, ts 029513, ar 10 322.0±089.6, 100 249.8±125.9, ex 100 0.2±0.1, ev 363.1±107.5
el 00:03:02, ep 0233, ts 043967, ar 10 475.4±073.8, 100 351.5±117.4, ex 100 0.2±0.0, ev 427.4±090.7
el 00:04:04, ep 0264, ts 057819, ar 10 453.7±102.4, 100 392.4±125.3, ex 100 0.2±0.0, ev 458.2±074.3
el 00:04:38, ep 0290, ts 065214, ar 10 302.2±214.1, 100 391.6±158.9, ex 100 0.2±0.0, ev 476.8±067.6
--> reached_goal_mean_reward ✓
Training complete.
Final evaluation score 500.00±0.00 in 247.09s training time, 295.15s wall-clock time.
el 00:00:00, ep 0000, ts 000012, ar 10 012.0±000.0, 100 012.0±000.0, ex 100 0.6±0.0, ev 012.0±000.0
el 00:01:00, ep 0156, ts 014229, ar 10 235.7±077.3, 100 127.2±091.2, ex 100 0.3±0.1, ev 252.6±100.7
el 00:02:01, ep 0196, ts 029542, ar 10 393.0±185.7, 100 256.4±150.6, ex 100 0.2±0.1, ev 368.9±122.4
el 00:03:01, ep 0227, ts 044187, ar 10 475.2±074.4, 100 359.0±151.3, ex 100 0.2±0.0, ev 437.7±098.9
el 00:03:40, ep 0246, ts 053294, ar 10 500.0±000.0, 100 414.2±130.2, ex 100 0.2±0.0, ev 476.5±062.5
--> reached_goal_mean_reward ✓
Training complete.
Final evaluation score 500.00±0.00 in 197.22s training time, 236.05s wall-clock time.
el 00:00:00, ep 0000, ts 000044, ar 10 044.0±000.0, 100 044.0±000.0, ex 100 0.6±0.0, ev 009.0±000.0
el 00:01:00, ep 0167, ts 014030, ar 10 220.7±060.6, 100 123.7±100.3, ex 100 0.3±0.1, ev 254.3±081.7
el 00:02:01, ep 0217, ts 029624, ar 10 379.2±115.2, 100 249.9±130.2, ex 100 0.2±0.1, ev 328.4±116.2
el 00:03:02, ep 0249, ts 044346, ar 10 462.3±113.1, 100 340.8±145.3, ex 100 0.2±0.0, ev 403.1±116.0
el 00:04:04, ep 0280, ts 058662, ar 10 427.4±131.1, 100 412.4±134.3, ex 100 0.2±0.0, ev 467.7±076.9
el 00:04:14, ep 0285, ts 060832, ar 10 441.4±118.4, 100 424.1±126.4, ex 100 0.2±0.0, ev 475.2±066.7
--> reached_goal_mean_reward ✓
Training complete.
Final evaluation score 500.00±0.00 in 226.86s training time, 269.07s wall-clock time.
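
To compare the runs above without reading each log by hand, the "Final evaluation score" lines can be pulled into numbers. The following is a minimal sketch that relies only on the wording of those lines as printed here. The helper name `parse_final_line` and the dictionary keys are hypothetical, and the value after `±` is stored as `spread` because the logs do not say which statistic it is.

```python
import re

# Matches lines such as:
# "Final evaluation score 488.33±51.75 in 190.46s training time, 232.66s wall-clock time."
FINAL_LINE = re.compile(
    r"Final evaluation score (?P<score>[\d.]+)±(?P<spread>[\d.]+) in "
    r"(?P<training_s>[\d.]+)s training time, "
    r"(?P<wall_clock_s>[\d.]+)s wall-clock time\."
)


def parse_final_line(line):
    """Return the four numbers from a final-score line as floats, or None."""
    match = FINAL_LINE.search(line)
    if match is None:
        return None
    return {name: float(value) for name, value in match.groupdict().items()}


def summarize(log_lines):
    """Collect the parsed final-score lines found in an iterable of log lines."""
    parsed = (parse_final_line(line) for line in log_lines)
    return [result for result in parsed if result is not None]
```

Passing the full log text split into lines to `summarize` yields one dictionary per completed run, which can then be averaged or tabulated per strategy.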