Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble

Off2OnRL

A balanced replay scheme that prioritizes samples encountered online while also encouraging the use of near-on-policy samples from the offline dataset.
PaperReview
ReinforcementLearning
Author

Chanseok Kang

Published

October 10, 2023
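The description summarizes balanced replay in one sentence: online samples should be favored, but offline samples that look close to the current policy should still be used. The snippet is a rough illustration of that idea, not code from the paper or this post. It keeps offline and online transitions in one buffer and samples minibatches in proportion to a per-transition priority. The class `BalancedReplay`, its methods, the `min_priority` floor, and the assumption that the priority is an estimate of how on-policy a sample is (for example a density ratio between online and offline data) are all hypothetical choices made for this sketch.

```python
import random
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class BalancedReplay:
    """Hypothetical sketch of a balanced replay buffer.

    Offline and online transitions share one buffer. Each transition
    carries a priority, and minibatches are drawn with probability
    proportional to it. How the priority is computed is an assumption.
    """
    transitions: List[Any] = field(default_factory=list)
    priorities: List[float] = field(default_factory=list)
    min_priority: float = 1e-6  # floor so no transition becomes unsampleable

    def add(self, transition: Any, priority: float) -> None:
        # `priority` is assumed to score how close the sample is to the
        # current policy (e.g. an estimated ratio d_online / d_offline).
        self.transitions.append(transition)
        self.priorities.append(max(priority, self.min_priority))

    def probabilities(self) -> List[float]:
        # Normalized sampling probability of each stored transition.
        total = sum(self.priorities)
        return [p / total for p in self.priorities]

    def sample(self, batch_size: int, rng: random.Random) -> List[Any]:
        # Draw with replacement, proportionally to priority.
        return rng.choices(self.transitions, weights=self.priorities, k=batch_size)


if __name__ == "__main__":
    buf = BalancedReplay()
    for i in range(90):   # offline, far from the current policy
        buf.add(("offline_far", i), priority=0.1)
    for i in range(10):   # offline, near-on-policy
        buf.add(("offline_near", i), priority=1.0)
    for i in range(10):   # collected online
        buf.add(("online", i), priority=1.0)

    batch = buf.sample(10_000, random.Random(0))
    online_share = sum(t[0] == "online" for t in batch) / len(batch)
    print(f"online share of buffer: {10 / 110:.3f}, of batch: {online_share:.3f}")
```

With these made-up priorities, online transitions make up about 9% of the buffer but roughly a third of each sampled batch, and the ten near-on-policy offline transitions get the same share. The far-from-policy offline data is down-weighted rather than discarded, which is the "balance" the description refers to.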