Potential based reward shaping for hierarchical reinforcement learning

Yang Gao; Francesca Toni

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Hierarchical Reinforcement Learning (HRL) outperforms many ‘flat’ Reinforcement Learning (RL) algorithms in some application domains. However, HRL may need longer time to obtain the optimal policy because of its large action space. Potential Based Reward Shaping (PBRS) has been widely used to incorporate heuristics into flat RL algorithms so as to reduce their exploration. In this paper, we investigate the integration of PBRS and HRL, and propose a new algorithm: PBRS-MAXQ-0. We prove that under certain conditions, PBRS-MAXQ-0 is guaranteed to converge. Empirical results show that PBRS-MAXQ-0 significantly outperforms MAXQ-0 given good heuristics, and can converge even when given misleading heuristics.

Original language	English
Title of host publication	IJCAI'15 Proceedings of the 24th International Conference on Artificial Intelligence
Publisher	AAAI Press
Pages	3504-3510
Number of pages	7
ISBN (Electronic)	978-1-57735-738-4
Publication status	Published - 25 Jul 2015

Cite this

@inproceedings{b75624925a944a80888a6b6bae381eaa,

title = "Potential based reward shaping for hierarchical reinforcement learning",

abstract = "Hierarchical Reinforcement Learning (HRL) outperforms many {\textquoteleft}flat{\textquoteright} Reinforcement Learning (RL) algorithms in some application domains. However, HRL may need longer time to obtain the optimal policy because of its large action space. Potential Based Reward Shaping (PBRS) has been widely used to incorporate heuristics into flat RL algorithms so as to reduce their exploration. In this paper, we investigate the integration of PBRS and HRL, and propose a new algorithm: PBRS-MAXQ-0. We prove that under certain conditions, PBRS-MAXQ-0 is guaranteed to converge. Empirical results show that PBRS-MAXQ-0 significantly outperforms MAXQ-0 given good heuristics, and can converge even when given misleading heuristics.",

author = "Yang Gao and Francesca Toni",

year = "2015",

month = jul,

day = "25",

language = "English",

pages = "3504--3510",

booktitle = "IJCAI'15 Proceedings of the 24th International Conference on Artificial Intelligence",

publisher = "AAAI Press",

}

TY - GEN

T1 - Potential based reward shaping for hierarchical reinforcement learning

AU - Gao, Yang

AU - Toni, Francesca

PY - 2015/7/25

Y1 - 2015/7/25

N2 - Hierarchical Reinforcement Learning (HRL) outperforms many ‘flat’ Reinforcement Learning (RL) algorithms in some application domains. However, HRL may need longer time to obtain the optimal policy because of its large action space. Potential Based Reward Shaping (PBRS) has been widely used to incorporate heuristics into flat RL algorithms so as to reduce their exploration. In this paper, we investigate the integration of PBRS and HRL, and propose a new algorithm: PBRS-MAXQ-0. We prove that under certain conditions, PBRS-MAXQ-0 is guaranteed to converge. Empirical results show that PBRS-MAXQ-0 significantly outperforms MAXQ-0 given good heuristics, and can converge even when given misleading heuristics.

AB - Hierarchical Reinforcement Learning (HRL) outperforms many ‘flat’ Reinforcement Learning (RL) algorithms in some application domains. However, HRL may need longer time to obtain the optimal policy because of its large action space. Potential Based Reward Shaping (PBRS) has been widely used to incorporate heuristics into flat RL algorithms so as to reduce their exploration. In this paper, we investigate the integration of PBRS and HRL, and propose a new algorithm: PBRS-MAXQ-0. We prove that under certain conditions, PBRS-MAXQ-0 is guaranteed to converge. Empirical results show that PBRS-MAXQ-0 significantly outperforms MAXQ-0 given good heuristics, and can converge even when given misleading heuristics.

M3 - Conference contribution

SP - 3504

EP - 3510

BT - IJCAI'15 Proceedings of the 24th International Conference on Artificial Intelligence

PB - AAAI Press

ER -