[1] LI Zhihui, MA Ying, SHANG Zhigang, et al. Dynamic Reinforcement Learning Modeling and Strategy Evolution in Pigeon Sequential Decision-making[J]. Journal of Zhengzhou University (Engineering Science), 2026, 47(XX): 1-7. [doi:10.13705/j.issn.1671-6833.2025.05.026]
Journal of Zhengzhou University (Engineering Science) [ISSN 1671-6833 / CN 41-1339/T]
Volume: 47
Issue: 2026 XX
Page number: 1-7
Column:
Publication date: 2026-09-10
- Title:
-
Dynamic Reinforcement Learning Modeling and Strategy Evolution in Pigeon Sequential Decision-making
- Author(s):
-
LI Zhihui1,2; MA Ying1,2; SHANG Zhigang1,2; YANG Lifang1,2,3*
-
1. School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China; 2. Henan Key Laboratory of Brain Science and Brain Computer Interface Technology, Zhengzhou 450001, China; 3. The Affiliated Encephalopathy Hospital of Zhengzhou University, Zhumadian 463000, China
-
- Keywords:
-
sequential decision-making; reinforcement learning; pigeons; learning strategies; Model-Based; Model-Free
- CLC:
-
Q811.211
- DOI:
-
10.13705/j.issn.1671-6833.2025.05.026
- Abstract:
-
To maximize future rewards, organisms must flexibly adjust their learning strategies within complex environments. To investigate how learning strategies dynamically evolve during sequential decision-making, we used pigeons—a model species with robust cognitive capabilities—in a two-step sequential decision-making task. Behavioral data were collected throughout the entire learning process, from initial exploration to proficient performance. We developed two dynamic reinforcement learning (RL) models: a reward prediction error-driven Model-Free (MF) model and a state-transition relationship-driven Model-Based (MB) model. Using experimental data, we fitted these models and systematically analyzed the dynamic changes in key learning parameters, including learning rate (reflecting the speed of new information acquisition), discount factor (indicating the valuation of future rewards), and the inverse temperature parameter (representing choice certainty). Model comparisons revealed that pigeons predominantly utilized an MB strategy in early learning stages, focusing on acquiring relationships between states to form accurate value representations. With accumulated experience, pigeons progressively shifted toward the MF strategy, directly utilizing established value predictions for decision-making. Furthermore, analysis of model parameters showed that the learning rate gradually decreased, while both discount factor and inverse temperature increased over the learning period. These changes indicate that pigeons progressively place greater emphasis on future rewards and decision certainty, illustrating a natural shift from environmental exploration to exploitation of acquired knowledge. This study not only elucidates the mechanisms underlying adaptive strategy adjustments in biological systems during sequential decision-making but also provides valuable biological insights for parameter optimization in artificial reinforcement learning models.
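The abstract does not give the model equations, so the following is only a minimal sketch of how the three fitted parameters typically enter a two-step reinforcement learning model. It assumes a standard temporal-difference update for the MF learner, a softmax choice rule, and a one-step lookahead over learned transition probabilities for the MB learner. All function names, the two-state layout, and the numeric values are hypothetical illustrations and are not taken from the paper.

```python
import math


def softmax(values, beta):
    """Choice probabilities under a softmax rule (assumed, not from the paper).

    beta is the inverse temperature: beta = 0 gives uniform random choice,
    larger beta concentrates probability on the highest-valued option.
    """
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def td_update(q_current, reward, q_next, alpha, gamma):
    """One model-free update driven by the reward prediction error.

    alpha is the learning rate and gamma the discount factor applied to
    the value of the next state.
    """
    rpe = reward + gamma * q_next - q_current  # reward prediction error
    return q_current + alpha * rpe


def mb_first_stage_values(transition, second_stage_values, gamma):
    """Model-based first-stage values from a learned transition model.

    transition[a][s] is the estimated probability that first-stage action a
    leads to second-stage state s; second_stage_values[s] lists the action
    values available in state s.
    """
    return [
        gamma * sum(p * max(qs) for p, qs in zip(row, second_stage_values))
        for row in transition
    ]


# Hypothetical example: two first-stage actions, two second-stage states.
transition = [[0.8, 0.2], [0.2, 0.8]]
second_stage_values = [[1.0, 0.0], [0.0, 0.5]]
q_mb = mb_first_stage_values(transition, second_stage_values, gamma=0.9)
choice_probs = softmax(q_mb, beta=5.0)
```

Under these assumptions, a decreasing learning rate makes each `td_update` step smaller, a growing discount factor gives second-stage values more weight in first-stage choices, and a growing inverse temperature makes `softmax` choices more deterministic, which matches the exploration-to-exploitation trend described in the abstract.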