Quantum reinforcement learning during human decision-making | Nature Human Behaviour

Classical reinforcement learning (CRL) has been widely applied in neuroscience and psychology; however, quantum reinforcement learning (QRL), which shows superior performance in computer simulations, has never been empirically tested on human decision-making. Moreover, all current successful quantum models for human cognition lack connections to neuroscience. Here we studied whether QRL can properly explain value-based decision-making. We compared 2 QRL and 12 CRL models by using behavioural and functional magnetic resonance imaging data from healthy and cigarette-smoking subjects performing the Iowa Gambling Task. In all groups, the QRL models performed well when compared with the best CRL models and further revealed the representation of quantum-like internal-state-related variables in the medial frontal gyrus in both healthy subjects and smokers, suggesting that value-based decision-making can be illustrated by QRL at both the behavioural and neural levels.

We thank Y. Yang, R. Zha, J. Besumeyer and N. Ma for their inspirational comments. We thank L. Acerbi, G. R. Yang, C. Gneiting, A. Miranowicz, X. Li, Z. Jin and X. Li for their helpful suggestions. This work was supported by grants from the National Key Basic Research Programme (grant nos. 2016YFA0400900 and 2018YFC0831101), the National Natural Science Foundation of China (grant nos. 31471071, 31771221, 61773360, 71671115, 71874170 and 71942003), the Fundamental Research Funds for the Central Universities of China, the MURI Center for Dynamic Magneto-Optics via the Air Force Office of Scientific Research (AFOSR; grant no. FA9550-14-1-0040), the Army Research Office (ARO; grant no. W911NF-18-1-0358), the Asian Office of Aerospace Research and Development (AOARD; grant no. FA2386-18-1-4045), the Japan Science and Technology Agency (JST; via the Q-LEAP programme and CREST grant no. JPMJCR1676), the Japan Society for the Promotion of Science (JSPS; JSPS–RFBR grant no. 17-52-50023 and JSPS–FWO grant no. VS.059.18N), the RIKEN–AIST Challenge Research Fund, the Templeton Foundation, the Foundational Questions Institute (FQXi), the NTT PHI Laboratory, the Australian Research Council’s Discovery Projects funding scheme (project DP190101566), the Alexander von Humboldt Foundation and the US Office of Naval Research. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. We thank the Bioinformatics Centre of the University of Science and Technology of China, School of Life Science for providing supercomputing resources for this project.

L.J.-A., Y.P. and X.Z. conceived the study. Y.L. and Z.W. provided the devices and collected the data. L.J.-A. built the models. L.J.-A. and Z.W. analysed the data. All authors participated in discussions. L.J.-A., D.D., Y.P., F.N. and X.Z. wrote the paper. X.Z. supervised the project and acquired funding.