What is it about?
Imagine teaching a computer to invest the way you teach a child to ride a bike: let it try, cheer when it balances, and guide it away from spills. That trial-and-error approach is called reinforcement learning (RL), and over more than 25 years it has travelled from university game labs to trading floors and robo-advisers. Our survey distils 167 research papers published between 1996 and 2022 to track that evolution, from the first "Q-learning" experiments to today's deep neural systems that watch hundreds of prices, news flashes, and order-book signals every second.

To make the topic feel familiar, we translate the RL playbook into everyday investing terms:

- State = what the investor can see: price charts, headlines, market depth.
- Action = what the investor can do: buy, sell, or rebalance.
- Reward = why the investor acts: higher profit or steadier returns.

We compare the many ways researchers have tweaked each ingredient and reveal where those tweaks shine, and where they fail, in real markets. Along the way, we spotlight standout results, common pitfalls such as over-optimistic back-tests, and five open challenges that must be cracked before RL can safely manage large, real-money portfolios.
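To make the state-action-reward loop concrete, here is a minimal sketch of the kind of tabular Q-learning experiment the early papers ran. Everything in it is an illustrative invention, not a method from the survey: the three-trend "market", the reward rule, and all names are hypothetical, and real studies use far richer states and cost models.

```python
import random

random.seed(0)

# Hypothetical toy market: states are price trends, actions are trades.
STATES = ["down", "flat", "up"]        # what the investor can see
ACTIONS = ["sell", "hold", "buy"]      # what the investor can do

# Q-table: estimated long-run reward for each (state, action) pair.
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def step(state, action):
    """Invented dynamics: trends loosely persist; reward = position * price move."""
    move = {"down": -1, "flat": 0, "up": 1}[state] + random.choice([-1, 0, 1])
    position = {"sell": -1, "hold": 0, "buy": 1}[action]
    next_state = "up" if move > 0 else "down" if move < 0 else "flat"
    return next_state, position * move  # profit when the position matches the move

alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate
state = "flat"
for _ in range(5000):
    # Epsilon-greedy: mostly exploit the best-known action, occasionally explore.
    if random.random() < epsilon:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    next_state, reward = step(state, action)
    # Q-learning update: nudge the estimate toward reward + discounted future value.
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state

# After training, inspect the learned policy for an up-trending market.
print(max(ACTIONS, key=lambda a: Q[("up", a)]))
```

In this caricature the agent learns, from profit alone, to trade in the direction of the trend. The surveyed literature varies exactly these ingredients: what goes into the state, how actions are sized, and whether the reward favours raw profit or risk-adjusted returns.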
Why is it important?
First complete map of the field. Dozens of reviews cover niche areas like portfolio management or market-making; ours is the only study that threads every RL building block (state, action, reward, environment, algorithm) across all finance use cases and benchmarks 167 papers in one place.

Actionable research agenda. We don't just recount history; we surface open challenges, from realistic market simulators to explainable trading policies, that must be cracked before RL can run billion-dollar portfolios, and we outline concrete next steps for researchers and practitioners.

Bridging two worlds. By translating deep-learning jargon into plain investing language, the survey lets quants, data scientists, portfolio managers, and even curious regulators hold a common conversation about how (and when) AI should control real money.
Perspectives
I began this review as the literature-mapping stage of my PhD. After months of bouncing between GitHub repos, hedge-fund white papers and arXiv PDFs—each proclaiming “state-of-the-art” yet rarely citing one another—I realised the field lacked a single chart that both doctoral researchers and portfolio managers could navigate without a translator. What surprised me most while cataloguing the 167 studies was how many elegant algorithms faltered once we added a few basis points of slippage or removed a subtle look-ahead leak. That convinced me the next breakthroughs won’t come merely from deeper networks, but from cleaner experimental practice and richer market simulators that let ideas fail safely before they fail expensively. If this survey saves the next PhD student six months of scavenger hunting—and helps industry quants and academic ML teams tackle the open problems I flag in Section 8—then every late-night revision will have been worthwhile.
Nikolaos Pippas
University of Warwick
Read the Original
This page is a summary of: The Evolution of Reinforcement Learning in Quantitative Finance: A Survey, ACM Computing Surveys, May 2025, ACM (Association for Computing Machinery).
DOI: 10.1145/3733714.