This project explores how Reinforcement Learning (RL)—specifically, Deep Q-Learning—can be used to develop algorithmic trading strategies. We simulate a trading agent that learns to make buy, sell, and hold decisions based on historical price data and technical indicators.
Techniques Used:
- Feature Engineering (RSI, EMA, MACD, OBV, etc.)
- Custom RL Environment (OpenAI Gym style)
- Policy Gradient Agent (A2C from stable-baselines3)
- Performance Evaluation via backtesting
Algorithmic trading uses automated systems to make trading decisions based on quantitative signals. In this project:
- We focus on equity trading, using the S&P 500 (^GSPC) index.
- Trades are simulated using hourly price data over 2 years.
- The goal is to maximize cumulative returns while learning from historical data.
Reinforcement Learning (RL) is a branch of machine learning where an agent learns to make decisions by interacting with an environment and receiving rewards or penalties.
- Agent: The trader
- Environment: The stock market (historical data)
- Action Space: Buy, Sell, Hold
- State: Technical indicators and price history
- Reward: Profit/loss after each action
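Below is a minimal sketch of how these pieces map onto a Gym-style environment. It is illustrative only; the class name, attributes, and reward logic here are assumptions, and the project's actual environment (notebook 03) differs in detail.

```python
import gym
import numpy as np


class TradingEnv(gym.Env):
    """Toy trading environment: one row of features per time step."""

    def __init__(self, features, prices):
        super().__init__()
        self.features = features          # state: indicators + price history (2D array)
        self.prices = prices
        self.action_space = gym.spaces.Discrete(3)   # 0 = Hold, 1 = Buy, 2 = Sell
        self.observation_space = gym.spaces.Box(
            low=-np.inf, high=np.inf, shape=(features.shape[1],), dtype=np.float32
        )
        self.t = 0
        self.position = 0                 # 0 = flat, 1 = long

    def reset(self):
        self.t, self.position = 0, 0
        return self.features[self.t].astype(np.float32)

    def step(self, action):
        price_change = self.prices[self.t + 1] - self.prices[self.t]
        if action == 1:                   # Buy
            self.position = 1
        elif action == 2:                 # Sell
            self.position = 0
        reward = self.position * price_change   # profit/loss after the action
        self.t += 1
        done = self.t >= len(self.prices) - 1
        return self.features[self.t].astype(np.float32), reward, done, {}
```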
Deep Q-Learning is an RL algorithm that approximates the Q-value function with a deep neural network. It helps the agent answer:
“What is the expected future reward if I take this action from this state?”
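In standard notation (a textbook formulation, not tied to this project's code), the optimal Q-value satisfies the Bellman equation:

```math
Q^{*}(s, a) = \mathbb{E}\left[\, r_t + \gamma \max_{a'} Q^{*}(s_{t+1}, a') \;\middle|\; s_t = s,\ a_t = a \,\right]
```

Deep Q-Learning trains a network $Q_\theta(s, a)$ to approximate $Q^{*}$ by minimizing the gap between its predictions and this bootstrapped target.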
Although this project uses A2C (Advantage Actor Critic) for simplicity, the architecture and environment are compatible with DQN or PPO as well.
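As a rough sketch of that interchangeability (assuming a Gym-style environment object named `env` with the discrete Buy/Sell/Hold action space built in notebook 03), swapping algorithms in stable-baselines3 is mostly a one-line change:

```python
from stable_baselines3 import A2C, DQN, PPO

# The same custom trading environment can back any of these algorithms;
# DQN additionally requires a discrete action space, which Buy/Sell/Hold satisfies.
model = A2C("MlpPolicy", env, verbose=1)    # used in this project
# model = PPO("MlpPolicy", env, verbose=1)  # drop-in on-policy alternative
# model = DQN("MlpPolicy", env, verbose=1)  # value-based alternative

model.learn(total_timesteps=100_000)
model.save("a2c_trading_agent")
```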
Technical indicators help quantify trends, volatility, and momentum in price data.
| Indicator | Type | What it Shows |
|---|---|---|
| EMA (Exponential Moving Average) | Trend | Smoothed average of prices over 7, 14, 50, 200 steps |
| MACD (Moving Average Convergence Divergence) | Trend/Momentum | Difference between fast and slow EMAs to detect reversals |
| RSI (Relative Strength Index) | Momentum | Measures overbought/oversold conditions (0–100) |
| OBV (On-Balance Volume) | Volume | Cumulative volume indicator to show crowd interest |
| Bollinger Bands (BB) | Volatility | Upper/lower bands around a moving average showing price extremes |
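As a sketch of how such indicators can be computed with the `ta` library (column names like `Close` and `Volume` assume a standard single-level yfinance OHLCV DataFrame; the exact windows used in the notebook may differ):

```python
import ta

# df is an OHLCV DataFrame with 'Close' and 'Volume' columns
df["ema_14"] = ta.trend.EMAIndicator(close=df["Close"], window=14).ema_indicator()
df["macd_diff"] = ta.trend.MACD(close=df["Close"]).macd_diff()
df["rsi_14"] = ta.momentum.RSIIndicator(close=df["Close"], window=14).rsi()
df["obv"] = ta.volume.OnBalanceVolumeIndicator(
    close=df["Close"], volume=df["Volume"]
).on_balance_volume()

bb = ta.volatility.BollingerBands(close=df["Close"], window=20)
df["bb_high"], df["bb_low"] = bb.bollinger_hband(), bb.bollinger_lband()
```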
| Section | Purpose |
|---|---|
| `01_data_collection` | Downloads 2 years of S&P 500 data from Yahoo Finance |
| `02_feature_engineering` | Adds indicators using the `ta` library |
| `03_rl_environment` | Custom `gym.Env` simulating a trading environment |
| `04_agent_training` | Trains a policy-based RL agent (A2C) |
| `05_evaluation_and_backtesting` | Simulates agent trading and plots performance |
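A minimal version of that evaluation loop might look like the following (a sketch; `model` and `env` are the trained agent and Gym-style environment assumed in the earlier snippets):

```python
obs = env.reset()
done = False
cumulative_reward = 0.0
equity_curve = []

while not done:
    action, _ = model.predict(obs, deterministic=True)  # greedy policy at test time
    obs, reward, done, info = env.step(action)
    cumulative_reward += reward
    equity_curve.append(cumulative_reward)

print(f"Cumulative reward over the backtest: {cumulative_reward:.2f}")
```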
- `stable-baselines3` for RL algorithms (A2C, PPO, DQN)
- `yfinance` for historical price data
- `ta` for computing technical indicators
- `gym` for environment simulation
- Google Colab for easy execution
Run the `rl_trading` notebooks in Google Colab for zero setup.
Install dependencies in each notebook:
`!pip install yfinance ta stable-baselines3[extra] --quiet`

Example for ^GSPC (S&P 500):
- Start Datetime: 2023-05-30 13:30:00+00:00 (UTC)
- End Datetime: 2025-05-27 17:30:00+00:00 (UTC)
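The data collection step can be reproduced with a call along these lines (a sketch; the notebook's exact arguments may differ):

```python
import yfinance as yf

# Hourly S&P 500 data; Yahoo Finance limits 1h bars to roughly the last 730 days
df = yf.download("^GSPC", period="2y", interval="1h")
print(df.index.min(), df.index.max())
```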
The cumulative reward plot shows promising learning progress, though there is room for improvement. The agent begins with modest losses, which is expected as it explores the trading environment. By around step 500 it starts developing an effective strategy, pushing cumulative rewards into positive territory. The strongest gains occur between steps 1000 and 2000, where rewards climb steadily to roughly 2500, showing that the agent can identify and exploit profitable trading opportunities. Some volatility emerges in the later stages (steps 2000 to 3000), but the overall trend remains positive, suggesting the core strategy is sound; the minor pullbacks likely reflect normal market fluctuations or temporary exploration rather than systemic failures. With additional training and modest adjustments, such as refining the reward function or rebalancing exploration, the agent could smooth out these fluctuations while maintaining its profitable trajectory. Overall, the learning process is successful, and the strategy, while not perfect, shows clear potential for consistent profitability.
- Implementing additional RL algorithms (e.g., PPO, DQN).
- Adding more features or using alternative state representations (e.g., candlestick patterns as images for CNNs).
- More rigorous hyperparameter optimization.
- Portfolio optimization for multiple assets.
- Live trading integration.
This project is licensed under the MIT License. See the LICENSE file for details.
This project is for educational purposes only and should not be considered financial advice. Trading financial markets involves substantial risk of loss.

