Reinforcement Learning for Stock Trading


This project explores how Reinforcement Learning (RL)—specifically, Deep Q-Learning—can be used to develop algorithmic trading strategies. We simulate a trading agent that learns to make buy, sell, and hold decisions based on historical price data and technical indicators.

Techniques Used:

  • Feature Engineering (RSI, EMA, MACD, OBV, etc.)
  • Custom RL Environment (OpenAI Gym style)
  • Policy Gradient Agent (A2C from stable-baselines3)
  • Performance Evaluation via backtesting

What is Algorithmic Trading?

Algorithmic trading uses automated systems to make trading decisions based on quantitative signals. In this project:

  • We focus on equity trading, using the S&P 500 (^GSPC) index.
  • Trades are simulated using hourly price data over 2 years.
  • The goal is to maximize cumulative returns while learning from historical data.
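
As a minimal sketch, the data described above can be pulled with yfinance along these lines (the ticker and interval follow the description; the exact column layout depends on the yfinance version):

```python
import yfinance as yf

# Hourly S&P 500 data; yfinance only serves intraday bars for roughly the
# last two years, which matches the 2-year window used in this project.
data = yf.download("^GSPC", period="2y", interval="1h")
print(data.head())
```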

What is Reinforcement Learning?

Reinforcement Learning (RL) is a branch of machine learning where an agent learns to make decisions by interacting with an environment and receiving rewards or penalties.

Key Concepts:

  • Agent: The trader
  • Environment: The stock market (historical data)
  • Action Space: Buy, Sell, Hold
  • State: Technical indicators and price history
  • Reward: Profit/loss after each action
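
These concepts map directly onto the gym interaction loop. A minimal sketch, assuming the classic gym API (`reset()` returns an observation, `step()` returns a 4-tuple) and using a built-in environment as a stand-in for the trading environment built later:

```python
import gym

# Agent/environment loop: observe a state, pick an action, receive a reward.
env = gym.make("CartPole-v1")  # placeholder; swap in the custom trading env
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()           # the agent's policy goes here
    obs, reward, done, info = env.step(action)   # next state and reward
    total_reward += reward
print("episode reward:", total_reward)
```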

Deep Q-Learning (DQN)

Deep Q-Learning is an RL algorithm that approximates the Q-value function using deep neural networks. It helps the agent learn:

“What is the expected reward if I take this action from this state?”

Although this project uses A2C (Advantage Actor-Critic) for simplicity, the architecture and environment are compatible with DQN or PPO as well.
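
A rough sketch of how the algorithms can be swapped in stable-baselines3 (the environment and timestep count here are placeholders, not the project's exact training settings; depending on the stable-baselines3 version, gym or gymnasium environments are expected):

```python
import gym
from stable_baselines3 import A2C, DQN, PPO

env = gym.make("CartPole-v1")  # placeholder for the custom trading env
model = A2C("MlpPolicy", env, verbose=1)      # what this project trains
# model = DQN("MlpPolicy", env, verbose=1)    # value-based alternative
# model = PPO("MlpPolicy", env, verbose=1)    # another policy-gradient option
model.learn(total_timesteps=50_000)
```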


Technical Indicators Explained

Technical indicators help quantify trends, volatility, and momentum in price data.

| Indicator | Type | What it Shows |
| --- | --- | --- |
| EMA (Exponential Moving Average) | Trend | Smoothed average of prices over 7, 14, 50, and 200 steps |
| MACD (Moving Average Convergence Divergence) | Trend/Momentum | Difference between fast and slow EMAs to detect reversals |
| RSI (Relative Strength Index) | Momentum | Measures overbought/oversold conditions (0–100) |
| OBV (On-Balance Volume) | Volume | Cumulative volume indicator to show crowd interest |
| Bollinger Bands (BB) | Volatility | Upper/lower bands around a moving average showing price extremes |
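
One possible way to compute these with the ta library is sketched below (parameter names follow recent ta releases; `df` is assumed to hold yfinance-style "Close" and "Volume" columns):

```python
import pandas as pd
import ta

def add_indicators(df: pd.DataFrame) -> pd.DataFrame:
    """Append the indicators from the table above to an OHLCV DataFrame."""
    close, volume = df["Close"], df["Volume"]
    for window in (7, 14, 50, 200):
        df[f"ema_{window}"] = ta.trend.EMAIndicator(close, window=window).ema_indicator()
    macd = ta.trend.MACD(close)
    df["macd"] = macd.macd()
    df["macd_signal"] = macd.macd_signal()
    df["rsi_14"] = ta.momentum.RSIIndicator(close, window=14).rsi()
    df["obv"] = ta.volume.OnBalanceVolumeIndicator(close, volume).on_balance_volume()
    bb = ta.volatility.BollingerBands(close, window=20)
    df["bb_high"] = bb.bollinger_hband()
    df["bb_low"] = bb.bollinger_lband()
    return df.dropna()  # drop warm-up rows where long windows are undefined
```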

Project Structure

| Section | Purpose |
| --- | --- |
| `01_data_collection` | Downloads 2 years of S&P 500 data from Yahoo Finance |
| `02_feature_engineering` | Adds indicators using the `ta` library |
| `03_rl_environment` | Custom `gym.Env` simulating a trading environment |
| `04_agent_training` | Trains a policy-based RL agent (A2C) |
| `05_evaluation_and_backtesting` | Simulates agent trading and plots performance |
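
For orientation, a stripped-down version of what a gym-style trading environment can look like is sketched below; the observation layout, reward definition, and position handling here are illustrative assumptions, not the exact implementation in `03_rl_environment`.

```python
import gym
import numpy as np
from gym import spaces

class TradingEnv(gym.Env):
    """Toy long/flat trading environment; a sketch, not the project's exact env."""

    def __init__(self, features: np.ndarray, prices: np.ndarray):
        super().__init__()
        self.features, self.prices = features, prices
        self.action_space = spaces.Discrete(3)  # 0 = hold, 1 = buy, 2 = sell
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(features.shape[1],), dtype=np.float32
        )

    def reset(self):
        self.t, self.position = 0, 0  # position: 0 = flat, 1 = long
        return self.features[self.t].astype(np.float32)

    def step(self, action):
        if action == 1:
            self.position = 1
        elif action == 2:
            self.position = 0
        price_change = self.prices[self.t + 1] - self.prices[self.t]
        reward = self.position * price_change  # P&L from holding a long position
        self.t += 1
        done = self.t >= len(self.prices) - 1
        return self.features[self.t].astype(np.float32), float(reward), done, {}
```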

Tools & Libraries

  • stable-baselines3 for RL algorithms (A2C, PPO, DQN)
  • yfinance for historical price data
  • ta for computing technical indicators
  • gym for environment simulation
  • Google Colab for easy execution

Installation

Run the rl_trading notebook in Google Colab for zero setup.

Install dependencies in each notebook:

!pip install yfinance ta stable-baselines3[extra] --quiet

Results & Discussion

Example for ^GSPC (S&P 500):

  • Start Datetime: 2023-05-30 13:30:00 UTC
  • End Datetime: 2025-05-27 17:30:00 UTC


The cumulative reward plot shows encouraging learning progress, though there is room for improvement. The agent starts with modest losses, which is expected while it explores the trading environment. Around step 500 it begins to form an effective strategy and pushes cumulative rewards into positive territory. The strongest gains come between steps 1000 and 2000, where rewards climb steadily to roughly 2500, showing that the agent can identify and exploit profitable trading opportunities.

Some volatility appears in the later stages (steps 2000-3000), but the overall trend remains positive, suggesting the core strategy is sound; the pullbacks likely reflect normal market fluctuations or temporary exploration rather than systematic failure. Additional training and modest adjustments, such as refining the reward function or rebalancing exploration, could smooth these fluctuations while preserving the profitable trajectory. Overall, this run points to a successful learning process and a strategy that, while not perfect, shows clear potential for consistent profitability.
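
For reference, a cumulative-reward curve like the one described above can be produced with a loop of this shape (a sketch, assuming `model` and `env` follow the earlier snippets):

```python
import numpy as np
import matplotlib.pyplot as plt

# Roll the trained policy through the environment once, then plot the
# running sum of rewards.
obs, done, rewards = env.reset(), False, []
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, _ = env.step(action)
    rewards.append(reward)

plt.plot(np.cumsum(rewards))
plt.xlabel("step")
plt.ylabel("cumulative reward")
plt.show()
```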

Improvements & Future Work

  • Implementing additional RL algorithms (e.g., PPO, DQN).
  • Adding more features or using alternative state representations (e.g., candlestick patterns as images for CNNs).
  • More rigorous hyperparameter optimization.
  • Portfolio optimization for multiple assets.
  • Live trading integration.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Disclaimer

This project is for educational purposes only and should not be considered financial advice. Trading financial markets involves substantial risk of loss.
