GSoC 2026 Draft Proposal: Making Rage Fully Observable (Metrics & Context Propagation) #231

ayushman1210 · 2026-03-08T12:25:22Z


Project: rage-rb/rage
Mentors: @cuneyter, @daz-codes
Contributor: Ayushman Gupta
Project Size: ~175 hours
Difficulty: Medium

What I'm Proposing

Rage is great because it handles everything (HTTP, background jobs, WebSockets) in a single process using fibers. But as I’ve been digging into the source, I’ve found two big blind spots that make it tricky to run at scale:

Context just disappears: Since ActiveSupport::CurrentAttributes are fiber-local, you lose everything (like Current.user) when a background task moves to a new fiber. There’s currently no way to "pass the torch" of that context.
The engine is a black box: We can’t actually see how the reactor is doing. We don't know if the event loop is lagging or if the socket backlog is piling up until things actually break.

My goal is to fix both. I want to make sure the context "travels" with the task and give developers real-time metrics on how hard the engine is working.

The Problems (And why they matter)

1. The "Vanishing Context" in Deferred Tasks

If I set Current.user in a controller and then enqueue a SendEmail job, the job starts with a clean slate. I’ve looked at how Rage::Deferred::Context.build works (context.rb#L9): it captures things like log tags, but it completely ignores CurrentAttributes.

This means you can't easily correlate background work with the request that triggered it, which is a massive pain for debugging and auditing.

2. We skip the low-level health checks

Right now, Rage::Telemetry is all about traces (spans). It’s good for seeing what happened, but not how the system is feeling. We’re missing:

Event Loop Lag: How long it takes the reactor to get around to a task.
Socket Backlog: How many connections are waiting at the front door.
GC Pressure: If Ruby is spending all its time cleaning up memory instead of running code.
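
Of the three, GC pressure is the one that plain Ruby already exposes. A minimal sketch, assuming the metric would be derived from deltas between two GC.stat snapshots (the helper names gc_snapshot and gc_delta are hypothetical and not part of Rage):

```ruby
# Take a small snapshot of the GC counters we care about.
def gc_snapshot
  stat = GC.stat
  {
    count: stat[:count],                        # total GC runs
    major: stat[:major_gc_count],               # full (major) collections
    minor: stat[:minor_gc_count],               # young-generation collections
    allocated: stat[:total_allocated_objects]   # objects allocated so far
  }
end

# Difference between two snapshots: what happened during the interval.
def gc_delta(before, after)
  after.to_h { |key, value| [key, value - before[key]] }
end

before = gc_snapshot
100_000.times { "x" * 10 } # generate some garbage
GC.start                   # force at least one collection for the demo
after = gc_snapshot

delta = gc_delta(before, after)
puts "GC runs: #{delta[:count]}, objects allocated: #{delta[:allocated]}"
```

Reporting deltas per interval rather than raw totals makes it easier to spot a sudden burst of collections.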

My Implementation Plan

Phase 1: Context Propagation (The "Middleware Way")

Initially, I thought about hardcoding this into the core Deferred logic, but after looking at how Sidekiq handles this and talking with the mentors, a middleware-based approach is much cleaner for Rage. It keeps the core "happy path" lean for people who don't use CurrentAttributes.

I’ll build a built-in (but optional) middleware:

On Enqueue: It’ll snapshot all CurrentAttributes.subclasses and tuck them into the user_context.
On Perform: It’ll restore those attributes into the new fiber before the task actually runs.

What the config would look like:

Rage.configure do
  config.deferred.enqueue_middleware.use Rage::Deferred::Middleware::CurrentAttributes
  config.deferred.perform_middleware.use Rage::Deferred::Middleware::CurrentAttributes
end
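
The snapshot-and-restore idea can be sketched in plain Ruby. This is only an illustration under stated assumptions: Current is a stand-in for an ActiveSupport::CurrentAttributes subclass, and the on_enqueue / on_perform method names and the hash-shaped context are hypothetical, not Rage's actual middleware interface.

```ruby
# Stand-in for an ActiveSupport::CurrentAttributes subclass, so the sketch runs
# without Rails. Thread.current[] is fiber-local in Ruby, which mimics the
# behaviour described in the proposal.
class Current
  def self.attributes
    Thread.current[:current_attributes] ||= {}
  end

  def self.user
    attributes[:user]
  end

  def self.user=(value)
    attributes[:user] = value
  end

  def self.reset
    Thread.current[:current_attributes] = {}
  end
end

# Hypothetical middleware. The method names and the hash-shaped context are
# assumptions for illustration only.
module CurrentAttributesSketch
  KEY = :current_attributes

  # On enqueue: copy the current values into the task's context.
  def self.on_enqueue(context)
    context.merge(KEY => Current.attributes.dup)
  end

  # On perform: restore the values in the new fiber, run the task, then reset.
  def self.on_perform(context)
    Current.attributes.merge!(context.fetch(KEY, {}))
    yield
  ensure
    Current.reset
  end
end

Current.user = "alice"
context = CurrentAttributesSketch.on_enqueue({})

# A fresh fiber starts with no attributes...
without_restore = Fiber.new { Current.user }.resume
# ...unless the snapshot is restored first.
with_restore = Fiber.new do
  CurrentAttributesSketch.on_perform(context) { Current.user }
end.resume

puts "without restore: #{without_restore.inspect}"
puts "with restore:    #{with_restore.inspect}"
```

Resetting in an ensure block keeps restored values from leaking into whatever the fiber runs next, even if the task raises.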

Phase 2: Building the Metrics Engine

I need to add actual metric types (Counters, Gauges, Histograms) to Rage::Telemetry.
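
To make the three metric types concrete, here is a rough sketch of their semantics. The class names and methods are hypothetical and only illustrate the behaviour; the real Rage::Telemetry API is still to be designed with the mentors.

```ruby
# Counter: a value that only ever goes up (e.g. tasks enqueued).
class CounterSketch
  attr_reader :value

  def initialize
    @value = 0
  end

  def increment(by = 1)
    raise ArgumentError, "counters cannot decrease" if by.negative?
    @value += by
  end
end

# Gauge: a value that can go up or down (e.g. current queue depth).
class GaugeSketch
  attr_reader :value

  def initialize
    @value = 0
  end

  def set(value)
    @value = value
  end
end

# Histogram: counts observations per bucket (e.g. loop lag in ms).
class HistogramSketch
  attr_reader :counts, :sum, :count

  # boundaries are inclusive upper bounds; one extra overflow bucket is added.
  def initialize(boundaries)
    @boundaries = boundaries.sort
    @counts = Array.new(@boundaries.size + 1, 0)
    @sum = 0.0
    @count = 0
  end

  def observe(value)
    index = @boundaries.index { |bound| value <= bound } || @boundaries.size
    @counts[index] += 1
    @sum += value
    @count += 1
  end
end

lag = HistogramSketch.new([5, 10, 25])
[3, 7, 12, 100].each { |ms| lag.observe(ms) }
puts lag.counts.inspect
```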

The big challenge here is the "Socket Backlog." I noticed Rage already tracks backlog_size for deferred jobs, but that’s just application-level. For the actual TCP socket backlog, I’m going to investigate how to pull those stats directly from the Iodine reactor (or the underlying facil.io).

For Event Loop Lag, I’ll use a timer-based prober: schedule a task for 10ms from now and record when it actually fires. If it fires after 15ms, the 5ms difference is our lag.
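
A minimal sketch of the prober logic, assuming the reactor's timer can be passed in as a callable. LagProbeSketch and the schedule lambda are hypothetical names; in Rage the scheduling would go through the reactor's own timer instead of the sleep used in this demo.

```ruby
class LagProbeSketch
  attr_reader :samples

  # clock returns monotonic time in milliseconds; injectable for testing.
  def initialize(interval_ms:, clock: nil)
    @interval_ms = interval_ms
    @clock = clock || -> { Process.clock_gettime(Process::CLOCK_MONOTONIC, :float_millisecond) }
    @samples = []
  end

  # schedule is any callable that runs the given block after delay_ms.
  def probe(schedule)
    expected_at = @clock.call + @interval_ms
    schedule.call(@interval_ms) do
      # Lag is how much later than planned the timer actually fired.
      @samples << [@clock.call - expected_at, 0.0].max
    end
  end
end

# Demo scheduler: a blocking sleep stands in for the reactor's timer.
sleep_scheduler = ->(delay_ms, &task) { sleep(delay_ms / 1000.0); task.call }

probe = LagProbeSketch.new(interval_ms: 10)
probe.probe(sleep_scheduler)
puts "measured lag: #{probe.samples.last.round(3)} ms"
```

Using a monotonic clock avoids bogus readings when the system wall clock is adjusted.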

Phase 3 & 4: OTel and The Dashboard

Once the metrics are living in Rage::Telemetry, I'll update the opentelemetry-instrumentation-rage gem to export them. The final piece is a "Ready to Roll" Grafana dashboard so users can see their Loop Lag, GC pressure, and Queue depths right out of the box.

Tentative Roadmap

Weeks 1-2 (Bonding): Deep dive into Iodine's internals and finalizing the middleware API.
Weeks 3-5 (Context): Implementing and testing the CurrentAttributes propagation.
Weeks 6-9 (Metrics): Building the telemetry extensions and the reactor probers (Lag/Backlog/GC).
Weeks 10-11 (OTel/Dashboard): Wiring everything up to OpenTelemetry and building the Grafana templates.
Week 12 (Polish): Docs, benchmarks, and clean-up.

Why I'm doing this

I’ve been working with Ruby and fibers for a while now, and I really like Rage’s philosophy of staying lean. These changes aren't just "nice to have"; they're what take a framework from "cool experiment" to "production-ready." I want to help bridge that gap.

Project Risks

Serialization: Complex objects in CurrentAttributes might be hard to serialize if someone is using a disk-based backend. I'll need to handle those gracefully or document the limits.
Iodine Internals: Pulling socket backlog might require some C-level investigation into how facil.io exposes its state.
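
For the serialization risk, one option is to check each attribute before it is snapshotted. A rough sketch, assuming the backend stores JSON (an assumption on my part; partition_serializable is a hypothetical helper, not an existing Rage method):

```ruby
require "json"

# Split attributes into those that survive a JSON round trip unchanged
# and those that do not (arbitrary objects, symbols, NaN, and so on).
def partition_serializable(attributes)
  safe, unsafe = attributes.partition do |_name, value|
    JSON.parse(JSON.generate(value)) == value
  rescue JSON::JSONError
    false
  end
  [safe.to_h, unsafe.to_h]
end

attributes = { "user_id" => 42, "locale" => "en", "user" => Object.new }
safe, unsafe = partition_serializable(attributes)

puts "will propagate: #{safe.keys.inspect}"
puts "will skip:      #{unsafe.keys.inspect}"
```

The skipped keys could be logged as a warning, so users know to store an ID instead of a full object.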

Thanks for reading! I'm looking forward to working with @cuneyter and @daz-codes to make Rage's observability rock-solid.
