GSoC 2026 Draft Proposal: Making Rage Fully Observable (Metrics & Context Propagation) #231
ayushman1210
started this conversation in
Ideas
Project: rage-rb/rage
Mentors: @cuneyter, @daz-codes
Contributor: Ayushman Gupta
Project Size: ~175 hours
Difficulty: Medium
What I'm Proposing
Rage is great because it handles everything—HTTP, background jobs, WebSockets—in a single process using fibers. But as I’ve been digging into the source, I’ve found two big blind spots that make it tricky to run at scale:
Because ActiveSupport::CurrentAttributes are fiber-local, you lose everything (like Current.user) when a background task moves to a new fiber. There’s currently no way to "pass the torch" of that context.
My goal is to fix both. I want to make sure the context "travels" with the task and give developers real-time metrics on how hard the engine is working.
The Problems (And why they matter)
1. The "Vanishing Context" in Deferred Tasks
If I set Current.user in a controller and then enqueue a SendEmail job, the job starts with a clean slate. I’ve looked at how Rage::Deferred::Context.build works (context.rb#L9): it captures things like log tags, but it completely ignores CurrentAttributes.
This means you can't easily correlate background work with the request that triggered it, which is a massive pain for debugging and auditing.
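To make the "clean slate" concrete, here is a minimal sketch in plain Ruby. It does not use Rage or ActiveSupport; it assumes Thread#[] (which Ruby scopes to the current fiber) as a stand-in for fiber-local request state, and the :current_user key is only illustrative.

```ruby
# Thread#[] and Thread#[]= store values per fiber, so they serve here as a
# stand-in for fiber-local request state such as Current.user.
Thread.current[:current_user] = "alice"

# A new fiber starts with an empty fiber-local store, so the value is gone.
seen_in_new_fiber = Fiber.new { Thread.current[:current_user] }.resume

# Capturing the value explicitly before the switch is what carries it across.
captured = Thread.current[:current_user]
seen_with_capture = Fiber.new { captured }.resume

puts seen_in_new_fiber.inspect   # => nil
puts seen_with_capture.inspect   # => "alice"
```

The second half is the whole idea behind propagation: snapshot the state on the enqueueing side and hand it to the code that runs later.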
2. We skip the low-level health checks
Right now, Rage::Telemetry is all about traces (spans). It’s good for seeing what happened, but not how the system is feeling. We’re missing:
My Implementation Plan
Phase 1: Context Propagation (The "Middleware Way")
Initially, I thought about hardcoding this into the core Deferred logic, but after looking at how Sidekiq handles this and talking with the mentors, I think a middleware-based approach is much cleaner for Rage. It keeps the core "happy path" lean for people who don't use CurrentAttributes.
I’ll add a built-in (but optional) middleware:
CurrentAttributes.subclasses and tuck them into the user_context.
What the config would look like:
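As a rough illustration of the capture-and-restore idea, here is a self-contained sketch. It is not Rage's actual middleware or configuration API: Current, CurrentAttributesPropagation, capture, and around_perform are all hypothetical names, and Current is a hand-rolled stand-in for an ActiveSupport::CurrentAttributes subclass.

```ruby
# Hypothetical stand-in for an ActiveSupport::CurrentAttributes subclass.
# Values live in fiber-local storage via Thread#[].
class Current
  def self.user
    Thread.current[:current_user]
  end

  def self.user=(value)
    Thread.current[:current_user] = value
  end

  def self.attributes
    { user: user }
  end

  def self.restore(attrs)
    self.user = attrs[:user]
  end

  def self.reset
    self.user = nil
  end
end

# Hypothetical middleware; none of these names come from Rage itself.
class CurrentAttributesPropagation
  # Enqueue side: snapshot the attributes into the task's context hash.
  def capture(context)
    context.merge(current_attributes: Current.attributes)
  end

  # Execution side: restore the snapshot, run the task, then clean up.
  def around_perform(context)
    Current.restore(context.fetch(:current_attributes, {}))
    yield
  ensure
    Current.reset
  end
end

middleware = CurrentAttributesPropagation.new

Current.user = "alice"
context = middleware.capture({ log_tags: ["req-1"] })

# Simulate the deferred task running later in a different fiber.
result = Fiber.new do
  before = Current.user # fresh fiber, nothing restored yet
  during = nil
  middleware.around_perform(context) { during = Current.user }
  [before, during, Current.user]
end.resume

p result # => [nil, "alice", nil]
```

Resetting in ensure matters because fibers may be reused: without it, one task's user could leak into the next task scheduled on the same fiber.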
Phase 2: Building the Metrics Engine
I need to add actual metric types (Counters, Gauges, Histograms) to Rage::Telemetry.
The big challenge here is the "Socket Backlog." I noticed Rage already tracks backlog_size for deferred jobs, but that’s just application-level. For the actual TCP socket backlog, I’m going to investigate how to pull those stats directly from the Iodine reactor (or the underlying facil.io).
For Event Loop Lag, I’ll use a timer-based prober: schedule a task for 10ms from now and measure when it actually fires. If it fires after 15ms instead of 10ms, that 5ms delta is our lag.
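The lag arithmetic can be sketched in plain Ruby. This is only an illustration under stated assumptions: LagHistogram and timer_lag_ms are hypothetical names, the bucket bounds are arbitrary, and sleep stands in for a timer scheduled on the reactor (it does not touch Iodine or facil.io).

```ruby
# Hypothetical minimal histogram: counts each observation in the first
# bucket whose upper bound (in ms) is >= the observed value.
class LagHistogram
  attr_reader :counts

  def initialize(upper_bounds_ms)
    @upper_bounds_ms = upper_bounds_ms.sort
    @counts = Hash.new(0)
  end

  def observe(value_ms)
    bucket = @upper_bounds_ms.find { |bound| value_ms <= bound } || Float::INFINITY
    @counts[bucket] += 1
  end
end

MONOTONIC_MS = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC, :float_millisecond) }

# Returns how late a timer fired, in milliseconds (never negative).
# The block is expected to wait until the timer fires; the clock is
# injectable so the arithmetic can be checked without real waiting.
def timer_lag_ms(delay_ms, clock: MONOTONIC_MS)
  started = clock.call
  yield delay_ms
  elapsed = clock.call - started
  [elapsed - delay_ms, 0.0].max
end

histogram = LagHistogram.new([1, 5, 25])

# Stand-in probe: ask for 10ms and see how long the wait really took.
lag = timer_lag_ms(10) { |ms| sleep(ms / 1000.0) }
histogram.observe(lag)

puts format("lag: %.3f ms", lag)
```

A monotonic clock is used so wall-clock adjustments cannot show up as fake lag, and a histogram fits better than a gauge here because the interesting signal is usually the tail, not the latest sample.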
Phase 3 & 4: OTel and The Dashboard
Once the metrics are living in Rage::Telemetry, I'll update the opentelemetry-instrumentation-rage gem to export them. The final piece is a "Ready to Roll" Grafana dashboard so users can see their Loop Lag, GC pressure, and Queue depths right out of the box.
Tentative Roadmap
CurrentAttributes propagation.
Why I'm doing this
I’ve been working with Ruby and fibers for a while now, and I really like Rage’s philosophy of staying lean. These changes aren't just "nice to have"—they're what make a framework go from "cool experiment" to "production-ready." I want to help bridge that gap.
Project Risks
CurrentAttributes might be hard to serialize if someone is using a disk-based backend. I'll need to handle those gracefully or document the limits.
facil.io exposes its state.
Thanks for reading! I'm looking forward to working with @cuneyter and @daz-codes to make Rage's observability rock-solid.