Automatically delay Active Job enqueues to after commit #51426

casperisfine · 2024-03-26T14:58:46Z

A fairly common mistake with Rails is to enqueue a job from inside a transaction, with a record as argumemnt, which then lead to a RecordNotFound error when picked up by the queue.

This is even one of the arguments advanced for job runners backed by the database such as solid_queue, delayed_job or good_job.

But relying on this is undesirable in my opinion as it makes the Active Job abstraction leaky, and if in the future you need to migrate to another backend or even just move the queue to a separate database, you may experience a lot of race conditions of the sort.

To resolve this problem globally, we can make Active Job optionally transaction aware, and automatically defer job queueing to after_commit.

rafaelfranca

I wonder if this should not require the user to always explicitly create the transaction.

I find the complexity to find the right transaction too much and all usage that I know of can easily build the transaction.

Something like:

Topic.transaction do |transaction|
  transaction.after_commit { called += 1 }
end

matthewd · 2024-03-26T17:52:32Z

@rafaelfranca I think job queueing is the base example of a use case where that doesn't work: we want to defer until any currently-in-flight transactions have committed, not set up a local transaction to wrap our work.

If the transaction was expected to be as local as in your example, the caller could just move the after_commit body to be physically after the transaction block.

rafaelfranca · 2024-03-26T18:00:33Z

Yeah, Jean explained to me

casperisfine · 2024-03-27T12:10:51Z

Alright, I implemented the Active Job integration, in the end I think it's not too invasive, but I can rework it if it's seen as too tightly coupled.

I think with a small bit of polish this could move to a proper PR instead of being a PoC.

@rafaelfranca @matthewd I'd appreciate some second-opinion on the naming and implementation, etc.

activejob/lib/active_job/enqueue_after_transaction_commit.rb

activerecord/lib/active_record/connection_adapters/abstract/transaction.rb

activerecord/lib/active_record/transactions.rb

matthewd · 2024-03-27T17:07:39Z

activerecord/lib/active_record/transactions.rb

+        elsif open_transactions.size == 1
+          open_transactions.first.after_commit(&block)
+        else
+          callback = count_down_callback(open_transactions.size, &block)


Possibly tying in with my comment above 😅 -- I'm not sure what I think should happen, but I think this will behave surprisingly if any of the transactions get rolled back (and thus count never reaches zero).

That's how I see it though. If any transaction is rolled back, we don't enqueue the job at all. I think that's what we want most of the time, no?

IMO no: while we can make a best effort at having the job run at a convenient time by introducing delay (waiting for concurrent transactions to get out the way), if perform_later didn't raise, we've claimed that we have successfully enqueued the job, so we have to follow through.

(Also having a potentially completely-unrelated DB transaction slow down the enqueuing is a defensible curiosity, having it's rollback result in your job being silently dropped seems more complicated.)

I understand that we'll probably need an escape hatch, but e.g.:

transaction do user = User.create! WelcomeEmailJob.perform_later(user) # something that fails and rollback the transaction end

If the transaction is rolled back, you don't want this job enqueued.

Now maybe perform_later should be left unchanged, and it's another API that should have this behavior.

But the goal here is to reduce the behavioral difference between queues that use the same DB as Active Record, hence are transaction aware, and those that aren't.

In the above example, if you are using SolidQueue / GoodJob etc, the perform_later is acked but will be dropped. So I think we're consistent here.

activerecord/lib/active_record/transactions.rb

bensheldon

As GoodJob maintainer, I think this is a good change. It's easy enough for me to add configuration to the adapter, and easy enough for an application developer to override that on ActiveJob::Base 👍🏻

activejob/lib/active_job/enqueue_after_transaction_commit.rb

bensheldon · 2024-04-08T15:03:32Z

@casperisfine to help with context:

I think that's probably not great to return nil (it would be nice to get back some info that job enqueue was deferred for this specific reason).
Usually the value that's needed by the application is the job_id or the provider_job_id so its status can be monitored (though this is fairly advanced usage, and one could use the explicit after-transaction lambdas if they need to do something subsequent; though that does make me wonder if some kind of ignore-config-and-enqueue-immediately-despite-any-transaction behavior would be necessary). If you got back the job, it could be checked later to see if it has a provider_job_id and check...
There is successfully_enqueued and enqueue_error attributes on the job: Communicate enqueue failures to callers of perform_later #41191 (though it's not widely used afaik; the log messages were broken for 2 years and no one noticed)

byroot · 2024-04-08T15:05:55Z

Yeah, I'm also thinking returning the job makes sense. I'm just a bit unclear on a few things, so I need to investigate all the consequences. And it's definitely a bit of a breaking change I didn't foresee.

Fix: rails#51426 (comment) `perform_later` is supposed to return the Job instance on success, and `false` on error. When the `enqueue` is automatically delayed, it's of course impossible to predict if the actual queueing will succeed, but for backward compatibility reasons, it's best to assume it will. If necessary, you can hold onto the job instance and check for `#successfully_enqueued?` after the transaction has completed.

rosa · 2024-04-14T10:13:23Z

I love this change, it's something we've discussed a few times at 37signals, to introduce in Active Job 🤩 Thanks a lot!

This is even one of the arguments advanced for job runners backed by the database such as solid_queue, delayed_job or good_job.

FWIW, I think the good argument for DB-backed active job backends is slightly more subtle than simply not caring about when you're enqueuing the job. The really useful case for enqueuing a job within a transaction and having it run after the transaction commits, taking advantage of jobs living in the same DB as your app records, is that you get transactional integrity, in the sense that the transaction is rolled back if enqueuing the job fails.

But relying on this is undesirable in my opinion as it makes the Active Job abstraction leaky, and if in the future you need to migrate to another backend or even just move the queue to a separate database, you may experience a lot of race conditions of the sort.

Totally, we have a note/warning about this in the Solid Queue README, but there are some valid cases where you might want to have this behaviour instead of some complicated, ad-hoc manual "rollback" of a committed transaction if enqueuing your job after_commit fails, which is something we have to do in some cases in our apps.

…commit) (#40) This anticipates the release of a new feature in Rails 7.2 introduced here: rails/rails#51426 In Rails 7.2, ActiveJob will default `enqueue_after_transaction_commit` to `true`, and we would like to explicitly set this to `false` (retaining transactional enqueues, since `delayed` is DB-backed). This PR goes a step further and also raises a `UnsafeEnqueueError` if, during enqueue, it encounters any job type that has overridden `enqueue_after_transaction_commit` to `true`. Worth noting that this exception may still be raised _after transaction commit_ (based on the way the lifecycle hooks are handled), so it is intended first and foremost as a guard during development & testing. As part of this change I also: - Added a new Appraisal for the latest Rails main build on GitHub. - Re-generated the rubocop todo file. There are changes we can pull out & autocorrect, but we can do them in separate PRs.

Ref: rails#26103 Ref: rails#51426 A fairly common mistake with Rails is to enqueue a job from inside a transaction, and a record as argumemnt, which then lead to a RecordNotFound error when picked up by the queue. This is even one of the arguments advanced for job runners backed by the database such as `solid_queue`, `delayed_job` or `good_job`. But relying on this is undesirable iin my opinion as it makes the Active Job abstraction leaky, and if in the future you need to migrate to another backend or even just move the queue to a separate database, you may experience a lot of race conditions of the sort. But more generally, being able to defer work to after the current transaction has been a missing feature of Active Record. Right now the only way to do it is from a model callback, and this forces moving things in Active Record models that sometimes are better done elsewhere. Even as a self-proclaimed "service object skeptic", I often wanted this capability over the last decade, and I'm sure it got asked or desired by many more people. Also there's some 3rd party gems adding this capability using monkey patches. It's not a reason to upstream the capability, but it's a proof that there is demand for it. Implementation wise, this proof of concept shows that it's not really hard to implement, even with nested multi-db transactions support. Co-Authored-By: Cristian Bica <cristian.bica@gmail.com>

Fix: rails#51426 (comment) `perform_later` is supposed to return the Job instance on success, and `false` on error. When the `enqueue` is automatically delayed, it's of course impossible to predict if the actual queueing will succeed, but for backward compatibility reasons, it's best to assume it will. If necessary, you can hold onto the job instance and check for `#successfully_enqueued?` after the transaction has completed.

urubatan · 2024-05-31T12:33:07Z

This is completely wrong, for example if I'm publishing a news story and part of that process is to enqueue the job that will deliver that story, the publishing transaction should fail if the job is not enqueued.

also, RecordNotFound will only happen for perform_now, job queue will only notice the job after transaction is committed (if talking about solid_queue, delayed_job or good_job), for redis backed ones might happen if the commit is slow and queue empty but for business reasons is better to retry a job than have one not enqueued when it needed to be.

rafaelfranca · 2024-05-31T16:54:12Z

This is completely wrong, for example if I'm publishing a news story and part of that process is to enqueue the job that will deliver that story, the publishing transaction should fail if the job is not enqueued.

You can still do that. This is a config that you disable if it doesn't work for you. You can even disable per-job

also, RecordNotFound will only happen for perform_now, job queue will only notice the job after transaction is committed (if talking about solid_queue, delayed_job or good_job), for redis backed ones might happen if the commit is slow and queue empty but for business reasons is better to retry a job than have one not enqueued when it needed to be.

You are right. This change is mostly to match the behavior of non-relational queues with relational queues. But it is more common than you think. Just Google about jobs and after_create on Google and you are going to see many issues or blog posts telling people to change to after_commit.

Example:

sidekiq/sidekiq#322
https://www.justinweiss.com/articles/a-couple-callback-gotchas-and-a-rails-5-fix/
https://gitlab.com/gitlab-org/gitlab/-/merge_requests/88632

Now, let's talk about your feedback. You could have said "This doesn't work for me" or "That is not my experience" would be fine. But Saying "completely wrong" when a simple google search shows that it is you that got it wrong is just disrespectful. I got what you mean, and I'm sure you didn't tried to be disrespectful, but please try to avoid that.

rafaelfranca · 2024-05-31T16:59:33Z

Ah, you also have a point about the chance of the job not being enqueued, let's say if the redis server is down when enqueuing the job. This does happen, and it is worse than retying a job when RecordNotFound happens. Agree! Business critical jobs should probably no use this config, that is why we made a per-job config.

Now you could argue which is the better default, and we do believe enqueuing the job in after_commit is a better default because it is the most common case. This default doesn't suites everyone, and that is fine, that is why it is also a global config.

urubatan · 2024-05-31T17:40:34Z

Thanks for the feedback. Sorry if my last message came across the wrong way. I'll try to be more professional in the future.

Ref: rails#26103 Ref: rails#51426 A fairly common mistake with Rails is to enqueue a job from inside a transaction, and a record as argumemnt, which then lead to a RecordNotFound error when picked up by the queue. This is even one of the arguments advanced for job runners backed by the database such as `solid_queue`, `delayed_job` or `good_job`. But relying on this is undesirable iin my opinion as it makes the Active Job abstraction leaky, and if in the future you need to migrate to another backend or even just move the queue to a separate database, you may experience a lot of race conditions of the sort. But more generally, being able to defer work to after the current transaction has been a missing feature of Active Record. Right now the only way to do it is from a model callback, and this forces moving things in Active Record models that sometimes are better done elsewhere. Even as a self-proclaimed "service object skeptic", I often wanted this capability over the last decade, and I'm sure it got asked or desired by many more people. Also there's some 3rd party gems adding this capability using monkey patches. It's not a reason to upstream the capability, but it's a proof that there is demand for it. Implementation wise, this proof of concept shows that it's not really hard to implement, even with nested multi-db transactions support. Co-Authored-By: Cristian Bica <cristian.bica@gmail.com>

Fix: rails#51426 (comment) `perform_later` is supposed to return the Job instance on success, and `false` on error. When the `enqueue` is automatically delayed, it's of course impossible to predict if the actual queueing will succeed, but for backward compatibility reasons, it's best to assume it will. If necessary, you can hold onto the job instance and check for `#successfully_enqueued?` after the transaction has completed.

viralpraxis · 2024-08-13T18:20:55Z

So, it's still unclear to me if this feature replaces https://github.com/Envek/after_commit_everywhere ?

rails-bot bot added the activerecord label Mar 26, 2024

rafaelfranca reviewed Mar 26, 2024

View reviewed changes

casperisfine force-pushed the transaction-callbacks-2 branch from 2fc2e5e to 9e61d74 Compare March 27, 2024 11:40

rails-bot bot added activejob railties labels Mar 27, 2024

casperisfine force-pushed the transaction-callbacks-2 branch from 9e61d74 to e3676b1 Compare March 27, 2024 12:08

rails-bot bot added the docs label Mar 27, 2024

casperisfine force-pushed the transaction-callbacks-2 branch from e3676b1 to ba8c36b Compare March 27, 2024 12:10

cristianbica reviewed Mar 27, 2024

View reviewed changes

activejob/lib/active_job/enqueue_after_transaction_commit.rb Outdated Show resolved Hide resolved

casperisfine force-pushed the transaction-callbacks-2 branch 3 times, most recently from db0589d to 3f4ae43 Compare March 27, 2024 13:19