migrate: re-run post-step callbacks on error #3
Conversation
Force-pushed from ed3932c to 7599e4c.
Nice! Version offset approach seems very clean to me! Nothing jumps out as being unviable so far.
Can we add unit tests to this PR which exercise the changes you've made? Check the version at a failed callback, check that the callback is executed once more after a failed migration, etc.
General naming suggestion:
Instead of PostStepCallback, maybe we should consider using something shorter like task or hook. Just to reduce verbosity and simplify the naming as we add more to this functionality.
For example:
- `execTask(*Migration)` vs `executePostStepCallback`
- `execTaskAtMigVersion(int)` vs `executePostStepCallbackForSQLMig`
- `InTaskVersionRange(int)` vs `IsPostStepCallbackVersion`
- `TaskVersionOffset` vs `PostStepCallbackOffset`
If we already know that a task in this context is a callback which runs after a migration step, then we don't need to prefix everything with "PostStep".
We could even add new tasks.go and tasks_test.go files, for example, where we can document this feature better and keep our changes minimal in migrate.go, util.go, and migrate_test.go.
Force-pushed from 7599e4c to 226c8d7.
Thanks a lot for the review @ffranr 🙏!
I definitely agree that the name PostStepCallback is confusing, so I added some additional commits which rename it to `task`. I placed the main code for that struct/funcs in a new tasks.go file.

Good idea :). I added a new tasks_test.go as well.
ffranr
left a comment
Thanks for the name changes, much easier to read now IMO!
I'm not sure about this: https://github.com/lightninglabs/migrate/pull/3/files#r2413777619
migrate.go (Outdated)

```go
	return m.unlockErr(err)
}

curVersion, dirty, err = m.databaseDrv.Version()
```
In this file, I see several new calls like `curVersion, dirty, err = m.databaseDrv.Version()`. We don't appear to check the dirty value for these new calls. Should we add something like this here and in similar spots?
```go
if dirty {
	return m.unlockErr(ErrDirty{curVersion})
}
```

If not, we should replace the unused return values with `_` and add a brief comment explaining why it's safe to ignore them.
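If ignoring them turns out to be correct instead, here is a minimal hedged sketch of that alternative (the safety claim in the comment is an assumption for illustration):

```go
// dirty was already validated right after the preceding SetVersion
// call (assumption for this sketch), so it is safe to discard here.
curVersion, _, err = m.databaseDrv.Version()
if err != nil {
	return m.unlockErr(err)
}
```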
I wonder if we shouldn't extract the whole `if InTaskVersionRange(curVersion) {` block into a new function, since it's repeated ~5 times. Maybe we can call that function maybeExecTask or something like that.
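As a hedged sketch of that extraction (the signature and the lookup helper are assumed, not taken from the PR):

```go
// maybeExecTask re-runs the pending task if the current version is a
// clean migration task version, and is a no-op otherwise.
func (m *Migrate) maybeExecTask(curVersion int) error {
	if !InTaskVersionRange(curVersion) {
		return nil
	}

	// taskMigration is a hypothetical lookup of the migration that
	// the task version curVersion belongs to.
	migr, err := m.taskMigration(curVersion)
	if err != nil {
		return err
	}

	return m.execTask(migr)
}
```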
I broke out much of the repeated logic + a little more into a new ensureCleanCurrentSQLVersion function to address this feedback :). Hope you like it!
migrate.go (Outdated)

```go
// If the current version is a clean migration task version, then we
// need to rerun the task for the previous version before we can
// continue with any SQL migration(s).
```
Maybe we can add more to this comment:

```go
// If the current version is a clean migration task version, then we
// need to rerun the task before we can continue with any SQL
// migration(s). This is because...
```
because the migration process ended cleanly but the task did not complete? We know that the task didn't complete because the version is a task version and not an SQL migrate version.
Is that correct?
Addressed.
We know that the task didn't complete because the version is a task version and not an SQL migrate version.
Correct, and specifically also that the database is set to a clean state + a migration task version. That can only ever happen if the migration task was run but returned an error.
| "finished for %v\n", migr.LogString()) | ||
| err := m.execTask(migr) | ||
| if err != nil { | ||
| return err |
Consider adding a context message to this err using fmt.Errorf.
migrate.go (Outdated)

```go
task, ok := m.opts.tasks[migr.Version]
if ok {
```
I think we should return early here if !ok to save on indent. Maybe something like:

```go
task, ok := m.opts.tasks[migr.Version]
if !ok {
	m.logVerbosePrintf("No migration task found for %v\n",
		migr.LogString())

	return nil
}
```
Good idea, thanks 🙏! Implemented :)
migrate.go (Outdated)

```go
// Persist that we are in the migration task phase for this
// version.
if err := m.databaseDrv.SetVersion(taskVersion, true); err != nil {
```
This line and another below are over our line limit. I think the CI isn't configured to run the project's linter right now?

I think we can run the linter, as currently set up for this project, using:

```
docker run --rm -v "$(pwd)":/app -w /app golangci/golangci-lint:v1.64.8 golangci-lint run --config .golangci.yml ./...
```

It doesn't seem to complain about line length, but gives this output:

```
migrate.go:255:15: ineffectual assignment to dirty (ineffassign)
	curVersion, dirty, err = m.databaseDrv.Version()
	            ^
migrate.go:298:15: ineffectual assignment to dirty (ineffassign)
	curVersion, dirty, err = m.databaseDrv.Version()
	            ^
migrate.go:342:15: ineffectual assignment to dirty (ineffassign)
	curVersion, dirty, err = m.databaseDrv.Version()
	            ^
migrate.go:381:15: ineffectual assignment to dirty (ineffassign)
	curVersion, dirty, err = m.databaseDrv.Version()
	            ^
migrate.go:436:3: ineffectual assignment to curVersion (ineffassign)
	curVersion, dirty, err = m.databaseDrv.Version()
```
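One hedged way to address those ineffassign warnings, tying back to the earlier dirty-check comment (assuming ignoring the value is actually justified wherever option B is used):

```go
// Option A: actually consume dirty, as suggested above.
curVersion, dirty, err = m.databaseDrv.Version()
if err != nil {
	return m.unlockErr(err)
}
if dirty {
	return m.unlockErr(ErrDirty{curVersion})
}

// Option B: discard it explicitly, with a comment on why that is safe.
curVersion, _, err = m.databaseDrv.Version()
```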
I've enabled actions/workflows on the repo, so maybe CI will run now.
I've enabled actions/workflows on the repo, so maybe CI will run now.
Awesome, thanks 🎉! The CI workflow seems to run now 🔥
Also addressed the feedback of this comment with the introduction of the function mentioned in #3 (comment)
migrate.go (Outdated)

```go
err := task(migr, m.databaseDrv)
if err != nil {
	// Mark the database version as the taskVersion but in a
	// clean state, to indicate that the migration task
	// errored. We will therefore re-run the task on the
	// next migration run.
	if setErr := m.databaseDrv.SetVersion(taskVersion, false); setErr != nil {
```
Are we assuming here that the migration task rolls back any partial changes it made, so that if it returns an error we can still mark the state as “clean”? Or is that guaranteed behavior? If it’s just an assumption, can we please update the comment to clarify why the state is marked as “clean” even when the task fails?
In tapd code I see:

```go
// ...the callback function that should be
// run _after_ the step was executed (but before the version is marked as
// cleanly executed). An error returned from the callback will cause the
// migration to fail and the step to be marked as dirty.
```
In tapd, I see that the task callback is executed within a database transaction (see makePostStepCallbacks in tapdb/post_migration_checks.go). Do we need something similar here? In other words, should we wrap the task callback in a single transaction enforced by the migrate package? Otherwise, could a task end up executing multiple migrations via m.databaseDrv, leading to non-atomic changes that we wouldn’t be able to roll back on error?
Are we assuming here that the migration task rolls back any partial changes it made, so that if it returns an error we can still mark the state as “clean”? Or is that guaranteed behavior? If it’s just an assumption, can we please update the comment to clarify why the state is marked as “clean” even when the task fails?
The reason we mark it as clean is that this is the definition of the migration task version behaviour, i.e. if the database version is currently set to a migration task version and clean, that is a guarantee that the migration task was executed and errored on the last attempt. If the database version is set to a migration task version in a dirty state, we cannot know for sure whether the migration task was previously executed successfully or not (hence requiring manual intervention).
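Expressed as a hedged sketch (InTaskVersionRange and ErrDirty are names from this PR; the helper itself is illustrative only):

```go
// shouldRerunTask interprets the persisted (version, dirty) pair per
// the definition described above.
func shouldRerunTask(version int, dirty bool) (bool, error) {
	switch {
	case !InTaskVersionRange(version):
		// A plain SQL migration version: nothing to re-run.
		return false, nil

	case dirty:
		// Dirty task version: we cannot know whether the task
		// completed, so manual intervention is required.
		return false, ErrDirty{version}

	default:
		// Clean task version: the task ran and returned an
		// error, so it is safe to re-run it.
		return true, nil
	}
}
```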
In tapd, I see that the task callback is executed within a database transaction (see makePostStepCallbacks in tapdb/post_migration_checks.go). Do we need something similar here?
I agree, and do think that in most scenarios it makes sense to execute the migration task in a single transaction, but I don't think that's always guaranteed to be the desired behaviour. For example:

When lnd uses this functionality to migrate from kvdb to SQL through a migration task, it's likely that this will need to be done in multiple database transactions, as it would simply be too much data to migrate in a single database transaction. So ultimately I do think it should be up to the implementer of the migration task to define whether the migration task should be run in a single db transaction or not, and therefore I don't think this should be up to the implementation of the migrate library.

Additionally, I'm not sure that all database backends supported in the migrate library actually support transactions + rollbacks. Therefore I think we can't implement this in the migrate library, and it must be up to the caller to decide how to handle this.

Therefore, I didn't address the feedback of guaranteeing the execution of a migration task in a single db transaction through the migrate library. Do you agree with my reasoning 🙏?
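To illustrate the "caller decides" point, here is a hedged sketch of a task that opts into a single transaction itself, using plain database/sql (the task body and table are hypothetical):

```go
import "database/sql"

// myTask runs atomically by choice. A large kvdb-to-SQL copy could
// instead commit in smaller batches across several transactions.
func myTask(db *sql.DB) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	// Rollback is a no-op once the transaction has been committed.
	defer tx.Rollback()

	if _, err := tx.Exec(`UPDATE accounts SET checked = TRUE`); err != nil {
		return err
	}

	return tx.Commit()
}
```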
migrate.go (Outdated)

```go
select {
case r = <-migRet:
case <-time.After(30 * time.Second):
```
I think we should add a const for this timeout. Something like:

```go
const (
	// DefaultReadMigTimeout ...
	DefaultReadMigTimeout = 30 * time.Second
)
```
migrate.go (Outdated)

```go
// set clean state
if err = m.databaseDrv.SetVersion(migr.TargetVersion, false); err != nil {
```
Comment here needs enhancement.
migrate.go (Outdated)

```go
err = m.execTask(migr)
if err != nil {
	return err
}
```
add context to error here with fmt.Errorf?
Force-pushed from 1133fc1 to b98dde9.
GustavoStingelin
left a comment
This PR looks good in terms of implementation, but I’m still missing some context about the motivation behind it.
@ViktorT-11, could you share a bit more background or point to any previous discussion about the issue this aims to solve?
From what I understood, this might be related to migrations that go beyond schema changes and include data updates, which could partially complete and time out. In that case, re-running the migration would have less data to process and might succeed on a retry. If that’s the intent, I’d love to discuss if we could tackle this problem in other ways, for example:
- Making migrations more granular so each one processes smaller batches of data.
- Adjusting the default timeout or providing clearer guidance on how to configure it for larger nodes, or even making it dynamically configurable.
- Temporarily disabling constraints and using `CONCURRENTLY` where it's available.
So, at first glance, the timeout scenario seems to be the main one this change would address. Are there other use cases that motivated it?
@GustavoStingelin thanks again for looking through this PR as well 🙏!
Exactly. A migration task is a separate step that can execute arbitrary code, and it runs immediately after a specific SQL schema migration. This step is primarily intended to operate on the data in the database once we are certain the schema has a known structure.

To make this more concrete, here is a real-world example of when we need such a task: we are currently working on adding support for migrating the database backends in lnd from kvdb to SQL.

The reason this kvdb-to-SQL migration must be performed as a migration task is that we need the SQL tables to exactly match the schema that existed when the kvdb-to-SQL migration code was written. If the schema has changed because of later SQL migrations that were added after the kvdb-to-SQL code was created, then the migration could fail due to incompatible table definitions. Therefore, the kvdb-to-SQL migration must run immediately after a specific SQL migration version (that is, as a migration task) and cannot be executed separately after all SQL schema migrations are complete. This explains why we need support for arbitrary migration tasks.

For the kvdb-to-SQL example, we also need the ability to re-run failed tasks on the next startup, which is what this PR adds. It is very difficult to anticipate every possible edge case in user data for lnd. If we do not support retrying the task on the next startup, those users would not be able to re-trigger the migration even after we provide a hotfix for the kvdb-to-SQL code. The only alternative would be to have them manually modify their SQL database, which is far less ideal than simply retrying the migration on every startup until it succeeds.

I hope this clarifies the motivation behind migration tasks and the added functionality to retry them on startup, which is what this PR introduces 🙏
Ok, thanks for sharing this context, concept ACK. I noticed that the tests are failing because we already have some migration files using versions like

I think option 2 is enough for now. It is not a permanent solution, but it gives plenty of time to revisit this later if needed.

To reproduce the pipe error more easily, test data:
Thanks for the really great feedback @GustavoStingelin 🙏! I've raised an internal discussion on how we want to tackle this issue, as it's unfortunately not as easy to solve unless we go for approach (1). The migration version is unfortunately being passed around as an int throughout the library.

I'll update you when we've reached a conclusion on how we want to tackle this issue.
Force-pushed from b98dde9 to b2047a3.
Updated the approach to achieve the re-run with the following logic: we now define a migration such that it can either be an SQL migration or a migration task, but not both. If a migration task errors, we reset the database version to the version it was set to before attempting to execute the migration task. Therefore, the migration task will be re-executed once more on the next startup. That completely removes the need for the "migration task offset" approach.

I also suggest that we now rename the "migration task"/"post migration step" to a "code migration" instead, as it better explains what that migration type actually is, now that it cannot also be an SQL migration. Reviewers, do you agree with that reasoning?
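In code, the described reset-on-error behaviour would look roughly like this hedged sketch (variable and method names assumed from earlier snippets in this thread):

```go
prevVersion := curVersion // version before attempting the task

if err := task(migr, m.databaseDrv); err != nil {
	// Reset the version so the task is re-attempted on the next
	// startup, instead of leaving the database dirty.
	if setErr := m.databaseDrv.SetVersion(prevVersion, false); setErr != nil {
		return errors.Join(err, setErr)
	}

	return err
}
```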
@ffranr: review reminder |
"code migration" is probably fine, but sounds strange to my mind since the code is not migrated (we are not modifying code). Possible alternatives: |
migrate.go (Outdated)

```go
// meaning that the data doesn't only contain comments/whitespace or semicolons.
func (m *Migrate) hasSQLMigration(data []byte) (bool, error) {
	s := string(data)
```
If we expect data to be interpretable as a string, we should push that requirement into the function signature:

```go
hasSQLMigration(data string) (bool, error)
```

Also, I don't think this function has direct unit test coverage, which might be worth adding. A misclassification might mean we skip SQL entirely.
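For reference, a minimal hedged sketch of such a string-based classifier; this is illustrative only, not the PR's actual implementation:

```go
// hasSQLMigration reports whether data contains at least one line that
// is not blank, not a lone semicolon, and not a "--" comment. Block
// comments (/* ... */) are ignored here for brevity.
func hasSQLMigration(data string) (bool, error) {
	for _, line := range strings.Split(data, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || line == ";" || strings.HasPrefix(line, "--") {
			continue
		}

		return true, nil
	}

	return false, nil
}
```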
Done, also added unit test coverage.
migration_task.go (Outdated)

```go
// (but before the version is marked as cleanly executed). An error returned
// from the task will cause the migration to fail and the step to be marked as
// dirty.
func WithMigrationTask(version uint, task MigrationTask) Option {
```
this doesn't seem to be used, but you could use it in WithMigrationTasks to align test coverage.
Changed so that WithMigrationTasks uses it :)
migrate_test.go (Outdated)

```go
if err == nil || !strings.Contains(err.Error(),
	"migration has both a SQL migration and a migration task set") {
```
I think you should introduce a new errors.New value for this case.
Should we also do a migrationSequence assert here?
Should we also do a migrationSequence assert here?
Added :)
```go
	checkVersion(-1)
}

func TestMigrationTypeCombos(t *testing.T) {
```
Do we need down migration coverage here also? Is the driver state correct re (not) dirty?
Added 🚀
migration_task.go (Outdated)

```go
// MigrationTask is a callback function type that can be used to execute a
// Golang based migration step after a SQL based migration step has been
// executed. The callback function receives the migration and the database
```
"after a SQL based migration step has been executed." is this comment now stale?
migrate.go (Outdated)

```go
// Read the body so we can inspect and (re)use it.
data, err := io.ReadAll(migr.BufferedBody)
if err != nil {
	return fmt.Errorf("read migration body: %w", err)
}

// Reset the reader so the driver can read it.
migr.BufferedBody = bytes.NewReader(data)
```
I think this now could pull large migrations fully into memory (and duplicate them). Could you perhaps wrap migr.BufferedBody in a bufio.Reader, then Peek up to a small cap (e.g., BufferSize or a fixed max like 128 KB) to classify the migration with hasSQLMigration against the peeked bytes, then reuse the same reader for the driver (the peeked data stays in the buffer)?
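A hedged sketch of that peek-based classification (the cap constant and the surrounding names are assumptions):

```go
// classifyPeekCap bounds how much of the body is buffered up front.
const classifyPeekCap = 128 * 1024 // 128 KiB

br := bufio.NewReaderSize(migr.BufferedBody, classifyPeekCap)

// Peek returns io.EOF alongside the available bytes when the body is
// shorter than the cap, which is fine for classification purposes.
head, err := br.Peek(classifyPeekCap)
if err != nil && err != io.EOF {
	return err
}

isSQL, err := m.hasSQLMigration(string(head))
if err != nil {
	return err
}

// The peeked bytes remain buffered, so the driver still reads the full
// body through br. isSQL then decides how the migration is dispatched.
migr.BufferedBody = br
```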
Added functionality to peek the first 128 KB, but did so without using bufio.Reader. Hope you like the change 😃
Rename the PostStepCallback to ProgrammaticMigration. In the upcoming commits, we will redefine the definition of a migration to either be an SQL migration or a programmatic Golang based migration. As the migration will therefore no longer be executed **post** the SQL based migration, we rename it to better reflect its purpose, i.e. that a migration is either only an SQL based migration, or a programmatic one.
Force-pushed from b2047a3 to 85bdead.
Thanks a lot for the review @ffranr! I've addressed your feedback with the latest push 🔥. I chose to rename it to ProgrammaticMigration.
Thanks for all the explanations in Keybase @ViktorT-11, and sorry for the slow review. I spent quite some time thinking about this PR over the past weeks, and I want to revisit the idea of not using intermediary ProgrammaticMigrations for the kvdb to SQL migration.

Here's how I envision the migration path:

Yes, it adds some work: we need to keep the kvdb migration code up to date with schema changes during the x1.y.z release cycle. Still, I think it's worth it: we can run the kvdb -> SQL copy entirely in the app layer, without touching the migrate library. This keeps our fork close to upstream, makes future upgrades easier, and keeps the flow easier to follow, since there are no intermediary ProgrammaticMigrations, no commented empty files, and no custom behavior in the library.

What do you think? Am I missing any important reason to keep the version-specific task execution, or any trade-offs I have not considered?
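For illustration only, a hedged sketch of that proposed app-layer flow; every function named below is hypothetical:

```go
// migrateDatabases sketches the proposed startup path: schema first,
// then a one-time kvdb -> SQL copy in the app layer.
func migrateDatabases(ctx context.Context) error {
	// 1. Bring the SQL schema fully up to date the usual way.
	if err := runSQLMigrations(ctx); err != nil {
		return err
	}

	// 2. If a kvdb store is still present, copy it into SQL against
	//    the latest schema, then retire the kvdb store.
	if kvdbPresent() {
		if err := copyKVDBToSQL(ctx); err != nil {
			return err
		}
	}

	return nil
}
```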
Force-pushed from 3d22c46 to ff088bb.
Your last point here would mean we always need to update the migration code, because there is a possibility of going directly from x.00 to x1.0.1. That seems complicated?
@ziggie1984 I agree that when the schema has many changes, things can become complicated in some situations. @ffranr also mentioned that tapd is in a different position: we do not need to migrate from kvdb there, but some transformations are quite complex and painful to express only with raw SQL. An example is this one: https://github.com/lightninglabs/taproot-assets/blob/c441a3b0c4b2d356e21e3fa656b9183055e47559/tapdb/post_migration_checks.go#L103

Because of that, I am feeling more open to this approach now. For clarity, I am not trying to block this work. I only want to be sure we explored other paths before adding this pattern to the codebase. Have we already checked how Goose approaches similar cases? Maybe we can borrow some ideas from them, for example: https://github.com/pressly/goose/blob/main/examples/go-migrations/00003_add_user_no_tx.go
Thanks for thinking this through a lot @GustavoStingelin, and for your suggestion 🔥! I agree that we should consider this carefully before introducing this pattern. We did consider an approach similar to your proposal before choosing this one. What ultimately led us here is the same concern @ziggie1984 mentioned:

I think this is easier said than done, since it will likely be quite a long time before we fully drop kvdb support in lnd. With this approach, we would effectively have two sources of truth:

This introduces a real risk of subtle differences creeping in over time as both paths evolve independently. As a result, the behaviour for the user could vary depending on which actual version they run the kvdb → SQL migration at, which is something we would very much like to avoid. By keeping this as a migration version that can only run at a single schema version, we eliminate that uncertainty.

Additionally, I think one positive of the current approach is that it solves another one of your steps:

This will be handled automatically by the database migration versioning system. That's exactly how we manage all existing "normal" migrations, so it makes sense to track the kvdb → SQL migration in the same way. If we handled it differently, we would need to introduce and maintain a separate flag in the application's database layer to indicate that the migration had occurred, which is less clean IMO.

That said, I do agree that the presence of "empty SQL files" is a drawback of the current approach, and I'm definitely open to improving that if we can come up with a better solution 🙂. But as of now, this approach is the best one we've seen so far. It's also worth noting that we already have live SQL versions in tapd.

If we could introduce actual Go migrations with actual files, similar to how Goose does it, I'd be very open to that! Ultimately I think that isn't really possible when using the migrate library, though. The good thing is that I think the approach this PR introduces, which separates SQL migrations from programmatic migrations, together with the empty SQL files, gets us quite close to that.

Yes, we already have live code migrations in tapd.

Let me know if you disagree or have any questions regarding my reasoning above :)
ffranr
left a comment
Nice work!
It would be useful to see CI pass on a tapd PR that bumps the migrate dependency to this branch. Not essential, but a good signal.
Ok, I'm convinced, thanks guys!
GustavoStingelin
left a comment
ACK
ziggie1984
left a comment
Great PR 🫡, had some questions regarding the implementation.
This commit modifies the migration framework to re-attempt programmatic migrations on the next run if they error during a migration. Previously, if a programmatic migration failed but its associated SQL migration succeeded, the database version would be set to a dirty state and require manual intervention in order to reset the SQL migration and re-attempt it + the programmatic migration. The new re-attempt mechanism is achieved by ensuring that a migration can only be either an SQL migration or a programmatic migration, but not both. This way, if a programmatic migration errors, the database version will be reset to the version it had prior to executing the programmatic migration, and the programmatic migration will be re-attempted on the next run.
The last commit introduced functionality to rewind the database, when a programmatic migration fails, to the version it had before the migration was attempted. This is done to ensure that such migrations are re-run on the next startup, and that the database is not left in a dirty state. However, to support the old behavior that did not re-run failed programmatic migrations, we add a new flag called RerunOnError which is part of the migration definition. When this flag is set to false, a failing programmatic migration will leave the database dirty at the migration version of the failed programmatic migration, which matches the old behavior.
Force-pushed from ff088bb to 3faabcf.
Thanks for the reviews @ffranr, @GustavoStingelin & @ziggie1984 🎉🙏! |
ziggie1984
left a comment
LGTM, nice work 🎉
Previously, SQL migrations and programmatic migrations were designed to be a single migration version, where the SQL migration was run first, followed by a post-step migration callback that executed the programmatic migration. Due to a redefinition of the migration design, which separates SQL and programmatic migrations, we need to update tapd to become compatible with the new migration design before updating to it. The change was introduced with: lightninglabs/migrate#3

This commit therefore separates the programmatic migrations out of migrations `33` & `37` into separate migration files, and adds those two new migrations to the `tapdb/sqlc/migrations` directory. Additionally, since users that have already run the programmatic migrations during migrations `33` & `37` will have those migrations re-executed once more after updating to this PR, we test that re-execution of those programmatic migrations will be a no-op for those users, and won't add or change any data in the database.
Bump the migrate dependency to use the latest version of the `ll-fork` branch. The main changes in this new version are:

* SQL migrations & programmatic migrations now need to be separated into different migration versions.
* Programmatic migrations can now be configured to be re-run on the next startup if they error. This is done by setting the `ResetVersionOnError` flag to `true` for the specific programmatic migration.

See lightninglabs/migrate#3 for more details.
This commit modifies the migration framework to re-attempt post-step callbacks if they error during a migration, and to not leave the database version as dirty.

Implementation approach:

This is achieved by introducing the concept of a "post-step callback" migration version. A post-step callback's version is its corresponding SQL migration version offset by +1000000000.

During the execution of a post-step callback, the post-step callback migration version will be persisted as the database version. That way, if the post-step callback errors, the database version will be the post-step callback version on the next startup. The post-step callback will then be re-attempted before proceeding with the next SQL migration.
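Concretely, a hedged sketch of the offset scheme (the helper names match this thread's naming discussion; the bodies are assumed):

```go
// TaskVersionOffset shifts an SQL migration version into the
// post-step callback (task) version range.
const TaskVersionOffset = 1000000000

// taskVersion maps an SQL migration version to its task version.
func taskVersion(sqlVersion int) int {
	return sqlVersion + TaskVersionOffset
}

// InTaskVersionRange reports whether v encodes a task version.
func InTaskVersionRange(v int) bool {
	return v > TaskVersionOffset
}
```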
Alternative approach:

Another solution to the issue would be to individually track whether a post migration has been executed successfully in a separate version table for post migrations. I chose not to go for that approach, as it would require an individual implementation for every database backend type and would therefore be a much larger change.

I'm open to changing the approach though, if reviewers do not prefer the one implemented by this PR.