migrate: re-run post-step callbacks on error #3
Conversation
Force-pushed from ed3932c to 7599e4c.
Nice! Version offset approach seems very clean to me! Nothing jumps out as being unviable so far.
Can we add unit tests to this PR which exercise the changes you've made? Check the version at a failed callback, check that the callback is executed once more after a failed migration, etc.
General naming suggestion:
Instead of PostStepCallback, maybe we should consider using something shorter like task or hook. Just to reduce verbosity and simplify the naming as we add more to this functionality.
For example:
- `execTask(*Migration)` vs `executePostStepCallback`
- `execTaskAtMigVersion(int)` vs `executePostStepCallbackForSQLMig`
- `InTaskVersionRange(int)` vs `IsPostStepCallbackVersion`
- `TaskVersionOffset` vs `PostStepCallbackOffset`
If we already know that a task in this context is a callback which runs after a migration step, then we don't need to prefix everything with "PostStep".
We could even add new tasks.go and tasks_test.go files, for example, where we can document this feature better and keep our changes minimal in migrate.go, util.go, and migrate_test.go.
Force-pushed from 7599e4c to 226c8d7.
Thanks a lot for the review @ffranr 🙏!
I definitely agree that the name PostStepCallback is confusing, so I added some additional commits which rename it to `task`. I placed the main code for that struct/funcs in a new tasks.go file.

Good idea :). I added a new tasks_test.go as well.
ffranr
left a comment
Thanks for the name changes, much easier to read now IMO!
I'm not sure about this: https://github.com/lightninglabs/migrate/pull/3/files#r2413777619
migrate.go (Outdated)

```go
	return m.unlockErr(err)
}

curVersion, dirty, err = m.databaseDrv.Version()
```
In this file, I see several new calls like `curVersion, dirty, err = m.databaseDrv.Version()`. We don't appear to check the dirty value for these new calls. Should we add something like this here and in similar spots?
```go
if dirty {
	return m.unlockErr(ErrDirty{curVersion})
}
```

If not, we should replace the unused return values with `_` and add a brief comment explaining why it's safe to ignore them.
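If ignoring them turns out to be correct instead, here is a minimal hedged sketch of that alternative (the safety claim in the comment is an assumption for illustration):

```go
// dirty was already validated right after the preceding SetVersion
// call (assumption for this sketch), so it is safe to discard here.
curVersion, _, err = m.databaseDrv.Version()
if err != nil {
	return m.unlockErr(err)
}
```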
I wonder if we shouldn't extract the whole `if InTaskVersionRange(curVersion) {` block into a new function, since it's repeated ~5 times. Maybe we can call that function maybeExecTask or something like that.
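As a hedged sketch of that extraction (the signature and the lookup helper are assumed, not taken from the PR):

```go
// maybeExecTask re-runs the pending task if the current version is a
// clean migration task version, and is a no-op otherwise.
func (m *Migrate) maybeExecTask(curVersion int) error {
	if !InTaskVersionRange(curVersion) {
		return nil
	}

	// taskMigration is a hypothetical lookup of the migration that
	// the task version curVersion belongs to.
	migr, err := m.taskMigration(curVersion)
	if err != nil {
		return err
	}

	return m.execTask(migr)
}
```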
I broke out much of the repeated logic + a little more into a new ensureCleanCurrentSQLVersion function to address this feedback :). Hope you like it!
migrate.go (Outdated)

```go
// If the current version is a clean migration task version, then we
// need to rerun the task for the previous version before we can
// continue with any SQL migration(s).
```
Maybe we can add more to this comment:

```go
// If the current version is a clean migration task version, then we
// need to rerun the task before we can continue with any SQL
// migration(s). This is because...
```
because the migration process ended cleanly but the task did not complete? We know that the task didn't complete because the version is a task version and not an SQL migrate version.
Is that correct?
Addressed.
We know that the task didn't complete because the version is a task version and not an SQL migrate version.
Correct, and specifically also that the database is set to a clean state + a migration task version. That can only ever happen if the migration task was run but returned an error.
| "finished for %v\n", migr.LogString()) | ||
| err := m.execTask(migr) | ||
| if err != nil { | ||
| return err |
Consider adding a context message to this err using fmt.Errorf.
migrate.go (Outdated)

```go
task, ok := m.opts.tasks[migr.Version]
if ok {
```
I think we should return early here if !ok to save on indent. Maybe something like:

```go
task, ok := m.opts.tasks[migr.Version]
if !ok {
	m.logVerbosePrintf("No migration task found for %v\n",
		migr.LogString())

	return nil
}
```
Good idea, thanks 🙏! Implemented :)
migrate.go (Outdated)

```go
// Persist that we are in the migration task phase for this
// version.
if err := m.databaseDrv.SetVersion(taskVersion, true); err != nil {
```
This line and another below are over our line limit. I think the CI isn't configured to run the project's linter right now?

I think we can run the linter, as currently set up for this project, using:

```
docker run --rm -v "$(pwd)":/app -w /app golangci/golangci-lint:v1.64.8 golangci-lint run --config .golangci.yml ./...
```

It doesn't seem to complain about line length, but gives this output:

```
migrate.go:255:15: ineffectual assignment to dirty (ineffassign)
	curVersion, dirty, err = m.databaseDrv.Version()
	            ^
migrate.go:298:15: ineffectual assignment to dirty (ineffassign)
	curVersion, dirty, err = m.databaseDrv.Version()
	            ^
migrate.go:342:15: ineffectual assignment to dirty (ineffassign)
	curVersion, dirty, err = m.databaseDrv.Version()
	            ^
migrate.go:381:15: ineffectual assignment to dirty (ineffassign)
	curVersion, dirty, err = m.databaseDrv.Version()
	            ^
migrate.go:436:3: ineffectual assignment to curVersion (ineffassign)
	curVersion, dirty, err = m.databaseDrv.Version()
```
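One hedged way to address those ineffassign warnings, tying back to the earlier dirty-check comment (assuming ignoring the value is actually justified wherever option B is used):

```go
// Option A: actually consume dirty, as suggested above.
curVersion, dirty, err = m.databaseDrv.Version()
if err != nil {
	return m.unlockErr(err)
}
if dirty {
	return m.unlockErr(ErrDirty{curVersion})
}

// Option B: discard it explicitly, with a comment on why that is safe.
curVersion, _, err = m.databaseDrv.Version()
```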
I've enabled actions/workflows on the repo, so maybe CI will run now.
I've enabled actions/workflows on the repo, so maybe CI will run now.
Awesome, thanks 🎉! The CI workflow seems to run now 🔥
Also addressed the feedback of this comment with the introduction of the function mentioned in #3 (comment)
migrate.go (Outdated)

```go
err := task(migr, m.databaseDrv)
if err != nil {
	// Mark the database version as the taskVersion but in a
	// clean state, to indicate that the migration task
	// errored. We will therefore re-run the task on the
	// next migration run.
	if setErr := m.databaseDrv.SetVersion(taskVersion, false); setErr != nil {
```
Are we assuming here that the migration task rolls back any partial changes it made, so that if it returns an error we can still mark the state as “clean”? Or is that guaranteed behavior? If it’s just an assumption, can we please update the comment to clarify why the state is marked as “clean” even when the task fails?
In tapd code I see:

```go
// ...the callback function that should be
// run _after_ the step was executed (but before the version is marked as
// cleanly executed). An error returned from the callback will cause the
// migration to fail and the step to be marked as dirty.
```
In tapd, I see that the task callback is executed within a database transaction (see makePostStepCallbacks in tapdb/post_migration_checks.go). Do we need something similar here? In other words, should we wrap the task callback in a single transaction enforced by the migrate package? Otherwise, could a task end up executing multiple migrations via m.databaseDrv, leading to non-atomic changes that we wouldn’t be able to roll back on error?
Are we assuming here that the migration task rolls back any partial changes it made, so that if it returns an error we can still mark the state as “clean”? Or is that guaranteed behavior? If it’s just an assumption, can we please update the comment to clarify why the state is marked as “clean” even when the task fails?
The reason we mark it as clean is that this is the definition of the migration task version behaviour, i.e. if the database version is currently set to a migration task version and clean, that is a guarantee that the migration task was executed and errored on the last attempt. If the database version is set to a migration task version in a dirty state, we cannot know for sure whether the migration task was previously executed successfully or not (hence requiring manual intervention).
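Expressed as a hedged sketch (InTaskVersionRange and ErrDirty are names from this PR; the helper itself is illustrative only):

```go
// shouldRerunTask interprets the persisted (version, dirty) pair per
// the definition described above.
func shouldRerunTask(version int, dirty bool) (bool, error) {
	switch {
	case !InTaskVersionRange(version):
		// A plain SQL migration version: nothing to re-run.
		return false, nil

	case dirty:
		// Dirty task version: we cannot know whether the task
		// completed, so manual intervention is required.
		return false, ErrDirty{version}

	default:
		// Clean task version: the task ran and returned an
		// error, so it is safe to re-run it.
		return true, nil
	}
}
```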
In tapd, I see that the task callback is executed within a database transaction (see makePostStepCallbacks in tapdb/post_migration_checks.go). Do we need something similar here?
I agree, and do think that in most scenarios it makes sense to execute the migration task in a single transaction, but I don't think that's always guaranteed to be the desired behaviour. For example:

When lnd uses this functionality to migrate from kvdb to SQL through a migration task, it's likely that this will need to be done in multiple database transactions, as it would simply be too much data to migrate in a single database transaction. So ultimately I do think it should be up to the implementer of the migration task to define whether the migration task should be run in a single db transaction or not, and therefore I don't think this should be up to the implementation of the migrate library.

Additionally, I'm not sure that all database backends supported in the migrate library actually support transactions + rollbacks. Therefore I think we can't implement this in the migrate library, and it must be up to the caller to decide how to handle this.

Therefore, I didn't address the feedback of guaranteeing the execution of a migration task in a single db transaction through the migrate library. Do you agree with my reasoning 🙏?
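To illustrate the "caller decides" point, here is a hedged sketch of a task that opts into a single transaction itself, using plain database/sql (the task body and table are hypothetical):

```go
import "database/sql"

// myTask runs atomically by choice. A large kvdb-to-SQL copy could
// instead commit in smaller batches across several transactions.
func myTask(db *sql.DB) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	// Rollback is a no-op once the transaction has been committed.
	defer tx.Rollback()

	if _, err := tx.Exec(`UPDATE accounts SET checked = TRUE`); err != nil {
		return err
	}

	return tx.Commit()
}
```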
migrate.go (Outdated)

```go
select {
case r = <-migRet:
case <-time.After(30 * time.Second):
```
I think we should add a const for this timeout. Something like:

```go
const (
	// DefaultReadMigTimeout ...
	DefaultReadMigTimeout = 30 * time.Second
)
```
migrate.go (Outdated)

```go
// set clean state
if err = m.databaseDrv.SetVersion(migr.TargetVersion, false); err != nil {
```
Comment here needs enhancement.
migrate.go (Outdated)

```go
err = m.execTask(migr)
if err != nil {
	return err
}
```
add context to error here with fmt.Errorf?
Force-pushed from 1133fc1 to b98dde9.
GustavoStingelin
left a comment
This PR looks good in terms of implementation, but I’m still missing some context about the motivation behind it.
@ViktorT-11, could you share a bit more background or point to any previous discussion about the issue this aims to solve?
From what I understood, this might be related to migrations that go beyond schema changes and include data updates, which could partially complete and time out. In that case, re-running the migration would have less data to process and might succeed on a retry. If that’s the intent, I’d love to discuss if we could tackle this problem in other ways, for example:
- Making migrations more granular so each one processes smaller batches of data.
- Adjusting the default timeout or providing clearer guidance on how to configure it for larger nodes, or even making it dynamically configurable.
- Temporarily disabling constraints and using `CONCURRENTLY` where it's available.
So, at first glance, the timeout scenario seems to be the main one this change would address. Are there other use cases that motivated it?
@GustavoStingelin thanks again for looking through this PR as well 🙏!
Exactly. A migration task is a separate step that can execute arbitrary code, and it runs immediately after a specific SQL schema migration. This step is primarily intended to operate on the data in the database once we are certain the schema has a known structure.

To make this more concrete, here is a real-world example of when we need such a task: we are currently working on adding support for migrating the database backends in lnd from kvdb to SQL.

The reason this kvdb-to-SQL migration must be performed as a migration task is that we need the SQL tables to exactly match the schema that existed when the kvdb-to-SQL migration code was written. If the schema has changed because of later SQL migrations that were added after the kvdb-to-SQL code was created, then the migration could fail due to incompatible table definitions. Therefore, the kvdb-to-SQL migration must run immediately after a specific SQL migration version (that is, as a migration task) and cannot be executed separately after all SQL schema migrations are complete. This explains why we need support for arbitrary migration tasks.

For the kvdb-to-SQL example, we also need the ability to re-run failed tasks on the next startup, which is what this PR adds. It is very difficult to anticipate every possible edge case in user data for lnd. If we do not support retrying the task on the next startup, those users would not be able to re-trigger the migration even after we provide a hotfix for the kvdb-to-SQL code. The only alternative would be to have them manually modify their SQL database, which is far less ideal than simply retrying the migration on every startup until it succeeds.

I hope this clarifies the motivation behind migration tasks and the added functionality to retry them on startup, which is what this PR introduces 🙏
Ok, thanks for sharing this context, concept ACK. I noticed that the tests are failing because we already have some migration files using versions like

I think option 2 is enough for now. It is not a permanent solution, but it gives plenty of time to revisit this later if needed.

To reproduce the pipe error more easily, test data:
Thanks for the really great feedback @GustavoStingelin 🙏! I've raised an internal discussion on how we want to tackle this issue, as it's unfortunately not as easy to solve unless we go for approach (1). The migration version is unfortunately being passed around as an int throughout the library.

I'll update you when we've reached a conclusion on how we want to tackle this issue.
Force-pushed from b98dde9 to b2047a3.
Updated the approach to achieve the re-run with the following logic: we now define a migration such that it can either be an SQL migration or a migration task, but not both. If a migration task errors, we reset the database version to the version it was set to before attempting to execute the migration task. Therefore, the migration task will be re-executed once more on the next startup. That completely removes the need for the "migration task offset" approach.

I also suggest that we now rename the "migration task"/"post migration step" to a "code migration" instead, as it better explains what that migration type actually is, now that it cannot also be an SQL migration. Reviewers, do you agree with that reasoning?
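In code, the described reset-on-error behaviour would look roughly like this hedged sketch (variable and method names assumed from earlier snippets in this thread):

```go
prevVersion := curVersion // version before attempting the task

if err := task(migr, m.databaseDrv); err != nil {
	// Reset the version so the task is re-attempted on the next
	// startup, instead of leaving the database dirty.
	if setErr := m.databaseDrv.SetVersion(prevVersion, false); setErr != nil {
		return errors.Join(err, setErr)
	}

	return err
}
```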
@ffranr: review reminder |
"code migration" is probably fine, but sounds strange to my mind since the code is not migrated (we are not modifying code). Possible alternatives: |
migrate.go (Outdated)

```go
// meaning that the data doesn't only contain comments/whitespace or semicolons.
func (m *Migrate) hasSQLMigration(data []byte) (bool, error) {
	s := string(data)
```
If we expect data to be interpretable as a string, we should push that requirement into the function signature:

```go
hasSQLMigration(data string) (bool, error)
```

Also, I don't think this function has direct unit test coverage, which might be worth adding. A misclassification might mean we skip SQL entirely.
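For reference, a minimal hedged sketch of such a string-based classifier; this is illustrative only, not the PR's actual implementation:

```go
// hasSQLMigration reports whether data contains at least one line that
// is not blank, not a lone semicolon, and not a "--" comment. Block
// comments (/* ... */) are ignored here for brevity.
func hasSQLMigration(data string) (bool, error) {
	for _, line := range strings.Split(data, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || line == ";" || strings.HasPrefix(line, "--") {
			continue
		}

		return true, nil
	}

	return false, nil
}
```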
Done, also added unit test coverage.
migration_task.go (Outdated)

```go
// (but before the version is marked as cleanly executed). An error returned
// from the task will cause the migration to fail and the step to be marked as
// dirty.
func WithMigrationTask(version uint, task MigrationTask) Option {
```
this doesn't seem to be used, but you could use it in WithMigrationTasks to align test coverage.
Changed so that WithMigrationTasks uses it :)
migrate_test.go (Outdated)

```go
if err == nil || !strings.Contains(err.Error(),
	"migration has both a SQL migration and a migration task set") {
```
I think you should introduce a new errors.New value for this case.
Should we also do a migrationSequence assert here?
Should we also do a migrationSequence assert here?
Added :)
```go
	checkVersion(-1)
}

func TestMigrationTypeCombos(t *testing.T) {
```
Do we need down migration coverage here also? Is the driver state correct re (not) dirty?
Added 🚀
migration_task.go (Outdated)

```go
// MigrationTask is a callback function type that can be used to execute a
// Golang based migration step after a SQL based migration step has been
// executed. The callback function receives the migration and the database
```
"after a SQL based migration step has been executed." is this comment now stale?
migrate.go (Outdated)

```go
// Read the body so we can inspect and (re)use it.
data, err := io.ReadAll(migr.BufferedBody)
if err != nil {
	return fmt.Errorf("read migration body: %w", err)
}

// Reset the reader so the driver can read it.
migr.BufferedBody = bytes.NewReader(data)
```
I think this now could pull large migrations fully into memory (and duplicate them). Could you perhaps wrap migr.BufferedBody in a bufio.Reader, then Peek up to a small cap (e.g., BufferSize or a fixed max like 128 KB) to classify the migration with hasSQLMigration against the peeked bytes, then reuse the same reader for the driver (the peeked data stays in the buffer)?
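A hedged sketch of that peek-based classification (the cap constant and the surrounding names are assumptions):

```go
// classifyPeekCap bounds how much of the body is buffered up front.
const classifyPeekCap = 128 * 1024 // 128 KiB

br := bufio.NewReaderSize(migr.BufferedBody, classifyPeekCap)

// Peek returns io.EOF alongside the available bytes when the body is
// shorter than the cap, which is fine for classification purposes.
head, err := br.Peek(classifyPeekCap)
if err != nil && err != io.EOF {
	return err
}

isSQL, err := m.hasSQLMigration(string(head))
if err != nil {
	return err
}

// The peeked bytes remain buffered, so the driver still reads the full
// body through br. isSQL then decides how the migration is dispatched.
migr.BufferedBody = br
```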
Added functionality to peek the first 128 KB, but did so without using bufio.Reader. Hope you like the change 😃
Rename the PostStepCallback to ProgrammaticMigration. In the upcoming commits, we will redefine the definition of a migration to either be an SQL migration or a programmatic Golang based migration. As the migration will therefore no longer be executed **post** the SQL based migration, we rename it to better reflect its purpose, i.e. that a migration is either only an SQL based migration, or a programmatic one.
Force-pushed from b2047a3 to 85bdead.
Thanks a lot for the review @ffranr! I've addressed your feedback with the latest push 🔥. I chose to rename it to ProgrammaticMigration.
Thanks for all the explanations in Keybase @ViktorT-11, and sorry for the slow review. I spent quite some time thinking about this PR over the past weeks, and I want to revisit the idea of not using intermediary ProgrammaticMigrations for the kvdb to SQL migration.

Here's how I envision the migration path:

Yes, it adds some work: we need to keep the kvdb migration code up to date with schema changes during the x1.y.z release cycle. Still, I think it's worth it: we can run the kvdb -> SQL copy entirely in the app layer, without touching the migrate library. This keeps our fork close to upstream, makes future upgrades easier, and keeps the flow easier to follow, since there are no intermediary ProgrammaticMigrations, no commented empty files, and no custom behavior in the library.

What do you think? Am I missing any important reason to keep the version-specific task execution, or any trade-offs I have not considered?
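For illustration only, a hedged sketch of that proposed app-layer flow; every function named below is hypothetical:

```go
// migrateDatabases sketches the proposed startup path: schema first,
// then a one-time kvdb -> SQL copy in the app layer.
func migrateDatabases(ctx context.Context) error {
	// 1. Bring the SQL schema fully up to date the usual way.
	if err := runSQLMigrations(ctx); err != nil {
		return err
	}

	// 2. If a kvdb store is still present, copy it into SQL against
	//    the latest schema, then retire the kvdb store.
	if kvdbPresent() {
		if err := copyKVDBToSQL(ctx); err != nil {
			return err
		}
	}

	return nil
}
```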
Force-pushed from 3d22c46 to ff088bb.
Your last point here would mean we always need to update the migration code, because there is a possibility of going directly from x.00 to x1.0.1. That seems complicated?
@ziggie1984 I agree that when the schema has many changes, things can become complicated in some situations. @ffranr also mentioned that tapd is in a different position: we do not need to migrate from kvdb there, but some transformations are quite complex and painful to express only with raw SQL. An example is this one: https://github.com/lightninglabs/taproot-assets/blob/c441a3b0c4b2d356e21e3fa656b9183055e47559/tapdb/post_migration_checks.go#L103

Because of that, I am feeling more open to this approach now. For clarity, I am not trying to block this work. I only want to be sure we explored other paths before adding this pattern to the codebase. Have we already checked how Goose approaches similar cases? Maybe we can borrow some ideas from them, for example: https://github.com/pressly/goose/blob/main/examples/go-migrations/00003_add_user_no_tx.go
Thanks for thinking this through a lot @GustavoStingelin, and for your suggestion 🔥! I agree that we should consider this carefully before introducing this pattern. We did consider an approach similar to your proposal before choosing this one. What ultimately led us here is the same concern @ziggie1984 mentioned:

I think this is easier said than done, since it will likely be quite a long time before we fully drop kvdb support in lnd. With this approach, we would effectively have two sources of truth:

This introduces a real risk of subtle differences creeping in over time as both paths evolve independently. As a result, the behaviour for the user could vary depending on which actual version they run the kvdb → SQL migration at, which is something we would very much like to avoid. By keeping this as a migration version that can only run at a single schema version, we eliminate that uncertainty.

Additionally, I think one positive of the current approach is that it solves another one of your steps:

This will be handled automatically by the database migration versioning system. That's exactly how we manage all existing "normal" migrations, so it makes sense to track the kvdb → SQL migration in the same way. If we handled it differently, we would need to introduce and maintain a separate flag in the application's database layer to indicate that the migration had occurred, which is less clean IMO.

That said, I do agree that the presence of "empty SQL files" is a drawback of the current approach, and I'm definitely open to improving that if we can come up with a better solution 🙂. But as of now, this approach is the best one we've seen so far. It's also worth noting that we already have live SQL versions in tapd.

If we could introduce actual Go migrations with actual files, similar to how Goose does it, I'd be very open to that! Ultimately I think that isn't really possible when using the migrate library, though. The good thing is that I think the approach this PR introduces, which separates SQL migrations from programmatic migrations, together with the empty SQL files, gets us quite close to that.

Yes, we already have live code migrations in tapd.

Let me know if you disagree or have any questions regarding my reasoning above :)
ffranr
left a comment
Nice work!
It would be useful to see CI pass on a tapd PR that bumps the migrate dependency to this branch. Not essential, but a good signal.
Ok, I'm convinced, thanks guys!
GustavoStingelin
left a comment
ACK
ziggie1984
left a comment
Great PR 🫡, had some questions regarding the implementation.
This commit modifies the migration framework to re-attempt programmatic migrations on the next run if they error during a migration. Previously, if a programmatic migration failed but its associated SQL migration succeeded, the database version would be set to a dirty state and require manual intervention in order to reset the SQL migration and re-attempt it + the programmatic migration. The new re-attempt mechanism is achieved by ensuring that a migration can only be either an SQL migration or a programmatic migration, but not both. This way, if a programmatic migration errors, the database version will be reset to the version it had prior to executing the programmatic migration, and the programmatic migration will be re-attempted on the next run.
The last commit introduced functionality to rewind the database, when a programmatic migration fails, to the version it had before the migration was attempted. This is done to ensure that such migrations are re-run on the next startup, and that the database is not left in a dirty state. However, to support the old behavior that did not re-run failed programmatic migrations, we add a new flag called RerunOnError which is part of the migration definition. When this flag is set to false, a failing programmatic migration will leave the database dirty at the migration version of the failed programmatic migration, which matches the old behavior.
Force-pushed from ff088bb to 3faabcf.
Thanks for the reviews @ffranr, @GustavoStingelin & @ziggie1984 🎉🙏! |
ziggie1984
left a comment
LGTM, nice work 🎉
Previously, SQL migrations and programmatic migrations were designed to be a single migration version, where the SQL migration was run first, followed by a post-step migration callback that executed the programmatic migration. Due to a redefinition of the migration design, which separates SQL and programmatic migrations, we need to update tapd to become compatible with the new migration design before updating to it. The change was introduced with: lightninglabs/migrate#3

This commit therefore separates the programmatic migrations out of migrations `33` & `37` into separate migration files, and adds those two new migrations to the `tapdb/sqlc/migrations` directory. Additionally, since users that have already run the programmatic migrations during migrations `33` & `37` will have those migrations re-executed once more after updating to this PR, we test that re-execution of those programmatic migrations will be a no-op for those users, and won't add or change any data in the database.
Bump the migrate dependency to use the latest version of the `ll-fork` branch. The main changes in this new version are:

* SQL migrations & programmatic migrations now need to be separated into different migration versions.
* Programmatic migrations can now be configured to be re-run on the next startup if they error. This is done by setting the `ResetVersionOnError` flag to `true` for the specific programmatic migration.

See lightninglabs/migrate#3 for more details.
This commit modifies the migration framework to re-attempt post-step callbacks if they error during a migration, and to not leave the database version as dirty.

Implementation approach:

This is achieved by introducing the concept of a "post-step callback" migration version. A post-step callback's version is its corresponding SQL migration version offset by +1000000000.

During the execution of a post-step callback, the post-step callback migration version will be persisted as the database version. That way, if the post-step callback errors, the database version will be the post-step callback version on the next startup. The post-step callback will then be re-attempted before proceeding with the next SQL migration.
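Concretely, a hedged sketch of the offset scheme (the helper names match this thread's naming discussion; the bodies are assumed):

```go
// TaskVersionOffset shifts an SQL migration version into the
// post-step callback (task) version range.
const TaskVersionOffset = 1000000000

// taskVersion maps an SQL migration version to its task version.
func taskVersion(sqlVersion int) int {
	return sqlVersion + TaskVersionOffset
}

// InTaskVersionRange reports whether v encodes a task version.
func InTaskVersionRange(v int) bool {
	return v > TaskVersionOffset
}
```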
Alternative approach:

Another solution to the issue would be to individually track whether a post migration has been executed successfully in a separate version table for post migrations. I chose not to go for that approach, as it would require an individual implementation for every database backend type and would therefore be a much larger change.

I'm open to changing the approach though, if reviewers do not prefer the one implemented by this PR.