Batched background migrations
Batched background migrations should be used to perform data migrations whenever a migration exceeds the time limits in our guidelines. For example, you can use batched background migrations to migrate data that's stored in a single JSON column to a separate table instead.
NOTE: Batched background migrations replaced the legacy background migrations framework. Check that documentation in reference to any changes involving that framework.
NOTE: The batched background migrations framework has ChatOps support. Using ChatOps, GitLab engineers can interact with the batched background migrations present in the system.
When to use batched background migrations
Use a batched background migration when you migrate data in tables containing so many rows that the process would exceed the time limits in our guidelines if performed using a regular Rails migration.
- Batched background migrations should be used when migrating data in high-traffic tables.
- Batched background migrations may also be used when executing numerous single-row queries for every item on a large dataset. Typically, for single-record patterns, runtime is largely dependent on the size of the dataset. Split the dataset accordingly, and put it into background migrations.
- Don't use batched background migrations to perform schema migrations.
Background migrations can help when:
- Migrating events from one table to multiple separate tables.
- Populating one column based on JSON stored in another column.
- Migrating data that depends on the output of external services. (For example, an API.)
Notes
- If the batched background migration is part of an important upgrade, it must be announced in the release post. Discuss with your Project Manager if you're unsure if the migration falls into this category.
- You should use the generator to create batched background migrations, so that required files are created by default.
How batched background migrations work
Batched background migrations (BBM) are subclasses of Gitlab::BackgroundMigration::BatchedMigrationJob that define a perform method. As the first step, a regular migration creates a batched_background_migrations record with the BBM class and the required arguments. By default, the batched_background_migrations record is in an active state, and active records are picked up by the Sidekiq worker to execute the actual batched migration.
All migration classes must be defined in the namespace Gitlab::BackgroundMigration. Place the files in the directory lib/gitlab/background_migration/.
Execution mechanism
Batched background migrations are picked from the queue in the order they are enqueued. Multiple migrations are fetched and executed in parallel, as long as they are in the active state and do not target the same database table. The default number of migrations processed in parallel is 2; for GitLab.com, this limit is configured to 4. Once a migration is picked for execution, a job is created for the specific batch. After each job execution, the migration's batch size may be increased or decreased, based on the performance of the last 20 jobs.
As soon as a worker is available, the BBM is processed by the runner.
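The selection rule can be sketched in plain Ruby. This is a toy model, not GitLab's scheduler: the Migration struct, the pick_for_execution method, and the sample queue are hypothetical. Only the rule itself (enqueue order, active state, no shared target table, default limit of 2) comes from the text.

```ruby
# Toy model of the selection rule; names are hypothetical, not GitLab internals.
Migration = Struct.new(:name, :table, :status, keyword_init: true)

# Walks the queue in enqueue order and picks active migrations,
# skipping any whose target table is already taken, up to `limit`.
def pick_for_execution(queue, limit: 2)
  picked = []
  queue.each do |migration|
    break if picked.size >= limit
    next unless migration.status == :active
    next if picked.any? { |other| other.table == migration.table }

    picked << migration
  end
  picked
end

QUEUE = [
  Migration.new(name: 'BackfillA', table: :routes, status: :active),
  Migration.new(name: 'BackfillB', table: :routes, status: :active),
  Migration.new(name: 'BackfillC', table: :namespaces, status: :paused),
  Migration.new(name: 'BackfillD', table: :projects, status: :active)
].freeze

puts pick_for_execution(QUEUE).map(&:name).inspect
```

In this sample, BackfillB is skipped because it targets the same table as BackfillA, and BackfillC is skipped because it is not active.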
Idempotence
Batched background migrations are executed in the context of a Sidekiq process. The usual Sidekiq rules apply, especially the rule that jobs should be small and idempotent. Ensure that data integrity is guaranteed if your migration job is retried.
See Sidekiq best practices guidelines for more details.
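As a minimal sketch of what idempotent means here, the job body below only touches rows that still need the change, so a retry is harmless. The in-memory rows and the backfill_namespace_id method are hypothetical stand-ins for a real table and job.

```ruby
# Hypothetical in-memory rows standing in for a database table.
ROWS = [
  { id: 1, source_id: 10, namespace_id: nil },
  { id: 2, source_id: 20, namespace_id: 20 },
  { id: 3, source_id: 30, namespace_id: nil }
]

# Idempotent job body: only rows that still need the change are touched,
# so running the job again after a retry changes nothing.
def backfill_namespace_id(rows)
  pending = rows.select { |row| row[:namespace_id].nil? }
  pending.each { |row| row[:namespace_id] = row[:source_id] }
  pending.size
end

first_run = backfill_namespace_id(ROWS)
retry_run = backfill_namespace_id(ROWS)
puts "first run updated #{first_run}, retry updated #{retry_run}"
```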
Migration optimization
After each job execution, a verification takes place to check if the migration can be optimized. The underlying optimization mechanism is based on the concept of time efficiency. It calculates the exponential moving average of time efficiencies for the last N jobs and updates the batch size of the batched background migration to its optimal value.
This mechanism, however, makes it hard for us to provide an accurate estimate of the total execution time of the migration when using the database migration pipeline.
We are discussing ways to fix this problem in this issue.
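To make the idea concrete, here is a rough sketch of batch-size tuning driven by an exponential moving average of time efficiencies. Everything specific in it is an assumption made for illustration: the definition of time efficiency as job duration divided by job interval, the target efficiency of 0.9, the growth cap of 1.2, and all method names. It is not the framework's actual optimizer.

```ruby
# Illustrative only: constants and formulas are assumptions, not GitLab's values.
TARGET_EFFICIENCY = 0.9 # assumed: aim to spend about 90% of each interval working
MAX_MULTIPLIER = 1.2    # assumed: cap on growth per adjustment

# Time efficiency of one job: share of the job interval spent executing.
def time_efficiency(duration_seconds, interval_seconds)
  duration_seconds.to_f / interval_seconds
end

# Exponential moving average; later (more recent) values weigh more.
def exponential_moving_average(values)
  alpha = 2.0 / (values.size + 1)
  values.drop(1).reduce(values.first) { |ema, value| alpha * value + (1 - alpha) * ema }
end

# Scales the batch size toward the target efficiency.
def optimized_batch_size(batch_size, durations, interval_seconds)
  efficiencies = durations.map { |duration| time_efficiency(duration, interval_seconds) }
  ema = exponential_moving_average(efficiencies)
  multiplier = [TARGET_EFFICIENCY / ema, MAX_MULTIPLIER].min
  [(batch_size * multiplier).round, 1].max
end

# Jobs that finish in half the interval leave headroom, so the batch grows.
puts optimized_batch_size(1_000, [60, 60, 60], 120)
# Jobs that overrun the interval shrink the batch.
puts optimized_batch_size(1_000, [240, 240], 120)
```

Because the batch size keeps changing in this way, the number of remaining jobs is not fixed up front, which is why the total runtime is hard to predict.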
Job retry mechanism
The batched background migrations retry mechanism ensures that a job is executed again in case of failure. The following diagram shows the different stages of our retry mechanism:
- MAX_ATTEMPTS is defined in the Gitlab::Database::BackgroundMigration class.
- can_split? is defined in the Gitlab::Database::BatchedJob class.
Failed batched background migrations
The whole batched background migration is marked as failed (/chatops run batched_background_migrations status MIGRATION_ID shows the migration as failed) if any of the following is true:
- There are no more jobs to consume, and there are failed jobs.
- More than half of the jobs failed since the background migration was started.
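The two conditions can be written as a small predicate. This is a hypothetical helper, not framework code: jobs is assumed to be an array of job statuses, and batches_left is an assumed flag saying whether any work remains to be turned into jobs.

```ruby
# Hypothetical helper mirroring the two failure conditions.
# `jobs` is an array of job statuses; `batches_left` says whether
# any part of the table still has to be turned into jobs.
def migration_failed?(jobs, batches_left:)
  failed = jobs.count(:failed)
  # Condition 1: nothing left to consume, and at least one job failed.
  return true if !batches_left && failed.positive?

  # Condition 2: more than half of the jobs created so far have failed.
  failed > jobs.size / 2.0
end

puts migration_failed?(%i[succeeded failed], batches_left: false)
puts migration_failed?(%i[succeeded failed succeeded], batches_left: true)
puts migration_failed?(%i[failed failed succeeded], batches_left: true)
```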
Throttling batched migrations
Because batched migrations are update heavy and there have been incidents due to the heavy load from these migrations while the database was underperforming, a throttling mechanism exists to mitigate future incidents.
The following database indicators are checked to throttle a migration. When any of them sends a stop signal, the migration is paused for a set time (10 minutes):
- WAL queue pending archival crossing the threshold.
- Active autovacuum on the tables that the migration works on.
- Patroni apdex SLI dropping below the SLO.
- WAL rate crossing the threshold.
There is an ongoing effort to add more indicators to further enhance the database health check framework. For more details, see epic 7594.
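A toy model of the throttle check is shown below. The indicator names follow the list in this section, but the Throttle class, the lambdas, and their return values are hypothetical. The only behavior taken from the text is that a stop signal pauses the migration for 10 minutes.

```ruby
# Toy model of the throttle check; the Throttle class and the lambdas
# are hypothetical, not part of the database health check framework.
PAUSE_SECONDS = 10 * 60

class Throttle
  attr_reader :paused_until

  def initialize(indicators)
    @indicators = indicators
    @paused_until = nil
  end

  # Evaluates every indicator; any stop signal pauses the migration.
  # Returns the names of the indicators that asked to stop.
  def check(now)
    stopped_by = @indicators.select { |_name, stop| stop.call }.keys
    @paused_until = now + PAUSE_SECONDS unless stopped_by.empty?
    stopped_by
  end

  def paused?(now)
    !@paused_until.nil? && now < @paused_until
  end
end

indicators = {
  wal_archive_queue: -> { false },
  autovacuum_active: -> { true },
  patroni_apdex: -> { false },
  wal_rate: -> { false }
}

throttle = Throttle.new(indicators)
now = Time.at(0)
puts throttle.check(now).inspect
puts throttle.paused?(now + 300)
puts throttle.paused?(now + 601)
```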
Isolation
Batched background migrations must be isolated and cannot use application code (for example, models defined in app/models except the ApplicationRecord classes). Because these migrations can take a long time to run, it's possible for new versions to deploy while the migrations are still running.
Depending on migrated data
Unlike a regular or a post migration, waiting for the next release is not enough to guarantee that the data was fully migrated. That means that you shouldn't depend on the data until the BBM is finished. If having 100% of the data migrated is a requirement, use the ensure_batched_background_migration_is_finished helper to guarantee that the migration finished and the data was fully migrated. (See an example).
How to
Generate a batched background migration
The custom generator batched_background_migration scaffolds necessary files and accepts table_name, column_name, and feature_category as arguments. When choosing the column_name, ensure that you are using a column type that can be iterated over distinctly, preferably the table's primary key. The table will be iterated over based on the column defined here. For more information, see Batch over non-distinct columns.
Usage:
bundle exec rails g batched_background_migration my_batched_migration --table_name=<table-name> --column_name=<column-name> --feature_category=<feature-category>
This command creates the following files:
db/post_migrate/20230214231008_queue_my_batched_migration.rb
spec/migrations/20230214231008_queue_my_batched_migration_spec.rb
lib/gitlab/background_migration/my_batched_migration.rb
spec/lib/gitlab/background_migration/my_batched_migration_spec.rb
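To see why the column chosen as column_name should hold distinct values, consider this simplified sketch of range-based batching. The batch_ranges and rows_matched helpers are hypothetical and only imitate the idea of iterating a table by minimum and maximum column values. They are not the framework's batching strategy.

```ruby
# Hypothetical helper: cuts the sorted column values into [min, max] ranges,
# one range per batch, the way a range-based iterator would.
def batch_ranges(values, batch_size)
  values.sort.each_slice(batch_size).map { |slice| [slice.first, slice.last] }
end

# Counts how many rows a "column BETWEEN min AND max" filter would match.
def rows_matched(values, range)
  values.count { |value| value.between?(range.first, range.last) }
end

distinct_ids = [1, 2, 3, 5, 8, 13, 21]
batch_ranges(distinct_ids, 3).each do |range|
  puts "ids #{range.inspect} match #{rows_matched(distinct_ids, range)} rows"
end

# With repeated values the ranges overlap, so a batch of 2 matches 4 rows.
repeated = [1, 1, 1, 1, 2]
batch_ranges(repeated, 2).each do |range|
  puts "values #{range.inspect} match #{rows_matched(repeated, range)} rows"
end
```

With distinct values, every range matches at most batch_size rows. With repeated values, ranges overlap and the same rows are matched by more than one batch.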
Enqueue a batched background migration
Queueing a batched background migration should be done in a post-deployment migration. Use this queue_batched_background_migration example, queueing the migration to be executed in batches. Replace the class name and arguments with the values from your migration:
queue_batched_background_migration(
JOB_CLASS_NAME,
TABLE_NAME,
JOB_ARGUMENTS,
JOB_INTERVAL
)
NOTE: This helper raises an error if the number of provided job arguments does not match the number of job arguments defined in JOB_CLASS_NAME.
Make sure the newly-created data is either migrated, or saved in both the old and new version upon creation. Removals in turn can be handled by defining foreign keys with cascading deletes.
Finalize a batched background migration
Finalizing a batched background migration is done by calling ensure_batched_background_migration_is_finished at least one required stop after queuing it. This ensures a smooth upgrade process for self-managed instances.
It is important to finalize all batched background migrations when it is safe to do so. Leaving old batched background migrations around is a form of technical debt that needs to be maintained in tests and in application behavior. Note that you cannot depend on any batched background migration being completed until after it is finalized.
We recommend that batched background migrations are finalized after all of the following conditions are met:
- The batched background migration is completed on GitLab.com.
- The batched background migration was added in or before the last required stop. For example, if 17.8 is a required stop and the migration was added in 17.7, the finalizing migration can be added in 17.9.
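The second condition can be checked mechanically. The helper below is a hypothetical sketch, not part of GitLab: it treats milestones as version strings and requires a required stop that is at or after the queuing milestone and strictly before the finalizing one.

```ruby
require 'rubygems' # provides Gem::Version for comparing milestones

# Hypothetical check: a required stop must lie between the milestone that
# queued the migration (inclusive) and the finalizing milestone (exclusive).
def finalizable_in?(queued:, finalize_in:, required_stops:)
  queued_version = Gem::Version.new(queued)
  finalize_version = Gem::Version.new(finalize_in)

  required_stops.any? do |stop|
    stop_version = Gem::Version.new(stop)
    queued_version <= stop_version && stop_version < finalize_version
  end
end

# The example from the list: queued in 17.7, required stop 17.8.
puts finalizable_in?(queued: '17.7', finalize_in: '17.9', required_stops: ['17.8'])
puts finalizable_in?(queued: '17.7', finalize_in: '17.8', required_stops: ['17.8'])
```

The first call returns true and the second returns false, because 17.8 is the required stop itself rather than a release after it.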
The ensure_batched_background_migration_is_finished call must exactly match the migration that was used to enqueue it. Pay careful attention to:
- The job arguments: These need to match exactly, or the queued migration will not be found.
- The gitlab_schema: This needs to match exactly, or the queued migration will not be found. Even if the gitlab_schema of the table has changed from gitlab_main to gitlab_main_cell in the meantime, you must finalize it with gitlab_main if that's what was used when queueing the batched background migration.
When finalizing a batched background migration, you also need to update the finalized_by in the corresponding db/docs/batched_background_migrations file. The value should be the timestamp/version of the migration you added to finalize it. The schema version of the RSpec tests associated with the migration should also be set to this version to avoid having the tests fail due to future schema changes.
See the Examples below for specific details on what the actual migration code should be.
NOTE: If the migration is finalized before one required stop has passed since it was enqueued, an early finalization error is raised. If the migration must be finalized before one required stop, use the skip_early_finalization_validation: true option to skip this check.
Deleting batched background migration code
Once a batched background migration has completed, is finalized, and has not been re-queued, the migration code in lib/gitlab/background_migration/ and its associated tests can be deleted after the next required stop following the finalization.
Here is an example scenario:
- 17.2 and 17.5 are required stops.
- In 17.0 the batched background migration is queued.
- In 17.3 the migration may be finalized, provided that it's completed on GitLab.com.
- In 17.6 the code related to the migration may be deleted.
Batched background migration code is routinely deleted when migrations are squashed.
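The example scenario can be replayed with a short script. The release list, the helper names, and the reading that deletion needs a required stop strictly after the finalizing release are assumptions made for this sketch.

```ruby
require 'rubygems' # provides Gem::Version for comparing milestones

# Data from the example scenario; the release list itself is assumed.
RELEASES = %w[17.0 17.1 17.2 17.3 17.4 17.5 17.6].freeze
REQUIRED_STOPS = %w[17.2 17.5].freeze

def version(string)
  Gem::Version.new(string)
end

# Finds the first required stop at or after `from` (strictly after it when
# inclusive is false) and returns the first release after that stop.
def first_release_after_stop(from, inclusive: true)
  stop = REQUIRED_STOPS.find do |candidate|
    inclusive ? version(candidate) >= version(from) : version(candidate) > version(from)
  end
  return nil unless stop

  RELEASES.find { |release| version(release) > version(stop) }
end

finalize_in = first_release_after_stop('17.0')
delete_in = first_release_after_stop(finalize_in, inclusive: false)
puts "finalize in #{finalize_in}, delete in #{delete_in}"
```

For a migration queued in 17.0, this prints 17.3 as the earliest finalization milestone and 17.6 as the earliest milestone for deleting the code, matching the scenario.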
Re-queue batched background migrations
A batched background migration might need to be re-run for one of several reasons:
- The migration contains a bug (example).
- The migration cleaned up data but the data became de-normalized again due to a bypass in application logic (example).
- The batch size of the original migration causes the migration to fail (example).
To requeue a batched background migration, you must:
- No-op the contents of the #up and #down methods of the original migration file. Otherwise, the batched background migration is created, deleted, then created again on systems that are upgrading multiple patch releases at once.
- Add a new post-deployment migration that re-runs the batched background migration.
- In the new post-deployment migration, delete the existing batched background migration using the delete_batched_background_migration method at the start of the #up method to ensure that any existing runs are cleaned up.
- Update the db/docs/batched_background_migrations/*.yml file from the original migration to include information about the requeue.
Example
Original Migration:
# frozen_string_literal: true

class QueueResolveVulnerabilitiesForRemovedAnalyzers < Gitlab::Database::Migration[2.2]
  milestone '17.3'

  MIGRATION = "ResolveVulnerabilitiesForRemovedAnalyzers"

  def up
    # no-op because there was a bug in the original migration, which has been
    # fixed by
  end

  def down
    # no-op because there was a bug in the original migration, which has been
    # fixed in https://gitlab.com/gitlab-org/gitlab/-/merge_requests/162527
  end
end
Requeued migration:
# frozen_string_literal: true

class RequeueResolveVulnerabilitiesForRemovedAnalyzers < Gitlab::Database::Migration[2.2]
  milestone '17.4'

  restrict_gitlab_migration gitlab_schema: :gitlab_main

  MIGRATION = "ResolveVulnerabilitiesForRemovedAnalyzers"
  DELAY_INTERVAL = 2.minutes
  BATCH_SIZE = 10_000
  SUB_BATCH_SIZE = 100

  def up
    # Clear previous background migration execution from QueueResolveVulnerabilitiesForRemovedAnalyzers
    delete_batched_background_migration(MIGRATION, :vulnerability_reads, :id, [])

    queue_batched_background_migration(
      MIGRATION,
      :vulnerability_reads,
      :id,
      job_interval: DELAY_INTERVAL,
      batch_size: BATCH_SIZE,
      sub_batch_size: SUB_BATCH_SIZE
    )
  end

  def down
    delete_batched_background_migration(MIGRATION, :vulnerability_reads, :id, [])
  end
end
Batched migration dictionary:
The milestone and queued_migration_version should be those of the requeued migration (in this example: RequeueResolveVulnerabilitiesForRemovedAnalyzers).
---
migration_job_name: ResolveVulnerabilitiesForRemovedAnalyzers
description: Resolves all detected vulnerabilities for removed analyzers.
feature_category: static_application_security_testing
introduced_by_url: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/162691
milestone: '17.4'
queued_migration_version: 20240814085540
finalized_by: # version of the migration that finalized this BBM
Stop and remove batched background migrations
A batched background migration in running state can be stopped and removed for several reasons:
- When the migration is no longer relevant or required as the product use case changed.
- The migration has to be superseded with another migration with a different logic.
To stop and remove an in-progress batched background migration, you must:
- In Release N, no-op the contents of the #up and #down methods of the scheduling database migration.
class BackfillNamespaceType < Gitlab::Database::Migration[2.1]
  # Reason why we don't need the BBM anymore. For example: This BBM is no longer needed because it will be superseded by another BBM with different logic.
  def up; end
  def down; end
end
- In Release N, add a regular migration to delete the existing batched migration. Delete the existing batched background migration using the delete_batched_background_migration method at the start of the #up method to ensure that any existing runs are cleaned up.
class CleanupBackfillNamespaceType < Gitlab::Database::Migration[2.1]
  MIGRATION = "MyMigrationClass"
  DELAY_INTERVAL = 2.minutes
  BATCH_SIZE = 50_000

  restrict_gitlab_migration gitlab_schema: :gitlab_main

  def up
    delete_batched_background_migration(MIGRATION, :vulnerabilities, :id, [])
  end

  def down
    delete_batched_background_migration(MIGRATION, :vulnerabilities, :id, [])
  end
end
- In Release N, also delete the migration class file (lib/gitlab/background_migration/my_batched_migration.rb) and its specs.
All the above steps can be implemented in a single MR.
Use job arguments
BatchedMigrationJob provides the job_arguments helper method for job classes to define the job arguments they need.
Batched migrations scheduled with queue_batched_background_migration must use the helper to define the job arguments:
NOTE: For EE migrations that define scope_to, ensure the module extends ActiveSupport::Concern. Otherwise, records are processed without taking the scope into consideration.
- In the post-deployment migration, enqueue the batched background migration:
class BackfillNamespaceType < Gitlab::Database::Migration[2.1]
  MIGRATION = 'BackfillNamespaceType'
  DELAY_INTERVAL = 2.minutes

  restrict_gitlab_migration gitlab_schema: :gitlab_main

  def up
    queue_batched_background_migration(
      MIGRATION,
      :namespaces,
      :id,
      job_interval: DELAY_INTERVAL
    )
  end

  def down
    delete_batched_background_migration(MIGRATION, :namespaces, :id, [])
  end
end
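To illustrate how a class-level job_arguments declaration can expose queued arguments as named readers, here is a toy stand-in written in plain Ruby. ToyBatchedJob and CopyColumn are hypothetical classes and do not reproduce the real BatchedMigrationJob. The argument-count check is placed in the constructor purely for demonstration, while in the framework the mismatch error is described as coming from the queueing helper.

```ruby
# Toy stand-in for a class-level `job_arguments` helper. This is NOT
# GitLab's BatchedMigrationJob; it only illustrates the idea.
class ToyBatchedJob
  # Records the declared names and defines one reader per argument.
  def self.job_arguments(*names)
    @job_argument_names = names
    names.each_with_index do |name, index|
      define_method(name) { @job_arguments[index] }
    end
  end

  def self.job_argument_names
    @job_argument_names || []
  end

  # Rejects argument lists whose size differs from the declaration.
  def initialize(job_arguments: [])
    expected = self.class.job_argument_names.size
    if job_arguments.size != expected
      raise ArgumentError, "expected #{expected} job arguments, got #{job_arguments.size}"
    end

    @job_arguments = job_arguments
  end
end

# Hypothetical job that copies one column into another.
class CopyColumn < ToyBatchedJob
  job_arguments :copy_from, :copy_to

  def perform
    "UPDATE ... SET #{copy_to} = #{copy_from}"
  end
end

puts CopyColumn.new(job_arguments: %w[name name_convert_to_text]).perform
```

The arguments are matched to the declared names by position, so the order used when queueing has to match the order in the declaration.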
Access data for multiple databases
Background migrations, unlike regular migrations, have access to multiple databases and can be used to efficiently access and update data across them. To properly indicate which database to use, create an ActiveRecord model inline in the migration code. Such a model should use the correct ApplicationRecord, depending on which database the table is located in. As such, usage of ActiveRecord::Base is disallowed, because it does not explicitly describe which database is used to access a given table.
Examples
Routes use-case
The routes table has a source_type field that's used for a polymorphic relationship. As part of a database redesign, we're removing the polymorphic relationship. One step of the work is migrating data from the source_id column into a new singular foreign key. Because we intend to delete old rows later, there's no need to update them as part of the background migration.
- Start by using the generator to create batched background migration files:
bundle exec rails g batched_background_migration BackfillRouteNamespaceId --table_name=routes --column_name=id --feature_category=source_code_management
- Update the migration job (subclass of BatchedMigrationJob) to copy source_id values to namespace_id:
module Gitlab
  module BackgroundMigration
    class BackfillRouteNamespaceId < BatchedMigrationJob
      # For illustration purposes, if we were to use a local model we could
      # define it like below, using an `ApplicationRecord` as the base class
      # class Route < ::ApplicationRecord
      #   self.table_name = 'routes'
      # end

      operation_name :update_all
      feature_category :source_code_management

      def perform
        each_sub_batch(
          batching_scope: -> (relation) { relation.where("source_type <> 'UnusedType'") }
        ) do |sub_batch|
          sub_batch.update_all('namespace_id = source_id')
        end
      end
    end
  end
end
NOTE: Job classes inherit from BatchedMigrationJob to ensure they are correctly handled by the batched migration framework. Any subclass of BatchedMigrationJob is initialized with the necessary arguments to execute the batch, and a connection to the tracking database.
- Create a database migration that adds a new trigger to the database. Example:
class AddTriggerToRoutesToCopySourceIdToNamespaceId < Gitlab::Database::Migration[2.1]
  FUNCTION_NAME = 'example_function'
  TRIGGER_NAME = 'example_trigger'

  def up
    # The trigger runs BEFORE the write so that the value assigned to NEW is stored.
    execute(<<~SQL)
      CREATE OR REPLACE FUNCTION #{FUNCTION_NAME}() RETURNS trigger
        LANGUAGE plpgsql
        AS $$
        BEGIN
          NEW."namespace_id" = NEW."source_id";
          RETURN NEW;
        END;
        $$;

      CREATE TRIGGER #{TRIGGER_NAME}
      BEFORE INSERT OR UPDATE
      ON routes
      FOR EACH ROW
      EXECUTE FUNCTION #{FUNCTION_NAME}();
    SQL
  end

  def down
    drop_trigger(TRIGGER_NAME, :routes)
    drop_function(FUNCTION_NAME)
  end
end
- Update the created post-deployment migration with required delay and batch sizes:
class QueueBackfillRoutesNamespaceId < Gitlab::Database::Migration[2.1]
  MIGRATION = 'BackfillRouteNamespaceId'
  DELAY_INTERVAL = 2.minutes
  BATCH_SIZE = 1000
  SUB_BATCH_SIZE = 100

  restrict_gitlab_migration gitlab_schema: :gitlab_main

  def up
    queue_batched_background_migration(
      MIGRATION,
      :routes,
      :id,
      job_interval: DELAY_INTERVAL,
      batch_size: BATCH_SIZE,
      sub_batch_size: SUB_BATCH_SIZE
    )
  end

  def down
    delete_batched_background_migration(MIGRATION, :routes, :id, [])
  end
end
# db/docs/batched_background_migrations/backfill_route_namespace_id.yml
---
migration_job_name: BackfillRouteNamespaceId
description: Copies source_id values from routes to namespace_id
feature_category: source_code_management
introduced_by_url: "https://mr_url"
milestone: '16.6'
queued_migration_version: 20231113120650
finalized_by: # version of the migration that ensured this bbm
NOTE: When queuing a batched background migration, you need to restrict the schema to the database where you make the actual changes. In this case, we are updating routes records, so we set restrict_gitlab_migration gitlab_schema: :gitlab_main. If, however, you need to perform a CI data migration, you would set restrict_gitlab_migration gitlab_schema: :gitlab_ci.
After deployment, our application:
- Continues using the data as before.
- Ensures that both existing and new data are migrated.
- Add a new post-deployment migration that checks that the batched background migration is complete. Also update the finalized_by attribute in the BBM dictionary with the version of this migration.
class FinalizeBackfillRouteNamespaceId < Gitlab::Database::Migration[2.1]
  MIGRATION = 'BackfillRouteNamespaceId'

  disable_ddl_transaction!

  restrict_gitlab_migration gitlab_schema: :gitlab_main

  def up
    ensure_batched_background_migration_is_finished(
      job_class_name: MIGRATION,
      table_name: :routes,
      column_name: :id,
      job_arguments: [],
      finalize: true
    )
  end

  def down
    # no-op
  end
end
# db/docs/batched_background_migrations/backfill_route_namespace_id.yml
---
migration_job_name: BackfillRouteNamespaceId
description: Copies source_id values from routes to namespace_id
feature_category: source_code_management
introduced_by_url: "https://mr_url"
milestone: '16.6'
queued_migration_version: 20231113120650
finalized_by: 20231115120912
NOTE: If the batched background migration is not finished, the system will execute the batched background migration inline. If you don't want to see this behavior, you need to pass finalize: false.
If the application does not depend on the data being 100% migrated (for instance, the data is advisory, and not mission-critical), then you can skip this final step. This step confirms that the migration is completed, and all of the rows were migrated.
- Add a database migration to remove the trigger.
class RemoveNamespaceIdTriggerFromRoutes < Gitlab::Database::Migration[2.1]
  FUNCTION_NAME = 'example_function'
  TRIGGER_NAME = 'example_trigger'

  def up
    drop_trigger(TRIGGER_NAME, :routes)
    drop_function(FUNCTION_NAME)
  end

  def down
    # Should reverse the trigger and the function in the up method of the migration that added it
  end
end
After the batched migration is completed, you can safely depend on the data in routes.namespace_id being populated.