In a recent series of articles on ECS, I described a methodology for provisioning and deploying a test FastAPI application on AWS ECS.
One of the more difficult aspects of it is handling database migrations. When and how should the migrations be run when deploying onto ECS? Do we run it as part of the service deployment or as a separate task?
My personal preference is to run database migration as a separate task. This would decouple the handling of database migrations as a separate operation. While it is possible to create a single task definition with the migration task running as a container before the service container starts, I always feel that this is an anti-pattern. What happens if the database migration takes a long time due to the size of the database? The service deployment would need to wait, resulting in poor feedback loop.
After some trial and error I came up with the following approach, which consists of 2 stages:
-
Running a ECS task for the database migration during initial setup.
-
Creating a lambda to run the database migration for deployment workflows.
Running database migration during setup
When setting up the infrastructure via terraform, we create a one-off task via aws cli:
The terraform_data resource calls ecs run-task to invoke the database migration. It overrides the application container image command with a call to alembic. It requires the network configuration to be specified together with the container overrides. We wait for the task to stop before proceeding to provision the initial application service and code deploy resources.
Database migration lambda
To run the database migration after provisioning, I created a separate lambda which gets invoked via a custom EventBridge rule. This would run a standalone ECS task to perform the database migration when required. By using an event-driven approach, we can run the database migration when we choose to, so long as we create the right event to put on the Event Bus.
For this example, I use the default event bus.
The rule I created is called run-database-migration and has the following event pattern:
It has a source of ecs.migration and a detail type of ECS migration task. It has a reference to the latest task definition which will be used by the lambda code to run a migration task.
In my terraform resource files, I created the following lambda resources:
The HCL code above defines the lambda which calls the migration task. It’s triggered by a custom event rule defined by aws_cloudwatch_event_rule. The event target resource links the rule to the lambda. In the console, we can link the event rule either via the lambda configuration or by editing the event rule. Here, we need to define a resource of aws_lambda_permission to link a custom rule to the lambda. The private subnets, security group used by the API service and cluster name are set as environment variables on the lambda.
The lambda code parses the event and uses it to run a one-off ECS task:
In the lambda, we use the ecs client to run a task whereby we use the same image as the application but override the command to run alembic upgrade head. In run_task, we added the additional tag of startedBy so we can query for the migration task status. I have bundled the database migration files when building the application container image. We can check the cloudwatch logs to ensure that the migration has run successfully.
Deployment Pipeline
In the github deployment workflow, to ensure that the latest database migrations are applied, we build the new image as normal since the migration files are bundled with the application container.
During the deployment stage, we add an additional stage of running the migration before deploying the new task definition via code deploy:
The events.sh script takes as parameters the task definition name and the output event json file to pass to put-events. It sleeps for 10 seconds after which we query for tasks with the ecs.migration started by tag and wait for the tasks to stop before we proceed to the code deploy stage.
The events.sh script replaces the detail body of the event with the task definition arn from a template:
To run the migration manually, you can also create an event with the matching attributes in the console or the CLI. Below is a screenshot of invoking the lambda via the EventBridge send events:
With the following setup, I was able to successfully run database migration