
Deployment: Staging deploy workflow for EC2 via SSM #770

Merged
Ayush8923 merged 16 commits into main from feat/staging-deployment-ec2
Apr 28, 2026

Conversation

@Ayush8923
Collaborator

@Ayush8923 Ayush8923 commented Apr 20, 2026

Issue: #794

Summary

  • New staging deployment via EC2 + SSM
  • Dedicated staging Compose file that runs only the backend and celery_worker services, since staging uses external RDS and EC2-hosted Redis & RabbitMQ.
  • Renamed cd-staging.yml to cd-staging-ecs.yml to make room for the new EC2-based variant.

Checklist

Before submitting a pull request, please ensure that you have completed these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested the changes.
  • If you've fixed a bug or added new code, confirm that it is covered by test cases.

Summary by CodeRabbit

  • Chores
    • Added automated staging environment deployment workflow for EC2 instances.
    • Enhanced production deployment workflow with automated database migration execution.
    • Added Docker Compose configuration for staging environment with backend and worker services.

@Ayush8923 Ayush8923 self-assigned this Apr 20, 2026
@coderabbitai

coderabbitai Bot commented Apr 20, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a new GitHub Actions workflow to deploy staging to an EC2 instance via SSM (assume-role, send RunShellScript that pulls, builds, runs migrations, and starts containers), updates the production workflow to tag images and run Alembic migrations in a one-off ECS Fargate task, and adds a staging docker-compose file with backend and Celery services.

Changes

Cohort / File(s) Summary
Staging EC2 deployment workflow
.github/workflows/cd-staging-ec2.yml
New workflow that assumes secrets.STAGING_EC2_DEPLOY_ROLE_ARN, sends AWS-RunShellScript via SSM to secrets.STAGING_EC2_INSTANCE_ID, captures CommandId, waits for completion, and outputs status/stdout/stderr. Script runs git fetch/pull, docker compose -f docker-compose.staging.yml build, an ephemeral container to run uv run alembic upgrade head, docker compose up -d --remove-orphans, and docker image prune -f. Uses environment: AWS_STAGING_ENV, concurrency and OIDC permissions.
Production workflow — image tagging & DB migrations
.github/workflows/cd-production.yml
Builds/pushes backend image with an additional tag for github.ref_name and adds a step that launches a one-off ECS Fargate task using the existing task definition with an overridden container command alembic upgrade head; waits for task completion, checks container exit code, and fails the job on non-zero exit.
Staging Compose
docker-compose.staging.yml
New staging Compose config with backend (Uvicorn via uv, healthcheck, optional develop.watch sync, port mapping) and celery_worker (Celery worker dependent on backend health). Both auto-restart and log to CloudWatch (awslogs).
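The send-command and wait sequence summarized for .github/workflows/cd-staging-ec2.yml can be sketched as workflow steps roughly like the following. This is a minimal illustration only; the step id, working directory, and exact compose invocation are assumptions inferred from the summary, not the merged file's contents:

```yaml
      - name: Deploy to staging EC2 via SSM
        id: ssm
        env:
          INSTANCE_ID: ${{ secrets.STAGING_EC2_INSTANCE_ID }}
        run: |
          # Send the deploy script to the instance and capture the CommandId
          CMD_ID=$(aws ssm send-command \
            --instance-ids "$INSTANCE_ID" \
            --document-name "AWS-RunShellScript" \
            --parameters commands='["set -eux","cd /data/kaapi-backend && git fetch --all && git pull origin main && docker compose -f docker-compose.staging.yml build && docker compose -f docker-compose.staging.yml run --rm backend uv run alembic upgrade head && docker compose -f docker-compose.staging.yml up -d --remove-orphans && docker image prune -f"]' \
            --query 'Command.CommandId' --output text)
          echo "cmd_id=$CMD_ID" >> "$GITHUB_OUTPUT"

      - name: Wait for SSM command to finish
        env:
          INSTANCE_ID: ${{ secrets.STAGING_EC2_INSTANCE_ID }}
          CMD_ID: ${{ steps.ssm.outputs.cmd_id }}
        run: |
          # Block until the remote script finishes, then surface its output
          aws ssm wait command-executed --command-id "$CMD_ID" --instance-id "$INSTANCE_ID"
          aws ssm get-command-invocation --command-id "$CMD_ID" --instance-id "$INSTANCE_ID" \
            --query '{Status:Status,Stdout:StandardOutputContent,Stderr:StandardErrorContent}' \
            --output json
```

The waiter exits non-zero if the command fails, which fails the step; the review thread below discusses fetching the invocation output even in that case.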
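The one-off migration task added to .github/workflows/cd-production.yml can be sketched as below. The container name, environment variable names, and network configuration here are placeholders, not the file's actual values:

```yaml
      - name: Run database migrations
        run: |
          # Launch a one-off Fargate task reusing the existing task definition,
          # overriding the container command to run Alembic
          TASK_ARN=$(aws ecs run-task \
            --cluster "$CLUSTER" \
            --task-definition "$TASK_DEF" \
            --launch-type FARGATE \
            --network-configuration "$NETWORK_CONFIG" \
            --overrides '{"containerOverrides":[{"name":"backend","command":["alembic","upgrade","head"]}]}' \
            --query 'tasks[0].taskArn' --output text)

          # Wait for the task to stop, then fail the job on a non-zero exit code
          aws ecs wait tasks-stopped --cluster "$CLUSTER" --tasks "$TASK_ARN"
          EXIT_CODE=$(aws ecs describe-tasks --cluster "$CLUSTER" --tasks "$TASK_ARN" \
            --query 'tasks[0].containers[0].exitCode' --output text)
          if [ "$EXIT_CODE" != "0" ]; then
            echo "Migration failed with exit code $EXIT_CODE"
            exit 1
          fi
```

Checking the container exit code explicitly is what gates the subsequent ECS deploy on a successful migration; `aws ecs wait tasks-stopped` alone does not fail on non-zero exits.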
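An illustrative skeleton of docker-compose.staging.yml with the two services described above. All service details (ports, healthcheck endpoint, log group, commands) are assumptions based on the summary:

```yaml
services:
  backend:
    build: ./backend
    command: uv run uvicorn app.main:app --host 0.0.0.0 --port 8000
    restart: always
    ports:
      - "8000:8000"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      retries: 3
    logging:
      driver: awslogs
      options:
        awslogs-group: kaapi-staging

  celery_worker:
    build: ./backend
    command: uv run celery -A app.celery worker
    restart: always
    depends_on:
      backend:
        condition: service_healthy
    logging:
      driver: awslogs
      options:
        awslogs-group: kaapi-staging
```

Note there are no postgres, redis, or rabbitmq services: as the PR description states, staging uses external RDS and EC2-hosted Redis and RabbitMQ.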

Sequence Diagram(s)

sequenceDiagram
    participant GH as GitHub Actions
    participant STS as AWS STS (AssumeRole)
    participant SSM as AWS SSM (SendCommand / Wait)
    participant EC2 as EC2 Instance (ssm-agent + Docker)

    GH->>STS: Assume role using `STAGING_EC2_DEPLOY_ROLE_ARN`
    STS-->>GH: Temporary credentials
    GH->>SSM: SendCommand (AWS-RunShellScript) to `STAGING_EC2_INSTANCE_ID` (script)
    SSM-->>GH: CommandId
    SSM->>EC2: Deliver RunShellScript to ssm-agent
    EC2->>EC2: execute script: git fetch/pull, docker compose build, run migration container, docker compose up -d, docker image prune
    EC2-->>SSM: Command execution status, stdout/stderr
    GH->>SSM: aws ssm wait command-executed (CommandId)
    SSM-->>GH: final invocation status and logs

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • kartpop
  • AkhileshNegi
  • Prajna1999

Poem

🐇 I hopped through logs and code tonight,
Told EC2 to pull and set things right,
Migrations hummed and containers came alive,
Staging stretched its paws and started to thrive.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Title check (✅ Passed): The title directly matches the main change: introducing a new staging deployment workflow for EC2 via SSM, which is the primary focus of this PR.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Description Check (✅ Passed): Check skipped; CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/staging-deployment-ec2

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
.github/workflows/cd-staging-ec2.yml (1)

44-56: Always fetch SSM output, even when the waiter fails.

If aws ssm wait command-executed exits non-zero, the step can stop before printing Stdout/Stderr, which makes failed deploys harder to debug. Capture the waiter status, fetch invocation output, then exit with the original status.

🔎 Proposed diagnostic improvement
       - name: Wait for SSM command to finish
         env:
           INSTANCE_ID: ${{ vars.STAGING_EC2_INSTANCE_ID }}
           CMD_ID: ${{ steps.ssm.outputs.cmd_id }}
         run: |
-          aws ssm wait command-executed \
+          set +e
+          aws ssm wait command-executed \
             --command-id "$CMD_ID" \
             --instance-id "$INSTANCE_ID"
+          WAIT_STATUS=$?
           aws ssm get-command-invocation \
             --command-id "$CMD_ID" \
             --instance-id "$INSTANCE_ID" \
             --query '{Status:Status,Stdout:StandardOutputContent,Stderr:StandardErrorContent}' \
             --output json
+          GET_STATUS=$?
+          if [ "$WAIT_STATUS" -ne 0 ]; then
+            exit "$WAIT_STATUS"
+          fi
+          exit "$GET_STATUS"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/cd-staging-ec2.yml around lines 44 - 56, The step
currently runs aws ssm wait command-executed and then aws ssm
get-command-invocation, but if the waiter exits non-zero the step can stop
before printing Stdout/Stderr; update the run block to capture the waiter's exit
status (e.g., save $? into a variable after running aws ssm wait
command-executed using the env vars INSTANCE_ID and CMD_ID), then always call
aws ssm get-command-invocation with the same --command-id "$CMD_ID" and
--instance-id "$INSTANCE_ID" to print Status/Stdout/Stderr, and finally exit
with the saved waiter exit code so the job preserves the original failure
status.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/cd-staging-ec2.yml:
- Line 38: Replace the docker-compose flag --abort-on-container-exit with
--exit-code-from prestart in the long commands parameter so the prestart
service's exit code is propagated and will cause the SSM command to fail; update
the commands string that runs docker compose (the parameter containing "docker
compose --profile prestart up prestart --build ...") to use "--exit-code-from
prestart" (which implies abort-on-container-exit) so failures in the prestart
service prevent the subsequent "docker compose up -d" step from running while
keeping the surrounding "set -eux" behavior.
- Around line 3-6: The workflow is checking out origin main instead of the
commit that triggered the run; update the checkout step (where actions/checkout
is used) to use the triggering commit instead of hardcoding main — e.g., replace
any ref: main with ref: ${{ github.sha }} (or remove ref to let the action
default to the event commit) and ensure fetch-depth/refs are set appropriately
(fetch-depth: 0 if you need full history) so the deployed revision matches the
triggering push on feat/staging-deployment-ec2.

---

Nitpick comments:
In @.github/workflows/cd-staging-ec2.yml:
- Around line 44-56: The step currently runs aws ssm wait command-executed and
then aws ssm get-command-invocation, but if the waiter exits non-zero the step
can stop before printing Stdout/Stderr; update the run block to capture the
waiter's exit status (e.g., save $? into a variable after running aws ssm wait
command-executed using the env vars INSTANCE_ID and CMD_ID), then always call
aws ssm get-command-invocation with the same --command-id "$CMD_ID" and
--instance-id "$INSTANCE_ID" to print Status/Stdout/Stderr, and finally exit
with the saved waiter exit code so the job preserves the original failure
status.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 47e35146-1215-4326-bf15-ffdfec3b6967

📥 Commits

Reviewing files that changed from the base of the PR and between d512861 and 3835163.

📒 Files selected for processing (1)
  • .github/workflows/cd-staging-ec2.yml

Comment thread .github/workflows/cd-staging-ec2.yml
Comment thread .github/workflows/cd-staging-ec2.yml Outdated
--instance-ids "$INSTANCE_ID" \
--document-name "AWS-RunShellScript" \
--comment "Deploy kaapi-backend @ ${GIT_SHA::7}" \
--parameters commands='["set -eux","sudo -iu ubuntu bash -lc \"cd /data/kaapi-backend && git fetch --all && git pull origin main && docker compose --profile prestart up prestart --build --abort-on-container-exit && docker compose up -d --build --remove-orphans && docker image prune -f\""]' \


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n .github/workflows/cd-staging-ec2.yml | head -50

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 2142


🌐 Web query:

Docker compose up --exit-code-from documentation

💡 Result:

The --exit-code-from option in docker compose up returns the exit code of the selected service container, implying --abort-on-container-exit which stops all containers if the specified one stops. Usage: docker compose up --exit-code-from SERVICE [OPTIONS] [SERVICE...]. This is from the official Docker documentation for docker compose up. It is incompatible with --detach (-d). Without this flag, docker compose up exits with code 1 on errors or 0 on SIGINT/SIGTERM interruption, regardless of individual container exit codes.

Replace --abort-on-container-exit with --exit-code-from prestart to propagate prestart failures.

When prestart fails, the deploy gate should exit with its exit code. The --exit-code-from prestart flag returns the prestart service's exit code and implies --abort-on-container-exit, ensuring failed migrations or prestart scripts fail the SSM command before the new app starts. Combined with set -eux, this guarantees the subsequent docker compose up -d won't execute on prestart failures.

Proposed fix
-            --parameters commands='["set -eux","sudo -iu ubuntu bash -lc \"cd /data/kaapi-backend && git fetch --all && git pull origin main && docker compose --profile prestart up prestart --build --abort-on-container-exit && docker compose up -d --build --remove-orphans && docker image prune -f\""]' \
+            --parameters commands='["set -eux","sudo -iu ubuntu bash -lc \"cd /data/kaapi-backend && git fetch --all && git pull origin main && docker compose --profile prestart up prestart --build --exit-code-from prestart && docker compose up -d --build --remove-orphans && docker image prune -f\""]' \
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
--parameters commands='["set -eux","sudo -iu ubuntu bash -lc \"cd /data/kaapi-backend && git fetch --all && git pull origin main && docker compose --profile prestart up prestart --build --abort-on-container-exit && docker compose up -d --build --remove-orphans && docker image prune -f\""]' \
--parameters commands='["set -eux","sudo -iu ubuntu bash -lc \"cd /data/kaapi-backend && git fetch --all && git pull origin main && docker compose --profile prestart up prestart --build --exit-code-from prestart && docker compose up -d --build --remove-orphans && docker image prune -f\""]' \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/cd-staging-ec2.yml at line 38, Replace the docker-compose
flag --abort-on-container-exit with --exit-code-from prestart in the long
commands parameter so the prestart service's exit code is propagated and will
cause the SSM command to fail; update the commands string that runs docker
compose (the parameter containing "docker compose --profile prestart up prestart
--build ...") to use "--exit-code-from prestart" (which implies
abort-on-container-exit) so failures in the prestart service prevent the
subsequent "docker compose up -d" step from running while keeping the
surrounding "set -eux" behavior.

@codecov

codecov Bot commented Apr 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@Ayush8923 Ayush8923 temporarily deployed to AWS_STAGING_ENV_VARS April 20, 2026 10:12 — with GitHub Actions Inactive
@Ayush8923 Ayush8923 temporarily deployed to AWS_STAGING_ENV_VARS April 20, 2026 11:08 — with GitHub Actions Inactive
@Ayush8923 Ayush8923 temporarily deployed to AWS_STAGING_ENV_VARS April 20, 2026 11:12 — with GitHub Actions Inactive
@Ayush8923 Ayush8923 temporarily deployed to AWS_STAGING_ENV_VARS April 20, 2026 16:08 — with GitHub Actions Inactive
@Ayush8923 Ayush8923 temporarily deployed to AWS_STAGING_ENV_VARS April 20, 2026 16:13 — with GitHub Actions Inactive
@Ayush8923 Ayush8923 temporarily deployed to AWS_STAGING_ENV_VARS April 20, 2026 16:29 — with GitHub Actions Inactive
@Ayush8923 Ayush8923 temporarily deployed to AWS_STAGING_ENV_VARS April 21, 2026 03:19 — with GitHub Actions Inactive
Comment thread .github/workflows/cd-staging-ec2.yml Outdated
--instance-ids "$INSTANCE_ID" \
--document-name "AWS-RunShellScript" \
--comment "Deploy kaapi-backend kaapi-staging" \
--parameters commands='["set -eux","sudo -iu ubuntu bash -lc \"cd /data/kaapi-backend && git fetch --all && git pull origin main && docker compose build && docker compose run --rm backend uv run alembic upgrade head && docker compose up -d --remove-orphans && docker image prune -f\""]' \
Collaborator Author


Update this script only for backend, celery processes.

@Ayush8923 Ayush8923 temporarily deployed to AWS_STAGING_ENV_VARS April 27, 2026 08:37 — with GitHub Actions Inactive
@Ayush8923 Ayush8923 temporarily deployed to AWS_STAGING_ENV_VARS April 27, 2026 08:42 — with GitHub Actions Inactive

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (2)
.github/workflows/cd-production.yml (2)

66-72: Exit-code check is fragile when the task never started.

If the migration task fails before the container runs (image pull failure, ENI/networking issues, capacity), containers[?name==…].exitCode returns None. The current check correctly fails the job ("None" != "0"), but the diagnostic message is misleading and you lose the real reason. Consider also surfacing stopCode / stoppedReason and the container's reason for faster triage.

♻️ Proposed change
           EXIT_CODE=$(aws ecs describe-tasks --cluster $CLUSTER --tasks "$TASK_ARN" \
             --query "tasks[0].containers[?name=='$CONTAINER_NAME'].exitCode | [0]" --output text)

           if [ "$EXIT_CODE" != "0" ]; then
-            echo "Migration failed with exit code $EXIT_CODE"
+            echo "Migration failed with exit code $EXIT_CODE"
+            aws ecs describe-tasks --cluster $CLUSTER --tasks "$TASK_ARN" \
+              --query "tasks[0].{stopCode:stopCode,stoppedReason:stoppedReason,containerReason:containers[0].reason}" \
+              --output table
             exit 1
           fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/cd-production.yml around lines 66 - 72, The current
EXIT_CODE check treats a missing exitCode ("None") the same as a non-zero exit,
losing real failure reasons; update the aws ecs describe-tasks call to also
fetch task.stopCode and task.stoppedReason and the container's reason (e.g.,
containers[?name=='$CONTAINER_NAME'].reason) alongside
containers[?name=='$CONTAINER_NAME'].exitCode, then change the logic that reads
EXIT_CODE to detect a "None"/empty exitCode and print a clear diagnostic
containing stopCode, stoppedReason and container reason (falling back to
exitCode when present) before failing; reference variables: EXIT_CODE, CLUSTER,
TASK_ARN, CONTAINER_NAME and fields stopCode, stoppedReason,
containers[].reason.

33-40: Production image is only tagged latest — release tag is discarded.

TAG: ${{ github.ref_name }} is defined but never used; only :latest is built and pushed. This loses traceability of which vX.Y.Z is actually running in production and makes rollbacks awkward (you cannot docker pull <repo>:v1.2.3). Also, the migration step on lines 53-61 reuses the existing task definition (which presumably points to :latest), so whether the new image is migrated depends entirely on the task def — pinning would make this deterministic.

♻️ Proposed change
         run: |
-          docker build -t $REGISTRY/$REPOSITORY:latest ./backend
-          docker push $REGISTRY/$REPOSITORY:latest
+          docker build -t $REGISTRY/$REPOSITORY:latest -t $REGISTRY/$REPOSITORY:$TAG ./backend
+          docker push $REGISTRY/$REPOSITORY:latest
+          docker push $REGISTRY/$REPOSITORY:$TAG
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/cd-production.yml around lines 33 - 40, The workflow
currently builds and pushes only :latest; update the docker build/push commands
to also tag and push the release tag (use the existing TAG variable) so the
image is published as $REGISTRY/$REPOSITORY:$TAG (and optionally also :latest),
i.e. run docker build -t $REGISTRY/$REPOSITORY:$TAG ... and docker push
$REGISTRY/$REPOSITORY:$TAG (plus push :latest if wanted). Then change the
migration/deployment steps that reference the task definition to pin the image
to $REGISTRY/$REPOSITORY:$TAG (update the task definition JSON or the ECS update
step to set the container image to the $TAGged image) so migrations and service
updates use the exact release image instead of an implicit :latest.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/cd-production.yml:
- Around line 53-61: The run-task invocation passes --network-configuration
twice, which causes the first derived NETWORK_CONFIG to be ignored; fix by using
a single network configuration source: either remove the inline
awsvpcConfiguration JSON and reuse the existing $NETWORK_CONFIG (and if needed
add assignPublicIp=ENABLED into how $NETWORK_CONFIG is built), or drop
$NETWORK_CONFIG and keep the inline awsvpcConfiguration but validate that
secrets.SUBNET_IDS and secrets.SECURITY_GROUP_IDS exist; update the aws ecs
run-task command (TASK_ARN assignment) to include only one
--network-configuration and ensure CONTAINER_NAME and the chosen network config
are consistent.
- Around line 42-46: The workflow's migration step is pointing at staging
resources (env vars CLUSTER, SERVICE, TASK_DEF) so `alembic upgrade head` will
run against staging; update these env vars in cd-production.yml to reference the
correct production cluster/service/task used by the Deploy to ECS step (replace
the `-staging-` suffix with the production resource names or the exact task
family name used in the Deploy to ECS step) so migrations run against the
production DB before deployment.
- Around line 42-74: The "Run database migrations" step uses staging resource
names; change the env values CLUSTER, SERVICE, TASK_DEF in that step from `${{
vars.AWS_RESOURCE_PREFIX }}-staging-cluster`, `-staging-service`,
`-staging-task` to the production names `${{ vars.AWS_RESOURCE_PREFIX
}}-cluster`, `${{ vars.AWS_RESOURCE_PREFIX }}-service`, `${{
vars.AWS_RESOURCE_PREFIX }}-task`; after updating, verify that
secrets.SUBNET_IDS and secrets.SECURITY_GROUP_IDS are present and
comma-separated for production, ensure the production TASK_DEF's container image
tag matches the image pushed by the build (or is pinned, e.g., :latest as
intended), and confirm the IAM role
`arn:aws:iam::024209611402:role/github-action-role` has ecs:RunTask,
ecs:DescribeTasks, ecs:DescribeServices, ecs:DescribeTaskDefinition and
iam:PassRole permissions.

---

Nitpick comments:
In @.github/workflows/cd-production.yml:
- Around line 66-72: The current EXIT_CODE check treats a missing exitCode
("None") the same as a non-zero exit, losing real failure reasons; update the
aws ecs describe-tasks call to also fetch task.stopCode and task.stoppedReason
and the container's reason (e.g., containers[?name=='$CONTAINER_NAME'].reason)
alongside containers[?name=='$CONTAINER_NAME'].exitCode, then change the logic
that reads EXIT_CODE to detect a "None"/empty exitCode and print a clear
diagnostic containing stopCode, stoppedReason and container reason (falling back
to exitCode when present) before failing; reference variables: EXIT_CODE,
CLUSTER, TASK_ARN, CONTAINER_NAME and fields stopCode, stoppedReason,
containers[].reason.
- Around line 33-40: The workflow currently builds and pushes only :latest;
update the docker build/push commands to also tag and push the release tag (use
the existing TAG variable) so the image is published as
$REGISTRY/$REPOSITORY:$TAG (and optionally also :latest), i.e. run docker build
-t $REGISTRY/$REPOSITORY:$TAG ... and docker push $REGISTRY/$REPOSITORY:$TAG
(plus push :latest if wanted). Then change the migration/deployment steps that
reference the task definition to pin the image to $REGISTRY/$REPOSITORY:$TAG
(update the task definition JSON or the ECS update step to set the container
image to the $TAGged image) so migrations and service updates use the exact
release image instead of an implicit :latest.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 44088606-9df7-491b-a4f7-889a87c659ec

📥 Commits

Reviewing files that changed from the base of the PR and between d302669 and 585ee53.

📒 Files selected for processing (1)
  • .github/workflows/cd-production.yml

Comment thread .github/workflows/cd-production.yml Outdated
Comment thread .github/workflows/cd-production.yml
Comment thread .github/workflows/cd-production.yml
@Ayush8923 Ayush8923 temporarily deployed to AWS_STAGING_ENV April 27, 2026 09:03 — with GitHub Actions Inactive
@Ayush8923 Ayush8923 temporarily deployed to AWS_STAGING_ENV April 27, 2026 09:13 — with GitHub Actions Inactive

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
.github/workflows/cd-production.yml (1)

52-52: Hard-coding containerDefinitions[0] is brittle if a sidecar is ever added.

Picking the first container by index works today but silently breaks the moment a log-router/sidecar (FireLens, Datadog agent, etc.) is added to the task definition — run-task would then issue alembic upgrade head to the wrong container and the migration would either fail or, worse, succeed as a no-op. Consider matching by a known name (e.g. backend) or by essential: true:

-          CONTAINER_NAME=$(aws ecs describe-task-definition --task-definition $TASK_DEF --query 'taskDefinition.containerDefinitions[0].name' --output text)
+          CONTAINER_NAME=$(aws ecs describe-task-definition --task-definition $TASK_DEF \
+            --query "taskDefinition.containerDefinitions[?name=='backend'].name | [0]" --output text)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/cd-production.yml at line 52, The current CONTAINER_NAME
extraction uses taskDefinition.containerDefinitions[0].name which breaks if a
sidecar is added; update the aws ecs describe-task-definition --query to select
the intended container by a stable identifier (e.g. match name == 'backend' or
filter essential==`true`) instead of index 0 so run-task executes alembic
against the real app container (use a JMESPath filter such as
taskDefinition.containerDefinitions[?name=='backend'].name | [0] or
taskDefinition.containerDefinitions[?essential==`true`].name | [0] when setting
CONTAINER_NAME).
.github/workflows/cd-staging-ec2.yml (1)

53-57: StandardOutputContent/StandardErrorContent are truncated; surface the CloudWatch log location too.

get-command-invocation truncates StandardOutputContent/StandardErrorContent to ~2500 chars, which is easy to overflow with a docker compose build. Since you've already enabled --cloud-watch-output-config CloudWatchOutputEnabled=true on line 38, also echo the log group/stream so failures are debuggable from the workflow run without round-tripping to the AWS console:

           aws ssm get-command-invocation \
             --command-id "$CMD_ID" \
             --instance-id "$INSTANCE_ID" \
             --query '{Status:Status,Stdout:StandardOutputContent,Stderr:StandardErrorContent}' \
             --output json
+
+          echo "Full output: CloudWatch log group /aws/ssm/AWS-RunShellScript, stream ${CMD_ID}/${INSTANCE_ID}/aws-runShellScript/stdout (and .../stderr)"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/cd-staging-ec2.yml around lines 53 - 57,
get-command-invocation currently only prints truncated
StandardOutputContent/StandardErrorContent; update the aws ssm
get-command-invocation call to also return and echo the CloudWatch log location
so failures are debuggable without visiting the console (you already enable
--cloud-watch-output-config CloudWatchOutputEnabled=true). Modify the --query
used in the get-command-invocation invocation to include the CloudWatch
output/log group and stream fields (from the get-command-invocation response
object such as the CloudWatch output config/log identifiers) so the workflow
prints the CloudWatch log group/stream (or URL) alongside Status/Stdout/Stderr.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/cd-staging-ec2.yml:
- Around line 47-59: Replace the unreliable aws ssm waiter with a manual polling
loop that repeatedly calls aws ssm get-command-invocation using the existing
CMD_ID and INSTANCE_ID until the invocation Status is one of the terminal values
(Success, Failed, Cancelled, TimedOut); implement a sleep between polls (e.g.,
5s) and enforce a maximum timeout equal to the SSM document executionTimeout
(3600s) to avoid infinite waiting, then exit with code 0 on Success and non‑zero
on Failed/Cancelled/TimedOut (propagate that exit code) instead of using the
WAIT_EXIT waiter approach.

---

Nitpick comments:
In @.github/workflows/cd-production.yml:
- Line 52: The current CONTAINER_NAME extraction uses
taskDefinition.containerDefinitions[0].name which breaks if a sidecar is added;
update the aws ecs describe-task-definition --query to select the intended
container by a stable identifier (e.g. match name == 'backend' or filter
essential==`true`) instead of index 0 so run-task executes alembic against the
real app container (use a JMESPath filter such as
taskDefinition.containerDefinitions[?name=='backend'].name | [0] or
taskDefinition.containerDefinitions[?essential==`true`].name | [0] when setting
CONTAINER_NAME).

In @.github/workflows/cd-staging-ec2.yml:
- Around line 53-57: get-command-invocation currently only prints truncated
StandardOutputContent/StandardErrorContent; update the aws ssm
get-command-invocation call to also return and echo the CloudWatch log location
so failures are debuggable without visiting the console (you already enable
--cloud-watch-output-config CloudWatchOutputEnabled=true). Modify the --query
used in the get-command-invocation invocation to include the CloudWatch
output/log group and stream fields (from the get-command-invocation response
object such as the CloudWatch output config/log identifiers) so the workflow
prints the CloudWatch log group/stream (or URL) alongside Status/Stdout/Stderr.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 815aae9e-5812-4105-90c9-cb8d941d4110

📥 Commits

Reviewing files that changed from the base of the PR and between 585ee53 and 38a18e9.

📒 Files selected for processing (3)
  • .github/workflows/cd-production.yml
  • .github/workflows/cd-staging-ec2.yml
  • docker-compose.staging.yml
✅ Files skipped from review due to trivial changes (1)
  • docker-compose.staging.yml

Comment thread .github/workflows/cd-staging-ec2.yml
@Ayush8923 Ayush8923 requested review from AkhileshNegi and vprashrex and removed request for AkhileshNegi April 28, 2026 04:45
@Ayush8923 Ayush8923 removed the request for review from vprashrex April 28, 2026 04:46
@Ayush8923 Ayush8923 temporarily deployed to AWS_STAGING_ENV April 28, 2026 05:51 — with GitHub Actions Inactive
@Ayush8923 Ayush8923 linked an issue Apr 28, 2026 that may be closed by this pull request
@Ayush8923 Ayush8923 merged commit ce5346a into main Apr 28, 2026
3 checks passed
@Ayush8923 Ayush8923 deleted the feat/staging-deployment-ec2 branch April 28, 2026 11:19

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Deployment: Automate CD workflow

2 participants