Automated load testing framework for audit checks with Claude Code assistance.
- Access to k-repo repository
- S3 access to klaviyo-data-platform-orchestration-v1
- Airflow access for manual DAG triggers
- PyCharm (for reviewing config diffs)
cd ~/Klaviyo/Repos/audit-load-test
# Configure target repo path
cp .env.example .env
# Edit .env to set TARGET_REPO path
# Ensure k-repo is on the test branch
cd ~/Klaviyo/Repos/k-repo
git checkout main
git pull
git checkout -b audit_checks_load_testing
# Login to AWS
s2a-login

# Enter the audit load test repo
cd ~/Klaviyo/Repos/audit-load-test

# Log in to Claude
claude

First-time setup (if Claude doesn't know the context):
"Read README.md, start 2x load test"
This single command tells Claude to:
- Read this README to understand the entire workflow
- Begin the load test process
Tell Claude:
Starting tests:
- "Read README.md, start 2x load test" - Start 2x for 15m (default) - USE THIS FIRST TIME
- "Start 4x load test for hourly" - Start 4x for the hourly schedule
- "Start 8x load test for daily" - Start 8x for the daily schedule
During test execution:
- "I am going to run another round of test" - Captures a timestamp for round tracking
- "I just ran the hourly DAG" - Captures a timestamp for the hourly schedule test
After test completes:
- "Test is done, generate report" - Same cluster (Claude remembers)
- "Test is done, cluster id is j-XXXXX, generate report" - Different cluster
- "Same cluster, generate report" - Alternative for the same cluster
Progressing to next load level:
- "Commit results, start 4x load test" - Move from 2x to 4x (any schedule)
- "Commit results, start 8x load test for hourly" - Move to 8x for the hourly schedule
Note:
- Load multipliers (2x, 4x, 8x) work for any schedule (15m, hourly, daily)
- The Load Test Workflow section below explains the full test procedure, including how the human and Claude collaborate
- Example of a Claude-generated checks file for a 4x load test: https://github.com/klaviyo/k-repo/pull/19659
cd ~/Klaviyo/Repos/audit-load-test
s2a-login
# Ensure k-repo is on correct branch
cd ~/Klaviyo/Repos/k-repo
git status  # Should be on audit_checks_load_testing branch

Claude checkpoint: Verify k-repo is on the audit_checks_load_testing branch
Input: Load multiplier (e.g., 2x, 4x, 8x)
Claude will:
- Read .env to find TARGET_REPO and CHECKS_PATH
- Read existing configs from ${TARGET_REPO}/${CHECKS_PATH}/
- Modify configs in place with the load multiplier
- Prompt human to review changes
Config Location:
${TARGET_REPO}/python/klaviyo/data_platform/table_config_service/unified_table_definitions/checks/iceberg/
Modification Rules:
| Check Type | File Pattern | Modification | Example (2x) |
|---|---|---|---|
| 15m | *_15m.yml | Multiply INTERVAL 'X' hour by the load factor | INTERVAL '1' hour → INTERVAL '2' hour; INTERVAL '3' hour → INTERVAL '6' hour |
| Hourly | *_hourly.yml | NO CHANGES | Skip - uses Airflow runtime params (ts_start, ts_end) |
| Daily | *_daily.yml | Multiply INTERVAL 'X' day by the load factor | INTERVAL '1' day → INTERVAL '2' day; INTERVAL '31' day → INTERVAL '62' day |
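The multiplication rule in the table can be illustrated with a small regex transform. This is only a sketch of the rule, not the tooling Claude actually uses; `scale_intervals` is a hypothetical helper:

```python
import re

def scale_intervals(yaml_text: str, factor: int) -> str:
    """Multiply INTERVAL 'X' hour/day literals by the load factor.

    Sketch of the modification rule only; the real edits are made by
    Claude directly in the YAML files. Hourly configs contain no such
    literals, so they pass through unchanged.
    """
    def repl(m: re.Match) -> str:
        return f"INTERVAL '{int(m.group(1)) * factor}' {m.group(2)}"

    return re.sub(r"INTERVAL '(\d+)' (hour|day)", repl, yaml_text)

before = "where: created_at >= current_timestamp - INTERVAL '3' hour"
print(scale_intervals(before, 2))
# where: created_at >= current_timestamp - INTERVAL '6' hour
```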
Example Modifications (2x load):
# 15m check - BEFORE
where: created_at >= current_timestamp - INTERVAL '3' hour and created_at <= current_timestamp
# 15m check - AFTER (2x)
where: created_at >= current_timestamp - INTERVAL '6' hour and created_at <= current_timestamp

# Daily check - BEFORE
where: matched_at < current_timestamp - INTERVAL '31' day
# Daily check - AFTER (2x)
where: matched_at < current_timestamp - INTERVAL '62' day

Claude output:
✓ Modified 47 config files for 2x load
- 23 files in *_15m.yml
- 0 files in *_hourly.yml (skipped - uses runtime params)
- 24 files in *_daily.yml
Changes location: ${TARGET_REPO}/python/klaviyo/.../checks/iceberg/
→ HUMAN: Open k-repo in PyCharm and review git diff
→ Verify all interval multiplications are correct
→ Reply "approved" to continue, or tell me about any issues to fix
HUMAN CHECKPOINT:
- Open k-repo in PyCharm
- Review git diff for modified files
- Verify interval changes are correct
- Reply "approved" or report issues
If the configs are approved, Claude will run:

./scripts/upload_checks.sh 2x

Script behavior:
- Reads TARGET_REPO and CHECKS_PATH from .env
- Uploads modified configs from k-repo to S3:
  - *_15m.yml, *_15min.yml, *_freshness.yml → s3://.../checks_load_test/configs/15m/2x/
  - *_daily.yml → s3://.../checks_load_test/configs/daily/2x/
  - *_hourly.yml, *_uniqueness.yml → s3://.../checks_load_test/configs/hourly/2x/
- Copies *_table_metrics.yml from prod only to the hourly folders (1x, 2x, 4x)
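The file-pattern routing above can be sketched as a pure function. `s3_prefix` is a hypothetical helper that mirrors upload_checks.sh's mapping; it is not part of the repo:

```python
def s3_prefix(filename: str, load: str,
              bucket: str = "klaviyo-data-platform-orchestration-v1") -> str:
    """Map a check-config filename to its load-test S3 prefix.

    Hypothetical helper mirroring upload_checks.sh's routing rules.
    """
    if filename.endswith(("_15m.yml", "_15min.yml", "_freshness.yml")):
        schedule = "15m"
    elif filename.endswith(("_hourly.yml", "_uniqueness.yml")):
        schedule = "hourly"
    elif filename.endswith("_daily.yml"):
        schedule = "daily"
    else:
        raise ValueError(f"unrecognized check config: {filename}")
    return f"s3://{bucket}/checks_load_test/configs/{schedule}/{load}/"

print(s3_prefix("check_events_15m.yml", "2x"))
# s3://klaviyo-data-platform-orchestration-v1/checks_load_test/configs/15m/2x/
```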
./scripts/upload_dags.sh

Script behavior:
- Checks the current LOAD variable in yang_audit_load_test.py
- Prompts for a LOAD value (1, 2, 4, 8, etc.)
- This is CRITICAL: LOAD must match your config upload level
- LOAD = 2 → reads from s3://.../configs/{schedule}/2x/
- If you just press Enter, it keeps the current value
- Updates LOAD variable in the DAG file if changed
- Checks if s3://.../dags/datalake/audit_loadtest/ exists
- If it exists, prompts "Folder exists. Overwrite? (y/n)"
- Uploads from ./dags/:
  - yang_audit_load_test.py (with the updated LOAD variable)
  - cluster_configs/yang_audit_load_test_cluster_15m.json
  - cluster_configs/yang_audit_load_test_cluster_hourly.json
  - cluster_configs/yang_audit_load_test_cluster_daily.json
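The LOAD rewrite can be sketched as a one-line substitution. `set_load` is a hypothetical helper, assuming the DAG defines LOAD as a simple module-level assignment:

```python
import re

def set_load(dag_source: str, load: int) -> str:
    """Rewrite the module-level LOAD variable in the DAG source.

    Sketch of what upload_dags.sh does before uploading; assumes a
    plain `LOAD = <n>` assignment at the start of a line.
    """
    return re.sub(r"^LOAD\s*=\s*\d+", f"LOAD = {load}",
                  dag_source, count=1, flags=re.M)

src = "LOAD = 1\nSCHEDULE = '15m'\n"
print(set_load(src, 2))
# LOAD = 2
# SCHEDULE = '15m'
```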
HUMAN CHECKPOINT: Confirm both uploads succeeded
Actions:
- Open Airflow UI
- Find the DAG: yang_audit_load_test_15m (or _hourly, _daily)
- Trigger the DAG manually
- Wait for test to complete
After test completes, tell Claude:
Test is done
Or if using the same cluster from previous conversation:
Same cluster, generate report
Or provide a different cluster ID:
Cluster ID: j-XXXXXXXXXXXXX, generate report
Note: You don't need to tell Claude the load level - Claude already knows it from the previous steps.
Claude will:
- Monitor EMR cluster status using AWS CLI
- Poll until all steps complete
- Fetch step execution metadata:
- Step name
- Start timestamp
- End timestamp
- Elapsed time
- Status (SUCCESS/FAILED)
- Generate performance report
Report includes:
- Summary statistics (total steps, success/failure counts, total runtime)
- Top 20 slowest steps sorted by elapsed time
- Performance metrics (average, median, P95, P99)
- Failed steps with error details
- Optimization suggestions
Report saved to: ./reports/{load_level}_{timestamp}/report.md
If you need to run multiple test rounds on the same cluster, use timestamp filtering:
Before triggering new test round:
# Get current timestamp to mark the start of this round
./scripts/generate_report.py --current-timestamp
# Output: 2025-12-08T15:30:00.123456-05:00

Important: The timestamp includes the timezone offset (e.g., -05:00 for EST in December, -04:00 for EDT in summer). The script automatically uses your system's current timezone to match AWS EMR timestamps.
After test completes:
# Generate report only for steps created after the timestamp
./scripts/generate_report.py \
--cluster-id j-12LC3PZVYNXDJ \
--load 2x \
--after-timestamp "2025-12-08T15:30:00.123456-05:00"

Note: Use the complete timestamp with timezone exactly as output by --current-timestamp. This ensures the filtering matches AWS EMR's timestamp format.
This approach filters steps by creation time, ensuring you only see results from the current test round.
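The filtering boils down to comparing timezone-aware datetimes. A minimal sketch with made-up step records (the field names are assumptions, not the script's actual schema):

```python
from datetime import datetime, timezone, timedelta

# Hypothetical step creation timestamps (timezone-aware, like EMR's).
est = timezone(timedelta(hours=-5))
steps = [
    {"name": "round1_check", "created": datetime(2025, 12, 8, 14, 50, tzinfo=est)},
    {"name": "round2_check", "created": datetime(2025, 12, 8, 15, 45, tzinfo=est)},
]

# Marker captured before triggering the new round
# (the --current-timestamp output, offset included).
marker = datetime.fromisoformat("2025-12-08T15:30:00.123456-05:00")

current_round = [s["name"] for s in steps if s["created"] > marker]
print(current_round)  # ['round2_check']
```

Because both sides carry an offset, the comparison is unambiguous regardless of the machine's local timezone.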
Example Report:
# Load Test Report - 2x Load
Date: 2025-12-07 15:30:00
Cluster: j-2ABCDEFG12345
## Summary
- Total Steps: 150
- Successful: 148
- Failed: 2
- Total Runtime: 2h 34m 18s
## Top 20 Slowest Steps
| Rank | Step Name | Elapsed | Status | Start Time |
|------|-----------|---------|--------|------------|
| 1 | check_events_daily_2x | 45m 23s | SUCCESS | 15:30:05 |
| 2 | check_profiles_hourly_2x | 38m 12s | SUCCESS | 15:31:22 |
...
## Performance Metrics
- Average: 5m 12s
- Median (P50): 3m 45s
- P95: 32m 45s
- P99: 42m 18s
## Optimization Suggestions
1. check_events_daily (45m) - Consider increasing executor memory
2. check_profiles_hourly (38m) - Review join strategy
...

HUMAN CHECKPOINT: Review report → decide next action
Human: "The config for check_events_15m has wrong interval, should be 4 hours not 6"
Claude: I'll fix that issue.
Updating: ${TARGET_REPO}/.../check_events_15m.yml
Change: INTERVAL '6' hour → INTERVAL '4' hour
✓ Fixed
→ Review the change and reply "approved" to continue
Human: "Performance is slow. I'll update cluster config and rerun 2x"
Claude: Understood. Here's the iteration workflow:
1. Keep check configs as-is (already uploaded to S3 2x folder)
2. Update cluster configs in: ./dags/cluster_configs/*.json
3. Run: ./scripts/upload_dags.sh (re-upload with new cluster configs)
4. Trigger DAG again in Airflow (uses existing 2x check configs from S3)
5. Provide new cluster ID to Claude for monitoring
→ Ready to update cluster configs now?
Human: "2x test complete. Commit results and start 4x"
Claude: Recording 2x test results:
In k-repo (commit modified configs):
cd ${TARGET_REPO}
git add ./python/klaviyo/.../checks/
git commit -m "Load test 2x configs - 2025-12-07"
In audit-load-test repo (commit report):
cd ~/Klaviyo/Repos/audit-load-test
git add ./reports/2x_*
git commit -m "Load test 2x results - [key findings]"
✓ Ready for 4x load test
→ Proceed to generate 4x configs?
Human: "Revert configs to baseline"
Claude: Reverting configs in k-repo:
cd ${TARGET_REPO}
git checkout ./python/klaviyo/.../checks/iceberg/
✓ All configs reverted to baseline
→ Ready for new load test or other operations
s3://klaviyo-data-platform-orchestration-v1/
├── checks_load_test/
│ ├── configs/ # Check configs by load level
│ │ ├── 15m/
│ │ │ ├── 1x/ # Baseline
│ │ │ ├── 2x/
│ │ │ └── 4x/
│ │ ├── hourly/
│ │ │ ├── 1x/
│ │ │ ├── 2x/
│ │ │ └── 4x/ # ← includes *_table_metrics.yml
│ │ └── daily/
│ │ ├── 1x/
│ │ ├── 2x/
│ │ └── 4x/
│ └── results/ # EMR outputs (not used for timing)
│ └── ...
│
└── dags/datalake/audit_loadtest/ # DAG and cluster configs
├── yang_audit_load_test.py
├── cluster_configs/
│ ├── yang_audit_load_test_cluster_15m.json
│ ├── yang_audit_load_test_cluster_hourly.json
│ └── yang_audit_load_test_cluster_daily.json
audit-load-test/
├── README.md # This file
├── .env # Local config (not committed)
├── .env.example # Template
├── .gitignore
├── scripts/
│ ├── upload_checks.sh # Upload check configs to S3
│ └── upload_dags.sh # Upload DAG + cluster configs to S3
├── dags/
│ ├── yang_audit_load_test.py # Airflow DAG
│ └── cluster_configs/
│ ├── yang_audit_load_test_cluster_15m.json
│ ├── yang_audit_load_test_cluster_hourly.json
│ └── yang_audit_load_test_cluster_daily.json
└── reports/ # Generated reports
├── 2x_20251207_153000/
│ └── report.md
└── 4x_20251207_183000/
└── report.md
# Target repository path
TARGET_REPO=~/Klaviyo/Repos/k-repo
# Path to checks configs within target repo
CHECKS_PATH=python/klaviyo/data_platform/table_config_service/unified_table_definitions/checks/iceberg
# AWS S3 bucket
S3_BUCKET=klaviyo-data-platform-orchestration-v1
# Source for table_metrics configs
PROD_HOURLY_CHECKS=s3://klaviyo-data-platform-orchestration-v1/checks/configs/prod/hourly

# Make scripts executable
chmod +x ./scripts/*.sh
# Refresh AWS credentials
s2a-login

# Check current branch
cd ${TARGET_REPO}
git status
# Switch to test branch
git checkout audit_checks_load_testing
git pull

# Test AWS CLI access
aws s3 ls s3://klaviyo-data-platform-orchestration-v1/
# Check credentials
aws sts get-caller-identity

# Verify cluster ID format
aws emr list-clusters --active
# Check region
aws configure get region

Problem: Claude modified wrong intervals
Solution: Tell Claude the specific issue:
"Fix check_events_15m.yml: change INTERVAL '6' hour back to INTERVAL '4' hour"
- Always review git diff before uploading configs
- Commit after each successful load test for record keeping
- Use descriptive commit messages with key findings
- Don't merge test branches - keep them separate for load testing only
- Update cluster configs iteratively based on performance reports
- Use PyCharm for better diff visualization
- Keep reports in version control for historical comparison
- Test incrementally - start with 2x before jumping to 4x or higher
Q: Can I run multiple load levels in parallel?
A: No, run one load level at a time to get accurate performance measurements.
Q: Do I need to upload DAGs every time?
A: Only when cluster configs change. Check configs can be uploaded independently.
Q: What if I want to test 3x or 5x load?
A: Just tell Claude "start 3x load test" - it works with any multiplier.
Q: Can I revert configs without Claude?
A: Yes, just run git checkout . in k-repo to revert all changes.
Q: Where can I find audit test results?
A: Check CloudWatch Logs or S3 at s3://.../checks_load_test/results/
Q: How do I compare two load test results?
A: Ask Claude: "Compare reports from 2x and 4x load tests"
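A comparison like this can be sketched by taking ratios of the summary metrics from the two reports. The numbers below are illustrative, not real results:

```python
# Hypothetical summary metrics pulled from two generated reports
# (seconds; fabricated for illustration).
metrics_2x = {"avg_s": 312, "p95_s": 1965, "p99_s": 2538}
metrics_4x = {"avg_s": 540, "p95_s": 3120, "p99_s": 4410}

# Ratio > 2.0 for a 2x→4x jump would suggest worse-than-linear scaling.
comparison = {k: round(metrics_4x[k] / metrics_2x[k], 2) for k in metrics_2x}
print(comparison)
# {'avg_s': 1.73, 'p95_s': 1.59, 'p99_s': 1.74}
```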
- v1.0 (2025-12-07) - Initial framework with Claude Code integration