Yang-Klaviyo/audit-load-test
Audit Checks Load Testing

Automated load testing framework for audit checks with Claude Code assistance.

Prerequisites

  • Access to k-repo repository
  • S3 access to klaviyo-data-platform-orchestration-v1
  • Airflow access for manual DAG triggers
  • PyCharm (for reviewing config diffs)

Setup (One-Time)

cd ~/Klaviyo/Repos/audit-load-test

# Configure target repo path
cp .env.example .env
# Edit .env to set TARGET_REPO path

# Ensure k-repo is on the test branch
cd ~/Klaviyo/Repos/k-repo
git checkout main
git pull
git checkout -b audit_checks_load_testing


# Login to AWS
s2a-login

Quick Start

Enter the audit load test repo:

cd ~/Klaviyo/Repos/audit-load-test

Launch Claude Code:

claude

First-time setup (if Claude doesn't know the context):

"Read README.md, start 2x load test"

This single command tells Claude to:

  1. Read this README to understand the entire workflow
  2. Begin the load test process

Tell Claude:

Starting tests:

  • "Read README.md, start 2x load test" - Start 2x for 15m (default) - USE THIS FIRST TIME
  • "Start 4x load test for hourly" - Start 4x for hourly schedule
  • "Start 8x load test for daily" - Start 8x for daily schedule

During test execution:

  • "I am going to run another round of tests" - Captures a timestamp for round tracking
  • "I just ran the hourly DAG" - Captures a timestamp for the hourly schedule test

After test completes:

  • "Test is done, generate report" - Same cluster (Claude remembers)
  • "Test is done, cluster id is j-XXXXX, generate report" - Different cluster
  • "Same cluster, generate report" - Alternative for same cluster

Progressing to next load level:

  • "Commit results, start 4x load test" - Move from 2x to 4x (any schedule)
  • "Commit results, start 8x load test for hourly" - Move to 8x for hourly

Note:

  1. Load multipliers (2x, 4x, 8x) work for any schedule (15m, hourly, daily)
  2. The Load Test Workflow section below explains the full test procedure, including how the human and Claude collaborate
  3. Example of a Claude-generated checks file for a 4x load test: https://github.com/klaviyo/k-repo/pull/19659

Load Test Workflow

Phase 1: Preparation (HUMAN)

cd ~/Klaviyo/Repos/audit-load-test
s2a-login

# Ensure k-repo is on correct branch
cd ~/Klaviyo/Repos/k-repo
git status  # Should be on audit_checks_load_testing branch

Claude checkpoint: Verify k-repo is on audit_checks_load_testing branch


Phase 2: Generate Load Test Configs (CLAUDE)

Input: Load multiplier (e.g., 2x, 4x, 8x)

Claude will:

  1. Read .env to find TARGET_REPO and CHECKS_PATH
  2. Read existing configs from: ${TARGET_REPO}/${CHECKS_PATH}/
  3. Modify configs in-place with load multiplier
  4. Prompt human to review changes

Config Location:

${TARGET_REPO}/python/klaviyo/data_platform/table_config_service/
unified_table_definitions/checks/iceberg/

Modification Rules:

| Check Type | File Pattern | Modification | Example (2x) |
|------------|--------------|--------------|--------------|
| 15m | *_15m.yml | Multiply INTERVAL 'X' hour by load factor | INTERVAL '1' hour → INTERVAL '2' hour; INTERVAL '3' hour → INTERVAL '6' hour |
| Hourly | *_hourly.yml | NO CHANGES | Skip - uses Airflow runtime params (ts_start, ts_end) |
| Daily | *_daily.yml | Multiply INTERVAL 'X' day by load factor | INTERVAL '1' day → INTERVAL '2' day; INTERVAL '31' day → INTERVAL '62' day |

Example Modifications (2x load):

# 15m check - BEFORE
where: created_at >= current_timestamp - INTERVAL '3' hour and created_at <= current_timestamp

# 15m check - AFTER (2x)
where: created_at >= current_timestamp - INTERVAL '6' hour and created_at <= current_timestamp

# Daily check - BEFORE
where: matched_at < current_timestamp - INTERVAL '31' day

# Daily check - AFTER (2x)
where: matched_at < current_timestamp - INTERVAL '62' day
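The interval-scaling rule above can be sketched as a small regex rewrite. This is a hypothetical helper for illustration, not the actual implementation Claude uses:

```python
import re

def scale_intervals(text: str, factor: int, unit: str) -> str:
    """Multiply every INTERVAL 'X' <unit> literal in a config line by the load factor.

    `unit` is "hour" for *_15m.yml checks and "day" for *_daily.yml checks;
    hourly checks are skipped entirely because they use Airflow runtime params.
    """
    pattern = re.compile(r"INTERVAL '(\d+)' " + unit)
    return pattern.sub(lambda m: f"INTERVAL '{int(m.group(1)) * factor}' {unit}", text)

# 2x load on a 15m check predicate
before = "where: created_at >= current_timestamp - INTERVAL '3' hour and created_at <= current_timestamp"
print(scale_intervals(before, 2, "hour"))
# where: created_at >= current_timestamp - INTERVAL '6' hour and created_at <= current_timestamp
```

Because only the numeric literal changes, the resulting git diff stays easy to review in PyCharm.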

Claude output:

✓ Modified 47 config files for 2x load
  - 23 files in *_15m.yml
  - 0 files in *_hourly.yml (skipped - uses runtime params)
  - 24 files in *_daily.yml

Changes location: ${TARGET_REPO}/python/klaviyo/.../checks/iceberg/

→ HUMAN: Open k-repo in PyCharm and review git diff
→ Verify all interval multiplications are correct
→ Reply "approved" to continue, or tell me about any issues to fix

HUMAN CHECKPOINT:

  • Open k-repo in PyCharm
  • Review git diff for modified files
  • Verify interval changes are correct
  • Reply "approved" or report issues

Phase 3: Upload Configs to S3 (CLAUDE)

If configs approved, Claude will:

3a. Upload Check Configs

./scripts/upload_checks.sh 2x

Script behavior:

  1. Reads TARGET_REPO and CHECKS_PATH from .env
  2. Uploads modified configs from k-repo to S3:
    • *_15m.yml, *_15min.yml, *_freshness.yml → s3://.../checks_load_test/configs/15m/2x/
    • *_daily.yml → s3://.../checks_load_test/configs/daily/2x/
    • *_hourly.yml, *_uniqueness.yml → s3://.../checks_load_test/configs/hourly/2x/
  3. Copies *_table_metrics.yml from prod only to hourly folders (1x, 2x, 4x)
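The suffix-to-folder routing that upload_checks.sh performs can be sketched as follows. This is a hypothetical Python rendering of the shell script's logic, shown only to make the mapping explicit:

```python
from typing import Optional

def s3_prefix_for(filename: str, load: str,
                  bucket: str = "klaviyo-data-platform-orchestration-v1") -> Optional[str]:
    """Map a check-config filename to its S3 destination folder for a load level."""
    base = f"s3://{bucket}/checks_load_test/configs"
    if filename.endswith(("_15m.yml", "_15min.yml", "_freshness.yml")):
        return f"{base}/15m/{load}/"
    if filename.endswith("_daily.yml"):
        return f"{base}/daily/{load}/"
    if filename.endswith(("_hourly.yml", "_uniqueness.yml")):
        return f"{base}/hourly/{load}/"
    # *_table_metrics.yml is copied from prod separately, not routed here
    return None

print(s3_prefix_for("check_events_daily.yml", "2x"))
# s3://klaviyo-data-platform-orchestration-v1/checks_load_test/configs/daily/2x/
```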

3b. Upload DAG and Cluster Configs

./scripts/upload_dags.sh

Script behavior:

  1. Checks current LOAD variable in yang_audit_load_test.py
  2. Prompts for LOAD value (1, 2, 4, 8, etc.)
    • This is CRITICAL: LOAD must match your config upload level
    • LOAD = 2 → reads from s3://.../configs/{schedule}/2x/
    • If you just press Enter, it keeps the current value
  3. Updates LOAD variable in the DAG file if changed
  4. Checks if s3://.../dags/datalake/audit_loadtest/ exists
  5. If exists: Prompts "Folder exists. Overwrite? (y/n)"
  6. Uploads from ./dags/:
    • yang_audit_load_test.py (with updated LOAD variable)
    • cluster_configs/yang_audit_load_test_cluster_15m.json
    • cluster_configs/yang_audit_load_test_cluster_hourly.json
    • cluster_configs/yang_audit_load_test_cluster_daily.json

HUMAN CHECKPOINT: Confirm both uploads succeeded


Phase 4: Trigger Load Test (HUMAN)

Actions:

  1. Open Airflow UI
  2. Find DAG: yang_audit_load_test_15m (or _hourly, _daily)
  3. Trigger the DAG manually
  4. Wait for test to complete

After test completes, tell Claude:

Test is done

Or if using the same cluster from previous conversation:

Same cluster, generate report

Or provide a different cluster ID:

Cluster ID: j-XXXXXXXXXXXXX, generate report

Note: You don't need to tell Claude the load level - Claude already knows it from the previous steps.


Phase 5: Monitor & Generate Report (CLAUDE)

Claude will:

  1. Monitor EMR cluster status using AWS CLI
  2. Poll until all steps complete
  3. Fetch step execution metadata:
    • Step name
    • Start timestamp
    • End timestamp
    • Elapsed time
    • Status (SUCCESS/FAILED)
  4. Generate performance report

Report includes:

  • Summary statistics (total steps, success/failure counts, total runtime)
  • Top 20 slowest steps sorted by elapsed time
  • Performance metrics (average, median, P95, P99)
  • Failed steps with error details
  • Optimization suggestions

Report saved to: ./reports/{load_level}_{timestamp}/report.md
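The performance metrics in the report can be sketched as below, assuming the per-step elapsed times have already been fetched (e.g. via boto3's EMR list_steps, whose Timeline carries StartDateTime/EndDateTime per step). The percentile helper is a hypothetical nearest-rank implementation, not necessarily what generate_report.py does:

```python
from statistics import median

def percentile(values: list, p: float) -> float:
    """Nearest-rank percentile over elapsed step durations (seconds)."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

# Elapsed seconds for a handful of steps (made-up sample data)
elapsed = [312, 225, 45 * 60 + 23, 38 * 60 + 12, 180, 95, 410]
print(f"Average: {sum(elapsed) / len(elapsed):.0f}s")   # Average: 891s
print(f"Median (P50): {median(elapsed):.0f}s")          # Median (P50): 312s
print(f"P95: {percentile(elapsed, 95):.0f}s")           # P95: 2723s
```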

Multiple Test Rounds on Same Cluster

If you need to run multiple test rounds on the same cluster, use timestamp filtering:

Before triggering new test round:

# Get current timestamp to mark the start of this round
./scripts/generate_report.py --current-timestamp
# Output: 2025-12-08T15:30:00.123456-05:00

Important: The timestamp includes timezone (e.g., -05:00 for EST in December, -04:00 for EDT in summer). The script automatically uses your system's current timezone to match AWS EMR timestamps.

After test completes:

# Generate report only for steps created after the timestamp
./scripts/generate_report.py \
  --cluster-id j-12LC3PZVYNXDJ \
  --load 2x \
  --after-timestamp "2025-12-08T15:30:00.123456-05:00"

Note: Use the complete timestamp with timezone as output by --current-timestamp. This ensures accurate filtering that matches AWS EMR's timestamp format.

This approach filters steps by creation time, ensuring you only see results from the current test round.
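The filtering step can be sketched as a timezone-aware comparison. The function and the step records below are hypothetical illustrations of what generate_report.py's --after-timestamp option does:

```python
from datetime import datetime, timezone, timedelta

def steps_after(steps: list, after_iso: str) -> list:
    """Keep only steps created after the round marker.

    Both sides of the comparison are timezone-aware, so the filter matches
    AWS EMR's offset-bearing timestamps regardless of local timezone.
    """
    cutoff = datetime.fromisoformat(after_iso)
    return [s for s in steps if s["created"] > cutoff]

est = timezone(timedelta(hours=-5))
steps = [
    {"name": "round1_check", "created": datetime(2025, 12, 8, 14, 0, tzinfo=est)},
    {"name": "round2_check", "created": datetime(2025, 12, 8, 16, 0, tzinfo=est)},
]
print([s["name"] for s in steps_after(steps, "2025-12-08T15:30:00.123456-05:00")])
# ['round2_check']
```

Note that comparing an aware datetime with a naive one raises TypeError in Python, which is one reason the full offset-bearing timestamp matters.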

Example Report:

# Load Test Report - 2x Load
Date: 2025-12-07 15:30:00
Cluster: j-2ABCDEFG12345

## Summary
- Total Steps: 150
- Successful: 148
- Failed: 2
- Total Runtime: 2h 34m 18s

## Top 20 Slowest Steps
| Rank | Step Name | Elapsed | Status | Start Time |
|------|-----------|---------|--------|------------|
| 1 | check_events_daily_2x | 45m 23s | SUCCESS | 15:30:05 |
| 2 | check_profiles_hourly_2x | 38m 12s | SUCCESS | 15:31:22 |
...

## Performance Metrics
- Average: 5m 12s
- Median (P50): 3m 45s
- P95: 32m 45s
- P99: 42m 18s

## Optimization Suggestions
1. check_events_daily (45m) - Consider increasing executor memory
2. check_profiles_hourly (38m) - Review join strategy
...

HUMAN CHECKPOINT: Review report → decide next action


Iteration Workflows

Scenario 1: Fix Config Generation Issues

Human: "The config for check_events_15m has wrong interval, should be 4 hours not 6"

Claude: I'll fix that issue.

        Updating: ${TARGET_REPO}/.../check_events_15m.yml
        Change: INTERVAL '6' hour → INTERVAL '4' hour

        ✓ Fixed
        → Review the change and reply "approved" to continue

Scenario 2: Tune Cluster Performance and Rerun

Human: "Performance is slow. I'll update cluster config and rerun 2x"

Claude: Understood. Here's the iteration workflow:

        1. Keep check configs as-is (already uploaded to S3 2x folder)
        2. Update cluster configs in: ./dags/cluster_configs/*.json
        3. Run: ./scripts/upload_dags.sh (re-upload with new cluster configs)
        4. Trigger DAG again in Airflow (uses existing 2x check configs from S3)
        5. Provide new cluster ID to Claude for monitoring

        → Ready to update cluster configs now?

Scenario 3: Move to Next Load Level

Human: "2x test complete. Commit results and start 4x"

Claude: Recording 2x test results:

        In k-repo (commit modified configs):
        cd ${TARGET_REPO}
        git add ./python/klaviyo/.../checks/
        git commit -m "Load test 2x configs - 2025-12-07"

        In audit-load-test repo (commit report):
        cd ~/Klaviyo/Repos/audit-load-test
        git add ./reports/2x_*
        git commit -m "Load test 2x results - [key findings]"

        ✓ Ready for 4x load test
        → Proceed to generate 4x configs?

Scenario 4: Revert Configs to Baseline

Human: "Revert configs to baseline"

Claude: Reverting configs in k-repo:

        cd ${TARGET_REPO}
        git checkout ./python/klaviyo/.../checks/iceberg/

        ✓ All configs reverted to baseline
        → Ready for new load test or other operations

S3 Structure

s3://klaviyo-data-platform-orchestration-v1/

├── checks_load_test/
│   ├── configs/                         # Check configs by load level
│   │   ├── 15m/
│   │   │   ├── 1x/                      # Baseline
│   │   │   ├── 2x/
│   │   │   └── 4x/
│   │   ├── hourly/
│   │   │   ├── 1x/
│   │   │   ├── 2x/
│   │   │   └── 4x/                      # ← includes *_table_metrics.yml
│   │   └── daily/
│   │       ├── 1x/
│   │       ├── 2x/
│   │       └── 4x/
│   └── results/                         # EMR outputs (not used for timing)
│       └── ...
│
└── dags/datalake/audit_loadtest/        # DAG and cluster configs
    ├── yang_audit_load_test.py
    ├── cluster_configs/
    │   ├── yang_audit_load_test_cluster_15m.json
    │   ├── yang_audit_load_test_cluster_hourly.json
    │   └── yang_audit_load_test_cluster_daily.json

Local Repo Structure

audit-load-test/
├── README.md                            # This file
├── .env                                 # Local config (not committed)
├── .env.example                         # Template
├── .gitignore
├── scripts/
│   ├── upload_checks.sh                 # Upload check configs to S3
│   └── upload_dags.sh                   # Upload DAG + cluster configs to S3
├── dags/
│   ├── yang_audit_load_test.py          # Airflow DAG
│   └── cluster_configs/
│       ├── yang_audit_load_test_cluster_15m.json
│       ├── yang_audit_load_test_cluster_hourly.json
│       └── yang_audit_load_test_cluster_daily.json
└── reports/                             # Generated reports
    ├── 2x_20251207_153000/
    │   └── report.md
    └── 4x_20251207_183000/
        └── report.md

Configuration Reference

.env File

# Target repository path
TARGET_REPO=~/Klaviyo/Repos/k-repo

# Path to checks configs within target repo
CHECKS_PATH=python/klaviyo/data_platform/table_config_service/unified_table_definitions/checks/iceberg

# AWS S3 bucket
S3_BUCKET=klaviyo-data-platform-orchestration-v1

# Source for table_metrics configs
PROD_HOURLY_CHECKS=s3://klaviyo-data-platform-orchestration-v1/checks/configs/prod/hourly
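The scripts read these key=value pairs at startup. A minimal sketch of that parsing, in Python for illustration (the real scripts are shell and may simply source the file):

```python
import os

def load_env(path: str = ".env") -> dict:
    """Parse key=value lines from a .env file, skipping blanks and '#' comments.

    Values like ~/Klaviyo/Repos/k-repo are expanded to absolute home paths.
    """
    env = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = os.path.expanduser(value.strip())
    return env
```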

Troubleshooting

Permission Issues

# Make scripts executable
chmod +x ./scripts/*.sh

# Refresh AWS credentials
s2a-login

Git Branch Issues

# Check current branch
cd ${TARGET_REPO}
git status

# Switch to test branch
git checkout audit_checks_load_testing
git pull

S3 Upload Failures

# Test AWS CLI access
aws s3 ls s3://klaviyo-data-platform-orchestration-v1/

# Check credentials
aws sts get-caller-identity

EMR Cluster Not Found

# Verify cluster ID format
aws emr list-clusters --active

# Check region
aws configure get region

Config Modification Issues

Problem: Claude modified wrong intervals

Solution: Tell Claude the specific issue:

"Fix check_events_15m.yml: change INTERVAL '6' hour back to INTERVAL '4' hour"

Tips and Best Practices

  1. Always review git diff before uploading configs
  2. Commit after each successful load test for record keeping
  3. Use descriptive commit messages with key findings
  4. Don't merge test branches - keep them separate for load testing only
  5. Update cluster configs iteratively based on performance reports
  6. Use PyCharm for better diff visualization
  7. Keep reports in version control for historical comparison
  8. Test incrementally - start with 2x before jumping to 4x or higher

FAQ

Q: Can I run multiple load levels in parallel?
A: No, run one load level at a time to get accurate performance measurements.

Q: Do I need to upload DAGs every time?
A: Only when cluster configs change. Check configs can be uploaded independently.

Q: What if I want to test 3x or 5x load?
A: Just tell Claude "start 3x load test" - it works with any multiplier.

Q: Can I revert configs without Claude?
A: Yes, just run git checkout . in k-repo to revert all changes.

Q: Where can I find audit test results?
A: Check CloudWatch Logs or S3 at s3://.../checks_load_test/results/

Q: How do I compare two load test results?
A: Ask Claude: "Compare reports from 2x and 4x load tests"


Version History

  • v1.0 (2025-12-07) - Initial framework with Claude Code integration

About

This repo contains the prompts and code used for audit load testing in collaboration with Claude.
