feat: Add recipe validation integ test for HP-ModelCustomization-RecipeValidator pipeline by mollyheamazon · Pull Request #5779 · aws/sagemaker-python-sdk

mollyheamazon · 2026-04-20T21:34:31Z

Summary

Adds a pytest-based recipe validation test that will be invoked by the HP-ModelCustomization-RecipeValidator pipeline to validate that new/modified recipes in a private SageMaker Hub can be fetched, parsed, and used to instantiate the correct sagemaker.train Trainer class.

Design doc: https://tiny.amazon.com/mn08ehy8/quipubV4Desi

What does this change do?

When the RecipeValidator pipeline detects new or modified recipes, a CodeBuild project clones this repo and runs test_new_recipes_create_valid_trainers.

The test:

1. Reads HYPERPOD_HUB_NAME from the environment (set by the pipeline's CodeBuildTrigger)
2. Lists all models in the private hub via list_hub_contents
3. For each model, parses the RecipeCollection and filters for FineTuning recipes
4. Detects training type (SFT/DPO/RLAIF/RLVR) and LoRA vs full fine-tuning from the recipe name
5. Instantiates the corresponding Trainer (SFTTrainer, DPOTrainer, RLAIFTrainer, RLVRTrainer)
6. Collects all errors across all models and reports them in a single assertion
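
As a rough illustration of the name-based detection step, the sketch below assumes the technique and the adapter style appear as separate tokens in the recipe name (for example `llama-3-8b-sft-lora`). That naming convention, the `detect_training_type` helper, and the `TRAINER_NAMES` table are assumptions made for this example; `detect_lora_or_full` borrows its name from the diff, but its body here is a guess, and the real test maps to the actual Trainer classes rather than to strings.

```python
import re

# Hypothetical stand-in for the test's TRAINER_MAPPING: the real mapping
# points at Trainer classes, this one only records their names.
TRAINER_NAMES = {
    "sft": "SFTTrainer",
    "dpo": "DPOTrainer",
    "rlaif": "RLAIFTrainer",
    "rlvr": "RLVRTrainer",
}


def _tokens(recipe_name: str) -> list:
    """Split a recipe name into lowercase alphanumeric tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", recipe_name.lower()) if t]


def detect_training_type(recipe_name: str) -> str:
    """Return the known technique token present in the recipe name."""
    tokens = _tokens(recipe_name)
    for training_type in TRAINER_NAMES:
        if training_type in tokens:
            return training_type
    raise ValueError(f"Unsupported training type in recipe name: {recipe_name!r}")


def detect_lora_or_full(recipe_name: str) -> str:
    """Return "lora" if the name carries a lora token, otherwise "full"."""
    return "lora" if "lora" in _tokens(recipe_name) else "full"
```

Matching on whole tokens rather than substrings avoids false positives when a technique abbreviation happens to appear inside a longer word in a model name.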

If the test fails, the pipeline halts and recipes don't reach JumpStart.
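
The collect-then-assert behaviour described in the list above can be sketched as follows. This is a minimal, SDK-free sketch: `collect_validation_errors`, `assert_no_errors`, and the `validate` callback are hypothetical names standing in for the hub fetch, recipe parsing, and Trainer construction that the real test performs.

```python
from typing import Callable, Dict, Iterable, List


def collect_validation_errors(
    recipes_by_model: Dict[str, Iterable[str]],
    validate: Callable[[str, str], None],
) -> List[str]:
    """Run validate(model, recipe) for every pair and gather the failures."""
    errors: List[str] = []
    for model_name, recipe_names in recipes_by_model.items():
        for recipe_name in recipe_names:
            try:
                validate(model_name, recipe_name)
            except Exception as exc:  # broad on purpose: record and keep going
                errors.append(
                    f"{model_name}/{recipe_name}: {type(exc).__name__}: {exc}"
                )
    return errors


def assert_no_errors(errors: List[str]) -> None:
    """Fail once, listing every recorded problem in the assertion message."""
    assert not errors, (
        f"{len(errors)} recipe(s) failed validation:\n" + "\n".join(errors)
    )
```

Deferring the assertion until every model has been visited means a single pipeline run surfaces all broken recipes, instead of stopping at the first one.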

Files changed

sagemaker-train/tests/integ/train/recipe_tests/__init__.py — new package
sagemaker-train/tests/integ/train/recipe_tests/test_recipe_validation.py — new test

Testing

Validated against SageMakerPublicHub in us-west-2 — test successfully iterated all models and validated fine-tuning recipes for gated Llama models (SFT, DPO, RLAIF, RLVR).

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

namannandan · 2026-04-21T22:09:44Z

+                training_type_enum = detect_lora_or_full(recipe_name)
+                trainer_class = TRAINER_MAPPING[training_type]
+
+                trainer = trainer_class(


Is it sufficient to check here that we can instantiate a trainer class? Could we also submit a test job and verify that interaction with smjobs/k8s will work?

We can potentially use a small/dummy dataset so that the job doesn't run for long but still verify that the end-to-end customer interaction via PySDK will work for new recipes.

Instantiation-only is the right scope for this validation step — it catches the most likely breakages: schema mismatches, missing fields, and unsupported training types in the hub content fetch → recipe parsing → Trainer construction path.

Running real training jobs would require significant infrastructure changes to the validation account — GPU instance quotas, CreateTrainingJob permissions, per-technique dummy datasets, and cleanup logic, none of which exist today. We do already have e2e integ tests in the PySDK repo that submit real training jobs for a subset of recipes, so the full job path is partially covered. If we want broader e2e coverage for all new recipes, I'd suggest scoping that as a follow-up with its own infrastructure workstream.

namannandan

Looks good. Please add a note to include job submission in the test as a follow-up task.

namannandan · 2026-04-21T23:16:27Z

Yes, we do want to be able to test that the job is able to start/run to verify the customer workflow before launch. Could you please add a Note here as a follow up task?

mollyheamazon · 2026-04-22T18:44:31Z

This test will be live in the HP-ModelCustomization-PySDKValidation package.

feat: recipe validation integ test

e809336

mollyheamazon marked this pull request as ready for review April 20, 2026 21:34

Move recipe validation integ test to proper location

3c52524

Update env var to SAGEMAKER_HUB_NAME

54d1618

namannandan reviewed Apr 21, 2026

namannandan approved these changes Apr 21, 2026

mollyheamazon closed this Apr 22, 2026

mollyheamazon wants to merge 3 commits into aws:master from mollyheamazon:feat/recipe-integ