Add build → score pipeline example #253
jammastergirish wants to merge 4 commits into `add_yaml-mediated_search_pipeline`
Conversation
With projection_dim=0 the per-token gradient P retains the full flattened module-gradient dimension, so the default per-module Adafactor preconditioner update `P.mT @ P` materializes a (full_grad_dim x full_grad_dim) matrix — multiple TB even for GPT-2's c_attn — and OOMs immediately. The no-compression scoring path uses raw dot products of full gradients, not preconditioned ones, so the example does not need preconditioners. Set skip_preconditioners: true on the build step to bypass the OOM-prone allocation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
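The "raw dot products of full gradients" path described above can be sketched in a few lines. This is an illustrative toy, not bergson's actual internals: the array names, the tiny dimensions, and the use of numpy are all assumptions; a real run flattens each module gradient, so `full_grad_dim` is in the millions.

```python
import numpy as np

# Toy dimensions; illustrative only. In practice full_grad_dim is the
# full flattened module-gradient dimension (millions of entries).
full_grad_dim = 8
n_items = 5

rng = np.random.default_rng(0)
# Mean-aggregated query gradient from the build step (hypothetical name).
query_grad = rng.standard_normal(full_grad_dim)
# One flattened gradient per scored item (hypothetical name).
item_grads = rng.standard_normal((n_items, full_grad_dim))

# No-compression scoring: one raw dot product per item,
# with no preconditioner applied anywhere.
scores = item_grads @ query_grad
```

Each entry of `scores` is the influence score for one item of the scored slice.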
The score step also runs the gradient collector with projection_dim=0, so it hits the same multi-TB `P.mT @ P` allocation on the first backward pass. Mirror the build step and set skip_preconditioners: true. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
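A back-of-envelope check on the "multi-TB" claim in the two messages above, assuming GPT-2's `c_attn` weight is 768 × 2304 (768 hidden dims projected to Q, K, and V) and the `P.mT @ P` accumulator is stored in fp32:

```python
# Flattened gradient dimension of GPT-2's c_attn weight (768 x 2304),
# assuming the whole module gradient is flattened into one vector.
full_grad_dim = 768 * 2304          # 1,769,472 entries

# The full preconditioner would be a (full_grad_dim x full_grad_dim)
# fp32 matrix at 4 bytes per element.
bytes_needed = full_grad_dim ** 2 * 4
print(f"{bytes_needed / 1e12:.1f} TB")  # → 12.5 TB for this one module
```

That single allocation dwarfs any GPU's memory, which is why both steps set `skip_preconditioners: true`.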
Force-pushed 52d766c to 0a05763
Summary
Adds a runnable two-step pipeline example demonstrating the no-compression scoring path:
Step 1 (`build`) creates a one-gradient on-disk query index with `projection_dim: 0` and `aggregation: mean`. Step 2 (`score`) loads that query into memory and dot-products it against each item in a small slice of `NeelNanda/pile-10k`, producing per-item influence scores. Until now this required two CLI invocations; this single YAML now drives the whole flow through `bergson pipeline`.

Why
- Previously two CLI invocations (`bergson build` + `bergson score`); now reducible to a single `bergson pipeline` invocation.
- Having an end-to-end no-compression scoring pipeline in YAML form is a prerequisite for then layering Hessian application onto the uncompressed gradients out of `score` (Lewis's worker has the existing reference for that), without entangling Hessian work with the rest of the codebase.

Files
- `examples/pipelines/build_then_score.yaml`: the example pipeline (small model: `gpt2`, small dataset: `NeelNanda/pile-10k` with `train[:20]` for the query and `train[:100]` for scoring, `chunk_length: 1024`).
- `README.md`: extends the "Run a Multi-Step Pipeline" section introduced in ELE-11: Add YAML-mediated search pipeline to Bergson (#246) with a one-sentence reference to `build_then_score.yaml` as a second example covering the no-compression scoring path. (The section itself, the description of `bergson pipeline`, and the link to `hessian_then_build.yaml` all live in #246.)
- `tests/test_yaml_pipeline.py`: `test_build_then_score_example_parses` asserts the shipped YAML hydrates into the right `Build`/`Score` commands with the expected configs (parse-only, no GPU needed).

Configuration notes
- `skip_preconditioners: true` is set on both the build and score steps. With `projection_dim: 0` the per-token gradient `P` retains the full flattened module-gradient dimension; the default per-module Adafactor preconditioner update `P.mT @ P` would otherwise materialize a `(full_grad_dim × full_grad_dim)` matrix (multi-TB even for GPT-2's `c_attn`) and OOM immediately. The no-compression scoring path uses raw dot products of full gradients, so preconditioners aren't needed here. See issue "Default skip_preconditioners=True for score (or have projection_dim=0 imply it)" (#255) for a suggested follow-up to make this implicit.
- `unit_normalize` and `--preconditioner_path` from the CLI command in the Linear task aren't replicated here; those belong to the compressed/preconditioned path. This example is deliberately the no-compression demo.

Dependencies
- Base branch (`add_yaml-mediated_search_pipeline`): this branch sits on top of the YAML pipeline runner introduced there. The PR base reflects that.
- Requires "Add length column to tokenize_and_chunk output" (#252) to run successfully. Without it the build step crashes at `ValueError: Column 'length' doesn't exist.` because `tokenize_and_chunk` (triggered by `chunk_length: 1024`) doesn't add the column expected by the build/score workers. Please merge #252 before this.
- `torch>=2.4`: the CLI fails to import on torch < 2.4 due to FSDP2 `fully_shard`. Not blocking for review, but it blocks running the example on stale environments.
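For orientation, a sketch of what a two-step build → score pipeline YAML might look like. The field names and layout below are illustrative guesses assembled from the options discussed in this PR (`projection_dim`, `aggregation`, `chunk_length`, `skip_preconditioners`, the model/dataset/split choices), not the authoritative bergson schema; consult `examples/pipelines/build_then_score.yaml` for the real file.

```yaml
# Illustrative sketch only — field names are assumptions, not bergson's schema.
steps:
  - command: build            # step 1: one-gradient on-disk query index
    model: gpt2
    data: NeelNanda/pile-10k
    split: "train[:20]"
    projection_dim: 0          # no compression
    aggregation: mean
    chunk_length: 1024
    skip_preconditioners: true # avoid the multi-TB P.mT @ P allocation
  - command: score             # step 2: dot-product scoring against the query
    model: gpt2
    data: NeelNanda/pile-10k
    split: "train[:100]"
    projection_dim: 0
    chunk_length: 1024
    skip_preconditioners: true
```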