Add build → score pipeline example #253
jammastergirish wants to merge 4 commits into `add_yaml-mediated_search_pipeline`
Conversation
With projection_dim=0 the per-token gradient P retains the full flattened module-gradient dimension, so the default per-module Adafactor preconditioner update `P.mT @ P` materializes a (full_grad_dim x full_grad_dim) matrix — multiple TB even for GPT-2's c_attn — and OOMs immediately. The no-compression scoring path uses raw dot products of full gradients, not preconditioned ones, so the example does not need preconditioners. Set skip_preconditioners: true on the build step to bypass the OOM-prone allocation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
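The "raw dot products of full gradients" path described above can be sketched in a few lines. This is an illustrative toy, not bergson's actual internals: the array names, the tiny dimensions, and the use of numpy are all assumptions; a real run flattens each module gradient, so `full_grad_dim` is in the millions.

```python
import numpy as np

# Toy dimensions; illustrative only. In practice full_grad_dim is the
# full flattened module-gradient dimension (millions of entries).
full_grad_dim = 8
n_items = 5

rng = np.random.default_rng(0)
# Mean-aggregated query gradient from the build step (hypothetical name).
query_grad = rng.standard_normal(full_grad_dim)
# One flattened gradient per scored item (hypothetical name).
item_grads = rng.standard_normal((n_items, full_grad_dim))

# No-compression scoring: one raw dot product per item,
# with no preconditioner applied anywhere.
scores = item_grads @ query_grad
```

Each entry of `scores` is the influence score for one item of the scored slice.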
The score step also runs the gradient collector with projection_dim=0, so it hits the same multi-TB `P.mT @ P` allocation on the first backward pass. Mirror the build step and set skip_preconditioners: true. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
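A back-of-envelope check on the "multi-TB" claim in the two messages above, assuming GPT-2's `c_attn` weight is 768 × 2304 (768 hidden dims projected to Q, K, and V) and the `P.mT @ P` accumulator is stored in fp32:

```python
# Flattened gradient dimension of GPT-2's c_attn weight (768 x 2304),
# assuming the whole module gradient is flattened into one vector.
full_grad_dim = 768 * 2304          # 1,769,472 entries

# The full preconditioner would be a (full_grad_dim x full_grad_dim)
# fp32 matrix at 4 bytes per element.
bytes_needed = full_grad_dim ** 2 * 4
print(f"{bytes_needed / 1e12:.1f} TB")  # → 12.5 TB for this one module
```

That single allocation dwarfs any GPU's memory, which is why both steps set `skip_preconditioners: true`.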
Force-pushed 52d766c to 0a05763
Summary
Adds a runnable two-step pipeline example demonstrating the no-compression scoring path:
Step 1 (`build`) creates a one-gradient on-disk query index with `projection_dim: 0` and `aggregation: mean`. Step 2 (`score`) loads that query into memory and dot-products it against each item in a small slice of `NeelNanda/pile-10k`, producing per-item influence scores. Until now this required two CLI invocations; this single YAML now drives the whole flow through `bergson pipeline`.

Why
- Previously two CLI invocations (`bergson build` + `bergson score`); now reducible to a single `bergson pipeline` invocation.
- Having an end-to-end no-compression scoring pipeline in YAML form is a prerequisite for then layering Hessian application onto the uncompressed gradients out of `score` (Lewis's worker has the existing reference for that), without entangling Hessian work with the rest of the codebase.

Files
- `examples/pipelines/build_then_score.yaml`: the example pipeline (small model: `gpt2`, small dataset: `NeelNanda/pile-10k` with `train[:20]` for the query and `train[:100]` for scoring, `chunk_length: 1024`).
- `README.md`: extends the "Run a Multi-Step Pipeline" section introduced in ELE-11: Add YAML-mediated search pipeline to Bergson (#246) with a one-sentence reference to `build_then_score.yaml` as a second example covering the no-compression scoring path. (The section itself, the description of `bergson pipeline`, and the link to `hessian_then_build.yaml` all live in #246.)
- `tests/test_yaml_pipeline.py`: `test_build_then_score_example_parses` asserts the shipped YAML hydrates into the right `Build`/`Score` commands with the expected configs (parse-only, no GPU needed).

Configuration notes
- `skip_preconditioners: true` is set on both the build and score steps. With `projection_dim: 0` the per-token gradient `P` retains the full flattened module-gradient dimension; the default per-module Adafactor preconditioner update `P.mT @ P` would otherwise materialize a `(full_grad_dim × full_grad_dim)` matrix (multi-TB even for GPT-2's `c_attn`) and OOM immediately. The no-compression scoring path uses raw dot products of full gradients, so preconditioners aren't needed here. See issue "Default skip_preconditioners=True for score (or have projection_dim=0 imply it)" (#255) for a suggested follow-up to make this implicit.
- `unit_normalize` and `--preconditioner_path` from the CLI command in the Linear task aren't replicated here; those belong to the compressed/preconditioned path. This example is deliberately the no-compression demo.

Dependencies
- Base branch (`add_yaml-mediated_search_pipeline`): this branch sits on top of the YAML pipeline runner introduced there. The PR base reflects that.
- Requires "Add length column to tokenize_and_chunk output" (#252) to run successfully. Without it the build step crashes at `ValueError: Column 'length' doesn't exist.` because `tokenize_and_chunk` (triggered by `chunk_length: 1024`) doesn't add the column expected by the build/score workers. Please merge #252 before this.
- `torch>=2.4`: the CLI fails to import on torch < 2.4 due to FSDP2 `fully_shard`. Not blocking for review, but it blocks running the example on stale environments.
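For orientation, a sketch of what a two-step build → score pipeline YAML might look like. The field names and layout below are illustrative guesses assembled from the options discussed in this PR (`projection_dim`, `aggregation`, `chunk_length`, `skip_preconditioners`, the model/dataset/split choices), not the authoritative bergson schema; consult `examples/pipelines/build_then_score.yaml` for the real file.

```yaml
# Illustrative sketch only — field names are assumptions, not bergson's schema.
steps:
  - command: build            # step 1: one-gradient on-disk query index
    model: gpt2
    data: NeelNanda/pile-10k
    split: "train[:20]"
    projection_dim: 0          # no compression
    aggregation: mean
    chunk_length: 1024
    skip_preconditioners: true # avoid the multi-TB P.mT @ P allocation
  - command: score             # step 2: dot-product scoring against the query
    model: gpt2
    data: NeelNanda/pile-10k
    split: "train[:100]"
    projection_dim: 0
    chunk_length: 1024
    skip_preconditioners: true
```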