Unifies Hessian computation; renames preconditioner → hessian #245
Unifies Hessian computation; renames preconditioner → hessian #245jammastergirish wants to merge 6 commits intoEleutherAI:mainfrom
Conversation
This is effectively a find/replace. However, some file names were called precondition*.pft. They've been renamed. However, if they can't be found in gradients.py, the old file name is sought instead.
|
Looking awesome. Two things: Where you insert backwards compatibility could you please add a Comment so I can sweep through and remove it at some point when people are safely migrated. Second, there's a user-facing flag in config.py formerly called preconditioner_path that's used here I think it should be called "processor_path" because it's actually a path to a GradientProcessor. And its docstring was """Path to a precomputed preconditioner.""" I think it should be """Path to a precomputed gradient processor. Set to apply a Hessian approximation.""" I think this is 95% ready |
|
@LouisYRYJ @SrGonao @norabelrose we're interested in making this refactor that makes a small change that nonetheless touches many files to move towards unified handling for our various Hessian approximations. But if this change seems too large or risky, the first commit (unifying the command line tools and nothing else) could be merged now along with the |
|
|
||
|
|
||
| BERGSON_FACTOR_TYPES = {"normalizer", "preconditioner", "kfac", "ekfac"} | ||
| BERGSON_FACTOR_TYPES = {"normalizer", "hessian", "kfac", "ekfac"} |
There was a problem hiding this comment.
here I think we should say something like "auto_correlation_hessian" or maybe "ac_hessian", probably there are better names out there
There was a problem hiding this comment.
autocorrelation make sense to me since the others don't have the _hessian suffix
|
Thanks a lot Girish, this has been laying around for quite some time and I am happy to see this being picked up.
(Importantly, we are at no point computing the actual true Hessian in this library, but always approximations thereof. This makes it generally difficult to be precise when naming things and talking about them, but I have found no good solution for this and also defaulted to calling some functions/files "hessian_sth_sth" for the sake of brevity even though that is technically not correct.)
Looking forward to your future contributions :) |
|
Thank you so much for this @LouisYRYJ. I'll discuss with @luciaquirke the language to use. Agree that Let's deal with that redundancy in another PR. And thanks on the proper noun capitaliztion! |
|
Yeah this is somewhat gnarly but it would be really good to find anywhere where we did a "preconditioner" -> "hessian" rename where it would be more appropriate to do "preconditioner" -> "autocorrelation" (like if we're specifically referring to the autocorrelation approximation). I will sweep through and try to do this now. Re: calling our Hessian approximations "hessian" rather than "hessian_approx", it's definitely not ideal but probably worth it for the conciseness gains. |
| @HookCollectorBase.split_attention_heads | ||
| def backward_hook(self, module: nn.Module, g: Float[Tensor, "N S O"]): | ||
| """Compute per-sample gradient, accumulate preconditioner, and store.""" | ||
| """Compute per-sample gradient, accumulate hessian, and store.""" |
There was a problem hiding this comment.
"autocorrelation matrix" in this file
There was a problem hiding this comment.
"accumulate autocorrelation matrix" or "autocorrelation matrix"?
There was a problem hiding this comment.
accumulate for sure, the accumulation is the part where we sum all the outer product matrices to compute an autocorrelation matrix
Unifies all Hessian estimation methods under one umbrella. The old
PreconditionersCLI subcommand now just onemethodchoice on the existingHessiansubcommand, and naming across the codebase follows: "preconditioner" → "hessian".What changed:
bergson preconditionerssubcommand removed;folded into
bergson hessian(1588e35).HessianConfig.methodgains"autocorrelation"and makes it the new default.
bergson hessian --method autocorrelationdoes whatbergson preconditionersused to do;
--method kfac|tkfac|shampooare unchanged.bergson preconditioners ...must move tobergson hessian --method autocorrelation ....Codebase-wide rename:
preconditioner*→hessian*(andprecondshorthand →hess)(546019d). Verb forms (
preconditioning,preconditioned) preserved as descriptive mathlanguage. Public API renames worth flagging:
GradientProcessor(preconditioners=…, preconditioners_eigen=…)→hessians=,hessians_eigen=GradientProcessor.load(skip_preconditioners=…)→skip_hessians=bergson.mix_preconditioners→bergson.mix_hessiansbergson.process_preconditioners→bergson.process_hessians;bergson.collector.dist_pre conditioners_gradient_collector→bergson.collector.dist_hessians_gradient_collector;examples.semantic.preconditioners→examples.semantic.hessians.**
GradientProcessor.load()is backward compatiblehessians_eigen.pthare absent it falls back topreconditioners.pth/preconditioners_eigen.pth.Also defensively migrates any legacy
preconditioner*keys appearing in
processor_config.yaml.