Fixed minimizer behaviour on exceeded iterations #231
Conversation
henrikjacobsenfys
left a comment
There was a problem hiding this comment.
Looks good to me apart from a tiny nitpick.
This may also be a good opportunity to make a __repr__ method for the FitResults class?
Good idea! Sketching it out:

    def __repr__(self):
        lines = [
            f"FitResults(success={self.success}",
            f"  n_pars={self.n_pars}, n_points={len(self.x)}",
            f"  chi2={self.chi2:.4g}, reduced_chi={self.reduced_chi:.4g}",
            f"  n_evaluations={self.n_evaluations}",
            f"  minimizer={self.minimizer_engine.__name__ if self.minimizer_engine else None}",
        ]
        return "\n".join(lines) + "\n)"
damskii9992
left a comment
There was a problem hiding this comment.
Generally looks good. I have some questions/potential changes before approval.
Also, do we want to raise an error or a warning when a fit does not converge within max_iterations?
      if getattr(results, name, False):
          setattr(results, name, value)
    - results.success = not bool(fit_results.flag)
    + results.success = fit_results.flag == fit_results.EXIT_SUCCESS
There was a problem hiding this comment.
Can you explain this? What is fit_results.EXIT_SUCCESS?
There was a problem hiding this comment.
And should we raise a warning if the fit did not succeed?
There was a problem hiding this comment.
In DFO-LS, EXIT_SUCCESS is 0, returned when the "objective is sufficiently small" or "rho has reached rhoend". Positive codes are warning exits, and negative codes are hard errors.
As for whether to raise a warning: not for every non-success. Hard failures already become an exception in minimizer_dfo, so warning on those would be redundant. But for warning-style exits that still return results, especially EXIT_MAXFUN_WARNING, adding a UserWarning would be reasonable if we want consistency with BUMPS, which warns when it hits the evaluation budget.
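To make the suggestion concrete, here is a minimal sketch of warning on a budget exit without treating it as a hard failure. The flag values follow the DFO-LS convention described above; the `check_exit_flag` helper itself is hypothetical, not the project's code:

```python
import warnings

# DFO-LS convention: 0 is a clean exit, positive codes are warning exits
EXIT_SUCCESS = 0
EXIT_MAXFUN_WARNING = 1

def check_exit_flag(flag):
    """Warn (but do not fail) when the evaluation budget was exhausted."""
    if flag == EXIT_MAXFUN_WARNING:
        warnings.warn(
            "Fit stopped: maximum number of objective evaluations reached",
            UserWarning,
        )
    # Only a clean exit counts as success
    return flag == EXIT_SUCCESS

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ok = check_exit_flag(EXIT_MAXFUN_WARNING)

print(ok, len(caught))  # False 1
```

The caller still gets a results object back; it is up to them to inspect `success` or act on the warning.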
There was a problem hiding this comment.
Added warning and unit test
There was a problem hiding this comment.
Okay, but what about the fit_results.flag? I assume this can take values other than 0. Do we risk this returning True when the flag and EXIT_SUCCESS are identical values different from 0? What is the EXIT_SUCCESS value when we hit an EXIT_MAXFUN_WARNING?
There was a problem hiding this comment.
No, there is no risk from the value being nonzero by itself.
This is a straight equality check:
results.success = fit_results.flag == fit_results.EXIT_SUCCESS
It only returns True when the flag exactly matches the success constant. It does not depend on success being 0 specifically. Even if a library used 137 for success, this comparison would still be correct as long as fit_results.flag and fit_results.EXIT_SUCCESS were both 137 for a real success.
For DFO we have:
    EXIT_SUCCESS = 0
    EXIT_MAXFUN_WARNING = 1
    EXIT_SLOW_WARNING = 2
    EXIT_FALSE_SUCCESS_WARNING = 3
    # hard errors have negative values
The line does what we want:
normal success -> True
maxfun warning -> False
any other non-success flag -> False
I added a comment above the assignment to spell it out.
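To illustrate that the check is value-agnostic, here is a toy sketch (the class and the constant value 137 are made up for demonstration; they are not from the project):

```python
class FakeResults:
    """Toy stand-in for a backend results object."""

    EXIT_SUCCESS = 137  # deliberately non-zero to show the check still works

    def __init__(self, flag):
        self.flag = flag

success_case = FakeResults(flag=137)
warning_case = FakeResults(flag=1)

# The equality only matches the success constant, whatever its value is.
print(success_case.flag == success_case.EXIT_SUCCESS)  # True
print(warning_case.flag == warning_case.EXIT_SUCCESS)  # False
```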
There was a problem hiding this comment.
But what about fit_results.flag, what values can that take? Only 0 and 1?
There was a problem hiding this comment.
> But what about fit_results.flag, what values can that take? Only 0 and 1?
Line 271: the DFO results object is returned if results.flag is either 0 (success) or 1 (MAXFUN_WARNING). Otherwise the fit is assumed to have failed and an error is raised.
So in line 219, where this is actually being checked, fit_results.flag is only 0 or 1.
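A minimal sketch of that gate (the `FitError` class stands in for the project's exception; the flag values follow DFO-LS; the `check_dfo_results` helper is hypothetical):

```python
EXIT_SUCCESS = 0
EXIT_MAXFUN_WARNING = 1

class FitError(Exception):
    pass

def check_dfo_results(flag, msg=""):
    """Return normally for flags 0 and 1; raise for anything else."""
    if flag not in {EXIT_SUCCESS, EXIT_MAXFUN_WARNING}:
        raise FitError(f"Fit failed with message: {msg}")
    return flag

check_dfo_results(0)   # clean success, returned
check_dfo_results(1)   # maxfun warning, still returned
try:
    check_dfo_results(-1, "input error")
except FitError as exc:
    print(exc)  # Fit failed with message: input error
```

Downstream code that only sees objects which passed this gate can therefore assume the flag is 0 or 1.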
There was a problem hiding this comment.
Ahh, now I think I get it. DFO sets fit_results.flag to some integer depending on if the fit failed (negative), succeeded cleanly (0) or succeeded with a warning (positive). To remember what each of these integers means, they're also assigned to the attributes EXIT_SUCCESS, EXIT_MAXFUN_WARNING etc.
In other words, the flag attribute is a variable and the EXIT attributes are static constants.
So when you compare, you're just checking the flag against the static attribute.
I guess this is done so that they can change the flag integers for different fit terminations without breaking code, or for more readable syntax... Hmm, interesting.
Could you boil my realization down and add as a comment? Either instead of your comment or as an addendum :)
    - if 'Success' not in results.msg:
    -     raise FitError(f'Fit failed with message: {results.msg}')
    + if results.flag in {results.EXIT_SUCCESS, results.EXIT_MAXFUN_WARNING}:
There was a problem hiding this comment.
Could you also explain this? Maybe add a comment?
    except FitError:
        for key in self._cached_pars.keys():
            self._cached_pars[key].value = self._cached_pars_vals[key][0]
        raise
There was a problem hiding this comment.
Codecov says this misses a test :)
There was a problem hiding this comment.
Test added. Though you know we shouldn't really aim for 100% coverage, right? :)
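For reference, the restore-on-failure behaviour in the diff above can be exercised with a self-contained toy (the `Fitter` class and dict-based parameters here are stand-ins, not the project's actual classes):

```python
class FitError(Exception):
    pass

class Fitter:
    """Toy fitter that restores cached parameter values when the fit fails."""

    def __init__(self, pars):
        self._cached_pars = pars
        # Snapshot the starting values, mirroring _cached_pars_vals
        self._cached_pars_vals = {k: (p["value"],) for k, p in pars.items()}

    def fit(self):
        # Simulate the backend mutating parameters, then failing.
        for p in self._cached_pars.values():
            p["value"] = 999.0
        try:
            raise FitError("backend blew up")
        except FitError:
            # Restore the pre-fit values before re-raising
            for key in self._cached_pars:
                self._cached_pars[key]["value"] = self._cached_pars_vals[key][0]
            raise

pars = {"a": {"value": 1.0}, "b": {"value": 2.0}}
try:
    Fitter(pars).fit()
except FitError:
    pass
print(pars["a"]["value"], pars["b"]["value"])  # 1.0 2.0
```

A unit test can assert exactly this: after a failed fit, the parameters hold their original values again.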
    results.y_calc = fit_results.best_fit
    results.y_err = 1 / fit_results.weights
    results.n_evaluations = fit_results.nfev
    results.message = fit_results.message
There was a problem hiding this comment.
We should probably either raise an error or a warning when max_evaluations have been reached.
There was a problem hiding this comment.
As mentioned earlier, I would prefer to let the caller decide what to do with this information. Let's emit a warning and be lenient with respect to max_evaluations.
    assert result.success is False
    assert result.n_evaluations is not None
    assert result.n_evaluations > 0
There was a problem hiding this comment.
Since the fit reached max evaluations, shouldn't this be:

    assert result.n_evaluations == 3

There was a problem hiding this comment.
n_evaluations is the actual backend-reported evaluation count, not necessarily the configured budget. In this case, n_evaluations is 10.
There was a problem hiding this comment.
But doesn't that mean that setting max_evaluations doesn't currently work? If the max is 3, the number of iterations should be 3 unless it converged in under 3 iterations (doubtful). I guess this relates to my earlier comment that BUMPS runs multiple function evaluations before actually starting the fit.
There was a problem hiding this comment.
yes, BUMPS is interesting.
Our n_evaluations comes from _EvalCounter.call, which counts every objective invocation.
The simplex method evaluates multiple points per optimizer iteration, and does work before the main loop starts by building the initial simplex.
So getting 10 n_evaluations with steps = 3 is plausible. The solver is not running 10 iterations; rather it uses up to 3 simplex steps, and those steps require multiple objective calls.
f.max_evaluations = 3 for BUMPS currently means: maximum optimizer steps
result.n_evaluations means: actual objective-function calls observed.
That is why I changed the BUMPS message to say “maximum optimizer steps” rather than “maximum evaluations”.
The real issue is naming: in BUMPS, max_evaluations is currently mapped to steps, so the public API name is more general than the actual behavior.
And we can't set the actual max number of optimizer steps explicitly...
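The steps-vs-evaluations distinction can be demonstrated with a toy "optimizer" that, like a simplex method, evaluates several points per step. All names here are hypothetical; the counting wrapper just mirrors the idea of counting every objective invocation:

```python
class EvalCounter:
    """Count every objective invocation, independent of optimizer steps."""

    def __init__(self, fn):
        self._fn = fn
        self.count = 0

    def __call__(self, x):
        self.count += 1
        return self._fn(x)

def objective(x):
    return (x - 2.0) ** 2

counted = EvalCounter(objective)

max_steps = 3
points = [0.0, 1.0, 3.0]  # a 1-D "simplex" of three points
for _ in range(max_steps):
    # Each step evaluates every vertex to decide where to move next.
    values = [counted(p) for p in points]
    best = points[values.index(min(values))]
    points = [best, best + 0.5, best - 0.5]

print(max_steps, counted.count)  # 3 9
```

Three optimizer steps, nine objective calls: this is why a step budget of 3 can coexist with an observed n_evaluations of 10 in BUMPS.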
    assert domain_fit_results.y_calc == 'evaluate'
    assert domain_fit_results.y_err == 'dy'
    assert domain_fit_results.n_evaluations == 7
    assert domain_fit_results.message == 'Fit stopped: reached maximum evaluations (3)'
There was a problem hiding this comment.
Huh? max evaluations was 3 but it got 7 iterations? What?
There was a problem hiding this comment.
The counter is mocked: minimizer._eval_counter = MagicMock(count=7) in line 205.
I would prefer to just emit the warning and let the caller check the results object for convergence.
    def __call__(self, *args, **kwargs):
        self.count += 1
        return self._fn(*args, **kwargs)
There was a problem hiding this comment.
Thinking about this now, but you're tracking fit iterations by tracking how many times the model function is evaluated. This is not a correct representation of the number of fit iterations, as some fitting routines evaluate the model function multiple times per fitting step in order to best choose the next fitting step.
Optimizers like BUMPS, for example, run multiple model evaluations at the start of each fit.
This might be the best we CAN do, if we can't get the actual fit iterations from the backend engine, but it's worth mentioning.
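Filling out the snippet above into a self-contained sketch makes the limitation explicit: the wrapper can only observe objective calls, never backend iterations. The budget-check attribute below is an assumption for illustration, not the project's exact code:

```python
class EvalCounter:
    """Wrap an objective function and count every invocation."""

    def __init__(self, fn, max_evaluations=None):
        self._fn = fn
        self.max_evaluations = max_evaluations
        self.count = 0
        self.budget_exceeded = False

    def __call__(self, *args, **kwargs):
        self.count += 1
        if self.max_evaluations is not None and self.count > self.max_evaluations:
            # We can only observe calls, not optimizer iterations, so this
            # flags "too many calls", not "too many fitting steps".
            self.budget_exceeded = True
        return self._fn(*args, **kwargs)

counter = EvalCounter(lambda x: x * x, max_evaluations=5)
for x in range(8):
    counter(x)
print(counter.count, counter.budget_exceeded)  # 8 True
```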
There was a problem hiding this comment.
You might want to add to the integration tests where we actually run the fits and check the iteration counters.
There was a problem hiding this comment.
This is the best we can currently do.
The actual iteration counters might differ from machine to machine and from OS to OS; I don't think it's going to be easy to catch the right situation, but I'll give it a try.
There was a problem hiding this comment.
I guess we may use nit, assuming BUMPS is consistent with it.
I assume that this is now ready for merge.
This adds improvements to the fitting result reporting and evaluation tracking across all minimizer engines, ensuring that the number of function evaluations and relevant messages are consistently captured and propagated.

- All minimizer engines (Bumps, DFO, LMFit) now populate n_evaluations (number of function evaluations) and message (status or error message) fields in FitResults, and these fields are propagated in multi-dataset fitting.
- The Bumps minimizer uses a new _EvalCounter wrapper to count function evaluations and detect when the maximum evaluation budget is reached, updating the success flag and message accordingly.
- The DFO minimizer now raises FitError only for real failures, and considers both success and max function evaluation warnings as successful fits, improving error reporting and testability.
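The field population described in the summary can be sketched as follows. The FitResults field names come from the PR; the dict-based backend result object is a hypothetical stand-in for whatever the engine returns:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FitResults:
    success: bool = False
    n_evaluations: Optional[int] = None
    message: str = ""

def populate_results(backend):
    """Copy evaluation count and status message from a backend result."""
    results = FitResults()
    results.success = backend["flag"] == backend["EXIT_SUCCESS"]
    results.n_evaluations = backend["nfev"]
    results.message = backend["message"]
    return results

backend = {"flag": 0, "EXIT_SUCCESS": 0, "nfev": 42, "message": "Success"}
res = populate_results(backend)
print(res)  # FitResults(success=True, n_evaluations=42, message='Success')
```

In multi-dataset fitting, the same translation runs once and the resulting fields are copied onto each per-dataset results object.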