ENH: cov: expose correction and weights parameters #690
bruAristimunha wants to merge 10 commits into data-apis:main
Conversation
Resolves data-apis#688. Adds `axis`, `correction`, `frequency_weights`, and `weights` to `cov`, giving users control over the degrees-of-freedom correction and the observation-axis / weighted variants that `numpy.cov` and `torch.cov` already support. Naming follows array-api conventions (`axis`, `correction`) rather than numpy's (`rowvar`, `bias`, `ddof`); the docstring includes a one-to-one mapping. The delegation moves observations to the last axis via `xp.moveaxis`, collapsing `rowvar` out of the backend dispatch — only `ddof` vs `correction` differs between branches. Dask's native `cov` forces `.compute()` on a lazy scalar when any weights are given, so weighted dask inputs fall through to the generic implementation, which is fully lazy.
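The weighted variant described above can be sanity-checked against `numpy.cov` directly. A minimal sketch, assuming reliability (`aweights`-style) weights only; variable names are illustrative, not the PR's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)
m = rng.standard_normal((3, 8))       # 3 variables, 8 observations
aw = rng.uniform(0.5, 2.0, size=8)    # reliability weights

# Same algebra numpy.cov uses for aweights with its default ddof=1:
v1 = aw.sum()
v2 = (aw * aw).sum()
fact = v1 - v2 / v1                   # generally non-integer
avg = np.average(m, axis=1, weights=aw)
m_c = m - avg[:, None]                # center out-of-place
c = (m_c * aw) @ m_c.T / fact

assert np.allclose(c, np.cov(m, aweights=aw))
```

Note that `fact` here is not `n - 1`, which is why the weighted path cannot reuse a plain integer degrees-of-freedom divisor.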
It looks like the PR description mentions that other functions in this library already use …
```python
# `numpy.cov` (and cupy/dask/jax) require integer `ddof`; `torch.cov`
# requires integer `correction`. For non-integer-valued `correction`,
# fall through to the generic implementation.
integer_correction = isinstance(correction, int) or correction.is_integer()
```
Why do we allow non-integer corrections in the first place? Is this to allow people to pass `correction=1.` instead of raising an error? Or do people really use corrections like `correction=1.234`? (I'm not familiar with advanced uses.)
Here I follow the `xp.var` approach of allowing both int and float: https://data-apis.org/array-api/latest/API_specification/generated/array_api.var.html.
As for the use case, I don't know either. We could reduce the scope relative to `var` and allow only integers — I think that's a good idea!
If observations have weights, the unbiased correction is often not n-1. Instead, it depends on the sum of weights and their dispersion. Another instance where the correction is not an integer is for autocorrelated data.
One of the reasons for not using ddof was to get away from the implicit integer-encoded mental model of correction factors.
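To make the weighted case above concrete, here is a small sketch of why the unbiased correction stops being an integer once weights enter. The `V1 - V2 / V1` divisor is the one `numpy.cov` uses when `aweights` are given with `ddof=1`:

```python
import numpy as np

# With reliability weights, the unbiased divisor is not n - 1 but
# V1 - V2 / V1, where V1 = sum(w) and V2 = sum(w**2).
w = np.array([0.5, 1.0, 1.5, 2.0])
v1 = w.sum()            # 5.0
v2 = (w * w).sum()      # 7.5
fact = v1 - v2 / v1     # 3.5 -- a non-integer effective divisor
print(fact)             # 3.5
```

So "correction" in the weighted setting is inherently real-valued, not a count of lost degrees of freedom.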
Hey @betatim! This was a hard decision I had to make, but I can be stricter than numpy if you prefer. I basically looked at what was already implemented in the array API and how it handles the parameter names I was trying to introduce — for each parameter, I checked how the migration from numpy was made in the past.
For `correction`: https://data-apis.org/array-api/latest/API_specification/generated/array_api.var.html. There was a discussion on using `correction` instead of `bias`+`ddof` in these functions. It was introduced in data-apis/array-api#10, and later there were some interesting discussions in data-apis/array-api#695; it was @kgryte who led the discussion.
And for `frequency_weights` and `weights`, it was my experience in Pyriemann that drove the decision. The only place I remember using something similar is statsmodels (`freq_weights`, `var_weights`): https://www.statsmodels.org/stable/generated/statsmodels.genmod.generalized_linear_model.GLM.html#statsmodels.genmod.generalized_linear_model.GLM.freq_weights
I think in scikit-learn you use `sample_weight` more, but I can accommodate any request about this.
betatim left a comment
What is your thinking on validating the weights passed in? Things like checking the shapes make sense, that they are all positive (is this actually required? how does it fit with being lazy?)
I liked this idea a lot @betatim! I think it will make the checks in libraries that use array-api-extra much lighter.
83b7e1b to d9701e0 (compare)
FYI @qbarthelemy and @agramfort
Co-authored-by: Quentin Barthélemy <q.barthelemy@gmail.com>
Thanks a lot for the detailed answer in #690 (comment) - I didn't realise there was precedent for using these parameter names.
What is the "temporary deployed" thing that keeps happening?
It's not me @betatim, I think it's something that @lucascolley is pushing here: #699
Happy that you liked the response @betatim :) I think I addressed all the points from you and @qbarthelemy; can we merge?
fixed in bd3652a
lucascolley left a comment
I took an initial look, seems pretty good!
One high level comment @bruAristimunha — could you demonstrate that this works as expected when used in a branch of sklearn? You should be able to change https://github.com/scikit-learn/scikit-learn/blob/06aded051fe6c7c9970b7e13c3669f952a799831/maint_tools/vendor_array_api_extra.sh#L8-L9 to point to this branch and commit hash.
```diff
- m = xp.asarray(m, copy=True)
+ m = xp.asarray(m)
```
It was mostly one small optimization that I made, as the new covariance code doesn't mutate in place anymore.
If I understand correctly, we needed the copy because of this line:

```python
m -= avg
```

But now we are doing:

```python
m_c = m - avg
m_w = m_c if w is None else m_c * w
m_cT = xp.matrix_transpose(m_c)
c = (m_w @ m_cT) / fact
```

I noticed this by accident while running the speed test and saw a small regression, so I think it's worth disabling the copy.
```python
# Validate axis against m.ndim.
ndim = max(m.ndim, 1)
if not -ndim <= axis < ndim:
    msg = f"axis {axis} is out of bounds for array of dimension {m.ndim}"
    raise IndexError(msg)
```
Just a thought: maybe some common logic can be extracted from this and `array-api-extra/src/array_api_extra/_delegation.py`, lines 313 to 316 in af12cd5. That one is for a tuple of axes though, so maybe not.
I couldn't think of a way to factor this out. The only thing I could think of here is `normalize_axis_index` from numpy, but then we would need another PR to introduce it into the library ;p
```python
# Validate weight shapes (eager metadata, lazy-safe). Value-based
# checks (non-negative, integer dtype) are intentionally skipped so
# lazy backends don't trigger compute -- same tradeoff as dask.cov.
n_obs = m.shape[-1]
for name, w in (("fweights", fweights), ("aweights", aweights)):
    if w is None:
        continue
    if w.ndim != 1:
        msg = f"`{name}` must be 1-D, got ndim={w.ndim}"
        raise ValueError(msg)
    if w.shape[0] != n_obs:
        msg = (
            f"`{name}` has length {w.shape[0]} but `m` has {n_obs} observations"
        )
        raise ValueError(msg)
```
must this happen at the delegation layer? What happens if we just let the backends to which we delegate error out instead?
I think all the backends (numpy, torch, jax, dask) will raise, except for the 0-D scalar fweights case in dask, so it should be fine.
I mostly pushed this to address @betatim's suggestion (#690 (review)), but if you prefer, I can let the backends handle it, since they already validate.
Let me validate only in the dask case, then.
Moving this to `_funcs`.
Hey @betatim, as you have the first covariance PR in scikit-learn, can you help with this small test requested by @lucascolley?
Hey @lucascolley, I made a branch built on top of @betatim's work on scikit-learn's first covariance; you can check it here: scikit-learn/scikit-learn#33600
Resolves #688.
Summary
- Adds `axis`, `correction`, `frequency_weights`, and `weights` parameters to `xpx.cov`, unlocking the degrees-of-freedom and weighted variants that `numpy.cov` and `torch.cov` already support.
- Naming follows the array-api conventions (`axis`, `correction`) used elsewhere in this library rather than numpy's (`rowvar`, `bias`, `ddof`). The docstring includes a one-to-one mapping for users migrating from `numpy.cov`.

Design
The delegation moves observations to the last axis via `xp.moveaxis`, which collapses `rowvar` out of backend dispatch entirely — only `ddof` (numpy/cupy/dask/jax) vs `correction` (torch) differs between branches.

Fallbacks to the generic implementation (`_funcs.cov`):
- `m.ndim > 2` (batched input, not supported by any native implementation).
- Non-integer-valued `correction` (rejected by `numpy.cov`'s `ddof`).
- Weighted dask input: `dask.array.cov` forces `.compute()` on a lazy 0-D scalar via its internal `if fact <= 0` check. The generic path stays fully lazy because its weighted branch doesn't compare `fact` to zero (noted in docstring).

Weighted formula in `_funcs.cov` matches numpy's (algebraically): `c = (m_c · w) @ m_c.T / (v1 - correction · v2 / v1)`.

Tests
New `TestCov` cases validate against `np.cov` as reference:
- `test_correction` (integer ddof)
- `test_correction_float` (generic-path-only, hand-computed reference)
- `test_axis` / `test_axis_with_weights` / `test_axis_out_of_bounds`
- `test_frequency_weights` / `test_weights` / `test_both_weights`
- `test_batch_with_weights`

Test plan
- `pytest tests/test_funcs.py::TestCov` — 126 passed across numpy, torch, jax, dask, array-api-strict
- `pytest tests/test_funcs.py` full — 4263 passed, 0 failed
- `lefthook run pre-commit` — ruff, numpydoc, mypy, pyright, typos all green
- `lazy_xp_function(cov)` asserts 0 `.compute()` calls; holds for the weighted path via the fallback
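As a sanity check on the frequency-weight semantics the tests rely on, frequency weights are supposed to mean "this observation occurred k times", so weighting must agree with physically repeating columns. A minimal sketch using `numpy.cov`'s equivalent `fweights` as the reference:

```python
import numpy as np

rng = np.random.default_rng(42)
m = rng.standard_normal((2, 4))
fw = np.array([1, 3, 2, 1])

# fweights[i] = number of times observation i occurred, so weighting
# must be equivalent to repeating each column fweights[i] times.
repeated = np.repeat(m, fw, axis=1)
assert np.allclose(np.cov(m, fweights=fw), np.cov(repeated))
```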
pytest tests/test_funcs.py::TestCov— 126 passed across numpy, torch, jax, dask, array-api-strictpytest tests/test_funcs.pyfull — 4263 passed, 0 failedlefthook run pre-commit— ruff, numpydoc, mypy, pyright, typos all greenlazy_xp_function(cov)asserts 0.compute()calls, holds for weighted path via the fallback