End-to-end support for coarser-than-daily count signals #794

Merged
cdc-mitzimorris merged 56 commits into main from mem_789_temporal_aggregation
Apr 28, 2026

Conversation

Collaborator

@cdc-mitzimorris cdc-mitzimorris commented Apr 21, 2026

Overview

Adds support for count observations aggregated to a weekly grid while the renewal equation continues to be evaluated daily. Two design pieces:

  1. Weekly observation likelihood path. Daily predicted counts are summed to the weekly observation grid before the likelihood is scored. Aggregation lives inside the numpyro-traced graph so the noise model and the likelihood operate on the same scale.
  2. Independent parameter cadence. The temporal process for $\mathcal{R}(t)$ may be parameterized daily or weekly (or any stepwise cadence) and is broadcast to the daily axis. The renewal equation and all observation transforms run daily either way.

End-to-end flow for a weekly signal

  • $\mathcal{R}(t)$ sampled by a temporal process at the chosen cadence and broadcast to the daily model axis if needed.
  • Daily renewal equation consumes daily $\mathcal{R}(t)$ and produces the daily infection trajectory.
  • Daily predicted counts computed via ascertainment + delay convolution.
  • Daily predicted counts summed to the weekly observation grid via pyrenew.time.daily_to_weekly.
  • Likelihood scored at weekly scale against weekly observations.
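
The aggregation step in this flow can be illustrated with a small plain-Python sketch. This is not the pyrenew.time.daily_to_weekly implementation: the helper name sum_daily_to_weekly, its arguments, and the choice to drop incomplete leading and trailing weeks are assumptions made for illustration, with day-of-week coded 0=Monday through 6=Sunday.

```python
def sum_daily_to_weekly(daily, first_day_dow, week_start_dow):
    """Sum daily values into complete 7-day weeks (hypothetical helper).

    Days before the first week_start_dow and days after the last
    complete week are dropped (an assumption of this sketch).
    """
    # Number of leading days that belong to an incomplete first week.
    offset = (week_start_dow - first_day_dow) % 7
    trimmed = daily[offset:]
    n_weeks = len(trimmed) // 7
    return [sum(trimmed[7 * w : 7 * w + 7]) for w in range(n_weeks)]


# 17 daily predictions starting on a Thursday (3), weeks starting on Sunday (6).
daily_predicted = [1.0] * 17
weekly = sum_daily_to_weekly(daily_predicted, first_day_dow=3, week_start_dow=6)
print(weekly)  # two complete weeks, each summing seven daily values
```

In the PR itself the equivalent sum happens inside the numpyro-traced graph, so the weekly totals that reach the likelihood are functions of the sampled daily trajectory.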

Relationship to pyrenew-hew

The production pyrenew-hew model parameterizes $\mathcal{R}(t)$ weekly and aggregates daily predicted hospital admissions to a weekly grid. This PR brings the same capability into PyRenew while making parameter cadence a user choice rather than a fixed coupling:

  • Match pyrenew-hew today: wrap the inner temporal process in StepwiseTemporalProcess(step_size=7, alignment="calendar_week", week_start_dow=...).
  • Daily parameterization for the same signals: drop the wrapper and use the inner process directly.
  • Mixed cadences: weekly hospital admissions and daily ED visits in one model regardless of the Rt parameter cadence.

Observation cadence and parameter cadence are independent design choices; the builder no longer enforces a pairing rule between them.

Reviewer guide

Review bottom-up through the dependency chain. Each unit's changes are self-contained.

1. Synthetic data refresh (120 → 126 days)

  • Files: datagen_he_CA_126.py, synthetic_CA_126/*.csv, synthetic_data.py, test_datagen_he_CA_126.py, test_datasets_synthetic.py.
  • Six extra days yield 18 full weeks (126 = 18 × 7). No API change.
  • Verify true_parameters.json matches the generating process.

2. Observation base validators — pyrenew/observation/base.py

  • New: _validate_aggregation_params, _compute_period_offset, _validate_period_end_times.
  • Generalized _validate_shapes_match replaces _validate_obs_times_shape.
  • _validate_dow_effect now uses the shared require_shape helper.
  • Focus: offset arithmetic (period_end_dow + 1 - first_day_dow) % 7 and period-boundary alignment.
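
A worked example of the offset arithmetic may help when checking it by hand. The sketch below assumes the offset is the number of leading daily indices that precede the first full period, and that day-of-week is coded 0=Monday through 6=Sunday; the function names are hypothetical and do not reproduce _compute_period_offset or _validate_period_end_times.

```python
def period_offset(first_day_dow, period_end_dow):
    """Days to skip before the first full weekly period begins."""
    return (period_end_dow + 1 - first_day_dow) % 7


def full_period_end_indices(n_days, first_day_dow, period_end_dow):
    """Daily indices on which a complete 7-day period ends."""
    offset = period_offset(first_day_dow, period_end_dow)
    # The first full period covers indices offset .. offset + 6.
    return list(range(offset + 6, n_days, 7))


# Axis starts on Thursday (3); weekly periods end on Saturday (5).
offset = period_offset(first_day_dow=3, period_end_dow=5)
ends = full_period_end_indices(17, first_day_dow=3, period_end_dow=5)
print(offset, ends)

# Every reported end index falls on the declared period_end_dow.
assert all((3 + i) % 7 == 5 for i in ends)
```

Python's % operator returns a non-negative result for a positive modulus, so the expression is safe when period_end_dow + 1 is smaller than first_day_dow.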

3. Latent: temporal processes — pyrenew/latent/temporal_processes.py

  • New class StepwiseTemporalProcess; step_size attr added to the TemporalProcess Protocol and to AR1 / DifferencedAR1 / RandomWalk.
  • Alignment options: "model_index" (default) starts blocks at model index 0; "calendar_week" aligns weekly blocks to a declared week_start_dow.
  • Calendar-week broadcast delegates to pyrenew.time.weekly_to_daily; coarse trajectory recorded as {name_prefix}_coarse.
  • first_day_dow threaded through the Protocol so calendar-aligned wrappers can use the model-axis day-of-week; standard processes ignore it.

4. Latent: shape contracts — pyrenew/latent/{population,subpopulation}_infections.py

  • Both sample() methods accept first_day_dow and forward it to their temporal processes.
  • Output shape of every temporal process is validated via the shared pyrenew.arrayutils.require_shape helper.

5. Count observation path — pyrenew/observation/count_observations.py

  • New ctor params aggregation_period, reporting_schedule, period_end_dow.
  • New _aggregate wraps pyrenew.time.daily_to_weekly.
  • Branching in validate_data / sample for regular vs. irregular schedules.
  • Confirm all in-tree SubpopulationCounts call sites updated; times → period_end_times rename consistent.

6. Measurement observations — pyrenew/observation/measurement_observations.py

  • One-line refactor to use the renamed shape validator.

7. Builder + model coherence — pyrenew/model/{pyrenew_builder,multisignal_model}.py

  • MultiSignalModel.sample() accepts first_day_dow and forwards it to the latent process.
  • _validate_coherence enforces calendar-anchor and structural coherence:
    • All observations sharing an aggregation_period > 1 must agree on period_end_dow.
    • Every temporal-process step_size must be a positive integer.
    • Calendar-week-aligned temporal processes must have week_start_dow consistent with weekly period_end_dow: period_end_dow == (week_start_dow + 6) % 7.
  • first_day_dow required at validate_data when any obs has aggregation_period > 1.
  • The previous "step_size ≤ finest aggregation_period" rule was deliberately removed — see the in-body comment.
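
The calendar-anchor rule above reduces to a single modular comparison. The following is a minimal sketch with a hypothetical function name, not the body of _validate_coherence, assuming day-of-week is coded 0=Monday through 6=Sunday.

```python
def week_anchors_agree(week_start_dow, period_end_dow):
    """True when a week starting on week_start_dow ends on period_end_dow."""
    return period_end_dow == (week_start_dow + 6) % 7


# Sunday-start weeks (6) end on Saturday (5).
print(week_anchors_agree(week_start_dow=6, period_end_dow=5))
# Monday-start weeks (0) end on Sunday (6).
print(week_anchors_agree(week_start_dow=0, period_end_dow=6))
# A Sunday start paired with a Sunday end is inconsistent.
print(week_anchors_agree(week_start_dow=6, period_end_dow=6))
```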

8. Integration test — test/integration/test_population_infections_he_weekly.py

  • End-to-end MCMC: weekly admissions + daily ED visits, calendar-week-aligned weekly Rt.
  • Prior-predictive shape check and posterior recovery for I0 + baseline R(t).
  • Uses numpyro.enable_x64() + set_host_device_count(4).

9. Test fixtures + config — test/conftest.py, pyproject.toml, _typos.toml

  • 196-line conftest promotes repeated setup to fixtures, including reusable temporal-process stubs (WrongShapeTemporalProcess, ConstantTemporalProcess, InvalidStepSizeTemporalProcess).
  • Pytest integration marker added so -m "not integration" skips MCMC tests.

10. Tutorial — docs/tutorials/building_multisignal_models.qmd

  • New section explaining the three independent choices: parameter cadence, model time axis, observation cadence.
  • Worked example pairing weekly Rt (alignment="calendar_week", week_start_dow=6) with weekly observations (period_end_dow=5).

Where to focus review attention

  • Offset math in _compute_period_offset and the period-boundary check in _validate_period_end_times — these govern correctness of weekly alignment.
  • StepwiseTemporalProcess calendar-week broadcasting: weekly_to_daily is reused. As a sanity check, n_timepoints=17, first_day_dow=3, week_start_dow=6 should produce 3 coarse samples broadcast to [c0×3, c1×7, c2×7].
  • The three coherence rules in _validate_coherence — each has pass + distinct-failure tests in test_pyrenew_builder.py.
  • CountObservation._aggregate runs inside the numpyro-traced graph — likelihood scoring is at weekly scale, not post-hoc.
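
The broadcasting sanity check in the second bullet can be reproduced by hand with a short sketch. The helper names below are hypothetical and the code does not call pyrenew.time.weekly_to_daily; it only illustrates the expected block layout, assuming day-of-week is coded 0=Monday through 6=Sunday.

```python
def n_coarse_needed(n_timepoints, first_day_dow, week_start_dow):
    """Number of calendar weeks touched by the daily axis."""
    days_into_week = (first_day_dow - week_start_dow) % 7
    return (n_timepoints + days_into_week + 6) // 7  # ceiling division


def broadcast_weekly_to_daily(coarse, n_timepoints, first_day_dow, week_start_dow):
    """Repeat each weekly value over the days of its calendar week."""
    # How far into its calendar week the first model day already is.
    days_into_week = (first_day_dow - week_start_dow) % 7
    return [coarse[(t + days_into_week) // 7] for t in range(n_timepoints)]


n_coarse = n_coarse_needed(17, first_day_dow=3, week_start_dow=6)
daily = broadcast_weekly_to_daily(
    ["c0", "c1", "c2"], 17, first_day_dow=3, week_start_dow=6
)
print(n_coarse, daily)
```

The first block is short because the axis starts on day 3 of a week that began on day 6, leaving only three days before the next week boundary.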

Incidental fixes (not directly tied to #789)

Found while implementing aggregation; small enough that separating them would create churn.

  • _validate_index_array empty-array guard (observation/base.py) — jnp.any(indices < 0) on empty arrays returned False; now returns early on size == 0.
  • _validate_index_array / _validate_obs_dense ndim check — previously accepted non-1D arrays and relied on silent broadcasting; now rejects with a clear error.
  • AR(p) stationary-SD bound in test_ar_process_asymptotics — the bound |long_ts[-1]| < 3 * noise_sd was too tight (stationary SD strictly > innovation SD when autoregressive coefficients are non-zero) and latently flaky; replaced with closed-form stationary SD per order.
  • _validate_obs_times_shape → _validate_shapes_match rename: prerequisite refactor; same shape-match logic needed for (obs, period_end_times) pairs.
  • _typos.toml: allow dows. pyproject.toml: register integration pytest marker.
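
For the AR(p) bullet above, the standard closed forms for orders 1 and 2 show why a bound based on the innovation SD is too tight. This is a sketch of the textbook formulas with hypothetical function names; the exact expressions used in test_ar_process_asymptotics are not shown in this description.

```python
import math


def ar1_stationary_sd(phi, noise_sd):
    """Stationary SD of x_t = phi * x_{t-1} + e_t with SD(e_t) = noise_sd."""
    if abs(phi) >= 1:
        raise ValueError("AR(1) is stationary only for |phi| < 1")
    return noise_sd / math.sqrt(1.0 - phi**2)


def ar2_stationary_sd(phi1, phi2, noise_sd):
    """Stationary SD of x_t = phi1 * x_{t-1} + phi2 * x_{t-2} + e_t."""
    if not (abs(phi2) < 1 and phi1 + phi2 < 1 and phi2 - phi1 < 1):
        raise ValueError("coefficients are outside the AR(2) stationarity region")
    variance = (1.0 - phi2) * noise_sd**2 / (
        (1.0 + phi2) * (1.0 - phi1 - phi2) * (1.0 + phi1 - phi2)
    )
    return math.sqrt(variance)


# With phi = 0.6 the stationary SD is about 1.25 times the innovation SD.
print(ar1_stationary_sd(0.6, 1.0))
print(ar2_stationary_sd(0.5, 0.3, 1.0))
```

As the coefficients approach the boundary of the stationarity region the ratio grows without bound, so a fixed multiple of noise_sd cannot serve as a reliable test threshold.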

Collaborator

dylanhmorris commented Apr 24, 2026

All observations sharing an aggregation_period > 1 must agree on period_end_dow.
Every temporal-process step_size must be a positive integer.
Calendar-week-aligned temporal processes must have week_start_dow consistent with weekly period_end_dow: period_end_dow == (week_start_dow + 6) % 7.

High-level comment: do we definitely need to enforce all weekly quantities sharing the same week?

I agree that in many cases a user will want this, but it's not a given. For example, you could imagine two weekly aggregate observables, one reported in MMWR epiweeks, the other in isoweeks. Similarly, while matching temporal process weeks to observation weeks makes sense to me as a default, I don't think we should enforce it.
https://github.com/CDCgov/PyRenew/pull/794/changes#r3140086415
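
To make the epiweek example concrete: MMWR weeks run Sunday through Saturday and ISO weeks run Monday through Sunday, so the same calendar date falls in weekly periods with different end days. The sketch below uses only the standard library; week_end is a hypothetical helper, and the 0=Monday coding follows Python's date.weekday().

```python
from datetime import date, timedelta


def week_end(d, period_end_dow):
    """Last day of the weekly period containing d (0=Monday ... 6=Sunday)."""
    return d + timedelta(days=(period_end_dow - d.weekday()) % 7)


MMWR_END_DOW = 5  # Saturday
ISO_END_DOW = 6   # Sunday

d = date(2026, 4, 22)  # a Wednesday
mmwr_end = week_end(d, MMWR_END_DOW)
iso_end = week_end(d, ISO_END_DOW)
print(mmwr_end.isoformat(), iso_end.isoformat())
```

Two weekly observables reported on these two calendars would therefore carry period_end_dow values of 5 and 6 respectively, which a rule requiring a single shared value would reject.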

@cdc-mitzimorris
Collaborator Author

@damonbayer and @dylanhmorris - I have addressed all comments.
Substantive changes:

  • Weekly quantities no longer need to share the same weekly alignment.
  • Split StepwiseTemporalProcess into WeeklyTemporalProcess and StepwiseTemporalProcess. The former handles weekly process logic, including the start day of week; the latter simply handles signals observed at some number of timesteps > 1.

Ready for re-review.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 42 out of 42 changed files in this pull request and generated 3 comments.


Comment on lines +554 to +560
n_steps = self._resolve_n_coarse(n_timepoints)
coarse = self.inner.sample(
    n_timepoints=n_steps,
    initial_value=initial_value,
    n_processes=n_processes,
    name_prefix=name_prefix,
)

Copilot AI Apr 28, 2026

StepwiseTemporalProcess.sample accepts first_day_dow but does not forward it to the inner process. This breaks valid compositions like StepwiseTemporalProcess(inner=WeeklyTemporalProcess(...)) where the inner requires first_day_dow at sample time. Forward first_day_dow to inner.sample (it can be ignored by processes that don't use it) or explicitly forbid inners that require a calendar anchor with a clear error.
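
A minimal sketch of the first option (forwarding the argument) is shown below. Both classes are hypothetical stand-ins written for illustration; they are not the PyRenew classes and omit initial_value, n_processes, and name_prefix.

```python
class CalendarAnchoredInner:
    """Stand-in for an inner process that needs first_day_dow at sample time."""

    def sample(self, n_timepoints, first_day_dow=None):
        if first_day_dow is None:
            raise ValueError("first_day_dow is required")
        return [float(i) for i in range(n_timepoints)]


class ForwardingStepwise:
    """Stand-in wrapper that repeats each coarse value step_size times."""

    def __init__(self, inner, step_size):
        self.inner = inner
        self.step_size = step_size

    def sample(self, n_timepoints, first_day_dow=None):
        n_steps = -(-n_timepoints // self.step_size)  # ceiling division
        # Forward the calendar anchor; inner processes that do not need it
        # can simply ignore the argument.
        coarse = self.inner.sample(n_timepoints=n_steps, first_day_dow=first_day_dow)
        return [coarse[t // self.step_size] for t in range(n_timepoints)]


wrapper = ForwardingStepwise(CalendarAnchoredInner(), step_size=7)
daily = wrapper.sample(10, first_day_dow=3)
print(daily)
```

Without the forwarded argument, the inner sample call in this sketch raises immediately, which mirrors the failure mode described in the comment.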

Comment on lines +831 to +855
if self.reporting_schedule == "regular":
    if obs is None:
        return
    n_periods = self._n_periods(n_total, first_day_dow)
    obs = jnp.asarray(obs)
    if obs.ndim != 2:
        raise ValueError(
            f"Observation '{self.name}': regular-schedule obs must "
            f"be 2D (n_periods, n_observed_subpops); got shape {obs.shape}"
        )
    if obs.shape[0] != n_periods:
        raise ValueError(
            f"Observation '{self.name}': obs dimension 0 length "
            f"{obs.shape[0]} must equal n_periods ({n_periods}). "
            f"Pad with NaN for unobserved periods."
        )
    if subpop_indices is not None:
        n_observed = jnp.asarray(subpop_indices).shape[0]
        if obs.shape[1] != n_observed:
            raise ValueError(
                f"Observation '{self.name}': obs dimension 1 length "
                f"{obs.shape[1]} must equal len(subpop_indices) "
                f"({n_observed})"
            )
    return

Copilot AI Apr 28, 2026

SubpopulationCounts.validate_data allows obs to be provided with subpop_indices=None (regular and irregular schedules), but SubpopulationCounts.sample always raises when subpop_indices is missing. This can let validate_data pass while sampling fails later. Either require subpop_indices in validate_data whenever obs is not None, or define a default behavior (e.g., all subpops) and implement it consistently in sample/validate_data.
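
The first remedy (requiring subpop_indices whenever obs is given) could look roughly like the sketch below. The function validate_counts is hypothetical, works on nested lists rather than JAX arrays, and is not the SubpopulationCounts.validate_data implementation.

```python
def validate_counts(obs, subpop_indices):
    """Fail at validation time instead of later, during sampling."""
    if obs is None:
        return  # nothing to validate, e.g. prior-predictive use
    if subpop_indices is None:
        raise ValueError("subpop_indices is required when obs is provided")
    n_columns = len(obs[0]) if obs else 0
    if n_columns != len(subpop_indices):
        raise ValueError(
            f"obs dimension 1 length {n_columns} must equal "
            f"len(subpop_indices) ({len(subpop_indices)})"
        )


validate_counts(None, None)               # accepted: no observations
validate_counts([[1.0, 2.0]], [0, 3])     # accepted: shapes agree
```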

Collaborator

@damonbayer damonbayer left a comment

One small suggestion. Thanks @cdc-mitzimorris!

@cdc-mitzimorris cdc-mitzimorris merged commit c7a447d into main Apr 28, 2026
8 checks passed
@cdc-mitzimorris cdc-mitzimorris deleted the mem_789_temporal_aggregation branch April 28, 2026 18:50