Stream processing checkpointing and recovery hardening

AUG 19, 2025
Fix · Video Data

We hardened checkpointing and recovery in the Flink stream processor after observing views that went missing during deploys.

  • Checkpoint intervals, tolerable checkpoint failure counts, and state backend settings were tuned and externalized as environment variables.
  • The Kafka source's starting-offset behavior was made explicit, so when the job starts again after a restart or deploy it resumes from the correct position instead of skipping ahead to the latest offset.
  • View finalization logic was rewritten to ensure each session is emitted exactly once, even across operator failures and replays.
  • Together, these changes eliminate the class of incidents in which views went uncaptured while Flink was down or as the service came back up.
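Externalizing the checkpoint settings means the job reads them at startup and should reject bad values before it runs. The sketch below shows one way to do that in plain Python. The environment variable names (`CHECKPOINT_INTERVAL_MS`, `CHECKPOINT_TOLERABLE_FAILURES`, `STATE_BACKEND`) and their defaults are hypothetical and do not come from the processor. The Flink configuration keys they map to assume Flink 1.17 or later naming.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class CheckpointSettings:
    interval_ms: int
    tolerable_failures: int
    state_backend: str

    def to_flink_conf(self) -> Dict[str, str]:
        # Flink configuration keys; assumes Flink 1.17+ naming (state.backend.type).
        return {
            "execution.checkpointing.interval": f"{self.interval_ms}ms",
            "execution.checkpointing.tolerable-failed-checkpoints": str(self.tolerable_failures),
            "state.backend.type": self.state_backend,
        }


def load_checkpoint_settings(env: Mapping[str, str] = os.environ) -> CheckpointSettings:
    """Read hypothetical CHECKPOINT_* / STATE_BACKEND variables, with illustrative defaults."""
    interval_ms = int(env.get("CHECKPOINT_INTERVAL_MS", "60000"))
    tolerable = int(env.get("CHECKPOINT_TOLERABLE_FAILURES", "3"))
    backend = env.get("STATE_BACKEND", "rocksdb").lower()
    # Fail fast at startup instead of running with durability silently misconfigured.
    if interval_ms <= 0:
        raise ValueError("CHECKPOINT_INTERVAL_MS must be positive")
    if tolerable < 0:
        raise ValueError("CHECKPOINT_TOLERABLE_FAILURES must be >= 0")
    if backend not in ("hashmap", "rocksdb"):
        raise ValueError(f"unsupported STATE_BACKEND: {backend}")
    return CheckpointSettings(interval_ms, tolerable, backend)


print(load_checkpoint_settings({"CHECKPOINT_INTERVAL_MS": "30000"}).to_flink_conf())
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test without mutating the process environment.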
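Making the Kafka offset behavior explicit comes down to the precedence a source partition follows when it picks its starting position. The plain-Python model below illustrates that order. The function name and the per-partition offset maps are hypothetical. It assumes Flink's usual behavior: offsets restored from a checkpoint or savepoint take priority over any configured starting offsets. In the Java `KafkaSource` builder, the fallback in steps 2 and 3 would look something like `setStartingOffsets(OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST))`. The changelog does not say which initializer the processor actually uses.

```python
from typing import Mapping


def resolve_start_offset(
    partition: int,
    restored: Mapping[int, int],
    committed: Mapping[int, int],
    earliest: Mapping[int, int],
    latest: Mapping[int, int],
    reset_strategy: str = "earliest",
) -> int:
    """Pick the offset a Kafka source partition should start reading from."""
    # 1. Offsets restored from a checkpoint/savepoint win: they match the
    #    operator state restored alongside them.
    if partition in restored:
        return restored[partition]
    # 2. With no restored state, resume from the consumer group's committed offset.
    if partition in committed:
        return committed[partition]
    # 3. Only a partition with no history at all falls back to the reset strategy.
    if reset_strategy == "earliest":
        return earliest[partition]
    if reset_strategy == "latest":
        return latest[partition]
    raise ValueError(f"unknown reset strategy: {reset_strategy}")


earliest = {0: 0, 1: 0}
latest = {0: 900, 1: 950}
committed = {0: 870}
# Fresh deploy without a savepoint: partition 0 resumes at its committed offset,
# and partition 1 (never committed) starts from the beginning, not at 950.
starts = {p: resolve_start_offset(p, {}, committed, earliest, latest) for p in (0, 1)}
print(starts)  # {0: 870, 1: 0}
```

A "latest" fallback makes step 3 skip everything produced while the job was down. Views lost this way are exactly the symptom the changelog describes.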
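The changelog does not say how exactly-once view emission was implemented. A common approach pairs checkpointed keyed state with a sink that upserts by a deterministic session ID. After a failure, the records emitted since the last checkpoint are replayed, and the upsert turns those replays into overwrites instead of duplicates. The plain-Python simulation below illustrates that idea rather than the actual Flink job. The event shape, `SessionFinalizer`, `IdempotentViewSink`, and `run` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Event = Tuple[str, str, int]  # (kind, session_id, seconds); kind is "hb" or "end"


@dataclass(frozen=True)
class FinalizedView:
    session_id: str
    watch_seconds: int


@dataclass
class SessionFinalizer:
    """Keyed state: accumulated watch time for each still-open session."""
    open_sessions: Dict[str, int] = field(default_factory=dict)

    def process(self, event: Event) -> List[FinalizedView]:
        kind, session_id, seconds = event
        if kind == "hb":
            self.open_sessions[session_id] = self.open_sessions.get(session_id, 0) + seconds
            return []
        # Pop the session so a duplicate "end" event cannot finalize it twice.
        total = self.open_sessions.pop(session_id, None)
        return [] if total is None else [FinalizedView(session_id, total)]

    def snapshot(self) -> Dict[str, int]:
        return dict(self.open_sessions)


class IdempotentViewSink:
    """Upserts by session_id, so a view replayed after a failure overwrites its earlier row."""

    def __init__(self) -> None:
        self.rows: Dict[str, FinalizedView] = {}
        self.writes = 0

    def write(self, view: FinalizedView) -> None:
        self.writes += 1
        self.rows[view.session_id] = view


def run(events: List[Event], start: int, stop: int,
        finalizer: SessionFinalizer, sink: IdempotentViewSink) -> None:
    for offset in range(start, stop):
        for view in finalizer.process(events[offset]):
            sink.write(view)


events: List[Event] = [("hb", "s1", 10), ("hb", "s2", 5), ("end", "s1", 0),
                       ("hb", "s2", 7), ("end", "s2", 0)]
sink = IdempotentViewSink()
finalizer = SessionFinalizer()

run(events, 0, 2, finalizer, sink)
checkpoint = (finalizer.snapshot(), 2)  # operator state and source offset, captured together
run(events, 2, 4, finalizer, sink)      # emits s1, then the job "crashes" before offset 4

state, offset = checkpoint              # recovery: restore state, rewind the source
finalizer = SessionFinalizer(dict(state))
run(events, offset, len(events), finalizer, sink)

print(sink.writes)  # 3: s1 was written twice because of the replay
print(sorted(sink.rows.values(), key=lambda v: v.session_id))
```

The sink ends up with one row per session (`s1` with 10 seconds, `s2` with 12) even though `s1` was emitted twice. A transactional two-phase-commit sink is the usual alternative when the downstream store cannot upsert.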