Stream processing checkpointing and recovery hardening
We invested in the durability story for the Flink stream processor after observing missed views during deploys.
- Checkpointing intervals, tolerable failure counts and state backend settings were tuned and externalized as environment variables.
- The Kafka source offset behavior was made explicit so the first deployment after a restart resumes at the right position instead of skipping ahead to latest.
- View finalization logic was rewritten to ensure each session is emitted exactly once, even across operator failures and replays.
- Together, these changes eliminate the class of incidents where views were not captured during Flink downtime or when the service came back up.
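
The three changes above can be illustrated with a minimal sketch. It is not the production code. The environment variable names, the defaults, and the `ViewFinalizer` and `starting_offset` helpers are hypothetical. The Flink configuration keys are real options (`state.backend.type` is the key in recent Flink releases; older releases use `state.backend`). Falling back to the earliest offset is an assumption about what "the right position" means when no committed offset exists; in the Flink Kafka connector this would correspond to `OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST)`.

```python
import os

# Hypothetical env var names mapped to real Flink configuration keys.
# The defaults are illustrative assumptions, not the tuned production values.
ENV_TO_FLINK = {
    "CHECKPOINT_INTERVAL_MS": ("execution.checkpointing.interval", "60000"),
    "CHECKPOINT_TOLERABLE_FAILURES": (
        "execution.checkpointing.tolerable-failed-checkpoints",
        "3",
    ),
    "STATE_BACKEND": ("state.backend.type", "rocksdb"),
}


def checkpoint_settings(env=os.environ):
    """Build Flink configuration entries from environment variables."""
    settings = {}
    for env_name, (flink_key, default) in ENV_TO_FLINK.items():
        settings[flink_key] = env.get(env_name, default)
    # Flink duration options accept a unit suffix such as "60000 ms".
    settings["execution.checkpointing.interval"] += " ms"
    return settings


def starting_offset(committed, earliest):
    """Resume from the committed offset; fall back to earliest, never latest."""
    return committed if committed is not None else earliest


class ViewFinalizer:
    """Emit each session at most once, even when the same session is replayed."""

    def __init__(self):
        # In a Flink job this set would live in checkpointed keyed state so
        # that it survives operator failures; here it is plain memory.
        self.finalized = set()

    def finalize(self, session_id, emit):
        """Call emit(session_id) unless this session was already finalized."""
        if session_id in self.finalized:
            return False
        self.finalized.add(session_id)
        emit(session_id)
        return True
```

A replayed session is dropped by the finalizer because its ID is already recorded. End-to-end exactly-once delivery additionally depends on that record being restored from the same checkpoint as the source offsets, and on the sink being idempotent or transactional.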