Closing the race window in SSE log streaming
After moving from database polling to a broker for log delivery, I found another small gap in the flow…
The previous version was already much better because live logs no longer depended on polling PostgreSQL every 500ms. But there was still a race window between catching up on missed logs from the database and subscribing to the broker: the handler read history first and only then subscribed, so a log published between those two steps was neither in the history it had just read nor on the channel it had not yet opened. It simply never reached the browser.
So I changed the order of the SSE flow: subscribe first, catch up on missed logs from the database, drain any broker events that arrived during catch-up, then continue with the live stream. That sequence feels a lot safer because the handoff from persisted logs to real-time logs is now much tighter.
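To make the ordering concrete, here is a minimal sketch of that handler flow in Go. The `Broker` and `Store` interfaces, and names like `Subscribe` and `LogsAfter`, are illustrative stand-ins, not my actual code:

```go
package logstream

import (
	"fmt"
	"net/http"
)

// LogEvent is an illustrative event shape: a monotonically increasing
// log ID plus the log line itself.
type LogEvent struct {
	ID   int64
	Line string
}

// Broker and Store are hypothetical interfaces standing in for the real
// broker and the PostgreSQL layer.
type Broker interface {
	Subscribe(deploymentID string) (<-chan LogEvent, func())
}

type Store interface {
	LogsAfter(deploymentID string, afterID int64) ([]LogEvent, error)
}

func streamLogs(w http.ResponseWriter, r *http.Request, broker Broker, store Store, deploymentID string, lastID int64) {
	w.Header().Set("Content-Type", "text/event-stream")
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}

	// 1. Subscribe first, so nothing published during catch-up can be missed.
	events, cancel := broker.Subscribe(deploymentID)
	defer cancel()

	// 2. Catch up on persisted logs the client has not seen yet.
	missed, err := store.LogsAfter(deploymentID, lastID)
	if err != nil {
		http.Error(w, "catch-up failed", http.StatusInternalServerError)
		return
	}
	for _, ev := range missed {
		writeSSE(w, ev)
		lastID = ev.ID
	}

	// 3. Drain whatever arrived on the channel while we were catching up,
	//    skipping duplicates the database already gave us.
drain:
	for {
		select {
		case ev, open := <-events:
			if !open {
				return
			}
			if ev.ID > lastID {
				writeSSE(w, ev)
				lastID = ev.ID
			}
		default:
			break drain
		}
	}
	flusher.Flush()

	// 4. Continue with the live stream until the client disconnects or the
	//    broker drops us (e.g. as a slow subscriber).
	for {
		select {
		case ev, open := <-events:
			if !open {
				return
			}
			if ev.ID > lastID {
				writeSSE(w, ev)
				lastID = ev.ID
				flusher.Flush()
			}
		case <-r.Context().Done():
			return
		}
	}
}

// writeSSE emits one SSE message; the "id:" field is what the browser
// echoes back as Last-Event-ID when it reconnects.
func writeSSE(w http.ResponseWriter, ev LogEvent) {
	fmt.Fprintf(w, "id: %d\ndata: %s\n\n", ev.ID, ev.Line)
}
```

The `ev.ID > lastID` check in steps 3 and 4 is what makes the handoff tight: anything already sent from the database is skipped when it shows up again on the channel.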
I also added support for Last-Event-ID, so when the browser reconnects it can continue from the last log it already received instead of starting blindly from scratch. That makes the SSE setup feel more like a real stream now, not just a best-effort live feed.
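Resuming is cheap because the `id:` field of each SSE message already carries the numeric log ID, so on reconnect the handler only has to parse one header. A small sketch, under the same assumptions as above:

```go
package logstream

import (
	"net/http"
	"strconv"
)

// lastEventID extracts the ID the browser echoes back on reconnect.
// It assumes the "id:" field of each SSE message carries the numeric
// log ID, as in the streaming sketch above.
func lastEventID(r *http.Request) int64 {
	raw := r.Header.Get("Last-Event-ID")
	if raw == "" {
		return 0 // first connection: catch up from the beginning
	}
	id, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return 0 // unparseable ID: safest is a full catch-up
	}
	return id
}
```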
Another small thing I fixed here was handling slow subscribers. If a subscriber cannot keep up, I now close that channel and let the client reconnect with Last-Event-ID. That felt better than keeping a lagging subscriber around and silently dropping events forever.
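Roughly, the broker side looks like this: a sketch assuming one buffered channel per subscriber, where a full buffer means the subscriber gets dropped. The buffer size and all names here are made up for illustration:

```go
package broker

import "sync"

type LogEvent struct {
	ID   int64
	Line string
}

type Broker struct {
	mu   sync.Mutex
	subs map[string]map[chan LogEvent]struct{} // deploymentID -> subscribers
}

func New() *Broker {
	return &Broker{subs: make(map[string]map[chan LogEvent]struct{})}
}

func (b *Broker) Subscribe(deploymentID string) (<-chan LogEvent, func()) {
	ch := make(chan LogEvent, 64) // buffer absorbs short bursts
	b.mu.Lock()
	if b.subs[deploymentID] == nil {
		b.subs[deploymentID] = make(map[chan LogEvent]struct{})
	}
	b.subs[deploymentID][ch] = struct{}{}
	b.mu.Unlock()

	cancel := func() {
		b.mu.Lock()
		defer b.mu.Unlock()
		// Only close if Publish hasn't already dropped this subscriber.
		if _, ok := b.subs[deploymentID][ch]; ok {
			delete(b.subs[deploymentID], ch)
			close(ch)
		}
	}
	return ch, cancel
}

func (b *Broker) Publish(deploymentID string, ev LogEvent) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for ch := range b.subs[deploymentID] {
		select {
		case ch <- ev:
		default:
			// Subscriber can't keep up: close it instead of silently
			// dropping events; the client reconnects with Last-Event-ID
			// and catches up from the database.
			delete(b.subs[deploymentID], ch)
			close(ch)
		}
	}
}
```

Closing the channel is what the SSE handler sketch earlier reacts to: it sees the channel close, returns, and the browser's reconnect with Last-Event-ID picks up cleanly from the database.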
There was also a bit of cleanup in the job code itself. I extracted helper flows for failing and finishing a deployment, and I made some command failure paths return earlier so the deployment status, logs, and broker events stay more consistent.
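For illustration, the shape of that cleanup looks something like this; `failDeployment`, `finishDeployment`, the stubbed methods, and the commands are all hypothetical placeholders. The point is that every failure path returns early through one helper that updates the status, the final log line, and the broker event together:

```go
package deploy

import (
	"context"
	"fmt"
)

// deployJob and its methods are hypothetical stand-ins for the real job code.
type deployJob struct{ /* db handle, broker, deployment ID, ... */ }

func (j *deployJob) runCommand(ctx context.Context, cmd string) error { return nil }
func (j *deployJob) setStatus(ctx context.Context, s string)          {}
func (j *deployJob) log(ctx context.Context, line string)             {}
func (j *deployJob) publish(ctx context.Context, event string)        {}

func (j *deployJob) run(ctx context.Context) error {
	// Each command failure returns early through failDeployment, so the
	// status, the persisted logs, and the broker events can't drift apart.
	if err := j.runCommand(ctx, "git pull"); err != nil {
		return j.failDeployment(ctx, fmt.Errorf("git pull: %w", err))
	}
	if err := j.runCommand(ctx, "docker compose up -d --build"); err != nil {
		return j.failDeployment(ctx, fmt.Errorf("compose up: %w", err))
	}
	return j.finishDeployment(ctx)
}

func (j *deployJob) failDeployment(ctx context.Context, cause error) error {
	j.setStatus(ctx, "failed")
	j.log(ctx, "deployment failed: "+cause.Error())
	j.publish(ctx, "deployment.failed")
	return cause
}

func (j *deployJob) finishDeployment(ctx context.Context) error {
	j.setStatus(ctx, "succeeded")
	j.log(ctx, "deployment finished")
	j.publish(ctx, "deployment.finished")
	return nil
}
```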
So this one was less about adding something new on the surface, and more about making the real-time path less fragile. The main insight here is that even after switching to pub/sub, the tricky part is still the boundary between history and live events… that handoff needs to be handled carefully or logs can still go missing.