Status
All systems operational.
Last updated 2026-04-23 08:44 UTC · Checked every 30 seconds.
API: Operational
Web app: Operational
Webhooks: Operational
Docs: Operational
Authentication (SSO, SAML, SCIM): Operational

Recent incidents
Each comes with a full post-mortem. We publish these voluntarily; it's the only way to earn back trust after breaking something.
Delayed webhook delivery in EU region
Outbound webhooks from the EU (eu-frankfurt-1) region were delayed by 6–18 minutes for 44 minutes.
Post-mortem
A stuck consumer in our webhook delivery service backed up the EU queue. We caught it on an alert and restarted the consumer group. All queued webhooks delivered; no loss. Action items: added a saturation alert two standard deviations below the existing one, and an automated restart of stuck consumers after five minutes.
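The automated restart of stuck consumers can be sketched as a small watchdog over queue-lag samples: if lag is non-zero and has made no progress for five minutes, restart the consumer group. This is a sketch under assumptions; `StuckDetector` and the per-minute sampling cadence are illustrative, not the actual delivery service.

```typescript
// Hypothetical watchdog: decides, from periodic queue-lag samples, whether a
// consumer group looks stuck and should be restarted.
class StuckDetector {
  private lastProgressAt: number;
  private lastLag: number;

  constructor(private stuckAfterMs: number, now: number, initialLag: number) {
    this.lastProgressAt = now;
    this.lastLag = initialLag;
  }

  // Record a new lag sample. Returns true when the consumer looks stuck:
  // lag is non-zero and has not decreased for `stuckAfterMs`.
  sample(now: number, lag: number): boolean {
    if (lag < this.lastLag || lag === 0) {
      // The consumer made progress (or drained the queue): reset the clock.
      this.lastProgressAt = now;
    }
    this.lastLag = lag;
    return lag > 0 && now - this.lastProgressAt >= this.stuckAfterMs;
  }
}

// A supervising loop would call `sample` once a minute and, on `true`,
// trigger a consumer-group restart instead of waiting for a human.
```

The detector resets on any decrease in lag, so a slow but live consumer never trips it; only a consumer that holds a non-empty queue flat for the full window does.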
Partial outage on /api/v1/issues (POST)
POST /api/v1/issues returned 502 for approximately 11% of requests for 37 minutes.
Post-mortem
A deploy introduced a regression in our request validation middleware that caused a panic when the `cycle` field contained a specific nil-ish value. Rolled back at 10:21 UTC. Root cause: an earlier refactor replaced a null check with optional chaining that still evaluated the right-hand side. Action items: added a test case for the nil-ish cycle value, and a pre-commit lint rule that blocks the pattern from landing again.
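This bug class is easy to reproduce. A minimal sketch with hypothetical payload and field shapes (`trim()` stands in for whatever the middleware did with the value): optional chaining guards only the value immediately to the left of each `?.`, so a chain that replaces an explicit two-level null check can still evaluate its right-hand side on a nil-ish inner value.

```typescript
// Payloads arrive as untyped JSON from the request body, so `any` here
// mirrors what validation middleware actually receives.

// Before the refactor: explicit checks cover both `cycle` and `cycle.id`.
function cycleIdSafe(payload: any): string | null {
  if (payload.cycle != null && payload.cycle.id != null) {
    return payload.cycle.id.trim();
  }
  return null;
}

// After the refactor: `cycle?.id` short-circuits only when `cycle` itself is
// nullish. When `cycle` exists with `id: null`, `.trim()` is still evaluated
// on null and throws a TypeError.
function cycleIdBuggy(payload: any): string | null {
  return payload.cycle?.id.trim() ?? null;
}
```

`payload.cycle?.id?.trim()` (a second `?.` on the inner access) would have preserved the original null check's behavior.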
Login latency elevated globally
Login p95 latency rose from 280ms to 2.4s for 38 minutes.
Post-mortem
An index we rely on in the sessions table had not been rebuilt after the previous weekend's migration. Rebuilt the index and latency returned to baseline within three minutes. Action items: an automated index-health check in the pre-deploy pipeline, and a weekly VACUUM ANALYZE on the sessions table.
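One way such an index-health check can work in Postgres (an assumption about the mechanism, not necessarily this team's pipeline): indexes left INVALID by a failed or interrupted build show up in the `pg_index` catalog with `indisvalid = false`, so a pre-deploy gate can query for them and refuse to proceed. The decision logic is kept as a pure function so it is testable without a database.

```typescript
// Finds indexes Postgres has marked invalid; such indexes are not used by
// the planner, so queries silently fall back to slower plans.
const INVALID_INDEX_SQL = `
  SELECT c.relname AS index_name
  FROM pg_index i
  JOIN pg_class c ON c.oid = i.indexrelid
  WHERE NOT i.indisvalid;
`;

interface IndexRow { index_name: string }

// Pure gate: given the rows the query returned, decide whether the deploy
// may proceed and which indexes blocked it.
function deployGate(rows: IndexRow[]): { ok: boolean; blocked: string[] } {
  const blocked = rows.map((r) => r.index_name);
  return { ok: blocked.length === 0, blocked };
}
```

A pipeline step would run `INVALID_INDEX_SQL` against the production replica, feed the rows to `deployGate`, and fail the deploy (naming the offending indexes) when `ok` is false.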