System Status — Reference
The System Status surface is the internal-operator dashboard for platform health. It surfaces uptime indicators, per-service health checks, recent incidents, and the bridge to the externally published status page. Anything that answers the "is this part of SGEN healthy right now?" question for an operator traces back to a view managed here.
This page is a reference for platform engineers and integrators who need to understand the surface before extending it, scripting against it, or wiring it into an internal dashboard. Customer-facing how-tos live in the customer docs set; this page describes the shape of the surface, not the steps to drive it.
Overview
System Status lives under the System Status module in SG-Admin. The module renders three primary views: the overview dashboard (a per-service traffic-light grid), the incident timeline (recent and active incidents with timestamps and operator notes), and the configuration page (status-page sync settings and notification routes). It also exposes a small set of write operations: open, update, and close an incident; mute and unmute a service; and save the status-page sync and notification-route configuration.
The module reads from two upstream sources. Service health is consumed from the platform's internal health-check signal, a periodic probe of each service that returns a structured verdict of healthy, degraded, or failed. Recent incidents are consumed from the incident store, which is appended to by operators during recovery work and by the automated incident-detection layer.
A second surface — status-page sync — bridges System Status to the externally published status page. When sync is enabled, incident lifecycle events (open, update, close) are mirrored to the public status page, with operator-supplied summaries softened to customer-facing language before publication.
Where it lives in SG-Admin:
- Sidebar: SG-Admin → System Status
- URL prefix: /sg-admin/system-status/
- View templates: application/views/Admin/SystemStatus/
┌──────────────────────────────────────────────────────────────────────┐
│ SG-Admin → System Status                          Last refresh: 0:32 │
├──────────────────────────────────────────────────────────────────────┤
│ Service               Status      Last check   Uptime (30d)          │
│ ───────────────────   ─────────   ───────────  ────────────────      │
│ Admin (SG-Admin)      ● healthy   0:31 ago     99.98%                │
│ Public site renderer  ● healthy   0:30 ago     99.99%                │
│ Builder (SG-Builder)  ● healthy   0:32 ago     99.95%                │
│ Media pipeline        ◐ degraded  0:29 ago     99.71%  ← active      │
│ Backups worker        ● healthy   0:30 ago     99.99%                │
│ Email delivery        ● healthy   0:31 ago     99.92%                │
│ Activity Log ingest   ● healthy   0:32 ago     99.99%                │
│                                                                      │
│ Active incident: Media pipeline - increased latency    [Open ticket] │
└──────────────────────────────────────────────────────────────────────┘

Actions
The System Status surface exposes the following operations. Each is described by what it does to the data, not by its internal method name.
Read overview
Returns the current health verdict for every monitored service, the last-check timestamp per service, and the rolling uptime percentages for 24-hour and 30-day windows. Health verdicts are consumed from the upstream health-check signal and are typically less than two minutes stale.
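As a rough illustration of how a rolling uptime figure and the freshness check could be derived from the verdict stream, the sketch below treats each probe as one sample. The function names, the choice to count a degraded check as "up", and the two-minute threshold as a hard cutoff are all assumptions made for the example, not the platform's documented calculation.

```python
from datetime import datetime, timedelta

# Assumption: a check counts as "up" unless its verdict is "failed".
UP_VERDICTS = {"healthy", "degraded"}


def rolling_uptime(checks, window: timedelta, now: datetime):
    """Percent of checks inside the window whose verdict counts as up.

    `checks` is an iterable of (checked_at, verdict) pairs. Returns None
    when the window holds no checks, so a caller can show "no data".
    """
    cutoff = now - window
    verdicts = [verdict for checked_at, verdict in checks if cutoff <= checked_at <= now]
    if not verdicts:
        return None
    up = sum(1 for verdict in verdicts if verdict in UP_VERDICTS)
    return round(100.0 * up / len(verdicts), 2)


def is_stale(checked_at: datetime, now: datetime, max_age: timedelta = timedelta(minutes=2)) -> bool:
    """True when a verdict is older than the assumed two-minute freshness target."""
    return now - checked_at > max_age
```

A dashboard integration might call `rolling_uptime` once with a 24-hour window and once with a 30-day window, and flag any service for which `is_stale` returns true.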
Read incident timeline
Returns the recent incident records, ordered newest first, with open / closed status, affected services, summary, and operator notes. Supports filtering by status (active only, closed only, all) and by service.
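The filtering and ordering described above can be sketched over plain dictionaries. The function name, the dictionary keys (borrowed from the conceptual data model), the use of `opened_at` as the sort key, and the service slugs in the sample data are assumptions for illustration only.

```python
from datetime import datetime

VALID_STATUS_FILTERS = ("all", "active", "closed")


def filter_timeline(incidents, status="all", service=None):
    """Return incidents matching the filters, newest first (by opened_at)."""
    if status not in VALID_STATUS_FILTERS:
        raise ValueError(f"unknown status filter: {status!r}")
    selected = [
        incident
        for incident in incidents
        if (status == "all" or incident["status"] == status)
        and (service is None or incident["service"] == service)
    ]
    return sorted(selected, key=lambda incident: incident["opened_at"], reverse=True)


# Hypothetical sample records; the slugs are made up for the example.
incidents = [
    {"id": 1, "service": "media-pipeline", "status": "closed", "opened_at": datetime(2024, 1, 1, 8, 0)},
    {"id": 2, "service": "email-delivery", "status": "active", "opened_at": datetime(2024, 1, 2, 9, 0)},
    {"id": 3, "service": "media-pipeline", "status": "active", "opened_at": datetime(2024, 1, 3, 10, 0)},
]
active_media = filter_timeline(incidents, status="active", service="media-pipeline")
```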
Open an incident
Creates a new incident record. Required at minimum: affected service, summary, and severity (sev1, sev2, sev3, or sev4). Optional: detail body, expected resolution time, and the mirror-to-status-page flag. On submit, the record is written and, if the incident is mirrored (sev1, or the mirror flag is set), the status page is updated.
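A script that opens incidents may want to check the required fields before submitting. The sketch below is one way to do that; the payload shape, key names, and error wording are assumptions, since this page does not define a request format.

```python
SEVERITIES = ("sev1", "sev2", "sev3", "sev4")
REQUIRED_FIELDS = ("service", "summary", "severity")


def validate_open_request(payload):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field in REQUIRED_FIELDS:
        value = payload.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"{field} is required")
    severity = payload.get("severity")
    # Only check the enum when a non-empty severity was actually supplied.
    if isinstance(severity, str) and severity.strip() and severity not in SEVERITIES:
        errors.append(f"severity must be one of {', '.join(SEVERITIES)}")
    return errors
```

Returning all problems at once, rather than raising on the first, lets a caller report every missing field in a single pass.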
Update an incident
Appends a new note to an existing incident. Notes carry their own timestamp and operator identifier, so the incident reads as a timeline of operator actions. If the incident is mirrored to the public status page, the note is softened to customer-facing language and posted as an update.
Close an incident
Marks the incident closed and records the resolution time. The summary may be edited at close to reflect the final understanding of what happened. Mirrored incidents post a final update to the status page on close.
Mute a service
Suppresses notifications for a single service for a configured duration. Used during planned maintenance to avoid noise from expected failures. The service's health verdict continues to be recorded; only the operator notifications are paused.
Unmute
Reverses a mute. Notifications resume immediately on the next health verdict.
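The mute semantics (verdicts keep being recorded, only notifications pause) can be modelled as below. `MuteRegistry`, `handle_verdict`, and the rule that only non-healthy verdicts trigger a notification are hypothetical and exist only to illustrate the behavior described above.

```python
from datetime import datetime, timedelta


class MuteRegistry:
    """Tracks, per service slug, the time until which notifications are paused."""

    def __init__(self):
        self._muted_until = {}

    def mute(self, service: str, duration: timedelta, now: datetime) -> None:
        self._muted_until[service] = now + duration

    def unmute(self, service: str) -> None:
        self._muted_until.pop(service, None)

    def is_muted(self, service: str, now: datetime) -> bool:
        until = self._muted_until.get(service)
        return until is not None and now < until


def handle_verdict(registry, history, service, verdict, now):
    """Always record the verdict; return True only if operators should be notified."""
    history.append((service, verdict, now))
    if verdict == "healthy":  # assumption: healthy verdicts never notify
        return False
    return not registry.is_muted(service, now)
```

Because the mute is stored as an expiry time, it lapses on its own once the configured duration has passed, and an explicit unmute takes effect on the next verdict.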
Set status-page sync
Stores the configuration for the status-page bridge: target status-page URL, authentication token, mirror-enabled flag, and the severity threshold for auto-mirroring (sev1 always mirrors, sev2 and below are operator-choice per incident).
Set notification routes
Stores where status-change alerts are sent: email distribution list, internal chat channel, on-call rotation reference. Routes are evaluated per incident based on affected service and severity.
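How routes are matched is not spelled out on this page. One plausible reading of "evaluated per incident based on affected service and severity" is sketched below, where each route carries an optional service list and a minimum severity. The route shape, the `min_severity` and `services` keys, and the target strings are all hypothetical.

```python
# Lower rank means more severe, so sev1 outranks sev4.
SEVERITY_RANK = {"sev1": 1, "sev2": 2, "sev3": 3, "sev4": 4}


def matching_routes(routes, service, severity):
    """Return the targets of every route that applies to this incident."""
    rank = SEVERITY_RANK[severity]
    targets = []
    for route in routes:
        services = route.get("services")  # None means "all services"
        if services is not None and service not in services:
            continue
        # Skip routes that only fire for incidents more severe than this one.
        if rank > SEVERITY_RANK[route.get("min_severity", "sev4")]:
            continue
        targets.append(route["target"])
    return targets


# Hypothetical configuration covering the three route kinds listed above.
routes = [
    {"target": "email:status-list", "services": None, "min_severity": "sev4"},
    {"target": "chat:#incidents", "services": None, "min_severity": "sev3"},
    {"target": "oncall:primary", "services": ["media-pipeline"], "min_severity": "sev2"},
]
```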
Data model
An incident record carries the following fields. Field names below are the conceptual shape — the on-disk column names match closely but are not contractually stable across releases.
| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key. Stable across edits. |
| service | string | Service slug. One of the monitored service identifiers. |
| severity | enum | One of sev1, sev2, sev3, sev4. |
| status | enum | One of active, closed. |
| summary | string | Short label, customer-readable when softened. |
| detail | string | Longer operator body. Not surfaced on the public status page. |
| opened_at | timestamp | Set on initial open. |
| closed_at | timestamp | Set on close, NULL while active. |
| opened_by | integer | User identifier. |
| mirror_to_status_page | boolean | When true, this incident's lifecycle posts to the public status page. |

An incident note, stored in an append-only stream, carries the following fields.

| Field | Type | Notes |
|---|---|---|
| incident_id | integer | Foreign key to the incident record. |
| note | string | Body. |
| posted_at | timestamp | When the note was added. |
| posted_by | integer | User identifier. |
| mirrored | boolean | Whether this note was posted to the public status page. |

A service health record, consumed from the read-only verdict stream, carries the following fields.

| Field | Type | Notes |
|---|---|---|
| service | string | Service slug. |
| verdict | enum | One of healthy, degraded, failed. |
| checked_at | timestamp | Probe time. |
| latency_ms | integer | Optional, when the check measures response time. |
| error_code | string | Optional, when the verdict is degraded or failed. |
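For integrators who want typed records on the client side, the three conceptual shapes can be mirrored as dataclasses. This is a sketch of the conceptual fields only: the class names and default values are assumptions, and the on-disk columns may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Severity(str, Enum):
    SEV1 = "sev1"
    SEV2 = "sev2"
    SEV3 = "sev3"
    SEV4 = "sev4"


class IncidentStatus(str, Enum):
    ACTIVE = "active"
    CLOSED = "closed"


class Verdict(str, Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    FAILED = "failed"


@dataclass
class Incident:
    id: int
    service: str
    severity: Severity
    status: IncidentStatus
    summary: str
    opened_at: datetime
    opened_by: int
    detail: str = ""                      # assumed default; never mirrored
    closed_at: Optional[datetime] = None  # None while the incident is active
    mirror_to_status_page: bool = False   # assumed default


@dataclass(frozen=True)  # frozen to reflect the append-only note stream
class IncidentNote:
    incident_id: int
    note: str
    posted_at: datetime
    posted_by: int
    mirrored: bool = False


@dataclass(frozen=True)  # frozen to reflect the read-only health stream
class HealthCheck:
    service: str
    verdict: Verdict
    checked_at: datetime
    latency_ms: Optional[int] = None
    error_code: Optional[str] = None
```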
INCIDENT RECORD
├── id                       integer     primary key
├── service                  string      slug
├── severity                 enum        sev1 | sev2 | sev3 | sev4
├── status                   enum        active | closed
├── summary                  string      softened for public mirror
├── detail                   string      operator body, not mirrored
├── opened_at / closed_at    timestamp
├── opened_by                integer
└── mirror_to_status_page    boolean

INCIDENT NOTES (append-only)
├── incident_id (FK)
├── note · posted_at · posted_by
└── mirrored (boolean)

SERVICE HEALTH (read-only stream)
├── service · verdict (healthy | degraded | failed)
├── checked_at · latency_ms
└── error_code (optional)

Permissions
Access to the System Status surface is gated at two layers.
Layer 1 — admin gate. Every action under SG-Admin passes through the platform's standard admin access check at request entry. An unauthenticated request never reaches the System Status surface.
Layer 2 — per-action capability. Within SG-Admin, each System Status action checks a capability associated with the calling operator's role. The default role configuration ships with three roles — Administrator, Editor, Viewer — and the capability map is:
| Capability | Administrator | Editor | Viewer |
|---|---|---|---|
| Read overview | ✔ | ✔ | ✔ |
| Read incident timeline | ✔ | ✔ | ✔ |
| Open an incident | ✔ | ✔ | — |
| Update an incident | ✔ | ✔ | — |
| Close an incident | ✔ | ✔ | — |
| Mute a service | ✔ | — | — |
| Unmute | ✔ | — | — |
| Set status-page sync | ✔ | — | — |
| Set notification routes | ✔ | — | — |
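The default capability map above can be expressed as a lookup table for use in scripts or tests. The capability slugs and the `can` helper are hypothetical names. The role sets are transcribed from the table and apply only to the default role configuration.

```python
ALL_ROLES = frozenset({"Administrator", "Editor", "Viewer"})
WRITERS = frozenset({"Administrator", "Editor"})
ADMIN_ONLY = frozenset({"Administrator"})

# Capability slug -> roles allowed in the default configuration.
CAPABILITIES = {
    "read_overview": ALL_ROLES,
    "read_incident_timeline": ALL_ROLES,
    "open_incident": WRITERS,
    "update_incident": WRITERS,
    "close_incident": WRITERS,
    "mute_service": ADMIN_ONLY,
    "unmute_service": ADMIN_ONLY,
    "set_status_page_sync": ADMIN_ONLY,
    "set_notification_routes": ADMIN_ONLY,
}


def can(role: str, capability: str) -> bool:
    """Deny by default: unknown roles and unknown capabilities return False."""
    return role in CAPABILITIES.get(capability, frozenset())
```

Denying unknown capabilities rather than raising keeps a misspelled slug from ever granting access.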
Severity rules. sev1 incidents always mirror to the public status page regardless of the per-incident mirror flag — the platform contract requires customer transparency for the highest severity. The mirror flag controls behavior for sev2 through sev4 only.
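The severity rule reduces to a small predicate. In the sketch below, the function name is hypothetical, and gating on a global `sync_enabled` switch is an assumption drawn from the statement that mirroring happens when sync is enabled.

```python
SEVERITIES = ("sev1", "sev2", "sev3", "sev4")


def should_mirror(severity: str, mirror_flag: bool, sync_enabled: bool = True) -> bool:
    """Decide whether an incident's lifecycle posts to the public status page."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    if not sync_enabled:  # assumption: nothing mirrors while sync is off
        return False
    # sev1 ignores the per-incident flag; sev2 through sev4 follow it.
    return severity == "sev1" or bool(mirror_flag)
```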
Audit. Every write — open, update, close, mute, unmute, sync configuration change, notification-route change — emits an Activity Log entry. The log records the acting operator, the affected service, and the change shape. The incident note stream is its own audit trail and is preserved alongside the Activity Log.
HEALTH SIGNAL
      │
      ▼
┌───────────────────────────┐
│ Verdict = degraded/failed │   auto-incident OR operator-opened
└─────────────┬─────────────┘
              ▼
┌───────────────────────────┐
│ Incident OPEN             │   severity assigned · routes notified
│                           │   mirror flag evaluated
└─────────────┬─────────────┘
              ▼
       ┌──────────────┐
       │ sev1 OR      │
       │ mirror=true? │
       └──────┬───────┘
          no  │  yes
         ▼         ▼
     private     PUBLIC STATUS PAGE
     incident    posts initial update
                   │
                   ▼
           operator notes appended
                   │
                   ▼
             incident CLOSED
                   │
                   ▼
           final mirror update
              (if mirrored)

Related references
- Activity Log — Reference. Records every System Status write. Notification routes and sync configuration changes show up there.
- Notifications — Reference. Owns the message-delivery surface that System Status calls into. Notification route slugs resolve against the Notifications module.
- Settings — Reference. Hosts the role definitions and the global on-call rotation reference. Changes there reshape the System Status permission map.
- Tools — Reference. Internal operator tooling that pairs with System Status during incident response — log search, diagnostic snapshots, escalation routing.
- Cron Jobs — Reference. The health-check pipeline runs as a cron job. Schedule changes there change the freshness of the verdict stream.
- API Keys — Reference. The status-page sync token is stored alongside other outbound integration credentials.
/docs/system-status customer doc, because the customer surface is the status page.