# Monitoring — logs and sync health
A healthy Polaris Express install has two log streams worth watching:
the server stream (Pino on stdout from the web container) and
the device stream (structured logs shipped from iOS through
Sync runs). This runbook covers day-to-day
inspection, what “healthy” looks like, and which signals matter.
## When to run this

- Daily, briefly — glance at error counts in the admin console.
- After a release — tail logs for 10–15 minutes following a `docker compose up -d`.
- On a user report — pull the affected device’s log timeline.
- Weekly — check `device_logs` table size and retention.
## What you’re looking at

Two producers, one wire format (`OTelLogRecord`):
- web (server)
  - Pino → stdout → `docker logs`
- iOS app (device)
  - swift-log → JSONL ring buffer
  - → `POST /api/devices/me/state/sync`
  - → Postgres `device_logs`
  - → SSE `/api/admin/devices/{id}/logs-stream`
Both sides scrub PII (emails, JWTs, bearer tokens, E.164 phone
numbers, Authorization / Cookie / card_* keys) before anything
hits disk or wire. You do not need to add a scrubber.
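Both streams share the same envelope, so a quick way to get oriented is to pull a single record off the server side and look at its top-level fields. A minimal sketch; the exact attribute set depends on the handler that produced the record:

```sh
# Peek at one record's OTelLogRecord shape. The field names here
# (severity_text, severity_number, body, attributes) are the ones the
# jq filters later in this runbook select on.
docker compose logs --tail=1 web | jq '{severity_text, severity_number, body, attributes}'
```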
## Server logs

### Tail the web container

```sh
docker compose logs -f --tail=200 web
```

Pino emits JSON. For human-readable output, pipe through `pino-pretty`:

```sh
docker compose logs -f --tail=200 web | docker run -i --rm node:20 \
  npx -y pino-pretty
```

### Severity levels
| `severity_text` | Pino name | When to care |
|---|---|---|
| TRACE / DEBUG | trace/debug | Only seen with `LOG_LEVEL` set below the default `info`. Noisy. |
| INFO | info | Normal traffic. Sample, don’t read. |
| WARN | warn | Worth a scan. Often recoverable. |
| ERROR | error | Page-worthy if sustained. Always investigate. |
| FATAL | fatal | Process is dying. Restart loops likely. |
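To see how those levels are distributed in practice, a rough histogram over recent output is usually enough. A sketch using the same `docker compose logs | jq` pattern as the filters below:

```sh
# Rough severity histogram over the last 5000 lines.
docker compose logs --tail=5000 web | jq -r '.severity_text' | sort | uniq -c | sort -rn
```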
### Useful filters

```sh
# Errors and worse, last 1000 lines
docker compose logs --tail=1000 web | jq 'select(.severity_number >= 17)'

# Anything tagged with a specific request id
docker compose logs --tail=5000 web | jq 'select(.attributes.req_id == "abc-123")'

# Slow handlers
docker compose logs --tail=5000 web | \
  jq 'select(.attributes.duration_ms > 1000)'
```

## Device logs
iOS devices buffer logs locally in a 5 MB ring and flush up to 100
records on each Sync run. The server inserts them into `device_logs`
keyed by `(device_id, seq)`, and they appear in the admin console.
### Find them in the admin console

- Sign in to the admin console.
- Open Devices → [device].
- Scroll to the Logs card.
- Use the severity, category, and time-range filters at the top.
- Click Live tail to open the SSE stream.

The card paginates with keyset pagination (`beforeSeq` / `afterSeq`),
so jumping deep into history is cheap.
### Query the API directly

For scripted checks:

```sh
# Most recent 100 errors+ for a device
curl -s --cookie "$ADMIN_COOKIE" \
  "https://admin.example.com/api/admin/devices/$DEVICE_ID/logs?severity=ERROR&limit=100" \
  | jq '.logs[] | {ts: .observed_timestamp, msg: .body, cat: .attributes.category}'

# Live tail with curl (Ctrl-C to stop)
curl -N --cookie "$ADMIN_COOKIE" \
  "https://admin.example.com/api/admin/devices/$DEVICE_ID/logs-stream"
```
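The keyset parameters the admin console uses are available here too. A sketch for paging backwards through history; it assumes each returned row exposes its `seq` and that the endpoint accepts `beforeSeq` alongside `limit`:

```sh
# Find the oldest seq in the most recent page (assumes rows carry "seq").
OLDEST_SEQ=$(curl -s --cookie "$ADMIN_COOKIE" \
  "https://admin.example.com/api/admin/devices/$DEVICE_ID/logs?limit=100" \
  | jq '[.logs[].seq] | min')

# Then fetch the page before it.
curl -s --cookie "$ADMIN_COOKIE" \
  "https://admin.example.com/api/admin/devices/$DEVICE_ID/logs?limit=100&beforeSeq=$OLDEST_SEQ" \
  | jq '.logs[] | {seq, msg: .body}'
```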
## Sync health

Device logs only arrive if devices are syncing. Watch these signals.
### “Healthy” looks like

- Each managed device produces at least one log batch per 15 minutes during business hours.
- `device_logs.observed_ts` for the latest row per device is within the device’s expected Adaptive Cadence window (default ≤ 5 min for active devices).
- The server-side `device_logs_size_alarm` cron has not fired in the last 24 h (a quick check is sketched below).
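For the last item, a grep over the server stream works, assuming the alarm entry carries the cron's name (and that your Docker Compose accepts relative `--since` values):

```sh
# Did the size alarm fire in the last 24 h?
docker compose logs --since 24h web | grep -i 'device_logs_size_alarm'
# Boot-time registration lines are normal; an alarm entry inside the
# window is what you're looking for.
```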
### Quiet-device query

Find devices that haven’t synced in an hour:

```sql
SELECT d.id,
       d.name,
       COALESCE(MAX(dl.observed_ts)::text, 'never') AS last_log
FROM devices d
LEFT JOIN device_logs dl ON dl.device_id = d.id
GROUP BY d.id, d.name
HAVING COALESCE(MAX(dl.observed_ts), '1970-01-01'::timestamptz) < now() - interval '1 hour'
ORDER BY MAX(dl.observed_ts) NULLS FIRST;
```

Causes, in rough order of likelihood:
- Device is offline / app backgrounded for too long.
- App was force-quit; logs are queued but won’t flush until next foreground.
- Cert pinning or DNS broke after a hostname change.
- The device’s auth token expired and the user hasn’t signed back in.
### Server-side ingest errors

Failed `POST /api/devices/me/state/sync` requests will show up in
the server log stream:

```sh
docker compose logs --tail=10000 web | \
  jq 'select(.req.url == "/api/devices/me/state/sync" and .res.statusCode >= 400)'
```

A spike of 401s usually means an auth-token rotation issue; 409s usually mean an idempotency-key collision (benign — the client should retry).
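To tell the two apart at a glance, break recent failures down by status code using the same field paths as the filter above:

```sh
# Count sync failures by status code: a 401 spike and a 409 trickle
# point at very different problems.
docker compose logs --tail=20000 web \
  | jq -r 'select(.req.url == "/api/devices/me/state/sync" and .res.statusCode >= 400) | .res.statusCode' \
  | sort | uniq -c | sort -rn
```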
## Configure environment variables

| Variable | Default | Required | Source | Notes |
|---|---|---|---|---|
| `LOG_LEVEL` | `info` | no | `web/.env` | Pino level: `trace`, `debug`, `info`, `warn`, `error`, `fatal`. |
| `LOG_FORMAT` | `json` | no | `web/.env` | Set to `pretty` only in dev; JSON is what shippers need. |
| `DEVICE_LOG_RETENTION_DAYS` | `7` | no | `web/.env` | Pruned every 6 h by `device_logs_retention_prune`. |
| `DEVICE_LOG_SIZE_ALARM_BYTES` | `1073741824` | no | `web/.env` | 1 GB. Alarms via the daily cron when `device_logs` exceeds this. |
| `SSE_MAX_CONNECTIONS` | `100` | no | `web/.env` | Cap on concurrent `/logs-stream` subscribers. |
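All of these live in `web/.env`. A sketch of what an override block might look like; the non-default values are illustrative, and the `web` container needs a restart to pick up changes:

```sh
# web/.env (logging-related entries; illustrative values)
LOG_LEVEL=info
LOG_FORMAT=json
# Keep two weeks of device logs instead of the default 7 days
DEVICE_LOG_RETENTION_DAYS=14
# 1 GB (the default); raise it if you also raise retention
DEVICE_LOG_SIZE_ALARM_BYTES=1073741824
SSE_MAX_CONNECTIONS=100
```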
## Verify

After a fresh install or upgrade, run through this checklist.

- **Server is logging JSON**

  ```sh
  docker compose logs --tail=5 web | jq '.severity_text' | sort -u
  ```

  Expect a non-empty list (`"INFO"` at minimum). If `jq` complains about parse errors, you have `LOG_FORMAT` set to `pretty` — fine in dev, wrong in prod.

- **Device logs are landing**

  ```sql
  SELECT count(*), max(observed_ts) FROM device_logs;
  ```

  Within 15 minutes of devices being online, `count` should be nonzero and `max(observed_ts)` should be recent.

- **Admin console can read them**

  Open Devices → [any device] → Logs. You should see rows. Click Live tail and watch for a heartbeat or any incoming record.

- **Retention cron is registered**

  ```sh
  docker compose logs web | grep -i 'device_logs_retention_prune'
  ```

  You should see the cron register at boot and fire every 6 h.
## If something goes wrong

**No device logs are landing.** Check the sync endpoint for 4xx/5xx in the server log:

```sh
docker compose logs web | jq 'select((.req.url // "") | test("state/sync"))'
```

If you see only successes but `device_logs` is empty, the client
build may be older than the log-shipping change (pre-`logs?` in the
sync envelope). Update the iOS app.
**Live tail shows nothing despite new rows appearing.**
You’re probably behind a proxy that buffers responses. Disable
buffering for `/api/admin/devices/*/logs-stream`:

```nginx
location ~ ^/api/admin/devices/.+/logs-stream$ {
  proxy_buffering off;
  proxy_cache off;
  proxy_read_timeout 1h;
  proxy_pass http://web;
}
```

Also confirm you aren’t running multiple web replicas (see the
caution above).
**`device_logs` table is enormous.**
Two likely causes:

- Retention cron failed to run. Check
  `docker compose logs web | grep device_logs_retention_prune`.
- A misbehaving device is spamming logs. Find it:

  ```sql
  SELECT device_id, count(*)
  FROM device_logs
  WHERE observed_ts > now() - interval '1 day'
  GROUP BY device_id
  ORDER BY count DESC
  LIMIT 10;
  ```

  If one device dominates, inspect its log stream for a tight error loop and consider remote-disabling it via the admin console until you can patch the app.
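To confirm the table really is the problem, and to watch it shrink once pruning resumes, check its on-disk size. A sketch that assumes the Compose database service is named `db` with the default `postgres` role and database; adjust for your stack:

```sh
# On-disk size of device_logs, including indexes.
docker compose exec -T db psql -U postgres -d postgres -tAc \
  "SELECT pg_size_pretty(pg_total_relation_size('device_logs'));"
```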
**Pino is logging objects as `[Object]`.**
You’re piping through a pretty formatter that doesn’t expand
nested attributes. Use raw JSON + jq instead.
## Audit and rollback

This runbook only performs reads. The only mutation paths are:

- Changing `LOG_LEVEL` — restart `web` to revert.
- Manual `TRUNCATE device_logs` — irreversible. Take a `pg_dump` of the table first if you’re not certain (sketched below).
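A sketch of that backup, assuming the Compose database service is named `db` with the default `postgres` role and database; adjust for your stack:

```sh
# Dump just the device_logs rows before a manual TRUNCATE.
# --data-only keeps the restore simple, since the table itself survives TRUNCATE.
docker compose exec -T db pg_dump -U postgres -d postgres \
  --data-only -t device_logs > device_logs_$(date +%F).sql
```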
## Migration path

When `device_logs` exceeds ~5 GB sustained, or you want full-text
search across iOS and server logs together, swap the sink to
Grafana Loki (or VictoriaLogs) without changing the wire format.
See the log-format reference for the
migration outline.