Summary
There is no monitoring or alerting configured for the Trivela system on mainnet. Contract pause events, backend downtime, Soroban RPC degradation, or abnormal error rates will go undetected until users report them. For a mainnet platform, proactive monitoring is non-negotiable.
Problem
- The backend /health and /metrics endpoints exist, but nothing consumes them and no alerting rules are defined
- No alerting on: contract paused event, backend 5xx spike, RPC health check failure, campaign DB write errors
- No uptime monitoring (e.g., UptimeRobot or BetterStack) is configured
- No runbook for common failure scenarios
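For illustration, here is a minimal TypeScript sketch of the missing uptime check. It polls /health and declares the backend down only after several consecutive failures, so a single dropped request does not page anyone. It assumes a runtime with a global fetch (Node 18+) and that /health returns a 2xx status when healthy. UptimeTracker, probeHealth, and the default threshold of three failures are hypothetical names and values, not existing Trivela code.

```typescript
// Hypothetical uptime tracker: the backend is marked DOWN only after
// `failureThreshold` consecutive failed probes, and a single success
// marks it UP again. Only state changes are reported.
type UptimeState = "UP" | "DOWN";

interface Transition {
  from: UptimeState;
  to: UptimeState;
  at: number; // epoch milliseconds of the probe that caused the change
}

class UptimeTracker {
  private state: UptimeState = "UP";
  private consecutiveFailures = 0;

  constructor(private readonly failureThreshold = 3) {}

  /** Record one probe result; returns a transition only if the state changed. */
  record(ok: boolean, at: number): Transition | null {
    if (ok) {
      this.consecutiveFailures = 0;
      if (this.state === "DOWN") {
        this.state = "UP";
        return { from: "DOWN", to: "UP", at };
      }
      return null;
    }
    this.consecutiveFailures += 1;
    if (this.state === "UP" && this.consecutiveFailures >= this.failureThreshold) {
      this.state = "DOWN";
      return { from: "UP", to: "DOWN", at };
    }
    return null;
  }

  get current(): UptimeState {
    return this.state;
  }
}

// One probe against the health endpoint (the URL is supplied by the caller).
// Network errors, timeouts, and non-2xx statuses all count as failures.
async function probeHealth(url: string, timeoutMs = 5000): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(url, { signal: controller.signal });
    return res.ok;
  } catch {
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```

In use, a scheduler (for example setInterval every 30 seconds) would call probeHealth, pass the result to record(), and send a notification only when a transition is returned. A hosted service such as UptimeRobot or BetterStack would replace this loop entirely. The sketch only shows the logic such a check needs.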
Acceptance Criteria
Prometheus/Grafana (self-hosted option)
- prometheus.yml scrape config targeting the backend /metrics endpoint
- alerting_rules.yml with alerts for:
  - … degraded for > 2 min
- Grafana dashboard (monitoring/dashboards/trivela.json) with: request rate, error rate, uptime, route breakdown
- monitoring/ directory with a compose override (compose.monitoring.yml) for local Prometheus + Grafana
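The "degraded for > 2 min" alert depends on how Prometheus handles a rule's for clause. The rule becomes pending the first time its expression matches, and it fires only once the expression has kept matching on every evaluation for the whole duration. A single non-matching evaluation resets it. The TypeScript model below can help when choosing durations for alerting_rules.yml (e.g. for: 2m). AlertEvaluator is an illustrative name, not a Prometheus API.

```typescript
// Illustrative model of a Prometheus alerting rule with a `for` clause.
type AlertState = "inactive" | "pending" | "firing";

class AlertEvaluator {
  private state: AlertState = "inactive";
  private activeSince: number | null = null; // epoch ms when the condition first held

  constructor(private readonly forMs: number) {}

  /** Evaluate once; `conditionHolds` is the result of the rule's expression. */
  evaluate(conditionHolds: boolean, now: number): AlertState {
    if (!conditionHolds) {
      // Any evaluation where the condition is false resets the rule.
      this.state = "inactive";
      this.activeSince = null;
      return this.state;
    }
    if (this.activeSince === null) {
      this.activeSince = now;
    }
    // Fires once the condition has held for at least `forMs`.
    this.state = now - this.activeSince >= this.forMs ? "firing" : "pending";
    return this.state;
  }
}
```

Suppose the rule is evaluated every 30 seconds with forMs = 120000. The alert then fires on the fifth consecutive degraded evaluation (t = 0, 30, 60, 90, 120 s). A single healthy evaluation in between sends it back to inactive, so a health check that flaps faster than the for duration may never fire.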
Soroban Event Monitoring
- … paused event is indexed (from the indexer in issue #283, Backend: Implement Soroban event indexer worker)
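Here is a sketch of how the paused-event alert might consume the #283 indexer's output. The IndexedEvent shape (id, ledger, contractId, topic) is an assumption, because the indexer's schema is defined in that issue, not here. The code also assumes the Trivela contract emits "paused" as the first topic. Deduplicating by event id keeps the alert from re-firing when the indexer replays a ledger range.

```typescript
// Hypothetical shape of a row produced by the Soroban event indexer (#283).
interface IndexedEvent {
  id: string;         // unique event id, assumed stable across re-indexing
  ledger: number;     // ledger sequence the event was emitted in
  contractId: string; // address of the emitting contract
  topic: string[];    // decoded topic symbols; topic[0] assumed to be the event name
}

interface PauseAlert {
  contractId: string;
  ledger: number;
  eventId: string;
}

class PauseEventWatcher {
  private readonly seen = new Set<string>();

  constructor(private readonly watchedContracts: ReadonlySet<string>) {}

  /** Returns one alert per newly seen `paused` event from a watched contract. */
  process(batch: IndexedEvent[]): PauseAlert[] {
    const alerts: PauseAlert[] = [];
    for (const ev of batch) {
      if (!this.watchedContracts.has(ev.contractId)) continue;
      if (ev.topic[0] !== "paused") continue;
      if (this.seen.has(ev.id)) continue; // replayed event, already alerted
      this.seen.add(ev.id);
      alerts.push({ contractId: ev.contractId, ledger: ev.ledger, eventId: ev.id });
    }
    return alerts;
  }
}
```

The seen set lives in memory, so a restart would re-alert on any paused event still in the indexer's window. Persisting the alerted event ids (or the last processed ledger) next to the indexer's own state would avoid that.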
Runbook
- docs/RUNBOOK.md with procedures for: backend restart, RPC failover, contract pause response, DB backup restore
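For the RPC failover procedure, here is a sketch of selecting the first healthy Soroban RPC endpoint from an ordered list. It calls the RPC server's getHealth JSON-RPC method. Anything other than a result with status "healthy" counts as unhealthy, including HTTP errors and timeouts. The endpoint list, the 5-second timeout, and the function names are assumptions. A real runbook would still need to say how the backend is reconfigured to use the chosen endpoint.

```typescript
type HealthProbe = (endpoint: string) => Promise<boolean>;

// Calls the Soroban RPC `getHealth` method. Errors, timeouts, non-2xx
// responses, and any status other than "healthy" are reported as unhealthy.
const sorobanGetHealth: HealthProbe = async (endpoint) => {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 5000);
  try {
    const res = await fetch(endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "getHealth" }),
      signal: controller.signal,
    });
    if (!res.ok) return false;
    const body = (await res.json()) as { result?: { status?: string } };
    return body.result?.status === "healthy";
  } catch {
    return false;
  } finally {
    clearTimeout(timer);
  }
};

/**
 * Returns the first endpoint (in priority order) that passes the probe,
 * or null if none do. Probing is sequential so the primary is preferred
 * whenever it is healthy.
 */
async function selectRpcEndpoint(
  endpoints: readonly string[],
  probe: HealthProbe = sorobanGetHealth,
): Promise<string | null> {
  for (const endpoint of endpoints) {
    if (await probe(endpoint)) return endpoint;
  }
  return null;
}
```

With sequential probing, each unhealthy endpoint ahead of the first healthy one costs at most one timeout. Probing all endpoints in parallel would be faster but would need extra logic to respect the priority order.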
References
backend/src/index.js (/metrics endpoint, Prometheus format)
compose.yaml
docs/ARCHITECTURE_OVERVIEW.md