Monitoring

Overview

The monitoring system tracks chaos test execution, system health, and performance metrics.

Metrics Collected

System Health

  • CPU and memory usage of AI engine workers

  • Queue depth and processing latency

  • API response times

  • Error rates and types
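
To make "error rates and types" concrete, the sketch below computes an overall error rate and a per-type breakdown from a batch of request outcomes. The function name, the input shape (None for success, a string for the error type), and the example type names are assumptions for illustration, not part of the platform's API.

```python
from collections import Counter
from typing import Iterable, Optional


def error_rates(outcomes: Iterable[Optional[str]]) -> dict[str, float]:
    """Return the share of requests that failed, overall and per error type.

    Each item is None for a successful request, or a hypothetical
    error-type string such as "timeout" for a failed one.
    """
    outcomes = list(outcomes)
    total = len(outcomes)
    if total == 0:
        # No traffic in the window: report a zero rate rather than divide by zero.
        return {"overall": 0.0}
    by_type = Counter(o for o in outcomes if o is not None)
    rates = {kind: count / total for kind, count in by_type.items()}
    rates["overall"] = sum(by_type.values()) / total
    return rates
```

Reporting the breakdown next to the overall rate helps separate a broad outage from a spike in a single error type.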

Test Execution

  • Active test count

  • Test completion rate

  • Average test duration

  • Success/failure ratios
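
The test execution metrics above can be derived from per-test records, as in this minimal sketch. The `ChaosRun` record and its field names are hypothetical, and "completion rate" is interpreted here as the fraction of started tests that have finished; the platform may define it differently (for example, as completions per unit of time).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChaosRun:
    """Hypothetical record of one chaos test; field names are assumptions."""
    test_id: str
    started_at: float                     # Unix seconds
    finished_at: Optional[float] = None   # None while the test is still active
    succeeded: Optional[bool] = None      # None until the test finishes


def summarize(runs: list[ChaosRun]) -> dict[str, float]:
    """Compute active count, completion rate, average duration, success ratio."""
    finished = [r for r in runs if r.finished_at is not None]
    durations = [r.finished_at - r.started_at for r in finished]
    successes = sum(1 for r in finished if r.succeeded)
    return {
        "active_tests": float(len(runs) - len(finished)),
        "completion_rate": len(finished) / len(runs) if runs else 0.0,
        "avg_duration_seconds": sum(durations) / len(durations) if durations else 0.0,
        "success_ratio": successes / len(finished) if finished else 0.0,
    }
```

Only finished tests contribute to the duration and success figures, so long-running active tests do not skew the averages.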

Token Economics

  • Transaction volume

  • Fee collection rate

  • Stake amounts and duration

  • Governance participation rate

Implementation

Prometheus Integration
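
Prometheus collects metrics by scraping an HTTP endpoint that returns them in its text exposition format. As a rough illustration, the sketch below renders gauges in that format and serves them at `/metrics` using only the Python standard library. The metric names, help strings, values, and port are hypothetical placeholders; a production service would more likely use an official Prometheus client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metric names and values; a real exporter would read live state.
GAUGES = {
    "chaos_active_tests": ("Number of chaos tests currently running", 3),
    "chaos_queue_depth": ("Jobs waiting in the AI engine queue", 17),
}


def render_metrics(gauges: dict) -> str:
    """Render gauges as HELP, TYPE, and sample lines in the text format."""
    lines = []
    for name, (help_text, value) in sorted(gauges.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the rendered metrics at /metrics and return 404 elsewhere."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(GAUGES).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever serving metrics; the port is an arbitrary example."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint; Prometheus would then need a scrape target pointing at that host and port.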

Grafana Dashboards

  • System Overview

  • Test Execution Metrics

  • Token Economics

  • Governance Activity

Alerting Rules
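
An alerting rule typically pairs a threshold with a hold duration, so that a brief spike does not page anyone. The sketch below models that idea in Python, loosely following the behavior of the `for` clause in Prometheus alerting rules. The rule name, threshold, and sample data are hypothetical, and the evaluation logic is a simplification rather than Prometheus's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    """Hypothetical rule: fire when a value stays above threshold long enough."""
    name: str
    threshold: float
    for_seconds: float


def firing(rule: AlertRule, samples: list[tuple[float, float]], now: float) -> bool:
    """Return True if the latest unbroken breach has lasted at least for_seconds.

    samples is a list of (timestamp, value) pairs sorted by ascending timestamp.
    """
    breach_start = None
    for ts, value in samples:
        if value > rule.threshold:
            if breach_start is None:
                breach_start = ts      # first sample of a new breach
        else:
            breach_start = None        # any healthy sample resets the breach
    return breach_start is not None and now - breach_start >= rule.for_seconds
```

For example, with `AlertRule("HighErrorRate", 0.05, 300)`, an error rate that exceeds 5% in a single sample does not fire; it must stay above 5% across samples for five minutes.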

Setup Instructions

  1. Install Dependencies:

  2. Configure Prometheus:

  3. Start Monitoring:

Best Practices

  1. Regular Metric Review

  • Monitor error rates daily

  • Review performance weekly

  • Analyze token metrics monthly

  2. Alert Configuration

  • Set appropriate thresholds

  • Avoid alert fatigue

  • Document response procedures

  3. Dashboard Organization

  • Group related metrics

  • Use clear labels

  • Include helpful descriptions

Next Steps
