Monitoring

Overview

Comprehensive monitoring system for tracking chaos test execution, system health, and performance metrics.

Metrics Collected

System Health

  • CPU/Memory usage of AI engine workers

  • Queue depth and processing latency

  • API response times

  • Error rates and types

Test Execution

  • Active test count

  • Test completion rate

  • Average test duration

  • Success/failure ratios

Token Economics

  • Transaction volume

  • Fee collection rate

  • Stake amounts and duration

  • Governance participation rate

Implementation

Prometheus Integration

import { Registry, Counter, Gauge } from 'prom-client';

// Initialize metrics
const registry = new Registry();
const activeTests = new Gauge({
    name: 'glitch_active_tests',
    help: 'Number of currently running chaos tests'
});
const testCompletions = new Counter({
    name: 'glitch_test_completions_total',
    help: 'Total number of completed tests'
});

Grafana Dashboards

  • System Overview

  • Test Execution Metrics

  • Token Economics

  • Governance Activity

Alerting Rules

groups:
  - name: glitch-alerts
    rules:
      - alert: HighErrorRate
        expr: rate(glitch_errors_total[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
      - alert: QueueBacklog
        expr: glitch_queue_depth > 100
        for: 5m
        labels:
          severity: critical

Setup Instructions

  1. Install Dependencies:

npm install prom-client winston @opentelemetry/api
  1. Configure Prometheus:

scrape_configs:
  - job_name: 'glitch-gremlin'
    static_configs:
      - targets: ['localhost:9090']
  1. Start Monitoring:

import { startMetricsServer } from './monitoring';
await startMetricsServer(9090);

Best Practices

  1. Regular Metric Review

  • Monitor error rates daily

  • Review performance weekly

  • Analyze token metrics monthly

  1. Alert Configuration

  • Set appropriate thresholds

  • Avoid alert fatigue

  • Document response procedures

  1. Dashboard Organization

  • Group related metrics

  • Use clear labels

  • Include helpful descriptions

Next Steps

Last updated