TYPE: TOOL LANG: MARKDOWN

monitoring

Set up monitoring, logging, and observability for applications. Use when implementing dashboards, alerting, metrics collection, or troubleshooting.

DIRECTIVE_TEXTUELLE

Monitoring Skill

Implement monitoring, logging, and observability for applications.

When to Use

Use this skill when the user wants to:

  • Set up metrics collection
  • Create monitoring dashboards
  • Configure alerting rules
  • Implement structured logging
  • Set up distributed tracing
  • Work with observability tools
  • Performance monitoring

Monitoring Stack Options

Agent-based

  • Prometheus: Metrics collection
  • Grafana: Visualization and dashboards
  • Node Exporter: Node metrics
  • CAdvisor: Container metrics

Serverless

  • CloudWatch: AWS monitoring
  • Cloud Logging: Google Cloud Logging
  • App Insights: Azure Application Insights

Distributed Tracing

  • Jaeger: OpenTracing compatible
  • Zipkin: Distributed tracing
  • OpenTelemetry: Tracing standard

Metrics Types

Metrics

  • Counter: Monotonically increasing
  • Gauge: Can go up or down
  • Histogram: Value distribution
  • Summary: Value distribution + quantiles

Logging

  • Structured logs (JSON)
  • Log levels (ERROR, WARN, INFO, DEBUG)
  • Correlation IDs for tracing
  • Log rotation and retention

Best Practices

  • Log everything important: Errors, warnings, user actions
  • Use meaningful log levels: Don’t spam with DEBUG
  • Structure logs: JSON format with fields
  • Don’t log secrets: Never log passwords, tokens, API keys
  • Add context: Include request ID, user info, timestamps
  • Monitor what matters: Core business metrics, errors, SLIs

Alerting Strategy

  • Alarm on anomalies: Systematic errors, unusual patterns
  • Set reasonable thresholds: Avoid alert fatigue
  • Escalate appropriately: Critical issues get attention
  • Keep SLOs in mind: Alerts should align with SLOs

Deliverables

  • Monitoring configuration files
  • Dashboard definition
  • Alerting rules
  • Log aggregation setup
  • Documentation

Quality Checklist

  • Critical metrics are monitored
  • Alerts are actionable
  • Logs are structured and searchable
  • Metrics are properly labeled
  • Distributed tracing is configured
  • SLO/SLA alignment
[RETOUR_AUX_MODULES]

>> AUTRES_MODULES