Monitoring

YokedCache exposes health checks, in-process stats, and pluggable metrics collectors for Prometheus, StatsD, and OpenTelemetry.


Health checks

# Boolean—is the backend reachable?
is_healthy = await cache.health()

# Full diagnostics
info = await cache.detailed_health_check()

detailed_health_check() returns:

{
    "status": "healthy",          # "healthy" or "unhealthy"
    "backend_type": "redis",
    "redis_connected": True,
    "connection_pool": {
        "available": 48,
        "in_use": 2,
        "max": 50,
    },
    "circuit_breaker": {
        "state": "closed",        # "closed", "open", or "half_open"
        "failure_count": 0,
    },
    "hit_rate": 0.87,
    "uptime_seconds": 7200,
}

FastAPI health endpoint

from fastapi.responses import JSONResponse

@app.get("/health")
async def health():
    info = await cache.detailed_health_check()
    status_code = 200 if info["status"] == "healthy" else 503
    return JSONResponse(info, status_code=status_code)
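Because the endpoint returns 503 when unhealthy, it slots directly into orchestrator health probes. A sketch for Kubernetes (the port and probe timings are illustrative, not YokedCache defaults):

```yaml
# Deployment container spec fragment: probe the /health endpoint above
readinessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 10
  failureThreshold: 3
```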

CLI

yokedcache ping              # connection check
yokedcache stats             # snapshot of current stats
yokedcache stats --watch     # live refresh every 2s

In-process stats

Without any extras:

stats = await cache.get_stats()

print(f"Hit rate:      {stats.hit_rate:.1%}")
print(f"Miss rate:     {stats.miss_rate:.1%}")
print(f"Keys:          {stats.key_count}")
print(f"Memory:        {stats.memory_usage_mb:.1f} MB")
print(f"Total ops:     {stats.total_operations}")
print(f"Hits:          {stats.cache_hits}")
print(f"Misses:        {stats.cache_misses}")
print(f"Uptime:        {stats.uptime_seconds}s")
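The rate fields are plain arithmetic over the counters. A minimal sketch, assuming hit_rate is hits divided by total lookups (the helper is illustrative, not part of the library):

```python
def rates(cache_hits: int, cache_misses: int) -> tuple[float, float]:
    """Compute hit and miss rates from raw hit/miss counters."""
    lookups = cache_hits + cache_misses
    if lookups == 0:
        return 0.0, 0.0  # no traffic yet; avoid division by zero
    return cache_hits / lookups, cache_misses / lookups

hit_rate, miss_rate = rates(12470, 1530)
print(f"{hit_rate:.1%}")  # 89.1%
```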

Metrics (Prometheus + StatsD)

Requires pip install "yokedcache[observability]" or pip install "yokedcache[monitoring]".

Available metrics

| Metric | Type | Description |
|---|---|---|
| cache.gets.total | Counter | Total GET operations |
| cache.sets.total | Counter | Total SET operations |
| cache.deletes.total | Counter | Total DELETE operations |
| cache.hits.total | Counter | Cache hits |
| cache.misses.total | Counter | Cache misses |
| cache.hit_rate | Gauge | Current hit rate (0–1) |
| cache.size_bytes | Gauge | Approximate memory usage |
| cache.keys_count | Gauge | Number of keys |
| cache.operation_duration_seconds | Histogram | Operation latency |
| cache.invalidations.total | Counter | Total invalidations |
| cache.errors.total | Counter | Total errors |
| cache.circuit_breaker_state | Gauge | 0=closed, 1=half-open, 2=open |

Prometheus

from yokedcache import YokedCache
from yokedcache.config import CacheConfig
from yokedcache.monitoring import CacheMetrics, PrometheusCollector

metrics = CacheMetrics([
    PrometheusCollector(
        namespace="myapp",   # metrics prefixed with "myapp_cache_"
        port=9100,           # metrics endpoint port
    )
])

cache = YokedCache(CacheConfig(redis_url="redis://..."), metrics=metrics)

Metrics endpoint: http://localhost:9100/metrics

Sample output:

# HELP myapp_cache_hits_total Total cache hits
# TYPE myapp_cache_hits_total counter
myapp_cache_hits_total 12470.0

# HELP myapp_cache_hit_rate Current cache hit rate
# TYPE myapp_cache_hit_rate gauge
myapp_cache_hit_rate 0.891

# HELP myapp_cache_operation_duration_seconds Cache operation duration
# TYPE myapp_cache_operation_duration_seconds histogram
myapp_cache_operation_duration_seconds_bucket{operation="get",le="0.001"} 10240.0
myapp_cache_operation_duration_seconds_bucket{operation="get",le="0.01"} 13890.0
myapp_cache_operation_duration_seconds_bucket{operation="get",le="+Inf"} 14000.0

Custom labels

PrometheusCollector(
    namespace="myapp",
    port=9100,
    labels={
        "environment": "production",
        "region": "us-east-1",
        "service": "user-api",
    },
)

Prometheus scrape config

# prometheus.yml
scrape_configs:
  - job_name: 'yokedcache'
    static_configs:
      - targets: ['app:9100']
    scrape_interval: 15s

Useful PromQL queries

# Hit rate over the last 5 minutes
rate(myapp_cache_hits_total[5m])
  /
rate(myapp_cache_gets_total[5m])

# 95th percentile GET latency
histogram_quantile(
  0.95,
  rate(myapp_cache_operation_duration_seconds_bucket{operation="get"}[5m])
)

# Error rate
rate(myapp_cache_errors_total[5m])

# Invalidations per second
rate(myapp_cache_invalidations_total[5m])
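If the hit-rate expression is reused across dashboards and alerts, a Prometheus recording rule computes it once (the rule and file names are illustrative):

```yaml
# rules.yml: precompute the 5m hit rate
groups:
  - name: yokedcache_recording
    rules:
      - record: myapp:cache_hit_rate:rate5m
        expr: |
          rate(myapp_cache_hits_total[5m])
          /
          rate(myapp_cache_gets_total[5m])
```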

Alerting rules

# alerts.yml
groups:
  - name: yokedcache
    rules:
      - alert: CacheHitRateLow
        expr: |
          rate(myapp_cache_hits_total[5m])
          /
          rate(myapp_cache_gets_total[5m]) < 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Cache hit rate below 80% for 5 minutes"
          description: "Current rate: {{ $value | humanizePercentage }}"

      - alert: CacheLatencyHigh
        expr: |
          histogram_quantile(0.95,
            rate(myapp_cache_operation_duration_seconds_bucket{operation="get"}[5m])
          ) > 0.01
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Cache p95 GET latency above 10ms"

      - alert: CacheCircuitBreakerOpen
        expr: myapp_cache_circuit_breaker_state == 2
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Cache circuit breaker is open—backend may be down"

      - alert: CacheErrorRateHigh
        expr: rate(myapp_cache_errors_total[5m]) > 1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Cache error rate above 1/s"
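Prometheus only evaluates these rules once the file is referenced from the main configuration:

```yaml
# prometheus.yml
rule_files:
  - alerts.yml
```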

StatsD

from yokedcache.monitoring import CacheMetrics, StatsDCollector

metrics = CacheMetrics([
    StatsDCollector(
        host="statsd.example.com",
        port=8125,
        prefix="myapp.cache",
        sample_rate=1.0,
    )
])

Datadog / DogStatsD with tags:

StatsDCollector(
    host="localhost",
    port=8125,
    prefix="myapp.cache",
    use_tags=True,  # enables DogStatsD tag format
)

Metrics are emitted in real time:

myapp.cache.gets:1|c|#result:hit
myapp.cache.gets:1|c|#result:miss
myapp.cache.hit_rate:0.89|g
myapp.cache.operation_duration:0.002|h|#operation:get
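Each line above is a single UDP datagram. A minimal sketch of the DogStatsD wire format using only the standard library (the helper and metric names are illustrative, not part of YokedCache):

```python
import socket

def send_statsd(metric, value, mtype, tags=None,
                host="127.0.0.1", port=8125):
    """Format and send one metric: <name>:<value>|<type>[|#tag:val,...]."""
    payload = f"{metric}:{value}|{mtype}"
    if tags:  # DogStatsD tag extension
        payload += "|#" + ",".join(f"{k}:{v}" for k, v in tags.items())
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("utf-8"), (host, port))
    return payload

print(send_statsd("myapp.cache.gets", 1, "c", tags={"result": "hit"}))
# myapp.cache.gets:1|c|#result:hit
```

Because UDP is fire-and-forget, emitting metrics never blocks the cache's hot path even when the StatsD server is down.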

OpenTelemetry

Requires pip install "yokedcache[tracing]".

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider

from yokedcache import YokedCache
from yokedcache.config import CacheConfig
from yokedcache.monitoring import CacheMetrics, OpenTelemetryCollector

tracer_provider = TracerProvider()
trace.set_tracer_provider(tracer_provider)

metrics = CacheMetrics([OpenTelemetryCollector(tracer_provider=tracer_provider)])
cache = YokedCache(CacheConfig(...), metrics=metrics)

Each cache operation becomes a span:

cache.get [2.1ms]
  ├── key: "user:42"
  ├── hit: true
  └── backend: redis

Combining collectors

Run Prometheus and StatsD simultaneously:

from yokedcache.monitoring import CacheMetrics, PrometheusCollector, StatsDCollector

metrics = CacheMetrics([
    PrometheusCollector(namespace="myapp", port=9100),
    StatsDCollector(host="statsd.example.com", port=8125, prefix="myapp.cache"),
])

cache = YokedCache(CacheConfig(...), metrics=metrics)

Grafana dashboard

Key panels to include:

| Panel | Query |
|---|---|
| Hit rate (gauge) | rate(cache_hits_total[5m]) / rate(cache_gets_total[5m]) |
| Operations/sec (graph) | rate(cache_gets_total[1m]) + rate(cache_sets_total[1m]) |
| p50 / p95 / p99 latency | histogram_quantile(0.95, rate(cache_operation_duration_seconds_bucket[5m])) |
| Key count | cache_keys_count |
| Memory usage | cache_size_bytes / 1024 / 1024 |
| Error rate | rate(cache_errors_total[5m]) |
| Circuit breaker state | cache_circuit_breaker_state |
| Invalidations/sec | rate(cache_invalidations_total[5m]) |

What to watch

| Signal | Healthy | Warning |
|---|---|---|
| Hit rate | > 80% | < 60% |
| GET p95 latency | < 5ms | > 20ms |
| Error rate | 0 | > 0.1/s |
| Circuit breaker | closed | half-open / open |
| Key evictions | 0 | > 0/s (Redis out of memory) |
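These thresholds are easy to encode in an automated check. A sketch for the hit-rate signal, reporting the band between the two thresholds as "watch" (the tier names are an assumption, not library behavior):

```python
def classify_hit_rate(hit_rate: float) -> str:
    # Thresholds from the table above: > 80% healthy, < 60% warning.
    if hit_rate > 0.80:
        return "healthy"
    if hit_rate < 0.60:
        return "warning"
    return "watch"  # in between: degraded but not yet alarming

print(classify_hit_rate(0.87))  # healthy
print(classify_hit_rate(0.55))  # warning
```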
