# OTEL / Observability Integration Notes
> **Last updated**: 2026-06-16
> **Author**: Agent Zero analysis
> **Scope**: All `curated_compose` stacks
---
## TL;DR
- **SigNoz** is the central OTEL backend (traces, metrics, logs, APM in one platform).
- **Alloy** and **LGTM** are deprecated — replaced by SigNoz.
- **n8n** → SigNoz directly (✅ working).
- **Zitadel** → SigNoz directly (✅ working).
- **Langfuse** → SigNoz (Langfuse's own self-traces, ✅ working).
- **Headroom** → Langfuse (intentional — LLM-specific observability).
- **Chroma** → ❌ not wired (env vars exist but compose ignores them).
- **Dify** → ❌ no OTEL support yet.
---
## Stack-by-Stack Telemetry Status
| Stack | Sends to SigNoz? | Sends to Langfuse? | Configured? | Notes |
|-------|-----------------|-------------------|-------------|-------|
| **SigNoz** | — | — | ✅ | Receives OTLP gRPC `:4317`, HTTP `:4318` |
| **n8n** | ✅ Yes | ❌ No | ✅ Active | `OTEL_EXPORTER_OTLP_ENDPOINT=http://lgtm:4318` on main + worker; `lgtm` resolves to SigNoz via the network alias |
| **Langfuse** | ✅ Yes | — | ✅ Active | Own traces to SigNoz; stack at `docker/langfuse/compose.yaml` |
| **Headroom** | ❌ No | ✅ Yes | ✅ Active | `OTEL_EXPORTER_OTLP_ENDPOINT=http://langfuse-web:3000/api/public/otel/v1` |
| **Chroma** | ❌ No | ❌ No | ❌ Not wired | `.env.example` has `CHROMA_OPEN_TELEMETRY__ENDPOINT`, compose ignores it |
| **Dify** | ❌ No | ❌ No | ❌ None | No OTEL env vars in compose or `.env.example` |
| **Zitadel** | ✅ Yes | ❌ No | ✅ Active | `ZITADEL_INSTRUMENTATION_TRACE_EXPORTER_ENDPOINT=http://lgtm:4318`; `lgtm` resolves to SigNoz via the network alias |
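
The SigNoz row lists OTLP over gRPC on `:4317` and HTTP on `:4318`. For orientation, a stock OpenTelemetry Collector exposes those two ports with a receiver block like the sketch below. This is generic collector syntax, not a copy of the config shipped in `docker/signoz/`, so treat the bind addresses as assumptions.

```yaml
# Generic OpenTelemetry Collector receiver section (illustrative only;
# the config actually shipped in docker/signoz/ may differ).
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # OTLP/gRPC
      http:
        endpoint: 0.0.0.0:4318   # OTLP/HTTP
```
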
---
## Architecture
```
Headroom Proxy ──OTEL──→ Langfuse ──OTEL──→ SigNoz
                         └── ClickHouse (analytics)
                         └── Postgres (metadata)
                         └── Redis (queues)
                         └── MinIO (S3 storage)
n8n (main + worker) ──OTEL──→ SigNoz
Zitadel ──────────────OTEL──→ SigNoz
[Chroma] ──❌──→ SigNoz
[Dify] ──❌──→ SigNoz
```
### Why Headroom → Langfuse (not direct to SigNoz)?
Langfuse is purpose-built for LLM observability: it tracks cost per token, prompt versions, user attribution, and other LLM-specific metrics that SigNoz doesn't natively understand. Headroom's traces are therefore most valuable inside Langfuse.

Langfuse then exports its own internal traces to SigNoz for infrastructure-wide correlation.

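
As a rough sketch of the Headroom side, the fragment below pairs the endpoint from the status table with the Basic-auth header that Langfuse's OTLP ingestion expects (a base64-encoded `public key:secret key` pair). The service name `headroom`, the `LANGFUSE_AUTH_B64` variable, and the assumption that Headroom honors `OTEL_EXPORTER_OTLP_HEADERS` are all illustrative and not taken from `docker/headroom/compose.yaml`.

```yaml
services:
  headroom:                      # hypothetical service name
    environment:
      # Endpoint as recorded in the status table above.
      OTEL_EXPORTER_OTLP_ENDPOINT: http://langfuse-web:3000/api/public/otel/v1
      # Assumption: Headroom reads the standard OTLP headers variable.
      # LANGFUSE_AUTH_B64 is a hypothetical .env entry holding
      # base64("<langfuse public key>:<langfuse secret key>").
      OTEL_EXPORTER_OTLP_HEADERS: "Authorization=Basic ${LANGFUSE_AUTH_B64}"
```
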
---
## Migration Notes (Alloy + LGTM → SigNoz)
### What changed
- **Removed**: `docker/alloy/` stack (OTEL collector) and `docker/lgtm/` stack (Grafana all-in-one)
- **Added**: `docker/signoz/` stack (all-in-one observability: collector + UI + storage)
- **SigNoz network aliases** on the `pipeline` network: `signoz`, `otel`, `lgtm`. Existing stacks that reference `lgtm:4318` or `otel:4318` continue to work without changes.
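
The alias mechanism can be sketched with Compose's per-network `aliases` key. The service name `otel-collector` and the `external: true` setting are assumptions for illustration; check `docker/signoz/compose.yaml` for the real definitions.

```yaml
services:
  otel-collector:              # hypothetical name for the SigNoz collector service
    networks:
      pipeline:
        aliases:
          - signoz             # preferred hostname going forward
          - otel               # legacy hostname kept for older stacks
          - lgtm               # legacy hostname kept for older stacks

networks:
  pipeline:
    external: true             # assumption: shared network created outside this stack
```

Any container attached to `pipeline` can then reach the collector under all three hostnames, which is why older endpoint values keep resolving.
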
### Cross-stack endpoint mapping
| Old | New | Notes |
|-----|-----|-------|
| `alloy:4317` (gRPC) | `signoz:4317` (gRPC) | Same port, new host |
| `alloy:4318` / `lgtm:4318` (HTTP) | `signoz:4318` (HTTP) | Same port, new host |
| `lgtm:3000` (Grafana UI) | `signoz:3301` (SigNoz UI) | Different port |
### Stacks that need endpoint updates
Stacks with hardcoded `lgtm:4318` in their compose will still resolve via the `lgtm` alias on the `pipeline` network. No immediate changes required, but consider updating to `signoz:4318` for clarity:
- `docker/n8n/docker-compose.yaml`: `OTEL_EXPORTER_OTLP_ENDPOINT=http://lgtm:4318`
- `docker/langfuse/compose.yaml`: `OTEL_EXPORTER_OTLP_ENDPOINT=http://lgtm:4318`
- `docker/chroma/compose.yaml`: `CHROMA_OPEN_TELEMETRY__ENDPOINT=http://lgtm:4318` (takes effect only once the variable is passed into the service; see Known Issues)
- `docker/zitadel/compose.yaml`: `ZITADEL_INSTRUMENTATION_TRACE_EXPORTER_ENDPOINT=http://lgtm:4318`
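
An endpoint update of this kind might look like the following for n8n. The service names `n8n` and `n8n-worker` are assumed from the "main + worker" note in the status table, and the `${VAR:-default}` form mirrors the pattern used in the Chroma fix further down rather than anything confirmed in the n8n compose file.

```yaml
services:
  n8n:                         # assumed name of the main service
    environment:
      # Fall back to the SigNoz hostname when .env leaves the variable unset.
      OTEL_EXPORTER_OTLP_ENDPOINT: ${OTEL_EXPORTER_OTLP_ENDPOINT:-http://signoz:4318}
  n8n-worker:                  # assumed name of the worker service
    environment:
      OTEL_EXPORTER_OTLP_ENDPOINT: ${OTEL_EXPORTER_OTLP_ENDPOINT:-http://signoz:4318}
```
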
---
## Known Issues / Action Items
### 🔴 Chroma — OTEL Not Wired
**Problem**: `docker/chroma/.env.example` defines:
```
CHROMA_OPEN_TELEMETRY__ENDPOINT=
CHROMA_OPEN_TELEMETRY__SERVICE_NAME=chromadb
OTEL_EXPORTER_OTLP_HEADERS=
```
But `docker/chroma/compose.yaml` does **not** pass these env vars into the `chroma` service.

**Fix**: Add the following under the `chroma` service's `environment:` key in `compose.yaml`:
```yaml
CHROMA_OPEN_TELEMETRY__ENDPOINT: ${CHROMA_OPEN_TELEMETRY__ENDPOINT:-http://signoz:4318}
CHROMA_OPEN_TELEMETRY__SERVICE_NAME: ${CHROMA_OPEN_TELEMETRY__SERVICE_NAME:-chromadb}
OTEL_EXPORTER_OTLP_HEADERS: ${OTEL_EXPORTER_OTLP_HEADERS:-}
```
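
The default `http://signoz:4318` only resolves if the `chroma` service is attached to the `pipeline` network where the SigNoz aliases live. If it is not attached already, a fragment along these lines would add it. Whether the Chroma stack already joins `pipeline`, and whether the network is declared as external, are assumptions to verify against the actual compose file.

```yaml
services:
  chroma:
    networks:
      - default                # keep the stack-local network
      - pipeline               # shared network carrying the signoz/otel/lgtm aliases

networks:
  pipeline:
    external: true             # assumption: created outside this stack
```
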
### 🟡 Dify — No OTEL Support
**Problem**: Dify doesn't expose OTEL configuration natively. It is Python/Flask-based, but the current compose has neither auto-instrumentation nor manual instrumentation.

**Recommendation**: Wait for upstream Dify to add native OTEL support. Per the `SKILL.md` conventions, do not create custom patches.

### 🟡 Unified Log Collection
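All stacks emit container logs. To collect them centrally in SigNoz/Loki:

- **Option A** (simplest): Configure the Docker daemon on Unraid to use the Loki log driver globally.
- **Option B** (per-stack): Add a Promtail sidecar to each compose file.

Both options push to a Loki endpoint, so confirm that one is still reachable now that the `lgtm` stack has been removed.

**Recommendation**: Option A, because it is configured once at the Docker daemon level.

---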
## Files Referenced
| File | Purpose |
|------|---------|
| `docker/signoz/compose.yaml` | SigNoz observability stack (replaces Alloy + LGTM) |
| `docker/signoz/.env.example` | SigNoz config |
| `docker/chroma/compose.yaml` | Chroma vector DB stack |
| `docker/chroma/.env.example` | Chroma config (OTEL vars present) |
| `docker/dify/docker-compose.yaml` | Dify LLM platform |
| `docker/dify/.env.example` | Dify config (no OTEL vars) |
| `docker/headroom/compose.yaml` | Headroom LLM proxy |
| `docker/langfuse/compose.yaml` | Langfuse observability |
| `docker/n8n/docker-compose.yaml` | n8n automation |
| `docker/n8n/.env.example` | n8n config (OTEL vars present) |
| `docker/zitadel/compose.yaml` | Zitadel IAM |
| `SKILL.md` | Homelab conventions and design rules |