diff --git a/OTEL_INTEGRATION_NOTES.md b/OTEL_INTEGRATION_NOTES.md
index 9c85048..d7bf0ab 100644
--- a/OTEL_INTEGRATION_NOTES.md
+++ b/OTEL_INTEGRATION_NOTES.md
@@ -1,6 +1,6 @@
 # OTEL / Observability Integration Notes
 
-> **Last updated**: 2026-06-15
+> **Last updated**: 2026-06-16
 > **Author**: Agent Zero analysis
 > **Scope**: All `curated_compose` stacks
 
@@ -8,9 +8,10 @@
 
 ## TL;DR
 
-- **LGTM** is the central OTEL backend (traces, metrics, logs via Grafana/Tempo/Loki/Prometheus).
-- **n8n** → LGTM directly (✅ working).
-- **Langfuse** → LGTM (Langfuse's own self-traces, ✅ working).
+- **SigNoz** is the central OTEL backend (traces, metrics, logs, APM in one platform).
+- **Alloy** and **LGTM** are deprecated — replaced by SigNoz.
+- **n8n** → SigNoz directly (✅ working).
+- **Langfuse** → SigNoz (Langfuse's own self-traces, ✅ working).
 - **Headroom** → Langfuse (intentional — LLM-specific observability).
 - **Chroma** → ❌ not wired (env vars exist but compose ignores them).
 - **Dify** → ❌ no OTEL support yet.
@@ -19,38 +20,67 @@
 
 ## Stack-by-Stack Telemetry Status
 
-| Stack | Sends to LGTM? | Sends to Langfuse? | Configured? | Notes |
-|-------|---------------|-------------------|-------------|-------|
-| **LGTM** | — | — | ✅ | Receives OTLP gRPC `:4317`, HTTP `:4318` |
+| Stack | Sends to SigNoz? | Sends to Langfuse? | Configured? | Notes |
+|-------|-----------------|-------------------|-------------|-------|
+| **SigNoz** | — | — | ✅ | Receives OTLP gRPC `:4317`, HTTP `:4318` |
 | **n8n** | ✅ Yes | ❌ No | ✅ Active | `OTEL_EXPORTER_OTLP_ENDPOINT=http://lgtm:4318` on main + worker |
-| **Langfuse** | ✅ Yes | — | ✅ Active | Own traces to LGTM; stack at `docker/langfuse/compose.yaml` |
+| **Langfuse** | ✅ Yes | — | ✅ Active | Own traces to SigNoz; stack at `docker/langfuse/compose.yaml` |
 | **Headroom** | ❌ No | ✅ Yes | ✅ Active | `OTEL_EXPORTER_OTLP_ENDPOINT=http://langfuse-web:3000/api/public/otel/v1` |
 | **Chroma** | ❌ No | ❌ No | ❌ Not wired | `.env.example` has `CHROMA_OPEN_TELEMETRY__ENDPOINT`, compose ignores it |
 | **Dify** | ❌ No | ❌ No | ❌ None | No OTEL env vars in compose or `.env.example` |
+| **Zitadel** | ✅ Yes | ❌ No | ✅ Active | `ZITADEL_INSTRUMENTATION_TRACE_EXPORTER_ENDPOINT=http://lgtm:4318` |
 
 ---
 
 ## Architecture
 
 ```
-Headroom Proxy ──OTEL──→ Langfuse ──OTEL──→ LGTM
-                            │
-                            └── ClickHouse (analytics)
-                            └── Postgres (metadata)
-                            └── Redis (queues)
-                            └── MinIO (S3 storage)
+Headroom Proxy ──OTEL──→ Langfuse ──OTEL──→ SigNoz
+                           │
+                           └── ClickHouse (analytics)
+                           └── Postgres (metadata)
+                           └── Redis (queues)
+                           └── MinIO (S3 storage)
 
-n8n (main + worker) ──OTEL──→ LGTM
+n8n (main + worker) ──OTEL──→ SigNoz
+Zitadel ──────────────OTEL──→ SigNoz
 
-[Chroma] ──❌──→ LGTM
-[Dify] ──❌──→ LGTM
+[Chroma] ──❌──→ SigNoz
+[Dify] ──❌──→ SigNoz
 ```
 
-### Why Headroom → Langfuse (not direct to LGTM)?
+### Why Headroom → Langfuse (not direct to SigNoz)?
 
-Langfuse is purpose-built for LLM observability — it tracks cost per token, prompt versions, user attribution, and LLM-specific metrics that Tempo/Grafana don't natively understand. Headroom's traces are most valuable inside Langfuse.
+Langfuse is purpose-built for LLM observability — it tracks cost per token, prompt versions, user attribution, and LLM-specific metrics that SigNoz doesn't natively understand. Headroom's traces are most valuable inside Langfuse.
 
-Langfuse then exports its own internal traces to LGTM for infrastructure-wide correlation.
+Langfuse then exports its own internal traces to SigNoz for infrastructure-wide correlation.
+
+---
+
+## Migration Notes (Alloy + LGTM → SigNoz)
+
+### What changed
+
+- **Removed**: `docker/alloy/` stack (OTEL collector) and `docker/lgtm/` stack (Grafana all-in-one)
+- **Added**: `docker/signoz/` stack (all-in-one observability: collector + UI + storage)
+- **SigNoz pipeline aliases**: `signoz`, `otel`, `lgtm` — existing stacks referencing `lgtm:4318` or `otel:4318` continue to work without changes
+
+### Cross-stack endpoint mapping
+
+| Old | New | Notes |
+|-----|-----|-------|
+| `alloy:4317` (gRPC) | `signoz:4317` (gRPC) | Same port, new host |
+| `alloy:4318` / `lgtm:4318` (HTTP) | `signoz:4318` (HTTP) | Same port, new host |
+| `lgtm:3000` (Grafana UI) | `signoz:3301` (SigNoz UI) | Different port |
+
+### Stacks that need endpoint updates
+
+Stacks with hardcoded `lgtm:4318` in their compose will still resolve via the `lgtm` alias on the `pipeline` network. No immediate changes required, but consider updating to `signoz:4318` for clarity:
+
+- `docker/n8n/docker-compose.yaml` — `OTEL_EXPORTER_OTLP_ENDPOINT=http://lgtm:4318`
+- `docker/langfuse/compose.yaml` — `OTEL_EXPORTER_OTLP_ENDPOINT=http://lgtm:4318`
+- `docker/chroma/compose.yaml` — `CHROMA_OPEN_TELEMETRY__ENDPOINT=http://lgtm:4318`
+- `docker/zitadel/compose.yaml` — `ZITADEL_INSTRUMENTATION_TRACE_EXPORTER_ENDPOINT=http://lgtm:4318`
 
 ---
 
@@ -69,7 +99,7 @@ But `docker/chroma/compose.yaml` does **not** pass these env vars into the `chro
 **Fix**: Add to `compose.yaml` service `environment:`:
 
 ```yaml
-CHROMA_OPEN_TELEMETRY__ENDPOINT: ${CHROMA_OPEN_TELEMETRY__ENDPOINT:-http://lgtm:4318}
+CHROMA_OPEN_TELEMETRY__ENDPOINT: ${CHROMA_OPEN_TELEMETRY__ENDPOINT:-http://signoz:4318}
 CHROMA_OPEN_TELEMETRY__SERVICE_NAME: ${CHROMA_OPEN_TELEMETRY__SERVICE_NAME:-chromadb}
 OTEL_EXPORTER_OTLP_HEADERS: ${OTEL_EXPORTER_OTLP_HEADERS:-}
 ```
@@ -80,15 +110,9 @@ OTEL_EXPORTER_OTLP_HEADERS: ${OTEL_EXPORTER_OTLP_HEADERS:-}
 
 **Recommendation**: Wait for upstream Dify to add native OTEL support. Do not create custom patches per SKILL.md conventions.
 
-### 🟢 Langfuse Network Verification
-
-**Status**: Headroom joins external network `langfuse` with `name: langfuse_langfuse`. This is auto-created by Docker Compose from the `docker/langfuse/` directory. **This should work** on deployment.
-
-**Verify after deploy**: `docker network inspect langfuse_langfuse` should show both `langfuse-web` and `headroom-proxy` containers.
-
 ### 🟡 Unified Log Collection
 
-All stacks emit container logs. For collecting these into LGTM/Loki:
+All stacks emit container logs. For collecting these into SigNoz/Loki:
 
 - **Option A** (simplest): Configure Docker daemon with Loki log driver globally on Unraid.
 - **Option B** (per-stack): Add Promtail sidecar to each compose.
@@ -101,14 +125,15 @@ All stacks emit container logs. For collecting these into LGTM/Loki:
 
 | File | Purpose |
 |------|---------|
+| `docker/signoz/compose.yaml` | SigNoz observability stack (replaces Alloy + LGTM) |
+| `docker/signoz/.env.example` | SigNoz config |
 | `docker/chroma/compose.yaml` | Chroma vector DB stack |
 | `docker/chroma/.env.example` | Chroma config (OTEL vars present) |
 | `docker/dify/docker-compose.yaml` | Dify LLM platform |
 | `docker/dify/.env.example` | Dify config (no OTEL vars) |
 | `docker/headroom/compose.yaml` | Headroom LLM proxy |
 | `docker/langfuse/compose.yaml` | Langfuse observability |
-| `docker/lgtm/docker-compose.yaml` | LGTM (OTEL backend) |
-| `docker/lgtm/.env.example` | LGTM config |
 | `docker/n8n/docker-compose.yaml` | n8n automation |
 | `docker/n8n/.env.example` | n8n config (OTEL vars present) |
+| `docker/zitadel/compose.yaml` | Zitadel IAM |
 | `SKILL.md` | Homelab conventions and design rules |
diff --git a/docker/signoz/.env.example b/docker/signoz/.env.example
new file mode 100644
index 0000000..0c16104
--- /dev/null
+++ b/docker/signoz/.env.example
@@ -0,0 +1,35 @@
+# =============================================================================
+# SigNoz — OpenTelemetry Observability Platform
+# =============================================================================
+# Copy to .env and edit for your deployment.
+#   cp .env.example .env
+# The actual .env is deployed by Dockhand and should not be committed.
+#
+# Replaces both Alloy (OTEL collector) and LGTM (Grafana/Prometheus/Tempo/Loki).
+# All stacks should point their OTLP exporters to signoz on the pipeline network.
+# =============================================================================
+
+# -----------------------------------------------------------------------------
+# SigNoz Image Version
+# -----------------------------------------------------------------------------
+# Pin a specific version for reproducibility. Check releases at:
+# https://github.com/SigNoz/signoz/releases
+SIGNOZ_VERSION=latest
+
+# -----------------------------------------------------------------------------
+# ClickHouse
+# -----------------------------------------------------------------------------
+CLICKHOUSE_VERSION=25.5
+CLICKHOUSE_DB=signoz
+CLICKHOUSE_USER=admin
+CLICKHOUSE_PASSWORD=change-me-clickhouse-password
+
+# -----------------------------------------------------------------------------
+# Exposed Ports
+# -----------------------------------------------------------------------------
+# SigNoz UI
+EXPOSE_SIGNOZ_UI_PORT=3301
+# OTLP gRPC receiver (used by instrumented apps/services)
+EXPOSE_OTLP_GRPC_PORT=4317
+# OTLP HTTP receiver (used by instrumented apps/services)
+EXPOSE_OTLP_HTTP_PORT=4318
diff --git a/docker/signoz/compose.yaml b/docker/signoz/compose.yaml
new file mode 100644
index 0000000..05acb08
--- /dev/null
+++ b/docker/signoz/compose.yaml
@@ -0,0 +1,92 @@
+name: signoz
+
+services:
+  # ===========================================================================
+  # ClickHouse — columnar storage for all telemetry data
+  # ===========================================================================
+  clickhouse:
+    image: clickhouse/clickhouse-server:${CLICKHOUSE_VERSION:-25.5}
+    restart: unless-stopped
+    environment:
+      CLICKHOUSE_DB: ${CLICKHOUSE_DB:-signoz}
+      CLICKHOUSE_USER: ${CLICKHOUSE_USER:-admin}
+      CLICKHOUSE_PASSWORD: ${CLICKHOUSE_PASSWORD:-change-me-clickhouse-password}
+    volumes:
+      - ./clickhouse-data:/var/lib/clickhouse
+    healthcheck:
+      test: wget --no-verbose --tries=1 --spider http://localhost:8123/ping || exit 1
+      interval: 5s
+      timeout: 3s
+      retries: 30
+      start_period: 10s
+    networks:
+      - signoz
+
+  # ===========================================================================
+  # SigNoz — all-in-one observability platform (query service + UI + collector)
+  # ===========================================================================
+  # Replaces both Alloy (OTEL collector) and LGTM (Grafana/Prometheus/Tempo/Loki).
+  # Accepts OTLP gRPC (4317) and OTLP HTTP (4318) from all stacks.
+  # UI on port 3301.
+  #
+  # Docs: https://signoz.io/docs/install/docker/
+  # ===========================================================================
+  signoz:
+    image: signoz/signoz:${SIGNOZ_VERSION:-latest}
+    restart: unless-stopped
+    depends_on:
+      clickhouse:
+        condition: service_healthy
+    environment:
+      SIGNOZ_TELEMETRY_STORE: clickhouse
+      DSN: tcp://clickhouse:9000
+      CLICKHOUSE_USER: ${CLICKHOUSE_USER:-admin}
+      CLICKHOUSE_PASSWORD: ${CLICKHOUSE_PASSWORD:-change-me-clickhouse-password}
+      CLICKHOUSE_DATABASE: ${CLICKHOUSE_DB:-signoz}
+      STORAGE: clickhouse
+      CLICKHOUSE_ENDPOINT: tcp://clickhouse:9000
+      SIGNOZ_CLICKHOUSE_DSN: tcp://clickhouse:9000
+    ports:
+      # SigNoz UI
+      - ${EXPOSE_SIGNOZ_UI_PORT:-3301}:3301
+      # OTLP gRPC receiver
+      - ${EXPOSE_OTLP_GRPC_PORT:-4317}:4317
+      # OTLP HTTP receiver
+      - ${EXPOSE_OTLP_HTTP_PORT:-4318}:4318
+    volumes:
+      - ./signoz-data:/var/lib/signoz
+    healthcheck:
+      test:
+        - CMD
+        - wget
+        - --no-verbose
+        - --tries=1
+        - --spider
+        - http://localhost:3301/api/v1/health
+      interval: 15s
+      timeout: 5s
+      retries: 10
+      start_period: 30s
+    networks:
+      signoz: {}
+      pipeline:
+        aliases:
+          - signoz
+          - otel
+          - lgtm
+      # swag:
+      #   aliases:
+      #     - signoz
+
+networks:
+  signoz:
+    name: signoz
+    driver: bridge
+  pipeline:
+    name: pipeline
+    external: true
+  # swag:
+  #   name: swag
+  #   external: true
+
+volumes: {}
diff --git a/docker/signoz/swag/signoz.subdomain.conf b/docker/signoz/swag/signoz.subdomain.conf
new file mode 100644
index 0000000..9a94fbe
--- /dev/null
+++ b/docker/signoz/swag/signoz.subdomain.conf
@@ -0,0 +1,37 @@
+## -----------------------------------------------------------------------------
+## SWAG proxy config for SigNoz
+## Domain: signoz.ld50.xyz
+## Upstream: signoz:3301 (shared Docker network: ${NETWORKS_EXTERNAL_NAME:-swag})
+##
+## Install:
+##   1) Copy this file into SWAG: /config/nginx/proxy-confs/signoz.subdomain.conf
+##   2) Ensure both stacks share the same external Docker network (e.g. `swag`).
+##   3) In curated_compose/signoz/compose.yaml, uncomment the swag network + service attachment.
+##   4) Reload SWAG.
+## -----------------------------------------------------------------------------
+
+server {
+    listen 443 ssl;
+    listen [::]:443 ssl;
+
+    server_name signoz.ld50.xyz;
+
+    include /config/nginx/ssl.conf;
+
+    location / {
+        include /config/nginx/proxy.conf;
+
+        set $upstream_app signoz;
+        set $upstream_port 3301;
+        set $upstream_proto http;
+
+        proxy_pass $upstream_proto://$upstream_app:$upstream_port;
+
+        # SigNoz UI uses WebSocket for live query results
+        proxy_set_header Upgrade $http_upgrade;
+        proxy_set_header Connection "upgrade";
+
+        proxy_read_timeout 3600s;
+        proxy_send_timeout 3600s;
+    }
+}
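
Step 3 of the install notes in `signoz.subdomain.conf` asks for the commented-out `swag` entries in the SigNoz compose file to be enabled. A minimal sketch of how the two affected `networks` sections might look after uncommenting is shown below. It assumes the external network is actually named `swag` (the default in `${NETWORKS_EXTERNAL_NAME:-swag}`) and already exists on the host; every other key in the file is taken to be unchanged.

```yaml
# Sketch only: fragments of docker/signoz/compose.yaml with the swag lines uncommented.
services:
  signoz:
    # image, environment, ports, volumes, healthcheck: unchanged
    networks:
      signoz: {}
      pipeline:
        aliases:
          - signoz
          - otel
          - lgtm
      swag:            # assumption: SWAG is attached to a network with this name
        aliases:
          - signoz     # matches `set $upstream_app signoz;` in the proxy conf

networks:
  signoz:
    name: signoz
    driver: bridge
  pipeline:
    name: pipeline
    external: true
  swag:
    name: swag         # assumption: created outside this stack
    external: true
```

With these entries enabled, SWAG should be able to resolve `signoz:3301` over the shared network. The `pipeline` aliases are untouched, so stacks still sending OTLP to `lgtm:4318` or `otel:4318` would not be affected.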