Observability in Detail: VictoriaMetrics, VictoriaLogs, Grafana
Fabian Peter · 8 min read

Golden Signals: Latency, Traffic, Errors, and Saturation in Practice
compliance-campaign-2026 observability victoriametrics victorialogs grafana golden-signals
Read the full series (40 articles)

This series systematically explains how modern software is developed and operated in a compliant way – from EU regulations to technical implementation.

  1. Compliance Compass: EU Regulations for Software, SaaS, and Cloud Hosting
  2. GDPR: Privacy by Design as the Foundation of Modern Software
  3. NIS-2: Cyber Resilience Becomes Mandatory for 18 Sectors
  4. DORA: ICT Resilience for the Financial Sector Starting January 2025
  5. Cyber Resilience Act: Security by Design for Products with Digital Elements
  6. Data Act: Portability and Exit Capability Become Mandatory from September 2025
  7. Cloud Sovereignty Framework: Making Digital Sovereignty Measurable
  8. How EU Regulations Interconnect: An Integrated Compliance Approach
  9. 15 Factor App: The Evolution of Cloud-Native Best Practices
  10. 15 Factor App Deep Dive: Factors 1–6 (Basics & Lifecycle)
  11. 15 Factor App Deep Dive: Factors 7–12 (Networking, Scaling, Operations)
  12. 15 Factor App Deep Dive: Factors 13–15 (API First, Telemetry, Auth)
  13. The Modern Software Development Lifecycle: From Cloud-Native to Compliance
  14. Cloud Sovereignty + 15 Factor App: The Architectural Bridge Between Law and Technology
  15. Standardized Software Logistics: OCI, Helm, Kubernetes API
  16. Deterministically Checking Security Standards: Policy as Code, CVE Scanning, SBOM
  17. ayedo Software Delivery Platform: High-Level Overview
  18. ayedo Kubernetes Distribution: CNCF-compliant, EU-sovereign, compliance-ready
  19. Cilium: eBPF-based Networking for Zero Trust and Compliance
  20. Harbor: Container Registry with Integrated CVE Scanning and SBOM
  21. VictoriaMetrics & VictoriaLogs: Observability for NIS-2 and DORA
  22. Keycloak: Identity & Access Management for GDPR and NIS-2
  23. Kyverno: Policy as Code for Automated Compliance Checks
  24. Velero: Backup & Disaster Recovery for DORA and NIS-2
  25. Delivery Operations: The Path from Code to Production
  26. ohMyHelm: Helm Charts for 15-Factor Apps Without Kubernetes Complexity
  27. Let's Deploy with ayedo, Part 1: GitLab CI/CD, Harbor Registry, Vault Secrets
  28. Let's Deploy with ayedo, Part 2: ArgoCD GitOps, Monitoring, Observability
  29. GitLab CI/CD in Detail: Stages, Jobs, Pipelines for Modern Software
  30. Kaniko vs. Buildah: Rootless, Daemonless Container Builds in Kubernetes
  31. Harbor Deep Dive: Vulnerability Scanning, SBOM, Image Signing
  32. HashiCorp Vault + External Secrets Operator: Zero-Trust Secrets Management
  33. ArgoCD Deep Dive: GitOps Deployments for Multi-Environment Scenarios
  34. Guardrails in Action: Policy-Based Deployment Validation with Kyverno
  35. Observability in Detail: VictoriaMetrics, VictoriaLogs, Grafana
  36. Alerting & Incident Response: From Anomaly to Final Report
  37. Polycrate: Deployment Automation for Kubernetes and Cloud Migration
  38. Managed Backing Services: PostgreSQL, Redis, Kafka on ayedo SDP
  39. Multi-Tenant vs. Whitelabel: Deployment Strategies for SaaS Providers
  40. From Zero to Production: The Complete ayedo SDP Workflow in an Example

TL;DR

  • Observability is based on three pillars – metrics, logs, and traces – and is translated into a practical monitoring model for modern, often distributed systems through the four Golden Signals (Latency, Traffic, Errors, Saturation).
  • VictoriaMetrics serves as a Prometheus-compatible, high-performance time-series database providing the foundation for metrics with long-term retention, scalable throughput, and native integration into Kubernetes via ServiceMonitors.
  • VictoriaLogs complements this perspective with structured, tamper-proof logs that are queried via LogsQL. It integrates directly with Grafana and thus helps meet technical and regulatory requirements (e.g., traceability, immutability).
  • Grafana acts as a central observability console: Dashboards, alerting, and multi-datasource capabilities allow consistent visualization of Golden Signals for applications like Django services across metrics and logs.
  • ayedo supports you in building a well-thought-out observability setup based on VictoriaMetrics, VictoriaLogs, and Grafana – from architecture and operations to dashboards and integration into your platform and compliance strategy.

Why Observability is Indispensable Today

Modern applications rarely consist of a single monolith. More typical are dozens to hundreds of services distributed across containers, Kubernetes clusters, databases, and external APIs. Flawless operation in such environments is an illusion – what matters is how quickly and reliably you can detect, categorize, and resolve issues.

Observability is more than “a bit of monitoring.” It’s about reconstructing the internal state of your systems from externally observable behavior. This includes:

  • continuously collected metrics,
  • meaningful, structured logs,
  • and – where appropriate – distributed traces.

When properly implemented, observability becomes a stable component of your platform governance. It not only aids in incident handling but also in capacity planning, cost optimization, and demonstrating to auditors that your systems are controllable and traceable.

With VictoriaMetrics, VictoriaLogs, and Grafana, a stack is available that addresses these requirements without vendor lock-in and can be well integrated into European data protection and compliance models.


The Three Pillars of Observability

Metrics: Condensed Signals for System State

Metrics are numerical time series: requests per second, error rates, latencies, CPU, and memory usage. Their advantage is efficiency: they can be collected at high frequency and stored for a very long time.

Prometheus has established itself as the de facto standard for this – and VictoriaMetrics as a performant backend that accepts Prometheus-compatible data and is queryable via PromQL. For capacity planning and Golden Signals monitoring, metrics are the central tool.
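
To make the idea of a numerical time series concrete, the following minimal Python sketch renders individual samples in the Prometheus text exposition format, which Prometheus-compatible scrapers read. The helper render_sample and the metric and label names (http_requests_total, method, status) are illustrative assumptions, not taken from a specific application.

```python
def _escape(value: str) -> str:
    # Label values must escape backslash, double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(name: str, labels: dict, value: float) -> str:
    """Render one sample as '<name>{<label pairs>} <value>'."""
    pairs = ",".join(
        f'{key}="{_escape(str(val))}"' for key, val in sorted(labels.items())
    )
    if not pairs:
        return f"{name} {value}"
    return f"{name}{{{pairs}}} {value}"


# Hypothetical counter sample for a Django service.
line = render_sample("http_requests_total", {"method": "GET", "status": "200"}, 1027)
```

In practice a client library would produce this output for you; the sketch only shows how little data a single sample carries, which is why metrics can be collected at high frequency.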

Logs: Detail, Context, and Traceability

Logs provide the story behind the numbers. They contain context: user IDs, request IDs, exception stacks, business events. Especially from a compliance perspective, logs are central: they enable forensics, traceability of accesses, and reconstruction of incidents.

VictoriaLogs is designed to store this log data in a structured, searchable, and tamper-proof manner – an important prerequisite for regulatory requirements such as those under NIS2 or DORA (the latter has applied since January 17, 2025).
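
As a rough illustration of what "structured" means on the application side, the sketch below uses only the Python standard library to emit each log record as one JSON object per line. The class name JsonFormatter, the output field names (timestamp, level, message) and the context fields (request_id, user_id, tenant) are assumptions for this example; how they map to fields in the log backend depends on your shipping pipeline.

```python
import json
import logging
from datetime import datetime, timezone

CONTEXT_FIELDS = ("request_id", "user_id", "tenant")  # assumed field names


class JsonFormatter(logging.Formatter):
    """Format a log record as a single-line JSON event."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname.lower(),
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed via `extra={...}` end up as attributes on the record.
        for field in CONTEXT_FIELDS:
            if hasattr(record, field):
                event[field] = getattr(record, field)
        return json.dumps(event)


def configure_logging() -> None:
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logging.basicConfig(level=logging.INFO, handlers=[handler])
```

A call such as logging.getLogger("checkout").error("payment failed", extra={"request_id": "req-123", "tenant": "acme"}) would then produce one JSON line carrying both the message and its context.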

Traces: Understanding Distributed Processes

Traces link events across service boundaries. They show how a single request traverses multiple services, queues, and databases. In highly distributed architectures, this helps to make performance bottlenecks and unexpected dependencies visible.

Even if tracing is not mandatory for every system, traces round out the observability perspective in complex platforms – especially in conjunction with the Golden Signals.
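
The linking across service boundaries works by passing a shared trace ID along with every outgoing call. The sketch below illustrates this with the W3C Trace Context traceparent header (version 00, 32-hex trace ID, 16-hex span ID, 2-hex flags). The function names are hypothetical, and a real setup would normally rely on a tracing library such as OpenTelemetry rather than hand-written helpers.

```python
import re
import secrets

# version "00" - trace id - parent span id - flags
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace at the edge of the system (e.g. at the ingress)."""
    trace_id = secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def child_traceparent(incoming: str) -> str:
    """Header for an outgoing call: same trace id, fresh span id."""
    match = TRACEPARENT.match(incoming)
    if match is None:
        # Missing or malformed header: start a new trace instead.
        return new_traceparent()
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

Writing the same trace ID into structured log events is what later allows a log line to be matched to the request path it belongs to.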


Golden Signals in Detail

The four Golden Signals – Latency, Traffic, Errors, Saturation – form a practical bridge between technology and operations. They help to understand observability not as a collection of arbitrary metrics but as a focused set of key figures with a clear purpose.

Latency: Keeping Response Times in Check

Latency describes the time a system takes to process a request. Important aspects:

  • Measuring end-to-end latencies (e.g., HTTP response time of your Django app),
  • Differentiating between successful and failed requests,
  • Percentiles (p95, p99) instead of just averages to make outliers visible.

With VictoriaMetrics, latency metrics can be captured in detail, and Grafana visualizes them in time series and heatmaps. Logs from VictoriaLogs complement the perspective: they show which specific requests became slower and which business operations are affected.
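
Why percentiles matter more than averages can be shown with a few lines of standard-library Python. The sample values are made up for illustration: 98 fast requests and two very slow ones.

```python
import statistics


def latency_summary(samples_ms: list) -> dict:
    """Return mean, p95 and p99 for a list of request durations."""
    # n=100 yields 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples_ms),
        "p95": cuts[94],
        "p99": cuts[98],
    }


# Hypothetical data: 98 requests at 100 ms, two outliers at 2 s and 3 s.
samples = [100.0] * 98 + [2000.0, 3000.0]
summary = latency_summary(samples)
```

With this data the mean stays well below 200 ms and p95 still sits at the fast value, while p99 jumps above 2 s. Only the high percentile reveals that some users wait seconds for a response.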

Traffic: Understanding Load, Planning Capacity

Traffic measures how much “work” your system performs:

  • Requests per second,
  • Messages per second in queues,
  • Number of concurrently active user sessions.

Traffic metrics are essential for contextualizing latency and errors: rising latencies with constant traffic indicate internal problems, while rising latencies with massively increasing traffic suggest capacity limits.

VictoriaMetrics scales very efficiently here, even when storing millions of time series over long periods. This greatly facilitates trend analysis and capacity planning.
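
Traffic is usually derived from monotonically increasing counters rather than stored directly. The sketch below shows the basic arithmetic behind a per-second rate, including the handling of counter resets after a restart. It is a simplified illustration and not the exact algorithm of the PromQL rate() function, which additionally extrapolates to the window boundaries.

```python
def per_second_rate(samples: list) -> float:
    """samples: (unix_timestamp, counter_value) pairs, ordered by time."""
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # The counter dropped: the process restarted and began again at 0.
            increase += current
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Hypothetical scrape results, one sample every 15 seconds.
steady = [(0, 100), (15, 250), (30, 400)]
with_restart = [(0, 100), (15, 160), (30, 20), (45, 80)]
```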

Errors: Quantifying Reliability

Error signals show how reliably your system operates:

  • HTTP error codes (4xx, 5xx),
  • Application errors (exceptions, validation errors),
  • Timeouts or circuit breaker events.

Metrics provide aggregated error rates per service or endpoint, while logs provide details on causes and context. With VictoriaLogs and its query language LogsQL, you can quickly filter, for example, by error type, tenant, or feature flag.

From this data, service-level objectives (SLOs) can be derived, such as: “99.5% of requests to the checkout service are successful over a rolling window of 30 days.” Grafana helps you make these SLOs visible and verifiable.
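
The arithmetic behind such an SLO is simple and can be sketched as follows. The function name and the request counts are assumptions for illustration; in practice the two counts would come from your metrics backend for the 30-day window.

```python
def error_budget(slo_target: float, total: int, failed: int) -> dict:
    """Compare observed failures with the failures an SLO tolerates."""
    if total <= 0:
        raise ValueError("total must be positive")
    allowed = (1.0 - slo_target) * total
    success_ratio = (total - failed) / total
    return {
        "success_ratio": success_ratio,
        "slo_met": success_ratio >= slo_target,
        "allowed_failures": allowed,
        # Share of the error budget already used up (1.0 = fully spent).
        "budget_consumed": failed / allowed if allowed > 0 else float("inf"),
    }


# Hypothetical 30-day window: 2,000,000 requests, 4,000 of them failed.
report = error_budget(0.995, 2_000_000, 4_000)
```

A 99.5% target over 2,000,000 requests tolerates about 10,000 failures, so 4,000 failures consume roughly 40% of the budget. Expressing reliability this way gives teams a concrete figure to alert on before the SLO is actually violated.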

Saturation: Recognizing Resources at Their Limits

Saturation describes how much your resources are utilized:

  • CPU and memory usage,
  • Database connection pools,
  • Queue lengths,
  • Thread and worker pools.

For operations teams, saturation is an early warning signal. As saturation rises, latency and errors often follow. With VictoriaMetrics, you can consistently capture these metrics per node, pod, and service; logs point to specific situations where resources were exhausted.
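
The early-warning character of saturation can be sketched with two small calculations: the current utilization of a bounded resource and a naive linear estimate of when it will be exhausted. The helper names and the connection pool numbers are assumptions; the estimate only uses the first and last sample and is not a substitute for proper trend analysis.

```python
from typing import Optional


def utilization(in_use: float, capacity: float) -> float:
    """Fraction of a bounded resource that is currently in use."""
    return in_use / capacity


def seconds_until_full(samples: list, capacity: float) -> Optional[float]:
    """Linear extrapolation; samples are (unix_timestamp, in_use) pairs."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0 or v1 <= v0:
        return None  # usage is flat or falling: no exhaustion in sight
    # remaining headroom divided by the growth per second
    return (capacity - v1) * (t1 - t0) / (v1 - v0)


# Hypothetical database connection pool with 100 connections:
# usage grew from 40 to 70 connections within ten minutes.
pool_samples = [(0, 40), (600, 70)]
eta = seconds_until_full(pool_samples, capacity=100)
```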


VictoriaMetrics: High-Performance Metrics for Long-Term Transparency

VictoriaMetrics is a high-performance time-series database that accepts Prometheus-compatible metrics. For teams running larger environments, several characteristics are particularly relevant:

  • Prometheus-compatible: You can continue to use existing exporters and instrumentations. Queries are made via PromQL, making it easier for teams with Prometheus experience to transition.
  • Scalability and Performance: VictoriaMetrics is designed for high ingestion rates and cost-efficient storage. This allows you to maintain a large number of metrics (e.g., per pod in Kubernetes) with long retention without letting infrastructure costs spiral out of control.
  • Long-term Retention: Compliance requirements often dictate retention periods of months to years. VictoriaMetrics lets you configure the retention period explicitly; retention that differs by metric set or environment can be achieved, depending on edition and deployment model, for example by running separate instances.
  • ServiceMonitor Integration: In Kubernetes environments, scrape targets are typically declared via ServiceMonitor objects (custom resources of the Prometheus Operator). The VictoriaMetrics operator can pick up these objects and convert them into its own scrape resources, reducing the effort for migrating from Prometheus setups.
  • Efficient Queries: Queries via PromQL and integration into Grafana allow for complex evaluations, such as correlations of traffic peaks, latency, and errors across different services.

This makes VictoriaMetrics a reliable foundation for Golden Signals monitoring in productive platforms.
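
Because VictoriaMetrics exposes the Prometheus-compatible query API, Golden Signals can be fetched with a plain HTTP request to /api/v1/query. The sketch below is an assumption-laden example: the base URL, the service label and the metric names (http_request_duration_seconds_bucket, http_requests_total) are hypothetical and depend on how your application is instrumented.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

VM_URL = "http://victoriametrics:8428"  # hypothetical in-cluster address

# Assumed metric and label names; adjust to your own instrumentation.
GOLDEN_SIGNALS = {
    "latency_p95": (
        "histogram_quantile(0.95, sum(rate("
        'http_request_duration_seconds_bucket{service="checkout"}[5m])) by (le))'
    ),
    "traffic": 'sum(rate(http_requests_total{service="checkout"}[5m]))',
    "error_ratio": (
        'sum(rate(http_requests_total{service="checkout",status=~"5.."}[5m]))'
        ' / sum(rate(http_requests_total{service="checkout"}[5m]))'
    ),
}


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against the Prometheus-style API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def run_query(base_url: str, promql: str, timeout: float = 5.0) -> list:
    """Execute the query and return the raw result list."""
    with urlopen(instant_query_url(base_url, promql), timeout=timeout) as resp:
        return json.load(resp)["data"]["result"]
```

The same expressions can be pasted into a Grafana panel unchanged, which keeps ad hoc scripts and dashboards consistent.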


VictoriaLogs: Structured Logs as a Compliance Component

VictoriaLogs addresses the second core area of observability: logs. For teams focused on security and compliance, several points are particularly interesting:

  • Structured Logs: Instead of unstructured text lines, VictoriaLogs works efficiently with structured events (e.g., JSON). This allows targeted queries by fields like user_id, tenant, request_id, or feature_flag.
  • LogsQL: VictoriaLogs uses its own query language, LogsQL. Teams with Loki experience will recognize the general approach of filtering and then processing log streams, but LogsQL is not the same language as Loki's LogQL, so existing queries need to be ported. Plan for this when assessing acceptance and operational reliability.
  • Tamper-proof Approach: The way data is stored and versioned makes manipulation significantly more difficult or detectable. This is central for forensics, incident response, and audits – especially in light of NIS2, which EU member states had to transpose into national law by October 17, 2024.
  • Seamless Grafana Integration: VictoriaLogs integrates directly into Grafana. This allows logs not only to be analyzed via a search interface but also to be embedded in dashboards, correlated with metrics from VictoriaMetrics and possibly other data sources.

The result is a log backend that provides both operations teams and data protection and compliance officers with actionable, structured information – without relying on proprietary SaaS solutions.
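
A field-based query against VictoriaLogs can be sketched as follows, using its HTTP endpoint /select/logsql/query, which returns one JSON object per line. The base URL, the helper names and the field names (level, tenant) are assumptions for this example, and quoting values with json.dumps is a simplification that should be checked against the LogsQL documentation for unusual characters.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

VLOGS_URL = "http://victorialogs:9428"  # hypothetical in-cluster address


def build_logsql(window: str, **fields: str) -> str:
    """Combine a time filter with one quoted filter per field."""
    parts = [f"_time:{window}"]
    for name, value in fields.items():
        parts.append(f"{name}:{json.dumps(str(value))}")
    return " ".join(parts)


def parse_json_lines(body: str) -> list:
    """The query endpoint streams results as JSON lines."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def query_logs(base_url: str, logsql: str, limit: int = 100,
               timeout: float = 5.0) -> list:
    form = urlencode({"query": logsql, "limit": limit}).encode()
    request = Request(f"{base_url}/select/logsql/query", data=form)  # POST
    with urlopen(request, timeout=timeout) as resp:
        return parse_json_lines(resp.read().decode("utf-8"))


# Errors of one tenant during the last 15 minutes (assumed field names).
query = build_logsql("15m", level="error", tenant="acme")
```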


Grafana as an Observability Console

Grafana is the visible part of the observability stack. Technical leads need a tool that:

  • Consistently visualizes metrics, logs, and possibly traces,
  • Combines various data sources,
  • and supports alerting and team collaboration.

Key features:

  • Dashboards: Grafana offers a mature concept for dashboards with panels, variables, and reusable components. You can map Golden Signals per service, environment (Prod/Staging), or tenant and thus establish a unified language in operations.
  • Pre-built Dashboards: For many components – databases, message brokers, Kubernetes – there are already community or vendor dashboards. In conjunction with VictoriaMetrics, you can achieve a good basic coverage in a very short time.
  • Alerting: Grafana Alerting allows you to define rules directly on metrics and logs and integrate them into existing incident management processes (e.g., PagerDuty, Slack, email). This enables consistent monitoring of Golden Signals thresholds (e.g., error rate > 1%).
  • Multi-Datasource: Grafana can use multiple data sources in parallel. Besides VictoriaMetrics and VictoriaLogs, you can integrate databases, cloud metric services, or security tools. This makes Grafana the central console for your platform, rather than another isolated tool.
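
Dashboards in Grafana are stored as JSON, so a Golden Signals layout can be generated instead of clicked together. The following sketch builds a deliberately minimal dashboard document with one time series panel per signal. The function name, the datasource UID and the query expressions are placeholders, and a real dashboard export contains many more fields than shown here.

```python
import json


def golden_signals_dashboard(service: str, datasource_uid: str,
                             queries: dict) -> dict:
    """Build a minimal dashboard: one panel per entry in `queries`."""
    panels = []
    for index, (title, expr) in enumerate(queries.items()):
        panels.append({
            "id": index + 1,
            "type": "timeseries",
            "title": f"{title} ({service})",
            "datasource": {"type": "prometheus", "uid": datasource_uid},
            # The grid is 24 columns wide: two panels per row.
            "gridPos": {"h": 8, "w": 12,
                        "x": 12 * (index % 2), "y": 8 * (index // 2)},
            "targets": [{"refId": "A", "expr": expr}],
        })
    return {
        "title": f"Golden Signals: {service}",
        "tags": ["golden-signals"],
        "panels": panels,
    }


# Placeholder expressions; replace them with your own PromQL queries.
dashboard = golden_signals_dashboard(
    service="checkout",
    datasource_uid="victoriametrics",  # hypothetical datasource UID
    queries={
        "Latency": "latency_query",
        "Traffic": "traffic_query",
        "Errors": "errors_query",
        "Saturation": "saturation_query",
    },
)
dashboard_json = json.dumps(dashboard, indent=2)
```

Generating dashboards from code keeps the four signals identical across services and environments and makes changes reviewable in version control.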

Practical Example: Golden Signals Dashboard for a Django Application

To make these concepts tangible, let’s consider a typical Django application operated in Kubernetes and accessible externally via Ingress.

A Golden Signals dashboard in Grafana k
