Observability

Structured Logging That Actually Scales

Why replacing text logs with structured JSON, shipping them to a central stack, and adopting consistent query patterns cuts incident response time in half.

January 28, 2026 · 5 min read
logging · observability · devops · monitoring

The first thing to check when inheriting a production system is the logs. Unstructured text like ERROR: something went wrong in payment service is a reliable signal that incident response is going to be painful. Structured logging is one of those practices that costs almost nothing to implement but transforms how fast you can diagnose problems.

The Problem with Text Logs

Traditional log lines look like this:

2026-01-28 14:23:01 ERROR PaymentService - Failed to process payment for user 12345, order 67890, amount $150.00, error: timeout

Parsing this requires regex. Every service formats logs differently. Searching across services means writing different queries for each one. Correlating a single request across multiple services is nearly impossible.
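To make the fragility concrete, here is a sketch of the regex a parser would need for that single line (the pattern and field names are my own, chosen for illustration). Any change to wording, field order, or the currency symbol silently breaks it:

```typescript
// A regex parser for the text log line above — brittle by design:
// every service's format needs its own hand-written pattern.
const LINE =
  "2026-01-28 14:23:01 ERROR PaymentService - Failed to process payment " +
  "for user 12345, order 67890, amount $150.00, error: timeout";

const PATTERN =
  /^(\S+ \S+) (\w+) (\S+) - Failed to process payment for user (\d+), order (\d+), amount \$([\d.]+), error: (\w+)$/;

const match = LINE.match(PATTERN);
if (match) {
  const [, timestamp, level, service, userId, orderId, amount, error] = match;
  console.log({ timestamp, level, service, userId, orderId, amount: Number(amount), error });
}
```

Multiply that by every service and every message shape, and cross-service search becomes a regex maintenance project.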

Structured Logging

The same event as structured JSON:

structured-log-entry.json
{
  "timestamp": "2026-01-28T14:23:01.456Z",
  "level": "error",
  "service": "payment-service",
  "message": "Payment processing failed",
  "userId": "12345",
  "orderId": "67890",
  "amount": 150.00,
  "currency": "USD",
  "error": "upstream_timeout",
  "duration_ms": 30000,
  "traceId": "abc-123-def-456",
  "spanId": "span-789"
}

Every field is queryable. Every service uses the same format. Correlating a request across services is a single query on traceId.
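With JSON there is no parsing ceremony: any tool, or a few lines of code, can filter on fields directly. A minimal sketch, using an in-memory array to stand in for a log store (the entries are invented for illustration):

```typescript
interface LogEntry {
  timestamp: string;
  level: string;
  service: string;
  traceId: string;
  [key: string]: unknown;
}

const logs: LogEntry[] = [
  { timestamp: "2026-01-28T14:23:01.456Z", level: "error", service: "payment-service", traceId: "abc-123-def-456" },
  { timestamp: "2026-01-28T14:23:00.900Z", level: "info", service: "api-gateway", traceId: "abc-123-def-456" },
  { timestamp: "2026-01-28T14:22:59.100Z", level: "info", service: "checkout", traceId: "zzz-999" },
];

// One predicate replaces N service-specific regexes.
const trace = logs.filter((entry) => entry.traceId === "abc-123-def-456");
console.log(trace.map((entry) => entry.service));
```

The same predicate works whether the logs came from the payment service, the gateway, or anything else, because they all share one schema.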

Implementation

Node.js with Pino

Pino is one of the fastest JSON loggers for Node.js — it does minimal work on the hot path (serialization only) and can offload formatting to a separate transport process, so overhead stays negligible:

logger.ts
import pino from "pino";
 
export const logger = pino({
  level: process.env.LOG_LEVEL ?? "info",
  formatters: {
    level(label) {
      return { level: label };
    },
  },
  base: {
    service: process.env.SERVICE_NAME,
    environment: process.env.NODE_ENV,
    version: process.env.APP_VERSION,
  },
});

Usage in application code:

payment-handler.ts
import { logger } from "./logger";
 
async function processPayment(userId: string, orderId: string, amount: number) {
  const log = logger.child({ userId, orderId, amount });
 
  log.info("Processing payment");
 
  try {
    const result = await paymentGateway.charge(amount);
    log.info({ transactionId: result.id, duration_ms: result.duration }, "Payment succeeded");
    return result;
  } catch (error) {
    // In strict TypeScript the catch parameter is `unknown`, so narrow it first.
    const err = error as Error & { code?: string };
    log.error({ error: err.message, code: err.code }, "Payment failed");
    throw error;
  }
}

The child() method creates a logger with context fields that are automatically included in every log entry. No more manually including userId in every log call.

Go with zerolog

logger.go
package main
 
import (
    "os"
    "github.com/rs/zerolog"
    "github.com/rs/zerolog/log"
)
 
func init() {
    zerolog.TimeFieldFormat = zerolog.TimeFormatUnix
    log.Logger = zerolog.New(os.Stdout).With().
        Str("service", "payment-service").
        Str("version", os.Getenv("APP_VERSION")).
        Timestamp().
        Logger()
}
 
func processPayment(userID string, amount float64) error {
    log.Info().
        Str("userId", userID).
        Float64("amount", amount).
        Msg("Processing payment")
    return nil
}

Shipping Logs

Structured logs are only useful if they're aggregated in a central, searchable system. A minimal self-hosted stack looks like this:

docker-compose.logging.yml
services:
  vector:
    image: timberio/vector:latest-alpine
    volumes:
      - ./vector.toml:/etc/vector/vector.toml:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    depends_on:
      - loki
 
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
    volumes:
      - loki-data:/loki
 
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
 
volumes:
  loki-data:
  grafana-data:

Vector collects logs from Docker containers, parses the JSON, and ships them to Loki. Grafana queries Loki for visualization and alerting.

vector.toml
[sources.docker]
type = "docker_logs"
 
[transforms.parse]
type = "remap"
inputs = ["docker"]
source = '''
. = parse_json!(.message)
'''
 
[sinks.loki]
type = "loki"
inputs = ["parse"]
endpoint = "http://loki:3100"
encoding.codec = "json"
labels.service = "{{ service }}"
labels.level = "{{ level }}"

Query Patterns That Save Time

Find all errors for a specific user in the last hour

{service="payment-service", level="error"} | json | userId = "12345"

Trace a request across services

{level=~"info|error"} | json | traceId = "abc-123-def-456"

Find slow requests

{service="api-gateway"} | json | duration_ms > 5000

Error rate by service (last 15 minutes)

sum by (service) (rate({level="error"}[15m]))

Alerting on Logs

Logs aren't just for post-incident investigation. With structured data, you can alert proactively:

loki-alert-rules.yml
groups:
  - name: log-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({level="error"}[5m])) by (service) > 0.5
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High error rate in {{ $labels.service }}"
 
      - alert: PaymentFailureSpike
        expr: |
          sum(rate({service="payment-service", level="error"} |= "Payment failed" [5m])) > 0.1
        for: 1m
        labels:
          severity: critical

Key Takeaways

  1. Structured from day one — retrofitting structured logging is painful; start with JSON from the beginning
  2. Use child loggers for context — attach request-scoped fields once, not in every log call
  3. Include a trace ID in every log — this is the single most valuable field for debugging distributed systems
  4. Centralize immediately — logs on individual servers are useless during incidents when you need cross-service visibility
  5. Alert on log patterns — don't wait for users to report problems that your logs already show