Self-Hosted Beacon Collection: Implementation & Analysis Workflows

Ingestion Architecture & Endpoint Design

Self-hosted beacon collection is the foundational data ingestion layer for modern web performance observability. Unlike third-party SaaS platforms that abstract the collection layer, a custom implementation grants engineering teams full control over data retention, schema evolution, and cost scaling. When designing a self-hosted RUM architecture, the client must emit beacons without blocking the main thread, and the endpoint must absorb high volumes of concurrent, loss-tolerant HTTP POST requests. The client side typically leverages the navigator.sendBeacon() API, which queues data for asynchronous delivery that survives page unload, so Core Web Vitals metrics are captured even during abrupt navigation.
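
To make the unload path concrete, the sketch below flushes queued metrics when the page becomes hidden. The metricsQueue buffer and its contents are hypothetical; only the event wiring and the sendBeacon call reflect the pattern described above.

// Flush queued metrics on page hide (minimal sketch)
const metricsQueue = []; // hypothetical buffer of metric entries collected during the session

function flushMetrics() {
  if (metricsQueue.length === 0) return;
  const body = new Blob([JSON.stringify(metricsQueue)], { type: 'application/json' });
  // sendBeacon queues the request so it can complete after the page unloads
  navigator.sendBeacon('/v1/beacon', body);
  metricsQueue.length = 0;
}

// 'visibilitychange' to hidden is the most reliable cross-browser signal;
// 'pagehide' covers browsers that skip the visibility transition
document.addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden') flushMetrics();
});
window.addEventListener('pagehide', flushMetrics);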

To sustain high throughput while minimizing latency, deploy an edge-compatible HTTP collector that terminates TLS, validates payloads, and forwards events asynchronously to a message broker. The NGINX configuration below covers TLS termination and proxying; a sketch of the forwarding worker behind it follows the configuration.

# NGINX Edge Collector Configuration
upstream rum_ingest_cluster {
    # Internal collector instances (example addresses) that buffer into Kafka/Pulsar
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
}

server {
    listen 443 ssl http2;
    server_name rum-collector.example.com;

    # TLS termination (certificate paths are deployment-specific)
    ssl_certificate     /etc/nginx/tls/rum-collector.crt;
    ssl_certificate_key /etc/nginx/tls/rum-collector.key;

    # Optimize for high-concurrency POSTs
    keepalive_timeout 65s;
    keepalive_requests 1000;
    client_max_body_size 64k; # Strict payload limit

    location /v1/beacon {
        # Accept only POST, reject unsupported methods immediately
        limit_except POST { deny all; }

        # Enable gzip for response compression (if returning ACKs)
        gzip on;
        gzip_types application/json;

        # Proxy to internal Kafka/Pulsar buffer layer
        proxy_pass http://rum_ingest_cluster;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Fire-and-forget for beacon drops
        proxy_ignore_client_abort on;
        proxy_read_timeout 2s;
    }
}
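
Behind the proxy, rum_ingest_cluster can point at a thin worker that accepts the beacon and produces it to the broker. The sketch below assumes Node with Express and the kafkajs client; the topic name and broker addresses are illustrative.

// Minimal collector worker: accept beacon POSTs and forward them to Kafka (sketch)
const express = require('express');
const { Kafka } = require('kafkajs');

const kafka = new Kafka({ clientId: 'rum-collector', brokers: ['kafka-1:9092'] });
const producer = kafka.producer();

const app = express();
app.use(express.json({ limit: '64kb' })); // mirror the NGINX payload limit

app.post('/v1/beacon', async (req, res) => {
  // Acknowledge immediately; clients treat beacons as fire-and-forget
  res.status(204).end();
  try {
    await producer.send({
      topic: 'rum-events-raw',
      messages: [{ value: JSON.stringify(req.body) }]
    });
  } catch (err) {
    // Dropped events are tolerated; log for drop-rate monitoring
    console.error('beacon forward failed', err);
  }
});

producer.connect().then(() => app.listen(8080));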

Storage Pipeline & Time-Series Aggregation

Ingesting millions of daily performance events requires a columnar storage backend optimized for time-series aggregation. Setting up a self-hosted RUM pipeline on ClickHouse provides a scalable foundation for querying high-cardinality dimensions such as URL, device type, and geographic region. The ingestion layer should normalize incoming JSON payloads into flattened tables and apply materialized views for real-time p75 and p95 percentile calculations on LCP, INP, and CLS.

-- ClickHouse Raw Events Table
CREATE TABLE rum_events_raw
(
    `event_id` UUID DEFAULT generateUUIDv4(),
    `timestamp` DateTime64(3),
    `session_id` String,
    `user_agent` String,
    `url_path` String,
    `country_code` LowCardinality(String),
    `connection_type` LowCardinality(String),
    `lcp_ms` UInt32 DEFAULT 0,
    `inp_ms` UInt32 DEFAULT 0,
    `cls_score` Float32 DEFAULT 0.0,
    `trace_id` String DEFAULT '',
    INDEX idx_session session_id TYPE bloom_filter GRANULARITY 1,
    INDEX idx_url url_path TYPE ngrambf_v1(3, 1024, 1, 0) GRANULARITY 1
)
ENGINE = MergeTree()
ORDER BY (timestamp, url_path, country_code)
TTL timestamp + INTERVAL 90 DAY;

-- Materialized View for Real-Time Percentile Aggregation
-- SummingMergeTree cannot merge quantiles, so partial aggregation states are kept in an
-- AggregatingMergeTree and finalized at read time with the -Merge combinators,
-- e.g. quantileMerge(0.75)(p75_lcp_state) and countMerge(event_count_state).
CREATE MATERIALIZED VIEW rum_cwv_p75_mv
ENGINE = AggregatingMergeTree()
ORDER BY (date, url_path, connection_type)
AS SELECT
    toDate(timestamp) AS date,
    url_path,
    connection_type,
    quantileState(0.75)(lcp_ms) AS p75_lcp_state,
    quantileState(0.75)(inp_ms) AS p75_inp_state,
    quantileState(0.75)(cls_score) AS p75_cls_state,
    countState() AS event_count_state
FROM rum_events_raw
GROUP BY date, url_path, connection_type;

Payload Optimization & Network Constraints

Network constraints heavily influence data fidelity, particularly on 3G and emerging market connections. Reducing beacon payload size for mobile networks involves strategic field pruning, delta encoding for repeated metrics, and binary serialization protocols like Protocol Buffers. Engineers must balance telemetry granularity with bandwidth consumption to prevent beacon drops caused by strict mobile data caps or aggressive browser throttling.

Implement client-side payload builders that strip redundant fields, compress numeric arrays, and fall back gracefully when sendBeacon is unavailable.

// Optimized Beacon Payload Builder
function buildRumPayload(metrics, sessionContext) {
  const payload = {
    sid: sessionContext.id,
    u: window.location.pathname,
    ct: navigator.connection?.effectiveType || 'unknown',
    m: {
      lcp: metrics.lcp?.value || 0,
      inp: metrics.inp?.value || 0,
      cls: metrics.cls?.value || 0
    }
  };

  // Record dispatch time as an offset from the navigation time origin
  payload.t = Math.round(performance.now());

  // Return the plain object so callers can enrich it (e.g. with a trace ID) before serialization
  return payload;
}

// Serialize and send with fallback
function dispatchBeacon(payload) {
  const blob = new Blob([JSON.stringify(payload)], { type: 'application/json' });

  if (navigator.sendBeacon) {
    navigator.sendBeacon('/v1/beacon', blob);
  } else {
    // Fallback for legacy browsers; keepalive lets the request outlive the page
    fetch('/v1/beacon', {
      method: 'POST',
      body: blob,
      keepalive: true,
      headers: { 'Content-Type': 'application/json' }
    }).catch(() => {});
  }
}
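
Where payloads still exceed mobile budgets, modern browsers can gzip the serialized JSON with CompressionStream before it is wrapped in a Blob. The sketch below is an optional layer on top of dispatchBeacon; note that sendBeacon cannot set a Content-Encoding header, so the collector would need to detect the gzip magic bytes (an assumption about the server, not shown here).

// Gzip the serialized payload before dispatch (requires CompressionStream support)
async function compressPayload(payload) {
  const jsonStr = JSON.stringify(payload);
  const compressedStream = new Blob([jsonStr])
    .stream()
    .pipeThrough(new CompressionStream('gzip'));
  // Collect the compressed stream back into a Blob that sendBeacon can carry
  return await new Response(compressedStream).blob();
}

// Hypothetical wrapper: fall back to the uncompressed path when the API is missing
async function dispatchCompressed(payload) {
  if (typeof CompressionStream === 'undefined') {
    dispatchBeacon(payload);
    return;
  }
  const blob = await compressPayload(payload);
  navigator.sendBeacon('/v1/beacon', blob);
}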

Distributed Context & Trace Correlation

Isolating frontend latency from backend processing delays requires distributed context propagation. Correlating RUM data with backend APM traces establishes a unified trace ID that bridges client-side navigation timing with server-side span data. This correlation enables performance engineers to pinpoint whether a degraded LCP stems from resource fetch delays, CDN routing inefficiencies, or slow API response times.

Propagate W3C Trace Context through the initial document response, for example as a server-rendered traceparent meta tag, and attach the extracted trace ID to subsequent beacon payloads.

// Extract Trace Context from the initial server-rendered document
function extractTraceContext() {
  const metaTag = document.querySelector('meta[name="traceparent"]');
  return metaTag?.content || '';
}

// Attach the trace ID to the beacon payload before dispatch.
// sendBeacon cannot set custom request headers, so the correlation ID rides in the body;
// the fetch() fallback could additionally send it as an X-Trace-Context header for collector routing.
const traceId = extractTraceContext();
const payload = buildRumPayload(performanceMetrics, { id: crypto.randomUUID() });
if (traceId) payload.trace_id = traceId;

dispatchBeacon(payload);
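
The traceparent meta tag itself has to be emitted by whatever renders the initial document. A rough server-side sketch using the OpenTelemetry JavaScript API follows; the Express/template wiring (renderPage) is purely illustrative.

// Server-side: render the active trace context into the document <head> (sketch)
const { trace } = require('@opentelemetry/api');

function renderTraceparentMeta() {
  const span = trace.getActiveSpan();
  if (!span) return '';
  const ctx = span.spanContext();
  // W3C traceparent format: version-traceId-spanId-flags
  const traceparent = `00-${ctx.traceId}-${ctx.spanId}-0${ctx.traceFlags.toString(16)}`;
  return `<meta name="traceparent" content="${traceparent}">`;
}

// e.g. app.get('/', (req, res) => res.send(renderPage({ head: renderTraceparentMeta() })));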

Standardization & Vendor Evaluation

Standardization efforts have matured significantly, making vendor-agnostic telemetry increasingly viable. Implementing OpenTelemetry for web RUM brings semantic consistency across frontend, backend, and infrastructure layers, simplifying cross-domain debugging. Organizations must still weigh build-versus-buy tradeoffs: a SaaS product such as SpeedCurve accelerates time-to-insight, while a self-hosted pipeline offers stronger data sovereignty and better long-term cost efficiency at enterprise scale.

// OpenTelemetry Web SDK Initialization
import { WebTracerProvider } from '@opentelemetry/sdk-trace-web';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { registerInstrumentations } from '@opentelemetry/instrumentation';
import { DocumentLoadInstrumentation } from '@opentelemetry/instrumentation-document-load';
import { UserInteractionInstrumentation } from '@opentelemetry/instrumentation-user-interaction';

const provider = new WebTracerProvider();
provider.addSpanProcessor(
  new BatchSpanProcessor(new OTLPTraceExporter({ url: '/v1/traces' }))
);
provider.register();

registerInstrumentations({
  instrumentations: [
    new DocumentLoadInstrumentation(),
    new UserInteractionInstrumentation()
  ]
});

Production Workflows & Analysis Patterns

Implementation Steps

  1. Deploy Edge Collector: Provision an NGINX, Envoy, or Cloudflare Workers endpoint optimized for high-concurrency POST handling with strict client_max_body_size limits.
  2. Configure Validation Middleware: Implement JSON schema validation and payload sanitization to reject malformed or oversized events before they hit the message broker (see the sketch after this list).
  3. Buffer & Route: Route validated events to Kafka or Pulsar for backpressure management, then stream to ClickHouse via a high-throughput consumer.
  4. Implement Fallbacks: Deploy navigator.sendBeacon() with a fetch() keepalive fallback for legacy browsers and restricted environments.
  5. Automate Schema Migrations: Deploy CI/CD pipelines that execute ALTER TABLE statements and materialized view updates as Core Web Vitals metrics evolve (e.g., FID → INP).
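
A minimal sketch of step 2, assuming the Node collector shown earlier and the Ajv library for JSON Schema validation; the schema is illustrative and should mirror whatever fields your beacons actually carry.

// JSON schema validation middleware for the beacon endpoint (sketch, using Ajv)
const Ajv = require('ajv');
const ajv = new Ajv();

const beaconSchema = {
  type: 'object',
  required: ['sid', 'u', 'm'],
  properties: {
    sid: { type: 'string', maxLength: 64 },
    u: { type: 'string', maxLength: 2048 },
    ct: { type: 'string', maxLength: 32 },
    t: { type: 'number', minimum: 0 },
    trace_id: { type: 'string', maxLength: 128 },
    m: {
      type: 'object',
      properties: {
        lcp: { type: 'number', minimum: 0, maximum: 120000 },
        inp: { type: 'number', minimum: 0, maximum: 120000 },
        cls: { type: 'number', minimum: 0, maximum: 10 }
      }
    }
  },
  additionalProperties: false
};

const validateBeacon = ajv.compile(beaconSchema);

// Express middleware: reject malformed events before they reach the broker
function beaconValidation(req, res, next) {
  if (!validateBeacon(req.body)) {
    return res.status(400).json({ errors: validateBeacon.errors });
  }
  next();
}

// Usage (handler name is hypothetical): app.post('/v1/beacon', beaconValidation, forwardToKafka);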

Debugging Workflows

  • Monitor Drop Rates: Track HTTP 4xx/5xx response codes at the collector and monitor retry queue depths to identify network or validation bottlenecks.
  • Validate Payload Integrity: Run synthetic test suites against JSON schema assertions to ensure metric fields align with expected data types and ranges.
  • Trace Missing CWV Data: Cross-reference session replay IDs with beacon ingestion timestamps to isolate client-side script failures or unload race conditions.
  • Profile Ingestion Latency: Use perf and eBPF tools to profile collector worker thread pools, disk I/O wait times, and network socket buffers during peak traffic windows.

Analysis Patterns

  • Percentile Segmentation: Calculate p75/p95 percentiles for LCP, INP, and CLS segmented by connection_type, device memory, and device class to surface mobile-specific degradation.
  • Statistical Anomaly Detection: Apply rolling Z-score or EWMA algorithms on daily beacon aggregates to automatically flag performance regressions before they impact user cohorts (see the sketch after this list).
  • Geographic Hotspot Mapping: Join country_code and city dimensions with CDN routing tables to correlate latency spikes with edge node cache misses or routing misconfigurations.
  • Sampling Variance Evaluation: Compare metric distributions across 1%, 5%, and 10% sampling tiers to quantify statistical variance and tune RUM data sampling strategies without sacrificing analytical accuracy.
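
As a sketch of the anomaly-detection pattern above: the function below flags a day whose p75 value deviates from the trailing window by more than a configurable z-score threshold. The input is assumed to be an ordered array of daily aggregates pulled from the materialized view.

// Rolling z-score over daily p75 aggregates (sketch)
// series: [{ date: '2024-01-01', p75: 2450 }, ...] ordered by date
function detectRegressions(series, windowSize = 14, zThreshold = 3) {
  const anomalies = [];
  for (let i = windowSize; i < series.length; i++) {
    const window = series.slice(i - windowSize, i).map((d) => d.p75);
    const mean = window.reduce((a, b) => a + b, 0) / window.length;
    const variance = window.reduce((a, b) => a + (b - mean) ** 2, 0) / window.length;
    const stdDev = Math.sqrt(variance);
    if (stdDev === 0) continue; // flat window: skip to avoid division by zero
    const z = (series[i].p75 - mean) / stdDev;
    if (z > zThreshold) {
      anomalies.push({ date: series[i].date, p75: series[i].p75, zScore: z });
    }
  }
  return anomalies;
}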