C++, HFT, Open Source

Title

Latte Telemetry

ULTRA-LOW LATENCY TELEMETRY

Header-only C++ telemetry framework for nanosecond-scale profiling in real-time systems.

Description

Ultra-low-latency C++ telemetry library for nanosecond-scale profiling.

Loop's Beacon Avg

29.9 cycles (TSC)

Fast Start+Stop Avg

60.1 cycles (TSC)

Mid Start+Stop Avg

119.8 cycles (TSC)

Hard Start+Stop Avg

148.5 cycles (TSC)

Default Buffer Capacity

65,536 samples
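
The default capacity is a power of two, which lets a fixed-size ring buffer wrap its write index with a bit mask instead of a modulo. The sketch below illustrates that general technique with one buffer per thread. It is not Latte's actual implementation: `SampleRing`, `thread_ring`, and their members are hypothetical names chosen for illustration, and overwriting the oldest sample when full is an assumed policy.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-capacity ring buffer of cycle counts (not Latte's type).
// Capacity must be a power of two so the index can wrap with a bit mask.
template <std::size_t Capacity>
class SampleRing {
    static_assert(Capacity != 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Storage is allocated once here; push() never allocates.
    SampleRing() : data_(Capacity, 0) {}

    // Append a sample. Once the buffer is full, the oldest sample is
    // overwritten (an assumed policy for this sketch).
    void push(std::uint64_t cycles) noexcept {
        data_[head_ & (Capacity - 1)] = cycles;
        ++head_;
    }

    // Number of samples currently retained.
    std::size_t size() const noexcept {
        return head_ < Capacity ? head_ : Capacity;
    }

    // Copy the retained samples out, oldest first.
    std::vector<std::uint64_t> snapshot() const {
        const std::size_t n = size();
        const std::size_t first = head_ - n;
        std::vector<std::uint64_t> out;
        out.reserve(n);
        for (std::size_t i = 0; i < n; ++i) {
            out.push_back(data_[(first + i) & (Capacity - 1)]);
        }
        return out;
    }

private:
    std::vector<std::uint64_t> data_;
    std::size_t head_ = 0;  // total number of samples ever pushed
};

// One buffer per thread, so recording needs no locks or atomics.
inline SampleRing<65536>& thread_ring() {
    thread_local SampleRing<65536> ring;
    return ring;
}
```

A caller on any thread would record with `thread_ring().push(cycles)` and later copy the data out with `thread_ring().snapshot()`. Because each thread touches only its own buffer, a writer never waits on another thread.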

Developed Description

Latte is a header-only C++ telemetry framework engineered for high-frequency trading, game engines, and real-time systems where instrumentation overhead must be measured in nanoseconds. It timestamps code using x86_64 TSC instructions (RDTSC/RDTSCP/LFENCE) and stores samples in per-thread fixed-size ring buffers, so measurement never blocks other threads. Each thread records to its own storage, eliminating global contention and minimizing cache-line interference.

To reduce the observer effect, Latte identifies scopes by pointer-stable const char* IDs instead of hashing strings. Start pushes a stack entry and Stop pops it, which supports nested captures without a linear search. On Stop, the per-thread map lookup is O(log N) using pointer comparisons, after which the sample is appended to the appropriate ring buffer.

The library exposes three timing modes (Fast, Mid, Hard) so users can trade precision for overhead. It also provides LATTE_PULSE for consecutive-event deltas and Snapshot for raw sample extraction.

Reporting is handled by DumpToStream, which computes statistical summaries (average, median, standard deviation, skewness, min/max/range) and optionally calibrates samples by subtracting the measured overhead for each Start/Stop mode pairing. Before reporting, Latte applies data cleaning that uses IQR rules to filter out preemption and outlier samples. The result is a lightweight, deterministic profiling pipeline that surfaces long-tail latency behavior without materially perturbing the system being observed.
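
To make the IQR-based cleaning step concrete, here is a minimal sketch of a standard interquartile-range filter applied to raw cycle samples before averaging. It is an illustration under stated assumptions, not Latte's code: the helper names (`quantile_sorted`, `filter_iqr`, `mean_cycles`), the linear-interpolation quantile, and the Tukey multiplier k = 1.5 are all choices made here, and the library may use different rules.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Quantile of an already sorted, non-empty vector, using linear
// interpolation between neighbouring ranks (an assumed convention).
inline double quantile_sorted(const std::vector<std::uint64_t>& s, double q) {
    const double pos = q * static_cast<double>(s.size() - 1);
    const std::size_t lo = static_cast<std::size_t>(pos);
    const std::size_t hi = std::min(lo + 1, s.size() - 1);
    const double frac = pos - static_cast<double>(lo);
    const double a = static_cast<double>(s[lo]);
    const double b = static_cast<double>(s[hi]);
    return a + frac * (b - a);
}

// Keep only samples inside [Q1 - k*IQR, Q3 + k*IQR].
// k = 1.5 is Tukey's usual fence; it is an assumption in this sketch.
// The returned samples are in ascending order.
inline std::vector<std::uint64_t> filter_iqr(std::vector<std::uint64_t> samples,
                                             double k = 1.5) {
    if (samples.size() < 4) {
        return samples;  // too few points to estimate quartiles
    }
    std::sort(samples.begin(), samples.end());
    const double q1 = quantile_sorted(samples, 0.25);
    const double q3 = quantile_sorted(samples, 0.75);
    const double iqr = q3 - q1;
    const double lower = q1 - k * iqr;
    const double upper = q3 + k * iqr;

    std::vector<std::uint64_t> kept;
    kept.reserve(samples.size());
    for (std::uint64_t v : samples) {
        const double d = static_cast<double>(v);
        if (d >= lower && d <= upper) {
            kept.push_back(v);
        }
    }
    return kept;
}

// Arithmetic mean of the samples; returns 0.0 for an empty input.
inline double mean_cycles(const std::vector<std::uint64_t>& v) {
    if (v.empty()) {
        return 0.0;
    }
    double sum = 0.0;
    for (std::uint64_t x : v) {
        sum += static_cast<double>(x);
    }
    return sum / static_cast<double>(v.size());
}
```

For example, given the samples {60, 61, 59, 60, 62, 58, 60, 5000}, the quartiles are 59.75 and 61.25, so the fences are 57.5 and 63.5. The 5000-cycle sample, which is the kind of spike a preemption would produce, is dropped, and the mean of the remaining seven samples is 60 cycles instead of roughly 677.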