Hardware entropy is a coupled system

What: Analysis of hardware entropy sources and their coupling
Impact: Relevant to researchers and developers working on entropy generation and cryptographic security

Explorer: See the embedding space. Each dot in the interactive explorer is one 256-byte entropy window, and sources that produce similar bytes land near each other. The explorer offers a 9-source slice and the full 58-source union.

Setup: How we ran this

All sessions were recorded with OpenEntropy, an open-source framework we built for capturing raw bytes from hardware entropy sources. The machine is a Mac Mini M4 (base model, 16 GB RAM). OpenEntropy exposes 58 sources on it, covering clock jitter, memory timing, interrupt scheduling, PLL phase, USB transport, and more; the full source catalog describes each one. One of the 58 is external: a Crypta Labs QCicada USB quantum RNG.

No conditioning was applied at the OpenEntropy layer (no Von Neumann debiasing, no SHA-256 hashing). QCicada was recorded in its "raw noise" mode, in which the device runs its built-in health tests (exposed by the SDK as repetition count and adaptive proportion flags) but does not apply cryptographic conditioning. A fully unprocessed "raw samples" mode also exists; it outputs directly from the quantum optical module with no filtering.

The embedding pipeline (openentropy-embed) cuts each recording into 256-byte windows with a 128-byte stride, giving about 6,800 windows across the full 58-source union. Each window is serialized as spaced hex and embedded with OpenAI's text-embedding-3-large (3,072 dimensions). We ran 4 deliberate stress-test campaigns across 87 sessions. Retrieval evaluation uses leave-one-session-out splits, so every test session is unseen.

Results: What we found

1. A third of the sources are nearly indistinguishable

Nineteen of the 58 sources land within cosine distance 0.01 of each other in the embedding space. They come from 10 different categories: PLL oscillators, I/O, network, GPU, scheduling, microarchitecture, IPC, signal, timing, and the external quantum RNG. The human-assigned category labels do not predict which sources cluster together.
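The windowing step in Setup can be sketched in a few lines of Python. This is a minimal sketch, not the openentropy-embed source: the function names are hypothetical, and we assume lowercase hex pairs separated by single spaces and that a trailing partial window is dropped, since the article does not specify either detail.

```python
def make_windows(data: bytes, size: int = 256, stride: int = 128) -> list[bytes]:
    """Cut a raw recording into overlapping fixed-size windows.

    Hypothetical helper: a trailing partial window (shorter than `size`)
    is dropped, which is an assumption about the real pipeline.
    """
    return [data[i:i + size] for i in range(0, len(data) - size + 1, stride)]


def to_spaced_hex(window: bytes) -> str:
    """Serialize bytes as lowercase hex pairs separated by single spaces (assumed format)."""
    return window.hex(" ")


if __name__ == "__main__":
    import os

    recording = os.urandom(1024)  # stand-in for raw bytes from one source
    wins = make_windows(recording)
    print(len(wins))              # 7 windows, starting at offsets 0, 128, ..., 768
    print(to_spaced_hex(wins[0])[:23])  # first 8 bytes as spaced hex
```

With a 128-byte stride, consecutive windows overlap by half, so most bytes appear in two windows. Each spaced-hex string is what would then be sent as text input to the embedding model.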
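One way to read "within cosine distance 0.01 of each other" is single-linkage grouping of per-source centroids: link any two sources whose centroids are at most 0.01 apart and report the connected groups. The NumPy sketch below does exactly that. The centroid-based reading, the single-linkage rule, and the function names are our assumptions; the article does not say how the 19-source core was delineated.

```python
import numpy as np


def source_centroids(emb: np.ndarray, labels: list[str]) -> tuple[list[str], np.ndarray]:
    """Mean embedding per source label (each row of `emb` is one window)."""
    names = sorted(set(labels))
    lab = np.asarray(labels)
    return names, np.stack([emb[lab == n].mean(axis=0) for n in names])


def tight_groups(names: list[str], C: np.ndarray, threshold: float = 0.01) -> list[list[str]]:
    """Single-linkage groups of sources whose centroid cosine distance is <= threshold."""
    U = C / np.linalg.norm(C, axis=1, keepdims=True)
    D = 1.0 - U @ U.T  # pairwise cosine distance
    parent = list(range(len(names)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if D[i, j] <= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups: dict[int, list[str]] = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    # Singletons are not part of any tight group.
    return [g for g in groups.values() if len(g) > 1]
```

Single linkage is permissive: a chain of sources can end up in one group even if its two ends are farther apart than the threshold, which is one reason to also check the core with a different method.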
At the center of this dense core sits fsync_journal, an I/O source that flushes the entire storage stack on every call. It is the nearest embedding neighbor of four different sources from three categories (thermal, timing, scheduling). That makes physical sense: fsync timing is sensitive to every kind of system load, so it acts as a barometer of overall machine state.

We tested whether the core is held together by one dominant shared factor. PCA on the source centroids shows that PC1 explains 58.7% of the variance. After removing it, the core stays intact: distances between core members grow by a factor of only 1.1, the PLL sources remain each other's nearest neighbors, and fsync_journal remains the hub. When we force the residual into clusters, 13-16 of the 19 sources refuse to split. The core is not an artifact of one shared variable. Either multiple layers of shared host state affect the same sources, or high-entropy hex genuinely looks similar to the embedding model regardless of origin. Both explanations survive this test.

2. CPU pipeline sources stand apart

Not everything collapses into the core. Sources that measure CPU pipeline internals produce genuinely different byte patterns. preemption_boundary (kernel scheduler preemption timing) is the most isolated source in the entire space, sitting 0.41 cosine distance from the average. Other outliers include mach_continuous_timing, icc_atomic_contention (cross-core atomic operations), and sleep_jitter.

On the 38 sources with enough sessions for leave-one-out retrieval, the model identifies the correct source 46% of the time on unseen sessions (chance is 2.6%). The embedding picks up real structure, but the dense core makes clean separation hard for the sources inside it.

[Figure: Retrieval accuracy on unseen sessions (58-source union), evaluated with leave-one-session-out splits. text-embedding-3-large: 46%. Chance baseline (1/58): 1.7%.]

3. Stress tests confirm the shared structure is real

We ran 4 stress-test campaigns targeting CPU scheduling, memory and network pressure, GPU/CPU ramp, and mixed software load. In every campaign, a small number of latent factors explained the majority of cross-source movement. The scheduler campaign was the cleanest: one factor alone accounted for 80% of the variance.

dram_row_buffer was the strongest responder across campaigns, moving the most under both scheduler and software pressure. The PLL sources (audio_pll_timing, display_pll, pcie_pll) were among the most temporally stable, drifting only about 0.003-0.005 cosine distance between sessions. Sources that measure system-level state (usb_enumeration, process_table) drifted the most, up to 0.29 between sessions.

[Figure: Stress-test campaigns are dominated by a few hidden variables. scheduler: ~2.0 factors, 80% explained by top factor. software: ~3.2 factors, 52% explained by top factor. gpu_cp...]
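The PC1 check from section 1 can be sketched in outline: center the source centroids, take the first principal component via SVD, report its share of variance, project it out, and compare the spread of the core before and after. This is a sketch under our own assumptions (cosine distances on the uncentered centroids, mean pairwise distance as the "spread", hypothetical function names); the article does not publish its exact procedure. The same explained-variance share is one plausible reading of the per-campaign "explained by top factor" figures in section 3, though the article does not name the factor model it used.

```python
import numpy as np


def cosine_distances(X: np.ndarray) -> np.ndarray:
    """Pairwise cosine distance (1 - cosine similarity) between the rows of X."""
    U = X / np.linalg.norm(X, axis=1, keepdims=True)
    return 1.0 - U @ U.T


def first_pc(C: np.ndarray) -> tuple[float, np.ndarray]:
    """Return (share of variance explained by PC1, unit PC1 direction) for the rows of C."""
    Xc = C - C.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2  # variance along each principal axis, up to a constant factor
    return float(var[0] / var.sum()), Vt[0]


def remove_pc1(C: np.ndarray) -> np.ndarray:
    """Project the PC1 direction out of each centroid's deviation from the mean."""
    _, v = first_pc(C)
    Xc = C - C.mean(axis=0)
    return C - np.outer(Xc @ v, v)


def core_spread_ratio(C: np.ndarray, core: list[int]) -> float:
    """Mean pairwise cosine distance among core rows after PC1 removal, divided by before."""
    iu = np.triu_indices(len(core), k=1)
    before = cosine_distances(C[core])[iu].mean()
    after = cosine_distances(remove_pc1(C)[core])[iu].mean()
    return float(after / before)
```

A ratio near 1 means removing the dominant axis barely loosens the core, which is the kind of result the article reports (about 1.1x).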
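The leave-one-session-out retrieval in section 2 can be approximated with a nearest-centroid classifier: for each held-out session, build one centroid per source from the remaining sessions and assign each held-out window to the source with the highest cosine similarity. The classifier choice, per-window scoring, skipping sources absent from training, and the function name are all our assumptions; the article does not describe its retrieval rule.

```python
import numpy as np


def loso_accuracy(emb: np.ndarray, sources, sessions) -> float:
    """Leave-one-session-out, nearest-centroid source retrieval accuracy (per window).

    emb: (n_windows, dim) embeddings; sources/sessions: one label per window.
    Test windows whose source never appears in the training sessions are skipped.
    """
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sources = np.asarray(sources)
    sessions = np.asarray(sessions)
    correct = total = 0
    for held_out in np.unique(sessions):
        train = sessions != held_out
        names = np.unique(sources[train])
        test = (sessions == held_out) & np.isin(sources, names)
        if not test.any():
            continue
        # One unit-normalized centroid per source, built from training sessions only.
        C = np.stack([emb[train & (sources == n)].mean(axis=0) for n in names])
        C /= np.linalg.norm(C, axis=1, keepdims=True)
        # The source with the highest cosine similarity wins.
        pred = names[np.argmax(emb[test] @ C.T, axis=1)]
        correct += int((pred == sources[test]).sum())
        total += int(test.sum())
    return correct / total if total else float("nan")
```

For the baseline, uniform guessing among 38 candidate sources scores 1/38 ≈ 2.6%, the chance figure quoted in section 2 for the 38-source evaluation.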
