Abstract

We conducted a large-scale benchmark comparing AES-256-GCM and ChaCha20-Poly1305 across 10,247 file transfer operations spanning six consumer device classes. Our findings reveal a previously undocumented crossover point at approximately 4MB file size, where AES-256-GCM overtakes ChaCha20-Poly1305 in throughput on hardware with AES-NI instruction support. Below this threshold, ChaCha20-Poly1305 demonstrates consistently lower latency due to reduced initialization overhead. These results have direct implications for adaptive cipher selection in real-time file transfer systems.

Background

The cryptographic community has long debated the practical performance characteristics of AES-256-GCM versus ChaCha20-Poly1305 in applied settings. While theoretical analyses and microbenchmarks exist in abundance, real-world file transfer workloads introduce variables that synthetic tests cannot capture: filesystem I/O contention, memory pressure from concurrent browser tabs, thermal throttling on mobile devices, and variable network conditions affecting chunking strategies.

AES-256-GCM benefits from hardware acceleration: AES-NI instructions on most x86 processors manufactured after 2010, and the ARMv8 Cryptography Extensions on ARM processors that implement them. ChaCha20-Poly1305 pairs the ChaCha20 stream cipher with the Poly1305 authenticator, both designed by Daniel J. Bernstein. ChaCha20 is built from ARX (add-rotate-XOR) operations that run efficiently in software without dedicated hardware support.
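To make the ARX structure concrete, the ChaCha quarter round defined in RFC 8439 can be written using only 32-bit additions, rotations, and XORs. The following is an illustrative Python sketch for readers, not the implementation benchmarked in this study; the helper names `rotl32` and `quarter_round` are our own.

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value within 32 bits


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(a: int, b: int, c: int, d: int):
    """One ChaCha quarter round: four add / XOR / rotate steps (RFC 8439, section 2.1)."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d
```

Every step maps to a single integer instruction on a general-purpose CPU, which is why the cipher does not depend on dedicated cryptographic hardware.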

For file transfer applications, the relevant metric is not raw cipher throughput but end-to-end encryption latency: the time from plaintext input to authenticated ciphertext output, including key derivation, IV generation, padding, and authentication tag computation.

Methodology

Our test framework executed file transfers across controlled conditions, measuring end-to-end encryption time from plaintext buffer to authenticated ciphertext output. We tested files ranging from 1KB to 250MB across six device classes representative of our user base.

Device Classes Tested

Test Parameters

Each file size category was tested with a minimum of 200 iterations per device class per cipher. Files were generated with cryptographically random content to eliminate compression-related variance. Tests were conducted at thermal steady-state after a 5-minute warmup period.
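A measurement loop of this general shape can be sketched in a few lines of Python. This is a simplified illustration and not the proprietary harness: the `benchmark` function, its `encrypt` callable, and the iteration-based warmup (standing in for the 5-minute thermal warmup) are all assumptions made for the example.

```python
import os
import time
from typing import Callable, List


def benchmark(
    encrypt: Callable[[bytes], bytes],  # hypothetical: any buffer-to-ciphertext function
    size_bytes: int,
    iterations: int = 200,  # matches the minimum iteration count stated above
    warmup: int = 20,       # assumption: short warmup instead of a timed thermal soak
) -> List[float]:
    """Return per-operation encryption latencies in milliseconds."""
    payload = os.urandom(size_bytes)  # cryptographically random file content

    # Untimed warmup runs so caches and clocks settle before measurement.
    for _ in range(warmup):
        encrypt(payload)

    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        encrypt(payload)
        elapsed_ns = time.perf_counter_ns() - start
        samples_ms.append(elapsed_ns / 1e6)
    return samples_ms
```

Passing the cipher in as a callable keeps the timing code identical for both algorithms, so any measured difference comes from the cipher rather than the harness.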

Classification Notice

Implementation details, source code, and specific parameter configurations referenced in this study are proprietary to topriv and are not disclosed in this publication. The benchmarking harness, statistical analysis pipeline, and device provisioning infrastructure remain internal to PrivLab.

Results

The following table summarizes median encryption throughput (MB/s) across file sizes on AES-NI-equipped hardware (Class A), which represents 73% of our observed user base.

File Size   AES-256-GCM (MB/s)   ChaCha20-Poly1305 (MB/s)   Winner     Delta
64 KB       1,847                2,134                      ChaCha20   +15.5%
256 KB      2,891                3,067                      ChaCha20   +6.1%
1 MB        4,213                4,312                      ChaCha20   +2.3%
4 MB        5,102                5,089                      ~Parity    -0.3%
16 MB       5,847                5,201                      AES-256    +12.4%
64 MB       6,103                5,187                      AES-256    +17.7%
250 MB      6,241                5,156                      AES-256    +21.0%
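The Delta column expresses the faster cipher's throughput advantage over the slower one. As a worked check, the short Python sketch below recomputes it from the median throughput values in the table; the function name `advantage_pct` is ours, and the calculation reports the magnitude only, so the 4 MB row comes out as 0.3 without a sign.

```python
# (file size, AES-256-GCM MB/s, ChaCha20-Poly1305 MB/s), copied from the table
ROWS = [
    ("64 KB", 1847, 2134),
    ("256 KB", 2891, 3067),
    ("1 MB", 4213, 4312),
    ("4 MB", 5102, 5089),
    ("16 MB", 5847, 5201),
    ("64 MB", 6103, 5187),
    ("250 MB", 6241, 5156),
]


def advantage_pct(aes: float, chacha: float):
    """Return (faster cipher, its throughput advantage in percent)."""
    faster, slower = max(aes, chacha), min(aes, chacha)
    winner = "AES-256-GCM" if aes > chacha else "ChaCha20-Poly1305"
    return winner, round((faster / slower - 1) * 100, 1)


for size, aes, chacha in ROWS:
    winner, pct = advantage_pct(aes, chacha)
    print(f"{size:>7}  {winner:<18} {pct:.1f}%")
```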

On devices without AES-NI support (Classes B, D, E), ChaCha20-Poly1305 consistently outperformed AES-256-GCM by 18-34% across all file sizes, with the advantage increasing for smaller payloads.

Latency Distribution (Class A, 1MB files, n=2,000)

Percentile   AES-256-GCM   ChaCha20-Poly1305
p50          0.24 ms       0.23 ms
p90          0.31 ms       0.27 ms
p95          0.38 ms       0.29 ms
p99          0.52 ms       0.34 ms

Key observation: ChaCha20-Poly1305 exhibits a significantly tighter tail latency distribution. At p99, AES-256-GCM latency is 53% higher (0.52 ms versus 0.34 ms), likely due to pipeline stalls during AES-NI scheduling on heavily loaded cores.
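For readers reproducing this kind of analysis, the sketch below shows one common way to derive latency percentiles from raw samples and to compare two tail values. It uses the nearest-rank method, which is an assumption on our part; the percentile method used in the study's analysis pipeline is not disclosed. The function names are illustrative.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def relative_excess_pct(a: float, b: float) -> float:
    """How much larger a is than b, in percent."""
    return (a / b - 1) * 100


# Published p99 values for Class A, 1 MB files
gap = relative_excess_pct(0.52, 0.34)
print(f"AES-256-GCM p99 is {gap:.0f}% above ChaCha20-Poly1305 p99")
```

The same ratio at p50 (0.24 ms versus 0.23 ms) is only about 4%, which is what makes the gap at the tail notable.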

Key Findings

  1. The 4MB crossover is real and consistent. Across 10,247 transfer operations, the crossover point held within a ±0.3MB band (3.7-4.3MB) on AES-NI hardware. This is attributable to AES-NI's instruction pipeline reaching optimal utilization above this threshold.
  2. ChaCha20 wins at tail latencies regardless of file size. Even on AES-NI hardware where AES-256-GCM achieves higher throughput for large files, ChaCha20's p99 latency was 18-34% lower. For latency-sensitive applications, this is significant.
  3. Browser-based (Web Crypto) performance diverges substantially. The Web Crypto API's AES-GCM implementation shows 2.1x overhead versus native, while ChaCha20-Poly1305 (via crypto.subtle polyfills) shows 3.4x overhead. Native AES-NI advantages are partially negated by the JavaScript-to-native bridge cost.
  4. Thermal throttling disproportionately affects hardware-accelerated AES. After sustained encryption workloads (>30 seconds continuous), accelerated AES-256-GCM throughput degraded 12% on mobile devices versus 4% for ChaCha20-Poly1305, suggesting better thermal characteristics for the ARX-based cipher.
  5. Memory allocation patterns differ meaningfully. Although both ciphers require a unique nonce per key, AES-256-GCM's IV and GCM counter-block handling resulted in 1.4x more memory allocations per operation than ChaCha20-Poly1305's streamlined state management.

Implications for PrivDrop

Based on these findings, we have implemented an adaptive cipher selection strategy for PrivDrop's file transfer engine. The system now performs a lightweight hardware capability detection during initialization and applies the following policy:

This adaptive approach yields a measured 11.3% improvement in median encryption latency across our real-world file size distribution (heavily skewed toward 1-8MB transfers) compared to a static cipher selection policy.
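A selection policy of this kind can be sketched as follows. This is a minimal illustration derived only from the published results (AES-256-GCM at or above the roughly 4 MB crossover on AES-accelerated hardware, ChaCha20-Poly1305 otherwise); it is not PrivDrop's production logic. The function names, the fixed `CROSSOVER_BYTES` threshold, and the Linux-style CPU flag parsing are all assumptions made for the example.

```python
CROSSOVER_BYTES = 4 * 1024 * 1024  # assumption: the ~4 MB crossover reported above


def has_aes_hardware(cpuinfo_text: str) -> bool:
    """Detect AES acceleration from /proc/cpuinfo-style text.

    Linux reports an "aes" entry on the "flags" line for x86 AES-NI and on the
    "Features" line for the ARMv8 Cryptography Extensions.
    """
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() in ("flags", "features"):
            if "aes" in value.split():
                return True
    return False


def select_cipher(file_size: int, aes_hardware: bool) -> str:
    """Pick a cipher from file size and detected hardware capability."""
    if aes_hardware and file_size >= CROSSOVER_BYTES:
        return "AES-256-GCM"
    return "ChaCha20-Poly1305"


# Example with a hypothetical x86 flags line
sample = "flags\t\t: fpu vme sse2 aes avx"
print(select_cipher(16 * 1024 * 1024, has_aes_hardware(sample)))
```

On a Linux host the detection input could be the contents of /proc/cpuinfo. A browser-based client has no access to that file and would need a different signal, such as a short calibration run of both ciphers.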

Classification Notice

The hardware detection heuristic, thermal throttling detection algorithm, and specific thresholds used in production cipher selection are proprietary. The adaptive selection logic described above is simplified for publication purposes. Actual implementation includes additional variables not disclosed here.

Limitations

This study was conducted under controlled laboratory conditions with synthetic file content. Real-world performance may vary based on concurrent system load, OS-level scheduling decisions, and browser sandboxing overhead. Additionally, our Web Crypto API benchmarks were limited to Chromium-based browsers and Firefox; Safari's implementation may exhibit different performance characteristics due to its distinct JavaScript engine architecture.

We did not evaluate XChaCha20-Poly1305 (extended nonce variant) or AES-256-GCM-SIV in this study. Future work will address these variants.