Abstract
We conducted a systematic audit of eight file sharing services that market themselves as "zero-knowledge" or "end-to-end encrypted." Our analysis focused on metadata exposure at the infrastructure layer—information observable by the service operator independent of payload encryption. Results indicate that six of eight services leak meaningful metadata including exact file sizes, upload timestamps with millisecond precision, client IP addresses, user agent strings, and in some cases approximate file types via MIME sniffing. Only two services demonstrated comprehensive metadata protection. We conclude that the term "zero-knowledge" is frequently applied in ways that are technically misleading to users.
Background
The term "zero-knowledge" in the context of file sharing services has no standardized definition. Unlike zero-knowledge proofs in cryptography—which have precise mathematical definitions—"zero-knowledge file sharing" is a marketing term that implies the service operator cannot access uploaded content. However, this framing obscures a critical distinction: content encryption versus metadata protection.
A service may encrypt file contents with a client-held key while simultaneously logging the file's exact size in bytes, the precise timestamp of upload, the uploader's IP address, geographic location, browser fingerprint, and access patterns. This metadata alone can be deeply revealing: file size combined with type hints can identify specific documents, access timestamps reveal behavioral patterns, and IP addresses provide geographic and organizational attribution.
Previous work by Greschbach et al. (2012) and Pulls & Dahlberg (2020) examined metadata leakage in anonymity networks and messaging protocols respectively. Our contribution extends this analysis to the specific domain of file sharing services marketed for privacy-conscious users.
Threat Model
We assume an honest-but-curious service operator with full access to server-side logs, database records, network traffic metadata, and infrastructure telemetry. We do not consider compromised client-side code in this study (which would defeat any end-to-end encryption scheme). Our analysis focuses exclusively on what the operator can observe through normal system operation.
Methodology
Each service was tested through a standardized audit protocol examining seven metadata vectors. We performed uploads and downloads under controlled conditions, capturing all network traffic and analyzing server-observable information.
Metadata Vectors Tested
- File Size Exposure — Whether the server can determine exact file size from upload request headers, content-length, or stored ciphertext size
- Timestamp Precision — Whether upload/download timestamps are logged with precision sufficient for traffic correlation
- IP Address Logging — Whether client IP is associated with specific file operations in server logs
- User Agent Leakage — Whether browser/client fingerprint data accompanies file operations
- File Type Inference — Whether file type can be inferred from metadata (magic bytes, extension in URL, MIME type in headers)
- Access Pattern Visibility — Whether the operator can observe download frequency and accessor identity
- Filename Exposure — Whether original filename is visible to the server infrastructure
Implementation details, source code, and specific parameter configurations referenced in this study are proprietary to topriv and are not disclosed in this publication. The audit toolchain, traffic interception methodology, and service identification are not disclosed to prevent targeted countermeasures that would mask rather than resolve the underlying issues.
Results
The following table summarizes metadata exposure across all eight audited services. A "Pass" indicates the metadata vector is protected; "Fail" indicates the service operator can observe this information; "Partial" indicates some protection with exploitable gaps.
| Vector | Svc A | Svc B | Svc C | Svc D | Svc E | Svc F | Svc G | Svc H |
|---|---|---|---|---|---|---|---|---|
| File Size | Fail | Fail | Partial | Fail | Fail | Pass | Fail | Pass |
| Timestamp | Fail | Fail | Fail | Fail | Partial | Pass | Fail | Pass |
| IP Address | Fail | Partial | Fail | Fail | Fail | Fail | Fail | Pass |
| User Agent | Fail | Fail | Fail | Partial | Fail | Pass | Fail | Pass |
| File Type | Fail | Fail | Pass | Fail | Pass | Pass | Partial | Pass |
| Access Pattern | Fail | Fail | Fail | Fail | Fail | Partial | Fail | Pass |
| Filename | Pass | Pass | Pass | Fail | Pass | Pass | Pass | Pass |
Composite Privacy Scores
| Service | Score | Rating | Claims "Zero-Knowledge" |
|---|---|---|---|
| Service A | 1/7 | Critical | Yes |
| Service B | 2/7 | Critical | Yes |
| Service C | 3/7 | Poor | Yes |
| Service D | 1/7 | Critical | Yes |
| Service E | 3/7 | Poor | No |
| Service F | 6/7 | Good | Yes |
| Service G | 2/7 | Critical | Yes |
| Service H | 7/7 | Excellent | Yes |
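The scores above are consistent with counting every vector rated Pass or Partial as one point out of seven. That rule is inferred from the two tables rather than stated as the audit's scoring method, so the following Python sketch should be read as a consistency check under that assumption. The names `RESULTS` and `composite_score` are illustrative.

```python
SERVICES = "ABCDEFGH"

# Ratings transcribed row by row from the results table (columns are Svc A..H).
RESULTS = {
    "File Size":      ["Fail", "Fail", "Partial", "Fail", "Fail", "Pass", "Fail", "Pass"],
    "Timestamp":      ["Fail", "Fail", "Fail", "Fail", "Partial", "Pass", "Fail", "Pass"],
    "IP Address":     ["Fail", "Partial", "Fail", "Fail", "Fail", "Fail", "Fail", "Pass"],
    "User Agent":     ["Fail", "Fail", "Fail", "Partial", "Fail", "Pass", "Fail", "Pass"],
    "File Type":      ["Fail", "Fail", "Pass", "Fail", "Pass", "Pass", "Partial", "Pass"],
    "Access Pattern": ["Fail", "Fail", "Fail", "Fail", "Fail", "Partial", "Fail", "Pass"],
    "Filename":       ["Pass", "Pass", "Pass", "Fail", "Pass", "Pass", "Pass", "Pass"],
}


def composite_score(service: str) -> int:
    """Count vectors not rated Fail (assumed rule: Pass and Partial each score 1)."""
    column = SERVICES.index(service)
    return sum(1 for row in RESULTS.values() if row[column] != "Fail")


scores = {service: composite_score(service) for service in SERVICES}
for service, score in scores.items():
    print(f"Service {service}: {score}/7")
```

Under this rule a Partial rating contributes as much as a Pass, so a stricter reading that counts only Pass would lower the scores of Services B, C, D, E, F, and G by one point each.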
Critical finding: Five of the six services that failed our audit explicitly use the term "zero-knowledge" in their marketing materials. Users relying on these claims for operational security are exposed to metadata analysis attacks that the services' own marketing implicitly claims to prevent.
Analysis of Common Failures
File Size Exposure via Content-Length
File size exposure through predictable ciphertext sizes is among the most widespread vulnerabilities, affecting six of eight services. Five services store ciphertext with no padding, so the encrypted file size reveals the plaintext size exactly (offset only by a constant IV and authentication tag). Service C applies fixed-block padding but uses 1 KB blocks, which still confines the plaintext size to a 1 KB window, an estimate within ±512 bytes that is sufficient to identify many common document types.
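The arithmetic an operator would perform can be sketched in a few lines of Python. The 12-byte IV and 16-byte tag are assumptions (typical of AES-GCM), not values reported by the audit, and the block-padding rule assumed here rounds the plaintext up to the next multiple of the block size, leaving exact multiples unchanged.

```python
IV_BYTES = 12    # assumption: 96-bit nonce stored alongside the ciphertext
TAG_BYTES = 16   # assumption: 128-bit authentication tag
OVERHEAD = IV_BYTES + TAG_BYTES


def infer_unpadded(stored_bytes: int) -> int:
    """Exact plaintext size when the ciphertext carries no padding."""
    return stored_bytes - OVERHEAD


def infer_block_padded(stored_bytes: int, block: int = 1024) -> tuple[int, int]:
    """Inclusive (low, high) plaintext size range under fixed-block padding."""
    padded = stored_bytes - OVERHEAD
    if padded < 0 or padded % block != 0:
        raise ValueError("stored size is not consistent with the assumed scheme")
    # Any plaintext in this range rounds up to the same padded size.
    return max(padded - block + 1, 0), padded


print(infer_unpadded(1_048_604))   # stored object of 1,048,604 bytes
print(infer_block_padded(3_100))   # stored object of 3,100 bytes
```

With no padding the operator recovers the size exactly. With 1 KB blocks the operator still learns a window of 1,024 candidate sizes, and guessing its midpoint is never off by more than about 512 bytes.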
Timestamp Correlation
Six of eight services are rated Fail or Partial on timestamp precision in the results table, with upload timestamps logged at up to millisecond precision in their database layer. Even services that claim to "not log" user activity maintain creation timestamps on stored objects for garbage collection purposes. This metadata enables temporal correlation attacks where upload timing is matched against known user activity patterns.
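A temporal correlation attack of this kind needs very little machinery. The Python sketch below is illustrative only: the object identifiers, user names, and one-second window are hypothetical, and `correlate` is not part of the audit toolchain.

```python
from datetime import datetime, timedelta


def correlate(uploads, activity, window=timedelta(seconds=1)):
    """Map each stored object to the users seen active within `window` of its upload.

    uploads:  dict of object id -> creation timestamp from the storage layer
    activity: list of (user, timestamp) pairs known to the observer
    """
    matches = {}
    for object_id, uploaded_at in uploads.items():
        nearby = {user for user, seen_at in activity
                  if abs(seen_at - uploaded_at) <= window}
        matches[object_id] = sorted(nearby)
    return matches


t0 = datetime(2025, 1, 1, 12, 0, 0)
uploads = {
    "obj1": t0 + timedelta(milliseconds=250),
    "obj2": t0 + timedelta(minutes=5),
}
activity = [("alice", t0), ("bob", t0 + timedelta(seconds=30))]
print(correlate(uploads, activity))
```

The finer the stored timestamp, the narrower the window an observer can use, and the fewer candidate users remain for each object.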
IP Address Association
Seven services associate client IP addresses with file operations at some point in their stack. Even services using CDN proxies (which strip client IP from the origin request) retain IP data in CDN-level access logs that are available to the service operator. Only Service H demonstrated an architecture where IP addresses are never associated with specific file operations at any layer.
How Service H Achieves Full Protection
Service H (scoring 7/7) employs several architectural decisions that collectively eliminate metadata exposure:
- Randomized padding: All uploads are padded to randomized sizes within predefined bands, preventing file size inference. The padding scheme adds between 5% and 40% overhead, non-deterministically.
- Timing obfuscation: Upload operations are batched and released to storage with randomized delays (0-3 seconds), preventing timestamp correlation with sub-second precision.
- IP disassociation: The upload endpoint operates behind a privacy-preserving relay architecture where the storage layer never observes client IP. Access logs at the edge layer are rotated every 60 seconds and contain only aggregate counters.
- Uniform request profiles: All API requests use identical content types, header patterns, and response formats regardless of file type or operation, preventing inference from traffic analysis.
- Access counting prevention: Download operations are served through a cache layer that does not report individual access events to the application tier.
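The padding and timing measures in the list above can be illustrated with a minimal Python sketch. It is not Service H's implementation, which is not disclosed. The 5% to 40% overhead and the 0-3 second delay come from the description above; the 8-byte length header, the treatment of the whole overhead range as a single band, and all function names are assumptions made for illustration.

```python
import secrets

HEADER_BYTES = 8  # hypothetical framing: original length as a big-endian integer


def padding_bounds(n: int) -> tuple[int, int]:
    """Inclusive padding range: ceil(5% of n) up to floor(40% of n)."""
    low = -(-n * 5 // 100)            # integer ceiling of 5%
    high = max(low, n * 40 // 100)    # tiny inputs collapse to a single value
    return low, high


def pad(plaintext: bytes) -> bytes:
    """Frame and pad the plaintext; the result is what would then be encrypted."""
    n = len(plaintext)
    low, high = padding_bounds(n)
    pad_len = low + secrets.randbelow(high - low + 1)
    return n.to_bytes(HEADER_BYTES, "big") + plaintext + bytes(pad_len)


def unpad(padded: bytes) -> bytes:
    """Recover the plaintext after decryption using the length header."""
    n = int.from_bytes(padded[:HEADER_BYTES], "big")
    return padded[HEADER_BYTES:HEADER_BYTES + n]


def release_delay_seconds() -> float:
    """Random delay in [0, 3] seconds before a batched upload reaches storage."""
    return secrets.randbelow(3001) / 1000


data = b"x" * 10_000
padded = pad(data)
print(len(data), len(padded), release_delay_seconds())
```

Because the padding length is drawn fresh for every upload, two uploads of the same file produce different stored sizes. The constant header adds a fixed number of bytes to every object and therefore reveals nothing about the file.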
Conclusions
The file sharing industry's use of "zero-knowledge" terminology is, in the majority of cases we examined, technically misleading. Encrypting file contents while exposing file size, upload timing, and accessor identity provides a false sense of security. Users with genuine privacy requirements—journalists, whistleblowers, activists, legal professionals—cannot rely on marketing claims alone.
We recommend that the industry adopt more precise terminology:
- Content-encrypted: File contents are encrypted with client-held keys (minimum bar)
- Metadata-private: File metadata is protected against operator observation
- Zero-knowledge: Reserved for services that can mathematically prove they learn nothing about uploaded content or metadata
Based on this taxonomy, only 2 of 8 services audited qualify for "metadata-private" status, and only 1 approaches true zero-knowledge architecture.
The identities of Services A-G are withheld under responsible disclosure principles. Service operators were notified of findings 30 days prior to publication.
Responsible Disclosure
All services identified with critical metadata leakage were contacted 30 days prior to this publication. Three operators acknowledged the issues; one has committed to addressing file size exposure in a future release. We will re-audit all services in Q3 2026 and publish updated results.