Metadata Leakage in "Zero-Knowledge" File Sharing Services: A Comparative Audit

Abstract

We conducted a systematic audit of eight file sharing services that market themselves as "zero-knowledge" or "end-to-end encrypted." Our analysis focused on metadata exposure at the infrastructure layer - information observable by the service operator independent of payload encryption. Results indicate that six of eight services leak meaningful metadata including exact file sizes, upload timestamps with millisecond precision, client IP addresses, user agent strings, and in some cases approximate file types via MIME sniffing. Only two services demonstrated comprehensive metadata protection. We conclude that the term "zero-knowledge" is frequently applied in ways that are technically misleading to users.

Background

The term "zero-knowledge" in the context of file sharing services has no standardized definition. Unlike zero-knowledge proofs in cryptography - which have precise mathematical definitions - "zero-knowledge file sharing" is a marketing term that implies the service operator cannot access uploaded content. However, this framing obscures a critical distinction: content encryption versus metadata protection.

A service may encrypt file contents with a client-held key while simultaneously logging the file's exact size in bytes, the precise timestamp of upload, the uploader's IP address, geographic location, browser fingerprint, and access patterns. This metadata alone can be deeply revealing: file size combined with type hints can identify specific documents, access timestamps reveal behavioral patterns, and IP addresses provide geographic and organizational attribution.

Previous work by Greschbach et al. (2012) and Pulls & Dahlberg (2020) examined metadata leakage in anonymity networks and messaging protocols respectively. Our contribution extends this analysis to the specific domain of file sharing services marketed for privacy-conscious users.

Threat Model

We assume an honest-but-curious service operator with full access to server-side logs, database records, network traffic metadata, and infrastructure telemetry. We do not consider compromised client-side code in this study (which would defeat any end-to-end encryption scheme). Our analysis focuses exclusively on what the operator can observe through normal system operation.

Methodology

Each service was tested through a standardized audit protocol examining seven metadata vectors. We performed uploads and downloads under controlled conditions, capturing all network traffic and analyzing server-observable information.

Metadata Vectors Tested

File Size Exposure - Whether the server can determine exact file size from upload request headers, content-length, or stored ciphertext size
Timestamp Precision - Whether upload/download timestamps are logged with precision sufficient for traffic correlation
IP Address Logging - Whether client IP is associated with specific file operations in server logs
User Agent Leakage - Whether browser/client fingerprint data accompanies file operations
File Type Inference - Whether file type can be inferred from metadata (magic bytes, extension in URL, MIME type in headers)
Access Pattern Visibility - Whether the operator can observe download frequency and accessor identity
Filename Exposure - Whether original filename is visible to the server infrastructure

Classification Notice

Implementation details, source code, and specific parameter configurations referenced in this study are proprietary to topriv and are not disclosed in this publication. The audit toolchain, traffic interception methodology, and service identification are not disclosed to prevent targeted countermeasures that would mask rather than resolve the underlying issues.

Results

The following table summarizes metadata exposure across all eight audited services. A "Pass" indicates the metadata vector is protected; "Fail" indicates the service operator can observe this information; "Partial" indicates some protection with exploitable gaps.

Vector	Svc A	Svc B	Svc C	Svc D	Svc E	Svc F	Svc G	Svc H
File Size	Fail	Fail	Partial	Fail	Fail	Pass	Fail	Pass
Timestamp	Fail	Fail	Fail	Fail	Partial	Pass	Fail	Pass
IP Address	Fail	Partial	Fail	Fail	Fail	Fail	Fail	Pass
User Agent	Fail	Fail	Fail	Partial	Fail	Pass	Fail	Pass
File Type	Fail	Fail	Pass	Fail	Pass	Pass	Partial	Pass
Access Pattern	Fail	Fail	Fail	Fail	Fail	Partial	Fail	Pass
Filename	Pass	Pass	Pass	Fail	Pass	Pass	Pass	Pass

Composite Privacy Scores

Service	Score	Rating	Claims "Zero-Knowledge"
Service A	1/7	Critical	Yes
Service B	2/7	Critical	Yes
Service C	3/7	Poor	Yes
Service D	1/7	Critical	Yes
Service E	3/7	Poor	No
Service F	6/7	Good	Yes
Service G	2/7	Critical	Yes
Service H	7/7	Excellent	Yes
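The composite scores above can be reproduced from the results table under one scoring rule that the table does not state outright: each vector rated "Pass" or "Partial" earns one point, and "Fail" earns none. That rule is an inference from the published numbers, not a documented part of the audit protocol. A minimal sketch, with the ratings transcribed from the table:

```python
# Ratings transcribed from the results table; list positions are Services A-H.
RESULTS = {
    "File Size":      ["Fail", "Fail", "Partial", "Fail", "Fail", "Pass", "Fail", "Pass"],
    "Timestamp":      ["Fail", "Fail", "Fail", "Fail", "Partial", "Pass", "Fail", "Pass"],
    "IP Address":     ["Fail", "Partial", "Fail", "Fail", "Fail", "Fail", "Fail", "Pass"],
    "User Agent":     ["Fail", "Fail", "Fail", "Partial", "Fail", "Pass", "Fail", "Pass"],
    "File Type":      ["Fail", "Fail", "Pass", "Fail", "Pass", "Pass", "Partial", "Pass"],
    "Access Pattern": ["Fail", "Fail", "Fail", "Fail", "Fail", "Partial", "Fail", "Pass"],
    "Filename":       ["Pass", "Pass", "Pass", "Fail", "Pass", "Pass", "Pass", "Pass"],
}
SERVICES = "ABCDEFGH"


def composite_scores(results):
    """Count, per service, the vectors not rated "Fail" (assumed scoring rule)."""
    scores = {}
    for i, svc in enumerate(SERVICES):
        scores[svc] = sum(1 for ratings in results.values() if ratings[i] != "Fail")
    return scores


if __name__ == "__main__":
    for svc, score in composite_scores(RESULTS).items():
        print(f"Service {svc}: {score}/{len(RESULTS)}")
```

Under this rule a "Partial" rating counts the same as a "Pass", so the composite score is best read as a count of vectors with at least some protection rather than a count of fully protected vectors.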

Critical finding: Five of the six services that failed our audit explicitly use the term "zero-knowledge" in their marketing materials. Users relying on these claims for operational security are exposed to metadata analysis attacks that the services' own marketing implicitly claims to prevent.

Analysis of Common Failures

File Size Exposure via Content-Length

One of the most prevalent vulnerabilities (present in 6/8 services) is file size exposure through predictable ciphertext sizes. Five services store ciphertext with no padding, so the encrypted file size directly reveals the plaintext size (offset by the constant size of the authentication tag and IV). Service C applies fixed-block padding but uses 1KB blocks, which still bounds the plaintext size to a 1KB window (±512 bytes around its midpoint) - sufficient to identify many common document types.
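The arithmetic behind both cases can be sketched as follows. The 12-byte IV and 16-byte tag are assumed values (typical for AES-GCM), not figures from the audit, and the padded case assumes plaintext is rounded up to the next multiple of the block size with overhead already subtracted. The function names are hypothetical.

```python
IV_BYTES = 12   # assumed nonce length (typical for AES-GCM); not from the audit
TAG_BYTES = 16  # assumed authentication tag length; not from the audit
BLOCK = 1024    # 1KB padding block, as described for Service C


def plaintext_size_unpadded(ciphertext_size):
    """With no padding, plaintext size is ciphertext size minus constant overhead."""
    return ciphertext_size - IV_BYTES - TAG_BYTES


def plaintext_range_block_padded(padded_size, block=BLOCK):
    """Return the (min, max) plaintext sizes that round up to padded_size.

    Assumes sizes are rounded up to the next multiple of `block` and that
    a size already on a block boundary is left unchanged.
    """
    if padded_size <= 0 or padded_size % block != 0:
        raise ValueError("padded_size must be a positive multiple of block")
    return padded_size - block + 1, padded_size


if __name__ == "__main__":
    print(plaintext_size_unpadded(1_000_028))
    print(plaintext_range_block_padded(2048))
```

In the unpadded case the operator recovers the exact size. In the block-padded case the candidate range spans only 1024 values, which is the ±512 byte uncertainty described above.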

Timestamp Correlation

Six of eight services (all but F and H in the results table) retain upload timestamps in their database layer, most with millisecond precision. Even services that claim to "not log" user activity maintain creation timestamps on stored objects for garbage collection purposes. This metadata enables temporal correlation attacks, in which upload timing is matched against known user activity patterns.
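A temporal correlation of this kind can be illustrated with a small sketch. It assumes the operator holds a list of upload timestamps and a separate list of externally observed activity timestamps, both in milliseconds. The function name, the 500 ms window, and the sample values are all hypothetical.

```python
from bisect import bisect_left, bisect_right


def correlate(upload_times_ms, activity_times_ms, window_ms=500):
    """Map each upload timestamp to the activity events within +/- window_ms."""
    activity = sorted(activity_times_ms)
    matches = {}
    for t in upload_times_ms:
        lo = bisect_left(activity, t - window_ms)
        hi = bisect_right(activity, t + window_ms)
        matches[t] = activity[lo:hi]
    return matches


if __name__ == "__main__":
    uploads = [1000, 5000]
    activity = [900, 1400, 1600, 9000]
    for upload, nearby in correlate(uploads, activity).items():
        print(upload, nearby)
```

The finer the stored timestamp precision, the narrower the window an observer can use, and the fewer unrelated events fall inside it.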

IP Address Association

Seven services associate client IP addresses with file operations at some point in their stack. Even services using CDN proxies (which strip client IP from the origin request) retain IP data in CDN-level access logs that are available to the service operator. Only Service H demonstrated an architecture in which IP addresses are never associated with specific file operations at any layer.

How Service H Achieves Full Protection

Service H (scoring 7/7) employs several architectural decisions that collectively eliminate metadata exposure:

Randomized padding: All uploads are padded to randomized sizes within predefined bands, preventing file size inference. The padding scheme adds between 5% and 40% overhead, non-deterministically.
Timing obfuscation: Upload operations are batched and released to storage with randomized delays (0-3 seconds), preventing timestamp correlation with sub-second precision.
IP disassociation: The upload endpoint operates behind a privacy-preserving relay architecture where the storage layer never observes client IP. Access logs at the edge layer are rotated every 60 seconds and contain only aggregate counters.
Uniform request profiles: All API requests use identical content types, header patterns, and response formats regardless of file type or operation, preventing inference from traffic analysis.
Access counting prevention: Download operations are served through a cache layer that does not report individual access events to the application tier.
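The padding and timing measures above can be sketched roughly as follows. This is not Service H's implementation, which is not disclosed. It simply draws an overhead uniformly from the stated 5%-40% range and a release delay from the stated 0-3 second range, and it ignores the "predefined bands" because their boundaries are not given. All names are hypothetical.

```python
import math
import secrets

MIN_OVERHEAD = 0.05  # 5% lower bound quoted above
MAX_OVERHEAD = 0.40  # 40% upper bound quoted above
MAX_DELAY_MS = 3000  # upper end of the 0-3 second release delay quoted above

# OS-backed randomness, so padding and delays cannot be predicted from a seed.
_rng = secrets.SystemRandom()


def padded_size(plaintext_size):
    """Pick a padded size roughly 5%-40% larger than the plaintext, at random."""
    if plaintext_size <= 0:
        raise ValueError("plaintext_size must be positive")
    overhead = _rng.uniform(MIN_OVERHEAD, MAX_OVERHEAD)
    return plaintext_size + math.ceil(plaintext_size * overhead)


def release_delay_ms():
    """Pick a random delay before a batched upload is released to storage."""
    return _rng.randint(0, MAX_DELAY_MS)


if __name__ == "__main__":
    print(padded_size(1_000_000), release_delay_ms())
```

Because the overhead is drawn fresh for each upload, two uploads of the same file would generally produce different stored sizes, which is the property that defeats size-based matching.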

Conclusions

The file sharing industry's use of "zero-knowledge" terminology is, in the majority of cases we examined, technically misleading. Encrypting file contents while exposing file size, upload timing, and accessor identity provides a false sense of security. Users with genuine privacy requirements - journalists, whistleblowers, activists, legal professionals - cannot rely on marketing claims alone.

We recommend that the industry adopt more precise terminology:

Content-encrypted: File contents are encrypted with client-held keys (minimum bar)
Metadata-private: File metadata is protected against operator observation
Zero-knowledge: Reserved for services that can mathematically prove they learn nothing about uploaded content or metadata

Based on this taxonomy, only 2 of 8 services audited qualify for "metadata-private" status, and only 1 approaches true zero-knowledge architecture.

Responsible Disclosure

The identities of Services A-G are withheld under responsible disclosure principles. All services identified with critical metadata leakage were contacted 30 days prior to this publication. Three operators acknowledged the issues; one has committed to addressing file size exposure in a future release. We will re-audit all services in Q3 2026 and publish updated results.

This study was never run against our users. Every input was synthetic and machine-generated, executed entirely in isolated test environments. We did not observe, sample, or experiment on any real user, file, or transfer. Our zero-knowledge, no-retention design means we couldn't have - there is nothing to test against in the first place.
