Abstract

We conducted a systematic audit of eight file sharing services that market themselves as "zero-knowledge" or "end-to-end encrypted." Our analysis focused on metadata exposure at the infrastructure layer—information observable by the service operator independent of payload encryption. Results indicate that six of eight services leak meaningful metadata including exact file sizes, upload timestamps with millisecond precision, client IP addresses, user agent strings, and in some cases approximate file types via MIME sniffing. Only two services demonstrated comprehensive metadata protection. We conclude that the term "zero-knowledge" is frequently applied in ways that are technically misleading to users.

Background

The term "zero-knowledge" in the context of file sharing services has no standardized definition. Unlike zero-knowledge proofs in cryptography—which have precise mathematical definitions—"zero-knowledge file sharing" is a marketing term that implies the service operator cannot access uploaded content. However, this framing obscures a critical distinction: content encryption versus metadata protection.

A service may encrypt file contents with a client-held key while simultaneously logging the file's exact size in bytes, the precise timestamp of upload, the uploader's IP address, geographic location, browser fingerprint, and access patterns. This metadata alone can be deeply revealing: file size combined with type hints can identify specific documents, access timestamps reveal behavioral patterns, and IP addresses provide geographic and organizational attribution.

Previous work by Greschbach et al. (2012) and Pulls & Dahlberg (2020) examined metadata leakage in anonymity networks and messaging protocols respectively. Our contribution extends this analysis to the specific domain of file sharing services marketed for privacy-conscious users.

Threat Model

We assume an honest-but-curious service operator with full access to server-side logs, database records, network traffic metadata, and infrastructure telemetry. We do not consider compromised client-side code in this study (which would defeat any end-to-end encryption scheme). Our analysis focuses exclusively on what the operator can observe through normal system operation.

Methodology

Each service was tested through a standardized audit protocol examining seven metadata vectors. We performed uploads and downloads under controlled conditions, capturing all network traffic and analyzing server-observable information.

Metadata Vectors Tested

  1. File Size Exposure — Whether the server can determine exact file size from upload request headers, content-length, or stored ciphertext size
  2. Timestamp Precision — Whether upload/download timestamps are logged with precision sufficient for traffic correlation
  3. IP Address Logging — Whether client IP is associated with specific file operations in server logs
  4. User Agent Leakage — Whether browser/client fingerprint data accompanies file operations
  5. File Type Inference — Whether file type can be inferred from metadata (magic bytes, extension in URL, MIME type in headers)
  6. Access Pattern Visibility — Whether the operator can observe download frequency and accessor identity
  7. Filename Exposure — Whether original filename is visible to the server infrastructure
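
As an illustration only (the audit toolchain itself is not disclosed), the following Python sketch shows how several of the vectors above can be read directly off a single upload request. The function name, the vector labels, and the example URL are hypothetical. The sketch covers only the request-derivable vectors: timestamps and access patterns come from server-side logging and are not modeled here.

```python
from urllib.parse import urlparse

def exposed_vectors(headers, url, remote_addr=None):
    """Return the sorted metadata vectors visible in one upload request.

    Hypothetical helper: `headers` is a dict of HTTP request headers,
    `url` is the request URL, `remote_addr` is the peer IP seen by the server.
    """
    h = {k.lower(): v for k, v in headers.items()}
    exposed = set()

    if "content-length" in h:          # vector 1: exact size of the request body
        exposed.add("file_size")
    if remote_addr:                    # vector 3: client IP tied to this operation
        exposed.add("ip_address")
    if "user-agent" in h:              # vector 4: browser/client fingerprint data
        exposed.add("user_agent")

    # vector 5: type hints from a URL extension or a specific MIME type
    last_segment = urlparse(url).path.rsplit("/", 1)[-1]
    content_type = h.get("content-type", "application/octet-stream")
    if "." in last_segment or content_type != "application/octet-stream":
        exposed.add("file_type")

    # vector 7: original filename sent in a Content-Disposition header
    if "filename=" in h.get("content-disposition", ""):
        exposed.add("filename")

    return sorted(exposed)
```

For example, a request carrying `Content-Length`, `User-Agent`, and `Content-Type: application/pdf` from a known peer address exposes four vectors even if the body is opaque ciphertext.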

Classification Notice

Implementation details, source code, and specific parameter configurations referenced in this study are proprietary to topriv and are not disclosed in this publication. The audit toolchain, traffic interception methodology, and service identification are not disclosed to prevent targeted countermeasures that would mask rather than resolve the underlying issues.

Results

The following table summarizes metadata exposure across all eight audited services. A "Pass" indicates the metadata vector is protected; "Fail" indicates the service operator can observe this information; "Partial" indicates some protection with exploitable gaps.

Vector          Svc A    Svc B    Svc C    Svc D    Svc E    Svc F    Svc G    Svc H
File Size       Fail     Fail     Partial  Fail     Fail     Pass     Fail     Pass
Timestamp       Fail     Fail     Fail     Fail     Partial  Pass     Fail     Pass
IP Address      Fail     Partial  Fail     Fail     Fail     Fail     Fail     Pass
User Agent      Fail     Fail     Fail     Partial  Fail     Pass     Fail     Pass
File Type       Fail     Fail     Pass     Fail     Pass     Pass     Partial  Pass
Access Pattern  Fail     Fail     Fail     Fail     Fail     Partial  Fail     Pass
Filename        Pass     Pass     Pass     Fail     Pass     Pass     Pass     Pass

Composite Privacy Scores

Service    Score  Rating     Claims "Zero-Knowledge"
Service A  1/7    Critical   Yes
Service B  2/7    Critical   Yes
Service C  3/7    Poor       Yes
Service D  1/7    Critical   Yes
Service E  3/7    Poor       No
Service F  6/7    Good       Yes
Service G  2/7    Critical   Yes
Service H  7/7    Excellent  Yes
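
The scoring rule is not stated explicitly in this publication. The Python sketch below assumes that each vector rated "Pass" or "Partial" contributes one point. This is an inference, but it reproduces all eight published scores from the results table. The names `RESULTS`, `SERVICES`, and `composite_scores` are illustrative.

```python
SERVICES = list("ABCDEFGH")

# Ratings per vector, in service order A..H, copied from the results table.
RESULTS = {
    "File Size":      ["Fail", "Fail", "Partial", "Fail", "Fail", "Pass", "Fail", "Pass"],
    "Timestamp":      ["Fail", "Fail", "Fail", "Fail", "Partial", "Pass", "Fail", "Pass"],
    "IP Address":     ["Fail", "Partial", "Fail", "Fail", "Fail", "Fail", "Fail", "Pass"],
    "User Agent":     ["Fail", "Fail", "Fail", "Partial", "Fail", "Pass", "Fail", "Pass"],
    "File Type":      ["Fail", "Fail", "Pass", "Fail", "Pass", "Pass", "Partial", "Pass"],
    "Access Pattern": ["Fail", "Fail", "Fail", "Fail", "Fail", "Partial", "Fail", "Pass"],
    "Filename":       ["Pass", "Pass", "Pass", "Fail", "Pass", "Pass", "Pass", "Pass"],
}

def composite_scores(results, services=SERVICES):
    """Count, per service, the vectors rated Pass or Partial (assumed rule)."""
    scores = {s: 0 for s in services}
    for ratings in results.values():
        for service, rating in zip(services, ratings):
            if rating in ("Pass", "Partial"):
                scores[service] += 1
    return scores

if __name__ == "__main__":
    for service, score in composite_scores(RESULTS).items():
        print(f"Service {service}: {score}/{len(RESULTS)}")
```

Under a stricter rule that counts only "Pass", every service except A and H would score one point lower.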

Critical finding: Five of the six services that failed our audit explicitly use the term "zero-knowledge" in their marketing materials. Users relying on these claims for operational security are exposed to metadata analysis attacks that the services' own marketing implicitly claims to prevent.

Analysis of Common Failures

File Size Exposure via Content-Length

One of the most prevalent vulnerabilities (present in 6/8 services) is file size exposure through predictable ciphertext sizes. Five services store ciphertext with no padding, meaning the encrypted file size directly reveals the plaintext size (offset only by the constant size of the authentication tag and IV). Service C applies fixed-block padding but uses 1 KB blocks, which still narrows the plaintext size to a 1 KB window (±512 bytes around its midpoint), sufficient to identify many common document types.
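
The arithmetic behind both cases can be sketched in Python. The 28-byte overhead (a 12-byte IV plus a 16-byte authentication tag, as is typical for AES-GCM) is an assumption; this study states only that the offset is constant. The padding model is also simplified: it rounds up to the next block multiple and ignores any bytes a real scheme might spend encoding the padding length.

```python
OVERHEAD = 12 + 16  # assumed: 12-byte IV + 16-byte tag; the real constant may differ

def ciphertext_size(plaintext_len, block=None, overhead=OVERHEAD):
    """Stored size for a plaintext, unpadded or padded up to a multiple of `block`."""
    if block:
        padded = -(-plaintext_len // block) * block  # integer ceiling to block multiple
    else:
        padded = plaintext_len
    return padded + overhead

def plaintext_bounds(stored_size, block=None, overhead=OVERHEAD):
    """Inclusive range of plaintext lengths consistent with an observed stored size."""
    padded = stored_size - overhead
    if not block:
        return (padded, padded)  # no padding: the operator learns the exact size
    return (max(0, padded - block + 1), padded)

if __name__ == "__main__":
    print(plaintext_bounds(ciphertext_size(1500)))                          # unpadded
    print(plaintext_bounds(ciphertext_size(1500, block=1024), block=1024))  # 1 KB blocks
```

Without padding the range collapses to a single value. With 1,024-byte blocks the operator still learns the size to within one block, which is why small block sizes offer limited protection.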

Timestamp Correlation

Six of eight services (the five rated Fail and the one rated Partial for this vector in the results table) record upload timestamps in their database layer, typically with millisecond precision. Even services that claim to "not log" user activity maintain creation timestamps on stored objects for garbage collection purposes. This metadata enables temporal correlation attacks, in which upload timing is matched against known user activity patterns.
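
A temporal correlation of this kind can be sketched in a few lines of Python. The function names, the one-second matching window, and the sample timestamps are illustrative assumptions. The `coarsen` helper shows one possible mitigation that this study did not evaluate: rounding stored timestamps down to a coarse bucket, which would still support garbage collection.

```python
from bisect import bisect_left

def correlate(upload_ts_ms, activity_ts_ms, window_ms=1000):
    """Return (upload, activity) timestamp pairs that lie within window_ms of each other."""
    activity = sorted(activity_ts_ms)
    matches = []
    for up in upload_ts_ms:
        # First activity event that is not earlier than (up - window_ms).
        i = bisect_left(activity, up - window_ms)
        while i < len(activity) and activity[i] <= up + window_ms:
            matches.append((up, activity[i]))
            i += 1
    return matches

def coarsen(ts_ms, bucket_ms=86_400_000):
    """Round a millisecond timestamp down to the start of its bucket (default: one day)."""
    return ts_ms - ts_ms % bucket_ms

if __name__ == "__main__":
    uploads = [1_700_000_000_250]
    activity = [1_700_000_000_000, 1_700_000_050_000]
    print(correlate(uploads, activity))   # millisecond logs single out one event
    print({coarsen(t) for t in activity}) # day buckets make both events identical
```

With millisecond logs, the upload matches exactly one of the two activity events. After coarsening to day buckets, the two events become indistinguishable by time alone.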

IP Address Association

Seven services associate client IP addresses with file operations at some point in their stack. Even services using CDN proxies (which strip the client IP from the origin request) retain IP data in CDN-level access logs that are available to the service operator. Only Service H demonstrated an architecture in which IP addresses are never associated with specific file operations at any layer.
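
To illustrate why stripping the client IP at the origin is not sufficient on its own, the Python sketch below joins origin-side file operations with CDN access-log entries. It assumes, hypothetically, that both record sets carry a shared request identifier. The field names (`request_id`, `client_ip`, `file_id`) are invented for the example and do not describe any audited service.

```python
def reattach_client_ips(origin_records, cdn_records):
    """Join origin file operations to CDN log entries on a shared request ID.

    Both arguments are lists of dicts with hypothetical field names.
    Origin records without a matching CDN entry are dropped.
    """
    ip_by_request = {r["request_id"]: r["client_ip"] for r in cdn_records}
    return [
        {**rec, "client_ip": ip_by_request[rec["request_id"]]}
        for rec in origin_records
        if rec["request_id"] in ip_by_request
    ]

if __name__ == "__main__":
    origin = [{"request_id": "r1", "file_id": "f-001", "op": "upload"}]
    cdn = [{"request_id": "r1", "client_ip": "203.0.113.7"}]
    print(reattach_client_ips(origin, cdn))
```

An operator with access to both log sources can therefore recover the association even though the origin server never saw the address. Where no shared identifier exists, a similar join on timestamps may be possible.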

How Service H Achieves Full Protection

Service H (scoring 7/7) employs several architectural decisions that collectively eliminate metadata exposure:

Conclusions

The file sharing industry's use of "zero-knowledge" terminology is, in the majority of cases we examined, technically misleading. Encrypting file contents while exposing file size, upload timing, and accessor identity provides a false sense of security. Users with genuine privacy requirements—journalists, whistleblowers, activists, legal professionals—cannot rely on marketing claims alone.

We recommend that the industry adopt more precise terminology:

Based on this taxonomy, only 2 of 8 services audited qualify for "metadata-private" status, and only 1 approaches true zero-knowledge architecture.

Classification Notice

The identities of Services A-G are withheld under responsible disclosure principles. Service operators were notified of findings 30 days prior to publication.

Responsible Disclosure

All services identified with critical metadata leakage were contacted 30 days prior to this publication. Three operators acknowledged the issues; one has committed to addressing file size exposure in a future release. We will re-audit all services in Q3 2026 and publish updated results.