Step 1 — Select Files or Folder

Select Folder (Chrome / Edge)

Uses the File System Access API and scans recursively

Select Files (Firefox / Safari)

Fallback — select individual files or a folder

Drag & Drop Files Here

Drop any number of files to scan for duplicates

Step 2 — Scan Options

Scan Mode

Auto

pHash for images, MD5 for others

MD5 Only

Exact hash for all file types

Image Similarity Threshold: 97%

95% (more matches) to 100% (exact only)

Sets the maximum pHash Hamming distance for images. At 97%, at most 2 of 64 bits may differ.

Frequently Asked Questions

Why can't I delete files directly from the browser?

Browser security policies (sandboxing) prevent web pages from modifying or deleting files on your system without explicit permission. This tool requests only read access through the File System Access API, which is all that scanning requires. This is a deliberate safeguard: a rogue website cannot silently delete your data. The tool gives you a full report and a CSV export so you can review the results and delete files manually at your own discretion.

What is perceptual hashing (pHash)?

Perceptual hashing reduces an image to a compact fingerprint that captures visual content rather than exact pixel data. Two versions of the same image, such as copies at different resolutions or with slight crops or minor edits, typically produce nearly identical pHashes. This contrasts with MD5/SHA hashing, where even a single changed pixel produces a completely different hash. The similarity slider lets you tune how strict the perceptual matching is across a 95%–100% range.
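To make the idea of a visual fingerprint concrete, here is a minimal sketch of an average hash (aHash), a simpler relative of pHash. It is an illustration only and not this tool's implementation: a real pHash typically works on frequency components of a downscaled image, and the function name `averageHash` and the 8×8 grayscale input are assumptions made for the example.

```typescript
// Illustrative only: average hash (aHash), a simpler stand-in for pHash.
// Input: 64 grayscale values (0-255) from an image already downscaled to 8x8.
function averageHash(gray8x8: number[]): string {
  if (gray8x8.length !== 64) {
    throw new Error("expected exactly 64 grayscale values");
  }
  const mean = gray8x8.reduce((sum, v) => sum + v, 0) / gray8x8.length;
  // One bit per pixel: "1" if the pixel is brighter than the mean, else "0".
  return gray8x8.map((v) => (v > mean ? "1" : "0")).join("");
}

// A synthetic "image": left half dark, right half bright.
const original = Array.from({ length: 64 }, (_, i) => (i % 8 < 4 ? 40 : 200));
// The same image uniformly brightened, as a minor edit might do.
const brightened = original.map((v) => Math.min(255, v + 20));

console.log(averageHash(original) === averageHash(brightened)); // true
```

The brightened copy differs in every pixel value, so an MD5 of its bytes would change, yet the bit pattern relative to the mean stays the same.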

Does it scan inside folders recursively?

Yes — when you use the "Select Folder" button (Chrome/Edge), the tool recursively enumerates all files in every subdirectory using the File System Access API. When using the "Select Files" fallback (Firefox/Safari with webkitdirectory), the browser itself performs the recursive enumeration and provides all files to the tool. In both cases all nested files are included.
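The recursive enumeration on the File System Access API path can be sketched as follows. The `FileLike` and `DirLike` interfaces are simplified stand-ins (assumptions for illustration) for `FileSystemFileHandle` and `FileSystemDirectoryHandle`, and `walk` and `listFiles` are hypothetical helpers, not this tool's code.

```typescript
// Simplified stand-ins for FileSystemFileHandle / FileSystemDirectoryHandle.
interface FileLike {
  kind: "file";
  name: string;
}
interface DirLike {
  kind: "directory";
  name: string;
  values(): AsyncIterable<FileLike | DirLike>;
}

// Depth-first walk that yields the relative path of every nested file.
async function* walk(dir: DirLike, prefix = ""): AsyncGenerator<string> {
  for await (const entry of dir.values()) {
    const path = prefix + entry.name;
    if (entry.kind === "directory") {
      // Recurse into the subdirectory, extending the path prefix.
      yield* walk(entry, path + "/");
    } else {
      yield path;
    }
  }
}

// Collect all nested file paths into an array.
async function listFiles(root: DirLike): Promise<string[]> {
  const paths: string[] = [];
  for await (const p of walk(root)) paths.push(p);
  return paths;
}
```

In a Chromium-based browser the root handle would typically come from `showDirectoryPicker()`. On the `webkitdirectory` fallback no walk is needed, since the browser hands over the flattened file list itself.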

What file types does it support?

The tool supports all file types. Images (JPEG, PNG, GIF, WEBP, BMP, TIFF, AVIF) use perceptual hashing for similarity detection. All other types, including PDFs, Word documents, spreadsheets, videos, audio files, zip archives, and source code, use MD5 content hashing (via SparkMD5) to detect byte-for-byte duplicates. Auto mode chooses the algorithm for each file.
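The per-file choice between the two scan modes could look like the sketch below. It assumes the decision is made from the file extension; the tool might equally inspect the MIME type, so `pickAlgorithm`, `ScanMode`, and the extension list are hypothetical names used only for illustration.

```typescript
type ScanMode = "auto" | "md5";
type Algorithm = "phash" | "md5";

// Extensions for the image formats listed above; the exact set is an assumption.
const IMAGE_EXTENSIONS = new Set([
  "jpg", "jpeg", "png", "gif", "webp", "bmp", "tif", "tiff", "avif",
]);

// Hypothetical helper: pick the hashing algorithm for one file.
function pickAlgorithm(fileName: string, mode: ScanMode): Algorithm {
  if (mode === "md5") return "md5"; // "MD5 Only" ignores the file type
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  return IMAGE_EXTENSIONS.has(ext) ? "phash" : "md5";
}

console.log(pickAlgorithm("holiday.JPG", "auto")); // phash
console.log(pickAlgorithm("report.pdf", "auto")); // md5
console.log(pickAlgorithm("holiday.JPG", "md5")); // md5
```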

How does the similarity slider work?

The slider controls the Hamming distance threshold for image pHash comparison. A pHash is a 64-bit binary string, and the Hamming distance is the number of bit positions that differ between two hashes. At 100%, only identical hashes (0 bits different) match. At 97%, up to ~2 bits may differ (3% of 64 is 1.92). At 95%, up to ~3 bits may differ (5% of 64 is 3.2), which catches more images that are visually similar but not identical. For non-image files the threshold has no effect: only exact hash matches are reported.
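The arithmetic behind the slider can be sketched as below. The hashes are represented as 64-character strings of `0` and `1`, and rounding to the nearest bit is an assumption chosen because it reproduces the ~2 and ~3 bit figures; the function names are hypothetical.

```typescript
const HASH_BITS = 64;

// Convert a similarity percentage (95-100) into a maximum Hamming distance.
// Rounding to the nearest whole bit is an assumption, not a confirmed detail.
function maxDistanceFor(similarityPercent: number): number {
  return Math.round(((100 - similarityPercent) / 100) * HASH_BITS);
}

// Count the positions at which two equal-length bit strings differ.
function hammingDistance(a: string, b: string): number {
  if (a.length !== b.length) throw new Error("hashes must have equal length");
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}

// Two images match when their hashes are within the allowed distance.
function isSimilar(a: string, b: string, similarityPercent: number): boolean {
  return hammingDistance(a, b) <= maxDistanceFor(similarityPercent);
}

const hashA = "0".repeat(64);
const hashB = "11" + "0".repeat(62); // differs from hashA in 2 bits

console.log(maxDistanceFor(100), maxDistanceFor(97), maxDistanceFor(95)); // 0 2 3
console.log(isSimilar(hashA, hashB, 97)); // true
console.log(isSimilar(hashA, hashB, 100)); // false
```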

How large a folder can I scan?

In practice, scanning thousands of files works smoothly. The tool processes files sequentially with async chunked reads to avoid blocking the UI. For very large folders (50,000+ files), the browser tab may use significant memory since file metadata and hashes are held in RAM. There is no server-side limit. For best performance, scan specific sub-folders rather than your entire system root.
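A chunked read of the kind described above might be structured like this sketch. The `BlobLike` interface is a minimal stand-in for the browser's `File`/`Blob` (declared here so the example is self-contained), and `chunkRanges`, `readInChunks`, and the chunk size are assumptions, not the tool's actual code.

```typescript
// Minimal shape shared by File and Blob, declared so the sketch is self-contained.
interface BlobLike {
  size: number;
  slice(start: number, end: number): { arrayBuffer(): Promise<ArrayBuffer> };
}

// Split `size` bytes into [start, end) ranges of at most `chunkSize` bytes.
function chunkRanges(size: number, chunkSize: number): Array<[number, number]> {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  const ranges: Array<[number, number]> = [];
  for (let start = 0; start < size; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize, size)]);
  }
  return ranges;
}

// Read one chunk at a time and pass it to `onChunk` (for example an
// incremental hasher). Each read is awaited before the next one starts,
// so only one chunk is requested at a time.
async function readInChunks(
  file: BlobLike,
  chunkSize: number,
  onChunk: (bytes: ArrayBuffer) => void
): Promise<void> {
  for (const [start, end] of chunkRanges(file.size, chunkSize)) {
    onChunk(await file.slice(start, end).arrayBuffer());
  }
}

console.log(JSON.stringify(chunkRanges(5, 2))); // [[0,2],[2,4],[4,5]]
```

With SparkMD5, `onChunk` could call `append` on a `SparkMD5.ArrayBuffer` instance and finish with `end()` once all chunks are read; whether the tool wires it up exactly this way is an assumption.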

Find Duplicate Files Instantly