Table of Contents#
- Introduction to File Verification & Why It Matters
- What is Hashdeep? Key Features Overview
- Installing Hashdeep on Major Linux Distros
- Getting Started with Hashdeep: Basic Commands
- Advanced Hashdeep Techniques
- Hashdeep vs. Traditional Hashing Tools (md5sum, sha256sum)
- Practical Use Cases for Hashdeep
- Tips and Best Practices
- Conclusion
- References
What is Hashdeep? Key Features Overview#
Hashdeep is a cross-platform command-line utility for computing and comparing cryptographic hashes. Its core features set it apart from traditional tools:
- Multi-Algorithm Support: Generate MD5, SHA-1, SHA-256, Tiger, and Whirlpool hashes simultaneously.
- Recursive Scanning: Natively traverse directory trees to hash all files in subdirectories.
- Hash Databases: Create persistent databases of file hashes for future integrity checks.
- Advanced Comparison: Identify matches, mismatches, new files, and missing files when comparing against a database.
- Duplicate Detection: Efficiently find duplicate files by comparing their hashes.
- Flexible Targeting: Pairs cleanly with find and shell globs to include or exclude specific file types for targeted hashing.
Installing Hashdeep on Major Linux Distros#
Hashdeep is pre-packaged in most popular Linux distributions. Use the following commands to install it:
Debian/Ubuntu and Derivatives#
sudo apt update && sudo apt install hashdeep -y

RHEL/CentOS/Fedora#

sudo dnf install hashdeep -y

Arch Linux and Manjaro#

sudo pacman -S hashdeep

Verify Installation#

Confirm Hashdeep is installed correctly with:

hashdeep --version

You should see output like hashdeep 4.4 (the version number may vary).
Getting Started with Hashdeep: Basic Commands#
Hashdeep uses a consistent syntax, with the -c flag specifying one or more hash algorithms (separated by commas). Let’s start with fundamental operations.
Generating Hashes for Single or Multiple Files#
To generate hashes for a single file using multiple algorithms:
hashdeep -c md5,sha256,sha1 mydocument.pdf

This command outputs three hashes (MD5, SHA-256, SHA-1) for mydocument.pdf, along with the file's size and path.
For multiple files matching a pattern (e.g., all .txt files in the current directory):
hashdeep -c sha256 *.txt

Recursive Hashing of Directories#
Hashdeep can natively scan directories recursively. To hash all files in /home/user/photos and its subfolders using SHA-256:
hashdeep -r -c sha256 /home/user/photos

The -r flag enables recursive mode, and the output lists every file's size, hash, and path.
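That output is a small CSV dialect: a version banner, comment lines recording the invocation, then one size,hash,filename record per file. The snippet below parses a fabricated sample of that format with awk to pull out just the hash and path (all names and hashes are made up for illustration):

```bash
#!/usr/bin/env bash
# A fabricated sample of hashdeep's default output (hashes shortened for clarity).
sample='%%%% HASHDEEP-1.0
%%%% size,sha256,filename
## Invoked from: /home/user
## $ hashdeep -r -c sha256 /home/user/photos
##
10240,aaaa1111,/home/user/photos/cat.jpg
20480,bbbb2222,/home/user/photos/trip/dog.jpg'

# Data records start with the file size, so select lines beginning with a digit
# and print the second and third comma-separated fields: hash, then path.
printf '%s\n' "$sample" | awk -F, '/^[0-9]/ {print $2, $3}'
```

This prints `aaaa1111 /home/user/photos/cat.jpg` followed by `bbbb2222 /home/user/photos/trip/dog.jpg`; the banner and comment lines are skipped.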
Creating a Hash Database for Future Comparison#
Store hashes in a text file (database) to verify integrity later:
hashdeep -r -c md5,sha256 /home/user/important-docs > important-docs-hashes.db

Store this database in a secure location (e.g., an external drive or encrypted cloud storage) separate from the original files.
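For reference, the resulting database is plain text. Its first lines look roughly like this (the paths and truncated hashes below are fabricated for illustration):

```
%%%% HASHDEEP-1.0
%%%% size,md5,sha256,filename
## Invoked from: /home/user
## $ hashdeep -r -c md5,sha256 /home/user/important-docs
##
52417,9e107d9d...,b94d27b9...,/home/user/important-docs/report.pdf
```

Lines beginning with %%%% or ## are header metadata; each remaining line is one size,hash,...,filename record, which is what the comparison modes read back in.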
Advanced Hashdeep Techniques#
Comparing Files Against a Hash Database#
Use audit mode (-a) together with the -k flag to compare files against a pre-generated database; add -v for more detailed output:

hashdeep -a -v -k important-docs-hashes.db -r /home/user/important-docs

Hashdeep categorizes results into four groups:
- Matches: Files with identical hashes in the database and directory.
- Mismatches: Files with different hashes (potential corruption or tampering).
- New Files: Files in the directory not in the database.
- Missing Files: Files in the database not in the directory.
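These categories map onto plain set operations over two hash lists. If you ever need a quick approximation of the new/missing breakdown without hashdeep, sorted databases can be compared with comm; the two-line "databases" below are fabricated for illustration:

```bash
#!/usr/bin/env bash
# Fabricated hash databases: "hash path" per line, headers already stripped.
old='aaaa /docs/report.pdf
bbbb /docs/notes.txt'
new='aaaa /docs/report.pdf
cccc /docs/budget.xls'

# comm requires sorted input.
# -23: lines only in the old DB -> known files now missing.
# -13: lines only in the new DB -> new or changed files.
echo "Missing:"
comm -23 <(printf '%s\n' "$old" | sort) <(printf '%s\n' "$new" | sort)
echo "New/changed:"
comm -13 <(printf '%s\n' "$old" | sort) <(printf '%s\n' "$new" | sort)
```

Here notes.txt is reported as missing and budget.xls as new; report.pdf, present in both, is suppressed.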
Verifying File Integrity After Transfer#
Ensure files are not corrupted during transfer with these steps:
- Before Transfer: Generate a hash database on the source system:

  hashdeep -r -c sha256 /path/to/source-dir > transfer-hashes.db

- Transfer Files: Copy the directory to the target system.
- After Transfer: Audit the target directory against the database:

  hashdeep -a -k transfer-hashes.db -r /path/to/target-dir
Any mismatches indicate corrupted files that need re-transferring.
Identifying Duplicate Files Efficiently#
Find duplicates by comparing hashes. Hashdeep's default output is comma-separated (size,hash,filename), so extract the hash and path, sort, and keep only repeated hashes:

hashdeep -r -c sha256 /path/to/dir | awk -F, '/^[0-9]/ {print $2, $3}' | sort | uniq -w 64 -D

- awk -F, '/^[0-9]/ {print $2, $3}': skips the header lines and prints each file's hash and path.
- sort: groups identical hashes together.
- uniq -w 64 -D: prints every line whose first 64 characters (the hex SHA-256) occur more than once.

Note that paths containing commas will be truncated by this simple awk split.
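The -w option tells uniq to compare only the first 64 characters, i.e. the length of a hex SHA-256 digest, so lines are grouped by hash rather than by the full "hash path" text. Here is the same idea on fabricated four-character hashes (hence -w 4):

```bash
#!/usr/bin/env bash
# Fabricated "hash path" lines; the two copies of one.jpg share hash aaaa.
hashes='aaaa /pics/one.jpg
bbbb /pics/two.jpg
aaaa /pics/copy-of-one.jpg'

# sort groups identical hashes; uniq -w 4 -D then prints every line whose
# first four characters (the hash) occur more than once.
printf '%s\n' "$hashes" | sort | uniq -w 4 -D
```

Only the two lines sharing hash aaaa survive; /pics/two.jpg, whose hash is unique, is dropped. The -w and -D options are GNU coreutils extensions.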
Using Hashdeep with Regular Expressions#
Hashdeep itself has no regex include/exclude flags, but GNU find's regex tests combine with it cleanly:

- Hash only .jpg and .png files recursively:

  find /home/user/gallery -type f -regextype posix-extended -iregex '.*\.(jpg|png)$' -exec hashdeep -c sha1 {} +

- Exclude temporary .tmp files:

  find /home/user/projects -type f ! -name '*.tmp' -exec hashdeep -c md5 {} +
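To see the include filter in isolation, the sketch below builds a throwaway directory tree and runs the same find expression, with -printf standing in for the hashdeep invocation:

```bash
#!/usr/bin/env bash
set -eu
# Build a temporary tree with a mix of file types.
dir=$(mktemp -d)
mkdir -p "$dir/sub"
touch "$dir/a.jpg" "$dir/sub/b.PNG" "$dir/c.txt" "$dir/sub/d.tmp"

# -iregex matches case-insensitively against the whole path;
# -printf '%f\n' prints just the matched file name.
find "$dir" -type f -regextype posix-extended -iregex '.*\.(jpg|png)$' -printf '%f\n' | sort

rm -rf "$dir"
```

This prints a.jpg and b.PNG; c.txt and d.tmp are filtered out. Note that -regextype, -iregex, and -printf are GNU find extensions.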
Hashdeep vs. Traditional Hashing Tools (md5sum, sha256sum)#
| Feature | Hashdeep | md5sum/sha256sum |
|---|---|---|
| Recursive Hashing | Native support (-r flag) | Requires find + xargs |
| Multiple Hash Algorithms | Simultaneous generation | Only one algorithm per run |
| Hash Database Comparison | Built-in (-k flag) | Manual diffing required |
| Duplicate Detection | Supported via hash comparison | No native support |
| Detailed Output | Categorized results | Only match/non-match |
| Regex Filtering | No native support (use find -regex) | No native support (use find -regex) |
Traditional tools work for single-file verification, but Hashdeep excels at complex, large-scale tasks.
Practical Use Cases for Hashdeep#
- Digital Forensics: Create a hash baseline of evidence files to prove integrity in court.
- Backup Verification: Periodically compare backups against hash databases to detect corruption.
- Server Integrity Checks: Monitor critical system files (e.g., /etc, /bin) for unauthorized modifications.
- Download Verification: Recursively verify entire directories of downloaded software packages.
- Storage Cleanup: Free up space by deleting duplicate photos, videos, or documents.
Tips and Best Practices#
- Use Secure Hashes: Avoid MD5 and SHA-1 for critical data; opt for SHA-256 or Whirlpool, which resist known collision attacks.
- Secure Database Storage: Keep hash databases separate from original files to avoid losing both in case of failure.
- Automate Checks: Create a cron job for periodic integrity scans. Example crontab entry:

  # Daily SHA-256 audit of /home/user/docs at 2 AM
  0 2 * * * /usr/bin/hashdeep -a -k /var/backups/docs-hashes.db -r /home/user/docs >> /var/log/hashdeep.log 2>&1

- Filter Results: Pipe verbose audit output through grep to hide matched files and surface only anomalies.
- Document Databases: Note the date, user, and system when creating databases for compliance or forensics.
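Several of these tips combine into a small wrapper you can point a cron entry at. This is a hedged sketch: the audit_dir function name, the ': Ok' match string it filters on, and all paths are illustrative assumptions to adapt to the audit output on your own system:

```bash
#!/usr/bin/env bash
# Sketch of a cron-friendly audit helper (function name and paths are
# illustrative, not part of hashdeep itself).
audit_dir() {
    local db=$1 dir=$2
    # -a audit mode, -k known-hashes database, -r recurse, -vv per-file detail.
    # Filter out lines for cleanly matched files so only anomalies remain;
    # '|| true' keeps a clean audit (empty grep result) from failing the script.
    hashdeep -a -vv -k "$db" -r "$dir" 2>&1 | grep -v ': Ok$' || true
}

# Example cron-style use (placeholder paths):
# audit_dir /var/backups/docs-hashes.db /home/user/docs >> /var/log/hashdeep.log
```

An empty log after each run then means the audit was clean; any surviving lines are files worth investigating.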
Conclusion#
Hashdeep is a comprehensive solution for file integrity verification, duplicate detection, and data security. Its ability to handle recursive directories, compare against persistent hash databases, and support multiple algorithms makes it an indispensable tool for Linux users of all skill levels. Whether you’re maintaining server integrity, verifying backups, or cleaning up duplicates, Hashdeep simplifies complex tasks with its intuitive interface and robust feature set.
References#
- Hashdeep Official Repository: https://github.com/jessekornblum/hashdeep
- Hashdeep Man Page: https://linux.die.net/man/1/hashdeep
- "File Integrity Checking with Hashdeep" - Linux Journal: https://www.linuxjournal.com/content/file-integrity-checking-hashdeep
- NIST Cryptographic Hash Functions: https://csrc.nist.gov/projects/hash-functions