# Checksum software that supports directories for Mac



## kkamin (Jun 22, 2010)

Hello,

As you know, in this day and age of digital imaging, being able to archive our image files in an uncorrupted state is of utmost importance.  For all the experts out there: I am having a hard time finding a checksum program that lets me drop in whole directories rather than manually selecting files from within the directories or folders.

I tried using Checksum+ 1.5.2, but it freezes when a large directory is dragged onto the program.  Checksum+ 1.5.3 just crashes at startup; I can't get it to work at all on Leopard 10.5.8.

Thanks for reading.


----------



## Garbz (Jun 22, 2010)

Wait. Why are we so untrusting of the copy method? Maybe that is what needs to be changed.

You could copy the files using rsync. It has a checksum-based copy option ("--checksum"), so every time you run it on a directory it compares the files by checksum rather than by size and date, which will show whether any data has been corrupted.
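
A rough sketch of what that could look like, assuming rsync is installed. The temporary directories and the file name are made-up stand-ins for a real source and backup location, and the second half simulates corruption only to show what the re-check reports.

```shell
#!/bin/sh
# Sketch: checksum-based copy and later re-check with rsync.
# Assumptions: rsync is installed; src/dst are hypothetical sample directories.
src=$(mktemp -d); dst=$(mktemp -d)
printf 'raw image data\n' > "$src/img_0001.dat"

# -a  archive mode (recurse, preserve times and permissions)
# -c  --checksum: compare files by checksum instead of size and mtime
rsync -ac "$src/" "$dst/"

# Dry run (-n) with -c and itemized output (-i): lists files whose content
# differs and changes nothing. Empty output means every file matched.
before=$(rsync -rcni "$src/" "$dst/")

# Simulate silent corruption: same size, same timestamp, different bytes.
printf 'RAW image data\n' > "$dst/img_0001.dat"
touch -r "$src/img_0001.dat" "$dst/img_0001.dat"
after=$(rsync -rcni "$src/" "$dst/")

printf 'before: [%s]\nafter:  [%s]\n' "$before" "$after"
```

Note that running the last command without "-n" would overwrite the destination copy with the source copy, which is only what you want if the source is known to be the good one.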


----------



## kkamin (Jun 22, 2010)

Well, I also want to be able to create an MD5 checksum file that I can store with the archived files.  That way I can periodically re-check the files against the MD5 file in the future and confirm that they still match and that my data is still intact.

On a side note to everyone: I've recently learned about 'data validation' as part of a digital workflow, and I encourage people to check out Data Validation | dpBestflow.  It is an ASMP initiative funded by the US Library of Congress.  I was not aware of potential transfer corruption before (I assumed my OS naturally handled copies perfectly), or of how to scan for the data corruption that will eventually occur on archived media.  Many of the third-party tools for this are shareware or freeware.
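
For anyone comfortable with Terminal, here is a minimal sketch of that idea: write one MD5 line per file into a manifest stored inside the archive folder, then regenerate and compare it later. The names `md5_of`, `make_manifest`, `verify_manifest` and `checksums.md5` are hypothetical, the sample files are made up, and the script assumes either `md5sum` (GNU) or `md5` (macOS) is available. File names containing newlines are not handled.

```shell
#!/bin/sh
# Sketch: store an MD5 manifest with archived files and re-check it later.
MANIFEST=checksums.md5   # hypothetical manifest name

# Print only the MD5 digest of one file, using whichever tool exists.
md5_of() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$1" | cut -d' ' -f1
  else
    md5 -q "$1"          # macOS/BSD: -q prints the digest only
  fi
}

# Print "digest  path" for every regular file under $1, sorted by path,
# skipping the manifest itself.
make_manifest() {
  ( cd "$1" || exit 1
    find . -type f ! -name "$MANIFEST" | LC_ALL=C sort |
    while IFS= read -r f; do
      printf '%s  %s\n' "$(md5_of "$f")" "$f"
    done )
}

# Regenerate the listing and compare it with the stored manifest.
# Exit status 0 means nothing changed; diff output names what did.
verify_manifest() {
  make_manifest "$1" | diff "$1/$MANIFEST" -
}

# Demo on made-up sample data.
archive=$(mktemp -d)
printf 'raw image data\n' > "$archive/img_0001.dat"
mkdir "$archive/sub"
printf 'more image data\n' > "$archive/sub/img 0002.dat"
make_manifest "$archive" > "$archive/$MANIFEST"

if verify_manifest "$archive" >/dev/null; then first_check=intact; else first_check=changed; fi
printf 'X' >> "$archive/sub/img 0002.dat"   # simulate later corruption
if verify_manifest "$archive" >/dev/null; then second_check=intact; else second_check=changed; fi
echo "first check: $first_check, second check: $second_check"
```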


----------



## epatsellis (Jun 22, 2010)

If you're going through that much pain, why not just use enterprise-level hardware? I have a RAID enclosure and DLT tape backup, and though I do a checksum and media audit from time to time, in the several years I've been using it, I've never had a single issue. Before that, I had a file server with 4 enterprise-class RAID 5 arrays and never had a single issue in the 8 years I used it.

The bigger issue is whether you are doing work at a level that requires such absolute availability, or just family snapshots. If the former, enterprise-level RAID arrays and LTO drives aren't that expensive when amortized; it's far easier to justify a $10k expense if you can amortize it over 5 years.

Know any IT professionals? You can typically find last-generation RAID arrays and LTO drives for the cost of the gas to pick them up these days.  You may be limited to ~1-5 TB arrays, but if you're hell-bent on absolute reliability, they're the way to go.


----------



## kkamin (Jun 22, 2010)

Thanks for the suggestions.  RAID will definitely be a way to go for me in the future when I get into more video work.  I'll talk to my IT friend about that.  : )  As for the motivation for wanting to use hashes to verify data, I am a professional photographer and need to ensure proper data transfer to redundant back-up destinations (online servers and write-once media), and in the future I will need some way to test for data degradation.  It's not that hard at all.  It's just an extra step or two in the workflow.  So, the RAID drive coupled with a synchronization app will be great for on-site storage.  But I'll have drives and discs off-site and images backed up on server sites for redundancy purposes (apt. gets hit by lightning, fire, etc.)


----------



## usayit (Jun 22, 2010)

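From Terminal, "cd" to the top-level directory whose files you want to checksum, then run:

find . -type f -print0 | xargs -0 cksum

The "-type f" limits the listing to regular files (cksum cannot read a directory), and "-print0" with "xargs -0" keeps file names with spaces or quotes intact.

If you do this on the source and destination side of your copy and redirect the output to a file each time, the two resulting files should match once sorted (find may list files in a different order on each side); compare them with the "diff" command.   If they don't match, use the diff output to determine which particular file differs.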
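
Here is a small sketch of that source-versus-destination comparison, run on made-up sample directories. `list_cksums` is a hypothetical helper name; it sorts the listing by path so the two files line up even if find returns entries in a different order on each side.

```shell
#!/bin/sh
# Sketch: compare per-file cksum listings of a source and a destination tree.
# The temporary directories and file names are hypothetical sample data.

# Print "checksum size path" for every regular file under $1, sorted by path.
list_cksums() {
  ( cd "$1" || exit 1
    find . -type f -print0 | xargs -0 cksum | LC_ALL=C sort -k3 )
}

src=$(mktemp -d); dst=$(mktemp -d); out=$(mktemp -d)
printf 'first image\n'  > "$src/a.dat"
mkdir "$src/sub"
printf 'second image\n' > "$src/sub/b c.dat"
cp -R "$src/." "$dst/"

list_cksums "$src" > "$out/src.chk"
list_cksums "$dst" > "$out/dst.chk"
if diff "$out/src.chk" "$out/dst.chk" >/dev/null; then copy_ok=yes; else copy_ok=no; fi

# Simulate a damaged file on the destination and compare again.
printf 'sEcond image\n' > "$dst/sub/b c.dat"
list_cksums "$dst" > "$out/dst.chk"
mismatch=$(diff "$out/src.chk" "$out/dst.chk")

echo "copy ok: $copy_ok"
echo "$mismatch"
```

The listings are written outside the trees being checked so that they never end up in their own output.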


----------



## kkamin (Jun 22, 2010)

usayit said:


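> From Terminal, "cd" to the top-level directory whose files you want to checksum, then run:
> 
> find . -type f -print0 | xargs -0 cksum
> 
> The "-type f" limits the listing to regular files (cksum cannot read a directory), and "-print0" with "xargs -0" keeps file names with spaces or quotes intact.
> 
> If you do this on the source and destination side of your copy and redirect the output to a file each time, the two resulting files should match once sorted (find may list files in a different order on each side); compare them with the "diff" command.   If they don't match, use the diff output to determine which particular file differs.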



I'll give it a try.  Thanks a lot!  Terminal scares me though.  lol.


----------



## Garbz (Jun 23, 2010)

Doesn't that generate one checksum per file?

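If so, I can suggest redirecting the output to a file and then checksumming that file. That'll give you a single number. Exclude the output file itself from the listing, and use ">" rather than ">>" so a second run doesn't append to the old results:

find . -type f ! -name file.chk -print0 | xargs -0 cksum > file.chk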

cksum file.chk
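
A sketch of the single-number idea, with one assumption added: the per-file lines are sorted by path before being checksummed, so the same set of files gives the same number regardless of the order find returns them in. `tree_digest` is a hypothetical helper name and the sample directories are made up.

```shell
#!/bin/sh
# Sketch: reduce a whole directory tree to one cksum value.

# Checksum every regular file under $1, sort the lines by path, then
# checksum that listing and print only the resulting CRC.
tree_digest() {
  ( cd "$1" || exit 1
    find . -type f -print0 | xargs -0 cksum | LC_ALL=C sort -k3 ) |
    cksum | cut -d' ' -f1
}

# Demo on made-up sample data: two identical trees, then one altered file.
one=$(mktemp -d); two=$(mktemp -d)
printf 'first image\n'  > "$one/a.dat"
printf 'second image\n' > "$one/b.dat"
cp -R "$one/." "$two/"

d1=$(tree_digest "$one")
d2=$(tree_digest "$two")
printf 'sEcond image\n' > "$two/b.dat"
d3=$(tree_digest "$two")

echo "original: $d1  copy: $d2  altered copy: $d3"
```

The single number only says that something changed, not which file, so it is worth keeping the per-file listing as well.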


----------



## abcschuetze (Sep 12, 2011)

I know this is an old thread, but since it was one of the top Google hits when I searched for a similar program as the OP did, I figure it makes sense to complement this thread with what I believe is a good reply to the OP's request.

I found a very nice Java-based utility that definitely works great on Mac OS 10.7 (Lion). It can compare two folders based on file size and three different checksums (CRC, SHA1, MD5), and it also handles the OP's original task, as it offers to create files that store the checksums of all files contained in a folder. As a sweet side note, it actually performs faster than Mac OS' own md5 command line utility, even though it computes all three checksums. I used it on a very large (1.2 TB) folder and it just worked away until it was done without a crash (I didn't time it, but I started it in the afternoon and it was done the next morning). The free tool is aptly named "Compare Folders" and found on this page: Software Downloads and Purchase - by: Keith Fenske under "Java Utilities", or here: Download Compare Folders 3 Free - Compare files, folders, and CRC32, MD5, SHA1 checksums in XML format - Softpedia

Another remark regarding data safety and backup in general. I believe that the OP's approach of burning files to DVDs (ideally two copies) for long-term storage and then using the checksum files to periodically check data integrity is a great approach and, for long-term archival storage, possibly preferable to RAID. What many people forget is that RAID protects very effectively against hardware failure, but only against hardware failure. It doesn't protect against data corruption or loss due to user mistakes or software issues. For example, irreparable file system corruption is more common than one would think. So, I think the OP's backup and data protection setup seems very appropriate, and I also believe that any backup scheme that relies on optical discs should include periodic checks of data integrity; a hash check is a great way of doing that.


----------

