# Detecting bit rot: experiments and jottings



## LRList001 (May 27, 2020)

I've been doing some experiments on detecting bit rot.

One problem with a backup is knowing when it is needed.  For archival photographic images and data, this is effectively an open-ended time period.  So, some kind of consistency checker might be useful.

This requirement is different from cryptographic integrity (which has to be non-reversible).  Detecting bit rot, which is assumed to be random and hardware-related, not deliberate, merely requires a check with a high probability of detecting a change, any change.  It is in the nature of bit rot that the operating system will not know about the change, so it is no good looking at metadata such as the date of last modification or the size of the file.



I have run three tests.

1/  Work out a simple checksum for the file (the longer the word size, the higher the probability of detecting a change; 32 bits is probably long enough).  Advantage: I have total control over the results, where and when it runs, and how discrepancies are reported.  Disadvantage: it is slow.  Multi-threading isn't going to help much; it is disk speed that matters.  Disk-read optimisation would likely make it somewhat faster.
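A minimal sketch of approach 1/ in Python, assuming a 32-bit CRC is an acceptable check (the function name is my own; `zlib.crc32` provides the 32-bit checksum):

```python
import zlib

def file_crc32(path, chunk_size=1 << 20):
    """Compute a 32-bit CRC over a file, reading in 1 MiB chunks."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # crc32 takes a running value, so the file never needs
            # to fit in memory
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF
```

Reading in large chunks keeps the loop disk-bound rather than dominated by call overhead, which is about the best a single thread can do here.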

2/  Use the (Windows 10) built-in certutil command (certutil -hashfile <filename> <hashtype>, where hashtype can be e.g. MD5 or SHA256).  Advantage: much faster than 1/ (call it 10x faster), but the output is not intended for batch use.  Even so, performance is 2-4 seconds per file.  Again, multi-threading isn't going to help; it is disk speed that matters.  Recently accessed files are fast (almost instant) because they are cached by the OS, but that isn't going to work for 1,000s of them.
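certutil's output is chatty, but it can be made scriptable with a small wrapper.  This is a sketch (the function names are my own), assuming the usual Windows 10 layout: a header line, the hex digest, and a completion line; older Windows versions insert spaces between hex bytes, hence the strip:

```python
import subprocess

def parse_certutil_output(text):
    """Extract the hex digest from `certutil -hashfile` output.

    Assumes the layout: header line, hash line, status line.
    Spaces are stripped in case of the older spaced-hex format.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return lines[1].replace(" ", "").lower()

def certutil_hash(path, hashtype="SHA256"):
    """Run certutil on one file and return its digest (Windows only)."""
    result = subprocess.run(
        ["certutil", "-hashfile", path, hashtype],
        capture_output=True, text=True, check=True,
    )
    return parse_certutil_output(result.stdout)
```

With the parsing isolated like this, a batch run is just a loop over filenames collecting digests into a dictionary.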

3/  Use LR's DNG validation.  I don't have many DNGs; I managed to get the few I have processed at roughly 300/min.  So performance is OK; it might take a day to analyse the number of files I have.  However, I'm not currently using DNG.

There are products out there that aim to detect changes to file systems.  However, so far as I know, they work mostly by trapping write events or detecting changes to the metadata; trapping write events will be how the real-time ones work.  That is no use for detecting bit rot.  What I'm after is a tool that checksums each file and then, at some slow background rate, runs through all the files it is monitoring looking for a change, and has a good notification method.
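That kind of tool is not hard to sketch.  The core is a manifest mapping path to checksum, re-walked at leisure to report mismatches.  A minimal Python version (all names here are my own invention, and the notification side is left out):

```python
import hashlib
import os

def sha256_of(path):
    """SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Record a digest for every file under root."""
    manifest = {}
    for dirpath, _, names in os.walk(root):
        for name in names:
            p = os.path.join(dirpath, name)
            manifest[p] = sha256_of(p)
    return manifest

def verify_manifest(manifest):
    """Return files whose current digest differs from the recorded one
    (or which have disappeared)."""
    changed = []
    for path, digest in manifest.items():
        if not os.path.exists(path) or sha256_of(path) != digest:
            changed.append(path)
    return changed
```

The manifest would be persisted (as JSON, say) between runs; the slow background rate could be as simple as verifying a small slice of the manifest per night.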

So, perhaps the answer is to convert my 5* source images to DNG and use LR's validation from time to time.  That does at least give some long-term reassurance for my more highly rated images.


----------



## Roelof Moorlag (May 27, 2020)

Did you take a look at Validator? (Plug-in Download and Installation)  It does something similar to DNG validation, but for all file formats.


----------



## LRList001 (May 27, 2020)

Roelof Moorlag said:


> Did you take a look at Validator? (Plug-in Download and Installation)  It does something similar to DNG validation, but for all file formats.


Looking at it now, thanks.


----------



## LRList001 (May 27, 2020)

While very useful, Validator doesn't solve a problem I didn't mention: I have a lot of files outside LR.  I ran the DNG validation to get some performance figures and was then musing on how I might use it.  Thanks for the suggestion.


----------



## Roelof Moorlag (May 27, 2020)

Image Verifier maybe? 
https://basepath.com/site/detail-ImageVerifier.php


----------



## Roelof Moorlag (May 27, 2020)

Maybe this thread is interesting too?
https://www.lightroomqueen.com/comm...date-image-data-from-lightroom-catalog.27809/


----------



## Paul_DS256 (May 27, 2020)

LRList001 said:


> I've been doing some experiments on detecting bit rot


You piqued my interest, since I'm a techie and have never heard of 'bit rot'.  There has been a lot written about it, and my take, from some quick research, is that it happens on the media holding the file system.  This could be traditional magnetic disk, flash, DVDs, SSDs, etc.  Actually, I don't believe punch cards could suffer from bit rot, so maybe it's time these were resurrected.

Someone correct me if I'm wrong, but 'bit rot' MAY appear in an individual file because of an underlying media failure where the file resides.  Since media failure will always be occurring (that's why we have bad sectors), today's modern file systems will ensure that when a file is written or updated, it is successfully updated.  This obviously does not help when media deteriorates just by sitting there.

The solution for anticipating media failure at rest is multiple backups.  We now get into the physicality of these backups: no sense in having them in the same building/cloud structure.  One reason for this is that you don't access a file on a disk directly from a program; you have to go through the file system, which has to talk to the appropriate drivers for the media.  If there is a problem with the file system, it doesn't matter whether a single file has bit rot.

If correct, it seems to me that checking for bit rot is a file-system or media-manufacturer test.  This gets into the MTBF and lifespan of media types, e.g. DVDs at around 20 years.  It also introduces needs such as RAID technologies, and the multiple nodes a file gets written to in a cloud storage solution.

Yes, you could check an individual file for bit rot.  We used to do this in the past, when both disk and networking were not as robust as they are now.  I would suggest that if you have media bit rot, more than one file would be impacted.



LRList001 said:


> There are products out there that aim to detect changes to file systems


Not sure if these are the traditional disk utilities, but it would be the file system that would receive the notice of a bad sector for bit rot.

I'm also thinking that SMART disk technology would be helpful here in predicting failures.  Bit rot is only one problem.

My 2 cents.


----------



## kimballistic (May 27, 2020)

Lloyd Chambers has a command-line java utility called Integrity Checker which can be used to monitor for bit rot and also compare one folder tree with another (for example, to make sure your backups actually copied correctly, and aren't experiencing bit rot themselves).

https://diglloydtools.com/integritychecker.html


----------



## Linwood Ferguson (May 28, 2020)

Paul_DS256 said:


> Someone correct me if I'm wrong, but 'bit rot' MAY appear in an individual file because of an underlying media failure where the file resides.  Since media failure will always be occurring (that's why we have bad sectors), today's modern file systems will ensure that when a file is written or updated, it is successfully updated.  This obviously does not help when media deteriorates just by sitting there.



I would not tie bit rot specifically to media failure; it can also be caused by driver errors in disk or RAID controllers, for example.  Really, anything that can cause an uncommanded change.

Also, it is not true that "today's modern file systems will ensure that when a file is written or updated it is successfully updated", at least not if you include Windows, Mac and typical Linux.  The file system may TRY to ensure that, but if you include the entire stack up through the application, there are many failures of design that let errors occur without being handled properly.  Most Windows (and I suspect Mac) programs, for example, generically trap write errors and at best give a non-detailed failure message, leaving no indication of which file was affected and whether or not it was damaged.

There are file systems explicitly designed to detect classic bit rot, i.e. changes that do not go through the normal file I/O routines.  ReFS on Windows, ZFS and Btrfs are three that come to mind.  None are mainstream; in fact, Windows seems to have forgotten about ReFS, which never appears to have matured to the point where it could replace NTFS.  These all have maintenance functions that run the sort of checksum discussed above on a routine basis.

I disagree that checksum validation is too slow, and that multiple threads won't help.  I wrote the Validator plug-in mentioned above, and it will run (if I recall) 7 threads to keep the disk drives busy (admittedly I have SSDs).  Checksum calculation is fairly compute-intensive, so threading helps; it also overlaps disk reads with the calculations, keeping both busy rather than alternating.  It's plenty fast for occasional runs.  It is not, however, suitable for people who use TIFF and JPG, since it detects changes if you do something like edit-original in Photoshop.  But it works for raw.  I wish LR would provide such a tool, but they seem satisfied with DNG.  The main drawback of my approach is that it is not in line with "real" updates from LR and Photoshop, so it cannot distinguish an uncommanded change from one made (for example) in Photoshop.
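Not the plug-in's actual code, but the threading idea can be sketched in a few lines of Python; hashing several files at once keeps both the CPU and the disk busy (the function names are my own):

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def sha256_of(path):
    """SHA-256 digest of one file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def hash_files(paths, workers=7):
    """Hash many files concurrently; returns {path: digest}.

    CPython's hashlib releases the GIL while digesting large buffers,
    so worker threads genuinely overlap checksum computation with
    disk reads rather than alternating between them.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(paths, pool.map(sha256_of, paths)))
```

On an SSD the speed-up from threading is most visible; on a single spinning disk the gain is smaller, since seeks between files dominate.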

Note you will have that problem with any checksum program that is not tightly integrated with ALL the programs that can change data in non-raw files (and in raw files, for writing metadata back into the file versus an XMP sidecar).

The only backup tool I've found that does checksum comparison routinely is GoodSync.  It's one reason I use it (the UI is awful: complete and powerful but not intuitive; it's a great tool otherwise).

I also use TeraCopy any time I am moving large amounts of data between drives, because it will do a verify after copy.  Copying files is, I think, by far the most likely time you can introduce errors.

End of rambling...


----------



## PhilBurton (May 28, 2020)

Roelof Moorlag said:


> Image Verifier maybe?
> https://basepath.com/site/detail-ImageVerifier.php


Check the copyright dates and the supported OS list; there is no listed support for Windows 10.  I don't have experience with this utility, but I do know that Marc no longer supports other programs he has developed.

Phil Burton


----------

