Sunday, September 15, 2024

Re: checksums to detect/correct bit-rot

Hi,

On Sun, Sep 15, 2024 at 12:12:08AM -0700, Jonathan Thornburg wrote:
> Thinking about how to detect/correct bit-rot in these backups, it
> occurs to me that I could hack up some Perl to walk the filesystem
> tree on a mounted backup disk, /stat()/ and read each file, and build
> a database of (pathname, inode mtime, checksum) tuples. (I could either
> ignore symlinks, or checksum the result of /readlink()/.) Then given
> such databases for a bunch of disks, a bit more Perl could read all
> the databases, find all the files with matching pathname and inode
> mtime (so that the contents should be the same, given that my usage
> of /rsync/ preserves /mtime/), look for differing checksums, and for
> any differences, majority-vote the checksums to identify which copy
> or copies is in error.
>
> But before I reinvent the wheel, can anyone point me to software
> which already does this? Bonus points if the software is already
> in ports.

I went the "hack up some Perl" route a few years ago to create a
script that saves checksums generated by 'cksum' for all files:

https://lumidify.org/git/lumia/log.html
https://lumidify.org/doc/lumia/lumia-current.html

I didn't like the other options I found (e.g. 'bitrot') because I
wanted to easily move files around together with their checksums,
so there are some commands like 'mv' that are wrappers around the
tools of the same name but also copy/move/remove the checksums.

Everything is still a bit hacky, and I've wanted to rewrite it for
a long time, but it works well enough for my use-case that I haven't
bothered to do that so far.

For my use-case, I always copy the checksums together with the files
(which isn't very difficult since they're just stored in text files
in each directory), so the stored checksums are the same on all backup
disks, and I can just run 'lumia check' on all the disks to check if
the checksums still match the files.
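
As an illustration of the per-directory approach described above, here is a
minimal sketch. It is not lumia's actual code or file format: the file name
'.checksums', the "checksum name" line layout, and the function names are all
assumptions made for this example, and it uses zlib's CRC-32, which is not the
same algorithm as POSIX 'cksum'.

```python
import os
import zlib

# Hypothetical per-directory checksum file; not lumia's real format.
CHECKSUM_FILE = ".checksums"


def file_crc(path, bufsize=1 << 16):
    """Return the zlib CRC-32 of a file as 8 hex digits."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            crc = zlib.crc32(chunk, crc)
    return f"{crc:08x}"


def write_checksums(directory):
    """Record a checksum for every regular file directly in `directory`."""
    lines = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip the checksum file itself, symlinks, and subdirectories.
        if name == CHECKSUM_FILE or os.path.islink(path) or not os.path.isfile(path):
            continue
        lines.append(f"{file_crc(path)} {name}\n")
    with open(os.path.join(directory, CHECKSUM_FILE), "w") as out:
        out.writelines(lines)


def check_directory(directory):
    """Return the names that are missing or whose checksum no longer matches."""
    bad = []
    with open(os.path.join(directory, CHECKSUM_FILE)) as f:
        for line in f:
            stored, name = line.rstrip("\n").split(" ", 1)
            path = os.path.join(directory, name)
            if not os.path.isfile(path) or file_crc(path) != stored:
                bad.append(name)
    return bad
```

Because the checksum file lives next to the files it describes, copying a
directory to another disk carries the checksums along, and the same check can
then be run on each copy. This sketch does not handle file names that contain
newlines.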

Your use-case is a bit different, especially because your /home will
contain files that change often and for which it's annoying to always
manage the checksums manually, but maybe this tool will still be
helpful for someone.
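
For the cross-disk comparison described in the quoted message, the
majority-vote step could look roughly like the sketch below. The data layout
(one dict per disk, keyed by (pathname, mtime)) and the function name
'find_suspects' are assumptions for illustration only; building the per-disk
databases is left out.

```python
from collections import Counter, defaultdict


def find_suspects(databases):
    """Majority-vote checksums across disks.

    `databases` maps a disk name to {(pathname, mtime): checksum}.
    Returns {(pathname, mtime): suspects} for every key whose checksums
    disagree, where `suspects` is a sorted list of the disks that differ
    from the majority, or None if no checksum has a strict majority.
    """
    # Regroup so that each (pathname, mtime) key lists its checksum per disk.
    by_key = defaultdict(dict)
    for disk, db in databases.items():
        for key, checksum in db.items():
            by_key[key][disk] = checksum

    suspects = {}
    for key, per_disk in by_key.items():
        counts = Counter(per_disk.values())
        if len(counts) < 2:
            continue  # all copies agree (or only one copy exists)
        winner, votes = counts.most_common(1)[0]
        if votes * 2 <= len(per_disk):
            suspects[key] = None  # tie: cannot tell which copy is wrong
        else:
            suspects[key] = sorted(
                disk for disk, c in per_disk.items() if c != winner
            )
    return suspects
```

With only two copies of a file a mismatch can be detected but not attributed,
which is why the sketch reports None in that case rather than guessing.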

--
lumidify
