Sunday, September 15, 2024

checksums to detect/correct bit-rot

Does OpenBSD support any file systems with built-in checksums to
(try to) ensure metadata and/or data integrity in the face of "bit rot"
disk (or memory/cpu/USB) errors? I'm not looking for ZFS-style storage
pools or logical volume management, "just" checksums to catch silent
metadata and/or data corruption.

Softraid 1, 5, or 1C could in theory do this, but with a large space
overhead (a factor of 2 to detect errors, or 3 to correct errors).
Also, the current (7.5) man pages don't mention any option to make
every read fetch all the chunks and verify that they're identical.

And a related question: I have a pool of ~10 external USB3 backup
disks (all consumer-grade WD or Seagate 2.5" spinning rust, either
2TB or 4TB capacity each), all currently set up with FFS2 filesystems
on top of softraid crypto (/bioctl -c C/). Each backup is to a single
disk, written with (roughly speaking)
rsync -aHESvv --delete /home/ /mnt/home/
Each disk thus has slightly different contents depending on how
recently I did a backup to that disk, but the vast majority of the
files (those that haven't changed recently) should be identical
across disks.

[Before anyone asks: Yes, I regularly rotate some of the disks offsite.
And yes, I regularly restore files "in anger".]

Each backup disk holds somewhat more than 1e13 bits, so at an
unrecoverable bit error rate of 1e-14 or 1e-15 per bit read for consumer
disks there's a non-trivial chance of a bit error somewhere in my
backup pool.
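
To put rough numbers on that estimate, here is a small sketch (in
Python) of the arithmetic. It assumes, purely for illustration, that
bit errors are independent, that the quoted rate applies per bit read,
and that a whole disk is read end to end; the function name
p_at_least_one_error is hypothetical.

```python
import math

def p_at_least_one_error(bits_read: float, ber: float) -> float:
    """Probability of at least one bit error in bits_read bits,
    assuming independent errors at rate ber per bit read."""
    # 1 - (1 - ber)**bits_read, written with log1p/expm1 so the tiny
    # rate is not lost to floating-point rounding.
    return -math.expm1(bits_read * math.log1p(-ber))

BITS_PER_TB = 8e12  # decimal terabyte: 1e12 bytes * 8 bits

for capacity_tb in (2, 4):
    bits = capacity_tb * BITS_PER_TB
    for ber in (1e-14, 1e-15):
        expected = bits * ber  # mean number of bit errors per full read
        prob = p_at_least_one_error(bits, ber)
        print(f"{capacity_tb} TB at {ber:g}: "
              f"expected errors {expected:.3f}, P(>=1) = {prob:.3f}")
```

Under those assumptions, one full read of a 2TB disk at 1e-14 gives
roughly a 15% chance of at least one bit error, and at 1e-15 roughly
1.6%.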

Thinking about how to detect/correct bit-rot in these backups, it
occurs to me that I could hack up some Perl to walk the filesystem
tree on a mounted backup disk, /stat()/ and read each file, and build
a database of (pathname, inode mtime, checksum) tuples. (I could either
ignore symlinks, or checksum the result of /readlink()/.) Then given
such databases for a bunch of disks, a bit more Perl could read all
the databases, find all the files with matching pathname and inode
mtime (so that the contents should be the same, given that my usage
of /rsync/ preserves /mtime/), look for differing checksums, and for
any differences, majority-vote the checksums to identify which copy
or copies are in error.
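
A minimal sketch of that scheme, written in Python rather than Perl
just to make the idea concrete. Everything here is an assumption for
illustration: the function names (checksum_file, scan_tree,
find_suspects), the choice of SHA-256, and keying on nanosecond mtime
(truncate to whole seconds if the copies only preserve those). Symlinks
are handled by checksumming the /readlink()/ result.

```python
import hashlib
import os
import stat
from collections import Counter, defaultdict

def checksum_file(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def scan_tree(root):
    """Return {relative_path: (mtime_ns, checksum)} for the regular
    files and symlinks under root; other file types are skipped."""
    db = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Symlinks to directories are listed in dirnames (and not
        # followed), so look at both lists.
        for name in filenames + dirnames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if stat.S_ISLNK(st.st_mode):
                target = os.fsencode(os.readlink(full))
                digest = hashlib.sha256(target).hexdigest()
            elif stat.S_ISREG(st.st_mode):
                digest = checksum_file(full)
            else:
                continue  # directories, devices, FIFOs, sockets
            db[os.path.relpath(full, root)] = (st.st_mtime_ns, digest)
    return db

def find_suspects(databases):
    """databases maps a disk label to the dict from scan_tree().
    Returns a list of (path, mtime_ns, majority_checksum, suspects);
    majority_checksum is None when the vote is tied."""
    groups = defaultdict(dict)  # (path, mtime_ns) -> {disk: checksum}
    for disk, db in databases.items():
        for path, (mtime_ns, digest) in db.items():
            groups[(path, mtime_ns)][disk] = digest
    report = []
    for (path, mtime_ns), by_disk in sorted(groups.items()):
        counts = Counter(by_disk.values())
        if len(counts) < 2:
            continue  # only one copy, or all copies agree
        (top, top_n), (_, second_n) = counts.most_common(2)
        if top_n == second_n:
            # Tie: cannot tell which copies are bad, flag them all.
            report.append((path, mtime_ns, None, sorted(by_disk)))
        else:
            bad = sorted(d for d, dg in by_disk.items() if dg != top)
            report.append((path, mtime_ns, top, bad))
    return report
```

In use, one would call scan_tree("/mnt/home") with each backup disk
mounted in turn, save each result (for example as JSON) under a label
for that disk, and later pass the collected dicts to find_suspects().
With only two copies of a file a mismatch can be detected but not
attributed, which is why the tie case reports every copy.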

But before I reinvent the wheel, can anyone point me to software
which already does this? Bonus points if the software is already
in ports.

Thanks,
--
-- "Jonathan Thornburg [remove -color to reply]" <dr.j.thornburg@gmail-pink.com>
on the west coast of Canada
"The programmers outside looked from Web 2.0 firm to AI company, and from
AI company to Web 2.0 firm, and from Web 2.0 firm to AI company again;
but already it was impossible to say which was which."
-- /Ars Technica/ comment by /ubercurmudgeon/, 2024-05-09