Sunday, September 15, 2024

Re: checksums to detect/correct bit-rot

Jonathan Thornburg wrote:
> And a related question: I have a pool of ~10 external USB3 backup
> disks (all consumer-grade WD or Seagate 2.5" spinning rust, either
> 2TB or 4TB capacity each), all currently setup with FFS2 filesystems
> on top of softraid crypto (/bioctl -c C/). Each backup is to a single
> disk, written with (roughly speaking)
> rsync -aHESvv --delete /home/ /mnt/home/
> Each disk thus has slightly different contents depending on how
> recently I did a backup to that disk, but the vast majority of the
> files (those that haven't changed recently) should be identical
> across disks.
>
> [...]
> Thinking about how to detect/correct bit-rot in these backups

I am glad you started this thread.

First of all, the biggest threat to your data isn't silent bitrot. It is
accidents. In my experience, I am far more likely to lose a file because
I delete it by accident, or because I decide I no longer want it and
regret trashing it a week later.

That said, my backup strategy (for desktops) is as follows:

I create a checksum list of all my files. Something like:

cd /home/user/documents
find . -type f ! -name '*.md5' -print0 | xargs -0 md5 -r | sort -k 2 \
    > "checksums_$(date +%Y-%m-%d).md5"

For big datasets this is time-intensive.

When the time comes to make a backup, I run the command above and diff
the new checksum list against the previous one to manually verify the
changes:

diff "$old_checksum" "$new_checksum" | less

This is because it is important to ensure your files are good before you
back them up. You don't want to commit bad data to backup storage.

I use the restic backup tool (in ports) to send the data to backup
storage. I use it to commit to both sftp and NFS based repositories, but
you can do the same with external media too.

restic -r /some/repository backup --exclude-file=restic_exclude.txt /

Restic has a number of advantages. First, it supports encrypted backups
so you can export your data to unsafe locations (such as cheap NAS
appliances). Restic takes incremental snapshots and keeps them on record
until you delete them, so the same repository can be used to recover
to any point in time for which you took a snapshot. Restic also has
integrated repository testing you can use to ensure your repositories
are sound:

restic -r /some/repository check # Ensure index integrity

restic -r /some/repository check --read-data=true # Time-consuming verification of EVERYTHING.
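
When a full --read-data pass takes too long to run regularly, one option is to read a different slice of the repository on each run. The sketch below assumes restic's --read-data-subset=n/t option, which the commands above do not use. subset_for_week is a hypothetical helper that only computes which slice to read, and the choice of 5 slices is arbitrary.

```shell
#!/bin/sh
# subset_for_week WEEK TOTAL   (hypothetical helper)
# Maps an ISO week number (as printed by `date +%V`, e.g. "08") to a
# string "k/TOTAL" with k in 1..TOTAL, so that consecutive weeks cycle
# through all slices of the repository.
subset_for_week() {
    week=${1#0}    # drop a leading zero so "08" is not parsed as octal
    echo "$(( (week % $2) + 1 ))/$2"
}
```

A weekly job could then run restic -r /some/repository check --read-data-subset="$(subset_for_week "$(date +%V)" 5)". Because a year has 52 or 53 weeks, the rotation is not perfectly even across the year boundary, but every slice still gets read every few weeks.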

The downsides are that this method only works well for files that don't
change much (otherwise, diffing the checksums generates too much noise)
and that it makes you depend on a non-standard tool for your restores.
