Friday, August 02, 2024

new (wip-ish): sysutils/plocate

I'm not particularly asking for OKs at the moment (I'm undecided how
useful this is to have in general), but I thought I'd send it out since
I've got a port in half-decent shape, for interest as much as anything.

It's another locate implementation. Key differences to our usual one:

- rather than excluding "non-public" files from the database, they are
included - the database is mode 640 and the search tool is setgid so
that it can access the files, but it does an access check before
returning results to the user.

- it's extremely fast (it uses an "inverted index" aka "postings
list" of trigrams to allow fast full text searches). That doesn't really
matter for one-off searches (~150ms for a lookup with the default locate
isn't too bad), but locate is used in ports infrastructure to check for
duplicate files when generating a plist, and there it soon adds up. If
the plist is much more than "fairly small", it can take minutes of 100%
cpu on all cores to do that check. So I'm at least slightly interested
in adding a plocate database to pkglocatedb and adding a way to use that
in infrastructure.
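
For anyone unfamiliar with the trigram idea, here is a rough Python
sketch of how such an index can answer substring queries. It is only an
illustration of the general technique, not plocate's actual code or
on-disk format; the function names (trigrams, build_index, search) are
made up for this example.

```python
from collections import defaultdict


def trigrams(s):
    """Return the set of all 3-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


def build_index(paths):
    """Map each trigram to the set of path ids that contain it."""
    index = defaultdict(set)
    for path_id, path in enumerate(paths):
        for tri in trigrams(path):
            index[tri].add(path_id)
    return index


def search(index, paths, needle):
    """Return the paths containing needle, using the index to narrow down."""
    tris = trigrams(needle)
    if not tris:
        # Needle shorter than 3 characters: no trigram to look up,
        # so fall back to a linear scan.
        return [p for p in paths if needle in p]
    # A path can only match if it contains every trigram of the needle,
    # so intersect the postings lists to get a small candidate set.
    candidates = set.intersection(*(index.get(t, set()) for t in tris))
    # The trigrams may occur in a candidate without being adjacent,
    # so confirm each candidate with a real substring test.
    return [paths[i] for i in sorted(candidates) if needle in paths[i]]


paths = ["/usr/bin/locate", "/usr/local/bin/plocate", "/etc/passwd"]
index = build_index(paths)
print(search(index, paths, "locate"))
```

The point is that a query only touches the postings lists for its own
trigrams rather than scanning every path, which is where the speedup
for repeated lookups comes from.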

If you want to play with this, you can convert a pkglocate db into
plocate format like this:

$ pkglocate : > tmpfile; plocate-build -l no -p tmpfile plocate-pkg.db

(the "-l no" is to mark the database as not requiring access checks;
obviously they're not possible/useful with pkglocatedb as the files are
usually _not_ installed - also the pkgname prefix gets in the way).

Biggest yucky bit with the port if used as a "standard" locate tool is
that the code to check filesystem types is Linux-only and I haven't
added an OpenBSD implementation, so you can't easily exclude (e.g.)
"all NFS partitions"; you've got to specify paths to skip.
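
As a stopgap, the paths to skip could be derived from mount(8) output
rather than maintained by hand. The Python sketch below assumes lines of
the form "device on mountpoint type fstype (options)"; the sample text
and the helper name mount_points_of_type are invented for illustration,
and this is not part of the port.

```python
import re

# Assumed line shape: "<device> on <mountpoint> type <fstype> (<options>)"
MOUNT_LINE = re.compile(r"^(?P<dev>.+) on (?P<mnt>.+) type (?P<fstype>\S+)")


def mount_points_of_type(mount_output, fstypes):
    """Return mount points whose filesystem type is in fstypes."""
    result = []
    for line in mount_output.splitlines():
        m = MOUNT_LINE.match(line)
        if m and m.group("fstype") in fstypes:
            result.append(m.group("mnt"))
    return result


# Hypothetical sample; on a real system this text would come from
# running mount(8) with no arguments.
SAMPLE = """\
/dev/sd0a on / type ffs (local)
/dev/sd0d on /usr type ffs (local, nodev)
fileserver:/export on /mnt/nfs type nfs (v3, udp, timeo=100)
"""

print(mount_points_of_type(SAMPLE, {"nfs"}))
```

The resulting list would then be passed on as the paths to skip when
building the database.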
