Thursday, May 23, 2024

Re: advice debugging lockups with swap-thrashing symptoms?

On 5/23/24 03:18, Stuart Henderson wrote:
> On 2024-05-22, James Cook <falsifian@falsifian.org> wrote:
>> One of my OpenBSD boxes sometimes gets in a weird locked-up or
>> almost-locked-up state. I'm wondering what I can do to debug it
>> further next time it happens.
> ...
>> I would also expect the cache number to be much higher. E.g. on
>> this occasion, I was running "git annex fsck", which reads plenty
>> of data from disk.
>
> Heavy filesystem access can result in this sort of thing, I used to
> have unpacked ports source on one of my machines for grepping over,
> the machine was pretty much unusable for anything else while that was
> running.
>
> Might be worth trying some noatime mount flags if you don't already have
> them, at least then you can avoid turning some reads into writes.
>

Definitely a possibility. A long time ago, I think I asked about the
possibility of a "disknice" to throttle disk access for individual
tasks. TedU@ came through for me with something that definitely solved
my problem, and I have used it from time to time since -- basically, it
just suspends a particular program occasionally, which gives other
programs a chance to get disk access. I saved it (and made a tiny update
that it now needs) and put it here:

https://holland-consulting.net/scripts/disknice.html
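The linked script is the real tool. What follows is only a hypothetical Python sketch of the general idea described above, not tedu@'s script. It assumes the suspending is done with SIGSTOP and resuming with SIGCONT, and that the sketch launches the command itself rather than attaching to an existing PID. The `run_s`/`stop_s` intervals are illustrative assumptions.

```python
import signal
import subprocess
import time


def _exited_within(proc, seconds):
    """Return True if proc exited within `seconds`, else False."""
    try:
        proc.wait(timeout=seconds)
        return True
    except subprocess.TimeoutExpired:
        return False


def disknice(argv, run_s=1.0, stop_s=1.0):
    """Hypothetical sketch: run argv, alternately letting it run for
    run_s seconds and suspending it (SIGSTOP) for stop_s seconds, until
    it exits. Returns the command's exit status. POSIX only.

    Note: only the direct child is signalled; any processes it spawns
    keep running while it is stopped.
    """
    proc = subprocess.Popen(argv)
    try:
        while True:
            # Let the child run (and use the disk) for a while.
            if _exited_within(proc, run_s):
                break
            # Suspend it so other processes get a turn at the disk.
            proc.send_signal(signal.SIGSTOP)
            time.sleep(stop_s)
            # Resume it for the next run interval.
            proc.send_signal(signal.SIGCONT)
    finally:
        # Never leave the child stopped, e.g. if we are interrupted.
        if proc.poll() is None:
            proc.send_signal(signal.SIGCONT)
    return proc.wait()
```

Usage would be something like `disknice(["git", "annex", "fsck"], run_s=1.0, stop_s=1.0)`. A longer `stop_s` relative to `run_s` throttles the job harder, at the cost of making it take longer.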


Also...
I've seen disks "fail" by getting super-slow. The failure mode seems to
be difficulty reading data: after enough retries the read succeeds, the
retry counter resets to zero, and then the next read hits the same
problem. You may be able to hear lots of activity on the drive with
little obvious progress. I'm not convinced this is your problem, but ...
something to consider.

Nick.
