Sunday, February 28, 2021

Re: OpenBSD 6.8 - softraid issue: "uvm_fault(0xffffffff821f5490, 0x40, 0, 1) -> e"

On Sun, Feb 28, 2021 at 03:05:49AM +0100, Mark Schneider wrote:
> Hi again,
>
> I have repeated softraid tests using six pcs of 1TB Samsung HDD 3G SATA
> drives as RAID5 and I do not face the crash issue of the OS when using SSDs
> in the RAID5.
> Details of the RAID5 setting are in the attached file.
>
> It looks like using SSD drives as RAID5 leads for some reason to the OpenBSD
> 6.8 crash. Samsung 512MB PRO 860 SSDs have 6G SATA interface (what is
> different compared to tested HDDs)
>
> NB: Using those SSDs as RAID6 on debian Linux (buster - mdadm / cryptoLUKS)
> does not face any issues
>       There are also no issues using those SSDs as RAID on FreeBSD
> (TrueNAS).

I've seen some Samsung Pro SSDs cause I/O errors on ahci(4) due to unhandled
NCQ error conditions. I'm not sure whether this relates to your problem; I
assume these errors were specific to my machine, which is over 10 years old.
Its AHCI controller was likely not designed with modern SSDs in mind. I switched
to different SSDs and the problem disappeared. This was on RAID1, where the
kernel didn't crash. Instead, the volume ended up in a degraded state.

Maybe some I/O error is happening in your case as well?
Perhaps the RAID5 code doesn't handle I/O errors gracefully?

In any case, your bug report is missing important information:

> > # Error messages
> >
> > uvm_fault(0xffffffff821f5490, 0x40, 0, 1) -> e
> > kernel: page fault trap, code=0
> > Stopped at      sr_validate_io+0x44:    cmpl     $0,0x40(%r9)
> > ddb{2}>
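
A note on reading the fault line: the faulting address reported by uvm_fault
(0x40) equals the displacement in `cmpl $0,0x40(%r9)`, which suggests that %r9
held a NULL pointer and the code tried to read a field at offset 0x40 of some
structure. The C sketch below illustrates that arithmetic only. The struct and
function names are hypothetical and do not reflect the real softraid data
structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration, NOT the real softraid structure. */
struct example_obj {
	uint8_t  pad[0x40];	/* 64 bytes of other members */
	uint32_t flags;		/* lands at offset 0x40 */
};

/*
 * Return the address a read of p->flags would touch, computed
 * without dereferencing p. With p == NULL this is 0x40 on common
 * platforms, matching the fault address in the report.
 */
static uintptr_t
field_address(const struct example_obj *p)
{
	return (uintptr_t)p + offsetof(struct example_obj, flags);
}
```

In other words, the message pins down the faulting instruction and hints at a
NULL pointer, nothing more.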

This tells us where it crashed but not how the code flow ended up here.
Please show the stack trace printed by the 'trace' command, and the output
of the 'ps' command (both commands at the ddb> prompt).
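
For reference, a sketch of what to type at the prompt. The `machine ddbcpu`
step is my assumption about what may be useful on a multiprocessor machine
(the `ddb{2}>` prompt indicates CPU 2); see ddb(4) for details. Output is
omitted here because it depends on your system.

```text
ddb{2}> trace
ddb{2}> ps
ddb{2}> machine ddbcpu 0
ddb{0}> trace
```

If there is no serial console, a photo of the screen is better than nothing.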
