Sunday, February 28, 2021

Re: OpenBSD 6.8 - softraid issue: "uvm_fault(0xffffffff821f5490, 0x40, 0, 1) -> e"

Hi,

Compile the kernel with debug enabled so you get a line number from the
crash, and see what's there. Go through the git/cvs logs and check
whether anybody has touched the global mutex around sata/sr raid. Read
the code. You may be hitting a bug that has been there since raid5 was
added to OpenBSD; possibly no one has tested with that many SSDs, so you
are in a unique position to hunt this bug down. Congratulations and good
luck!
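
Even before a debug kernel is built, the trap output quoted further down
can be read by hand. The following is only a sketch, assuming that the
second argument printed by uvm_fault() is the faulting address and that
the 0x40 in "cmpl $0,0x40(%r9)" is a structure-member offset. The
variable names are illustrative and not taken from the kernel source.

```shell
#!/bin/sh
# Values copied from the quoted crash output.
fault_addr=0x40     # 2nd argument of uvm_fault(...); assumed to be the faulting address
displacement=0x40   # offset in the faulting instruction: cmpl $0,0x40(%r9)

# The effective address is base + displacement, so the base register held:
base=$(( $fault_addr - $displacement ))

printf 'implied value of %%r9: 0x%x\n' "$base"
```

If the assumptions hold, a base of 0 would suggest that sr_validate_io
read a member at offset 0x40 through a NULL pointer. Which pointer that
is still has to be confirmed against the source and a debug build.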

Karel

On 2/28/21 3:05 AM, Mark Schneider wrote:
> Hi again,
>
> I have repeated the softraid tests using six 1TB Samsung HDDs (3G
> SATA) as RAID5, and I do not see the OS crash that occurs when
> using SSDs in the RAID5.
> Details of the RAID5 setting are in the attached file.
>
> It looks like using SSD drives as RAID5 for some reason leads to the
> OpenBSD 6.8 crash. The Samsung PRO 860 512GB SSDs have a 6G SATA
> interface (which differs from the tested HDDs).
>
> NB: Using those SSDs as RAID6 on Debian Linux (buster, mdadm /
> cryptoLUKS) does not cause any issues. There are also no issues using
> those SSDs as RAID on FreeBSD (TrueNAS).
>
> Kind regards
> Mark
>
>
> On 27.02.21 04:30, Mark Schneider wrote:
>> Hi,
>>
>>
>> I get a system crash on OpenBSD 6.8 when writing big files (around
>> 10GBytes) to a softraid RAID5 volume.
>>
>> I can reproduce the error (tested on two different systems with
>> OpenBSD 6.8 installed on an SSD drive or a USB stick). The RAID5
>> volume itself consists of six Samsung PRO 860 512GB SSDs.
>>
>> In short:
>>
>> bioctl -c 5 -l sd0a,sd1a,sd2a,sd3a,sd4a,sd5a softraid0
>>
>> obsdssdarc# disklabel sd7
>> # /dev/rsd7c:
>> type: SCSI
>> disk: SCSI disk
>> label: SR RAID 5
>> duid: a50fb9a25bf07243
>> flags:
>> bytes/sector: 512
>> sectors/track: 255
>> tracks/cylinder: 511
>> sectors/cylinder: 130305
>> cylinders: 38379
>> total sectors: 5001073280
>> boundstart: 0
>> boundend: 5001073280
>> drivedata: 0
>>
>> 16 partitions:
>> #                size           offset  fstype [fsize bsize cpg]
>>   a:       5001073280                0  4.2BSD   8192 65536 52270
>>   c:       5001073280                0  unused
>>
>> #
>> --------------------------------------------------------------------------------
>>
>> obsdssdarc# time dd if=/dev/urandom of=/arc-ssd/1GB-urandom.bin bs=1M count=1024
>> 1024+0 records in
>> 1024+0 records out
>> 1073741824 bytes transferred in 8.120 secs (132218264 bytes/sec)
>>     0m08.13s real     0m00.00s user     0m08.14s system
>>
>> # Working as expected
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>>
>> obsdssdarc# time dd if=/dev/urandom of=/arc-ssd/10GB-urandom.bin bs=10M count=1024
>>
>> # Error messages
>>
>> uvm_fault(0xffffffff821f5490, 0x40, 0, 1) -> e
>> kernel: page fault trap, code=0
>> Stopped at      sr_validate_io+0x44:    cmpl     $0,0x40(%r9)
>> ddb{2}>
>>
>> # Crashing OpenBSD 6.8
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>>
>> # After reboot:
>>
>> obsdssdarc# mount /dev/sd7a /arc-ssd/
>> mount_ffs: /dev/sd7a on /arc-ssd: Device not configure
>>
>> obsdssdarc# grep sd7 /var/run/dmesg.boot
>> softraid0: trying to bring up sd7 degraded
>> softraid0: sd7 was not shutdown properly
>> softraid0: sd7 is offline, will not be brought online
>>
>>
>> More details are in the attached files. Thanks a lot in advance for
>> any feedback.
>>
>>
>> Kind regards
>>
>> Mark
>>
>
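
The numbers in the quoted disklabel and dd commands can be cross-checked
with plain shell arithmetic. This is only a sanity-check sketch: it
assumes RAID5 over six chunks leaves five chunks' worth of data space,
and the variable names are illustrative.

```shell
#!/bin/sh
# Numbers copied from the quoted disklabel output for sd7.
total_sectors=5001073280
bytes_per_sector=512
disks=6                          # chunks passed to bioctl -c 5
data_disks=$(( $disks - 1 ))     # assumption: one chunk's worth of space holds parity

volume_bytes=$(( $total_sectors * $bytes_per_sector ))
sectors_per_chunk=$(( $total_sectors / $data_disks ))

# Size of the failing dd run: bs=10M count=1024.
write_bytes=$(( 10 * 1024 * 1024 * 1024 ))

echo "volume bytes:      $volume_bytes"
echo "sectors per chunk: $sectors_per_chunk"
echo "failing dd size:   $write_bytes"
```

The volume comes out at about 2.56 TB, which matches five 512GB chunks,
so the label itself looks consistent. The failing write is 10 GiB, far
smaller than the volume.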
