Tuesday, March 02, 2021

Re: OpenBSD 6.8 - softraid issue: "uvm_fault(0xffffffff821f5490, 0x40, 0, 1) -> e"

obsd69b# dd if=/dev/zero of=/dev/rsd1c bs=1m count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 3.480 secs (308516258 bytes/sec)
obsd69b# dd if=/dev/zero of=/dev/rsd2c bs=1m count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 3.472 secs (309202512 bytes/sec)
obsd69b# dd if=/dev/zero of=/dev/rsd3c bs=1m count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 3.462 secs (310091044 bytes/sec)
obsd69b# # ----------------------------------------------------------------
obsd69b# disklabel -E sd1
Label editor (enter '?' for help at any prompt)
sd1> p
OpenBSD area: 0-2000409264; size: 2000409264; free: 2000409264
# size offset fstype [fsize bsize cpg]
c: 2000409264 0 unused
sd1> a a
offset: [0]
size: [2000409264] *
FS type: [4.2BSD] RAID
sd1*> w
sd1> q
No label changes.
obsd69b# disklabel -E sd2
Label editor (enter '?' for help at any prompt)
sd2> a a
offset: [0]
size: [2000409264] *
FS type: [4.2BSD] RAID
sd2*> w
sd2> q
No label changes.
obsd69b# disklabel -E sd3
Label editor (enter '?' for help at any prompt)
sd3> a a
offset: [0]
size: [2000409264] *
FS type: [4.2BSD] RAID
sd3*> w
sd3> q
No label changes.


# ---

obsd69b# bioctl -c 5 -l sd1a,sd2a,sd3a softraid0
softraid0: RAID 5 volume attached as sd4
obsd69b# bi
biff bind bioctl
obsd69b# bioctl sd4
Volume Status Size Device
softraid0 0 Online 2048418512896 sd4 RAID5
0 Online 1024209272832 0:0.0 noencl <sd1a>
1 Online 1024209272832 0:1.0 noencl <sd2a>
2 Online 1024209272832 0:2.0 noencl <sd3a>
obsd69b# # ----------------------------------------------------------------
obsd69b# disklabel sd4
# /dev/rsd4c:
type: SCSI
disk: SCSI disk
label: SR RAID 5
duid: 0000000000000000
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 249039
total sectors: 4000817408
boundstart: 0
boundend: 4000817408
drivedata: 0

16 partitions:
# size offset fstype [fsize bsize cpg]
c: 4000817408 0 unused
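
As a sanity check on the sizes reported above, the usable capacity of a three-disk RAID 5 set should be about two chunks' worth of data. The following is only a minimal sketch using the numbers copied from the bioctl and disklabel output; the helper name raid5_data_bytes is made up for illustration and is not part of any tool.

```shell
#!/bin/sh
# Sizes copied from the bioctl and disklabel output above.
chunk_bytes=1024209272832      # each of sd1a, sd2a, sd3a
volume_sectors=4000817408      # "total sectors" of sd4
sector_bytes=512
disks=3

# Hypothetical helper: RAID 5 spends one chunk's worth of space on parity,
# leaving (disks - 1) chunks for data.
raid5_data_bytes() {
    echo $(( ($1 - 1) * $2 ))
}

ideal=$(raid5_data_bytes "$disks" "$chunk_bytes")
actual=$(( volume_sectors * sector_bytes ))

echo "ideal:  $ideal bytes"
echo "actual: $actual bytes"
echo "diff:   $(( ideal - actual )) bytes"
```

The computed volume size matches the 2048418512896 bytes that bioctl reports for sd4. It is 32768 bytes below the ideal figure, which is presumably rounding inside softraid (an assumption, not something the output states).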
obsd69b# # ----------------------------------------------------------------
obsd69b# disklabel -E sd4
Label editor (enter '?' for help at any prompt)
sd4> a a
offset: [0]
size: [4000817408] *
FS type: [4.2BSD]
sd4*> w
sd4> q
No label changes.


obsd69b# # ----------------------------------------------------------------


obsd69b# newfs sd4a
/dev/rsd4a: 1953524.1MB in 4000817408 sectors of 512 bytes
598 cylinder groups of 3266.88MB, 52270 blocks, 104704 inodes each
super-block backups (for fsck -b #) at:
256, 6690816, 13381376, 20071936, 26762496, 33453056, 40143616, 46834176, 53524736, 60215296, 66905856, 73596416, 80286976, 86977536, 93668096, 100358656, 107049216, 113739776, 120430336, 127120896, 133811456, 140502016, 147192576, 153883136, 160573696, 167264256, 173954816, 180645376, 187335936, 194026496,
200717056, 207407616, 214098176, 220788736, 227479296, 234169856, 240860416, 247550976, 254241536, 260932096, 267622656, 274313216, 281003776, 287694336, 294384896, 301075456, 307766016, 314456576, 321147136, 327837696, 334528256, 341218816, 347909376, 354599936, 361290496, 367981056, 374671616, 381362176,
388052736, 394743296, 401433856, 408124416, 414814976, 421505536, 428196096, 434886656, 441577216, 448267776, 454958336, 461648896, 468339456, 475030016, 481720576, 488411136, 495101696, 501792256, 508482816, 515173376, 521863936, 528554496, 535245056, 541935616, 548626176, 555316736, 562007296, 568697856,
575388416, 582078976, 588769536, 595460096, 602150656, 608841216, 615531776, 622222336, 628912896, 635603456, 642294016, 648984576, 655675136, 662365696, 669056256, 675746816, 682437376, 689127936, 695818496, 702509056, 709199616, 715890176, 722580736, 729271296, 735961856, 742652416, 749342976, 756033536,
762724096, 769414656, 776105216, 782795776, 789486336, 796176896, 802867456, 809558016, 816248576, 822939136, 829629696, 836320256, 843010816, 849701376, 856391936, 863082496, 869773056, 876463616, 883154176, 889844736, 896535296, 903225856, 909916416, 916606976, 923297536, 929988096, 936678656, 943369216,
950059776, 956750336, 963440896, 970131456, 976822016, 983512576, 990203136, 996893696, 1003584256, 1010274816, 1016965376, 1023655936, 1030346496, 1037037056, 1043727616, 1050418176, 1057108736, 1063799296, 1070489856, 1077180416, 1083870976, 1090561536, 1097252096, 1103942656, 1110633216, 1117323776, 1124014336,
1130704896, 1137395456, 1144086016, 1150776576, 1157467136, 1164157696, 1170848256, 1177538816, 1184229376, 1190919936, 1197610496, 1204301056, 1210991616, 1217682176, 1224372736, 1231063296, 1237753856, 1244444416, 1251134976, 1257825536, 1264516096, 1271206656, 1277897216, 1284587776, 1291278336, 1297968896,
1304659456, 1311350016, 1318040576, 1324731136, 1331421696, 1338112256, 1344802816, 1351493376, 1358183936, 1364874496, 1371565056, 1378255616, 1384946176, 1391636736, 1398327296, 1405017856, 1411708416, 1418398976, 1425089536, 1431780096, 1438470656, 1445161216, 1451851776, 1458542336, 1465232896, 1471923456,
1478614016, 1485304576, 1491995136, 1498685696, 1505376256, 1512066816, 1518757376, 1525447936, 1532138496, 1538829056, 1545519616, 1552210176, 1558900736, 1565591296, 1572281856, 1578972416, 1585662976, 1592353536, 1599044096, 1605734656, 1612425216, 1619115776, 1625806336, 1632496896, 1639187456, 1645878016,
1652568576, 1659259136, 1665949696, 1672640256, 1679330816, 1686021376, 1692711936, 1699402496, 1706093056, 1712783616, 1719474176, 1726164736, 1732855296, 1739545856, 1746236416, 1752926976, 1759617536, 1766308096, 1772998656, 1779689216, 1786379776, 1793070336, 1799760896, 1806451456, 1813142016, 1819832576,
1826523136, 1833213696, 1839904256, 1846594816, 1853285376, 1859975936, 1866666496, 1873357056, 1880047616, 1886738176, 1893428736, 1900119296, 1906809856, 1913500416, 1920190976, 1926881536, 1933572096, 1940262656, 1946953216, 1953643776, 1960334336, 1967024896, 1973715456, 1980406016, 1987096576, 1993787136,
2000477696, 2007168256, 2013858816, 2020549376, 2027239936, 2033930496, 2040621056, 2047311616, 2054002176, 2060692736, 2067383296, 2074073856, 2080764416, 2087454976, 2094145536, 2100836096, 2107526656, 2114217216, 2120907776, 2127598336, 2134288896, 2140979456, 2147670016, 2154360576, 2161051136, 2167741696,
2174432256, 2181122816, 2187813376, 2194503936, 2201194496, 2207885056, 2214575616, 2221266176, 2227956736, 2234647296, 2241337856, 2248028416, 2254718976, 2261409536, 2268100096, 2274790656, 2281481216, 2288171776, 2294862336, 2301552896, 2308243456, 2314934016, 2321624576, 2328315136, 2335005696, 2341696256,
2348386816, 2355077376, 2361767936, 2368458496, 2375149056, 2381839616, 2388530176, 2395220736, 2401911296, 2408601856, 2415292416, 2421982976, 2428673536, 2435364096, 2442054656, 2448745216, 2455435776, 2462126336, 2468816896, 2475507456, 2482198016, 2488888576, 2495579136, 2502269696, 2508960256, 2515650816,
2522341376, 2529031936, 2535722496, 2542413056, 2549103616, 2555794176, 2562484736, 2569175296, 2575865856, 2582556416, 2589246976, 2595937536, 2602628096, 2609318656, 2616009216, 2622699776, 2629390336, 2636080896, 2642771456, 2649462016, 2656152576, 2662843136, 2669533696, 2676224256, 2682914816, 2689605376,
2696295936, 2702986496, 2709677056, 2716367616, 2723058176, 2729748736, 2736439296, 2743129856, 2749820416, 2756510976, 2763201536, 2769892096, 2776582656, 2783273216, 2789963776, 2796654336, 2803344896, 2810035456, 2816726016, 2823416576, 2830107136, 2836797696, 2843488256, 2850178816, 2856869376, 2863559936,
2870250496, 2876941056, 2883631616, 2890322176, 2897012736, 2903703296, 2910393856, 2917084416, 2923774976, 2930465536, 2937156096, 2943846656, 2950537216, 2957227776, 2963918336, 2970608896, 2977299456, 2983990016, 2990680576, 2997371136, 3004061696, 3010752256, 3017442816, 3024133376, 3030823936, 3037514496,
3044205056, 3050895616, 3057586176, 3064276736, 3070967296, 3077657856, 3084348416, 3091038976, 3097729536, 3104420096, 3111110656, 3117801216, 3124491776, 3131182336, 3137872896, 3144563456, 3151254016, 3157944576, 3164635136, 3171325696, 3178016256, 3184706816, 3191397376, 3198087936, 3204778496, 3211469056,
3218159616, 3224850176, 3231540736, 3238231296, 3244921856, 3251612416, 3258302976, 3264993536, 3271684096, 3278374656, 3285065216, 3291755776, 3298446336, 3305136896, 3311827456, 3318518016, 3325208576, 3331899136, 3338589696, 3345280256, 3351970816, 3358661376, 3365351936, 3372042496, 3378733056, 3385423616,
3392114176, 3398804736, 3405495296, 3412185856, 3418876416, 3425566976, 3432257536, 3438948096, 3445638656, 3452329216, 3459019776, 3465710336, 3472400896, 3479091456, 3485782016, 3492472576, 3499163136, 3505853696, 3512544256, 3519234816, 3525925376, 3532615936, 3539306496, 3545997056, 3552687616, 3559378176,
3566068736, 3572759296, 3579449856, 3586140416, 3592830976, 3599521536, 3606212096, 3612902656, 3619593216, 3626283776, 3632974336, 3639664896, 3646355456, 3653046016, 3659736576, 3666427136, 3673117696, 3679808256, 3686498816, 3693189376, 3699879936, 3706570496, 3713261056, 3719951616, 3726642176, 3733332736,
3740023296, 3746713856, 3753404416, 3760094976, 3766785536, 3773476096, 3780166656, 3786857216, 3793547776, 3800238336, 3806928896, 3813619456, 3820310016, 3827000576, 3833691136, 3840381696, 3847072256, 3853762816, 3860453376, 3867143936, 3873834496, 3880525056, 3887215616, 3893906176, 3900596736, 3907287296,
3913977856, 3920668416, 3927358976, 3934049536, 3940740096, 3947430656, 3954121216, 3960811776, 3967502336, 3974192896, 3980883456, 3987574016, 3994264576,
obsd69b# # ----------------------------------------------------------------
obsd69b# mount /dev/sd4a /arc-3x1TB-ssd860
obsd69b# df -h | grep arc
/dev/sd4a 1.8T 8.0K 1.8T 0% /arc-3x1TB-ssd860



obsd69b# dd if=/dev/urandom of=/arc-3x1TB-ssd860/1GB-urandom.bin bs=1M count=1024

# OpenBSD 6.9beta crashes at this point, and the ddb{4}> prompt accepts no input, so trace, ps or sh commands cannot be run


bioctl -c 5 -C force -l sd1a,sd2a,sd3a softraid0


obsd69b# bioctl -c 5 -C force -l sd1a,sd2a,sd3a softraid0
softraid0: RAID 5 volume attached as sd4
obsd69b# fsck /dev/rsd4a
** /dev/rsd4a
** Last Mounted on /arc-3x1TB-ssd860
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? [Fyn?] y

SUMMARY INFORMATION BAD
SALVAGE? [Fyn?] y

BLK(S) MISSING IN BIT MAPS
SALVAGE? [Fyn?] y

2 files, 1 used, 248084845 free (13 frags, 31010604 blocks, 0.0% fragmentation)

MARK FILE SYSTEM CLEAN? [Fyn?] y


***** FILE SYSTEM WAS MODIFIED *****



# OpenBSD 6.9beta RAID5 configuration with three 1TB "Samsung SSD PRO 860" drives


sysctl hw.disknames

dd if=/dev/zero of=/dev/rsd1c bs=1m count=1024
dd if=/dev/zero of=/dev/rsd2c bs=1m count=1024
dd if=/dev/zero of=/dev/rsd3c bs=1m count=1024

obsd69b# dd if=/dev/zero of=/dev/rsd1c bs=1m count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 3.301 secs (325251532 bytes/sec)
obsd69b# dd if=/dev/zero of=/dev/rsd2c bs=1m count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 3.355 secs (319969662 bytes/sec)
obsd69b# dd if=/dev/zero of=/dev/rsd3c bs=1m count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 3.318 secs (323554123 bytes/sec)
# ---

obsd69b# bioctl sd1
sd1: <ATA, Samsung SSD 860, RVM0>, serial S42NNF0N110543P
obsd69b# bioctl sd2
sd2: <ATA, Samsung SSD 860, RVM0>, serial S42NNF0MA01951H
obsd69b# bioctl sd3
sd3: <ATA, Samsung SSD 860, RVM0>, serial S42NNF0M603477P

disklabel -E sd1
disklabel -E sd2
disklabel -E sd3

newfs sd1a
newfs sd2a
newfs sd3a

mkdir /ssd1T-sd1a
mkdir /ssd1T-sd2a
mkdir /ssd1T-sd3a

mount /dev/sd1a /ssd1T-sd1a
mount /dev/sd2a /ssd1T-sd2a
mount /dev/sd3a /ssd1T-sd3a

obsd69b# df -h | grep ssd
/dev/sd1a 946G 8.0K 899G 0% /ssd1T-sd1a
/dev/sd2a 946G 8.0K 899G 0% /ssd1T-sd2a
/dev/sd3a 946G 8.0K 899G 0% /ssd1T-sd3a

dd if=/dev/urandom of=/ssd1T-sd1a/1GB-urandom.bin bs=1M count=1024
dd if=/dev/urandom of=/ssd1T-sd2a/1GB-urandom.bin bs=1M count=1024
dd if=/dev/urandom of=/ssd1T-sd3a/1GB-urandom.bin bs=1M count=1024



ahci2: NCQ errored slot 5 is idle (3ff8001f active)
ahci2: NCQ errored slot 23 is idle (3e0ffe00 active)
ahci2: NCQ errored slot 27 is idle (41dffe00 active)
ahci2: NCQ errored slot 14 is idle (3c6601fe active)
ahci2: NCQ errored slot 27 is idle (317f003f active)
ahci2: NCQ errored slot 19 is idle (03f01ff8 active)
ahci2: NCQ errored slot 19 is idle (31e03cdb active)
ahci2: NCQ errored slot 19 is idle (31e03cde active)
ahci2: NCQ errored slot 20 is idle (31e033e7 active)
ahci2: NCQ errored slot 3 is idle (4c738fc1 active)
ahci2: NCQ errored slot 13 is idle (339e03ae active)
ahci2: NCQ errored slot 17 is idle (33e00df3 active)
ahci2: NCQ errored slot 0 is idle (1f0f2de0 active)
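
The messages above pair a slot number with a hexadecimal mask of active NCQ slots. A small sketch for decoding them follows, on the assumption that bit N of the mask stands for slot N; the function names slot_is_active and active_count are hypothetical helpers, not part of OpenBSD.

```shell
#!/bin/sh
# slot_is_active SLOT HEXMASK: succeeds if bit SLOT is set in HEXMASK.
slot_is_active() {
    [ $(( (0x$2 >> $1) & 1 )) -eq 1 ]
}

# active_count HEXMASK: print the number of set bits in HEXMASK,
# i.e. how many slots the mask marks as active.
active_count() {
    mask=$(( 0x$1 ))
    n=0
    while [ "$mask" -ne 0 ]; do
        n=$(( n + (mask & 1) ))
        mask=$(( mask >> 1 ))
    done
    echo "$n"
}

# Two of the logged messages: slot 5 with mask 3ff8001f, slot 23 with 3e0ffe00.
for pair in "5 3ff8001f" "23 3e0ffe00"; do
    set -- $pair
    if slot_is_active "$1" "$2"; then state=active; else state=idle; fi
    echo "slot $1 is $state, $(active_count "$2") slots active"
done
```

For both sample lines the errored slot's bit is clear in the mask, which is consistent with the "is idle" wording of the message.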

obsd69b# ls -l /ssd1T-sd*
/ssd1T-sd1a:
total 2097536
-rw-r--r-- 1 root wheel 1073741824 Mar 2 16:11 1GB-urandom.bin

/ssd1T-sd2a:
total 2097536
-rw-r--r-- 1 root wheel 1073741824 Mar 2 16:11 1GB-urandom.bin

/ssd1T-sd3a:
total 2097536
-rw-r--r-- 1 root wheel 1073741824 Mar 2 16:11 1GB-urandom.bin

# ---
obsd69b# dd if=/dev/urandom of=/ssd1T-sd1a/1GB-urandom.bin bs=1M count=1024
dd: /ssd1T-sd1a/1GB-urandom.bin: Input/output error
1+0 records in
0+0 records out
0 bytes transferred in 0.014 secs (0 bytes/sec)
obsd69b# dd if=/dev/urandom of=/ssd1T-sd1a/1GB-urandom.bin bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 5.156 secs (208228191 bytes/sec)

# ---

ahci2: NCQ errored slot 3 is idle (04000000 active)
ahci2: NCQ errored slot 13 is idle (7c0f01eb active)
ahci2: NCQ errored slot 26 is idle (03fe1e07 active)
ahci2: NCQ errored slot 30 is idle (03e1e38f active)
ahci2: NCQ errored slot 28 is idle (03e1fc71 active)
ahci2: NCQ errored slot 30 is idle (03fc9f81 active)
ahci2: NCQ errored slot 9 is idle (0f0ee03f active)
ahci2: NCQ errored slot 16 is idle (70f400ff active)
ahci2: NCQ errored slot 28 is idle (0f3c407f active)
ahci2: NCQ errored slot 13 is idle (70dc41fc active)
ahci2: NCQ errored slot 17 is idle (0f3c1fe0 active)
ahci2: NCQ errored slot 30 is idle (0f7c181f active)


# ------------------------------------------------------------------------------
obsd69b# dd if=/dev/urandom of=/ssd1T-sd1a/10GB-urandom.bin bs=10M count=1024
1024+0 records in
1024+0 records out
10737418240 bytes transferred in 160.129 secs (67054710 bytes/sec)

obsd69b# dd if=/dev/urandom of=/ssd1T-sd2a/10GB-urandom.bin bs=10M count=1024
1024+0 records in
1024+0 records out
10737418240 bytes transferred in 158.783 secs (67623059 bytes/sec)

obsd69b# dd if=/dev/urandom of=/ssd1T-sd3a/10GB-urandom.bin bs=10M count=1024
1024+0 records in
1024+0 records out
10737418240 bytes transferred in 160.085 secs (67072961 bytes/sec)

# ---

ahci2: NCQ errored slot 25 is idle (000000ff active)
ahci2: NCQ errored slot 21 is idle (00007fff active)
ahci2: NCQ errored slot 6 is idle (00000001 active)
ahci2: NCQ errored slot 21 is idle (000003ff active)
ahci2: NCQ errored slot 1 is idle (03800000 active)
ahci2: NCQ errored slot 27 is idle (01ffffff active)
ahci2: NCQ errored slot 23 is idle (00000030 active)
ahci2: NCQ errored slot 25 is idle (00000fff active)

# ---

obsd69b# ls -ltr /ssd1T-sd* | grep -v total
/ssd1T-sd2a:
-rw-r--r-- 1 root wheel 1073741824 Mar 2 16:16 1GB-urandom.bin
-rw-r--r-- 1 root wheel 10737418240 Mar 2 16:22 10GB-urandom.bin

/ssd1T-sd1a:
-rw-r--r-- 1 root wheel 1073741824 Mar 2 16:15 1GB-urandom.bin
-rw-r--r-- 1 root wheel 10737418240 Mar 2 16:22 10GB-urandom.bin

/ssd1T-sd3a:
-rw-r--r-- 1 root wheel 1073741824 Mar 2 16:16 1GB-urandom.bin
-rw-r--r-- 1 root wheel 10737418240 Mar 2 16:22 10GB-urandom.bin



On 02.03.21 10:39, Stuart Henderson wrote:
> On 2021/03/02 00:09, Mark Schneider wrote:
>> Hi,
>>
>> Thank you for your feedback.
>>
>> Also OpenBSD 6.9beta snapshot is crashing when I setup RAID5 with three
>> "Samsung PRO 860 1TB" SSDs.
>> OpenBSD obsd69b.it-infra.org 6.9 GENERIC.MP#368 amd64
>>
>> obsd69b# dmesg | grep  -i bios
>> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xdc312018 (61 entries)
>> bios0: vendor American Megatrends Inc. version "2201" date 03/23/2015
>> bios0: ASUSTeK COMPUTER INC. CROSSHAIR V FORMULA-Z
>> acpi0 at bios0: ACPI 5.0
> Can you isolate softraid from the equation? Are the drives reliable with
> this hardware configuration when not using softraid? I guess it would
> need testing with simultaneous writes to the 3 drives to give a closer
> match to the situation with softraid.

Thanks a lot for all the hints, Stuart.

Even without softraid, the 1TB Samsung PRO 860 SSDs produce some AHCI errors
(OpenBSD_6.9beta-RAID5-3x1TB-SSD-isolated.txt in the attachment).


Writing to a single "isolated" drive does not crash OpenBSD, even though
there are AHCI errors and sometimes an I/O error from dd (see directly below).

# ---
obsd69b# dd if=/dev/urandom of=/ssd1T-sd1a/1GB-urandom.bin bs=1M count=1024
dd: /ssd1T-sd1a/1GB-urandom.bin: Input/output error
1+0 records in
0+0 records out
0 bytes transferred in 0.014 secs (0 bytes/sec)

obsd69b# dd if=/dev/urandom of=/ssd1T-sd1a/1GB-urandom.bin bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 5.156 secs (208228191 bytes/sec)

# ---

ahci2: NCQ errored slot 3 is idle (04000000 active)
ahci2: NCQ errored slot 13 is idle (7c0f01eb active)
ahci2: NCQ errored slot 26 is idle (03fe1e07 active)
ahci2: NCQ errored slot 30 is idle (03e1e38f active)
ahci2: NCQ errored slot 28 is idle (03e1fc71 active)
ahci2: NCQ errored slot 30 is idle (03fc9f81 active)
ahci2: NCQ errored slot 9 is idle (0f0ee03f active)
ahci2: NCQ errored slot 16 is idle (70f400ff active)
ahci2: NCQ errored slot 28 is idle (0f3c407f active)
ahci2: NCQ errored slot 13 is idle (70dc41fc active)
ahci2: NCQ errored slot 17 is idle (0f3c1fe0 active)
ahci2: NCQ errored slot 30 is idle (0f7c181f active)


Writing to all "isolated" drives simultaneously does not crash OpenBSD either,
even though there are AHCI errors.

# ------------------------------------------------------------------------------
obsd69b# dd if=/dev/urandom of=/ssd1T-sd1a/10GB-urandom.bin bs=10M count=1024
1024+0 records in
1024+0 records out
10737418240 bytes transferred in 160.129 secs (67054710 bytes/sec)

obsd69b# dd if=/dev/urandom of=/ssd1T-sd2a/10GB-urandom.bin bs=10M count=1024
1024+0 records in
1024+0 records out
10737418240 bytes transferred in 158.783 secs (67623059 bytes/sec)

obsd69b# dd if=/dev/urandom of=/ssd1T-sd3a/10GB-urandom.bin bs=10M count=1024
1024+0 records in
1024+0 records out
10737418240 bytes transferred in 160.085 secs (67072961 bytes/sec)

# ---

ahci2: NCQ errored slot 25 is idle (000000ff active)
ahci2: NCQ errored slot 21 is idle (00007fff active)
ahci2: NCQ errored slot 6 is idle (00000001 active)
ahci2: NCQ errored slot 21 is idle (000003ff active)
ahci2: NCQ errored slot 1 is idle (03800000 active)
ahci2: NCQ errored slot 27 is idle (01ffffff active)
ahci2: NCQ errored slot 23 is idle (00000030 active)
ahci2: NCQ errored slot 25 is idle (00000fff active)
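
For anyone repeating the simultaneous-write test, the three dd commands can also be started from one shell. This is a minimal sketch, not the exact procedure used above: the function name parallel_write and the output file name urandom.bin are made up for illustration.

```shell
#!/bin/sh
# parallel_write COUNT_MIB DIR...: write COUNT_MIB MiB of random data into
# each DIR at the same time, then wait for every dd to finish.
# Returns non-zero if any of the dd processes failed.
parallel_write() {
    count=$1
    shift
    pids=""
    for dir in "$@"; do
        # bs is given in bytes so the same line works with BSD and GNU dd.
        dd if=/dev/urandom of="$dir/urandom.bin" bs=1048576 count="$count" &
        pids="$pids $!"
    done
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}

# Example with the mount points from this test (10 GiB per drive):
# parallel_write 10240 /ssd1T-sd1a /ssd1T-sd2a /ssd1T-sd3a
```

Starting the writers in the background and collecting each exit status with wait makes a single dd "Input/output error" visible in the function's return value instead of being lost in the scrollback.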


# OpenBSD 6.9beta crashes after a dd command writes to the RAID5
softraid volume (sd4a). The ddb{4}> prompt then accepts no input, so trace,
ps or sh commands cannot be run (the root console is dead).


> "trace" and "sh reg" from ddb would give more clues.

I am not able to run the commands above because the root ddb{4} console is
dead (I can see only the last error message, but keyboard input is
ignored).


I will connect those Samsung PRO 860 1TB SSDs to a Xeon-based system
(with a different SATA controller) and check for AHCI errors there.

It may be worth mentioning that the original RAID tests on Debian
buster with six 512GB Samsung PRO 860 SSDs (the same drives, as a RAID6 set
built with mdadm) worked without crashing the OS.


Kind regards

Mark


>>>> bs=10M count=1024
>>>>
>>>> # Error messages
>>>>
>>>> uvm_fault(0xffffffff821f5490, 0x40, 0, 1) -> e
>>>> kernel: page fault trap, code=0
>>>> Stopped at      sr_validate_io+0x44:    cmpl     $0,0x40(%r9)
>>>> ddb{2}>
> $ objdump -dlr softraid.o | less
> ...skipping...
> 0000000000009cc0 <sr_validate_io>:
> sr_validate_io():
> /usr/src/sys/dev/softraid.c:4569
> 9cc0: 4c 8b 1d 00 00 00 00 mov 0(%rip),%r11 # 9cc7 <sr_validate_io+0x7>
> 9cc3: R_X86_64_PC32 __retguard_3962+0xfffffffffffffffc
> 9cc7: 4c 33 1c 24 xor (%rsp),%r11
> 9ccb: 55 push %rbp
> 9ccc: 48 89 e5 mov %rsp,%rbp
> 9ccf: 57 push %rdi
> 9cd0: 56 push %rsi
> 9cd1: 52 push %rdx
> 9cd2: 57 push %rdi
> 9cd3: 41 53 push %r11
> 9cd5: 50 push %rax
> /usr/src/sys/dev/softraid.c:4570
> 9cd6: 4c 8b 47 08 mov 0x8(%rdi),%r8
> /usr/src/sys/dev/softraid.c:4577
> 9cda: 49 8b 88 70 09 00 00 mov 0x970(%r8),%rcx
> 9ce1: 83 b9 94 00 00 00 00 cmpl $0x0,0x94(%rcx)
> 9ce8: 0f 84 a2 01 00 00 je 9e90 <sr_validate_io+0x1d0>
> 9cee: b8 01 00 00 00 mov $0x1,%eax
> /usr/src/sys/dev/softraid.c:4580
> 9cf3: 41 83 b8 20 0a 00 00 cmpl $0x1,0xa20(%r8)
> 9cfa: 01
> 9cfb: 0f 84 69 01 00 00 je 9e6a <sr_validate_io+0x1aa>
> 9d01: 4c 8b 0f mov (%rdi),%r9
> /usr/src/sys/dev/softraid.c:4586
> 9d04: 41 83 79 40 00 cmpl $0x0,0x40(%r9)
> 9d09: 74 47 je 9d52 <sr_validate_io+0x92>
> /usr/src/sys/dev/softraid.c:4592
>
> putting sr_validate_io+0x44 at the xs->datalen dereference,
>
> 4580         if (sd->sd_vol_status == BIOC_SVOFFLINE) {
> 4581                 DNPRINTF(SR_D_DIS, "%s: %s device offline\n",
> 4582                     DEVNAME(sd->sd_sc), func);
> 4583                 goto bad;
> 4584         }
> 4585
> 4586         if (xs->datalen == 0) {
> 4587                 printf("%s: %s: illegal block count for %s\n",
> 4588                     DEVNAME(sd->sd_sc), func, sd->sd_meta->ssd_devname);
> 4589                 goto bad;
> 4590         }
>
> ...so null/invalid xs?
>
> "trace" and "sh reg" from ddb would give more clues.
>
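
The address arithmetic behind the quoted analysis can be checked by hand. This is a sketch only, assuming that the second argument of uvm_fault() is the faulting virtual address; all values are copied from the panic message and the objdump listing.

```shell
#!/bin/sh
func_start=0x9cc0     # sr_validate_io in the objdump listing
fault_offset=0x44     # from "Stopped at sr_validate_io+0x44"
fault_va=0x40         # second argument of uvm_fault() (assumed fault address)
displacement=0x40     # from "cmpl $0,0x40(%r9)"

# Address of the faulting instruction inside softraid.o.
printf 'instruction at 0x%x\n' $(( $func_start + $fault_offset ))

# Base register value implied by the faulting address.
printf 'r9 = 0x%x\n' $(( $fault_va - $displacement ))
```

The first result, 0x9d04, is the cmpl at softraid.c:4586 in the listing. The second, 0, would mean %r9 (and so xs) was NULL rather than some other invalid pointer, if the assumption about uvm_fault() holds.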
