Monday, November 16, 2020

Re: OpenBSD 6.8 (release) guest (qemu/kvm) on Linux 5.9 host (amd64) fails with protection fault trap

On Sun, Nov 15, 2020 at 10:24 AM Gabriel Garcia <gabriel@gagv.org.uk> wrote:

> I would like to run OpenBSD as stated on the subject - I have been able,
> however, to run it successfully with "-cpu Opteron_G2-v1", but I would
> rather use "-cpu host" instead. Also note that on an Intel host, OpenBSD
> appears to work successfully on the same Linux base.
>
> qemu invocation that yields a trap:
>
...

Lots of looking everywhere but at the error that's actually happening here.
Let's look at the trap/ddb output:


> kernel: protection fault trap, code=0
> Stopped at amd64_errata_setmsr+0x4e: wrmsr
>
> Contents of CPU registers:
> ddb> show registers
> rdi 0x9c5a203a
> rsi 0xffffffff820ff920 errata+0xe0
> rbp 0xffffffff824c5740 end+0x2c5740
> rbx 0x18
> rdx 0
> rcx 0xc0011029
> rax 0x3
> r8 0xffffffff824c55a8 end+0x2c55a8
> r9 0
> r10 0xbdf7dabff85d847b
> r11 0x51e076fef1dcfa7b
> r12 0
> r13 0
> r14 0xffffffff820ff940 acpihid_ca
> r15 0xffffffff820ff920 errata+0xe0
> rip 0xffffffff81bc6ede amd64_errata_setmsr+0x4e
> cs 0x8
> rflags 0x10256 __ALIGN_SIZE+0xf256
> rsp 0xffffffff824c5730 end+0x2c5730
> ss 0x10
> amd64_errata_setmsr+0x4e: wrmsr


Oh hey, it says RIGHT THERE that a wrmsr instruction faulted. Which one?
Well, it's in the function amd64_errata_setmsr(). Furthermore, we just
have to remember that wrmsr takes the MSR to write in the %ecx register
(something the qemu people surely know) and so it's the 0xc0011029 MSR.
Let's grep for that in the amd64 kernel source:

: bleys; cd /usr/src/sys/arch/amd64/
: bleys; grep -rw 0xc0011029 *
include/specialreg.h:#define MSR_DE_CFG 0xc0011029 /* Decode Configuration */
: bleys; grep -rwl MSR_DE_CFG *
amd64/identcpu.c
amd64/vmm.c
amd64/amd64errata.c
include/specialreg.h
: bleys; grep -rwl ^amd64_errata_setmsr *
amd64/amd64errata.c
: bleys; less +/MSR_DE_CFG amd64/amd64errata.c
<...>
	/*
	 * 721: Processor May Incorrectly Update Stack Pointer
	 */
	{
		721, 0, MSR_DE_CFG, amd64_errata_set9,
		amd64_errata_setmsr, DE_CFG_721
	},
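
The wrmsr operands can be read straight off the register dump above: the
instruction takes the MSR index from %ecx and the 64-bit value to write from
%edx:%eax. A minimal sketch of that decoding, using only the register values
from the ddb output (the helper name wrmsr_operands is made up for
illustration and is not part of any kernel or qemu API):

```python
# Values copied from the ddb "show registers" output above.
REGS = {"rcx": 0xc0011029, "rax": 0x3, "rdx": 0x0}


def wrmsr_operands(regs):
    """Return (msr_index, value) the way wrmsr interprets the registers."""
    msr_index = regs["rcx"] & 0xFFFFFFFF  # wrmsr looks only at %ecx
    low = regs["rax"] & 0xFFFFFFFF        # %eax holds the low 32 bits
    high = regs["rdx"] & 0xFFFFFFFF       # %edx holds the high 32 bits
    return msr_index, (high << 32) | low


msr, value = wrmsr_operands(REGS)
print(f"wrmsr to MSR {msr:#x} with value {value:#x}")
```

So the faulting instruction was trying to write the value in %edx:%eax to the
MSR that specialreg.h calls MSR_DE_CFG.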


Looks like qemu fails to behave like a real AMD CPU by failing to handle
the wrmsr for that erratum. Also, the kernel you're running it on is
failing to apply the erratum workaround itself (otherwise OpenBSD wouldn't
be trying to flip the bit itself). Go shake an AMD errata document at the
qemu people and figure out why your host kernel isn't applying a documented
fix.
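
One way to see what the host has in that MSR is to read it through the Linux
msr driver. This is a rough sketch under several assumptions: a Linux host
with the msr module loaded, root privileges, and /dev/cpu/0/msr present. The
helper names are made up for illustration. The post does not show the
definition of DE_CFG_721, so the sketch prints the raw value and does not
guess which bit matters.

```python
import os
import struct

MSR_DE_CFG = 0xc0011029  # index taken from the grep output above


def decode_msr(raw):
    """Turn the 8 bytes read from the msr device into an integer."""
    if len(raw) != 8:
        raise ValueError("expected 8 bytes from the msr device")
    return struct.unpack("<Q", raw)[0]


def read_msr(index, cpu=0):
    """Read one MSR on a Linux host (assumes root and the msr module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr device uses the file offset as the MSR index.
        return decode_msr(os.pread(fd, 8, index))
    finally:
        os.close(fd)


if __name__ == "__main__":
    try:
        print(f"MSR_DE_CFG on cpu0: {read_msr(MSR_DE_CFG):#x}")
    except OSError as err:
        print(f"could not read MSR: {err}")
```

Compare the printed value against the DE_CFG_721 mask in specialreg.h to see
whether the host has already set the bit.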

Paying attention to what the kernel tells you is a Good Thing. Honestly,
what you showed above, that it trapped on wrmsr with those registers, should
have been enough for the qemu people to figure out what wasn't working.


Philip Guenther
