Wednesday, August 01, 2018

Re: 014_amdlfence.patch breaks OpenBSD VMs on AMD systems

On Wed, Aug 01, 2018 at 01:07:33PM -0700, Mike Larkin wrote:
> On Wed, Aug 01, 2018 at 12:14:59PM -0400, Bryan Steele wrote:
> > On Wed, Aug 01, 2018 at 11:27:26AM -0400, Bryan Steele wrote:
> > > On Wed, Aug 01, 2018 at 03:46:25PM +0200, Elmer Skjødt Henriksen wrote:
> > > > After installing the 014_amdlfence patch released yesterday for 6.3, my
> > > > OpenBSD VM crashes on boot. It's running under KVM on a Linux box (Ubuntu
> > > > 18.04 w/ kernel 4.15) on an AMD Ryzen 7 1700 (microcode 0x8001137).
> > > > I suppose this would also happen on vmm(4) and bhyve, however I don't have
> > > > any such AMD hosts available for testing.
> > >
> > > Hi Elmer,
> > >
> > > This was tested in vmm(4), which does work; unfortunately, there was
> > > no extensive testing in other virtualization software. The MSR that is
> > > being set here is only mentioned in AMD's whitepaper, and I had no reason
> > > to believe any special consideration was needed for guest VMs on AMD
> > > processors.
> > >
> > > > It occurs both using libvirt's "EPYC" CPU model and using "host-passthrough"
> > > > (i.e. no virtual CPU model), but the "core2duo" CPU model works fine.
> > > >
> > > > I guess not many people are running OpenBSD as a VM, and even fewer on
> > > > AMD hardware. But still, a syspatch leaving the system unable to boot is
> > > > probably not a good thing. :)
> > > >
> > >
> > > Even so, I would like to apologize. This situation is unfortunate, and
> > > I'll try to work with other developers to find the best way forward.
> > > But, I regret I am only but an amateur magician.
> > >
> > > -Bryan.
> >
> > Actually, it looks like this is at least partially a KVM/QEMU bug. In
> > the meantime I guess the solution would be to do as you suggested and
> > set a different CPU model for now until Linux distros include a fix for
> > this.
> >
> > https://lkml.org/lkml/2018/2/21/1202
> >
> > Afterwards, on the OpenBSD side, it looks like one small additional
> > change may be required.
> >
> > -Bryan.
> >
> > Index: sys/arch/amd64/amd64/identcpu.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/amd64/amd64/identcpu.c,v
> > retrieving revision 1.95.2.2
> > diff -u -p -u -r1.95.2.2 identcpu.c
> > --- sys/arch/amd64/amd64/identcpu.c 30 Jul 2018 14:45:05 -0000 1.95.2.2
> > +++ sys/arch/amd64/amd64/identcpu.c 1 Aug 2018 16:09:50 -0000
> > @@ -650,8 +650,10 @@ identifycpu(struct cpu_info *ci)
> >
> >  			msr = rdmsr(MSR_DE_CFG);
> >  #define DE_CFG_SERIALIZE_LFENCE	(1 << 1)
> > -			msr |= DE_CFG_SERIALIZE_LFENCE;
> > -			wrmsr(MSR_DE_CFG, msr);
> > +			if ((msr & DE_CFG_SERIALIZE_LFENCE) == 0) {
> > +				msr |= DE_CFG_SERIALIZE_LFENCE;
> > +				wrmsr(MSR_DE_CFG, msr);
> > +			}
> >  		}
> >  	}
> >
> >
>
> As expected, -current works properly on real AMD hardware. So my assumption
> about KVM doing something odd seems to be correct.
>
> The issue should be reported upstream to the KVM folks. But if the diff above
> also fixes the issue (I didn't test because I cannot reproduce it), ok mlarkin.
>
> -ml

I committed a fix for the potential MSR write #GP bug to -current:

https://marc.info/?l=openbsd-cvs&m=153315564121057&w=2

Unfortunately, working around the MSR read issue on older KVMs would
require adding code to determine whether we're running under KVM;
there's really not much at all we can do here.

I agree these seem like KVM bugs, as this does not happen on real
hardware or, at least, in OpenBSD vmm(4).

-Bryan.
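
For readers wondering what "code to determine if we're running under KVM"
might involve: the sketch below is not from the thread and is not OpenBSD
kernel code. It is a minimal userland illustration, assuming a GCC or Clang
toolchain that provides <cpuid.h>, of the usual CPUID-based check: test the
"hypervisor present" bit (CPUID leaf 1, ECX bit 31), then compare the vendor
signature returned by leaf 0x40000000 against "KVMKVMKVM". The function names
is_kvm_signature() and running_under_kvm() are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#endif

/*
 * Hypothetical helper: returns 1 if the three registers returned by
 * CPUID leaf 0x40000000 spell the KVM signature "KVMKVMKVM" padded
 * with NULs to 12 bytes, 0 otherwise. Assumes a little-endian host,
 * which holds on x86.
 */
int
is_kvm_signature(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
	char sig[12];

	memcpy(sig, &ebx, 4);
	memcpy(sig + 4, &ecx, 4);
	memcpy(sig + 8, &edx, 4);
	return memcmp(sig, "KVMKVMKVM\0\0\0", 12) == 0;
}

/*
 * Hypothetical helper: returns 1 if the CPU reports a hypervisor and
 * that hypervisor identifies itself as KVM, 0 otherwise (including on
 * non-x86 builds, where no CPUID instruction is available).
 */
int
running_under_kvm(void)
{
#ifdef HAVE_X86_CPUID
	unsigned int eax, ebx, ecx, edx;

	/* Leaf 1, ECX bit 31: set when running under a hypervisor. */
	__cpuid(1, eax, ebx, ecx, edx);
	if ((ecx & (1u << 31)) == 0)
		return 0;

	/* Leaf 0x40000000: hypervisor vendor signature in EBX:ECX:EDX. */
	__cpuid(0x40000000, eax, ebx, ecx, edx);
	return is_kvm_signature(ebx, ecx, edx);
#else
	return 0;
#endif
}
```

A caller would simply branch on running_under_kvm() before touching the MSR.
Note that this check can miss KVM guests whose configuration presents a
different signature at that leaf, so it should be read as an illustration of
the idea rather than a reliable workaround.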
