On Fri, Jul 24, 2020 at 10:58:45AM +0000, Alyssa Ross wrote:
> Alyssa Ross <hi@alyssa.is> writes:
> 
> > Stefan Hajnoczi <stefanha@redhat.com> writes:
> >
> >> On Tue, Jul 21, 2020 at 07:14:38AM +0000, Alyssa Ross wrote:
> >>> Hi -- I hope it's okay for me to reach out like this.
> >>> 
> >>> I've been trying to test out the virtio-vhost-user implementation that's
> >>> been posted to this list a couple of times, but have been unable to get
> >>> it to boot a kernel following the steps listed either on
> >>> <https://wiki.qemu.org/Features/VirtioVhostUser> or
> >>> <https://ndragazis.github.io/dpdk-vhost-vvu-demo.html>.
> >>> 
> >>> Specifically, the kernel appears to be unable to write to the
> >>> virtio-vhost-user device's PCI registers.  I've included the full panic
> >>> output from the kernel at the end of this message.  The panic is
> >>> reproducible with two different kernels I tried (with different configs
> >>> and versions).  I tried both versions of the virtio-vhost-user
> >>> implementation I was able to find[1][2], and both exhibited the same
> >>> behaviour.
> >>> 
> >>> Is this a known issue?  Am I doing something wrong?
> >>
> >> Hi,
> >> Unfortunately I'm not sure what the issue is. This is an early
> >> virtio-pci register access before a driver for any specific device type
> >> (net, blk, vhost-user, etc) comes into play.
> >
> > Small update here: I tried on another computer, and it worked.  Made
> > sure that it was exactly the same QEMU binary, command line, and VM
> > disk/initrd/kernel, so I think I can fairly confidently say the panic
> > depends on what hardware QEMU is running on.  I also set -cpu to the
> > same value on both (SandyBridge).
> >
> > I also discovered that it works on my primary computer (the one it
> > panicked on before) with KVM disabled.
> >
> > Note that I've only got as far as finding that it boots on the other
> > machine -- I haven't verified yet that it actually works.
> >
> > Bad host CPU:  Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz
> > Good host CPU: AMD EPYC 7401P 24-Core Processor
> >
> > May I ask what host CPUs other people have tested this on?  Having more
> > data would probably be useful.  Could it be an AMD vs. Intel thing?
> 
> I think I've figured it out!
> 
> Sandy Bridge and Ivy Bridge hosts encounter this panic because the
> "additional resources" BAR size is too big, at 1 << 36.  If I change
> this to 1 << 35, no more kernel panic.
> 
> Skylake and later are fine with 1 << 36.  In between Ivy Bridge and
> Skylake were Haswell and Broadwell, but I couldn't find anybody who was
> able to help me test on either of those, so I don't know what they do.
> 
> Perhaps related, the hosts that produce panics all seem to have a
> physical address size of 36 bits, while the hosts that work have larger
> physical address sizes, as reported by lscpu.

I have run it successfully on Broadwell but never tried 64GB or larger
shared memory resources.
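
For reference, the arithmetic behind the 1 << 36 observation can be
sketched as below.  This is only an illustration, not a confirmed
diagnosis: it assumes the BAR has to be naturally aligned to its size,
has to fit entirely below 2**phys_bits, and cannot sit at address 0
because RAM is there.  The helper name aligned_slots and the 39-bit
comparison value are hypothetical and come from neither QEMU nor the
kernel.

```python
def aligned_slots(bar_size, phys_bits):
    """List the base addresses where a BAR of bar_size bytes could go.

    Assumptions (illustrative only): the BAR is naturally aligned to its
    size, must end at or below 2**phys_bits, and base 0 is excluded
    because low memory is taken by RAM.
    """
    limit = 1 << phys_bits
    return [
        base
        for base in range(bar_size, limit, bar_size)
        if base + bar_size <= limit
    ]


if __name__ == "__main__":
    # 1 << 36 bytes is 64 GiB, i.e. the whole of a 36-bit address space.
    for size_bits, phys_bits in [(36, 36), (35, 36), (36, 39)]:
        slots = aligned_slots(1 << size_bits, phys_bits)
        print(f"BAR 1 << {size_bits}, {phys_bits}-bit addresses: "
              f"{len(slots)} usable slot(s)")
```

Under these assumptions a 1 << 36 BAR has no usable slot with 36
physical address bits, a 1 << 35 BAR has exactly one, and a 1 << 36 BAR
has room once more address bits are available, which is at least
consistent with the behaviour reported above.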

Stefan