From: Alex Williamson <alex.williamson@redhat.com>
To: Bob Chen <a175818323@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	"Marcel Apfelbaum" <marcel@redhat.com>, 陈博 <chenbo02@meituan.com>,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] About virtio device hotplug in Q35! [External email, view with caution]
Date: Tue, 29 Aug 2017 08:13:05 -0600
Message-ID: <20170829081305.7a2685c6@w520.home>
In-Reply-To: <CAMxP3BSNPjRZGOULZ0zKBq8v5Rqv9f5xasx+BD57c=oAS5tSWg@mail.gmail.com>

On Tue, 29 Aug 2017 18:41:44 +0800
Bob Chen <a175818323@gmail.com> wrote:

> The topology already has all GPUs directly attached to root bus 0. In
> this situation you can't see the LnkSta attribute in any of the capabilities.

Right, this is why I suggested viewing the physical device lspci info
from the host.  I haven't seen the stuck link issue with devices on the
root bus, but it may be worth double checking.  Thanks,

Alex
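
For reference, a minimal way to watch the physical link state from the
host while the guest has the GPU under load, assuming the GPU sits at an
example address of 0000:81:00.0 (substitute the real BDF):

    # Poll the assigned GPU's Link Status on the host; LnkSta should
    # climb back to 8GT/s while the guest loads the GPU.
    watch -n 1 "sudo lspci -vvv -s 0000:81:00.0 | grep -i 'LnkSta:'"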
 
> The other approach, using an emulated switch, does show this attribute,
> at 8 GT/s, although the real bandwidth is as low as usual.
> 
> 2017-08-23 2:06 GMT+08:00 Michael S. Tsirkin <mst@redhat.com>:
> 
> > On Tue, Aug 22, 2017 at 10:56:59AM -0600, Alex Williamson wrote:  
> > > On Tue, 22 Aug 2017 15:04:55 +0800
> > > Bob Chen <a175818323@gmail.com> wrote:
> > >  
> > > > Hi,
> > > >
> > > > I got a spec from Nvidia which illustrates how to enable GPU p2p in
> > > > virtualization environment. (See attached)  
> > >
> > > Neat, looks like we should implement a new QEMU vfio-pci option,
> > > something like nvidia-gpudirect-p2p-id=.  I don't think I'd want to
> > > code the policy of where to enable it into QEMU or the kernel, so we'd
> > > push it up to management layers or users to decide.
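
If such a vfio-pci option were added, its use on the QEMU command line
might look roughly like the sketch below; the property name and the id
value are purely illustrative, since no such option exists yet:

    # Hypothetical sketch only: a management layer or user opts each
    # assigned GPU into the NVIDIA peer-to-peer group explicitly.
    qemu-system-x86_64 -machine q35 ... \
        -device vfio-pci,host=81:00.0,nvidia-gpudirect-p2p-id=1 \
        -device vfio-pci,host=82:00.0,nvidia-gpudirect-p2p-id=1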
> > >  
> > > > The key is to append to the legacy pci capabilities list when setting
> > > > up the hypervisor, with an Nvidia customized capability config.
> > > >
> > > > I added some hack in hw/vfio/pci.c and managed to implement that.
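
One way to sanity-check from inside the guest that an injected capability
is actually visible, assuming the GPU shows up there at an example address
of 00:10.0, is to dump its capability list; a legacy vendor-specific
capability (cap ID 0x09) appears as a "Vendor Specific Information" entry:

    # In the guest: confirm the added capability is present.
    lspci -vvv -s 00:10.0 | grep -i -A1 'Vendor Specific'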
> > > >
> > > > Then I found the GPU was able to recognize its peer, and the latency
> > > > has dropped.
> > > >
> > > > However, the bandwidth didn't improve, but decreased instead.
> > > >
> > > > Any suggestions?  
> > >
> > > What's the VM topology?  I've found that in a Q35 configuration with
> > > GPUs downstream of an emulated root port, the NVIDIA driver in the
> > > guest will downshift the physical link rate to 2.5GT/s and never
> > > increase it back to 8GT/s.  I believe this is because the virtual
> > > downstream port only advertises Gen1 link speeds.  
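
A quick way to confirm this from inside the guest, assuming the GPU hangs
off an emulated root port at, say, 00:01.0 (check the actual slot with
lspci -t), is to look at the port's advertised Link Capabilities; a
Gen1-only port reports "Speed 2.5GT/s" here:

    # In the guest: the emulated port's maximum advertised link speed.
    lspci -vvv -s 00:01.0 | grep -i 'LnkCap:'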
> >
> >
> > Fixing that would be nice, and it's great that you now actually have a
> > reproducer that can be used to test it properly.
> >
> > Exposing higher link speeds is a bit of work since there are now all
> > kinds of corner cases to cover, as guests may play with link speeds and
> > we must pretend to change them accordingly.  An especially interesting
> > question is what to do with the assigned device when the guest tries to
> > play with the port link speed. It's kind of similar to AER in that respect.
> >
> > I guess we can just ignore it for starters.
> >  
> > >  If the GPUs are on
> > > the root complex (ie. pcie.0) the physical link will run at 2.5GT/s
> > > when the GPU is idle and upshift to 8GT/s under load.  This also
> > > happens if the GPU is exposed in a conventional PCI topology to the
> > > VM.  Another interesting data point is that an older Kepler GRID card
> > > does not have this issue, dynamically shifting the link speed under
> > > load regardless of the VM PCI/e topology, while a new M60 using the
> > > same driver experiences this problem.  I've filed a bug with NVIDIA as
> > > this seems to be a regression, but it appears (untested) that the
> > > hypervisor should take the approach of exposing full, up-to-date PCIe
> > > link capabilities and reporting a link status matching the downstream
> > > devices.  
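
As a sketch of the root-complex placement described above (host addresses
are examples), attaching the assigned GPUs directly to pcie.0 on a Q35
machine avoids the emulated Gen1 port entirely:

    # Q35 guest with both GPUs on pcie.0 instead of behind emulated
    # root ports or switches.
    qemu-system-x86_64 -machine q35 ... \
        -device vfio-pci,host=81:00.0,bus=pcie.0,addr=0x10 \
        -device vfio-pci,host=82:00.0,bus=pcie.0,addr=0x11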
> >
> >  
> > > I'd suggest during your testing, watch lspci info for the GPU from the
> > > host, noting the behavior of LnkSta (Link Status) to check if the
> > > device gets stuck at 2.5GT/s in your VM configuration and adjust the
> > > topology until it works, likely placing the GPUs on pcie.0 for a Q35
> > > based machine.  Thanks,
> > >
> > > Alex  
> >  


Thread overview: 26+ messages
     [not found] <4E0AFA5F-44D6-4624-A99F-68A7FE52F397@meituan.com>
     [not found] ` <4b31a711-a52e-25d3-4a7c-1be8521097d9@redhat.com>
     [not found]   ` <F99BFE80-FC15-40A0-BB3E-1B53B6CF9B05@meituan.com>
2017-07-26  6:21     ` [Qemu-devel] About virtio device hotplug in Q35! [External email, view with caution] Marcel Apfelbaum
2017-07-26 15:29       ` Alex Williamson
2017-07-26 16:06         ` Michael S. Tsirkin
2017-07-26 17:32           ` Alex Williamson
2017-08-01  5:04             ` Bob Chen
2017-08-01  5:46               ` Alex Williamson
2017-08-01  9:35                 ` Bob Chen
2017-08-01 14:39                   ` Michael S. Tsirkin
2017-08-01 15:01                   ` Alex Williamson
2017-08-07 13:00                     ` Bob Chen
2017-08-07 15:52                       ` Alex Williamson
2017-08-08  1:44                         ` Bob Chen
2017-08-08  8:06                           ` Bob Chen
2017-08-08 16:53                           ` Alex Williamson
2017-08-08 20:07                         ` Michael S. Tsirkin
2017-08-22  7:04                           ` Bob Chen
2017-08-22 16:56                             ` Alex Williamson
2017-08-22 18:06                               ` Michael S. Tsirkin
2017-08-29 10:41                                 ` Bob Chen
2017-08-29 14:13                                   ` Alex Williamson [this message]
2017-08-30  9:41                                     ` Bob Chen
2017-08-30 16:43                                       ` Alex Williamson
2017-09-01  9:58                                         ` Bob Chen
2017-11-30  8:06                                           ` Bob Chen
2017-08-07 13:04                     ` Bob Chen
2017-08-07 16:00                       ` Alex Williamson
