From: Alex Williamson <alex.williamson@redhat.com>
To: Bob Chen <a175818323@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	"Marcel Apfelbaum" <marcel@redhat.com>, 陈博 <chenbo02@meituan.com>,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] About virtio device hotplug in Q35! [External email, view with caution]
Date: Tue, 29 Aug 2017 08:13:05 -0600
Message-ID: <20170829081305.7a2685c6@w520.home>
In-Reply-To: <CAMxP3BSNPjRZGOULZ0zKBq8v5Rqv9f5xasx+BD57c=oAS5tSWg@mail.gmail.com>

On Tue, 29 Aug 2017 18:41:44 +0800
Bob Chen <a175818323@gmail.com> wrote:

> The topology already has all GPUs directly attached to root bus 0. In
> this situation you can't see the LnkSta attribute in any of the capabilities.

Right, this is why I suggested viewing the physical device lspci info
from the host.  I haven't seen the stuck link issue with devices on the
root bus, but it may be worth double checking.  Thanks,

Alex
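
For reference, a minimal way to watch the physical link state from the
host while the guest has the GPU under load, assuming the GPU sits at an
example address of 0000:81:00.0 (substitute the real BDF):

    # Poll the assigned GPU's Link Status on the host; LnkSta should
    # climb back to 8GT/s while the guest loads the GPU.
    watch -n 1 "sudo lspci -vvv -s 0000:81:00.0 | grep -i 'LnkSta:'"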
 
> The other approach, using an emulated switch, does show this attribute,
> at 8 GT/s, although the real bandwidth is as low as usual.
> 
> 2017-08-23 2:06 GMT+08:00 Michael S. Tsirkin <mst@redhat.com>:
> 
> > On Tue, Aug 22, 2017 at 10:56:59AM -0600, Alex Williamson wrote:  
> > > On Tue, 22 Aug 2017 15:04:55 +0800
> > > Bob Chen <a175818323@gmail.com> wrote:
> > >  
> > > > Hi,
> > > >
> > > > I got a spec from Nvidia which illustrates how to enable GPU p2p in
> > > > virtualization environment. (See attached)  
> > >
> > > Neat, looks like we should implement a new QEMU vfio-pci option,
> > > something like nvidia-gpudirect-p2p-id=.  I don't think I'd want to
> > > code the policy of where to enable it into QEMU or the kernel, so we'd
> > > push it up to management layers or users to decide.
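
If such a vfio-pci option were added, its use on the QEMU command line
might look roughly like the sketch below; the property name and the id
value are purely illustrative, since no such option exists yet:

    # Hypothetical sketch only: a management layer or user opts each
    # assigned GPU into the NVIDIA peer-to-peer group explicitly.
    qemu-system-x86_64 -machine q35 ... \
        -device vfio-pci,host=81:00.0,nvidia-gpudirect-p2p-id=1 \
        -device vfio-pci,host=82:00.0,nvidia-gpudirect-p2p-id=1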
> > >  
> > > > The key is to append to the legacy pci capabilities list when setting
> > > > up the hypervisor, with an Nvidia customized capability config.
> > > >
> > > > I added some hack in hw/vfio/pci.c and managed to implement that.
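
One way to sanity-check from inside the guest that an injected capability
is actually visible, assuming the GPU shows up there at an example address
of 00:10.0, is to dump its capability list; a legacy vendor-specific
capability (cap ID 0x09) appears as a "Vendor Specific Information" entry:

    # In the guest: confirm the added capability is present.
    lspci -vvv -s 00:10.0 | grep -i -A1 'Vendor Specific'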
> > > >
> > > > Then I found the GPU was able to recognize its peer, and the latency
> > > > has dropped.
> > > >
> > > > However, the bandwidth didn't improve, but decreased instead.
> > > >
> > > > Any suggestions?  
> > >
> > > What's the VM topology?  I've found that in a Q35 configuration with
> > > GPUs downstream of an emulated root port, the NVIDIA driver in the
> > > guest will downshift the physical link rate to 2.5GT/s and never
> > > increase it back to 8GT/s.  I believe this is because the virtual
> > > downstream port only advertises Gen1 link speeds.  
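
A quick way to confirm this from inside the guest, assuming the GPU hangs
off an emulated root port at, say, 00:01.0 (check the actual slot with
lspci -t), is to look at the port's advertised Link Capabilities; a
Gen1-only port reports "Speed 2.5GT/s" here:

    # In the guest: the emulated port's maximum advertised link speed.
    lspci -vvv -s 00:01.0 | grep -i 'LnkCap:'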
> >
> >
> > Fixing that would be nice, and it's great that you now actually have a
> > reproducer that can be used to test it properly.
> >
> > Exposing higher link speeds is a bit of work since there are now all
> > kinds of corner cases to cover, as guests may play with link speeds and
> > we must pretend to change them accordingly.  An especially interesting
> > question is what to do with the assigned device when the guest tries to
> > play with the port link speed. It's kind of similar to AER in that respect.
> >
> > I guess we can just ignore it for starters.
> >  
> > >  If the GPUs are on
> > > the root complex (ie. pcie.0) the physical link will run at 2.5GT/s
> > > when the GPU is idle and upshift to 8GT/s under load.  This also
> > > happens if the GPU is exposed in a conventional PCI topology to the
> > > VM.  Another interesting data point is that an older Kepler GRID card
> > > does not have this issue, dynamically shifting the link speed under
> > > load regardless of the VM PCI/e topology, while a new M60 using the
> > > same driver experiences this problem.  I've filed a bug with NVIDIA as
> > > this seems to be a regression, but it appears (untested) that the
> > > hypervisor should take the approach of exposing full, up-to-date PCIe
> > > link capabilities and reporting a link status matching the downstream
> > > devices.  
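
As a sketch of the root-complex placement described above (host addresses
are examples), attaching the assigned GPUs directly to pcie.0 on a Q35
machine avoids the emulated Gen1 port entirely:

    # Q35 guest with both GPUs on pcie.0 instead of behind emulated
    # root ports or switches.
    qemu-system-x86_64 -machine q35 ... \
        -device vfio-pci,host=81:00.0,bus=pcie.0,addr=0x10 \
        -device vfio-pci,host=82:00.0,bus=pcie.0,addr=0x11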
> >
> >  
> > > I'd suggest during your testing, watch lspci info for the GPU from the
> > > host, noting the behavior of LnkSta (Link Status) to check if the
> > > device gets stuck at 2.5GT/s in your VM configuration and adjust the
> > > topology until it works, likely placing the GPUs on pcie.0 for a Q35
> > > based machine.  Thanks,
> > >
> > > Alex  
> >  


Thread overview: 26+ messages
     [not found] <4E0AFA5F-44D6-4624-A99F-68A7FE52F397@meituan.com>
     [not found] ` <4b31a711-a52e-25d3-4a7c-1be8521097d9@redhat.com>
     [not found]   ` <F99BFE80-FC15-40A0-BB3E-1B53B6CF9B05@meituan.com>
2017-07-26  6:21     ` [Qemu-devel] About virtio device hotplug in Q35! [External email, view with caution] Marcel Apfelbaum
2017-07-26 15:29       ` Alex Williamson
2017-07-26 16:06         ` Michael S. Tsirkin
2017-07-26 17:32           ` Alex Williamson
2017-08-01  5:04             ` Bob Chen
2017-08-01  5:46               ` Alex Williamson
2017-08-01  9:35                 ` Bob Chen
2017-08-01 14:39                   ` Michael S. Tsirkin
2017-08-01 15:01                   ` Alex Williamson
2017-08-07 13:00                     ` Bob Chen
2017-08-07 15:52                       ` Alex Williamson
2017-08-08  1:44                         ` Bob Chen
2017-08-08  8:06                           ` Bob Chen
2017-08-08 16:53                           ` Alex Williamson
2017-08-08 20:07                         ` Michael S. Tsirkin
2017-08-22  7:04                           ` Bob Chen
2017-08-22 16:56                             ` Alex Williamson
2017-08-22 18:06                               ` Michael S. Tsirkin
2017-08-29 10:41                                 ` Bob Chen
2017-08-29 14:13                                   ` Alex Williamson [this message]
2017-08-30  9:41                                     ` Bob Chen
2017-08-30 16:43                                       ` Alex Williamson
2017-09-01  9:58                                         ` Bob Chen
2017-11-30  8:06                                           ` Bob Chen
2017-08-07 13:04                     ` Bob Chen
2017-08-07 16:00                       ` Alex Williamson
