From: Ruben <rubenbryon@gmail.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: linux-pci@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com>
Subject: Re: [question]: BAR allocation failing
Date: Thu, 15 Jul 2021 01:43:17 +0300
Message-ID: <CALdZjm6TsfsaQZRxJvr5YDh9VRn28vQjFY+JfZv-daU=gQu_Uw@mail.gmail.com>
In-Reply-To: <20210714160350.1bef2778.alex.williamson@redhat.com>

No luck so far with "-global q35-pcihost.pci-hole64-size=2048G"; the result
stays the same.  ("-global q35-host.pci-hole64-size=" gave the error
"warning: global q35-host.pci-hole64-size has invalid class name".)
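
For reference, here is roughly how the option sits on the QEMU command line,
plus one way to check in the guest whether the bigger 64-bit window actually
arrived (just a sketch; the other arguments and the addresses are illustrative):

  qemu-system-x86_64 -machine q35 ... \
    -global q35-pcihost.pci-hole64-size=2048G ...

  # inside the guest, the enlarged window should show up in the host
  # bridge ranges reported at boot
  dmesg | grep -i 'root bus resource'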

When we pass through the NVLink bridges, the (5 working) GPUs can talk at
full P2P bandwidth, and the NVIDIA docs describe this as a valid option
(i.e. passing through all GPUs and NVLink bridges).
In production we pass the bridges through to a service VM that controls
traffic, which is also described in their docs.
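
As a quick sanity check on the P2P paths, something like this inside the
guest shows which NVLink connections the driver actually sees between the
working GPUs (the exact matrix of course depends on the layout):

  nvidia-smi topo -m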

On Thu, 15 Jul 2021 at 01:03, Alex Williamson
<alex.williamson@redhat.com> wrote:
>
> On Thu, 15 Jul 2021 00:32:30 +0300
> Ruben <rubenbryon@gmail.com> wrote:
>
> > I am experiencing an issue with virtualizing a machine which contains
> > 8 NVIDIA A100 80GB cards.
> > On bare metal the machine behaves as expected; the GPUs are connected
> > to the host through a PLX PEX88096 chip, which connects 2 GPUs to 16
> > lanes on the CPU (all on the same NVIDIA HGX Delta baseboard).
> > When all GPUs and NVLink bridges are passed through to a VM, the system
> > can only initialize 4-5 of the 8 GPUs.
> >
> > The dmesg log shows failed attempts to assign BAR space to the GPUs
> > that are not getting initialized.
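> > A rough filter to pull those failures out of the guest log (the exact
> > message text varies by kernel version):
> >
> >   dmesg | grep -iE 'no space|failed to assign'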
> >
> > Things that were tried:
> > Q35 and i440fx machine types, with and without UEFI
> > QEMU 5.x, QEMU 6.0
> > Ubuntu 20.04 host with QEMU/libvirt
> > Now running Proxmox 7 on Debian 11, host kernel 5.11.22-2, VM kernel 5.4.0-77
> > VM kernel parameters pci=nocrs and pci=realloc=on/off
> >
> > ------------------------------------
> >
> > lspci -v:
> > 01:00.0 3D controller: NVIDIA Corporation Device 20b2 (rev a1)
> >         Memory at db000000 (32-bit, non-prefetchable) [size=16M]
> >         Memory at 2000000000 (64-bit, prefetchable) [size=128G]
> >         Memory at 1000000000 (64-bit, prefetchable) [size=32M]
> >
> > 02:00.0 3D controller: NVIDIA Corporation Device 20b2 (rev a1)
> >         Memory at dc000000 (32-bit, non-prefetchable) [size=16M]
> >         Memory at 4000000000 (64-bit, prefetchable) [size=128G]
> >         Memory at 6000000000 (64-bit, prefetchable) [size=32M]
> >
> > ...
> >
> > 0c:00.0 3D controller: NVIDIA Corporation Device 20b2 (rev a1)
> >         Memory at e0000000 (32-bit, non-prefetchable) [size=16M]
> >         Memory at <ignored> (64-bit, prefetchable)
> >         Memory at <ignored> (64-bit, prefetchable)
> >
> > ...
> >
> ...
> >
> > ------------------------------------
> >
> > I have (blindly) messed with parameters like pref64-reserve on the
> > pcie-root-port, but to be frank I have little clue what I am doing, so
> > my question is really a request for suggestions on what to try.
> > This server will not be running an 8-GPU VM in production, but I have a
> > few days left to test before it goes to work, and I was hoping to learn
> > how to overcome this issue in the future.
> > Please be aware that my knowledge of virtualization and the Linux
> > kernel does not reach far.
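> >
> > For what it's worth, the pref64-reserve attempt looked roughly like this
> > (only a sketch; the ids and the vfio host address here are made up):
> >
> >   -device pcie-root-port,id=rp1,bus=pcie.0,chassis=1,slot=1,pref64-reserve=256G \
> >   -device vfio-pci,host=0000:01:00.0,bus=rp1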
>
> Try playing with the QEMU "-global q35-host.pci-hole64-size=" option for
> the VM rather than pci=nocrs.  The default 64-bit MMIO hole for
> QEMU/q35 is only 32GB.  You might be looking at a value like 2048G to
> support this setup, but could maybe get away with 1024G if there's room
> in 32-bit space for the 3rd BAR.
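> (Rough arithmetic from the lspci output above: 8 x 128G = 1024G for the
> large prefetchable BARs alone; the 8 x 32M BARs plus window alignment then
> push past 1024G unless those smaller BARs can be placed below 4G, hence
> the 2048G suggestion.)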
>
> Note that assigning bridges usually doesn't make a lot of sense, and
> NVLink is a proprietary black box, so we don't know how to virtualize
> it or what the guest drivers will do with it; you're on your own there.
> We generally recommend using vGPUs for such cases so the host driver
> can handle all the NVLink aspects of GPU peer-to-peer.  Thanks,
>
> Alex
>

Thread overview: 9+ messages
2021-07-14 21:32 [question]: BAR allocation failing Ruben
2021-07-14 22:03 ` Alex Williamson
2021-07-14 22:43   ` Ruben [this message]
2021-07-15 14:49     ` Bjorn Helgaas
2021-07-15 20:39       ` Ruben
2021-07-15 21:50         ` Keith Busch
2021-07-15 23:05         ` Bjorn Helgaas
2021-07-15 23:08           ` Alex Williamson
2021-07-16  6:14             ` Ruben
