From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Daniel Henrique Barboza <danielhb413@gmail.com>,
	qemu-devel@nongnu.org, qemu-ppc@nongnu.org,
	David Gibson <david@gibson.dropbear.id.au>,
	Piotr Jaroszynski <pjaroszynski@nvidia.com>,
	Jose Ricardo Ziviani <joserz@linux.ibm.com>
Subject: Re: [Qemu-devel] [PATCH qemu 0/3] spapr_pci, vfio: NVIDIA V100 + P9 passthrough
Date: Fri, 8 Feb 2019 13:29:37 +1100
Message-ID: <295fa9ca-29c1-33e6-5168-8991bc0ef7b1@ozlabs.ru>
In-Reply-To: <20190207081830.4dcbb822@x1.home>



On 08/02/2019 02:18, Alex Williamson wrote:
> On Thu, 7 Feb 2019 15:43:18 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> On 07/02/2019 04:22, Daniel Henrique Barboza wrote:
>>> Based on this series, I've sent a Libvirt patch to allow a QEMU process
>>> to inherit IPC_LOCK when using VFIO passthrough with the Tesla V100
>>> GPU:
>>>
>>> https://www.redhat.com/archives/libvir-list/2019-February/msg00219.html
>>>
>>>
>>> In that thread, Alex raised concerns about allowing QEMU to freely lock
>>> all the memory it wants. Is this an issue to be considered in the review
>>> of this series here?
>>>
>>> Reading the patches, especially patch 3/3, it seems to me that QEMU is
>>> going to lock the KVM memory to populate the NUMA node with memory
>>> of the GPU itself, so at first glance there is no risk of it taking
>>> over the host RAM.
>>> Am I missing something?  
>>
>>
>> The GPU memory belongs to the device: it is not visible to the host as
>> memory blocks and is not covered by page structs. To the host it is more
>> like MMIO, which is passed through to the guest without the locked-memory
>> accounting. I'd expect libvirt to keep working as usual, except that:
>>
>> when libvirt calculates the amount of memory needed for TCE tables
>> (which is guestRAM/64k*8), it now needs to use the end of the last GPU
>> RAM window as the guest RAM size. For example, in QEMU HMP "info mtree -f":
>>
>> FlatView #2
>>  AS "memory", root: system
>>  AS "cpu-memory-0", root: system
>>  Root memory region: system
>>   0000000000000000-000000007fffffff (prio 0, ram): ppc_spapr.ram
>>   0000010000000000-0000011fffffffff (prio 0, ram): nvlink2-mr
>>
>> So previously the DMA window would cover 0x7fffffff+1, now it has to
>> cover 0x11fffffffff+1.
> 
> This looks like a chicken and egg problem, you're saying libvirt needs
> to query mtree to understand the extent of the GPU layout, but we need
> to specify the locked memory limits in order for QEMU to start?  Is
> libvirt supposed to start the VM with unlimited locked memory and fix
> it at some indeterminate point in the future?  Run a dummy VM with
> unlimited locked memory in order to determine the limits for the real
> VM?  Neither of these sound practical.  Thanks,


QEMU maps GPU RAM at known locations (which depend only on the vPHB's
index, or can be set explicitly), and libvirt knows how many GPUs are
passed through, so it is quite easy to calculate the required amount of
memory.
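
For reference, a minimal sketch of that calculation. It assumes one
128GB GPU RAM window per passed-through GPU starting at 1TiB (the
layout in the "info mtree -f" output above) and applies the
guestRAM/64k*8 rule; the constant names are illustrative, not the
actual QEMU macros, and the estimate ignores the comparatively small
overhead of indirect TCE levels:

#include <stdint.h>
#include <stdio.h>

#define TCE_PAGE_SIZE   0x10000ULL         /* 64k IOMMU page */
#define TCE_ENTRY_SIZE  8ULL               /* bytes per TCE entry */
#define NV2_WIN_BASE    0x10000000000ULL   /* 1TiB, first GPU RAM window */
#define NV2_WIN_SIZE    0x2000000000ULL    /* 128GB per window */

/* guestRAM/64k*8, with "guest RAM size" extended to the end of the
 * last GPU RAM window */
static uint64_t tce_table_bytes(uint64_t dma_window_end)
{
    return dma_window_end / TCE_PAGE_SIZE * TCE_ENTRY_SIZE;
}

int main(void)
{
    unsigned ngpus = 1;    /* GPUs passed through */
    uint64_t end = NV2_WIN_BASE + ngpus * NV2_WIN_SIZE;

    /* 1 GPU -> end = 0x12000000000 -> ~144MiB of TCE tables */
    printf("~%llu MiB locked for TCE tables\n",
           (unsigned long long)(tce_table_bytes(end) >> 20));
    return 0;
}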

Here is the window start calculation:
https://github.com/aik/qemu/commit/7073cad3ae7708d657e01672bcf53092808b54fb#diff-662409c2a5a150fe231d07ea8384b920R3812
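
The gist of it, as a hedged sketch (constant names and values are
illustrative, matching the mtree output above rather than the actual
macros in that commit):

#include <stdint.h>

#define NV2_WIN_BASE 0x10000000000ULL   /* 1TiB */
#define NV2_WIN_SIZE 0x2000000000ULL    /* 128GB per vPHB */

/* The window start depends only on the vPHB index, so the layout is
 * known before the guest starts:
 * vPHB 0 -> 0x10000000000, vPHB 1 -> 0x12000000000, ... */
static uint64_t nv2_win_addr(unsigned phb_index)
{
    return NV2_WIN_BASE + (uint64_t)phb_index * NV2_WIN_SIZE;
}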

We do not know the exact GPU RAM window size until QEMU reads it from
VFIO/nvlink2, but we know that all existing hardware has a window of
128GB (the adapters I have access to only have 16/32GB on board).
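
With that 128GB-per-window assumption the limit can be sized
conservatively from the GPU count alone: two GPUs, for instance, would
put the end of the last window at 1TiB + 2 * 128GB = 0x14000000000,
and the guestRAM/64k*8 rule then gives 0x14000000000 / 64k * 8 =
160MiB of TCE tables on top of the usual locked-memory allowance for
guest RAM.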


-- 
Alexey

Thread overview: 19+ messages
2019-01-17  2:51 [Qemu-devel] [PATCH qemu 0/3] spapr_pci, vfio: NVIDIA V100 + P9 passthrough Alexey Kardashevskiy
2019-01-17  2:51 ` [Qemu-devel] [PATCH qemu 1/3] vfio/spapr: Fix indirect levels calculation Alexey Kardashevskiy
2019-02-05  5:54   ` David Gibson
2019-01-17  2:51 ` [Qemu-devel] [PATCH qemu 2/3] vfio: Make vfio_get_region_info_cap public Alexey Kardashevskiy
2019-01-17  2:51 ` [Qemu-devel] [PATCH qemu 3/3] spapr: Support NVIDIA V100 GPU with NVLink2 Alexey Kardashevskiy
2019-02-03 23:59 ` [Qemu-devel] [PATCH qemu 0/3] spapr_pci, vfio: NVIDIA V100 + P9 passthrough Alexey Kardashevskiy
2019-02-06 17:22 ` Daniel Henrique Barboza
2019-02-07  4:43   ` Alexey Kardashevskiy
2019-02-07 15:18     ` Alex Williamson
2019-02-08  2:29       ` Alexey Kardashevskiy [this message]
2019-02-08  3:26         ` Alex Williamson
2019-02-08  5:28           ` David Gibson
2019-02-08 15:52             ` Alex Williamson
2019-02-08 16:25               ` Daniel Henrique Barboza
2019-02-11  3:49             ` Alexey Kardashevskiy
2019-02-11  6:07               ` Alex Williamson
2019-02-11  7:46                 ` Alexey Kardashevskiy
2019-02-14  5:02                   ` David Gibson
2019-02-14  4:59               ` David Gibson
