All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1] libvhost-user: Map shared RAM with MAP_NORESERVE to support virtio-mem with hugetlb
@ 2022-01-11 12:39 David Hildenbrand
  2022-01-11 12:54 ` David Hildenbrand
  2022-01-17  5:19 ` Raphael Norwitz
  0 siblings, 2 replies; 3+ messages in thread
From: David Hildenbrand @ 2022-01-11 12:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, David Hildenbrand, Dr . David Alan Gilbert,
	Raphael Norwitz, Stefan Hajnoczi, Paolo Bonzini,
	Marc-André Lureau

For fd-based shared memory, MAP_NORESERVE is only effective for hugetlb,
otherwise it's ignored. Older Linux versions that didn't support
reservation of huge pages ignored MAP_NORESERVE completely.

The first client to mmap a hugetlb fd without MAP_NORESERVE will
trigger reservation of huge pages for the whole mmapped range. There are
two cases to consider:

1) QEMU mapped RAM without MAP_NORESERVE

We're not dealing with a sparse mapping, huge pages for the whole range
have already been reserved by QEMU. An additional mmap() without
MAP_NORESERVE won't have any effect on the reservation.

2) QEMU mapped RAM with MAP_NORESERVE

We're delaing with a sparse mapping, no huge pages should be reserved.
Further mappings without MAP_NORESERVE should be avoided.

For 1), it doesn't matter if we set MAP_NORESERVE or not, so we can
simply set it. For 2), we'd be overriding QEMUs decision and trigger
reservation of huge pages, which might just fail if there are not
sufficient huge pages around. We must map with MAP_NORESERVE.

This change is required to support virtio-mem with hugetlb: a
virtio-mem device mapped into the guest physical memory corresponds to
a sparse memory mapping and QEMU maps this memory with MAP_NORESERVE.
Whenever memory in that sparse region will be accessed by the VM, QEMU
populates huge pages for the affected range by preallocating memory
and handling any preallocation errors gracefully.

So let's map shared RAM with MAP_NORESERVE. As libvhost-user only
supports Linux, there shouldn't be anything to take care of in regard of
other OS support.

Without this change, libvhost-user will fail mapping the region if there
are currently not enough huge pages to perform the reservation:
 fv_panic: libvhost-user: region mmap error: Cannot allocate memory

Cc: "Marc-André Lureau" <marcandre.lureau@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Raphael Norwitz <raphael.norwitz@nutanix.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 subprojects/libvhost-user/libvhost-user.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index 787f4d2d4f..3b538930be 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -728,12 +728,12 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
          * accessing it before we userfault.
          */
         mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
-                         PROT_NONE, MAP_SHARED,
+                         PROT_NONE, MAP_SHARED | MAP_NORESERVE,
                          vmsg->fds[0], 0);
     } else {
         mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
-                         PROT_READ | PROT_WRITE, MAP_SHARED, vmsg->fds[0],
-                         0);
+                         PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE,
+                         vmsg->fds[0], 0);
     }
 
     if (mmap_addr == MAP_FAILED) {
@@ -878,7 +878,7 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
          * accessing it before we userfault
          */
         mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
-                         PROT_NONE, MAP_SHARED,
+                         PROT_NONE, MAP_SHARED | MAP_NORESERVE,
                          vmsg->fds[i], 0);
 
         if (mmap_addr == MAP_FAILED) {
@@ -965,7 +965,7 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
          * mapped address has to be page aligned, and we use huge
          * pages.  */
         mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
-                         PROT_READ | PROT_WRITE, MAP_SHARED,
+                         PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE,
                          vmsg->fds[i], 0);
 
         if (mmap_addr == MAP_FAILED) {
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v1] libvhost-user: Map shared RAM with MAP_NORESERVE to support virtio-mem with hugetlb
  2022-01-11 12:39 [PATCH v1] libvhost-user: Map shared RAM with MAP_NORESERVE to support virtio-mem with hugetlb David Hildenbrand
@ 2022-01-11 12:54 ` David Hildenbrand
  2022-01-17  5:19 ` Raphael Norwitz
  1 sibling, 0 replies; 3+ messages in thread
From: David Hildenbrand @ 2022-01-11 12:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Dr . David Alan Gilbert, Raphael Norwitz,
	Stefan Hajnoczi, Paolo Bonzini, Marc-André Lureau

On 11.01.22 13:39, David Hildenbrand wrote:
> For fd-based shared memory, MAP_NORESERVE is only effective for hugetlb,
> otherwise it's ignored. Older Linux versions that didn't support
> reservation of huge pages ignored MAP_NORESERVE completely.
> 
> The first client to mmap a hugetlb fd without MAP_NORESERVE will
> trigger reservation of huge pages for the whole mmapped range. There are
> two cases to consider:
> 
> 1) QEMU mapped RAM without MAP_NORESERVE
> 
> We're not dealing with a sparse mapping, huge pages for the whole range
> have already been reserved by QEMU. An additional mmap() without
> MAP_NORESERVE won't have any effect on the reservation.
> 
> 2) QEMU mapped RAM with MAP_NORESERVE
> 
> We're delaing with a sparse mapping, no huge pages should be reserved.
> Further mappings without MAP_NORESERVE should be avoided.
> 
> For 1), it doesn't matter if we set MAP_NORESERVE or not, so we can
> simply set it. For 2), we'd be overriding QEMUs decision and trigger
> reservation of huge pages, which might just fail if there are not
> sufficient huge pages around. We must map with MAP_NORESERVE.
> 
> This change is required to support virtio-mem with hugetlb: a
> virtio-mem device mapped into the guest physical memory corresponds to
> a sparse memory mapping and QEMU maps this memory with MAP_NORESERVE.
> Whenever memory in that sparse region will be accessed by the VM, QEMU
> populates huge pages for the affected range by preallocating memory
> and handling any preallocation errors gracefully.
> 
> So let's map shared RAM with MAP_NORESERVE. As libvhost-user only
> supports Linux, there shouldn't be anything to take care of in regard of
> other OS support.
> 
> Without this change, libvhost-user will fail mapping the region if there
> are currently not enough huge pages to perform the reservation:
>  fv_panic: libvhost-user: region mmap error: Cannot allocate memory
> 
> Cc: "Marc-André Lureau" <marcandre.lureau@redhat.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Raphael Norwitz <raphael.norwitz@nutanix.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---

Note: I was assuming rust vhost-user-backend would need similar care,
but vm-memory already does the right thing by supplying MAP_NORESERVE:

https://github.com/rust-vmm/vm-memory/blob/7a5e0696dc4170f590ac9b837e65cc4136b30e38/src/mmap_unix.rs#L264


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v1] libvhost-user: Map shared RAM with MAP_NORESERVE to support virtio-mem with hugetlb
  2022-01-11 12:39 [PATCH v1] libvhost-user: Map shared RAM with MAP_NORESERVE to support virtio-mem with hugetlb David Hildenbrand
  2022-01-11 12:54 ` David Hildenbrand
@ 2022-01-17  5:19 ` Raphael Norwitz
  1 sibling, 0 replies; 3+ messages in thread
From: Raphael Norwitz @ 2022-01-17  5:19 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Michael S. Tsirkin, qemu-devel, Raphael Norwitz, Stefan Hajnoczi,
	Paolo Bonzini, Marc-André Lureau, Dr . David Alan Gilbert

On Tue, Jan 11, 2022 at 01:39:39PM +0100, David Hildenbrand wrote:
> For fd-based shared memory, MAP_NORESERVE is only effective for hugetlb,
> otherwise it's ignored. Older Linux versions that didn't support
> reservation of huge pages ignored MAP_NORESERVE completely.
> 
> The first client to mmap a hugetlb fd without MAP_NORESERVE will
> trigger reservation of huge pages for the whole mmapped range. There are
> two cases to consider:
> 
> 1) QEMU mapped RAM without MAP_NORESERVE
> 
> We're not dealing with a sparse mapping, huge pages for the whole range
> have already been reserved by QEMU. An additional mmap() without
> MAP_NORESERVE won't have any effect on the reservation.
> 
> 2) QEMU mapped RAM with MAP_NORESERVE
> 
> We're delaing with a sparse mapping, no huge pages should be reserved.
> Further mappings without MAP_NORESERVE should be avoided.
> 
> For 1), it doesn't matter if we set MAP_NORESERVE or not, so we can
> simply set it. For 2), we'd be overriding QEMUs decision and trigger
> reservation of huge pages, which might just fail if there are not
> sufficient huge pages around. We must map with MAP_NORESERVE.
> 
> This change is required to support virtio-mem with hugetlb: a
> virtio-mem device mapped into the guest physical memory corresponds to
> a sparse memory mapping and QEMU maps this memory with MAP_NORESERVE.
> Whenever memory in that sparse region will be accessed by the VM, QEMU
> populates huge pages for the affected range by preallocating memory
> and handling any preallocation errors gracefully.
> 
> So let's map shared RAM with MAP_NORESERVE. As libvhost-user only
> supports Linux, there shouldn't be anything to take care of in regard of
> other OS support.
> 
> Without this change, libvhost-user will fail mapping the region if there
> are currently not enough huge pages to perform the reservation:
>  fv_panic: libvhost-user: region mmap error: Cannot allocate memory
> 
> Cc: "Marc-André Lureau" <marcandre.lureau@redhat.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Raphael Norwitz <raphael.norwitz@nutanix.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  subprojects/libvhost-user/libvhost-user.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
> index 787f4d2d4f..3b538930be 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -728,12 +728,12 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
>           * accessing it before we userfault.
>           */
>          mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> -                         PROT_NONE, MAP_SHARED,
> +                         PROT_NONE, MAP_SHARED | MAP_NORESERVE,
>                           vmsg->fds[0], 0);
>      } else {
>          mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> -                         PROT_READ | PROT_WRITE, MAP_SHARED, vmsg->fds[0],
> -                         0);
> +                         PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE,
> +                         vmsg->fds[0], 0);
>      }
>  
>      if (mmap_addr == MAP_FAILED) {
> @@ -878,7 +878,7 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
>           * accessing it before we userfault
>           */
>          mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> -                         PROT_NONE, MAP_SHARED,
> +                         PROT_NONE, MAP_SHARED | MAP_NORESERVE,
>                           vmsg->fds[i], 0);
>  
>          if (mmap_addr == MAP_FAILED) {
> @@ -965,7 +965,7 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
>           * mapped address has to be page aligned, and we use huge
>           * pages.  */
>          mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> -                         PROT_READ | PROT_WRITE, MAP_SHARED,
> +                         PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE,
>                           vmsg->fds[i], 0);
>  
>          if (mmap_addr == MAP_FAILED) {

Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com>

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-01-17  5:39 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-11 12:39 [PATCH v1] libvhost-user: Map shared RAM with MAP_NORESERVE to support virtio-mem with hugetlb David Hildenbrand
2022-01-11 12:54 ` David Hildenbrand
2022-01-17  5:19 ` Raphael Norwitz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.