From: Greg Kurz <groug@kaod.org>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: qemu-devel@nongnu.org, Peter Maydell <peter.maydell@linaro.org>,
	Murilo Opsfelder Araujo <muriloo@linux.ibm.com>,
	Peter Crosthwaite <crosthwaite.peter@gmail.com>,
	Richard Henderson <rth@twiddle.net>,
	Paolo Bonzini <pbonzini@redhat.com>,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [Qemu-devel] [PULL 23/25] mmap-alloc: fix hugetlbfs misaligned length in ppc64
Date: Mon, 4 Feb 2019 16:15:54 +0100
Message-ID: <20190204161554.253d810b@bahia.lan>
In-Reply-To: <20190204142638.27021-24-mst@redhat.com>

Hi Michael,

These two patches (22 and 23) from Murilo were already merged via a pull
request from David earlier today.

Cheers,

--
Greg

On Mon, 4 Feb 2019 09:44:04 -0500
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> From: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> 
> The commit 7197fb4058bcb68986bae2bb2c04d6370f3e7218 ("util/mmap-alloc:
> fix hugetlb support on ppc64") fixed Huge TLB mappings on ppc64.
> 
> However, we still need to consider the underlying huge page size
> during munmap() because it requires that both address and length be a
> multiple of the underlying huge page size for Huge TLB mappings.
> Quote from "Huge page (Huge TLB) mappings" paragraph under NOTES
> section of the munmap(2) manual:
> 
>   "For munmap(), addr and length must both be a multiple of the
>   underlying huge page size."
> 
> On ppc64, the munmap() in qemu_ram_munmap() does not work for Huge TLB
> mappings because the mapped segment may be aligned to the underlying
> huge page size rather than to the native system page size returned by
> getpagesize().
> 
> This has the side effect of not releasing huge pages back to the pool
> after a hugetlbfs file-backed memory device is hot-unplugged.
> 
> This patch fixes the situation in qemu_ram_mmap() and
> qemu_ram_munmap() by considering the underlying page size on ppc64.
> 
> After this patch, memory hot-unplug releases huge pages back to the
> pool.
> 
> Fixes: 7197fb4058bcb68986bae2bb2c04d6370f3e7218
> Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Reviewed-by: Greg Kurz <groug@kaod.org>
> ---
>  include/qemu/mmap-alloc.h |  2 +-
>  exec.c                    |  4 ++--
>  util/mmap-alloc.c         | 22 ++++++++++++++++------
>  util/oslib-posix.c        |  2 +-
>  4 files changed, 20 insertions(+), 10 deletions(-)
> 
> diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
> index 50385e3f81..ef04f0ed5b 100644
> --- a/include/qemu/mmap-alloc.h
> +++ b/include/qemu/mmap-alloc.h
> @@ -9,6 +9,6 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
>  
>  void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
>  
> -void qemu_ram_munmap(void *ptr, size_t size);
> +void qemu_ram_munmap(int fd, void *ptr, size_t size);
>  
>  #endif
> diff --git a/exec.c b/exec.c
> index 25f3938a27..03dd673d36 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1873,7 +1873,7 @@ static void *file_ram_alloc(RAMBlock *block,
>      if (mem_prealloc) {
>          os_mem_prealloc(fd, area, memory, smp_cpus, errp);
>          if (errp && *errp) {
> -            qemu_ram_munmap(area, memory);
> +            qemu_ram_munmap(fd, area, memory);
>              return NULL;
>          }
>      }
> @@ -2394,7 +2394,7 @@ static void reclaim_ramblock(RAMBlock *block)
>          xen_invalidate_map_cache_entry(block->host);
>  #ifndef _WIN32
>      } else if (block->fd >= 0) {
> -        qemu_ram_munmap(block->host, block->max_length);
> +        qemu_ram_munmap(block->fd, block->host, block->max_length);
>          close(block->fd);
>  #endif
>      } else {
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index f71ea038c8..8565885420 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -80,6 +80,7 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
>      int flags;
>      int guardfd;
>      size_t offset;
> +    size_t pagesize;
>      size_t total;
>      void *guardptr;
>      void *ptr;
> @@ -100,7 +101,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
>       * anonymous memory is OK.
>       */
>      flags = MAP_PRIVATE;
> -    if (fd == -1 || qemu_fd_getpagesize(fd) == getpagesize()) {
> +    pagesize = qemu_fd_getpagesize(fd);
> +    if (fd == -1 || pagesize == getpagesize()) {
>          guardfd = -1;
>          flags |= MAP_ANONYMOUS;
>      } else {
> @@ -109,6 +111,7 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
>      }
>  #else
>      guardfd = -1;
> +    pagesize = getpagesize();
>      flags = MAP_PRIVATE | MAP_ANONYMOUS;
>  #endif
>  
> @@ -120,7 +123,7 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
>  
>      assert(is_power_of_2(align));
>      /* Always align to host page size */
> -    assert(align >= getpagesize());
> +    assert(align >= pagesize);
>  
>      flags = MAP_FIXED;
>      flags |= fd == -1 ? MAP_ANONYMOUS : 0;
> @@ -143,17 +146,24 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
>       * a guard page guarding against potential buffer overflows.
>       */
>      total -= offset;
> -    if (total > size + getpagesize()) {
> -        munmap(ptr + size + getpagesize(), total - size - getpagesize());
> +    if (total > size + pagesize) {
> +        munmap(ptr + size + pagesize, total - size - pagesize);
>      }
>  
>      return ptr;
>  }
>  
> -void qemu_ram_munmap(void *ptr, size_t size)
> +void qemu_ram_munmap(int fd, void *ptr, size_t size)
>  {
> +    size_t pagesize;
> +
>      if (ptr) {
>          /* Unmap both the RAM block and the guard page */
> -        munmap(ptr, size + getpagesize());
> +#if defined(__powerpc64__) && defined(__linux__)
> +        pagesize = qemu_fd_getpagesize(fd);
> +#else
> +        pagesize = getpagesize();
> +#endif
> +        munmap(ptr, size + pagesize);
>      }
>  }
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 4ce1ba9ca4..37c5854b9c 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -226,7 +226,7 @@ void qemu_vfree(void *ptr)
>  void qemu_anon_ram_free(void *ptr, size_t size)
>  {
>      trace_qemu_anon_ram_free(ptr, size);
> -    qemu_ram_munmap(ptr, size);
> +    qemu_ram_munmap(-1, ptr, size);
>  }
>  
>  void qemu_set_block(int fd)
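
For readers following the thread, the length arithmetic behind this fix can be sketched as below. This is only an illustration, not code from the patch: the helper names is_page_aligned() and ram_unmap_length() are hypothetical, and the 64 KiB native page size and 16 MiB huge page size mentioned afterwards are assumed example values rather than figures taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of QEMU): returns true when 'value' is a
 * multiple of 'pagesize'. 'pagesize' must be a non-zero power of two.
 */
static bool is_page_aligned(uintptr_t value, size_t pagesize)
{
    assert(pagesize != 0 && (pagesize & (pagesize - 1)) == 0);
    return (value & (pagesize - 1)) == 0;
}

/*
 * Hypothetical helper (not part of QEMU): the length handed to munmap()
 * when a RAM block of 'size' bytes is followed by one guard page of
 * 'pagesize' bytes, mirroring the "size + pagesize" expression in the
 * quoted diff.
 */
static size_t ram_unmap_length(size_t size, size_t pagesize)
{
    return size + pagesize;
}
```

With a 1 GiB block, using the native page size for the guard page gives a length of 1 GiB + 64 KiB, which is not a multiple of 16 MiB, so it does not meet the munmap(2) requirement quoted in the commit message. Using the huge page size gives 1 GiB + 16 MiB, which is a multiple of 16 MiB.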
