* [Qemu-devel] [PATCH] mmap-alloc: use same backend for all mappings
@ 2015-11-30 10:51 Greg Kurz
  2015-11-30 10:53 ` Paolo Bonzini
  2015-11-30 13:06 ` Michael S. Tsirkin
  0 siblings, 2 replies; 15+ messages in thread
From: Greg Kurz @ 2015-11-30 10:51 UTC (permalink / raw)
  To: Paolo Bonzini, Michael S. Tsirkin; +Cc: qemu-devel

Since commit 8561c9244ddf1122d "exec: allocate PROT_NONE pages on top of RAM",
it is no longer possible to back guest RAM with hugepages on ppc64 hosts:

mmap(NULL, 285212672, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x3fff57000000
mmap(0x3fff57000000, 268435456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 19, 0) = -1 EBUSY (Device or resource busy)

This is due to a limitation on ppc64 that requires MAP_FIXED mappings to have
the same page size as other mappings already present in the same "slice" of
virtual address space (Cc'ing Ben for details). This is exactly what happens
with the mmap() calls above: the first one uses the native host page size (64k)
and the second one uses the huge page size (16M).

To be sure we always have the same page size, let's use the same backend for
both calls to mmap(): this is enough to fix the ppc64 issue.

This has no effect on RAM based mappings.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
---

This is a bug fix for 2.5

 util/mmap-alloc.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index c37acbe58ede..0ff221dd94f4 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -21,7 +21,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
      * space, even if size is already aligned.
      */
     size_t total = size + align;
-    void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+    void *ptr = mmap(0, total, PROT_NONE,
+                     (fd == -1 ? MAP_ANONYMOUS : 0) | MAP_PRIVATE, fd, 0);
     size_t offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
     void *ptr1;
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH] mmap-alloc: use same backend for all mappings
  2015-11-30 10:51 [Qemu-devel] [PATCH] mmap-alloc: use same backend for all mappings Greg Kurz
@ 2015-11-30 10:53 ` Paolo Bonzini
  2015-11-30 13:12   ` Michael S. Tsirkin
  2015-11-30 13:06 ` Michael S. Tsirkin
  1 sibling, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2015-11-30 10:53 UTC (permalink / raw)
  To: Greg Kurz, Michael S. Tsirkin; +Cc: qemu-devel



On 30/11/2015 11:51, Greg Kurz wrote:
> Since commit 8561c9244ddf1122d "exec: allocate PROT_NONE pages on top of RAM",
> it is no longer possible to back guest RAM with hugepages on ppc64 hosts:
> 
> mmap(NULL, 285212672, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x3fff57000000
> mmap(0x3fff57000000, 268435456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 19, 0) = -1 EBUSY (Device or resource busy)
> 
> This is due to a limitation on ppc64 that requires MAP_FIXED mappings to have
> the same page size as other mappings already present in the same "slice" of
> virtual address space (Cc'ing Ben for details). This is exactly what happens
> when calling mmap() above: first one uses native host page size (64k) and
> second one uses huge page size (16M).
> 
> To be sure we always have the same page size, let's use the same backend for
> both calls to mmap(): this is enough to fix the ppc64 issue.
> 
> This has no effect on RAM based mappings.
> 
> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> ---
> 
> This is a bug fix for 2.5
> 
>  util/mmap-alloc.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index c37acbe58ede..0ff221dd94f4 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -21,7 +21,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
>       * space, even if size is already aligned.
>       */
>      size_t total = size + align;
> -    void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> +    void *ptr = mmap(0, total, PROT_NONE,
> +                     (fd == -1 ? MAP_ANONYMOUS : 0) | MAP_PRIVATE, fd, 0);
>      size_t offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
>      void *ptr1;
>  
> 

Acked-by: Paolo Bonzini <pbonzini@redhat.com>


* Re: [Qemu-devel] [PATCH] mmap-alloc: use same backend for all mappings
  2015-11-30 10:51 [Qemu-devel] [PATCH] mmap-alloc: use same backend for all mappings Greg Kurz
  2015-11-30 10:53 ` Paolo Bonzini
@ 2015-11-30 13:06 ` Michael S. Tsirkin
  2015-11-30 13:46   ` Greg Kurz
  1 sibling, 1 reply; 15+ messages in thread
From: Michael S. Tsirkin @ 2015-11-30 13:06 UTC (permalink / raw)
  To: Greg Kurz; +Cc: Paolo Bonzini, qemu-devel

On Mon, Nov 30, 2015 at 11:51:57AM +0100, Greg Kurz wrote:
> Since commit 8561c9244ddf1122d "exec: allocate PROT_NONE pages on top of RAM",
> it is no longer possible to back guest RAM with hugepages on ppc64 hosts:
> 
> mmap(NULL, 285212672, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x3fff57000000
> mmap(0x3fff57000000, 268435456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 19, 0) = -1 EBUSY (Device or resource busy)
> 
> This is due to a limitation on ppc64 that requires MAP_FIXED mappings to have
> the same page size as other mappings already present in the same "slice" of
> virtual address space (Cc'ing Ben for details).

I'd like some details please.
What do you mean when you say "same page size" and "slice"?

> This is exactly what happens
> when calling mmap() above: first one uses native host page size (64k) and
> second one uses huge page size (16M).
> 
> To be sure we always have the same page size, let's use the same backend for
> both calls to mmap(): this is enough to fix the ppc64 issue.
> 
> This has no effect on RAM based mappings.
> 
> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> ---
> 
> This is a bug fix for 2.5
> 
>  util/mmap-alloc.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index c37acbe58ede..0ff221dd94f4 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -21,7 +21,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
>       * space, even if size is already aligned.
>       */
>      size_t total = size + align;
> -    void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> +    void *ptr = mmap(0, total, PROT_NONE,
> +                     (fd == -1 ? MAP_ANONYMOUS : 0) | MAP_PRIVATE, fd, 0);
>      size_t offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
>      void *ptr1;
>  


* Re: [Qemu-devel] [PATCH] mmap-alloc: use same backend for all mappings
  2015-11-30 10:53 ` Paolo Bonzini
@ 2015-11-30 13:12   ` Michael S. Tsirkin
  2015-12-01 10:42     ` Greg Kurz
  0 siblings, 1 reply; 15+ messages in thread
From: Michael S. Tsirkin @ 2015-11-30 13:12 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Greg Kurz

On Mon, Nov 30, 2015 at 11:53:41AM +0100, Paolo Bonzini wrote:
> 
> 
> On 30/11/2015 11:51, Greg Kurz wrote:
> > Since commit 8561c9244ddf1122d "exec: allocate PROT_NONE pages on top of RAM",
> > it is no longer possible to back guest RAM with hugepages on ppc64 hosts:
> > 
> > mmap(NULL, 285212672, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x3fff57000000
> > mmap(0x3fff57000000, 268435456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 19, 0) = -1 EBUSY (Device or resource busy)
> > 
> > This is due to a limitation on ppc64 that requires MAP_FIXED mappings to have
> > the same page size as other mappings already present in the same "slice" of
> > virtual address space (Cc'ing Ben for details). This is exactly what happens
> > when calling mmap() above: first one uses native host page size (64k) and
> > second one uses huge page size (16M).
> > 
> > To be sure we always have the same page size, let's use the same backend for
> > both calls to mmap(): this is enough to fix the ppc64 issue.
> > 
> > This has no effect on RAM based mappings.
> > 
> > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> > ---
> > 
> > This is a bug fix for 2.5
> > 
> >  util/mmap-alloc.c |    3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> > index c37acbe58ede..0ff221dd94f4 100644
> > --- a/util/mmap-alloc.c
> > +++ b/util/mmap-alloc.c
> > @@ -21,7 +21,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
> >       * space, even if size is already aligned.
> >       */
> >      size_t total = size + align;
> > -    void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> > +    void *ptr = mmap(0, total, PROT_NONE,
> > +                     (fd == -1 ? MAP_ANONYMOUS : 0) | MAP_PRIVATE, fd, 0);
> >      size_t offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
> >      void *ptr1;
> >  
> > 
> 
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>

But why does this patch have any effect?
I'm worried that extra memory is still allocated
with this, even if it's not accessible.

If yes, we are better off disabling the protection for ppc.

-- 
MST


* Re: [Qemu-devel] [PATCH] mmap-alloc: use same backend for all mappings
  2015-11-30 13:06 ` Michael S. Tsirkin
@ 2015-11-30 13:46   ` Greg Kurz
  2015-11-30 16:59     ` Michael S. Tsirkin
  0 siblings, 1 reply; 15+ messages in thread
From: Greg Kurz @ 2015-11-30 13:46 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Paolo Bonzini, qemu-devel

On Mon, 30 Nov 2015 15:06:33 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Nov 30, 2015 at 11:51:57AM +0100, Greg Kurz wrote:
> > Since commit 8561c9244ddf1122d "exec: allocate PROT_NONE pages on top of RAM",
> > it is no longer possible to back guest RAM with hugepages on ppc64 hosts:
> > 
> > mmap(NULL, 285212672, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x3fff57000000
> > mmap(0x3fff57000000, 268435456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 19, 0) = -1 EBUSY (Device or resource busy)
> > 
> > This is due to a limitation on ppc64 that requires MAP_FIXED mappings to have
> > the same page size as other mappings already present in the same "slice" of
> > virtual address space (Cc'ing Ben for details).
> 
> I'd like some details please.
> What do you mean when you say "same page size" and "slice"?
> 

On ppc64, the address space is divided into 256MB segments in which all pages
must have the same size. This is a hardware limitation, IIUC. I don't know whether
it can be fixed and I'll let Ben comment on it.

Hugepage support is implemented using an abstraction of segments called
"slices". Here's a quote from the related commit changelog in the kernel
tree:

commit d0f13e3c20b6fb73ccb467bdca97fa7cf5a574cd
Author: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date:   Tue May 8 16:27:27 2007 +1000

    [POWERPC] Introduce address space "slices"

...

    The main issues are:
    
     - To maintain/keep track of the page size per "segment" (as we can
    only have one page size per segment on powerpc, which are 256MB
    divisions of the address space).
    
     - To make sure special mappings stay within their allotted
    "segments" (including MAP_FIXED crap)
    
     - To make sure everybody else doesn't mmap/brk/grow_stack into a
    "segment" that is used for a special mapping
...
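To make the quoted changelog concrete, the slice bookkeeping can be pictured as below (illustrative sketch: the 256MB "low" slices cover the first 4GB of the address space, and higher addresses are tracked in larger high slices, not shown here):

```c
#include <assert.h>
#include <stdint.h>

/* 256MB low slices, as introduced by the commit quoted above: the
 * kernel records one base page size per slice, so a 64K-backed
 * reservation and a 16M hugetlb mapping cannot share a slice. */
#define SLICE_LOW_SHIFT 28   /* 256MB == 1 << 28 */

static unsigned low_slice_index(uintptr_t addr)
{
    return (unsigned)(addr >> SLICE_LOW_SHIFT);
}
```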

> > This is exactly what happens
> > when calling mmap() above: first one uses native host page size (64k) and
> > second one uses huge page size (16M).
> > 
> > To be sure we always have the same page size, let's use the same backend for
> > both calls to mmap(): this is enough to fix the ppc64 issue.
> > 
> > This has no effect on RAM based mappings.
> > 
> > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> > ---
> > 
> > This is a bug fix for 2.5
> > 
> >  util/mmap-alloc.c |    3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> > index c37acbe58ede..0ff221dd94f4 100644
> > --- a/util/mmap-alloc.c
> > +++ b/util/mmap-alloc.c
> > @@ -21,7 +21,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
> >       * space, even if size is already aligned.
> >       */
> >      size_t total = size + align;
> > -    void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> > +    void *ptr = mmap(0, total, PROT_NONE,
> > +                     (fd == -1 ? MAP_ANONYMOUS : 0) | MAP_PRIVATE, fd, 0);
> >      size_t offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
> >      void *ptr1;
> >  
> 


* Re: [Qemu-devel] [PATCH] mmap-alloc: use same backend for all mappings
  2015-11-30 13:46   ` Greg Kurz
@ 2015-11-30 16:59     ` Michael S. Tsirkin
  2015-12-01 10:37       ` Greg Kurz
  2015-12-01 10:53       ` Aneesh Kumar K.V
  0 siblings, 2 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2015-11-30 16:59 UTC (permalink / raw)
  To: Greg Kurz; +Cc: Paolo Bonzini, qemu-devel

On Mon, Nov 30, 2015 at 02:46:31PM +0100, Greg Kurz wrote:
> On Mon, 30 Nov 2015 15:06:33 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Mon, Nov 30, 2015 at 11:51:57AM +0100, Greg Kurz wrote:
> > > Since commit 8561c9244ddf1122d "exec: allocate PROT_NONE pages on top of RAM",
> > > it is no longer possible to back guest RAM with hugepages on ppc64 hosts:
> > > 
> > > mmap(NULL, 285212672, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x3fff57000000
> > > mmap(0x3fff57000000, 268435456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 19, 0) = -1 EBUSY (Device or resource busy)
> > > 
> > > This is due to a limitation on ppc64 that requires MAP_FIXED mappings to have
> > > the same page size as other mappings already present in the same "slice" of
> > > virtual address space (Cc'ing Ben for details).
> > 
> > I'd like some details please.
> > What do you mean when you say "same page size" and "slice"?
> > 
> 
> On ppc64, the address space is divided in 256MB-sized segments where all pages
> have the same size. This is a hw limitation IIUC. I don't know if it can be
> fixed and I'll let Ben comment on it.

But it's anonymous memory with PROT_NONE.  There should be no pages there:
just a chunk of virtual memory reserved.

> Hugepage support is implemented using an abstraction of segments called
> "slices". Here's a quote from the related commit changelog in the kernel
> tree:
> 
> commit d0f13e3c20b6fb73ccb467bdca97fa7cf5a574cd
> Author: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date:   Tue May 8 16:27:27 2007 +1000
> 
>     [POWERPC] Introduce address space "slices"
> 
> ...
> 
>     The main issues are:
>     
>      - To maintain/keep track of the page size per "segment" (as we can
>     only have one page size per segment on powerpc, which are 256MB
>     divisions of the address space).
>     
>      - To make sure special mappings stay within their allotted
>     "segments" (including MAP_FIXED crap)
>     
>      - To make sure everybody else doesn't mmap/brk/grow_stack into a
>     "segment" that is used for a special mapping
> ...
> 
> > > This is exactly what happens
> > > when calling mmap() above: first one uses native host page size (64k) and
> > > second one uses huge page size (16M).
> > > 
> > > To be sure we always have the same page size, let's use the same backend for
> > > both calls to mmap(): this is enough to fix the ppc64 issue.
> > > 
> > > This has no effect on RAM based mappings.
> > > 
> > > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> > > ---
> > > 
> > > This is a bug fix for 2.5
> > > 
> > >  util/mmap-alloc.c |    3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> > > index c37acbe58ede..0ff221dd94f4 100644
> > > --- a/util/mmap-alloc.c
> > > +++ b/util/mmap-alloc.c
> > > @@ -21,7 +21,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
> > >       * space, even if size is already aligned.
> > >       */
> > >      size_t total = size + align;
> > > -    void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> > > +    void *ptr = mmap(0, total, PROT_NONE,
> > > +                     (fd == -1 ? MAP_ANONYMOUS : 0) | MAP_PRIVATE, fd, 0);
> > >      size_t offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
> > >      void *ptr1;
> > >  
> > 


* Re: [Qemu-devel] [PATCH] mmap-alloc: use same backend for all mappings
  2015-11-30 16:59     ` Michael S. Tsirkin
@ 2015-12-01 10:37       ` Greg Kurz
  2015-12-01 10:53       ` Aneesh Kumar K.V
  1 sibling, 0 replies; 15+ messages in thread
From: Greg Kurz @ 2015-12-01 10:37 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Paolo Bonzini, qemu-devel

On Mon, 30 Nov 2015 18:59:23 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Nov 30, 2015 at 02:46:31PM +0100, Greg Kurz wrote:
> > On Mon, 30 Nov 2015 15:06:33 +0200
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> > > On Mon, Nov 30, 2015 at 11:51:57AM +0100, Greg Kurz wrote:
> > > > Since commit 8561c9244ddf1122d "exec: allocate PROT_NONE pages on top of RAM",
> > > > it is no longer possible to back guest RAM with hugepages on ppc64 hosts:
> > > > 
> > > > mmap(NULL, 285212672, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x3fff57000000
> > > > mmap(0x3fff57000000, 268435456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 19, 0) = -1 EBUSY (Device or resource busy)
> > > > 
> > > > This is due to a limitation on ppc64 that requires MAP_FIXED mappings to have
> > > > the same page size as other mappings already present in the same "slice" of
> > > > virtual address space (Cc'ing Ben for details).
> > > 
> > > I'd like some details please.
> > > What do you mean when you say "same page size" and "slice"?
> > > 
> > 
> > On ppc64, the address space is divided in 256MB-sized segments where all pages
> > have the same size. This is a hw limitation IIUC. I don't know if it can be
> > fixed and I'll let Ben comment on it.
> 
> But it's anonymous memory with PROT_NONE.  There should be no pages there:
> just a chunk of virtual memory reserved.
> 

This is orthogonal: the page size check happens in get_unmapped_area(), where
we don't care about protection bits... On ppc64, it is about finding a "chunk" of
virtual memory with the same page size, because the hardware requires it. In the case
of MAP_FIXED, this becomes an error because we already have a "chunk" with an
incompatible page size.
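A hypothetical simplification of that check (not the actual kernel code; the real logic lives in the powerpc slice code around slice_get_unmapped_area()):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model: each slice records the base page size of its
 * existing mappings (0 == unused so far).  A MAP_FIXED request into a
 * slice with a different recorded size cannot be relocated, so it
 * fails -- the EBUSY seen in the strace above. */
static int slice_fits(unsigned slice_psize, unsigned new_psize, int map_fixed)
{
    if (slice_psize == 0 || slice_psize == new_psize) {
        return 0;                  /* fits */
    }
    return map_fixed ? -EBUSY      /* fixed address, incompatible slice */
                     : 1;          /* caller should try another slice */
}
```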

> > Hugepage support is implemented using an abstraction of segments called
> > "slices". Here's a quote from the related commit changelog in the kernel
> > tree:
> > 
> > commit d0f13e3c20b6fb73ccb467bdca97fa7cf5a574cd
> > Author: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Date:   Tue May 8 16:27:27 2007 +1000
> > 
> >     [POWERPC] Introduce address space "slices"
> > 
> > ...
> > 
> >     The main issues are:
> >     
> >      - To maintain/keep track of the page size per "segment" (as we can
> >     only have one page size per segment on powerpc, which are 256MB
> >     divisions of the address space).
> >     
> >      - To make sure special mappings stay within their allotted
> >     "segments" (including MAP_FIXED crap)
> >     
> >      - To make sure everybody else doesn't mmap/brk/grow_stack into a
> >     "segment" that is used for a special mapping
> > ...
> > 
> > > > This is exactly what happens
> > > > when calling mmap() above: first one uses native host page size (64k) and
> > > > second one uses huge page size (16M).
> > > > 
> > > > To be sure we always have the same page size, let's use the same backend for
> > > > both calls to mmap(): this is enough to fix the ppc64 issue.
> > > > 
> > > > This has no effect on RAM based mappings.
> > > > 
> > > > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> > > > ---
> > > > 
> > > > This is a bug fix for 2.5
> > > > 
> > > >  util/mmap-alloc.c |    3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> > > > index c37acbe58ede..0ff221dd94f4 100644
> > > > --- a/util/mmap-alloc.c
> > > > +++ b/util/mmap-alloc.c
> > > > @@ -21,7 +21,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
> > > >       * space, even if size is already aligned.
> > > >       */
> > > >      size_t total = size + align;
> > > > -    void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> > > > +    void *ptr = mmap(0, total, PROT_NONE,
> > > > +                     (fd == -1 ? MAP_ANONYMOUS : 0) | MAP_PRIVATE, fd, 0);
> > > >      size_t offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
> > > >      void *ptr1;
> > > >  
> > > 
> 


* Re: [Qemu-devel] [PATCH] mmap-alloc: use same backend for all mappings
  2015-11-30 13:12   ` Michael S. Tsirkin
@ 2015-12-01 10:42     ` Greg Kurz
  2015-12-01 10:52       ` Michael S. Tsirkin
  0 siblings, 1 reply; 15+ messages in thread
From: Greg Kurz @ 2015-12-01 10:42 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Paolo Bonzini, qemu-devel

On Mon, 30 Nov 2015 15:12:08 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Nov 30, 2015 at 11:53:41AM +0100, Paolo Bonzini wrote:
> > 
> > 
> > On 30/11/2015 11:51, Greg Kurz wrote:
> > > Since commit 8561c9244ddf1122d "exec: allocate PROT_NONE pages on top of RAM",
> > > it is no longer possible to back guest RAM with hugepages on ppc64 hosts:
> > > 
> > > mmap(NULL, 285212672, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x3fff57000000
> > > mmap(0x3fff57000000, 268435456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 19, 0) = -1 EBUSY (Device or resource busy)
> > > 
> > > This is due to a limitation on ppc64 that requires MAP_FIXED mappings to have
> > > the same page size as other mappings already present in the same "slice" of
> > > virtual address space (Cc'ing Ben for details). This is exactly what happens
> > > when calling mmap() above: first one uses native host page size (64k) and
> > > second one uses huge page size (16M).
> > > 
> > > To be sure we always have the same page size, let's use the same backend for
> > > both calls to mmap(): this is enough to fix the ppc64 issue.
> > > 
> > > This has no effect on RAM based mappings.
> > > 
> > > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> > > ---
> > > 
> > > This is a bug fix for 2.5
> > > 
> > >  util/mmap-alloc.c |    3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> > > index c37acbe58ede..0ff221dd94f4 100644
> > > --- a/util/mmap-alloc.c
> > > +++ b/util/mmap-alloc.c
> > > @@ -21,7 +21,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
> > >       * space, even if size is already aligned.
> > >       */
> > >      size_t total = size + align;
> > > -    void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> > > +    void *ptr = mmap(0, total, PROT_NONE,
> > > +                     (fd == -1 ? MAP_ANONYMOUS : 0) | MAP_PRIVATE, fd, 0);
> > >      size_t offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
> > >      void *ptr1;
> > >  
> > > 
> > 
> > Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> 
> But why does this patch have any effect?
> I'm worried that extra memory is still allocated
> with this, even if it's not accessible.
> 

And you are right because that is exactly what is happening
with hugetlbfs_file_mmap()->hugetlb_reserve_pages() :-\

> If yes, we are better off disabling the protection for ppc.
> 

Yes, this is the only alternative... I'll send a patch ASAP.

Thanks !

--
Greg


* Re: [Qemu-devel] [PATCH] mmap-alloc: use same backend for all mappings
  2015-12-01 10:42     ` Greg Kurz
@ 2015-12-01 10:52       ` Michael S. Tsirkin
  0 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2015-12-01 10:52 UTC (permalink / raw)
  To: Greg Kurz; +Cc: Paolo Bonzini, qemu-devel

On Tue, Dec 01, 2015 at 11:42:15AM +0100, Greg Kurz wrote:
> On Mon, 30 Nov 2015 15:12:08 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Mon, Nov 30, 2015 at 11:53:41AM +0100, Paolo Bonzini wrote:
> > > 
> > > 
> > > On 30/11/2015 11:51, Greg Kurz wrote:
> > > > Since commit 8561c9244ddf1122d "exec: allocate PROT_NONE pages on top of RAM",
> > > > it is no longer possible to back guest RAM with hugepages on ppc64 hosts:
> > > > 
> > > > mmap(NULL, 285212672, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x3fff57000000
> > > > mmap(0x3fff57000000, 268435456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 19, 0) = -1 EBUSY (Device or resource busy)
> > > > 
> > > > This is due to a limitation on ppc64 that requires MAP_FIXED mappings to have
> > > > the same page size as other mappings already present in the same "slice" of
> > > > virtual address space (Cc'ing Ben for details). This is exactly what happens
> > > > when calling mmap() above: first one uses native host page size (64k) and
> > > > second one uses huge page size (16M).
> > > > 
> > > > To be sure we always have the same page size, let's use the same backend for
> > > > both calls to mmap(): this is enough to fix the ppc64 issue.
> > > > 
> > > > This has no effect on RAM based mappings.
> > > > 
> > > > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> > > > ---
> > > > 
> > > > This is a bug fix for 2.5
> > > > 
> > > >  util/mmap-alloc.c |    3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> > > > index c37acbe58ede..0ff221dd94f4 100644
> > > > --- a/util/mmap-alloc.c
> > > > +++ b/util/mmap-alloc.c
> > > > @@ -21,7 +21,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
> > > >       * space, even if size is already aligned.
> > > >       */
> > > >      size_t total = size + align;
> > > > -    void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> > > > +    void *ptr = mmap(0, total, PROT_NONE,
> > > > +                     (fd == -1 ? MAP_ANONYMOUS : 0) | MAP_PRIVATE, fd, 0);
> > > >      size_t offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
> > > >      void *ptr1;
> > > >  
> > > > 
> > > 
> > > Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> > 
> > But why does this patch have any effect?
> > I'm worried that extra memory is still allocated
> > with this, even if it's not accessible.
> > 
> 
> And you are right because that is exactly what is happening
> with hugetlbfs_file_mmap()->hugetlb_reserve_pages() :-\

By the way, this also means we were wasting a bunch of
memory when trying to get aligned pages.
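That waste can be quantified with the numbers from the strace in the patch (sketch; assumes hugetlb_reserve_pages() reserves huge pages for the whole size + align reservation, as observed above; `extra_huge_pages` is an illustrative name):

```c
#include <assert.h>

/* Huge pages pinned beyond the guest RAM itself when the whole
 * size + align PROT_NONE reservation is backed by hugetlbfs. */
static unsigned long extra_huge_pages(unsigned long size, unsigned long align,
                                      unsigned long huge_sz)
{
    unsigned long total = size + align;              /* what gets reserved  */
    return (total - size + huge_sz - 1) / huge_sz;   /* pages past the RAM  */
}
```

With the strace values (size 268435456, align 285212672 - 268435456 = 16M, 16M huge pages) this comes out to one extra 16M huge page per region.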

> > If yes, we are better off disabling the protection for ppc.
> > 
> 
> Yes, this is the only alternative... I'll send a patch ASAP.
> 
> Thanks !

Does MAP_HUGETLB do anything?
I would expect it to get a slice with the correct page size.
If not, this might be a reasonable thing to implement in the kernel.

> --
> Greg


* Re: [Qemu-devel] [PATCH] mmap-alloc: use same backend for all mappings
  2015-11-30 16:59     ` Michael S. Tsirkin
  2015-12-01 10:37       ` Greg Kurz
@ 2015-12-01 10:53       ` Aneesh Kumar K.V
  2015-12-01 10:57         ` Michael S. Tsirkin
  1 sibling, 1 reply; 15+ messages in thread
From: Aneesh Kumar K.V @ 2015-12-01 10:53 UTC (permalink / raw)
  To: Michael S. Tsirkin, Greg Kurz; +Cc: Paolo Bonzini, qemu-devel

"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Mon, Nov 30, 2015 at 02:46:31PM +0100, Greg Kurz wrote:
>> On Mon, 30 Nov 2015 15:06:33 +0200
>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> 


....
>> 
>> On ppc64, the address space is divided in 256MB-sized segments where all pages
>> have the same size. This is a hw limitation IIUC. I don't know if it can be
>> fixed and I'll let Ben comment on it.
>
> But it's anonymous memory with PROT_NONE.  There should be no pages there:
> just a chunk of virtual memory reserved.
>

ppc64 uses the page size (called the base page size) to find the hash slot
that holds the virtual-to-real address translation. All the pages in a
segment must have the same base page size. Hugetlb pages have a base page
size of 16M, whereas a regular Linux page has 64K. mmap will fail to map a
hugetlb mapping in a segment that already has regular pages mapped.

-aneesh


* Re: [Qemu-devel] [PATCH] mmap-alloc: use same backend for all mappings
  2015-12-01 10:53       ` Aneesh Kumar K.V
@ 2015-12-01 10:57         ` Michael S. Tsirkin
  2015-12-01 12:15           ` Aneesh Kumar K.V
  2015-12-01 13:31           ` Greg Kurz
  0 siblings, 2 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2015-12-01 10:57 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: Paolo Bonzini, qemu-devel, Greg Kurz

On Tue, Dec 01, 2015 at 04:23:11PM +0530, Aneesh Kumar K.V wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > On Mon, Nov 30, 2015 at 02:46:31PM +0100, Greg Kurz wrote:
> >> On Mon, 30 Nov 2015 15:06:33 +0200
> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> 
> 
> 
> ....
> >> 
> >> On ppc64, the address space is divided in 256MB-sized segments where all pages
> >> have the same size. This is a hw limitation IIUC. I don't know if it can be
> >> fixed and I'll let Ben comment on it.
> >
> > But it's anonymous memory with PROT_NONE.  There should be no pages there:
> > just a chunk of virtual memory reserved.
> >
> 
> ppc64 use page size (called as base page size) to find the hash slot in
> which we find the virtual address to real address translation. All the
> pages in a segment should have same base page size. Hugetlb pages have a
> base page size of 16M whereas a regular linux page have 64K. mmap will
> fail to map a hugetlb mapping in a segment that already have regular
> pages mapped.
> 
> -aneesh


I see this in kernel:

       } else if (flags & MAP_HUGETLB) {
                struct user_struct *user = NULL;
                struct hstate *hs;

                hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & SHM_HUGE_MASK);
                if (!hs)
                        return -EINVAL;

                len = ALIGN(len, huge_page_size(hs));
                /*
                 * VM_NORESERVE is used because the reservations will be
                 * taken when vm_ops->mmap() is called
                 * A dummy user value is used because we are not locking
                 * memory so no accounting is necessary
                 */
                file = hugetlb_file_setup(HUGETLB_ANON_FILE, len,
                                VM_NORESERVE,
                                &user, HUGETLB_ANONHUGE_INODE,
                                (flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
                if (IS_ERR(file))
                        return PTR_ERR(file);
        }

So maybe it's a question of passing in MAP_HUGETLB and the
correct size mask.

-- 
MST


* Re: [Qemu-devel] [PATCH] mmap-alloc: use same backend for all mappings
  2015-12-01 10:57         ` Michael S. Tsirkin
@ 2015-12-01 12:15           ` Aneesh Kumar K.V
  2015-12-01 14:25             ` Michael S. Tsirkin
  2015-12-01 13:31           ` Greg Kurz
  1 sibling, 1 reply; 15+ messages in thread
From: Aneesh Kumar K.V @ 2015-12-01 12:15 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Paolo Bonzini, qemu-devel, Greg Kurz

"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Tue, Dec 01, 2015 at 04:23:11PM +0530, Aneesh Kumar K.V wrote:
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> 
>> > On Mon, Nov 30, 2015 at 02:46:31PM +0100, Greg Kurz wrote:
>> >> On Mon, 30 Nov 2015 15:06:33 +0200
>> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> >> 
>> 
>> 
>> ....
>> >> 
>> >> On ppc64, the address space is divided in 256MB-sized segments where all pages
>> >> have the same size. This is a hw limitation IIUC. I don't know if it can be
>> >> fixed and I'll let Ben comment on it.
>> >
>> > But it's anonymous memory with PROT_NONE.  There should be no pages there:
>> > just a chunk of virtual memory reserved.
>> >
>> 
>> ppc64 uses the page size (called the base page size) to find the hash slot in
>> which we find the virtual address to real address translation. All the
>> pages in a segment should have the same base page size. Hugetlb pages have a
>> base page size of 16M, whereas a regular Linux page has 64K. mmap will
>> fail to map a hugetlb mapping in a segment that already has regular
>> pages mapped.
>> 
>> -aneesh
>
>
> I see this in kernel:
>
>        } else if (flags & MAP_HUGETLB) {
>                 struct user_struct *user = NULL;
>                 struct hstate *hs;
>
>                 hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & SHM_HUGE_MASK);
>                 if (!hs)
>                         return -EINVAL;
>
>                 len = ALIGN(len, huge_page_size(hs));
>                 /*
>                  * VM_NORESERVE is used because the reservations will be
>                  * taken when vm_ops->mmap() is called
>                  * A dummy user value is used because we are not locking
>                  * memory so no accounting is necessary
>                  */
>                 file = hugetlb_file_setup(HUGETLB_ANON_FILE, len,
>                                 VM_NORESERVE,
>                                 &user, HUGETLB_ANONHUGE_INODE,
>                                 (flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
>                 if (IS_ERR(file))
>                         return PTR_ERR(file);
>         }
>
> So maybe it's a question of passing in MAP_HUGETLB and the
> correct size mask.
>

Can you explain this more?

If the question is whether we need to pass an fd and drop MAP_ANONYMOUS to
map hugetlb, we don't. A good example is
tools/testing/selftests/vm/map_hugetlb.c.

If the question is whether we will lose hugepages on mmap even if the
mapping is PROT_NONE, then the answer is that we do, in the form of a
hugetlb reservation.

-aneesh


* Re: [Qemu-devel] [PATCH] mmap-alloc: use same backend for all mappings
  2015-12-01 10:57         ` Michael S. Tsirkin
  2015-12-01 12:15           ` Aneesh Kumar K.V
@ 2015-12-01 13:31           ` Greg Kurz
  2015-12-01 14:19             ` Michael S. Tsirkin
  1 sibling, 1 reply; 15+ messages in thread
From: Greg Kurz @ 2015-12-01 13:31 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Paolo Bonzini, Aneesh Kumar K.V, qemu-devel

On Tue, 1 Dec 2015 12:57:47 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, Dec 01, 2015 at 04:23:11PM +0530, Aneesh Kumar K.V wrote:
> > "Michael S. Tsirkin" <mst@redhat.com> writes:
> > 
> > > On Mon, Nov 30, 2015 at 02:46:31PM +0100, Greg Kurz wrote:
> > >> On Mon, 30 Nov 2015 15:06:33 +0200
> > >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >> 
> > 
> > 
> > ....
> > >> 
> > >> On ppc64, the address space is divided in 256MB-sized segments where all pages
> > >> have the same size. This is a hw limitation IIUC. I don't know if it can be
> > >> fixed and I'll let Ben comment on it.
> > >
> > > But it's anonymous memory with PROT_NONE.  There should be no pages there:
> > > just a chunk of virtual memory reserved.
> > >
> > 
> > ppc64 uses the page size (called the base page size) to find the hash slot in
> > which we find the virtual address to real address translation. All the
> > pages in a segment should have the same base page size. Hugetlb pages have a
> > base page size of 16M, whereas a regular Linux page has 64K. mmap will
> > fail to map a hugetlb mapping in a segment that already has regular
> > pages mapped.
> > 
> > -aneesh
> 
> 
> I see this in kernel:
> 
>        } else if (flags & MAP_HUGETLB) {
>                 struct user_struct *user = NULL;
>                 struct hstate *hs;
> 
>                 hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & SHM_HUGE_MASK);
>                 if (!hs)
>                         return -EINVAL;
> 
>                 len = ALIGN(len, huge_page_size(hs));
>                 /*
>                  * VM_NORESERVE is used because the reservations will be
>                  * taken when vm_ops->mmap() is called
>                  * A dummy user value is used because we are not locking
>                  * memory so no accounting is necessary
>                  */
>                 file = hugetlb_file_setup(HUGETLB_ANON_FILE, len,
>                                 VM_NORESERVE,
>                                 &user, HUGETLB_ANONHUGE_INODE,
>                                 (flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
>                 if (IS_ERR(file))
>                         return PTR_ERR(file);
>         }
> 
> So maybe it's a question of passing in MAP_HUGETLB and the
> correct size mask.
> 

I guess you are talking about the PROT_NONE mapping here ^^.

How do we know that the fd points to hugepages?

And what's the difference between passing MAP_HUGETLB and passing a
hugetlbfs-backed fd + MAP_NORESERVE? I think the latter is easier
because we don't need to guess whether the backend is hugetlbfs.


* Re: [Qemu-devel] [PATCH] mmap-alloc: use same backend for all mappings
  2015-12-01 13:31           ` Greg Kurz
@ 2015-12-01 14:19             ` Michael S. Tsirkin
  0 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2015-12-01 14:19 UTC (permalink / raw)
  To: Greg Kurz; +Cc: Paolo Bonzini, Aneesh Kumar K.V, qemu-devel

On Tue, Dec 01, 2015 at 02:31:19PM +0100, Greg Kurz wrote:
> On Tue, 1 Dec 2015 12:57:47 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Tue, Dec 01, 2015 at 04:23:11PM +0530, Aneesh Kumar K.V wrote:
> > > "Michael S. Tsirkin" <mst@redhat.com> writes:
> > > 
> > > > On Mon, Nov 30, 2015 at 02:46:31PM +0100, Greg Kurz wrote:
> > > >> On Mon, 30 Nov 2015 15:06:33 +0200
> > > >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > >> 
> > > 
> > > 
> > > ....
> > > >> 
> > > >> On ppc64, the address space is divided in 256MB-sized segments where all pages
> > > >> have the same size. This is a hw limitation IIUC. I don't know if it can be
> > > >> fixed and I'll let Ben comment on it.
> > > >
> > > > But it's anonymous memory with PROT_NONE.  There should be no pages there:
> > > > just a chunk of virtual memory reserved.
> > > >
> > > 
> > > ppc64 uses the page size (called the base page size) to find the hash slot in
> > > which we find the virtual address to real address translation. All the
> > > pages in a segment should have the same base page size. Hugetlb pages have a
> > > base page size of 16M, whereas a regular Linux page has 64K. mmap will
> > > fail to map a hugetlb mapping in a segment that already has regular
> > > pages mapped.
> > > 
> > > -aneesh
> > 
> > 
> > I see this in kernel:
> > 
> >        } else if (flags & MAP_HUGETLB) {
> >                 struct user_struct *user = NULL;
> >                 struct hstate *hs;
> > 
> >                 hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & SHM_HUGE_MASK);
> >                 if (!hs)
> >                         return -EINVAL;
> > 
> >                 len = ALIGN(len, huge_page_size(hs));
> >                 /*
> >                  * VM_NORESERVE is used because the reservations will be
> >                  * taken when vm_ops->mmap() is called
> >                  * A dummy user value is used because we are not locking
> >                  * memory so no accounting is necessary
> >                  */
> >                 file = hugetlb_file_setup(HUGETLB_ANON_FILE, len,
> >                                 VM_NORESERVE,
> >                                 &user, HUGETLB_ANONHUGE_INODE,
> >                                 (flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
> >                 if (IS_ERR(file))
> >                         return PTR_ERR(file);
> >         }
> > 
> > So maybe it's a question of passing in MAP_HUGETLB and the
> > correct size mask.
> > 
> 
> I guess you are talking about the PROT_NONE mapping here ^^.

Yes.

> How do we know that the fd points to hugepages ?

Dunno ... I guess we can just try this if the regular
mmap fails?

> And what's the difference between passing MAP_HUGETLB and passing a
> hugetlbfs backed fd + MAP_NORESERVE ?

Does MAP_NORESERVE have the desired effect?

I need to look at the kernel code; the man page merely
mentions swap space use.

> I think the latter is easier
> because we don't need to guess if backend is hugetlbfs.

If this helps, that's fine by me.

It's probably a good idea to set this anyway.

-- 
MST


* Re: [Qemu-devel] [PATCH] mmap-alloc: use same backend for all mappings
  2015-12-01 12:15           ` Aneesh Kumar K.V
@ 2015-12-01 14:25             ` Michael S. Tsirkin
  0 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2015-12-01 14:25 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: Paolo Bonzini, qemu-devel, Greg Kurz

On Tue, Dec 01, 2015 at 05:45:27PM +0530, Aneesh Kumar K.V wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > On Tue, Dec 01, 2015 at 04:23:11PM +0530, Aneesh Kumar K.V wrote:
> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >> 
> >> > On Mon, Nov 30, 2015 at 02:46:31PM +0100, Greg Kurz wrote:
> >> >> On Mon, 30 Nov 2015 15:06:33 +0200
> >> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> >> 
> >> 
> >> 
> >> ....
> >> >> 
> >> >> On ppc64, the address space is divided in 256MB-sized segments where all pages
> >> >> have the same size. This is a hw limitation IIUC. I don't know if it can be
> >> >> fixed and I'll let Ben comment on it.
> >> >
> >> > But it's anonymous memory with PROT_NONE.  There should be no pages there:
> >> > just a chunk of virtual memory reserved.
> >> >
> >> 
> >> ppc64 uses the page size (called the base page size) to find the hash slot in
> >> which we find the virtual address to real address translation. All the
> >> pages in a segment should have the same base page size. Hugetlb pages have a
> >> base page size of 16M, whereas a regular Linux page has 64K. mmap will
> >> fail to map a hugetlb mapping in a segment that already has regular
> >> pages mapped.
> >> 
> >> -aneesh
> >
> >
> > I see this in kernel:
> >
> >        } else if (flags & MAP_HUGETLB) {
> >                 struct user_struct *user = NULL;
> >                 struct hstate *hs;
> >
> >                 hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & SHM_HUGE_MASK);
> >                 if (!hs)
> >                         return -EINVAL;
> >
> >                 len = ALIGN(len, huge_page_size(hs));
> >                 /*
> >                  * VM_NORESERVE is used because the reservations will be
> >                  * taken when vm_ops->mmap() is called
> >                  * A dummy user value is used because we are not locking
> >                  * memory so no accounting is necessary
> >                  */
> >                 file = hugetlb_file_setup(HUGETLB_ANON_FILE, len,
> >                                 VM_NORESERVE,
> >                                 &user, HUGETLB_ANONHUGE_INODE,
> >                                 (flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
> >                 if (IS_ERR(file))
> >                         return PTR_ERR(file);
> >         }
> >
> > So maybe it's a question of passing in MAP_HUGETLB and the
> > correct size mask.
> >
> 
> Can you explain this more?
> 
> If the question is whether we need to pass an fd and drop MAP_ANONYMOUS to
> map hugetlb, we don't. A good example is
> tools/testing/selftests/vm/map_hugetlb.c.
> 
> If the question is whether we will lose hugepages on mmap even if the
> mapping is PROT_NONE, then the answer is that we do, in the form of a
> hugetlb reservation.
> 
> -aneesh

The question is whether passing MAP_HUGETLB to the PROT_NONE
mapping with fd == -1 will get a mapping in the correct slice on ppc.

-- 
MST


end of thread, other threads:[~2015-12-01 14:25 UTC | newest]

Thread overview: 15+ messages
2015-11-30 10:51 [Qemu-devel] [PATCH] mmap-alloc: use same backend for all mappings Greg Kurz
2015-11-30 10:53 ` Paolo Bonzini
2015-11-30 13:12   ` Michael S. Tsirkin
2015-12-01 10:42     ` Greg Kurz
2015-12-01 10:52       ` Michael S. Tsirkin
2015-11-30 13:06 ` Michael S. Tsirkin
2015-11-30 13:46   ` Greg Kurz
2015-11-30 16:59     ` Michael S. Tsirkin
2015-12-01 10:37       ` Greg Kurz
2015-12-01 10:53       ` Aneesh Kumar K.V
2015-12-01 10:57         ` Michael S. Tsirkin
2015-12-01 12:15           ` Aneesh Kumar K.V
2015-12-01 14:25             ` Michael S. Tsirkin
2015-12-01 13:31           ` Greg Kurz
2015-12-01 14:19             ` Michael S. Tsirkin
