All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
@ 2014-10-23  2:26 ` Davide Libenzi
  0 siblings, 0 replies; 29+ messages in thread
From: Davide Libenzi @ 2014-10-23  2:26 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Andrew Morton, Hugh Dickins, KOSAKI Motohiro, linux-mm

[Resending with proper CC list suggested by Andrew]

Calling munmap on a MAP_HUGETLB area, and a size which is not 2MB aligned, 
causes munmap to fail.  Tested on 3.13.x but tracking back to 3.2.x.
In do_munmap() we forcibly want a 4KB default page, and we wrongly 
calculate the end of the map.  Since the calculated end is within the end 
address of the target vma, we try to do a split with an address right in 
the middle of a huge page, which would fail with EINVAL.

Tentative (untested) patch and test case attached (be sure you have a few 
huge pages available via /proc/sys/vm/nr_hugepages tinkering).


Signed-Off-By: Davide Libenzi <davidel@xmailserver.org>


- Davide


diff --git a/mm/mmap.c b/mm/mmap.c
index 7f85520..6dba257 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2528,10 +2528,6 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
 	if ((start & ~PAGE_MASK) || start > TASK_SIZE || len > TASK_SIZE-start)
 		return -EINVAL;
 
-	len = PAGE_ALIGN(len);
-	if (len == 0)
-		return -EINVAL;
-
 	/* Find the first overlapping VMA */
 	vma = find_vma(mm, start);
 	if (!vma)
@@ -2539,6 +2535,16 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
 	prev = vma->vm_prev;
 	/* we have  start < vma->vm_end  */
 
+	if (likely(!is_vm_hugetlb_page(vma)))
+		len = PAGE_ALIGN(len);
+	else {
+		unsigned long hpage_size = huge_page_size(hstate_vma(vma));
+
+		len = ALIGN(len, hpage_size);
+	}
+	if (unlikely(len == 0))
+		return -EINVAL;
+
 	/* if it doesn't overlap, we have nothing.. */
 	end = start + len;
 	if (vma->vm_start >= end)




[hugebug.c]

#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

static void test(int flags, size_t size)
{
    void* addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      flags | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (addr == MAP_FAILED)
    {
        perror("mmap");
        exit(1);
    }
    *(char*) addr = 17;

    if (munmap(addr, size) != 0)
    {
        perror("munmap");
        exit(1);
    }
}

int main(int ac, const char** av)
{
    static const size_t hugepage_size = 2 * 1024 * 1024;

    printf("Testing normal pages with 2MB size ...\n");
    test(0, hugepage_size);
    printf("OK\n");

    printf("Testing huge pages with 2MB size ...\n");
    test(MAP_HUGETLB, hugepage_size);
    printf("OK\n");


    printf("Testing normal pages with 4KB byte size ...\n");
    test(0, 4096);
    printf("OK\n");

    printf("Testing huge pages with 4KB byte size ...\n");
    test(MAP_HUGETLB, 4096);
    printf("OK\n");


    printf("Testing normal pages with 1 byte size ...\n");
    test(0, 1);
    printf("OK\n");

    printf("Testing huge pages with 1 byte size ...\n");
    test(MAP_HUGETLB, 1);
    printf("OK\n");

    return 0;
}


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
@ 2014-10-23  2:26 ` Davide Libenzi
  0 siblings, 0 replies; 29+ messages in thread
From: Davide Libenzi @ 2014-10-23  2:26 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Andrew Morton, Hugh Dickins, KOSAKI Motohiro, linux-mm

[Resending with proper CC list suggested by Andrew]

Calling munmap on a MAP_HUGETLB area, and a size which is not 2MB aligned, 
causes munmap to fail.  Tested on 3.13.x but tracking back to 3.2.x.
In do_munmap() we forcibly want a 4KB default page, and we wrongly 
calculate the end of the map.  Since the calculated end is within the end 
address of the target vma, we try to do a split with an address right in 
the middle of a huge page, which would fail with EINVAL.

Tentative (untested) patch and test case attached (be sure you have a few 
huge pages available via /proc/sys/vm/nr_hugepages tinkering).


Signed-Off-By: Davide Libenzi <davidel@xmailserver.org>


- Davide


diff --git a/mm/mmap.c b/mm/mmap.c
index 7f85520..6dba257 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2528,10 +2528,6 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
 	if ((start & ~PAGE_MASK) || start > TASK_SIZE || len > TASK_SIZE-start)
 		return -EINVAL;
 
-	len = PAGE_ALIGN(len);
-	if (len == 0)
-		return -EINVAL;
-
 	/* Find the first overlapping VMA */
 	vma = find_vma(mm, start);
 	if (!vma)
@@ -2539,6 +2535,16 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
 	prev = vma->vm_prev;
 	/* we have  start < vma->vm_end  */
 
+	if (likely(!is_vm_hugetlb_page(vma)))
+		len = PAGE_ALIGN(len);
+	else {
+		unsigned long hpage_size = huge_page_size(hstate_vma(vma));
+
+		len = ALIGN(len, hpage_size);
+	}
+	if (unlikely(len == 0))
+		return -EINVAL;
+
 	/* if it doesn't overlap, we have nothing.. */
 	end = start + len;
 	if (vma->vm_start >= end)




[hugebug.c]

#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

static void test(int flags, size_t size)
{
    void* addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      flags | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (addr == MAP_FAILED)
    {
        perror("mmap");
        exit(1);
    }
    *(char*) addr = 17;

    if (munmap(addr, size) != 0)
    {
        perror("munmap");
        exit(1);
    }
}

int main(int ac, const char** av)
{
    static const size_t hugepage_size = 2 * 1024 * 1024;

    printf("Testing normal pages with 2MB size ...\n");
    test(0, hugepage_size);
    printf("OK\n");

    printf("Testing huge pages with 2MB size ...\n");
    test(MAP_HUGETLB, hugepage_size);
    printf("OK\n");


    printf("Testing normal pages with 4KB byte size ...\n");
    test(0, 4096);
    printf("OK\n");

    printf("Testing huge pages with 4KB byte size ...\n");
    test(MAP_HUGETLB, 4096);
    printf("OK\n");


    printf("Testing normal pages with 1 byte size ...\n");
    test(0, 1);
    printf("OK\n");

    printf("Testing huge pages with 1 byte size ...\n");
    test(MAP_HUGETLB, 1);
    printf("OK\n");

    return 0;
}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
  2014-10-23  2:26 ` Davide Libenzi
@ 2015-03-26  0:47   ` Hugh Dickins
  -1 siblings, 0 replies; 29+ messages in thread
From: Hugh Dickins @ 2015-03-26  0:47 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Andrew Morton, Hugh Dickins, KOSAKI Motohiro, Andrea Arcangeli,
	Joern Engel, Jianguo Wu, Eric B Munson, David Rientjes, linux-mm,
	linux-kernel

On Wed, 22 Oct 2014, Davide Libenzi wrote:

> [Resending with proper CC list suggested by Andrew]

I have recently been reminded of this languishing in my inbox ;)
(along with many others that I won't get to answer so quickly).

And in turn it reminds me of an older from Joern, who was annoyed
that he couldn't mmap a hugetlbfs file with MAP_HUGETLB.

Cc'ing more people, including Eric, the father of MAP_HUGETLB.

> 
> Calling munmap on a MAP_HUGETLB area, and a size which is not 2MB aligned, 
> causes munmap to fail.  Tested on 3.13.x but tracking back to 3.2.x.

When you say "tracking back to 3.2.x", I think you mean you've tried as
far back as 3.2.x and found the same behaviour, but not tried further?

>From the source, it looks like this is unchanged since MAP_HUGETLB was
introduced in 2.6.32.  And is the same behaviour as you've been given
with hugetlbfs since it arrived in 2.5.46.

> In do_munmap() we forcibly want a 4KB default page, and we wrongly 
> calculate the end of the map.  Since the calculated end is within the end 
> address of the target vma, we try to do a split with an address right in 
> the middle of a huge page, which would fail with EINVAL.
> 
> Tentative (untested) patch and test case attached (be sure you have a few 
> huge pages available via /proc/sys/vm/nr_hugepages tinkering).
> 
> 
> Signed-Off-By: Davide Libenzi <davidel@xmailserver.org>

The patch looks to me as if it will do what you want, and I agree
that it's displeasing that you can mmap a size, and then fail to
munmap that same size.

But I tend to think that's simply typical of the clunkiness we offer
you with hugetlbfs and MAP_HUGETLB: that it would have been better to
make a different choice all those years ago, but wrong to change the
user interface now.

Perhaps others will disagree.  And if I'm wrong, and the behaviour
got somehow changed in 3.2, then that's a different story and we
should fix it back.

Hugh

> 
> 
> - Davide
> 
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 7f85520..6dba257 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2528,10 +2528,6 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
>  	if ((start & ~PAGE_MASK) || start > TASK_SIZE || len > TASK_SIZE-start)
>  		return -EINVAL;
>  
> -	len = PAGE_ALIGN(len);
> -	if (len == 0)
> -		return -EINVAL;
> -
>  	/* Find the first overlapping VMA */
>  	vma = find_vma(mm, start);
>  	if (!vma)
> @@ -2539,6 +2535,16 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
>  	prev = vma->vm_prev;
>  	/* we have  start < vma->vm_end  */
>  
> +	if (likely(!is_vm_hugetlb_page(vma)))
> +		len = PAGE_ALIGN(len);
> +	else {
> +		unsigned long hpage_size = huge_page_size(hstate_vma(vma));
> +
> +		len = ALIGN(len, hpage_size);
> +	}
> +	if (unlikely(len == 0))
> +		return -EINVAL;
> +
>  	/* if it doesn't overlap, we have nothing.. */
>  	end = start + len;
>  	if (vma->vm_start >= end)
> 
> 
> 
> 
> [hugebug.c]
> 
> #include <sys/mman.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <errno.h>
> 
> static void test(int flags, size_t size)
> {
>     void* addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
>                       flags | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> 
>     if (addr == MAP_FAILED)
>     {
>         perror("mmap");
>         exit(1);
>     }
>     *(char*) addr = 17;
> 
>     if (munmap(addr, size) != 0)
>     {
>         perror("munmap");
>         exit(1);
>     }
> }
> 
> int main(int ac, const char** av)
> {
>     static const size_t hugepage_size = 2 * 1024 * 1024;
> 
>     printf("Testing normal pages with 2MB size ...\n");
>     test(0, hugepage_size);
>     printf("OK\n");
> 
>     printf("Testing huge pages with 2MB size ...\n");
>     test(MAP_HUGETLB, hugepage_size);
>     printf("OK\n");
> 
> 
>     printf("Testing normal pages with 4KB byte size ...\n");
>     test(0, 4096);
>     printf("OK\n");
> 
>     printf("Testing huge pages with 4KB byte size ...\n");
>     test(MAP_HUGETLB, 4096);
>     printf("OK\n");
> 
> 
>     printf("Testing normal pages with 1 byte size ...\n");
>     test(0, 1);
>     printf("OK\n");
> 
>     printf("Testing huge pages with 1 byte size ...\n");
>     test(MAP_HUGETLB, 1);
>     printf("OK\n");
> 
>     return 0;
> }

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
@ 2015-03-26  0:47   ` Hugh Dickins
  0 siblings, 0 replies; 29+ messages in thread
From: Hugh Dickins @ 2015-03-26  0:47 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Andrew Morton, Hugh Dickins, KOSAKI Motohiro, Andrea Arcangeli,
	Joern Engel, Jianguo Wu, Eric B Munson, David Rientjes, linux-mm,
	linux-kernel

On Wed, 22 Oct 2014, Davide Libenzi wrote:

> [Resending with proper CC list suggested by Andrew]

I have recently been reminded of this languishing in my inbox ;)
(along with many others that I won't get to answer so quickly).

And in turn it reminds me of an older from Joern, who was annoyed
that he couldn't mmap a hugetlbfs file with MAP_HUGETLB.

Cc'ing more people, including Eric, the father of MAP_HUGETLB.

> 
> Calling munmap on a MAP_HUGETLB area, and a size which is not 2MB aligned, 
> causes munmap to fail.  Tested on 3.13.x but tracking back to 3.2.x.

When you say "tracking back to 3.2.x", I think you mean you've tried as
far back as 3.2.x and found the same behaviour, but not tried further?

>From the source, it looks like this is unchanged since MAP_HUGETLB was
introduced in 2.6.32.  And is the same behaviour as you've been given
with hugetlbfs since it arrived in 2.5.46.

> In do_munmap() we forcibly want a 4KB default page, and we wrongly 
> calculate the end of the map.  Since the calculated end is within the end 
> address of the target vma, we try to do a split with an address right in 
> the middle of a huge page, which would fail with EINVAL.
> 
> Tentative (untested) patch and test case attached (be sure you have a few 
> huge pages available via /proc/sys/vm/nr_hugepages tinkering).
> 
> 
> Signed-Off-By: Davide Libenzi <davidel@xmailserver.org>

The patch looks to me as if it will do what you want, and I agree
that it's displeasing that you can mmap a size, and then fail to
munmap that same size.

But I tend to think that's simply typical of the clunkiness we offer
you with hugetlbfs and MAP_HUGETLB: that it would have been better to
make a different choice all those years ago, but wrong to change the
user interface now.

Perhaps others will disagree.  And if I'm wrong, and the behaviour
got somehow changed in 3.2, then that's a different story and we
should fix it back.

Hugh

> 
> 
> - Davide
> 
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 7f85520..6dba257 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2528,10 +2528,6 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
>  	if ((start & ~PAGE_MASK) || start > TASK_SIZE || len > TASK_SIZE-start)
>  		return -EINVAL;
>  
> -	len = PAGE_ALIGN(len);
> -	if (len == 0)
> -		return -EINVAL;
> -
>  	/* Find the first overlapping VMA */
>  	vma = find_vma(mm, start);
>  	if (!vma)
> @@ -2539,6 +2535,16 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
>  	prev = vma->vm_prev;
>  	/* we have  start < vma->vm_end  */
>  
> +	if (likely(!is_vm_hugetlb_page(vma)))
> +		len = PAGE_ALIGN(len);
> +	else {
> +		unsigned long hpage_size = huge_page_size(hstate_vma(vma));
> +
> +		len = ALIGN(len, hpage_size);
> +	}
> +	if (unlikely(len == 0))
> +		return -EINVAL;
> +
>  	/* if it doesn't overlap, we have nothing.. */
>  	end = start + len;
>  	if (vma->vm_start >= end)
> 
> 
> 
> 
> [hugebug.c]
> 
> #include <sys/mman.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <errno.h>
> 
> static void test(int flags, size_t size)
> {
>     void* addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
>                       flags | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> 
>     if (addr == MAP_FAILED)
>     {
>         perror("mmap");
>         exit(1);
>     }
>     *(char*) addr = 17;
> 
>     if (munmap(addr, size) != 0)
>     {
>         perror("munmap");
>         exit(1);
>     }
> }
> 
> int main(int ac, const char** av)
> {
>     static const size_t hugepage_size = 2 * 1024 * 1024;
> 
>     printf("Testing normal pages with 2MB size ...\n");
>     test(0, hugepage_size);
>     printf("OK\n");
> 
>     printf("Testing huge pages with 2MB size ...\n");
>     test(MAP_HUGETLB, hugepage_size);
>     printf("OK\n");
> 
> 
>     printf("Testing normal pages with 4KB byte size ...\n");
>     test(0, 4096);
>     printf("OK\n");
> 
>     printf("Testing huge pages with 4KB byte size ...\n");
>     test(MAP_HUGETLB, 4096);
>     printf("OK\n");
> 
> 
>     printf("Testing normal pages with 1 byte size ...\n");
>     test(0, 1);
>     printf("OK\n");
> 
>     printf("Testing huge pages with 1 byte size ...\n");
>     test(MAP_HUGETLB, 1);
>     printf("OK\n");
> 
>     return 0;
> }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
  2015-03-26  0:47   ` Hugh Dickins
@ 2015-03-26  1:06     ` Davide Libenzi
  -1 siblings, 0 replies; 29+ messages in thread
From: Davide Libenzi @ 2015-03-26  1:06 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, KOSAKI Motohiro, Andrea Arcangeli, Joern Engel,
	Jianguo Wu, Eric B Munson, David Rientjes, linux-mm,
	Linux Kernel Mailing List

On Wed, 25 Mar 2015, Hugh Dickins wrote:

> When you say "tracking back to 3.2.x", I think you mean you've tried as
> far back as 3.2.x and found the same behaviour, but not tried further?
> 
> From the source, it looks like this is unchanged since MAP_HUGETLB was
> introduced in 2.6.32.  And is the same behaviour as you've been given
> with hugetlbfs since it arrived in 2.5.46.

Went back checking the application logs, an I had to patch the code (the 
application one - to align size on munmap()) on May 2014.
But it might be we started noticing it at that time, because before the 
allocation pattern was simply monotonic, so it could be it was always 
there.
The bug test app is ten lines of code, to verify that.


> The patch looks to me as if it will do what you want, and I agree
> that it's displeasing that you can mmap a size, and then fail to
> munmap that same size.
> 
> But I tend to think that's simply typical of the clunkiness we offer
> you with hugetlbfs and MAP_HUGETLB: that it would have been better to
> make a different choice all those years ago, but wrong to change the
> user interface now.
> 
> Perhaps others will disagree.  And if I'm wrong, and the behaviour
> got somehow changed in 3.2, then that's a different story and we
> should fix it back.

This is not an interface change, in the sense old clients will continue to 
work.
This is simply respecting the mmap(2) POSIX specification, for a feature 
which has been exposed via mmap(2).



- Davide



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
@ 2015-03-26  1:06     ` Davide Libenzi
  0 siblings, 0 replies; 29+ messages in thread
From: Davide Libenzi @ 2015-03-26  1:06 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, KOSAKI Motohiro, Andrea Arcangeli, Joern Engel,
	Jianguo Wu, Eric B Munson, David Rientjes, linux-mm,
	Linux Kernel Mailing List

On Wed, 25 Mar 2015, Hugh Dickins wrote:

> When you say "tracking back to 3.2.x", I think you mean you've tried as
> far back as 3.2.x and found the same behaviour, but not tried further?
> 
> From the source, it looks like this is unchanged since MAP_HUGETLB was
> introduced in 2.6.32.  And is the same behaviour as you've been given
> with hugetlbfs since it arrived in 2.5.46.

Went back checking the application logs, an I had to patch the code (the 
application one - to align size on munmap()) on May 2014.
But it might be we started noticing it at that time, because before the 
allocation pattern was simply monotonic, so it could be it was always 
there.
The bug test app is ten lines of code, to verify that.


> The patch looks to me as if it will do what you want, and I agree
> that it's displeasing that you can mmap a size, and then fail to
> munmap that same size.
> 
> But I tend to think that's simply typical of the clunkiness we offer
> you with hugetlbfs and MAP_HUGETLB: that it would have been better to
> make a different choice all those years ago, but wrong to change the
> user interface now.
> 
> Perhaps others will disagree.  And if I'm wrong, and the behaviour
> got somehow changed in 3.2, then that's a different story and we
> should fix it back.

This is not an interface change, in the sense old clients will continue to 
work.
This is simply respecting the mmap(2) POSIX specification, for a feature 
which has been exposed via mmap(2).



- Davide


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
  2015-03-26  1:06     ` Davide Libenzi
@ 2015-03-26  3:17       ` David Rientjes
  -1 siblings, 0 replies; 29+ messages in thread
From: David Rientjes @ 2015-03-26  3:17 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Hugh Dickins, Andrew Morton, KOSAKI Motohiro, Andrea Arcangeli,
	Joern Engel, Jianguo Wu, Eric B Munson, linux-mm,
	Linux Kernel Mailing List

On Wed, 25 Mar 2015, Davide Libenzi wrote:

> > When you say "tracking back to 3.2.x", I think you mean you've tried as
> > far back as 3.2.x and found the same behaviour, but not tried further?
> > 
> > From the source, it looks like this is unchanged since MAP_HUGETLB was
> > introduced in 2.6.32.  And is the same behaviour as you've been given
> > with hugetlbfs since it arrived in 2.5.46.
> 
> Went back checking the application logs, an I had to patch the code (the 
> application one - to align size on munmap()) on May 2014.
> But it might be we started noticing it at that time, because before the 
> allocation pattern was simply monotonic, so it could be it was always 
> there.
> The bug test app is ten lines of code, to verify that.
> 

I looked at this thread at http://marc.info/?t=141392508800001 since I 
didn't have it in my mailbox, and I didn't get a chance to actually run 
your test code.

In short, I think what you're saying is that

	ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...)
	munmap(ptr, 4KB) == EINVAL

> > The patch looks to me as if it will do what you want, and I agree
> > that it's displeasing that you can mmap a size, and then fail to
> > munmap that same size.
> > 
> > But I tend to think that's simply typical of the clunkiness we offer
> > you with hugetlbfs and MAP_HUGETLB: that it would have been better to
> > make a different choice all those years ago, but wrong to change the
> > user interface now.
> > 
> > Perhaps others will disagree.  And if I'm wrong, and the behaviour
> > got somehow changed in 3.2, then that's a different story and we
> > should fix it back.
> 
> This is not an interface change, in the sense old clients will continue to 
> work.
> This is simply respecting the mmap(2) POSIX specification, for a feature 
> which has been exposed via mmap(2).
> 

Respecting the mmap(2) POSIX specification?  I don't think 
mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...) mapping 2MB violates 
POSIX.1-2001 and not only because it obviously doesn't address 
MAP_HUGETLB, but I don't think the spec says the system cannot map more 
memory than len.

Using MAP_HUGETLB is really more a library function than anything else 
since you could easily implement the same behavior in a library.  That 
function includes aligning len to the hugepage size, so doing

	ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...)

is the equivalent to doing

	ptr = mmap(..., hugepage_size, ..., MAP_HUGETLB | ..., ...)

and that doesn't violate any spec.  But your patch doesn't change mmap() 
at all, so let's forget about that.

The question you pose is whether munmap(ptr, 4KB) should succeed for a 
hugetlb vma and in your patch you align this to the hugepage size of the 
vma in the same manner that munmap(ptr, 2KB) would be aligned to PAGE_SIZE 
for a non-hugetlb vma.

The munmap() spec says the whole pages that include any part of the passed 
length should be unmapped.  In spirit, I would agree with you that the 
page size for the vma is the hugepage size so that would be what would be 
unmapped.

But that's going by a spec that doesn't address hugepages and is worded in 
a way that {PAGE_SIZE} is the base unit that both mmap() and munmap() is 
done.  It carries no notion of variable page sizes and how hugepages 
should be handled with respect to pages of {PAGE_SIZE} length.  So I think 
this is beyond the scope of the spec: any length is aligned to PAGE_SIZE, 
but the munmap() behavior for hugetlb vmas is not restricted.

It would seem too dangerous at this point to change the behavior of 
munmap(ptr, 4KB) on a hugetlb vma and that userspace bugs could actually 
arise from aligning to the hugepage size.

Some applications purposefully reserve hugetlb pages by mmap() and never 
munmap() them so they have exclusive access to hugepages that were 
allocated either at boot or runtime by the sysadmin.  If they depend on 
the return value of munmap() to determine if memory to free is memory 
dynamically allocated by the application or reserved as hugetlb memory, 
then this would cause them to break.  I can't say for certain that no such 
application exists.

Since hugetlb memory is beyond the scope of the POSIX.1-2001 munmap() 
specification, and there's a potential userspace breakage if the length 
becomes hugepage aligned, I think the do_unmap() implementation is correct 
as it stands.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
@ 2015-03-26  3:17       ` David Rientjes
  0 siblings, 0 replies; 29+ messages in thread
From: David Rientjes @ 2015-03-26  3:17 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Hugh Dickins, Andrew Morton, KOSAKI Motohiro, Andrea Arcangeli,
	Joern Engel, Jianguo Wu, Eric B Munson, linux-mm,
	Linux Kernel Mailing List

On Wed, 25 Mar 2015, Davide Libenzi wrote:

> > When you say "tracking back to 3.2.x", I think you mean you've tried as
> > far back as 3.2.x and found the same behaviour, but not tried further?
> > 
> > From the source, it looks like this is unchanged since MAP_HUGETLB was
> > introduced in 2.6.32.  And is the same behaviour as you've been given
> > with hugetlbfs since it arrived in 2.5.46.
> 
> Went back checking the application logs, an I had to patch the code (the 
> application one - to align size on munmap()) on May 2014.
> But it might be we started noticing it at that time, because before the 
> allocation pattern was simply monotonic, so it could be it was always 
> there.
> The bug test app is ten lines of code, to verify that.
> 

I looked at this thread at http://marc.info/?t=141392508800001 since I 
didn't have it in my mailbox, and I didn't get a chance to actually run 
your test code.

In short, I think what you're saying is that

	ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...)
	munmap(ptr, 4KB) == EINVAL

> > The patch looks to me as if it will do what you want, and I agree
> > that it's displeasing that you can mmap a size, and then fail to
> > munmap that same size.
> > 
> > But I tend to think that's simply typical of the clunkiness we offer
> > you with hugetlbfs and MAP_HUGETLB: that it would have been better to
> > make a different choice all those years ago, but wrong to change the
> > user interface now.
> > 
> > Perhaps others will disagree.  And if I'm wrong, and the behaviour
> > got somehow changed in 3.2, then that's a different story and we
> > should fix it back.
> 
> This is not an interface change, in the sense old clients will continue to 
> work.
> This is simply respecting the mmap(2) POSIX specification, for a feature 
> which has been exposed via mmap(2).
> 

Respecting the mmap(2) POSIX specification?  I don't think 
mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...) mapping 2MB violates 
POSIX.1-2001 and not only because it obviously doesn't address 
MAP_HUGETLB, but I don't think the spec says the system cannot map more 
memory than len.

Using MAP_HUGETLB is really more a library function than anything else 
since you could easily implement the same behavior in a library.  That 
function includes aligning len to the hugepage size, so doing

	ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...)

is the equivalent to doing

	ptr = mmap(..., hugepage_size, ..., MAP_HUGETLB | ..., ...)

and that doesn't violate any spec.  But your patch doesn't change mmap() 
at all, so let's forget about that.

The question you pose is whether munmap(ptr, 4KB) should succeed for a 
hugetlb vma and in your patch you align this to the hugepage size of the 
vma in the same manner that munmap(ptr, 2KB) would be aligned to PAGE_SIZE 
for a non-hugetlb vma.

The munmap() spec says the whole pages that include any part of the passed 
length should be unmapped.  In spirit, I would agree with you that the 
page size for the vma is the hugepage size so that would be what would be 
unmapped.

But that's going by a spec that doesn't address hugepages and is worded in 
a way that {PAGE_SIZE} is the base unit that both mmap() and munmap() is 
done.  It carries no notion of variable page sizes and how hugepages 
should be handled with respect to pages of {PAGE_SIZE} length.  So I think 
this is beyond the scope of the spec: any length is aligned to PAGE_SIZE, 
but the munmap() behavior for hugetlb vmas is not restricted.

It would seem too dangerous at this point to change the behavior of 
munmap(ptr, 4KB) on a hugetlb vma and that userspace bugs could actually 
arise from aligning to the hugepage size.

Some applications purposefully reserve hugetlb pages by mmap() and never 
munmap() them so they have exclusive access to hugepages that were 
allocated either at boot or runtime by the sysadmin.  If they depend on 
the return value of munmap() to determine if memory to free is memory 
dynamically allocated by the application or reserved as hugetlb memory, 
then this would cause them to break.  I can't say for certain that no such 
application exists.

Since hugetlb memory is beyond the scope of the POSIX.1-2001 munmap() 
specification, and there's a potential userspace breakage if the length 
becomes hugepage aligned, I think the do_unmap() implementation is correct 
as it stands.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
  2015-03-26  3:17       ` David Rientjes
@ 2015-03-26 11:56         ` Davide Libenzi
  -1 siblings, 0 replies; 29+ messages in thread
From: Davide Libenzi @ 2015-03-26 11:56 UTC (permalink / raw)
  To: David Rientjes
  Cc: Hugh Dickins, Andrew Morton, KOSAKI Motohiro, Andrea Arcangeli,
	Joern Engel, Jianguo Wu, Eric B Munson, linux-mm,
	Linux Kernel Mailing List

On Wed, 25 Mar 2015, David Rientjes wrote:

> I looked at this thread at http://marc.info/?t=141392508800001 since I 
> didn't have it in my mailbox, and I didn't get a chance to actually run 
> your test code.
> 
> In short, I think what you're saying is that
> 
> 	ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...)
> 	munmap(ptr, 4KB) == EINVAL

I am not sure you have read the email correctly:

munmap(mmap(size, HUGETLB), size) = EFAIL

For every size not multiple of the huge page size.
Whereas:

munmap(mmap(size, HUGETLB), ALIGN(size, HUGEPAGE_SIZE)) = OK


> Respecting the mmap(2) POSIX specification?  I don't think 
> mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...) mapping 2MB violates 
> POSIX.1-2001 and not only because it obviously doesn't address 
> MAP_HUGETLB, but I don't think the spec says the system cannot map more 
> memory than len.
> 
> Using MAP_HUGETLB is really more a library function than anything else 
> since you could easily implement the same behavior in a library.  That 
> function includes aligning len to the hugepage size, so doing
> 
> 	ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...)
> 
> is the equivalent to doing
> 
> 	ptr = mmap(..., hugepage_size, ..., MAP_HUGETLB | ..., ...)
> 
> and that doesn't violate any spec.  But your patch doesn't change mmap() 
> at all, so let's forget about that.

That is what every mmap() implementation does, irrespectively of any page 
size. And that is also what the POSIX spec states.
The size will be automatically rounded up to a multiple of the underline 
physical page size.
The problem is not mmap() though, in this case.


> The question you pose is whether munmap(ptr, 4KB) should succeed for a 
> hugetlb vma and in your patch you align this to the hugepage size of the 
> vma in the same manner that munmap(ptr, 2KB) would be aligned to PAGE_SIZE 
> for a non-hugetlb vma.
> 
> The munmap() spec says the whole pages that include any part of the passed 
> length should be unmapped.  In spirit, I would agree with you that the 
> page size for the vma is the hugepage size so that would be what would be 
> unmapped.
> 
> But that's going by a spec that doesn't address hugepages and is worded in 
> a way that {PAGE_SIZE} is the base unit that both mmap() and munmap() is 
> done.  It carries no notion of variable page sizes and how hugepages 
> should be handled with respect to pages of {PAGE_SIZE} length.  So I think 
> this is beyond the scope of the spec: any length is aligned to PAGE_SIZE, 
> but the munmap() behavior for hugetlb vmas is not restricted.
> 
> It would seem too dangerous at this point to change the behavior of 
> munmap(ptr, 4KB) on a hugetlb vma and that userspace bugs could actually 
> arise from aligning to the hugepage size.

You mean, there is an harder failure than the current failure? :)


> Some applications purposefully reserve hugetlb pages by mmap() and never 
> munmap() them so they have exclusive access to hugepages that were 
> allocated either at boot or runtime by the sysadmin.  If they depend on 
> the return value of munmap() to determine if memory to free is memory 
> dynamically allocated by the application or reserved as hugetlb memory, 
> then this would cause them to break.  I can't say for certain that no such 
> application exists.

The fact that certain applications will seldomly call an API, should be no 
reason for API to have bugs, or at the very least, a bahviour which not 
only in not documented in the man pages, but also totally unrespectful of 
the normal mmap/munmap semantics.
Again, the scenario that you are picturing, is one where an application 
relies on a permanent (that is what it is - it always fails unless the 
munmap size is multiple than huge page size) failure of munmap, to do some 
productive task.
An munmap() of huge page aligned size, will succeed in both case (vanilla, 
and patch).


> Since hugetlb memory is beyond the scope of the POSIX.1-2001 munmap() 
> specification, and there's a potential userspace breakage if the length 
> becomes hugepage aligned, I think the do_unmap() implementation is correct 
> as it stands.

If the length is huge page aligned, it will be working with or without 
patch applied.
The problem is for the other 2097151 out of 2097152 cases, where length is 
not indeed aligned to 2MB (or whatever hugepage size is for the 
architecture).



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
@ 2015-03-26 11:56         ` Davide Libenzi
  0 siblings, 0 replies; 29+ messages in thread
From: Davide Libenzi @ 2015-03-26 11:56 UTC (permalink / raw)
  To: David Rientjes
  Cc: Hugh Dickins, Andrew Morton, KOSAKI Motohiro, Andrea Arcangeli,
	Joern Engel, Jianguo Wu, Eric B Munson, linux-mm,
	Linux Kernel Mailing List

On Wed, 25 Mar 2015, David Rientjes wrote:

> I looked at this thread at http://marc.info/?t=141392508800001 since I 
> didn't have it in my mailbox, and I didn't get a chance to actually run 
> your test code.
> 
> In short, I think what you're saying is that
> 
> 	ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...)
> 	munmap(ptr, 4KB) == EINVAL

I am not sure you have read the email correctly:

munmap(mmap(size, HUGETLB), size) = EFAIL

For every size not multiple of the huge page size.
Whereas:

munmap(mmap(size, HUGETLB), ALIGN(size, HUGEPAGE_SIZE)) = OK


> Respecting the mmap(2) POSIX specification?  I don't think 
> mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...) mapping 2MB violates 
> POSIX.1-2001 and not only because it obviously doesn't address 
> MAP_HUGETLB, but I don't think the spec says the system cannot map more 
> memory than len.
> 
> Using MAP_HUGETLB is really more a library function than anything else 
> since you could easily implement the same behavior in a library.  That 
> function includes aligning len to the hugepage size, so doing
> 
> 	ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...)
> 
> is the equivalent to doing
> 
> 	ptr = mmap(..., hugepage_size, ..., MAP_HUGETLB | ..., ...)
> 
> and that doesn't violate any spec.  But your patch doesn't change mmap() 
> at all, so let's forget about that.

That is what every mmap() implementation does, irrespectively of any page 
size. And that is also what the POSIX spec states.
The size will be automatically rounded up to a multiple of the underline 
physical page size.
The problem is not mmap() though, in this case.


> The question you pose is whether munmap(ptr, 4KB) should succeed for a 
> hugetlb vma and in your patch you align this to the hugepage size of the 
> vma in the same manner that munmap(ptr, 2KB) would be aligned to PAGE_SIZE 
> for a non-hugetlb vma.
> 
> The munmap() spec says the whole pages that include any part of the passed 
> length should be unmapped.  In spirit, I would agree with you that the 
> page size for the vma is the hugepage size so that would be what would be 
> unmapped.
> 
> But that's going by a spec that doesn't address hugepages and is worded in 
> a way that {PAGE_SIZE} is the base unit that both mmap() and munmap() is 
> done.  It carries no notion of variable page sizes and how hugepages 
> should be handled with respect to pages of {PAGE_SIZE} length.  So I think 
> this is beyond the scope of the spec: any length is aligned to PAGE_SIZE, 
> but the munmap() behavior for hugetlb vmas is not restricted.
> 
> It would seem too dangerous at this point to change the behavior of 
> munmap(ptr, 4KB) on a hugetlb vma and that userspace bugs could actually 
> arise from aligning to the hugepage size.

You mean, there is an harder failure than the current failure? :)


> Some applications purposefully reserve hugetlb pages by mmap() and never 
> munmap() them so they have exclusive access to hugepages that were 
> allocated either at boot or runtime by the sysadmin.  If they depend on 
> the return value of munmap() to determine if memory to free is memory 
> dynamically allocated by the application or reserved as hugetlb memory, 
> then this would cause them to break.  I can't say for certain that no such 
> application exists.

The fact that certain applications will seldomly call an API, should be no 
reason for API to have bugs, or at the very least, a bahviour which not 
only in not documented in the man pages, but also totally unrespectful of 
the normal mmap/munmap semantics.
Again, the scenario that you are picturing, is one where an application 
relies on a permanent (that is what it is - it always fails unless the 
munmap size is multiple than huge page size) failure of munmap, to do some 
productive task.
An munmap() of huge page aligned size, will succeed in both case (vanilla, 
and patch).


> Since hugetlb memory is beyond the scope of the POSIX.1-2001 munmap() 
> specification, and there's a potential userspace breakage if the length 
> becomes hugepage aligned, I think the do_unmap() implementation is correct 
> as it stands.

If the length is huge page aligned, it will be working with or without 
patch applied.
The problem is for the other 2097151 out of 2097152 cases, where length is 
not indeed aligned to 2MB (or whatever hugepage size is for the 
architecture).


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
  2015-03-26 11:56         ` Davide Libenzi
@ 2015-03-26 14:08           ` Eric B Munson
  -1 siblings, 0 replies; 29+ messages in thread
From: Eric B Munson @ 2015-03-26 14:08 UTC (permalink / raw)
  To: Davide Libenzi, David Rientjes
  Cc: Hugh Dickins, Andrew Morton, KOSAKI Motohiro, Andrea Arcangeli,
	Joern Engel, Jianguo Wu, linux-mm, Linux Kernel Mailing List

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 03/26/2015 07:56 AM, Davide Libenzi wrote:
> On Wed, 25 Mar 2015, David Rientjes wrote:
> 
>> I looked at this thread at http://marc.info/?t=141392508800001
>> since I didn't have it in my mailbox, and I didn't get a chance
>> to actually run your test code.
>> 
>> In short, I think what you're saying is that
>> 
>> ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...) munmap(ptr,
>> 4KB) == EINVAL
> 
> I am not sure you have read the email correctly:
> 
> munmap(mmap(size, HUGETLB), size) = EFAIL
> 
> For every size not multiple of the huge page size. Whereas:
> 
> munmap(mmap(size, HUGETLB), ALIGN(size, HUGEPAGE_SIZE)) = OK

I think Davide is right here, this is a long existing bug in the
MAP_HUGETLB implementation.  Specifically, the mmap man page says:

All pages containing a part of the indicated range are unmapped, and
subsequent references to these pages will generate SIGSEGV.

I realize that huge pages may not have been considered by those that
wrote the spec.  But if I read this I would assume that all pages,
regardless of size, touched by the munmap() request should be unmapped.

Please include
Acked-by: Eric B Munson <emunson@akamai.com>
to the original patch.  I would like to see the mmap man page adjusted
to make note of this behavior as well.

> 
> 
>> Respecting the mmap(2) POSIX specification?  I don't think 
>> mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...) mapping 2MB violates
>>  POSIX.1-2001 and not only because it obviously doesn't address 
>> MAP_HUGETLB, but I don't think the spec says the system cannot
>> map more memory than len.
>> 
>> Using MAP_HUGETLB is really more a library function than anything
>> else since you could easily implement the same behavior in a
>> library.  That function includes aligning len to the hugepage
>> size, so doing
>> 
>> ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...)
>> 
>> is the equivalent to doing
>> 
>> ptr = mmap(..., hugepage_size, ..., MAP_HUGETLB | ..., ...)
>> 
>> and that doesn't violate any spec.  But your patch doesn't change
>> mmap() at all, so let's forget about that.
> 
> That is what every mmap() implementation does, irrespectively of
> any page size. And that is also what the POSIX spec states. The
> size will be automatically rounded up to a multiple of the
> underline physical page size. The problem is not mmap() though, in
> this case.
> 
> 
>> The question you pose is whether munmap(ptr, 4KB) should succeed
>> for a hugetlb vma and in your patch you align this to the
>> hugepage size of the vma in the same manner that munmap(ptr, 2KB)
>> would be aligned to PAGE_SIZE for a non-hugetlb vma.
>> 
>> The munmap() spec says the whole pages that include any part of
>> the passed length should be unmapped.  In spirit, I would agree
>> with you that the page size for the vma is the hugepage size so
>> that would be what would be unmapped.
>> 
>> But that's going by a spec that doesn't address hugepages and is
>> worded in a way that {PAGE_SIZE} is the base unit that both
>> mmap() and munmap() is done.  It carries no notion of variable
>> page sizes and how hugepages should be handled with respect to
>> pages of {PAGE_SIZE} length.  So I think this is beyond the scope
>> of the spec: any length is aligned to PAGE_SIZE, but the munmap()
>> behavior for hugetlb vmas is not restricted.
>> 
>> It would seem too dangerous at this point to change the behavior
>> of munmap(ptr, 4KB) on a hugetlb vma and that userspace bugs
>> could actually arise from aligning to the hugepage size.
> 
> You mean, there is an harder failure than the current failure? :)
> 
> 
>> Some applications purposefully reserve hugetlb pages by mmap()
>> and never munmap() them so they have exclusive access to
>> hugepages that were allocated either at boot or runtime by the
>> sysadmin.  If they depend on the return value of munmap() to
>> determine if memory to free is memory dynamically allocated by
>> the application or reserved as hugetlb memory, then this would
>> cause them to break.  I can't say for certain that no such 
>> application exists.
> 
> The fact that certain applications will seldomly call an API,
> should be no reason for API to have bugs, or at the very least, a
> bahviour which not only in not documented in the man pages, but
> also totally unrespectful of the normal mmap/munmap semantics. 
> Again, the scenario that you are picturing, is one where an
> application relies on a permanent (that is what it is - it always
> fails unless the munmap size is multiple than huge page size)
> failure of munmap, to do some productive task. An munmap() of huge
> page aligned size, will succeed in both case (vanilla, and patch).
> 
> 
>> Since hugetlb memory is beyond the scope of the POSIX.1-2001
>> munmap() specification, and there's a potential userspace
>> breakage if the length becomes hugepage aligned, I think the
>> do_unmap() implementation is correct as it stands.
> 
> If the length is huge page aligned, it will be working with or
> without patch applied. The problem is for the other 2097151 out of
> 2097152 cases, where length is not indeed aligned to 2MB (or
> whatever hugepage size is for the architecture).
> 
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQIcBAEBAgAGBQJVFBL4AAoJELbVsDOpoOa9KoIQAIi/gAmfaQMvv9GEskFMM8Su
x3RKDf7ldHfDYbrwqKBllp/Hp3hrF0IZM5PFvq7edOEAYbgSkhIQAu0QdI2RdVlU
iKCjk4/RI2sAtNjHXZ7XrPSipBhWmbWc83cCTo38ZCbaWRwsiZ0p5t7U/K1I35tb
IKJPu8vTU3d1VupNp2Fse7VD/ImwO2HWMPXCkTT8KUcyVVKLWGQ37cSsN4jPbkQP
yqHhpzGX9P8Qkh0Hs6BFl6kjRheIKIhTD4o8Yq5kJGgXH6O8r09riMIquhogCru+
TFFVW97zjgyjRqSYE3GHu2oCdc4LLuAoRxxDMzaRqcw1C3nqKmtjB3wJEf/Pcina
UcbTlqdDjDAHy6adNb06k2q2WNNn3CdoAQIRs/mjSvdDMN+gh7TDWKMXqtv4AODc
3xedLGL2bidHCzmgrxDU1hkRjMR8DoW2MCayQoOSIpOwVhXeA2koNbgHpJD3Gboh
0y9FvLNoXMQkBi8eksatfT/kT4xn25F7OMwdP5u0euGidXsOSLz4p/87bBJpCzmr
CptQ/V+T7HgMglby8fLfZK9GE5CuwskMzrevF2cUAhyVIkXoiItD6FAh9v6SOjbh
jy4Ctq2xkCbAKhsmrUnoUOVRNlnJ8m0I9Eq1gyWHy6qU3UDmY6XBXbJEPERRbn2T
MZfuVLJLSR4Nrm7fdHoL
=C3GM
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
@ 2015-03-26 14:08           ` Eric B Munson
  0 siblings, 0 replies; 29+ messages in thread
From: Eric B Munson @ 2015-03-26 14:08 UTC (permalink / raw)
  To: Davide Libenzi, David Rientjes
  Cc: Hugh Dickins, Andrew Morton, KOSAKI Motohiro, Andrea Arcangeli,
	Joern Engel, Jianguo Wu, linux-mm, Linux Kernel Mailing List

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 03/26/2015 07:56 AM, Davide Libenzi wrote:
> On Wed, 25 Mar 2015, David Rientjes wrote:
> 
>> I looked at this thread at http://marc.info/?t=141392508800001
>> since I didn't have it in my mailbox, and I didn't get a chance
>> to actually run your test code.
>> 
>> In short, I think what you're saying is that
>> 
>> ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...) munmap(ptr,
>> 4KB) == EINVAL
> 
> I am not sure you have read the email correctly:
> 
> munmap(mmap(size, HUGETLB), size) = EFAIL
> 
> For every size not multiple of the huge page size. Whereas:
> 
> munmap(mmap(size, HUGETLB), ALIGN(size, HUGEPAGE_SIZE)) = OK

I think Davide is right here, this is a long existing bug in the
MAP_HUGETLB implementation.  Specifically, the mmap man page says:

All pages containing a part of the indicated range are unmapped, and
subsequent references to these pages will generate SIGSEGV.

I realize that huge pages may not have been considered by those that
wrote the spec.  But if I read this I would assume that all pages,
regardless of size, touched by the munmap() request should be unmapped.

Please include
Acked-by: Eric B Munson <emunson@akamai.com>
to the original patch.  I would like to see the mmap man page adjusted
to make note of this behavior as well.

> 
> 
>> Respecting the mmap(2) POSIX specification?  I don't think 
>> mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...) mapping 2MB violates
>>  POSIX.1-2001 and not only because it obviously doesn't address 
>> MAP_HUGETLB, but I don't think the spec says the system cannot
>> map more memory than len.
>> 
>> Using MAP_HUGETLB is really more a library function than anything
>> else since you could easily implement the same behavior in a
>> library.  That function includes aligning len to the hugepage
>> size, so doing
>> 
>> ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...)
>> 
>> is the equivalent to doing
>> 
>> ptr = mmap(..., hugepage_size, ..., MAP_HUGETLB | ..., ...)
>> 
>> and that doesn't violate any spec.  But your patch doesn't change
>> mmap() at all, so let's forget about that.
> 
> That is what every mmap() implementation does, irrespectively of
> any page size. And that is also what the POSIX spec states. The
> size will be automatically rounded up to a multiple of the
> underline physical page size. The problem is not mmap() though, in
> this case.
> 
> 
>> The question you pose is whether munmap(ptr, 4KB) should succeed
>> for a hugetlb vma and in your patch you align this to the
>> hugepage size of the vma in the same manner that munmap(ptr, 2KB)
>> would be aligned to PAGE_SIZE for a non-hugetlb vma.
>> 
>> The munmap() spec says the whole pages that include any part of
>> the passed length should be unmapped.  In spirit, I would agree
>> with you that the page size for the vma is the hugepage size so
>> that would be what would be unmapped.
>> 
>> But that's going by a spec that doesn't address hugepages and is
>> worded in a way that {PAGE_SIZE} is the base unit that both
>> mmap() and munmap() is done.  It carries no notion of variable
>> page sizes and how hugepages should be handled with respect to
>> pages of {PAGE_SIZE} length.  So I think this is beyond the scope
>> of the spec: any length is aligned to PAGE_SIZE, but the munmap()
>> behavior for hugetlb vmas is not restricted.
>> 
>> It would seem too dangerous at this point to change the behavior
>> of munmap(ptr, 4KB) on a hugetlb vma and that userspace bugs
>> could actually arise from aligning to the hugepage size.
> 
> You mean, there is an harder failure than the current failure? :)
> 
> 
>> Some applications purposefully reserve hugetlb pages by mmap()
>> and never munmap() them so they have exclusive access to
>> hugepages that were allocated either at boot or runtime by the
>> sysadmin.  If they depend on the return value of munmap() to
>> determine if memory to free is memory dynamically allocated by
>> the application or reserved as hugetlb memory, then this would
>> cause them to break.  I can't say for certain that no such 
>> application exists.
> 
> The fact that certain applications will seldomly call an API,
> should be no reason for API to have bugs, or at the very least, a
> bahviour which not only in not documented in the man pages, but
> also totally unrespectful of the normal mmap/munmap semantics. 
> Again, the scenario that you are picturing, is one where an
> application relies on a permanent (that is what it is - it always
> fails unless the munmap size is multiple than huge page size)
> failure of munmap, to do some productive task. An munmap() of huge
> page aligned size, will succeed in both case (vanilla, and patch).
> 
> 
>> Since hugetlb memory is beyond the scope of the POSIX.1-2001
>> munmap() specification, and there's a potential userspace
>> breakage if the length becomes hugepage aligned, I think the
>> do_unmap() implementation is correct as it stands.
> 
> If the length is huge page aligned, it will be working with or
> without patch applied. The problem is for the other 2097151 out of
> 2097152 cases, where length is not indeed aligned to 2MB (or
> whatever hugepage size is for the architecture).
> 
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQIcBAEBAgAGBQJVFBL4AAoJELbVsDOpoOa9KoIQAIi/gAmfaQMvv9GEskFMM8Su
x3RKDf7ldHfDYbrwqKBllp/Hp3hrF0IZM5PFvq7edOEAYbgSkhIQAu0QdI2RdVlU
iKCjk4/RI2sAtNjHXZ7XrPSipBhWmbWc83cCTo38ZCbaWRwsiZ0p5t7U/K1I35tb
IKJPu8vTU3d1VupNp2Fse7VD/ImwO2HWMPXCkTT8KUcyVVKLWGQ37cSsN4jPbkQP
yqHhpzGX9P8Qkh0Hs6BFl6kjRheIKIhTD4o8Yq5kJGgXH6O8r09riMIquhogCru+
TFFVW97zjgyjRqSYE3GHu2oCdc4LLuAoRxxDMzaRqcw1C3nqKmtjB3wJEf/Pcina
UcbTlqdDjDAHy6adNb06k2q2WNNn3CdoAQIRs/mjSvdDMN+gh7TDWKMXqtv4AODc
3xedLGL2bidHCzmgrxDU1hkRjMR8DoW2MCayQoOSIpOwVhXeA2koNbgHpJD3Gboh
0y9FvLNoXMQkBi8eksatfT/kT4xn25F7OMwdP5u0euGidXsOSLz4p/87bBJpCzmr
CptQ/V+T7HgMglby8fLfZK9GE5CuwskMzrevF2cUAhyVIkXoiItD6FAh9v6SOjbh
jy4Ctq2xkCbAKhsmrUnoUOVRNlnJ8m0I9Eq1gyWHy6qU3UDmY6XBXbJEPERRbn2T
MZfuVLJLSR4Nrm7fdHoL
=C3GM
-----END PGP SIGNATURE-----

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
  2015-03-26 11:56         ` Davide Libenzi
@ 2015-03-26 19:15           ` David Rientjes
  -1 siblings, 0 replies; 29+ messages in thread
From: David Rientjes @ 2015-03-26 19:15 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Hugh Dickins, Andrew Morton, KOSAKI Motohiro, Andrea Arcangeli,
	Joern Engel, Jianguo Wu, Eric B Munson, linux-mm,
	Linux Kernel Mailing List

On Thu, 26 Mar 2015, Davide Libenzi wrote:

> > I looked at this thread at http://marc.info/?t=141392508800001 since I 
> > didn't have it in my mailbox, and I didn't get a chance to actually run 
> > your test code.
> > 
> > In short, I think what you're saying is that
> > 
> > 	ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...)
> > 	munmap(ptr, 4KB) == EINVAL
> 
> I am not sure you have read the email correctly:
> 
> munmap(mmap(size, HUGETLB), size) = EFAIL
> 
> For every size not multiple of the huge page size.
> Whereas:
> 
> munmap(mmap(size, HUGETLB), ALIGN(size, HUGEPAGE_SIZE)) = OK
> 

Yes, I read it correctly, and wrote how your test case should have failed 
above.  It fails when you do the 4KB mmap() with MAP_HUGETLB and pass 4KB 
to munmap(), correct?

I have no idea what EFAIL is, though.

> > The question you pose is whether munmap(ptr, 4KB) should succeed for a 
> > hugetlb vma and in your patch you align this to the hugepage size of the 
> > vma in the same manner that munmap(ptr, 2KB) would be aligned to PAGE_SIZE 
> > for a non-hugetlb vma.
> > 
> > The munmap() spec says the whole pages that include any part of the passed 
> > length should be unmapped.  In spirit, I would agree with you that the 
> > page size for the vma is the hugepage size so that would be what would be 
> > unmapped.
> > 
> > But that's going by a spec that doesn't address hugepages and is worded in 
> > a way that {PAGE_SIZE} is the base unit that both mmap() and munmap() is 
> > done.  It carries no notion of variable page sizes and how hugepages 
> > should be handled with respect to pages of {PAGE_SIZE} length.  So I think 
> > this is beyond the scope of the spec: any length is aligned to PAGE_SIZE, 
> > but the munmap() behavior for hugetlb vmas is not restricted.
> > 
> > It would seem too dangerous at this point to change the behavior of 
> > munmap(ptr, 4KB) on a hugetlb vma and that userspace bugs could actually 
> > arise from aligning to the hugepage size.
> 
> You mean, there is an harder failure than the current failure? :)
> 

Yes, this munmap() behavior of lengths <= hugepage_size - PAGE_SIZE for a 
hugetlb vma is long standing and there may be applications that break as a 
result of changing the behavior: a database that reserves all allocated 
hugetlb memory with mmap() so that it always has exclusive access to those 
hugepages, whether they are faulted or not, and maintains its own hugepage 
pool (which is common), may test the return value of munmap() and depend 
on it returning -EINVAL to determine if it is freeing memory that was 
either dynamically allocated or mapped from the hugetlb reserved pool.

> > Some applications purposefully reserve hugetlb pages by mmap() and never 
> > munmap() them so they have exclusive access to hugepages that were 
> > allocated either at boot or runtime by the sysadmin.  If they depend on 
> > the return value of munmap() to determine if memory to free is memory 
> > dynamically allocated by the application or reserved as hugetlb memory, 
> > then this would cause them to break.  I can't say for certain that no such 
> > application exists.
> 
> The fact that certain applications will seldomly call an API, should be no 
> reason for API to have bugs, or at the very least, a bahviour which not 
> only in not documented in the man pages, but also totally unrespectful of 
> the normal mmap/munmap semantics.

If we were to go back in time and decide this when the munmap() behavior 
for hugetlb vmas was originally introduced, that would be valid.  The 
problem is that it could lead to userspace breakage and that's a 
non-starter.

What we can do is improve the documentation and man-page to clearly 
specify the long-standing behavior so that nobody encounters unexpected 
results in the future.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
@ 2015-03-26 19:15           ` David Rientjes
  0 siblings, 0 replies; 29+ messages in thread
From: David Rientjes @ 2015-03-26 19:15 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Hugh Dickins, Andrew Morton, KOSAKI Motohiro, Andrea Arcangeli,
	Joern Engel, Jianguo Wu, Eric B Munson, linux-mm,
	Linux Kernel Mailing List

On Thu, 26 Mar 2015, Davide Libenzi wrote:

> > I looked at this thread at http://marc.info/?t=141392508800001 since I 
> > didn't have it in my mailbox, and I didn't get a chance to actually run 
> > your test code.
> > 
> > In short, I think what you're saying is that
> > 
> > 	ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...)
> > 	munmap(ptr, 4KB) == EINVAL
> 
> I am not sure you have read the email correctly:
> 
> munmap(mmap(size, HUGETLB), size) = EFAIL
> 
> For every size not multiple of the huge page size.
> Whereas:
> 
> munmap(mmap(size, HUGETLB), ALIGN(size, HUGEPAGE_SIZE)) = OK
> 

Yes, I read it correctly, and wrote how your test case should have failed 
above.  It fails when you do the 4KB mmap() with MAP_HUGETLB and pass 4KB 
to munmap(), correct?

I have no idea what EFAIL is, though.

> > The question you pose is whether munmap(ptr, 4KB) should succeed for a 
> > hugetlb vma and in your patch you align this to the hugepage size of the 
> > vma in the same manner that munmap(ptr, 2KB) would be aligned to PAGE_SIZE 
> > for a non-hugetlb vma.
> > 
> > The munmap() spec says the whole pages that include any part of the passed 
> > length should be unmapped.  In spirit, I would agree with you that the 
> > page size for the vma is the hugepage size so that would be what would be 
> > unmapped.
> > 
> > But that's going by a spec that doesn't address hugepages and is worded in 
> > a way that {PAGE_SIZE} is the base unit that both mmap() and munmap() is 
> > done.  It carries no notion of variable page sizes and how hugepages 
> > should be handled with respect to pages of {PAGE_SIZE} length.  So I think 
> > this is beyond the scope of the spec: any length is aligned to PAGE_SIZE, 
> > but the munmap() behavior for hugetlb vmas is not restricted.
> > 
> > It would seem too dangerous at this point to change the behavior of 
> > munmap(ptr, 4KB) on a hugetlb vma and that userspace bugs could actually 
> > arise from aligning to the hugepage size.
> 
> You mean, there is an harder failure than the current failure? :)
> 

Yes, this munmap() behavior of lengths <= hugepage_size - PAGE_SIZE for a 
hugetlb vma is long standing and there may be applications that break as a 
result of changing the behavior: a database that reserves all allocated 
hugetlb memory with mmap() so that it always has exclusive access to those 
hugepages, whether they are faulted or not, and maintains its own hugepage 
pool (which is common), may test the return value of munmap() and depend 
on it returning -EINVAL to determine if it is freeing memory that was 
either dynamically allocated or mapped from the hugetlb reserved pool.

> > Some applications purposefully reserve hugetlb pages by mmap() and never 
> > munmap() them so they have exclusive access to hugepages that were 
> > allocated either at boot or runtime by the sysadmin.  If they depend on 
> > the return value of munmap() to determine if memory to free is memory 
> > dynamically allocated by the application or reserved as hugetlb memory, 
> > then this would cause them to break.  I can't say for certain that no such 
> > application exists.
> 
> The fact that certain applications will seldomly call an API, should be no 
> reason for API to have bugs, or at the very least, a bahviour which not 
> only in not documented in the man pages, but also totally unrespectful of 
> the normal mmap/munmap semantics.

If we were to go back in time and decide this when the munmap() behavior 
for hugetlb vmas was originally introduced, that would be valid.  The 
problem is that it could lead to userspace breakage and that's a 
non-starter.

What we can do is improve the documentation and man-page to clearly 
specify the long-standing behavior so that nobody encounters unexpected 
results in the future.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
  2015-03-26 19:15           ` David Rientjes
@ 2015-03-26 19:39             ` Davide Libenzi
  -1 siblings, 0 replies; 29+ messages in thread
From: Davide Libenzi @ 2015-03-26 19:39 UTC (permalink / raw)
  To: David Rientjes
  Cc: Hugh Dickins, Andrew Morton, KOSAKI Motohiro, Andrea Arcangeli,
	Joern Engel, Jianguo Wu, Eric B Munson, linux-mm,
	Linux Kernel Mailing List

On Thu, 26 Mar 2015, David Rientjes wrote:

> Yes, this munmap() behavior of lengths <= hugepage_size - PAGE_SIZE for a 
> hugetlb vma is long standing and there may be applications that break as a 
> result of changing the behavior: a database that reserves all allocated 
> hugetlb memory with mmap() so that it always has exclusive access to those 
> hugepages, whether they are faulted or not, and maintains its own hugepage 
> pool (which is common), may test the return value of munmap() and depend 
> on it returning -EINVAL to determine if it is freeing memory that was 
> either dynamically allocated or mapped from the hugetlb reserved pool.

You went a long way to create such a case.
But, in your case, that application will erroneously considering hugepage 
mmaped memory, as dynamically allocated, since it will always get EINVAL, 
unless it passes an aligned size. Aligned size, which a fix like the one 
posted in the patch will still leave as success.
OTOH, an application, which might be more common than the one you posted,
which calls munmap() to release a pointer which it validly got from a 
previous mmap(), will leak huge pages as all the issued munmaps will fail.


> If we were to go back in time and decide this when the munmap() behavior 
> for hugetlb vmas was originally introduced, that would be valid.  The 
> problem is that it could lead to userspace breakage and that's a 
> non-starter.
> 
> What we can do is improve the documentation and man-page to clearly 
> specify the long-standing behavior so that nobody encounters unexpected 
> results in the future.

This way you will leave the mmap API with broken semantics.
In any case, I am done arguing.
I will leave to Andrew to sort it out, and to Michael Kerrisk to update 
the mmap man pages with the new funny behaviour.



- Davide



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
@ 2015-03-26 19:39             ` Davide Libenzi
  0 siblings, 0 replies; 29+ messages in thread
From: Davide Libenzi @ 2015-03-26 19:39 UTC (permalink / raw)
  To: David Rientjes
  Cc: Hugh Dickins, Andrew Morton, KOSAKI Motohiro, Andrea Arcangeli,
	Joern Engel, Jianguo Wu, Eric B Munson, linux-mm,
	Linux Kernel Mailing List

On Thu, 26 Mar 2015, David Rientjes wrote:

> Yes, this munmap() behavior of lengths <= hugepage_size - PAGE_SIZE for a 
> hugetlb vma is long standing and there may be applications that break as a 
> result of changing the behavior: a database that reserves all allocated 
> hugetlb memory with mmap() so that it always has exclusive access to those 
> hugepages, whether they are faulted or not, and maintains its own hugepage 
> pool (which is common), may test the return value of munmap() and depend 
> on it returning -EINVAL to determine if it is freeing memory that was 
> either dynamically allocated or mapped from the hugetlb reserved pool.

You went a long way to create such a case.
But, in your case, that application will erroneously considering hugepage 
mmaped memory, as dynamically allocated, since it will always get EINVAL, 
unless it passes an aligned size. Aligned size, which a fix like the one 
posted in the patch will still leave as success.
OTOH, an application, which might be more common than the one you posted,
which calls munmap() to release a pointer which it validly got from a 
previous mmap(), will leak huge pages as all the issued munmaps will fail.


> If we were to go back in time and decide this when the munmap() behavior 
> for hugetlb vmas was originally introduced, that would be valid.  The 
> problem is that it could lead to userspace breakage and that's a 
> non-starter.
> 
> What we can do is improve the documentation and man-page to clearly 
> specify the long-standing behavior so that nobody encounters unexpected 
> results in the future.

This way you will leave the mmap API with broken semantics.
In any case, I am done arguing.
I will leave to Andrew to sort it out, and to Michael Kerrisk to update 
the mmap man pages with the new funny behaviour.



- Davide


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
  2015-03-26 19:39             ` Davide Libenzi
@ 2015-03-26 20:03               ` David Rientjes
  -1 siblings, 0 replies; 29+ messages in thread
From: David Rientjes @ 2015-03-26 20:03 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Hugh Dickins, Andrew Morton, KOSAKI Motohiro, Andrea Arcangeli,
	Joern Engel, Jianguo Wu, Eric B Munson, linux-mm,
	Linux Kernel Mailing List

On Thu, 26 Mar 2015, Davide Libenzi wrote:

> > Yes, this munmap() behavior of lengths <= hugepage_size - PAGE_SIZE for a 
> > hugetlb vma is long standing and there may be applications that break as a 
> > result of changing the behavior: a database that reserves all allocated 
> > hugetlb memory with mmap() so that it always has exclusive access to those 
> > hugepages, whether they are faulted or not, and maintains its own hugepage 
> > pool (which is common), may test the return value of munmap() and depend 
> > on it returning -EINVAL to determine if it is freeing memory that was 
> > either dynamically allocated or mapped from the hugetlb reserved pool.
> 
> You went a long way to create such a case.
> But, in your case, that application will erroneously considering hugepage 
> mmaped memory, as dynamically allocated, since it will always get EINVAL, 
> unless it passes an aligned size. Aligned size, which a fix like the one 
> posted in the patch will still leave as success.

There was a patch proposed last week to add reserved pools to the 
hugetlbfs mount option specifically for the case where a large database 
wants sole reserved access to the hugepage pool.  This is why hugetlbfs 
pages become reserved on mmap().  In that case, the database never wants 
to do munmap() and instead maintains its own hugepage pool.

That makes the usual database case, mmap() all necessary hugetlb pages to 
reserve them, even easier since they have historically had to maintain 
this pool amongst various processes.

Is there a process out there that tests for munmap(ptr) == EINVAL and, if 
true, returns ptr to its hugepage pool?  I can't say for certain that none 
exist, that's why the potential for breakage exists.

> OTOH, an application, which might be more common than the one you posted,
> which calls munmap() to release a pointer which it validly got from a 
> previous mmap(), will leak huge pages as all the issued munmaps will fail.
> 

That application would have to be ignoring an EINVAL return value.

> > If we were to go back in time and decide this when the munmap() behavior 
> > for hugetlb vmas was originally introduced, that would be valid.  The 
> > problem is that it could lead to userspace breakage and that's a 
> > non-starter.
> > 
> > What we can do is improve the documentation and man-page to clearly 
> > specify the long-standing behavior so that nobody encounters unexpected 
> > results in the future.
> 
> This way you will leave the mmap API with broken semantics.
> In any case, I am done arguing.
> I will leave to Andrew to sort it out, and to Michael Kerrisk to update 
> the mmap man pages with the new funny behaviour.
> 

The behavior is certainly not new, it has always been the case for 
munmap() on hugetlb vmas.

In a strict POSIX interpretation, it refers only to pages in the sense of
what is returned by sysconf(_SC_PAGESIZE).  Such vmas are not backed by 
any pages of size sysconf(_SC_PAGESIZE), so this behavior is undefined.  
It would be best to modify the man page to explicitly state this for 
MAP_HUGETLB.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
@ 2015-03-26 20:03               ` David Rientjes
  0 siblings, 0 replies; 29+ messages in thread
From: David Rientjes @ 2015-03-26 20:03 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Hugh Dickins, Andrew Morton, KOSAKI Motohiro, Andrea Arcangeli,
	Joern Engel, Jianguo Wu, Eric B Munson, linux-mm,
	Linux Kernel Mailing List

On Thu, 26 Mar 2015, Davide Libenzi wrote:

> > Yes, this munmap() behavior of lengths <= hugepage_size - PAGE_SIZE for a 
> > hugetlb vma is long standing and there may be applications that break as a 
> > result of changing the behavior: a database that reserves all allocated 
> > hugetlb memory with mmap() so that it always has exclusive access to those 
> > hugepages, whether they are faulted or not, and maintains its own hugepage 
> > pool (which is common), may test the return value of munmap() and depend 
> > on it returning -EINVAL to determine if it is freeing memory that was 
> > either dynamically allocated or mapped from the hugetlb reserved pool.
> 
> You went a long way to create such a case.
> But, in your case, that application will erroneously considering hugepage 
> mmaped memory, as dynamically allocated, since it will always get EINVAL, 
> unless it passes an aligned size. Aligned size, which a fix like the one 
> posted in the patch will still leave as success.

There was a patch proposed last week to add reserved pools to the 
hugetlbfs mount option specifically for the case where a large database 
wants sole reserved access to the hugepage pool.  This is why hugetlbfs 
pages become reserved on mmap().  In that case, the database never wants 
to do munmap() and instead maintains its own hugepage pool.

That makes the usual database case, mmap() all necessary hugetlb pages to 
reserve them, even easier since they have historically had to maintain 
this pool amongst various processes.

Is there a process out there that tests for munmap(ptr) == EINVAL and, if 
true, returns ptr to its hugepage pool?  I can't say for certain that none 
exist, that's why the potential for breakage exists.

> OTOH, an application, which might be more common than the one you posted,
> which calls munmap() to release a pointer which it validly got from a 
> previous mmap(), will leak huge pages as all the issued munmaps will fail.
> 

That application would have to be ignoring an EINVAL return value.

> > If we were to go back in time and decide this when the munmap() behavior 
> > for hugetlb vmas was originally introduced, that would be valid.  The 
> > problem is that it could lead to userspace breakage and that's a 
> > non-starter.
> > 
> > What we can do is improve the documentation and man-page to clearly 
> > specify the long-standing behavior so that nobody encounters unexpected 
> > results in the future.
> 
> This way you will leave the mmap API with broken semantics.
> In any case, I am done arguing.
> I will leave to Andrew to sort it out, and to Michael Kerrisk to update 
> the mmap man pages with the new funny behaviour.
> 

The behavior is certainly not new, it has always been the case for 
munmap() on hugetlb vmas.

In a strict POSIX interpretation, it refers only to pages in the sense of
what is returned by sysconf(_SC_PAGESIZE).  Such vmas are not backed by 
any pages of size sysconf(_SC_PAGESIZE), so this behavior is undefined.  
It would be best to modify the man page to explicitly state this for 
MAP_HUGETLB.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
  2015-03-26 19:39             ` Davide Libenzi
  (?)
@ 2015-03-27  9:45               ` Vlastimil Babka
  -1 siblings, 0 replies; 29+ messages in thread
From: Vlastimil Babka @ 2015-03-27  9:45 UTC (permalink / raw)
  To: Davide Libenzi, David Rientjes
  Cc: Hugh Dickins, Andrew Morton, KOSAKI Motohiro, Andrea Arcangeli,
	Joern Engel, Jianguo Wu, Eric B Munson, linux-mm,
	Linux Kernel Mailing List, linux-man, Linux API, Michael Kerrisk

On 03/26/2015 08:39 PM, Davide Libenzi wrote:
> On Thu, 26 Mar 2015, David Rientjes wrote:
> 
>> Yes, this munmap() behavior of lengths <= hugepage_size - PAGE_SIZE for a 
>> hugetlb vma is long standing and there may be applications that break as a 
>> result of changing the behavior: a database that reserves all allocated 
>> hugetlb memory with mmap() so that it always has exclusive access to those 
>> hugepages, whether they are faulted or not, and maintains its own hugepage 
>> pool (which is common), may test the return value of munmap() and depend 
>> on it returning -EINVAL to determine if it is freeing memory that was 
>> either dynamically allocated or mapped from the hugetlb reserved pool.
> 
> You went a long way to create such a case.
> But, in your case, that application will erroneously considering hugepage 
> mmaped memory, as dynamically allocated, since it will always get EINVAL, 
> unless it passes an aligned size. Aligned size, which a fix like the one 
> posted in the patch will still leave as success.
> OTOH, an application, which might be more common than the one you posted,
> which calls munmap() to release a pointer which it validly got from a 
> previous mmap(), will leak huge pages as all the issued munmaps will fail.
> 
> 
>> If we were to go back in time and decide this when the munmap() behavior 
>> for hugetlb vmas was originally introduced, that would be valid.  The 
>> problem is that it could lead to userspace breakage and that's a 
>> non-starter.
>> 
>> What we can do is improve the documentation and man-page to clearly 
>> specify the long-standing behavior so that nobody encounters unexpected 
>> results in the future.
> 
> This way you will leave the mmap API with broken semantics.
> In any case, I am done arguing.
> I will leave to Andrew to sort it out, and to Michael Kerrisk to update 
> the mmap man pages with the new funny behaviour.

+ CC's

You know that people don't always magically CC themselves, or read all of
lkml/linux-mm? :)

> 
> 
> - Davide
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
@ 2015-03-27  9:45               ` Vlastimil Babka
  0 siblings, 0 replies; 29+ messages in thread
From: Vlastimil Babka @ 2015-03-27  9:45 UTC (permalink / raw)
  To: Davide Libenzi, David Rientjes
  Cc: Hugh Dickins, Andrew Morton, KOSAKI Motohiro, Andrea Arcangeli,
	Joern Engel, Jianguo Wu, Eric B Munson,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Linux Kernel Mailing List,
	linux-man-u79uwXL29TY76Z2rM5mHXA, Linux API, Michael Kerrisk

On 03/26/2015 08:39 PM, Davide Libenzi wrote:
> On Thu, 26 Mar 2015, David Rientjes wrote:
> 
>> Yes, this munmap() behavior of lengths <= hugepage_size - PAGE_SIZE for a 
>> hugetlb vma is long standing and there may be applications that break as a 
>> result of changing the behavior: a database that reserves all allocated 
>> hugetlb memory with mmap() so that it always has exclusive access to those 
>> hugepages, whether they are faulted or not, and maintains its own hugepage 
>> pool (which is common), may test the return value of munmap() and depend 
>> on it returning -EINVAL to determine if it is freeing memory that was 
>> either dynamically allocated or mapped from the hugetlb reserved pool.
> 
> You went a long way to create such a case.
> But, in your case, that application will erroneously considering hugepage 
> mmaped memory, as dynamically allocated, since it will always get EINVAL, 
> unless it passes an aligned size. Aligned size, which a fix like the one 
> posted in the patch will still leave as success.
> OTOH, an application, which might be more common than the one you posted,
> which calls munmap() to release a pointer which it validly got from a 
> previous mmap(), will leak huge pages as all the issued munmaps will fail.
> 
> 
>> If we were to go back in time and decide this when the munmap() behavior 
>> for hugetlb vmas was originally introduced, that would be valid.  The 
>> problem is that it could lead to userspace breakage and that's a 
>> non-starter.
>> 
>> What we can do is improve the documentation and man-page to clearly 
>> specify the long-standing behavior so that nobody encounters unexpected 
>> results in the future.
> 
> This way you will leave the mmap API with broken semantics.
> In any case, I am done arguing.
> I will leave to Andrew to sort it out, and to Michael Kerrisk to update 
> the mmap man pages with the new funny behaviour.

+ CC's

You know that people don't always magically CC themselves, or read all of
lkml/linux-mm? :)

> 
> 
> - Davide
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo-Bw31MaZKKs0EbZ0PF+XxCw@public.gmane.org  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org"> email-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org </a>
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
@ 2015-03-27  9:45               ` Vlastimil Babka
  0 siblings, 0 replies; 29+ messages in thread
From: Vlastimil Babka @ 2015-03-27  9:45 UTC (permalink / raw)
  To: Davide Libenzi, David Rientjes
  Cc: Hugh Dickins, Andrew Morton, KOSAKI Motohiro, Andrea Arcangeli,
	Joern Engel, Jianguo Wu, Eric B Munson, linux-mm,
	Linux Kernel Mailing List, linux-man, Linux API, Michael Kerrisk

On 03/26/2015 08:39 PM, Davide Libenzi wrote:
> On Thu, 26 Mar 2015, David Rientjes wrote:
> 
>> Yes, this munmap() behavior of lengths <= hugepage_size - PAGE_SIZE for a 
>> hugetlb vma is long standing and there may be applications that break as a 
>> result of changing the behavior: a database that reserves all allocated 
>> hugetlb memory with mmap() so that it always has exclusive access to those 
>> hugepages, whether they are faulted or not, and maintains its own hugepage 
>> pool (which is common), may test the return value of munmap() and depend 
>> on it returning -EINVAL to determine if it is freeing memory that was 
>> either dynamically allocated or mapped from the hugetlb reserved pool.
> 
> You went a long way to create such a case.
> But, in your case, that application will erroneously considering hugepage 
> mmaped memory, as dynamically allocated, since it will always get EINVAL, 
> unless it passes an aligned size. Aligned size, which a fix like the one 
> posted in the patch will still leave as success.
> OTOH, an application, which might be more common than the one you posted,
> which calls munmap() to release a pointer which it validly got from a 
> previous mmap(), will leak huge pages as all the issued munmaps will fail.
> 
> 
>> If we were to go back in time and decide this when the munmap() behavior 
>> for hugetlb vmas was originally introduced, that would be valid.  The 
>> problem is that it could lead to userspace breakage and that's a 
>> non-starter.
>> 
>> What we can do is improve the documentation and man-page to clearly 
>> specify the long-standing behavior so that nobody encounters unexpected 
>> results in the future.
> 
> This way you will leave the mmap API with broken semantics.
> In any case, I am done arguing.
> I will leave to Andrew to sort it out, and to Michael Kerrisk to update 
> the mmap man pages with the new funny behaviour.

+ CC's

You know that people don't always magically CC themselves, or read all of
lkml/linux-mm? :)

> 
> 
> - Davide
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
@ 2015-03-27  9:47                 ` Vlastimil Babka
  0 siblings, 0 replies; 29+ messages in thread
From: Vlastimil Babka @ 2015-03-27  9:47 UTC (permalink / raw)
  To: David Rientjes, Davide Libenzi
  Cc: Hugh Dickins, Andrew Morton, KOSAKI Motohiro, Andrea Arcangeli,
	Joern Engel, Jianguo Wu, Eric B Munson, linux-mm,
	Linux Kernel Mailing List, Linux API, linux-man, Michael Kerrisk

Might be too late in this thread, but in case you are going to continue and/or
repost:

[CC += linux-api@vger.kernel.org]
(also linux-man and Michael to match my other reply)

    Since this is a kernel-user-space API change, please CC linux-api@. The
kernel source file Documentation/SubmitChecklist notes that all Linux kernel
patches that change userspace interfaces should be CCed to
linux-api@vger.kernel.org, so that the various parties who are interested in API
changes are informed. For further information, see
https://www.kernel.org/doc/man-pages/linux-api-ml.html


On 03/26/2015 09:03 PM, David Rientjes wrote:
> On Thu, 26 Mar 2015, Davide Libenzi wrote:
> 
>> > Yes, this munmap() behavior of lengths <= hugepage_size - PAGE_SIZE for a 
>> > hugetlb vma is long standing and there may be applications that break as a 
>> > result of changing the behavior: a database that reserves all allocated 
>> > hugetlb memory with mmap() so that it always has exclusive access to those 
>> > hugepages, whether they are faulted or not, and maintains its own hugepage 
>> > pool (which is common), may test the return value of munmap() and depend 
>> > on it returning -EINVAL to determine if it is freeing memory that was 
>> > either dynamically allocated or mapped from the hugetlb reserved pool.
>> 
>> You went a long way to create such a case.
>> But, in your case, that application will erroneously considering hugepage 
>> mmaped memory, as dynamically allocated, since it will always get EINVAL, 
>> unless it passes an aligned size. Aligned size, which a fix like the one 
>> posted in the patch will still leave as success.
> 
> There was a patch proposed last week to add reserved pools to the 
> hugetlbfs mount option specifically for the case where a large database 
> wants sole reserved access to the hugepage pool.  This is why hugetlbfs 
> pages become reserved on mmap().  In that case, the database never wants 
> to do munmap() and instead maintains its own hugepage pool.
> 
> That makes the usual database case, mmap() all necessary hugetlb pages to 
> reserve them, even easier since they have historically had to maintain 
> this pool amongst various processes.
> 
> Is there a process out there that tests for munmap(ptr) == EINVAL and, if 
> true, returns ptr to its hugepage pool?  I can't say for certain that none 
> exist, that's why the potential for breakage exists.
> 
>> OTOH, an application, which might be more common than the one you posted,
>> which calls munmap() to release a pointer which it validly got from a 
>> previous mmap(), will leak huge pages as all the issued munmaps will fail.
>> 
> 
> That application would have to be ignoring an EINVAL return value.
> 
>> > If we were to go back in time and decide this when the munmap() behavior 
>> > for hugetlb vmas was originally introduced, that would be valid.  The 
>> > problem is that it could lead to userspace breakage and that's a 
>> > non-starter.
>> > 
>> > What we can do is improve the documentation and man-page to clearly 
>> > specify the long-standing behavior so that nobody encounters unexpected 
>> > results in the future.
>> 
>> This way you will leave the mmap API with broken semantics.
>> In any case, I am done arguing.
>> I will leave to Andrew to sort it out, and to Michael Kerrisk to update 
>> the mmap man pages with the new funny behaviour.
>> 
> 
> The behavior is certainly not new, it has always been the case for 
> munmap() on hugetlb vmas.
> 
> In a strict POSIX interpretation, it refers only to pages in the sense of
> what is returned by sysconf(_SC_PAGESIZE).  Such vmas are not backed by 
> any pages of size sysconf(_SC_PAGESIZE), so this behavior is undefined.  
> It would be best to modify the man page to explicitly state this for 
> MAP_HUGETLB.
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
@ 2015-03-27  9:47                 ` Vlastimil Babka
  0 siblings, 0 replies; 29+ messages in thread
From: Vlastimil Babka @ 2015-03-27  9:47 UTC (permalink / raw)
  To: David Rientjes, Davide Libenzi
  Cc: Hugh Dickins, Andrew Morton, KOSAKI Motohiro, Andrea Arcangeli,
	Joern Engel, Jianguo Wu, Eric B Munson,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Linux Kernel Mailing List,
	Linux API, linux-man-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk

Might be too late in this thread, but in case you are going to continue and/or
repost:

[CC += linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org]
(also linux-man and Michael to match my other reply)

    Since this is a kernel-user-space API change, please CC linux-api@. The
kernel source file Documentation/SubmitChecklist notes that all Linux kernel
patches that change userspace interfaces should be CCed to
linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, so that the various parties who are interested in API
changes are informed. For further information, see
https://www.kernel.org/doc/man-pages/linux-api-ml.html


On 03/26/2015 09:03 PM, David Rientjes wrote:
> On Thu, 26 Mar 2015, Davide Libenzi wrote:
> 
>> > Yes, this munmap() behavior of lengths <= hugepage_size - PAGE_SIZE for a 
>> > hugetlb vma is long standing and there may be applications that break as a 
>> > result of changing the behavior: a database that reserves all allocated 
>> > hugetlb memory with mmap() so that it always has exclusive access to those 
>> > hugepages, whether they are faulted or not, and maintains its own hugepage 
>> > pool (which is common), may test the return value of munmap() and depend 
>> > on it returning -EINVAL to determine if it is freeing memory that was 
>> > either dynamically allocated or mapped from the hugetlb reserved pool.
>> 
>> You went a long way to create such a case.
>> But, in your case, that application will erroneously considering hugepage 
>> mmaped memory, as dynamically allocated, since it will always get EINVAL, 
>> unless it passes an aligned size. Aligned size, which a fix like the one 
>> posted in the patch will still leave as success.
> 
> There was a patch proposed last week to add reserved pools to the 
> hugetlbfs mount option specifically for the case where a large database 
> wants sole reserved access to the hugepage pool.  This is why hugetlbfs 
> pages become reserved on mmap().  In that case, the database never wants 
> to do munmap() and instead maintains its own hugepage pool.
> 
> That makes the usual database case, mmap() all necessary hugetlb pages to 
> reserve them, even easier since they have historically had to maintain 
> this pool amongst various processes.
> 
> Is there a process out there that tests for munmap(ptr) == EINVAL and, if 
> true, returns ptr to its hugepage pool?  I can't say for certain that none 
> exist, that's why the potential for breakage exists.
> 
>> OTOH, an application, which might be more common than the one you posted,
>> which calls munmap() to release a pointer which it validly got from a 
>> previous mmap(), will leak huge pages as all the issued munmaps will fail.
>> 
> 
> That application would have to be ignoring an EINVAL return value.
> 
>> > If we were to go back in time and decide this when the munmap() behavior 
>> > for hugetlb vmas was originally introduced, that would be valid.  The 
>> > problem is that it could lead to userspace breakage and that's a 
>> > non-starter.
>> > 
>> > What we can do is improve the documentation and man-page to clearly 
>> > specify the long-standing behavior so that nobody encounters unexpected 
>> > results in the future.
>> 
>> This way you will leave the mmap API with broken semantics.
>> In any case, I am done arguing.
>> I will leave to Andrew to sort it out, and to Michael Kerrisk to update 
>> the mmap man pages with the new funny behaviour.
>> 
> 
> The behavior is certainly not new, it has always been the case for 
> munmap() on hugetlb vmas.
> 
> In a strict POSIX interpretation, it refers only to pages in the sense of
> what is returned by sysconf(_SC_PAGESIZE).  Such vmas are not backed by 
> any pages of size sysconf(_SC_PAGESIZE), so this behavior is undefined.  
> It would be best to modify the man page to explicitly state this for 
> MAP_HUGETLB.
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo-Bw31MaZKKs0EbZ0PF+XxCw@public.gmane.org  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org"> email-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org </a>
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
@ 2015-03-27  9:47                 ` Vlastimil Babka
  0 siblings, 0 replies; 29+ messages in thread
From: Vlastimil Babka @ 2015-03-27  9:47 UTC (permalink / raw)
  To: David Rientjes, Davide Libenzi
  Cc: Hugh Dickins, Andrew Morton, KOSAKI Motohiro, Andrea Arcangeli,
	Joern Engel, Jianguo Wu, Eric B Munson, linux-mm,
	Linux Kernel Mailing List, Linux API, linux-man, Michael Kerrisk

Might be too late in this thread, but in case you are going to continue and/or
repost:

[CC += linux-api@vger.kernel.org]
(also linux-man and Michael to match my other reply)

    Since this is a kernel-user-space API change, please CC linux-api@. The
kernel source file Documentation/SubmitChecklist notes that all Linux kernel
patches that change userspace interfaces should be CCed to
linux-api@vger.kernel.org, so that the various parties who are interested in API
changes are informed. For further information, see
https://www.kernel.org/doc/man-pages/linux-api-ml.html


On 03/26/2015 09:03 PM, David Rientjes wrote:
> On Thu, 26 Mar 2015, Davide Libenzi wrote:
> 
>> > Yes, this munmap() behavior of lengths <= hugepage_size - PAGE_SIZE for a 
>> > hugetlb vma is long standing and there may be applications that break as a 
>> > result of changing the behavior: a database that reserves all allocated 
>> > hugetlb memory with mmap() so that it always has exclusive access to those 
>> > hugepages, whether they are faulted or not, and maintains its own hugepage 
>> > pool (which is common), may test the return value of munmap() and depend 
>> > on it returning -EINVAL to determine if it is freeing memory that was 
>> > either dynamically allocated or mapped from the hugetlb reserved pool.
>> 
>> You went a long way to create such a case.
>> But, in your case, that application will erroneously considering hugepage 
>> mmaped memory, as dynamically allocated, since it will always get EINVAL, 
>> unless it passes an aligned size. Aligned size, which a fix like the one 
>> posted in the patch will still leave as success.
> 
> There was a patch proposed last week to add reserved pools to the 
> hugetlbfs mount option specifically for the case where a large database 
> wants sole reserved access to the hugepage pool.  This is why hugetlbfs 
> pages become reserved on mmap().  In that case, the database never wants 
> to do munmap() and instead maintains its own hugepage pool.
> 
> That makes the usual database case, mmap() all necessary hugetlb pages to 
> reserve them, even easier since they have historically had to maintain 
> this pool amongst various processes.
> 
> Is there a process out there that tests for munmap(ptr) == EINVAL and, if 
> true, returns ptr to its hugepage pool?  I can't say for certain that none 
> exist, that's why the potential for breakage exists.
> 
>> OTOH, an application, which might be more common than the one you posted,
>> which calls munmap() to release a pointer which it validly got from a 
>> previous mmap(), will leak huge pages as all the issued munmaps will fail.
>> 
> 
> That application would have to be ignoring an EINVAL return value.
> 
>> > If we were to go back in time and decide this when the munmap() behavior 
>> > for hugetlb vmas was originally introduced, that would be valid.  The 
>> > problem is that it could lead to userspace breakage and that's a 
>> > non-starter.
>> > 
>> > What we can do is improve the documentation and man-page to clearly 
>> > specify the long-standing behavior so that nobody encounters unexpected 
>> > results in the future.
>> 
>> This way you will leave the mmap API with broken semantics.
>> In any case, I am done arguing.
>> I will leave to Andrew to sort it out, and to Michael Kerrisk to update 
>> the mmap man pages with the new funny behaviour.
>> 
> 
> The behavior is certainly not new, it has always been the case for 
> munmap() on hugetlb vmas.
> 
> In a strict POSIX interpretation, it refers only to pages in the sense of
> what is returned by sysconf(_SC_PAGESIZE).  Such vmas are not backed by 
> any pages of size sysconf(_SC_PAGESIZE), so this behavior is undefined.  
> It would be best to modify the man page to explicitly state this for 
> MAP_HUGETLB.
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
  2015-03-26 20:03               ` David Rientjes
  (?)
  (?)
@ 2015-03-27 13:51               ` Eric B Munson
  -1 siblings, 0 replies; 29+ messages in thread
From: Eric B Munson @ 2015-03-27 13:51 UTC (permalink / raw)
  To: David Rientjes
  Cc: Davide Libenzi, Hugh Dickins, Andrew Morton, KOSAKI Motohiro,
	Andrea Arcangeli, Joern Engel, Jianguo Wu, linux-mm,
	Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 3658 bytes --]

On Thu, 26 Mar 2015, David Rientjes wrote:

> On Thu, 26 Mar 2015, Davide Libenzi wrote:
> 
> > > Yes, this munmap() behavior of lengths <= hugepage_size - PAGE_SIZE for a 
> > > hugetlb vma is long standing and there may be applications that break as a 
> > > result of changing the behavior: a database that reserves all allocated 
> > > hugetlb memory with mmap() so that it always has exclusive access to those 
> > > hugepages, whether they are faulted or not, and maintains its own hugepage 
> > > pool (which is common), may test the return value of munmap() and depend 
> > > on it returning -EINVAL to determine if it is freeing memory that was 
> > > either dynamically allocated or mapped from the hugetlb reserved pool.
> > 
> > You went a long way to create such a case.
> > But, in your case, that application will erroneously considering hugepage 
> > mmaped memory, as dynamically allocated, since it will always get EINVAL, 
> > unless it passes an aligned size. Aligned size, which a fix like the one 
> > posted in the patch will still leave as success.
> 
> There was a patch proposed last week to add reserved pools to the 
> hugetlbfs mount option specifically for the case where a large database 
> wants sole reserved access to the hugepage pool.  This is why hugetlbfs 
> pages become reserved on mmap().  In that case, the database never wants 
> to do munmap() and instead maintains its own hugepage pool.
> 
> That makes the usual database case, mmap() all necessary hugetlb pages to 
> reserve them, even easier since they have historically had to maintain 
> this pool amongst various processes.
> 
> Is there a process out there that tests for munmap(ptr) == EINVAL and, if 
> true, returns ptr to its hugepage pool?  I can't say for certain that none 
> exist, that's why the potential for breakage exists.

Such an application can use /proc/pid/smaps to determine the page size
of a mapping.  IMO, this is relying on broken behavior but I see where
you are coming from that this behavior has been present for a long time.

As I stated before, I think we should fix this bug and make munmap()
behavior match what is described in the man page.

> 
> > OTOH, an application, which might be more common than the one you posted,
> > which calls munmap() to release a pointer which it validly got from a 
> > previous mmap(), will leak huge pages as all the issued munmaps will fail.
> > 
> 
> That application would have to be ignoring an EINVAL return value.
> 
> > > If we were to go back in time and decide this when the munmap() behavior 
> > > for hugetlb vmas was originally introduced, that would be valid.  The 
> > > problem is that it could lead to userspace breakage and that's a 
> > > non-starter.
> > > 
> > > What we can do is improve the documentation and man-page to clearly 
> > > specify the long-standing behavior so that nobody encounters unexpected 
> > > results in the future.
> > 
> > This way you will leave the mmap API with broken semantics.
> > In any case, I am done arguing.
> > I will leave to Andrew to sort it out, and to Michael Kerrisk to update 
> > the mmap man pages with the new funny behaviour.
> > 
> 
> The behavior is certainly not new, it has always been the case for 
> munmap() on hugetlb vmas.
> 
> In a strict POSIX interpretation, it refers only to pages in the sense of
> what is returned by sysconf(_SC_PAGESIZE).  Such vmas are not backed by 
> any pages of size sysconf(_SC_PAGESIZE), so this behavior is undefined.  
> It would be best to modify the man page to explicitly state this for 
> MAP_HUGETLB.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
  2015-03-26 14:08           ` Eric B Munson
@ 2015-03-30 16:03             ` KOSAKI Motohiro
  -1 siblings, 0 replies; 29+ messages in thread
From: KOSAKI Motohiro @ 2015-03-30 16:03 UTC (permalink / raw)
  To: Eric B Munson
  Cc: Davide Libenzi, David Rientjes, Hugh Dickins, Andrew Morton,
	Andrea Arcangeli, Joern Engel, Jianguo Wu, linux-mm,
	Linux Kernel Mailing List

On Thu, Mar 26, 2015 at 10:08 AM, Eric B Munson <emunson@akamai.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 03/26/2015 07:56 AM, Davide Libenzi wrote:
>> On Wed, 25 Mar 2015, David Rientjes wrote:
>>
>>> I looked at this thread at http://marc.info/?t=141392508800001
>>> since I didn't have it in my mailbox, and I didn't get a chance
>>> to actually run your test code.
>>>
>>> In short, I think what you're saying is that
>>>
>>> ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...) munmap(ptr,
>>> 4KB) == EINVAL
>>
>> I am not sure you have read the email correctly:
>>
>> munmap(mmap(size, HUGETLB), size) = EFAIL
>>
>> For every size not multiple of the huge page size. Whereas:
>>
>> munmap(mmap(size, HUGETLB), ALIGN(size, HUGEPAGE_SIZE)) = OK
>
> I think Davide is right here, this is a long existing bug in the
> MAP_HUGETLB implementation.  Specifically, the mmap man page says:
>
> All pages containing a part of the indicated range are unmapped, and
> subsequent references to these pages will generate SIGSEGV.
>
> I realize that huge pages may not have been considered by those that
> wrote the spec.  But if I read this I would assume that all pages,
> regardless of size, touched by the munmap() request should be unmapped.
>
> Please include
> Acked-by: Eric B Munson <emunson@akamai.com>
> to the original patch.  I would like to see the mmap man page adjusted
> to make note of this behavior as well.

This is just a bug fix and I never think this has large risk. But
caution, we might revert immediately
if this patch arise some regression even if it's come from broken
application code.

Acked-by: KOSAKI Motohiro <kosaki.motohiro@gmail.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
@ 2015-03-30 16:03             ` KOSAKI Motohiro
  0 siblings, 0 replies; 29+ messages in thread
From: KOSAKI Motohiro @ 2015-03-30 16:03 UTC (permalink / raw)
  To: Eric B Munson
  Cc: Davide Libenzi, David Rientjes, Hugh Dickins, Andrew Morton,
	Andrea Arcangeli, Joern Engel, Jianguo Wu, linux-mm,
	Linux Kernel Mailing List

On Thu, Mar 26, 2015 at 10:08 AM, Eric B Munson <emunson@akamai.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 03/26/2015 07:56 AM, Davide Libenzi wrote:
>> On Wed, 25 Mar 2015, David Rientjes wrote:
>>
>>> I looked at this thread at http://marc.info/?t=141392508800001
>>> since I didn't have it in my mailbox, and I didn't get a chance
>>> to actually run your test code.
>>>
>>> In short, I think what you're saying is that
>>>
>>> ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...) munmap(ptr,
>>> 4KB) == EINVAL
>>
>> I am not sure you have read the email correctly:
>>
>> munmap(mmap(size, HUGETLB), size) = EFAIL
>>
>> For every size not multiple of the huge page size. Whereas:
>>
>> munmap(mmap(size, HUGETLB), ALIGN(size, HUGEPAGE_SIZE)) = OK
>
> I think Davide is right here, this is a long existing bug in the
> MAP_HUGETLB implementation.  Specifically, the mmap man page says:
>
> All pages containing a part of the indicated range are unmapped, and
> subsequent references to these pages will generate SIGSEGV.
>
> I realize that huge pages may not have been considered by those that
> wrote the spec.  But if I read this I would assume that all pages,
> regardless of size, touched by the munmap() request should be unmapped.
>
> Please include
> Acked-by: Eric B Munson <emunson@akamai.com>
> to the original patch.  I would like to see the mmap man page adjusted
> to make note of this behavior as well.

This is just a bug fix and I never think this has large risk. But
caution, we might revert immediately
if this patch arise some regression even if it's come from broken
application code.

Acked-by: KOSAKI Motohiro <kosaki.motohiro@gmail.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
  2015-03-30 16:03             ` KOSAKI Motohiro
@ 2015-03-30 20:32               ` Hugh Dickins
  -1 siblings, 0 replies; 29+ messages in thread
From: Hugh Dickins @ 2015-03-30 20:32 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Eric B Munson, Davide Libenzi, David Rientjes, Hugh Dickins,
	Andrew Morton, Andrea Arcangeli, Joern Engel, Jianguo Wu,
	linux-mm, Linux Kernel Mailing List

On Mon, 30 Mar 2015, KOSAKI Motohiro wrote:
> On Thu, Mar 26, 2015 at 10:08 AM, Eric B Munson <emunson@akamai.com> wrote:
> > On 03/26/2015 07:56 AM, Davide Libenzi wrote:
> >> On Wed, 25 Mar 2015, David Rientjes wrote:
> >>
> >>> I looked at this thread at http://marc.info/?t=141392508800001
> >>> since I didn't have it in my mailbox, and I didn't get a chance
> >>> to actually run your test code.
> >>>
> >>> In short, I think what you're saying is that
> >>>
> >>> ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...) munmap(ptr,
> >>> 4KB) == EINVAL
> >>
> >> I am not sure you have read the email correctly:
> >>
> >> munmap(mmap(size, HUGETLB), size) = EFAIL
> >>
> >> For every size not multiple of the huge page size. Whereas:
> >>
> >> munmap(mmap(size, HUGETLB), ALIGN(size, HUGEPAGE_SIZE)) = OK
> >
> > I think Davide is right here, this is a long existing bug in the
> > MAP_HUGETLB implementation.  Specifically, the mmap man page says:
> >
> > All pages containing a part of the indicated range are unmapped, and
> > subsequent references to these pages will generate SIGSEGV.
> >
> > I realize that huge pages may not have been considered by those that
> > wrote the spec.  But if I read this I would assume that all pages,
> > regardless of size, touched by the munmap() request should be unmapped.
> >
> > Please include
> > Acked-by: Eric B Munson <emunson@akamai.com>
> > to the original patch.  I would like to see the mmap man page adjusted
> > to make note of this behavior as well.
> 
> This is just a bug fix and I never think this has large risk. But
> caution, we might revert immediately
> if this patch arise some regression even if it's come from broken
> application code.
> 
> Acked-by: KOSAKI Motohiro <kosaki.motohiro@gmail.com>

and, without wishing to be confrontational,
Nacked-by: Hugh Dickins <hughd@google.com>

I agree with you that the risk on munmap is probably not large;
but I still have no interest in spending more time on changing
twelve-year-old established behaviour in this way, then looking
out for the regressions and preparing to revert.

The risk is larger on mprotect, and the other calls which this
patch as is would leave inconsistent with munmap.

Hugh

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
@ 2015-03-30 20:32               ` Hugh Dickins
  0 siblings, 0 replies; 29+ messages in thread
From: Hugh Dickins @ 2015-03-30 20:32 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Eric B Munson, Davide Libenzi, David Rientjes, Hugh Dickins,
	Andrew Morton, Andrea Arcangeli, Joern Engel, Jianguo Wu,
	linux-mm, Linux Kernel Mailing List

On Mon, 30 Mar 2015, KOSAKI Motohiro wrote:
> On Thu, Mar 26, 2015 at 10:08 AM, Eric B Munson <emunson@akamai.com> wrote:
> > On 03/26/2015 07:56 AM, Davide Libenzi wrote:
> >> On Wed, 25 Mar 2015, David Rientjes wrote:
> >>
> >>> I looked at this thread at http://marc.info/?t=141392508800001
> >>> since I didn't have it in my mailbox, and I didn't get a chance
> >>> to actually run your test code.
> >>>
> >>> In short, I think what you're saying is that
> >>>
> >>> ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...) munmap(ptr,
> >>> 4KB) == EINVAL
> >>
> >> I am not sure you have read the email correctly:
> >>
> >> munmap(mmap(size, HUGETLB), size) = EFAIL
> >>
> >> For every size not multiple of the huge page size. Whereas:
> >>
> >> munmap(mmap(size, HUGETLB), ALIGN(size, HUGEPAGE_SIZE)) = OK
> >
> > I think Davide is right here, this is a long existing bug in the
> > MAP_HUGETLB implementation.  Specifically, the mmap man page says:
> >
> > All pages containing a part of the indicated range are unmapped, and
> > subsequent references to these pages will generate SIGSEGV.
> >
> > I realize that huge pages may not have been considered by those that
> > wrote the spec.  But if I read this I would assume that all pages,
> > regardless of size, touched by the munmap() request should be unmapped.
> >
> > Please include
> > Acked-by: Eric B Munson <emunson@akamai.com>
> > to the original patch.  I would like to see the mmap man page adjusted
> > to make note of this behavior as well.
> 
> This is just a bug fix and I never think this has large risk. But
> caution, we might revert immediately
> if this patch arise some regression even if it's come from broken
> application code.
> 
> Acked-by: KOSAKI Motohiro <kosaki.motohiro@gmail.com>

and, without wishing to be confrontational,
Nacked-by: Hugh Dickins <hughd@google.com>

I agree with you that the risk on munmap is probably not large;
but I still have no interest in spending more time on changing
twelve-year-old established behaviour in this way, then looking
out for the regressions and preparing to revert.

The risk is larger on mprotect, and the other calls which this
patch as is would leave inconsistent with munmap.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2015-03-30 20:32 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-23  2:26 [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned Davide Libenzi
2014-10-23  2:26 ` Davide Libenzi
2015-03-26  0:47 ` Hugh Dickins
2015-03-26  0:47   ` Hugh Dickins
2015-03-26  1:06   ` Davide Libenzi
2015-03-26  1:06     ` Davide Libenzi
2015-03-26  3:17     ` David Rientjes
2015-03-26  3:17       ` David Rientjes
2015-03-26 11:56       ` Davide Libenzi
2015-03-26 11:56         ` Davide Libenzi
2015-03-26 14:08         ` Eric B Munson
2015-03-26 14:08           ` Eric B Munson
2015-03-30 16:03           ` KOSAKI Motohiro
2015-03-30 16:03             ` KOSAKI Motohiro
2015-03-30 20:32             ` Hugh Dickins
2015-03-30 20:32               ` Hugh Dickins
2015-03-26 19:15         ` David Rientjes
2015-03-26 19:15           ` David Rientjes
2015-03-26 19:39           ` Davide Libenzi
2015-03-26 19:39             ` Davide Libenzi
2015-03-26 20:03             ` David Rientjes
2015-03-26 20:03               ` David Rientjes
2015-03-27  9:47               ` Vlastimil Babka
2015-03-27  9:47                 ` Vlastimil Babka
2015-03-27  9:47                 ` Vlastimil Babka
2015-03-27 13:51               ` Eric B Munson
2015-03-27  9:45             ` Vlastimil Babka
2015-03-27  9:45               ` Vlastimil Babka
2015-03-27  9:45               ` Vlastimil Babka

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.