All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm: allow unmapped hole at head side of mbind range
@ 2019-10-24  7:35 Li Xinhai
  2019-10-24 10:48 ` Vlastimil Babka
  2019-10-24 12:25 ` Michal Hocko
  0 siblings, 2 replies; 6+ messages in thread
From: Li Xinhai @ 2019-10-24  7:35 UTC (permalink / raw)
  To: linux-mm, linux-kernel

From: Li Xinhai  <xinhai.li@outlook.com>

mbind_range silently ignore unmapped hole at middle and tail of the 
specified range, but report EFAULT if hole at head side.
It is more reasonable to support silently ignore holes at any part of 
the range, only report EFAULT if the whole range is in hole.

Signed-off-by: Li Xinhai <xinhai.li@outlook.com>
---

 mm/mempolicy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4ae967bcf954..ae160d9936d9 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -738,7 +738,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
        unsigned long vmend;
 
        vma = find_vma(mm, start);
-       if (!vma || vma->vm_start > start)
+       if (!vma || vma->vm_start >= end)
                return -EFAULT;
 
        prev = vma->vm_prev;


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm: allow unmapped hole at head side of mbind range
  2019-10-24  7:35 [PATCH] mm: allow unmapped hole at head side of mbind range Li Xinhai
@ 2019-10-24 10:48 ` Vlastimil Babka
  2019-10-25  2:32     ` Hugh Dickins
  2019-10-24 12:25 ` Michal Hocko
  1 sibling, 1 reply; 6+ messages in thread
From: Vlastimil Babka @ 2019-10-24 10:48 UTC (permalink / raw)
  To: Li Xinhai, linux-mm, linux-kernel, Linux API

+ linux-api

On 10/24/19 9:35 AM, Li Xinhai wrote:
> From: Li Xinhai  <xinhai.li@outlook.com>
> 
> mbind_range silently ignore unmapped hole at middle and tail of the 
> specified range, but report EFAULT if hole at head side.


Hmm that's unfortunate. mbind() manpage says:

EFAULT Part or all of the memory range specified by nodemask and maxnode
points outside your accessible address space.  Or, there was an unmapped
hole in  the  specified  memory range specified by addr and len.

That sounds like any hole inside the specified range should return
EFAULT. But perhaps it can be also interpreted as you suggest, that the
whole range is an unmapped hole. There's some risk of breaking existing
userspace if we change it either way.

> It is more reasonable to support silently ignore holes at any part of 
> the range, only report EFAULT if the whole range is in hole.
> 
> Signed-off-by: Li Xinhai <xinhai.li@outlook.com>
> ---
> 
>  mm/mempolicy.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 4ae967bcf954..ae160d9936d9 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -738,7 +738,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
>         unsigned long vmend;
>  
>         vma = find_vma(mm, start);
> -       if (!vma || vma->vm_start > start)
> +       if (!vma || vma->vm_start >= end)
>                 return -EFAULT;
>  
>         prev = vma->vm_prev;
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm: allow unmapped hole at head side of mbind range
  2019-10-24  7:35 [PATCH] mm: allow unmapped hole at head side of mbind range Li Xinhai
  2019-10-24 10:48 ` Vlastimil Babka
@ 2019-10-24 12:25 ` Michal Hocko
  1 sibling, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2019-10-24 12:25 UTC (permalink / raw)
  To: Li Xinhai; +Cc: linux-mm, linux-kernel

On Thu 24-10-19 07:35:06, Li Xinhai wrote:
> From: Li Xinhai  <xinhai.li@outlook.com>
> 
> mbind_range silently ignore unmapped hole at middle and tail of the 
> specified range, but report EFAULT if hole at head side.
> It is more reasonable to support silently ignore holes at any part of 
> the range, only report EFAULT if the whole range is in hole.

The changelog is a bit cryptic but you are right. find_vma returns the
first vma that ends above the given address. If vm_start > start
then there is still an overlap possible
              [    vma      ]
    [start        end]
and we should mbind [vma->vm_start, end] at least. I haven't checked
whether changing the condition is sufficient for the rest of the code to
work properly. I am pretty sure a test case shouldn't be really hard
to construct and add to the kernel testing machinery.

Btw. when writing a changelog then it is always preferred to describe
user visible effect of the patch. In this case it would be an unexpected
EFAULT on a range that starts before an existing VMA while still
overlapping it. Make sure to note that.

Fixes: 9d8cebd4bcd7 ("mm: fix mbind vma merge problem")
> Signed-off-by: Li Xinhai <xinhai.li@outlook.com>
> ---
> 
>  mm/mempolicy.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 4ae967bcf954..ae160d9936d9 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -738,7 +738,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
>         unsigned long vmend;
>  
>         vma = find_vma(mm, start);
> -       if (!vma || vma->vm_start > start)
> +       if (!vma || vma->vm_start >= end)
>                 return -EFAULT;
>  
>         prev = vma->vm_prev;
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm: allow unmapped hole at head side of mbind range
  2019-10-24 10:48 ` Vlastimil Babka
@ 2019-10-25  2:32     ` Hugh Dickins
  0 siblings, 0 replies; 6+ messages in thread
From: Hugh Dickins @ 2019-10-25  2:32 UTC (permalink / raw)
  To: Li Xinhai
  Cc: Vlastimil Babka, Michal Hocko, linux-mm, linux-kernel, Linux API

On Thu, 24 Oct 2019, Vlastimil Babka wrote:

> + linux-api
> 
> On 10/24/19 9:35 AM, Li Xinhai wrote:
> > From: Li Xinhai  <xinhai.li@outlook.com>
> > 
> > mbind_range silently ignore unmapped hole at middle and tail of the 
> > specified range, but report EFAULT if hole at head side.
> 
> 
> Hmm that's unfortunate. mbind() manpage says:
> 
> EFAULT Part or all of the memory range specified by nodemask and maxnode
> points outside your accessible address space.  Or, there was an unmapped
> hole in  the  specified  memory range specified by addr and len.
> 
> That sounds like any hole inside the specified range should return
> EFAULT.

Yes (though an exception is allowed when restoring to default).

> But perhaps it can be also interpreted as you suggest, that the
> whole range is an unmapped hole.  There's some risk of breaking existing
> userspace if we change it either way.
> 
> > It is more reasonable to support silently ignore holes at any part of 
> > the range, only report EFAULT if the whole range is in hole.
> > 
> > Signed-off-by: Li Xinhai <xinhai.li@outlook.com>

Xinhai, I'm sceptical about this patch: is it something you found
by code inspection, or something you found when using mbind()?

I've not looked long enough to be certain, nor experimented, but:

mbind_range() is only one stage of the mbind() syscall implementation,
and is preceded by queue_pages_range(): look what queue_pages_test_walk()
does when MPOL_MF_DISCONTIG_OK not set.

My impression is that mbind_range() is merely correcting an omission
from the checks already made my queue_pages_test_walk() (an odd way
to proceed, I admit: would be better to check initially than later).

I do think that you should not make this change without considering
MPOL_MF_DISCONTIG_OK and its intention.

Hugh

> > ---
> > 
> >  mm/mempolicy.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > 
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 4ae967bcf954..ae160d9936d9 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -738,7 +738,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
> >         unsigned long vmend;
> >  
> >         vma = find_vma(mm, start);
> > -       if (!vma || vma->vm_start > start)
> > +       if (!vma || vma->vm_start >= end)
> >                 return -EFAULT;
> >  
> >         prev = vma->vm_prev;
> > 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm: allow unmapped hole at head side of mbind range
@ 2019-10-25  2:32     ` Hugh Dickins
  0 siblings, 0 replies; 6+ messages in thread
From: Hugh Dickins @ 2019-10-25  2:32 UTC (permalink / raw)
  To: Li Xinhai
  Cc: Vlastimil Babka, Michal Hocko, linux-mm, linux-kernel, Linux API

On Thu, 24 Oct 2019, Vlastimil Babka wrote:

> + linux-api
> 
> On 10/24/19 9:35 AM, Li Xinhai wrote:
> > From: Li Xinhai  <xinhai.li@outlook.com>
> > 
> > mbind_range silently ignore unmapped hole at middle and tail of the 
> > specified range, but report EFAULT if hole at head side.
> 
> 
> Hmm that's unfortunate. mbind() manpage says:
> 
> EFAULT Part or all of the memory range specified by nodemask and maxnode
> points outside your accessible address space.  Or, there was an unmapped
> hole in  the  specified  memory range specified by addr and len.
> 
> That sounds like any hole inside the specified range should return
> EFAULT.

Yes (though an exception is allowed when restoring to default).

> But perhaps it can be also interpreted as you suggest, that the
> whole range is an unmapped hole.  There's some risk of breaking existing
> userspace if we change it either way.
> 
> > It is more reasonable to support silently ignore holes at any part of 
> > the range, only report EFAULT if the whole range is in hole.
> > 
> > Signed-off-by: Li Xinhai <xinhai.li@outlook.com>

Xinhai, I'm sceptical about this patch: is it something you found
by code inspection, or something you found when using mbind()?

I've not looked long enough to be certain, nor experimented, but:

mbind_range() is only one stage of the mbind() syscall implementation,
and is preceded by queue_pages_range(): look what queue_pages_test_walk()
does when MPOL_MF_DISCONTIG_OK not set.

My impression is that mbind_range() is merely correcting an omission
from the checks already made my queue_pages_test_walk() (an odd way
to proceed, I admit: would be better to check initially than later).

I do think that you should not make this change without considering
MPOL_MF_DISCONTIG_OK and its intention.

Hugh

> > ---
> > 
> >  mm/mempolicy.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > 
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 4ae967bcf954..ae160d9936d9 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -738,7 +738,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
> >         unsigned long vmend;
> >  
> >         vma = find_vma(mm, start);
> > -       if (!vma || vma->vm_start > start)
> > +       if (!vma || vma->vm_start >= end)
> >                 return -EFAULT;
> >  
> >         prev = vma->vm_prev;
> > 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm: allow unmapped hole at head side of mbind range
  2019-10-25  2:32     ` Hugh Dickins
  (?)
@ 2019-10-28  8:12     ` lixinhai.lxh
  -1 siblings, 0 replies; 6+ messages in thread
From: lixinhai.lxh @ 2019-10-28  8:12 UTC (permalink / raw)
  To: Hugh Dickins, xinhai.li, lixinhai_lxh
  Cc: Vlastimil Babka, Michal Hocko, linux-mm, linux-kernel, Linux API

On 2019-10-28 at 15:14:51 Hugh Dickins wrote:
>On Thu, 24 Oct 2019, Vlastimil Babka wrote:
>
>> + linux-api
>>
>> On 10/24/19 9:35 AM, Li Xinhai wrote:
>> > From: Li Xinhai  <xinhai.li@outlook.com>
>> >
>> > mbind_range silently ignore unmapped hole at middle and tail of the
>> > specified range, but report EFAULT if hole at head side.
>>
>>
>> Hmm that's unfortunate. mbind() manpage says:
>>
>> EFAULT Part or all of the memory range specified by nodemask and maxnode
>> points outside your accessible address space.  Or, there was an unmapped
>> hole in  the  specified  memory range specified by addr and len.
>>
>> That sounds like any hole inside the specified range should return
>> EFAULT.
>
>Yes (though an exception is allowed when restoring to default).
>
>> But perhaps it can be also interpreted as you suggest, that the
>> whole range is an unmapped hole.  There's some risk of breaking existing
>> userspace if we change it either way.
>>
>> > It is more reasonable to support silently ignore holes at any part of
>> > the range, only report EFAULT if the whole range is in hole.
>> >
>> > Signed-off-by: Li Xinhai <xinhai.li@outlook.com>
>
>Xinhai, I'm sceptical about this patch: is it something you found
>by code inspection, or something you found when using mbind()?
> 
I encountered issue when using mbind (my issue was about using nodemask 
parameter), and then found this special range checking in mbind_range().

>I've not looked long enough to be certain, nor experimented, but:
>
>mbind_range() is only one stage of the mbind() syscall implementation,
>and is preceded by queue_pages_range(): look what queue_pages_test_walk()
>does when MPOL_MF_DISCONTIG_OK not set.
>
>My impression is that mbind_range() is merely correcting an omission
>from the checks already made my queue_pages_test_walk() (an odd way
>to proceed, I admit: would be better to check initially than later).
>
>I do think that you should not make this change without considering
>MPOL_MF_DISCONTIG_OK and its intention.
>
>Hugh
> 

A program was used to reveal issues as below.

#include <stddef.h>
#include <sys/mman.h>
#include <numaif.h>

int main(int argc, char *argv[])
{
    void *mapAddr;
    unsigned long nodemask;

    mapAddr = mmap(NULL, 6*(1<<12), PROT_READ|PROT_WRITE, MAP_PRIVATE|
        MAP_ANONYMOUS, -1, 0);

    // BIND and leave 2 pages as hole in the middle
    nodemask = 0x1;
    mbind(mapAddr, 6*(1<<12), MPOL_BIND, &nodemask, 2, 0);
    munmap(mapAddr+2*(1<<12), 2*(1<<12));

    // part 1
    mbind(mapAddr-1*(1<<12), 2*(1<<12), MPOL_DEFAULT, NULL, 0, 0);
    mbind(mapAddr, 3*(1<<12), MPOL_DEFAULT, NULL, 0, 0);

    // part 2
    nodemask = 0x2;
    mbind(mapAddr+3*(1<<12), 2*(1<<12), MPOL_BIND, &nodemask, 3, 0);
    mbind(mapAddr+4*(1<<12), 3*(1<<12), MPOL_BIND, &nodemask, 3, 0);
    mbind(mapAddr+3*(1<<12), 1*(1<<12), MPOL_BIND, &nodemask, 3, 0);
    mbind(mapAddr+4*(1<<12), 2*(1<<12), MPOL_BIND, &nodemask, 3, 0);

    return 0;
}

syscall results:
     83 mmap(NULL, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbd24e13000
     84 mbind(0x7fbd24e13000, 24576, MPOL_BIND, [0x0000000000000001], 2, 0) = 0
     85 munmap(0x7fbd24e15000, 8192)            = 0
// part 1
     86 mbind(0x7fbd24e12000, 8192, MPOL_DEFAULT, NULL, 0, 0) = -1 EFAULT (Bad address)
     87 mbind(0x7fbd24e13000, 12288, MPOL_DEFAULT, NULL, 0, 0) = 0
// part 2
     88 mbind(0x7fbd24e16000, 8192, MPOL_BIND, [0x0000000000000002], 3, 0) = -1 EFAULT (Bad address)
     89 mbind(0x7fbd24e17000, 12288, MPOL_BIND, [0x0000000000000002], 3, 0) = 0
     90 mbind(0x7fbd24e16000, 4096, MPOL_BIND, [0x0000000000000002], 3, 0) = -1 EFAULT (Bad address)
     91 mbind(0x7fbd24e17000, 8192, MPOL_BIND, [0x0000000000000002], 3, 0) = 0

The results on line 86 and line 89 were not correct (other lines were expected):
line 86: hole at head side of range was reported as error; this should 
  sucess for MPOL_DEFAULT;
line 89: hole at tail side of range was reported as success; this should 
  fail for !MPOL_DEFAULT cases;
  
My patch only corrected line 86 case, but didn't handle line 89 case. It 
is better to detect valid or invalid hole for MPOL_DEFAULT and 
!MPOL_DEFAULT cases in queue_pages_range phase.

New patch will be prepared, and fullfill the linux API description:
1. for MPOL_DEFAULT, hole at any part of specified range is allowed;
2. for !MPOL_DEFAULT, hole at any part of specified range is not allowed.

Xinhai
(BTW, I am adding two more mail accounts of mine to check which is best for 
this mailling list...)

>> > ---
>> >
>> >  mm/mempolicy.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> >
>> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>> > index 4ae967bcf954..ae160d9936d9 100644
>> > --- a/mm/mempolicy.c
>> > +++ b/mm/mempolicy.c
>> > @@ -738,7 +738,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
>> >         unsigned long vmend;
>> > 
>> >         vma = find_vma(mm, start);
>> > -       if (!vma || vma->vm_start > start)
>> > +       if (!vma || vma->vm_start >= end)
>> >                 return -EFAULT;
>> > 
>> >         prev = vma->vm_prev;
>> >
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-10-28  8:13 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-24  7:35 [PATCH] mm: allow unmapped hole at head side of mbind range Li Xinhai
2019-10-24 10:48 ` Vlastimil Babka
2019-10-25  2:32   ` Hugh Dickins
2019-10-25  2:32     ` Hugh Dickins
2019-10-28  8:12     ` lixinhai.lxh
2019-10-24 12:25 ` Michal Hocko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.