All of lore.kernel.org
 help / color / mirror / Atom feed
* mbind() breaks its API definition since v5.2 by commit d883544515aa (mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified)
@ 2019-10-30  2:27 Li Xinhai
  2019-10-30  2:50 ` Yang Shi
  0 siblings, 1 reply; 5+ messages in thread
From: Li Xinhai @ 2019-10-30  2:27 UTC (permalink / raw)
  To: linux-mm, akpm, torvalds
  Cc: Vlastimil Babka, yang.shi, Linux API, Michal Hocko, Hugh Dickins,
	linux-kernel, lixinhai_lxh

One change in do_mbind() of this commit has suspicious usage of return value of 
queue_pages_range(), excerpt as below:

---
@@ -1243,10 +1265,15 @@ static long do_mbind(unsigned long start, unsigned long len,
 	if (err)
 		goto mpol_out;
 
-	err = queue_pages_range(mm, start, end, nmask,
+	ret = queue_pages_range(mm, start, end, nmask,
 			  flags | MPOL_MF_INVERT, &pagelist);
-	if (!err)
-		err = mbind_range(mm, start, end, new);
+
+	if (ret < 0) {      /////// convert to all possible 'ret' to '-EIO' <<<<
+		err = -EIO;
+		goto up_out;
+	}
+
+	err = mbind_range(mm, start, end, new);
 
 	if (!err) {
 		int nr_failed = 0;
---

Note that inside queue_pages_range(), the call to walk_page_range() may return 
errors from 'test_walk' of 'struct mm_walk_ops', e.g. -EFAULT. Now, those error 
codes are no longer reported to user space application.

From user space, the mbind() call need to reported error, with EFAULT, as example:
EFAULT
Part or all of the memory range specified by nodemask and maxnode points 
outside your accessible address space. Or, there was an unmapped hole in the 
specified memory range specified by addr and len.

Please correct me if this is the intended change(and will have updated API 
definition), or something was misunderstood.

-Xinhai

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: mbind() breaks its API definition since v5.2 by commit d883544515aa (mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified)
  2019-10-30  2:27 mbind() breaks its API definition since v5.2 by commit d883544515aa (mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified) Li Xinhai
@ 2019-10-30  2:50 ` Yang Shi
  2019-10-30  3:12   ` Li Xinhai
  0 siblings, 1 reply; 5+ messages in thread
From: Yang Shi @ 2019-10-30  2:50 UTC (permalink / raw)
  To: Li Xinhai, linux-mm, akpm, torvalds
  Cc: Vlastimil Babka, Linux API, Michal Hocko, Hugh Dickins,
	linux-kernel, lixinhai_lxh



On 10/29/19 7:27 PM, Li Xinhai wrote:
> One change in do_mbind() of this commit has suspicious usage of return value of
> queue_pages_range(), excerpt as below:
>
> ---
> @@ -1243,10 +1265,15 @@ static long do_mbind(unsigned long start, unsigned long len,
>   	if (err)
>   		goto mpol_out;
>   
> -	err = queue_pages_range(mm, start, end, nmask,
> +	ret = queue_pages_range(mm, start, end, nmask,
>   			  flags | MPOL_MF_INVERT, &pagelist);
> -	if (!err)
> -		err = mbind_range(mm, start, end, new);
> +
> +	if (ret < 0) {      /////// convert to all possible 'ret' to '-EIO' <<<<
> +		err = -EIO;
> +		goto up_out;
> +	}
> +
> +	err = mbind_range(mm, start, end, new);
>   
>   	if (!err) {
>   		int nr_failed = 0;
> ---
>
> Note that inside queue_pages_range(), the call to walk_page_range() may return
> errors from 'test_walk' of 'struct mm_walk_ops', e.g. -EFAULT. Now, those error
> codes are no longer reported to user space application.
>
>  From user space, the mbind() call need to reported error, with EFAULT, as example:
> EFAULT
> Part or all of the memory range specified by nodemask and maxnode points
> outside your accessible address space. Or, there was an unmapped hole in the
> specified memory range specified by addr and len.

Thanks for catching this. That commit was aimed to correct the return 
values for some corner cases in mbind(), but it should not alter the 
errno for other failure cases, i.e. -EFAULT.

Could you please try the below patch (build test only)?

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4ae967b..99df43a 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1286,7 +1286,7 @@ static long do_mbind(unsigned long start, unsigned 
long len,
                           flags | MPOL_MF_INVERT, &pagelist);

         if (ret < 0) {
-               err = -EIO;
+               err = ret;
                 goto up_out;
         }


>
> Please correct me if this is the intended change(and will have updated API
> definition), or something was misunderstood.
>
> -Xinhai


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: Re: mbind() breaks its API definition since v5.2 by commit d883544515aa (mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified)
  2019-10-30  2:50 ` Yang Shi
@ 2019-10-30  3:12   ` Li Xinhai
  2019-10-30  4:32     ` Yang Shi
  0 siblings, 1 reply; 5+ messages in thread
From: Li Xinhai @ 2019-10-30  3:12 UTC (permalink / raw)
  To: yang.shi, linux-mm, akpm, torvalds
  Cc: Vlastimil Babka, Linux API, Michal Hocko, Hugh Dickins,
	linux-kernel, lixinhai_lxh

On 2019-10-30 at 10:50 Yang Shi wrote:
>
>
>On 10/29/19 7:27 PM, Li Xinhai wrote:
>> One change in do_mbind() of this commit has suspicious usage of return value of
>> queue_pages_range(), excerpt as below:
>>
>> ---
>> @@ -1243,10 +1265,15 @@ static long do_mbind(unsigned long start, unsigned long len,
>>   	if (err)
>>   	goto mpol_out;
>>  
>> -	err = queue_pages_range(mm, start, end, nmask,
>> +	ret = queue_pages_range(mm, start, end, nmask,
>>   	 flags | MPOL_MF_INVERT, &pagelist);
>> -	if (!err)
>> -	err = mbind_range(mm, start, end, new);
>> +
>> +	if (ret < 0) {      /////// convert to all possible 'ret' to '-EIO' <<<<
>> +	err = -EIO;
>> +	goto up_out;
>> +	}
>> +
>> +	err = mbind_range(mm, start, end, new);
>>  
>>   	if (!err) {
>>   	int nr_failed = 0;
>> ---
>>
>> Note that inside queue_pages_range(), the call to walk_page_range() may return
>> errors from 'test_walk' of 'struct mm_walk_ops', e.g. -EFAULT. Now, those error
>> codes are no longer reported to user space application.
>>
>>  From user space, the mbind() call need to reported error, with EFAULT, as example:
>> EFAULT
>> Part or all of the memory range specified by nodemask and maxnode points
>> outside your accessible address space. Or, there was an unmapped hole in the
>> specified memory range specified by addr and len.
>
>Thanks for catching this. That commit was aimed to correct the return
>values for some corner cases in mbind(), but it should not alter the
>errno for other failure cases, i.e. -EFAULT.
>
>Could you please try the below patch (build test only)?
>
>diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>index 4ae967b..99df43a 100644
>--- a/mm/mempolicy.c
>+++ b/mm/mempolicy.c
>@@ -1286,7 +1286,7 @@ static long do_mbind(unsigned long start, unsigned
>long len,
>                           flags | MPOL_MF_INVERT, &pagelist);
>
>         if (ret < 0) {
>-               err = -EIO;
>+               err = ret;
>                 goto up_out;
>         }
>
> 
This seems do not work, because the 'pagelist' would have some pages queued 
into it, need to put back those pages instead of return quickly. 

So, we need to remove this page leak as well. <<<<<<

In my understanding, revert the changes as I quoted above may solve it, but not sure 
the details about changes at end of do_mbind(), should keep them at there without 
further change?

- Xinhai

>>
>> Please correct me if this is the intended change(and will have updated API
>> definition), or something was misunderstood.
>>
>> -Xinhai
>
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: mbind() breaks its API definition since v5.2 by commit d883544515aa (mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified)
  2019-10-30  3:12   ` Li Xinhai
@ 2019-10-30  4:32     ` Yang Shi
  2019-10-30  5:59       ` Li Xinhai
  0 siblings, 1 reply; 5+ messages in thread
From: Yang Shi @ 2019-10-30  4:32 UTC (permalink / raw)
  To: Li Xinhai, linux-mm, akpm, torvalds
  Cc: Vlastimil Babka, Linux API, Michal Hocko, Hugh Dickins,
	linux-kernel, lixinhai_lxh



On 10/29/19 8:12 PM, Li Xinhai wrote:
> On 2019-10-30 at 10:50 Yang Shi wrote:
>>
>> On 10/29/19 7:27 PM, Li Xinhai wrote:
>>> One change in do_mbind() of this commit has suspicious usage of return value of
>>> queue_pages_range(), excerpt as below:
>>>
>>> ---
>>> @@ -1243,10 +1265,15 @@ static long do_mbind(unsigned long start, unsigned long len,
>>>     	if (err)
>>>     	goto mpol_out;
>>>    
>>> -	err = queue_pages_range(mm, start, end, nmask,
>>> +	ret = queue_pages_range(mm, start, end, nmask,
>>>     	 flags | MPOL_MF_INVERT, &pagelist);
>>> -	if (!err)
>>> -	err = mbind_range(mm, start, end, new);
>>> +
>>> +	if (ret < 0) {      /////// convert to all possible 'ret' to '-EIO' <<<<
>>> +	err = -EIO;
>>> +	goto up_out;
>>> +	}
>>> +
>>> +	err = mbind_range(mm, start, end, new);
>>>    
>>>     	if (!err) {
>>>     	int nr_failed = 0;
>>> ---
>>>
>>> Note that inside queue_pages_range(), the call to walk_page_range() may return
>>> errors from 'test_walk' of 'struct mm_walk_ops', e.g. -EFAULT. Now, those error
>>> codes are no longer reported to user space application.
>>>
>>>    From user space, the mbind() call need to reported error, with EFAULT, as example:
>>> EFAULT
>>> Part or all of the memory range specified by nodemask and maxnode points
>>> outside your accessible address space. Or, there was an unmapped hole in the
>>> specified memory range specified by addr and len.
>> Thanks for catching this. That commit was aimed to correct the return
>> values for some corner cases in mbind(), but it should not alter the
>> errno for other failure cases, i.e. -EFAULT.
>>
>> Could you please try the below patch (build test only)?
>>
>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>> index 4ae967b..99df43a 100644
>> --- a/mm/mempolicy.c
>> +++ b/mm/mempolicy.c
>> @@ -1286,7 +1286,7 @@ static long do_mbind(unsigned long start, unsigned
>> long len,
>>                            flags | MPOL_MF_INVERT, &pagelist);
>>
>>          if (ret < 0) {
>> -               err = -EIO;
>> +               err = ret;
>>                  goto up_out;
>>          }
>>
>>
> This seems do not work, because the 'pagelist' would have some pages queued
> into it, need to put back those pages instead of return quickly.
>
> So, we need to remove this page leak as well. <<<<<<
>
> In my understanding, revert the changes as I quoted above may solve it, but not sure
> the details about changes at end of do_mbind(), should keep them at there without
> further change?

Thanks for pointing this out. We don't have to revert this commit to 
handle the non-empty pagelist correctly. The simplest way is to just put 
those pages back and I'm supposed this is also the preferred way since 
mbind_range() is not called to really apply the policy so those pages 
should not be migrated.

The below patch should solve this:

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4ae967b..d80025c 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1286,7 +1286,10 @@ static long do_mbind(unsigned long start, 
unsigned long len,
                           flags | MPOL_MF_INVERT, &pagelist);

         if (ret < 0) {
-               err = -EIO;
+               if (!list_empty(&pagelist))
+                       putback_movable_pages(&pagelist);
+
+               err = ret;
                 goto up_out;
         }

>
> - Xinhai
>
>>> Please correct me if this is the intended change(and will have updated API
>>> definition), or something was misunderstood.
>>>
>>> -Xinhai
> >


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: mbind() breaks its API definition since v5.2 by commit d883544515aa (mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified)
  2019-10-30  4:32     ` Yang Shi
@ 2019-10-30  5:59       ` Li Xinhai
  0 siblings, 0 replies; 5+ messages in thread
From: Li Xinhai @ 2019-10-30  5:59 UTC (permalink / raw)
  To: yang.shi, linux-mm, akpm, torvalds
  Cc: Vlastimil Babka, Linux API, Michal Hocko, Hugh Dickins,
	linux-kernel, lixinhai_lxh

On 2019-10-30 at 12:32 Yang Shi wrote:
>
>
>On 10/29/19 8:12 PM, Li Xinhai wrote:
>> On 2019-10-30 at 10:50 Yang Shi wrote:
>>>
>>> On 10/29/19 7:27 PM, Li Xinhai wrote:
>>>> One change in do_mbind() of this commit has suspicious usage of return value of
>>>> queue_pages_range(), excerpt as below:
>>>>
>>>> ---
>>>> @@ -1243,10 +1265,15 @@ static long do_mbind(unsigned long start, unsigned long len,
>>>>     	if (err)
>>>>     	goto mpol_out;
>>>>   
>>>> -	err = queue_pages_range(mm, start, end, nmask,
>>>> +	ret = queue_pages_range(mm, start, end, nmask,
>>>>     	 flags | MPOL_MF_INVERT, &pagelist);
>>>> -	if (!err)
>>>> -	err = mbind_range(mm, start, end, new);
>>>> +
>>>> +	if (ret < 0) {      /////// convert to all possible 'ret' to '-EIO' <<<<
>>>> +	err = -EIO;
>>>> +	goto up_out;
>>>> +	}
>>>> +
>>>> +	err = mbind_range(mm, start, end, new);
>>>>   
>>>>     	if (!err) {
>>>>     	int nr_failed = 0;
>>>> ---
>>>>
>>>> Note that inside queue_pages_range(), the call to walk_page_range() may return
>>>> errors from 'test_walk' of 'struct mm_walk_ops', e.g. -EFAULT. Now, those error
>>>> codes are no longer reported to user space application.
>>>>
>>>>    From user space, the mbind() call need to reported error, with EFAULT, as example:
>>>> EFAULT
>>>> Part or all of the memory range specified by nodemask and maxnode points
>>>> outside your accessible address space. Or, there was an unmapped hole in the
>>>> specified memory range specified by addr and len.
>>> Thanks for catching this. That commit was aimed to correct the return
>>> values for some corner cases in mbind(), but it should not alter the
>>> errno for other failure cases, i.e. -EFAULT.
>>>
>>> Could you please try the below patch (build test only)?
>>>
>>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>>> index 4ae967b..99df43a 100644
>>> --- a/mm/mempolicy.c
>>> +++ b/mm/mempolicy.c
>>> @@ -1286,7 +1286,7 @@ static long do_mbind(unsigned long start, unsigned
>>> long len,
>>>                            flags | MPOL_MF_INVERT, &pagelist);
>>>
>>>          if (ret < 0) {
>>> -               err = -EIO;
>>> +               err = ret;
>>>                  goto up_out;
>>>          }
>>>
>>>
>> This seems do not work, because the 'pagelist' would have some pages queued
>> into it, need to put back those pages instead of return quickly.
>>
>> So, we need to remove this page leak as well. <<<<<<
>>
>> In my understanding, revert the changes as I quoted above may solve it, but not sure
>> the details about changes at end of do_mbind(), should keep them at there without
>> further change?
>
>Thanks for pointing this out. We don't have to revert this commit to
>handle the non-empty pagelist correctly. The simplest way is to just put
>those pages back and I'm supposed this is also the preferred way since
>mbind_range() is not called to really apply the policy so those pages
>should not be migrated.
>
>The below patch should solve this:
>
>diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>index 4ae967b..d80025c 100644
>--- a/mm/mempolicy.c
>+++ b/mm/mempolicy.c
>@@ -1286,7 +1286,10 @@ static long do_mbind(unsigned long start,
>unsigned long len,
>                           flags | MPOL_MF_INVERT, &pagelist);
>
>         if (ret < 0) {
>-               err = -EIO;
>+               if (!list_empty(&pagelist))
>+                       putback_movable_pages(&pagelist);
>+
>+               err = ret;
>                 goto up_out;
>         }
> 
Checked the original commit about the 'ret >0' purpose, and understood that 
was for migrating page in best effort way.

Your patch looks correct to me.

>>
>> - Xinhai
>>
>>>> Please correct me if this is the intended change(and will have updated API
>>>> definition), or something was misunderstood.
>>>>
>>>> -Xinhai
>> >
>
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-10-30  6:00 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-30  2:27 mbind() breaks its API definition since v5.2 by commit d883544515aa (mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified) Li Xinhai
2019-10-30  2:50 ` Yang Shi
2019-10-30  3:12   ` Li Xinhai
2019-10-30  4:32     ` Yang Shi
2019-10-30  5:59       ` Li Xinhai

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.