linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm,madvise: bugfix of madvise systemcall infinite loop under special circumstances.
@ 2017-11-24  2:27 guoxuenan
  2017-11-24  8:05 ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: guoxuenan @ 2017-11-24  2:27 UTC (permalink / raw)
  To: akpm, mhocko, minchan, linux-mm, linux-kernel
  Cc: rppt, hillf.zj, shli, aarcange, mgorman, kirill.shutemov,
	rientjes, khandual, riel

From: chenjie <chenjie6@huawei.com>

The madvise() system call supports a set of "conventional" advice values.
One of them, MADV_WILLNEED, can trigger an infinite loop when applied to a
direct access (DAX) mapping. In DAX mode, madvise_vma() returns
directly without updating the pointer [prev].

The loop occurs under the following conditions:
1. Initial state: [ start < vma->vm_start < vma->vm_end < end ]
2. madvise_vma() is called with the MADV_WILLNEED parameter:
   madvise_vma() -> madvise_willneed() -> returns 0 without updating [prev]

=======================================================================
In function SYSCALL_DEFINE3(madvise,...)

for (;;)
{
//[first loop: start = vma->vm_start < vma->vm_end < end ];
      update [start = vma->vm_start | end  ]

con0: if (start >= end)                 //false always;
	goto out;
      tmp = vma->vm_end;

//do not update [prev] and always return 0;
      error = madvise_willneed();

con1: if (error)                        //false always;
	goto out;

//[ vma->vm_start < start = vma->vm_end < end ]
      update [start = tmp ]

con2: if (start >= end)                 //false always ;
	goto out;

//because pointer [prev] did not change, [vma] stays as it was;
      update [ vma = prev->vm_next ]
}

=======================================================================
After the first iteration, the state always remains
[ vma->vm_start < start = vma->vm_end < end ].
Since the loop exit conditions (con{0,1,2}) are never met, the
program is stuck in an infinite loop.

Signed-off-by: chenjie <chenjie6@huawei.com>
Signed-off-by: guoxuenan <guoxuenan@huawei.com>
---
 mm/madvise.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/madvise.c b/mm/madvise.c
index 21261ff..c355fee 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -294,6 +294,7 @@ static long madvise_willneed(struct vm_area_struct *vma,
 #endif
 
 	if (IS_DAX(file_inode(file))) {
+		*prev = vma;
 		/* no bad return value, but ignore advice */
 		return 0;
 	}
-- 
2.9.5


* Re: [PATCH] mm,madvise: bugfix of madvise systemcall infinite loop under special circumstances.
  2017-11-24  2:27 [PATCH] mm,madvise: bugfix of madvise systemcall infinite loop under special circumstances guoxuenan
@ 2017-11-24  8:05 ` Michal Hocko
       [not found]   ` <829af987-4d65-382c-dbd4-0c81222ebb51@huawei.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2017-11-24  8:05 UTC (permalink / raw)
  To: guoxuenan
  Cc: akpm, minchan, linux-mm, linux-kernel, rppt, hillf.zj, shli,
	aarcange, mgorman, kirill.shutemov, rientjes, khandual, riel

On Fri 24-11-17 10:27:57, guoxuenan wrote:
> From: chenjie <chenjie6@huawei.com>
> 
> The madvise() system call supported a set of "conventional" advice values,
> the MADV_WILLNEED parameter will trigger an infinite loop under direct
> access mode(DAX). In DAX mode, the function madvise_vma() will return
> directly without updating the pointer [prev].
> 
> For example:
> Special circumstances:
> 1、init [ start < vam->vm_start < vam->vm_end < end ]
> 2、madvise_vma() using MADV_WILLNEED parameter ;
> madvise_vma() -> madvise_willneed() -> return 0 && without updating [prev]
> 
> =======================================================================
> in Function SYSCALL_DEFINE3(madvise,...)
> 
> for (;;)
> {
> //[first loop: start = vam->vm_start < vam->vm_end  <end ];
>       update [start = vma->vm_start | end  ]
> 
> con0: if (start >= end)                 //false always;
> 	goto out;
>       tmp = vma->vm_end;
> 
> //do not update [prev] and always return 0;
>       error = madvise_willneed();
> 
> con1: if (error)                        //false always;
> 	goto out;
> 
> //[ vam->vm_start < start = vam->vm_end  <end ]
>       update [start = tmp ]
> 
> con2: if (start >= end)                 //false always ;
> 	goto out;
> 
> //because of pointer [prev] did not change,[vma] keep as it was;
>       update [ vma = prev->vm_next ]
> }
> 
> =======================================================================
> After the first cycle ;it will always keep
> [ vam->vm_start < start = vam->vm_end  < end ].
> since Circulation exit conditions (con{0,1,2}) will never meet ,the
> program stuck in infinite loop.

Are you sure? Have you tested this? I might be missing something because
the madvise code is a bit of a mess, but AFAICS the prev pointer (updated
or not) will allow the loop to advance:
		if (prev)
			vma = prev->vm_next;
		else	/* madvise_remove dropped mmap_sem */
			vma = find_vma(current->mm, start);
Note that start is vma->vm_end and find_vma will find a vma for which
vm_end > addr.

So either I am missing something or this code has actually never worked
for DAX/XIP, which I find rather suspicious.
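
For illustration, the find_vma() behaviour mentioned above (return the first
vma whose vm_end is greater than the address) can be sketched in user space.
This is a minimal model under stated assumptions, not the kernel
implementation: struct vma_model and find_vma_model() are hypothetical names,
and the areas are assumed to sit on a singly linked list sorted by address.

```c
#include <stddef.h>

/*
 * Simplified stand-in for the kernel's struct vm_area_struct.
 * Assumption: only the fields discussed in this thread are modeled.
 */
struct vma_model {
	unsigned long vm_start;		/* first address inside the area */
	unsigned long vm_end;		/* first address past the area */
	struct vma_model *vm_next;	/* next area, sorted by address */
};

/* Return the first area whose vm_end is greater than addr, or NULL. */
static struct vma_model *find_vma_model(struct vma_model *head,
					unsigned long addr)
{
	struct vma_model *vma;

	for (vma = head; vma != NULL; vma = vma->vm_next)
		if (vma->vm_end > addr)
			return vma;
	return NULL;
}
```

With areas [0x1000, 0x2000) and [0x3000, 0x4000), looking up 0x2000 (the
vm_end of the first area) yields the second area, which is why
start = vma->vm_end would advance the walk whenever this lookup is the one
that runs. In the snippet quoted above, that is only the else branch, taken
when prev is NULL.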
 
> Signed-off-by: chenjie <chenjie6@huawei.com>
> Signed-off-by: guoxuenan <guoxuenan@huawei.com>
> ---
>  mm/madvise.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 21261ff..c355fee 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -294,6 +294,7 @@ static long madvise_willneed(struct vm_area_struct *vma,
>  #endif
>  
>  	if (IS_DAX(file_inode(file))) {
> +		*prev = vma;
>  		/* no bad return value, but ignore advice */
>  		return 0;
>  	}
> -- 
> 2.9.5
> 

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm,madvise: bugfix of madvise systemcall infinite loop under special circumstances.
       [not found]     ` <20171124130803.hafb3zbhy7gdqkvi@dhcp22.suse.cz>
@ 2017-11-25  1:52       ` 郭雪楠
  2017-11-27  2:54         ` 郭雪楠
  0 siblings, 1 reply; 9+ messages in thread
From: 郭雪楠 @ 2017-11-25  1:52 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, minchan, linux-mm, linux-kernel, rppt, yi.zhang, miaoxie,
	aarcange, mgorman, kirill.shutemov, rientjes, khandual, riel,
	hillf.zj, shli

Yes, your modification is much better! Thanks.

On 2017/11/24 21:08, Michal Hocko wrote:
> On Fri 24-11-17 20:51:29, 郭雪楠 wrote:
>> Sorry, I explained it wrongly before. But I've tested using trinity in DAX
>> mode, and I am sure it can trigger a soft lockup. I have
>> encountered the endless loop here.
>>
>> I made a small mistake in the description; here is the correction.
>> Initial state:
>> [ start = vma->vm_start < vma->vm_end < end ]
>>
>> When [start = vma->vm_start], the program enters the for (;;) loop;
>> find_vma_prev() sets the pointer vma and the pointer prev (prev =
>> vma->vm_prev). Normally, madvise_vma() always moves the pointer prev,
>> but in DAX mode it is never updated.
> [...]
>> if (prev) // prev is not NULL here, so this branch is always taken
>> 	vma = prev->vm_next;
>> else	/* madvise_remove dropped mmap_sem */
>> 	vma = find_vma(current->mm, start);
> 
> You are right! My fault, I managed to confuse myself in the code flow.
> It really looks like this has been broken for more than 10 years since
> fe77ba6f4f97 ("[PATCH] xip: madvice/fadvice: execute in place").
> 
> Maybe the following would be more readable and less error prone?
> ---
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 375cf32087e4..a631c414f915 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -276,30 +276,26 @@ static long madvise_willneed(struct vm_area_struct *vma,
>   {
>   	struct file *file = vma->vm_file;
>   
> +	*prev = vma;
>   #ifdef CONFIG_SWAP
>   	if (!file) {
> -		*prev = vma;
>   		force_swapin_readahead(vma, start, end);
>   		return 0;
>   	}
>   
> -	if (shmem_mapping(file->f_mapping)) {
> -		*prev = vma;
> +	if (shmem_mapping(file->f_mapping))
>   		force_shm_swapin_readahead(vma, start, end,
>   					file->f_mapping);
>   		return 0;
> -	}
>   #else
>   	if (!file)
>   		return -EBADF;
>   #endif
>   
> -	if (IS_DAX(file_inode(file))) {
> +	if (IS_DAX(file_inode(file)))
>   		/* no bad return value, but ignore advice */
>   		return 0;
> -	}
>   
> -	*prev = vma;
>   	start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
>   	if (end > vma->vm_end)
>   		end = vma->vm_end;
> 


* Re: [PATCH] mm,madvise: bugfix of madvise systemcall infinite loop under special circumstances.
  2017-11-25  1:52       ` 郭雪楠
@ 2017-11-27  2:54         ` 郭雪楠
  2017-11-27  7:59           ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: 郭雪楠 @ 2017-11-27  2:54 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, minchan, linux-mm, linux-kernel, rppt, yi.zhang, miaoxie,
	aarcange, mgorman, kirill.shutemov, rientjes, khandual, riel,
	hillf.zj, shli

Hi Michal, do you need me to rework the patch according to your
modification and resubmit it?

On 2017/11/25 9:52, 郭雪楠 wrote:
> Yes , your modification is much better! thanks.
> 
> On 2017/11/24 21:08, Michal Hocko wrote:
>> On Fri 24-11-17 20:51:29, 郭雪楠 wrote:
>>> Sorry,I explained  wrong before. But,I've tested using trinity in DAX
>>> mode,and I'am sure it has possibility of triggering an soft lockup. I 
>>> have
>>> encountered the problem of endless loop here .
>>>
>>> I had a little problem here,I correct it .
>>> under Initial state :
>>> [ start = vam->vm_start < vam->vm_end < end ]
>>>
>>> When [start = vam->vm_start] the program enters  for{;;} loop
>>> ,find_vma_prev() will set the pointer vma and the pointer prev (prev =
>>> vam->vm_prev ). Normally ,madvise_vma() will always move the pointer 
>>> prev
>>> ,but when use DAX mode , it will never update .
>> [...]
>>> if (prev) // here prev not NULL,it will always enter this branch ..
>>>     vma = prev->vm_next;
>>> else    /* madvise_remove dropped mmap_sem */
>>>     vma = find_vma(current->mm, start);
>>
>> You are right! My fault, I managed to confuse myself in the code flow.
>> It really looks like this has been broken for more than 10 years since
>> fe77ba6f4f97 ("[PATCH] xip: madvice/fadvice: execute in place").
>>
>> Maybe the following would be more readable and less error prone?
>> ---
>> diff --git a/mm/madvise.c b/mm/madvise.c
>> index 375cf32087e4..a631c414f915 100644
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -276,30 +276,26 @@ static long madvise_willneed(struct 
>> vm_area_struct *vma,
>>   {
>>       struct file *file = vma->vm_file;
>> +    *prev = vma;
>>   #ifdef CONFIG_SWAP
>>       if (!file) {
>> -        *prev = vma;
>>           force_swapin_readahead(vma, start, end);
>>           return 0;
>>       }
>> -    if (shmem_mapping(file->f_mapping)) {
>> -        *prev = vma;
>> +    if (shmem_mapping(file->f_mapping))
>>           force_shm_swapin_readahead(vma, start, end,
>>                       file->f_mapping);
>>           return 0;
>> -    }
>>   #else
>>       if (!file)
>>           return -EBADF;
>>   #endif
>> -    if (IS_DAX(file_inode(file))) {
>> +    if (IS_DAX(file_inode(file)))
>>           /* no bad return value, but ignore advice */
>>           return 0;
>> -    }
>> -    *prev = vma;
>>       start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
>>       if (end > vma->vm_end)
>>           end = vma->vm_end;
>>


* Re: [PATCH] mm,madvise: bugfix of madvise systemcall infinite loop under special circumstances.
  2017-11-27  2:54         ` 郭雪楠
@ 2017-11-27  7:59           ` Michal Hocko
  0 siblings, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2017-11-27  7:59 UTC (permalink / raw)
  To: 郭雪楠
  Cc: akpm, minchan, linux-mm, linux-kernel, rppt, yi.zhang, miaoxie,
	aarcange, mgorman, kirill.shutemov, rientjes, khandual, riel,
	hillf.zj, shli

On Mon 27-11-17 10:54:39, 郭雪楠 wrote:
> Hi,Michal, Whether  need me to modify according your modification and
> resubmit a new patch?

please do
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm,madvise: bugfix of madvise systemcall infinite loop under special circumstances.
  2017-11-27 11:58 ` Michal Hocko
  2017-11-27 12:28   ` guoxuenan
@ 2017-11-27 12:42   ` Mike Rapoport
  1 sibling, 0 replies; 9+ messages in thread
From: Mike Rapoport @ 2017-11-27 12:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: guoxuenan, akpm, minchan, linux-mm, linux-kernel, yi.zhang,
	miaoxie, shli, aarcange, mgorman, kirill.shutemov, rientjes,
	khandual, riel

On Mon, Nov 27, 2017 at 12:58:47PM +0100, Michal Hocko wrote:
> On Mon 27-11-17 19:53:18, guoxuenan wrote:
> > From: chenjie <chenjie6@huawei.com>
> > 
> > The madvise() system call supported a set of "conventional" advice values,
> > the MADV_WILLNEED parameter has possibility of triggering an infinite loop under
> > direct access mode(DAX).
> > 
> > Infinite loop situation:
> > 1、initial state [ start = vam->vm_start < vam->vm_end < end ].
> > 2、madvise_vma() using MADV_WILLNEED parameter;
> >    madvise_vma() -> madvise_willneed() -> return 0 && the value of [prev] is not updated.
> > 
> > In function SYSCALL_DEFINE3(madvise,...)
> > When [start = vam->vm_start] the program enters "for" loop,
> > find_vma_prev() will set the pointer vma and the pointer prev(prev = vam->vm_prev).
> > Normally ,madvise_vma() will always move the pointer prev ,but when use DAX mode,
> > it will never update the value of [prev].
> > 
> > =======================================================================
> > SYSCALL_DEFINE3(madvise,...)
> > {
> > 	[...]
> > 	//start = vam->start  => prev=vma->prev
> >     vma = find_vma_prev(current->mm, start, &prev);
> > 	[...]
> > 	for(;;)
> > 	{
> > 	      update [start = vma->vm_start]
> > 
> > 	con0: if (start >= end)                 //false always;
> > 	    goto out;
> > 	       tmp = vma->vm_end;
> > 
> > 	//do not update [prev] and always return 0;
> > 	       error = madvise_willneed();
> > 
> > 	con1: if (error)                        //false always;
> > 	    goto out;
> > 
> > 	//[ vam->vm_start < start = vam->vm_end  <end ]
> > 	       update [start = tmp ]
> > 
> > 	con2: if (start >= end)                 //false always ;
> > 	    goto out;
> > 
> > 	//because of pointer [prev] did not change,[vma] keep as it was;
> > 	       update [ vma = prev->vm_next ]
> > 	}
> > 	[...]
> > }
> > =======================================================================
> > After the first cycle ;it will always keep
> > vam->vm_start < start = vam->vm_end  < end  && vma = prev->vm_next;
> > since Circulation exit conditions (con{0,1,2}) will never meet ,the
> > program stuck in infinite loop.
> 
> I find your changelog a bit hard to parse. What would you think about
> the following:
> "
> MADVISE_WILLNEED has always been a noop for DAX (formerly XIP) mappings.
> Unfortunatelly madvise_willneed doesn't communicate this information
> properly to the generic madvise syscall implementation. The calling
> converion is quite subtle there. madvise_vma is supposed to either

spelling: "The calling convention"

> return an error or update &prev otherwise the main loop will never
> advance to the next vma and it will keep looping for ever without a way
> to get out of the kernel.
> 
> It seems this has been broken since introduced. Nobody has noticed
> because nobody seems to be using MADVISE_WILLNEED on these DAX mappings.
> 
> Fixes: fe77ba6f4f97 ("[PATCH] xip: madvice/fadvice: execute in place")
> Cc: stable
> "
> 
> > Signed-off-by: chenjie <chenjie6@huawei.com>
> > Signed-off-by: guoxuenan <guoxuenan@huawei.com>
> 
> Other than that
> Acked-by: Michal Hocko <mhocko@suse.com>
> 
> > ---
> >  mm/madvise.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> > 
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 375cf32..751e97a 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -276,15 +276,14 @@ static long madvise_willneed(struct vm_area_struct *vma,
> >  {
> >  	struct file *file = vma->vm_file;
> >  
> > +	*prev = vma;
> >  #ifdef CONFIG_SWAP
> >  	if (!file) {
> > -		*prev = vma;
> >  		force_swapin_readahead(vma, start, end);
> >  		return 0;
> >  	}
> >  
> >  	if (shmem_mapping(file->f_mapping)) {
> > -		*prev = vma;
> >  		force_shm_swapin_readahead(vma, start, end,
> >  					file->f_mapping);
> >  		return 0;
> > @@ -299,7 +298,6 @@ static long madvise_willneed(struct vm_area_struct *vma,
> >  		return 0;
> >  	}
> >  
> > -	*prev = vma;
> >  	start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> >  	if (end > vma->vm_end)
> >  		end = vma->vm_end;
> > -- 
> > 2.9.5
> > 
> 
> -- 
> Michal Hocko
> SUSE Labs
> 

-- 
Sincerely yours,
Mike.


* Re: [PATCH] mm,madvise: bugfix of madvise systemcall infinite loop under special circumstances.
  2017-11-27 11:58 ` Michal Hocko
@ 2017-11-27 12:28   ` guoxuenan
  2017-11-27 12:42   ` Mike Rapoport
  1 sibling, 0 replies; 9+ messages in thread
From: guoxuenan @ 2017-11-27 12:28 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, minchan, linux-mm, linux-kernel, yi.zhang, miaoxie, rppt,
	shli, aarcange, mgorman, kirill.shutemov, rientjes, khandual,
	riel

Of course! Thank you, you saved my poor English :).

On 2017/11/27 19:58, Michal Hocko wrote:
> On Mon 27-11-17 19:53:18, guoxuenan wrote:
>> From: chenjie <chenjie6@huawei.com>
>>
>> The madvise() system call supported a set of "conventional" advice values,
>> the MADV_WILLNEED parameter has possibility of triggering an infinite loop under
>> direct access mode(DAX).
>>
>> Infinite loop situation:
>> 1、initial state [ start = vam->vm_start < vam->vm_end < end ].
>> 2、madvise_vma() using MADV_WILLNEED parameter;
>>     madvise_vma() -> madvise_willneed() -> return 0 && the value of [prev] is not updated.
>>
>> In function SYSCALL_DEFINE3(madvise,...)
>> When [start = vam->vm_start] the program enters "for" loop,
>> find_vma_prev() will set the pointer vma and the pointer prev(prev = vam->vm_prev).
>> Normally ,madvise_vma() will always move the pointer prev ,but when use DAX mode,
>> it will never update the value of [prev].
>>
>> =======================================================================
>> SYSCALL_DEFINE3(madvise,...)
>> {
>> 	[...]
>> 	//start = vam->start  => prev=vma->prev
>>      vma = find_vma_prev(current->mm, start, &prev);
>> 	[...]
>> 	for(;;)
>> 	{
>> 	      update [start = vma->vm_start]
>>
>> 	con0: if (start >= end)                 //false always;
>> 	    goto out;
>> 	       tmp = vma->vm_end;
>>
>> 	//do not update [prev] and always return 0;
>> 	       error = madvise_willneed();
>>
>> 	con1: if (error)                        //false always;
>> 	    goto out;
>>
>> 	//[ vam->vm_start < start = vam->vm_end  <end ]
>> 	       update [start = tmp ]
>>
>> 	con2: if (start >= end)                 //false always ;
>> 	    goto out;
>>
>> 	//because of pointer [prev] did not change,[vma] keep as it was;
>> 	       update [ vma = prev->vm_next ]
>> 	}
>> 	[...]
>> }
>> =======================================================================
>> After the first cycle ;it will always keep
>> vam->vm_start < start = vam->vm_end  < end  && vma = prev->vm_next;
>> since Circulation exit conditions (con{0,1,2}) will never meet ,the
>> program stuck in infinite loop.
> 
> I find your changelog a bit hard to parse. What would you think about
> the following:
> "
> MADVISE_WILLNEED has always been a noop for DAX (formerly XIP) mappings.
> Unfortunatelly madvise_willneed doesn't communicate this information
> properly to the generic madvise syscall implementation. The calling
> converion is quite subtle there. madvise_vma is supposed to either
> return an error or update &prev otherwise the main loop will never
> advance to the next vma and it will keep looping for ever without a way
> to get out of the kernel.
> 
> It seems this has been broken since introduced. Nobody has noticed
> because nobody seems to be using MADVISE_WILLNEED on these DAX mappings.
> 
> Fixes: fe77ba6f4f97 ("[PATCH] xip: madvice/fadvice: execute in place")
> Cc: stable
> "
> 
>> Signed-off-by: chenjie <chenjie6@huawei.com>
>> Signed-off-by: guoxuenan <guoxuenan@huawei.com>
> 
> Other than that
> Acked-by: Michal Hocko <mhocko@suse.com>
> 
>> ---
>>   mm/madvise.c | 4 +---
>>   1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/mm/madvise.c b/mm/madvise.c
>> index 375cf32..751e97a 100644
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -276,15 +276,14 @@ static long madvise_willneed(struct vm_area_struct *vma,
>>   {
>>   	struct file *file = vma->vm_file;
>>   
>> +	*prev = vma;
>>   #ifdef CONFIG_SWAP
>>   	if (!file) {
>> -		*prev = vma;
>>   		force_swapin_readahead(vma, start, end);
>>   		return 0;
>>   	}
>>   
>>   	if (shmem_mapping(file->f_mapping)) {
>> -		*prev = vma;
>>   		force_shm_swapin_readahead(vma, start, end,
>>   					file->f_mapping);
>>   		return 0;
>> @@ -299,7 +298,6 @@ static long madvise_willneed(struct vm_area_struct *vma,
>>   		return 0;
>>   	}
>>   
>> -	*prev = vma;
>>   	start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
>>   	if (end > vma->vm_end)
>>   		end = vma->vm_end;
>> -- 
>> 2.9.5
>>
> 


* Re: [PATCH] mm,madvise: bugfix of madvise systemcall infinite loop under special circumstances.
  2017-11-27 11:53 guoxuenan
@ 2017-11-27 11:58 ` Michal Hocko
  2017-11-27 12:28   ` guoxuenan
  2017-11-27 12:42   ` Mike Rapoport
  0 siblings, 2 replies; 9+ messages in thread
From: Michal Hocko @ 2017-11-27 11:58 UTC (permalink / raw)
  To: guoxuenan
  Cc: akpm, minchan, linux-mm, linux-kernel, yi.zhang, miaoxie, rppt,
	shli, aarcange, mgorman, kirill.shutemov, rientjes, khandual,
	riel

On Mon 27-11-17 19:53:18, guoxuenan wrote:
> From: chenjie <chenjie6@huawei.com>
> 
> The madvise() system call supported a set of "conventional" advice values,
> the MADV_WILLNEED parameter has possibility of triggering an infinite loop under
> direct access mode(DAX).
> 
> Infinite loop situation:
> 1、initial state [ start = vam->vm_start < vam->vm_end < end ].
> 2、madvise_vma() using MADV_WILLNEED parameter;
>    madvise_vma() -> madvise_willneed() -> return 0 && the value of [prev] is not updated.
> 
> In function SYSCALL_DEFINE3(madvise,...)
> When [start = vam->vm_start] the program enters "for" loop,
> find_vma_prev() will set the pointer vma and the pointer prev(prev = vam->vm_prev).
> Normally ,madvise_vma() will always move the pointer prev ,but when use DAX mode,
> it will never update the value of [prev].
> 
> =======================================================================
> SYSCALL_DEFINE3(madvise,...)
> {
> 	[...]
> 	//start = vam->start  => prev=vma->prev
>     vma = find_vma_prev(current->mm, start, &prev);
> 	[...]
> 	for(;;)
> 	{
> 	      update [start = vma->vm_start]
> 
> 	con0: if (start >= end)                 //false always;
> 	    goto out;
> 	       tmp = vma->vm_end;
> 
> 	//do not update [prev] and always return 0;
> 	       error = madvise_willneed();
> 
> 	con1: if (error)                        //false always;
> 	    goto out;
> 
> 	//[ vam->vm_start < start = vam->vm_end  <end ]
> 	       update [start = tmp ]
> 
> 	con2: if (start >= end)                 //false always ;
> 	    goto out;
> 
> 	//because of pointer [prev] did not change,[vma] keep as it was;
> 	       update [ vma = prev->vm_next ]
> 	}
> 	[...]
> }
> =======================================================================
> After the first cycle ;it will always keep
> vam->vm_start < start = vam->vm_end  < end  && vma = prev->vm_next;
> since Circulation exit conditions (con{0,1,2}) will never meet ,the
> program stuck in infinite loop.

I find your changelog a bit hard to parse. What would you think about
the following:
"
MADV_WILLNEED has always been a noop for DAX (formerly XIP) mappings.
Unfortunately madvise_willneed doesn't communicate this information
properly to the generic madvise syscall implementation. The calling
convention is quite subtle there. madvise_vma is supposed to either
return an error or update &prev, otherwise the main loop will never
advance to the next vma and it will keep looping forever without a way
to get out of the kernel.

It seems this has been broken since it was introduced. Nobody has noticed
because nobody seems to be using MADV_WILLNEED on these DAX mappings.

Fixes: fe77ba6f4f97 ("[PATCH] xip: madvice/fadvice: execute in place")
Cc: stable
"
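
The convention in the proposed changelog (a per-vma handler must either
return an error or update prev) can be modeled in user space. The following
is a sketch under stated assumptions, not kernel code: struct vma_sim,
find_vma_sim(), walk_range() and both handlers are hypothetical names, the
loop is simplified from the description in this thread, and an iteration cap
stands in for the soft lockup.

```c
#include <stddef.h>

/* Hypothetical, simplified model of a vma on an address-sorted list. */
struct vma_sim {
	unsigned long vm_start;
	unsigned long vm_end;
	struct vma_sim *vm_next;
};

typedef int (*advise_fn)(struct vma_sim *vma, struct vma_sim **prev);

/* Returns success without touching *prev: models the unfixed DAX path. */
static int advise_skip_prev(struct vma_sim *vma, struct vma_sim **prev)
{
	(void)vma;
	(void)prev;
	return 0;
}

/* Records the processed vma in *prev: models the fixed behaviour. */
static int advise_set_prev(struct vma_sim *vma, struct vma_sim **prev)
{
	*prev = vma;
	return 0;
}

/* First area whose vm_end is greater than addr, or NULL. */
static struct vma_sim *find_vma_sim(struct vma_sim *head, unsigned long addr)
{
	for (; head != NULL; head = head->vm_next)
		if (head->vm_end > addr)
			return head;
	return NULL;
}

/*
 * Walk [start, end) one vma at a time. Returns the number of iterations,
 * or -1 once max_iter is exceeded (the walk made no forward progress).
 */
static int walk_range(struct vma_sim *head, struct vma_sim *vma,
		      struct vma_sim *prev, unsigned long start,
		      unsigned long end, advise_fn advise, int max_iter)
{
	unsigned long tmp;
	int iter = 0;

	while (vma != NULL) {
		if (++iter > max_iter)
			return -1;
		if (start < vma->vm_start) {
			start = vma->vm_start;
			if (start >= end)
				break;
		}
		tmp = vma->vm_end < end ? vma->vm_end : end;
		if (advise(vma, &prev) != 0)
			break;
		start = tmp;
		if (start >= end)
			break;
		/* A stale prev makes this pick the same vma again. */
		vma = prev != NULL ? prev->vm_next : find_vma_sim(head, start);
	}
	return iter;
}
```

With three adjacent areas p, a and b, and prev initially pointing at p,
walking from the start of a to the end of b with advise_skip_prev hits the
iteration cap, while advise_set_prev finishes after visiting a and b.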

> Signed-off-by: chenjie <chenjie6@huawei.com>
> Signed-off-by: guoxuenan <guoxuenan@huawei.com>

Other than that
Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/madvise.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 375cf32..751e97a 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -276,15 +276,14 @@ static long madvise_willneed(struct vm_area_struct *vma,
>  {
>  	struct file *file = vma->vm_file;
>  
> +	*prev = vma;
>  #ifdef CONFIG_SWAP
>  	if (!file) {
> -		*prev = vma;
>  		force_swapin_readahead(vma, start, end);
>  		return 0;
>  	}
>  
>  	if (shmem_mapping(file->f_mapping)) {
> -		*prev = vma;
>  		force_shm_swapin_readahead(vma, start, end,
>  					file->f_mapping);
>  		return 0;
> @@ -299,7 +298,6 @@ static long madvise_willneed(struct vm_area_struct *vma,
>  		return 0;
>  	}
>  
> -	*prev = vma;
>  	start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
>  	if (end > vma->vm_end)
>  		end = vma->vm_end;
> -- 
> 2.9.5
> 

-- 
Michal Hocko
SUSE Labs


* [PATCH] mm,madvise: bugfix of madvise systemcall infinite loop under special circumstances.
@ 2017-11-27 11:53 guoxuenan
  2017-11-27 11:58 ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: guoxuenan @ 2017-11-27 11:53 UTC (permalink / raw)
  To: akpm, mhocko, minchan, linux-mm, linux-kernel
  Cc: yi.zhang, miaoxie, rppt, shli, aarcange, mgorman,
	kirill.shutemov, rientjes, khandual, riel

From: chenjie <chenjie6@huawei.com>

The madvise() system call supports a set of "conventional" advice values.
One of them, MADV_WILLNEED, can trigger an infinite loop on
direct access (DAX) mappings.

Infinite loop situation:
1. Initial state: [ start = vma->vm_start < vma->vm_end < end ].
2. madvise_vma() is called with the MADV_WILLNEED parameter:
   madvise_vma() -> madvise_willneed() -> returns 0 and the value of [prev] is not updated.

In function SYSCALL_DEFINE3(madvise,...):
When [start = vma->vm_start], the program enters the "for" loop after
find_vma_prev() has set the pointer vma and the pointer prev (prev = vma->vm_prev).
Normally, madvise_vma() always moves the pointer prev, but in DAX mode
the value of [prev] is never updated.

=======================================================================
SYSCALL_DEFINE3(madvise,...)
{
	[...]
	//start = vma->vm_start  => prev = vma->vm_prev
	vma = find_vma_prev(current->mm, start, &prev);
	[...]
	for(;;)
	{
	      update [start = vma->vm_start]

	con0: if (start >= end)                 //false always;
	    goto out;
	       tmp = vma->vm_end;

	//do not update [prev] and always return 0;
	       error = madvise_willneed();

	con1: if (error)                        //false always;
	    goto out;

	//[ vma->vm_start < start = vma->vm_end < end ]
	       update [start = tmp ]

	con2: if (start >= end)                 //false always ;
	    goto out;

	//because pointer [prev] did not change, [vma] stays as it was;
	       update [ vma = prev->vm_next ]
	}
	[...]
}
=======================================================================
After the first iteration, the state always remains
vma->vm_start < start = vma->vm_end < end  && vma = prev->vm_next;
since the loop exit conditions (con{0,1,2}) are never met, the
program is stuck in an infinite loop.

Signed-off-by: chenjie <chenjie6@huawei.com>
Signed-off-by: guoxuenan <guoxuenan@huawei.com>
---
 mm/madvise.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 375cf32..751e97a 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -276,15 +276,14 @@ static long madvise_willneed(struct vm_area_struct *vma,
 {
 	struct file *file = vma->vm_file;
 
+	*prev = vma;
 #ifdef CONFIG_SWAP
 	if (!file) {
-		*prev = vma;
 		force_swapin_readahead(vma, start, end);
 		return 0;
 	}
 
 	if (shmem_mapping(file->f_mapping)) {
-		*prev = vma;
 		force_shm_swapin_readahead(vma, start, end,
 					file->f_mapping);
 		return 0;
@@ -299,7 +298,6 @@ static long madvise_willneed(struct vm_area_struct *vma,
 		return 0;
 	}
 
-	*prev = vma;
 	start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
 	if (end > vma->vm_end)
 		end = vma->vm_end;
-- 
2.9.5

