* [PATCH] mm,madvise: bugfix of madvise systemcall infinite loop under special circumstances.
@ 2017-11-24 2:27 guoxuenan
2017-11-24 8:05 ` Michal Hocko
0 siblings, 1 reply; 9+ messages in thread
From: guoxuenan @ 2017-11-24 2:27 UTC (permalink / raw)
To: akpm, mhocko, minchan, linux-mm, linux-kernel
Cc: rppt, hillf.zj, shli, aarcange, mgorman, kirill.shutemov,
rientjes, khandual, riel
From: chenjie <chenjie6@huawei.com>
The madvise() system call supports a set of "conventional" advice values.
The MADV_WILLNEED advice triggers an infinite loop on direct access (DAX)
mappings. In DAX mode, madvise_vma() returns directly (via
madvise_willneed()) without updating the pointer [prev].
For example, under these special circumstances:
1. Initial state: [ start < vma->vm_start < vma->vm_end < end ]
2. madvise_vma() is called with the MADV_WILLNEED advice:
madvise_vma() -> madvise_willneed() -> returns 0 without updating [prev]
=======================================================================
In function SYSCALL_DEFINE3(madvise,...):
for (;;)
{
    // first loop: start = vma->vm_start < vma->vm_end < end
    update [ start = vma->vm_start | end ]
con0:   if (start >= end)   // always false
        goto out;
    tmp = vma->vm_end;
    // does not update [prev] and always returns 0
    error = madvise_willneed();
con1:   if (error)          // always false
        goto out;
    // vma->vm_start < start = vma->vm_end < end
    update [ start = tmp ]
con2:   if (start >= end)   // always false
        goto out;
    // because the pointer [prev] did not change, [vma] stays as it was
    update [ vma = prev->vm_next ]
}
=======================================================================
After the first iteration, the state always remains
[ vma->vm_start < start = vma->vm_end < end ].
Since none of the loop exit conditions (con{0,1,2}) is ever met, the
program is stuck in an infinite loop.
Signed-off-by: chenjie <chenjie6@huawei.com>
Signed-off-by: guoxuenan <guoxuenan@huawei.com>
---
mm/madvise.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/madvise.c b/mm/madvise.c
index 21261ff..c355fee 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -294,6 +294,7 @@ static long madvise_willneed(struct vm_area_struct *vma,
#endif
if (IS_DAX(file_inode(file))) {
+ *prev = vma;
/* no bad return value, but ignore advice */
return 0;
}
--
2.9.5
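
The loop described in this changelog can be modelled in userspace. The
following is a minimal sketch under stated assumptions, not kernel code:
struct vma, the is_dax flag, the "fixed" switch and the iteration cap are
all hypothetical stand-ins, and the case where prev is NULL (where the
kernel falls back to find_vma()) is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for struct vm_area_struct; only fields the loop reads. */
struct vma {
    unsigned long vm_start, vm_end;   /* half-open range [vm_start, vm_end) */
    struct vma *vm_next;
    bool is_dax;                      /* hypothetical: models IS_DAX(file_inode(file)) */
};

/* Model of madvise_willneed(): the unfixed variant skips *prev for DAX. */
static long willneed_model(struct vma *vma, struct vma **prev, bool fixed)
{
    if (vma->is_dax) {
        if (fixed)
            *prev = vma;              /* the one-line fix from the patch */
        return 0;                     /* advice ignored, no error */
    }
    *prev = vma;
    return 0;
}

/*
 * Model of the main loop in SYSCALL_DEFINE3(madvise, ...).
 * Returns the number of iterations taken, or -1 if max_iter was
 * exhausted, which stands in for "stuck in the kernel".
 */
static int madvise_loop_model(struct vma *vma, struct vma *prev,
                              unsigned long start, unsigned long end,
                              bool fixed, int max_iter)
{
    assert(prev != NULL);             /* NULL prev is out of scope here */

    for (int i = 1; i <= max_iter; i++) {
        if (!vma)
            return i;
        if (start < vma->vm_start) {
            start = vma->vm_start;
            if (start >= end)         /* con0 */
                return i;
        }
        unsigned long tmp = vma->vm_end;
        if (end < tmp)
            tmp = end;
        if (willneed_model(vma, &prev, fixed))   /* con1 */
            return i;
        start = tmp;
        if (start < prev->vm_end)
            start = prev->vm_end;
        if (start >= end)             /* con2 */
            return i;
        vma = prev->vm_next;          /* stale prev => same vma again */
    }
    return -1;
}
```

With three adjacent areas P, D (DAX) and N, a call that starts at
D.vm_start with prev = P and end = N.vm_end exhausts the cap when "fixed"
is false and finishes in two iterations when it is true.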
* Re: [PATCH] mm,madvise: bugfix of madvise systemcall infinite loop under special circumstances.
2017-11-24 2:27 [PATCH] mm,madvise: bugfix of madvise systemcall infinite loop under special circumstances guoxuenan
@ 2017-11-24 8:05 ` Michal Hocko
[not found] ` <829af987-4d65-382c-dbd4-0c81222ebb51@huawei.com>
0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2017-11-24 8:05 UTC (permalink / raw)
To: guoxuenan
Cc: akpm, minchan, linux-mm, linux-kernel, rppt, hillf.zj, shli,
aarcange, mgorman, kirill.shutemov, rientjes, khandual, riel
On Fri 24-11-17 10:27:57, guoxuenan wrote:
> From: chenjie <chenjie6@huawei.com>
>
> The madvise() system call supported a set of "conventional" advice values,
> the MADV_WILLNEED parameter will trigger an infinite loop under direct
> access mode(DAX). In DAX mode, the function madvise_vma() will return
> directly without updating the pointer [prev].
>
> For example:
> Special circumstances:
> 1、init [ start < vam->vm_start < vam->vm_end < end ]
> 2、madvise_vma() using MADV_WILLNEED parameter ;
> madvise_vma() -> madvise_willneed() -> return 0 && without updating [prev]
>
> =======================================================================
> in Function SYSCALL_DEFINE3(madvise,...)
>
> for (;;)
> {
> //[first loop: start = vam->vm_start < vam->vm_end <end ];
> update [start = vma->vm_start | end ]
>
> con0: if (start >= end) //false always;
> goto out;
> tmp = vma->vm_end;
>
> //do not update [prev] and always return 0;
> error = madvise_willneed();
>
> con1: if (error) //false always;
> goto out;
>
> //[ vam->vm_start < start = vam->vm_end <end ]
> update [start = tmp ]
>
> con2: if (start >= end) //false always ;
> goto out;
>
> //because of pointer [prev] did not change,[vma] keep as it was;
> update [ vma = prev->vm_next ]
> }
>
> =======================================================================
> After the first cycle ;it will always keep
> [ vam->vm_start < start = vam->vm_end < end ].
> since Circulation exit conditions (con{0,1,2}) will never meet ,the
> program stuck in infinite loop.
Are you sure? Have you tested this? I might be missing something because
the madvise code is a bit of a mess, but AFAICS the prev pointer (updated
or not) will allow the loop to advance:
    if (prev)
        vma = prev->vm_next;
    else    /* madvise_remove dropped mmap_sem */
        vma = find_vma(current->mm, start);
Note that start is vma->vm_end, and find_vma() will find a vma for which
vm_end > addr.
So either I am missing something or this code has actually never worked
for DAX/XIP, which I find rather suspicious.
> Signed-off-by: chenjie <chenjie6@huawei.com>
> Signed-off-by: guoxuenan <guoxuenan@huawei.com>
> ---
> mm/madvise.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 21261ff..c355fee 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -294,6 +294,7 @@ static long madvise_willneed(struct vm_area_struct *vma,
> #endif
>
> if (IS_DAX(file_inode(file))) {
> + *prev = vma;
> /* no bad return value, but ignore advice */
> return 0;
> }
> --
> 2.9.5
>
--
Michal Hocko
SUSE Labs
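
The find_vma() behaviour this reply relies on (return the first area whose
vm_end is greater than the address) can be sketched as below. This is an
illustration only: struct vma_node and find_vma_sketch() are hypothetical
names, and a sorted singly linked list stands in for the kernel's real
lookup structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: half-open range [vm_start, vm_end). */
struct vma_node {
    unsigned long vm_start, vm_end;
    struct vma_node *vm_next;         /* list is sorted by address */
};

/*
 * Sketch of the find_vma() contract: return the first area with
 * addr < vm_end, or NULL if there is none. The result may start above
 * addr; the caller has to check vm_start itself.
 */
static struct vma_node *find_vma_sketch(struct vma_node *head,
                                        unsigned long addr)
{
    for (struct vma_node *v = head; v; v = v->vm_next) {
        assert(v->vm_start < v->vm_end);
        if (addr < v->vm_end)
            return v;
    }
    return NULL;
}
```

A lookup at addr == vm_end of one area therefore skips that area and
returns its successor. That is why the else branch would advance, while
the if (prev) branch depends entirely on prev having been updated.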
* Re: [PATCH] mm,madvise: bugfix of madvise systemcall infinite loop under special circumstances.
[not found] ` <20171124130803.hafb3zbhy7gdqkvi@dhcp22.suse.cz>
@ 2017-11-25 1:52 ` 郭雪楠
2017-11-27 2:54 ` 郭雪楠
0 siblings, 1 reply; 9+ messages in thread
From: 郭雪楠 @ 2017-11-25 1:52 UTC (permalink / raw)
To: Michal Hocko
Cc: akpm, minchan, linux-mm, linux-kernel, rppt, yi.zhang, miaoxie,
aarcange, mgorman, kirill.shutemov, rientjes, khandual, riel,
hillf.zj, shli
Yes, your modification is much better! Thanks.
On 2017/11/24 21:08, Michal Hocko wrote:
> On Fri 24-11-17 20:51:29, 郭雪楠 wrote:
>> Sorry, I explained it wrongly before. But I have tested with trinity in DAX
>> mode, and I am sure it can trigger a soft lockup. I have hit the endless
>> loop here.
>>
>> I made a small mistake in the description; here is the correction.
>> Initial state:
>> [ start = vma->vm_start < vma->vm_end < end ]
>>
>> When [start = vma->vm_start], the program enters the for (;;) loop.
>> find_vma_prev() sets the pointers vma and prev (prev = vma->vm_prev).
>> Normally madvise_vma() always moves prev forward, but in DAX mode it
>> never updates it.
> [...]
>> if (prev) // here prev not NULL,it will always enter this branch ..
>> vma = prev->vm_next;
>> else /* madvise_remove dropped mmap_sem */
>> vma = find_vma(current->mm, start);
>
> You are right! My fault, I managed to confuse myself in the code flow.
> It really looks like this has been broken for more than 10 years since
> fe77ba6f4f97 ("[PATCH] xip: madvice/fadvice: execute in place").
>
> Maybe the following would be more readable and less error prone?
> ---
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 375cf32087e4..a631c414f915 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -276,30 +276,26 @@ static long madvise_willneed(struct vm_area_struct *vma,
> {
> struct file *file = vma->vm_file;
>
> + *prev = vma;
> #ifdef CONFIG_SWAP
> if (!file) {
> - *prev = vma;
> force_swapin_readahead(vma, start, end);
> return 0;
> }
>
> - if (shmem_mapping(file->f_mapping)) {
> - *prev = vma;
> + if (shmem_mapping(file->f_mapping))
> force_shm_swapin_readahead(vma, start, end,
> file->f_mapping);
> return 0;
> - }
> #else
> if (!file)
> return -EBADF;
> #endif
>
> - if (IS_DAX(file_inode(file))) {
> + if (IS_DAX(file_inode(file)))
> /* no bad return value, but ignore advice */
> return 0;
> - }
>
> - *prev = vma;
> start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> if (end > vma->vm_end)
> end = vma->vm_end;
>
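
The idea behind the diff above is to assign *prev once, before any early
return, so that no exit path can forget it. A minimal sketch of that
pattern follows; enum map_kind, struct area and willneed_sketch() are
hypothetical names, and the kernel's readahead calls are reduced to
comments. Note that, as quoted, the diff drops the braces around the shmem
branch while leaving "return 0;" in place, which would appear to make that
return unconditional under CONFIG_SWAP; the patch submitted later in the
thread keeps the braces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification standing in for the kernel's checks
 * (!file, shmem_mapping(), IS_DAX(), ordinary file mapping). */
enum map_kind { KIND_ANON, KIND_SHMEM, KIND_DAX, KIND_FILE };

struct area {
    enum map_kind kind;
};

/*
 * Sketch of the restructured madvise_willneed(): progress is reported
 * through *prev before the function branches, so every return below
 * satisfies the caller's expectation.
 */
static long willneed_sketch(struct area *vma, struct area **prev)
{
    assert(vma != NULL && prev != NULL);
    *prev = vma;

    switch (vma->kind) {
    case KIND_ANON:     /* kernel: force_swapin_readahead() */
    case KIND_SHMEM:    /* kernel: force_shm_swapin_readahead() */
        return 0;
    case KIND_DAX:      /* no bad return value, but ignore advice */
        return 0;
    case KIND_FILE:     /* kernel: readahead on the page cache */
        return 0;
    }
    return 0;
}
```

Whatever kind of mapping is passed in, the caller sees *prev pointing at
the area that was just processed.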
* Re: [PATCH] mm,madvise: bugfix of madvise systemcall infinite loop under special circumstances.
2017-11-25 1:52 ` 郭雪楠
@ 2017-11-27 2:54 ` 郭雪楠
2017-11-27 7:59 ` Michal Hocko
0 siblings, 1 reply; 9+ messages in thread
From: 郭雪楠 @ 2017-11-27 2:54 UTC (permalink / raw)
To: Michal Hocko
Cc: akpm, minchan, linux-mm, linux-kernel, rppt, yi.zhang, miaoxie,
aarcange, mgorman, kirill.shutemov, rientjes, khandual, riel,
hillf.zj, shli
Hi Michal, do you need me to rework the patch according to your
modification and resubmit it?
On 2017/11/25 9:52, 郭雪楠 wrote:
> Yes , your modification is much better! thanks.
>
> On 2017/11/24 21:08, Michal Hocko wrote:
>> On Fri 24-11-17 20:51:29, 郭雪楠 wrote:
>>> Sorry,I explained wrong before. But,I've tested using trinity in DAX
>>> mode,and I'am sure it has possibility of triggering an soft lockup. I
>>> have
>>> encountered the problem of endless loop here .
>>>
>>> I had a little problem here,I correct it .
>>> under Initial state :
>>> [ start = vam->vm_start < vam->vm_end < end ]
>>>
>>> When [start = vam->vm_start] the program enters for{;;} loop
>>> ,find_vma_prev() will set the pointer vma and the pointer prev (prev =
>>> vam->vm_prev ). Normally ,madvise_vma() will always move the pointer
>>> prev
>>> ,but when use DAX mode , it will never update .
>> [...]
>>> if (prev) // here prev not NULL,it will always enter this branch ..
>>> vma = prev->vm_next;
>>> else /* madvise_remove dropped mmap_sem */
>>> vma = find_vma(current->mm, start);
>>
>> You are right! My fault, I managed to confuse myself in the code flow.
>> It really looks like this has been broken for more than 10 years since
>> fe77ba6f4f97 ("[PATCH] xip: madvice/fadvice: execute in place").
>>
>> Maybe the following would be more readable and less error prone?
>> ---
>> diff --git a/mm/madvise.c b/mm/madvise.c
>> index 375cf32087e4..a631c414f915 100644
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -276,30 +276,26 @@ static long madvise_willneed(struct
>> vm_area_struct *vma,
>> {
>> struct file *file = vma->vm_file;
>> + *prev = vma;
>> #ifdef CONFIG_SWAP
>> if (!file) {
>> - *prev = vma;
>> force_swapin_readahead(vma, start, end);
>> return 0;
>> }
>> - if (shmem_mapping(file->f_mapping)) {
>> - *prev = vma;
>> + if (shmem_mapping(file->f_mapping))
>> force_shm_swapin_readahead(vma, start, end,
>> file->f_mapping);
>> return 0;
>> - }
>> #else
>> if (!file)
>> return -EBADF;
>> #endif
>> - if (IS_DAX(file_inode(file))) {
>> + if (IS_DAX(file_inode(file)))
>> /* no bad return value, but ignore advice */
>> return 0;
>> - }
>> - *prev = vma;
>> start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
>> if (end > vma->vm_end)
>> end = vma->vm_end;
>>
* Re: [PATCH] mm,madvise: bugfix of madvise systemcall infinite loop under special circumstances.
2017-11-27 2:54 ` 郭雪楠
@ 2017-11-27 7:59 ` Michal Hocko
0 siblings, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2017-11-27 7:59 UTC (permalink / raw)
To: 郭雪楠
Cc: akpm, minchan, linux-mm, linux-kernel, rppt, yi.zhang, miaoxie,
aarcange, mgorman, kirill.shutemov, rientjes, khandual, riel,
hillf.zj, shli
On Mon 27-11-17 10:54:39, 郭雪楠 wrote:
> Hi,Michal, Whether need me to modify according your modification and
> resubmit a new patch?
Please do.
--
Michal Hocko
SUSE Labs
* Re: [PATCH] mm,madvise: bugfix of madvise systemcall infinite loop under special circumstances.
2017-11-27 11:58 ` Michal Hocko
2017-11-27 12:28 ` guoxuenan
@ 2017-11-27 12:42 ` Mike Rapoport
1 sibling, 0 replies; 9+ messages in thread
From: Mike Rapoport @ 2017-11-27 12:42 UTC (permalink / raw)
To: Michal Hocko
Cc: guoxuenan, akpm, minchan, linux-mm, linux-kernel, yi.zhang,
miaoxie, shli, aarcange, mgorman, kirill.shutemov, rientjes,
khandual, riel
On Mon, Nov 27, 2017 at 12:58:47PM +0100, Michal Hocko wrote:
> On Mon 27-11-17 19:53:18, guoxuenan wrote:
> > From: chenjie <chenjie6@huawei.com>
> >
> > The madvise() system call supported a set of "conventional" advice values,
> > the MADV_WILLNEED parameter has possibility of triggering an infinite loop under
> > direct access mode(DAX).
> >
> > Infinite loop situation:
> > 1、initial state [ start = vam->vm_start < vam->vm_end < end ].
> > 2、madvise_vma() using MADV_WILLNEED parameter;
> > madvise_vma() -> madvise_willneed() -> return 0 && the value of [prev] is not updated.
> >
> > In function SYSCALL_DEFINE3(madvise,...)
> > When [start = vam->vm_start] the program enters "for" loop,
> > find_vma_prev() will set the pointer vma and the pointer prev(prev = vam->vm_prev).
> > Normally ,madvise_vma() will always move the pointer prev ,but when use DAX mode,
> > it will never update the value of [prev].
> >
> > =======================================================================
> > SYSCALL_DEFINE3(madvise,...)
> > {
> > [...]
> > //start = vam->start => prev=vma->prev
> > vma = find_vma_prev(current->mm, start, &prev);
> > [...]
> > for(;;)
> > {
> > update [start = vma->vm_start]
> >
> > con0: if (start >= end) //false always;
> > goto out;
> > tmp = vma->vm_end;
> >
> > //do not update [prev] and always return 0;
> > error = madvise_willneed();
> >
> > con1: if (error) //false always;
> > goto out;
> >
> > //[ vam->vm_start < start = vam->vm_end <end ]
> > update [start = tmp ]
> >
> > con2: if (start >= end) //false always ;
> > goto out;
> >
> > //because of pointer [prev] did not change,[vma] keep as it was;
> > update [ vma = prev->vm_next ]
> > }
> > [...]
> > }
> > =======================================================================
> > After the first cycle ;it will always keep
> > vam->vm_start < start = vam->vm_end < end && vma = prev->vm_next;
> > since Circulation exit conditions (con{0,1,2}) will never meet ,the
> > program stuck in infinite loop.
>
> I find your changelog a bit hard to parse. What would you think about
> the following:
> "
> MADVISE_WILLNEED has always been a noop for DAX (formerly XIP) mappings.
> Unfortunatelly madvise_willneed doesn't communicate this information
> properly to the generic madvise syscall implementation. The calling
> converion is quite subtle there. madvise_vma is supposed to either
spelling: "The calling convention"
> return an error or update &prev otherwise the main loop will never
> advance to the next vma and it will keep looping for ever without a way
> to get out of the kernel.
>
> It seems this has been broken since introduced. Nobody has noticed
> because nobody seems to be using MADVISE_WILLNEED on these DAX mappings.
>
> Fixes: fe77ba6f4f97 ("[PATCH] xip: madvice/fadvice: execute in place")
> Cc: stable
> "
>
> > Signed-off-by: chenjie <chenjie6@huawei.com>
> > Signed-off-by: guoxuenan <guoxuenan@huawei.com>
>
> Other than that
> Acked-by: Michal Hocko <mhocko@suse.com>
>
> > ---
> > mm/madvise.c | 4 +---
> > 1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 375cf32..751e97a 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -276,15 +276,14 @@ static long madvise_willneed(struct vm_area_struct *vma,
> > {
> > struct file *file = vma->vm_file;
> >
> > + *prev = vma;
> > #ifdef CONFIG_SWAP
> > if (!file) {
> > - *prev = vma;
> > force_swapin_readahead(vma, start, end);
> > return 0;
> > }
> >
> > if (shmem_mapping(file->f_mapping)) {
> > - *prev = vma;
> > force_shm_swapin_readahead(vma, start, end,
> > file->f_mapping);
> > return 0;
> > @@ -299,7 +298,6 @@ static long madvise_willneed(struct vm_area_struct *vma,
> > return 0;
> > }
> >
> > - *prev = vma;
> > start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> > if (end > vma->vm_end)
> > end = vma->vm_end;
> > --
> > 2.9.5
> >
>
> --
> Michal Hocko
> SUSE Labs
>
--
Sincerely yours,
Mike.
* Re: [PATCH] mm,madvise: bugfix of madvise systemcall infinite loop under special circumstances.
2017-11-27 11:58 ` Michal Hocko
@ 2017-11-27 12:28 ` guoxuenan
2017-11-27 12:42 ` Mike Rapoport
1 sibling, 0 replies; 9+ messages in thread
From: guoxuenan @ 2017-11-27 12:28 UTC (permalink / raw)
To: Michal Hocko
Cc: akpm, minchan, linux-mm, linux-kernel, yi.zhang, miaoxie, rppt,
shli, aarcange, mgorman, kirill.shutemov, rientjes, khandual,
riel
Of course! Thank you, you saved my poor English :).
On 2017/11/27 19:58, Michal Hocko wrote:
> On Mon 27-11-17 19:53:18, guoxuenan wrote:
>> From: chenjie <chenjie6@huawei.com>
>>
>> The madvise() system call supported a set of "conventional" advice values,
>> the MADV_WILLNEED parameter has possibility of triggering an infinite loop under
>> direct access mode(DAX).
>>
>> Infinite loop situation:
>> 1、initial state [ start = vam->vm_start < vam->vm_end < end ].
>> 2、madvise_vma() using MADV_WILLNEED parameter;
>> madvise_vma() -> madvise_willneed() -> return 0 && the value of [prev] is not updated.
>>
>> In function SYSCALL_DEFINE3(madvise,...)
>> When [start = vam->vm_start] the program enters "for" loop,
>> find_vma_prev() will set the pointer vma and the pointer prev(prev = vam->vm_prev).
>> Normally ,madvise_vma() will always move the pointer prev ,but when use DAX mode,
>> it will never update the value of [prev].
>>
>> =======================================================================
>> SYSCALL_DEFINE3(madvise,...)
>> {
>> [...]
>> //start = vam->start => prev=vma->prev
>> vma = find_vma_prev(current->mm, start, &prev);
>> [...]
>> for(;;)
>> {
>> update [start = vma->vm_start]
>>
>> con0: if (start >= end) //false always;
>> goto out;
>> tmp = vma->vm_end;
>>
>> //do not update [prev] and always return 0;
>> error = madvise_willneed();
>>
>> con1: if (error) //false always;
>> goto out;
>>
>> //[ vam->vm_start < start = vam->vm_end <end ]
>> update [start = tmp ]
>>
>> con2: if (start >= end) //false always ;
>> goto out;
>>
>> //because of pointer [prev] did not change,[vma] keep as it was;
>> update [ vma = prev->vm_next ]
>> }
>> [...]
>> }
>> =======================================================================
>> After the first cycle ;it will always keep
>> vam->vm_start < start = vam->vm_end < end && vma = prev->vm_next;
>> since Circulation exit conditions (con{0,1,2}) will never meet ,the
>> program stuck in infinite loop.
>
> I find your changelog a bit hard to parse. What would you think about
> the following:
> "
> MADVISE_WILLNEED has always been a noop for DAX (formerly XIP) mappings.
> Unfortunatelly madvise_willneed doesn't communicate this information
> properly to the generic madvise syscall implementation. The calling
> converion is quite subtle there. madvise_vma is supposed to either
> return an error or update &prev otherwise the main loop will never
> advance to the next vma and it will keep looping for ever without a way
> to get out of the kernel.
>
> It seems this has been broken since introduced. Nobody has noticed
> because nobody seems to be using MADVISE_WILLNEED on these DAX mappings.
>
> Fixes: fe77ba6f4f97 ("[PATCH] xip: madvice/fadvice: execute in place")
> Cc: stable
> "
>
>> Signed-off-by: chenjie <chenjie6@huawei.com>
>> Signed-off-by: guoxuenan <guoxuenan@huawei.com>
>
> Other than that
> Acked-by: Michal Hocko <mhocko@suse.com>
>
>> ---
>> mm/madvise.c | 4 +---
>> 1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/mm/madvise.c b/mm/madvise.c
>> index 375cf32..751e97a 100644
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -276,15 +276,14 @@ static long madvise_willneed(struct vm_area_struct *vma,
>> {
>> struct file *file = vma->vm_file;
>>
>> + *prev = vma;
>> #ifdef CONFIG_SWAP
>> if (!file) {
>> - *prev = vma;
>> force_swapin_readahead(vma, start, end);
>> return 0;
>> }
>>
>> if (shmem_mapping(file->f_mapping)) {
>> - *prev = vma;
>> force_shm_swapin_readahead(vma, start, end,
>> file->f_mapping);
>> return 0;
>> @@ -299,7 +298,6 @@ static long madvise_willneed(struct vm_area_struct *vma,
>> return 0;
>> }
>>
>> - *prev = vma;
>> start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
>> if (end > vma->vm_end)
>> end = vma->vm_end;
>> --
>> 2.9.5
>>
>
* Re: [PATCH] mm,madvise: bugfix of madvise systemcall infinite loop under special circumstances.
2017-11-27 11:53 guoxuenan
@ 2017-11-27 11:58 ` Michal Hocko
2017-11-27 12:28 ` guoxuenan
2017-11-27 12:42 ` Mike Rapoport
0 siblings, 2 replies; 9+ messages in thread
From: Michal Hocko @ 2017-11-27 11:58 UTC (permalink / raw)
To: guoxuenan
Cc: akpm, minchan, linux-mm, linux-kernel, yi.zhang, miaoxie, rppt,
shli, aarcange, mgorman, kirill.shutemov, rientjes, khandual,
riel
On Mon 27-11-17 19:53:18, guoxuenan wrote:
> From: chenjie <chenjie6@huawei.com>
>
> The madvise() system call supported a set of "conventional" advice values,
> the MADV_WILLNEED parameter has possibility of triggering an infinite loop under
> direct access mode(DAX).
>
> Infinite loop situation:
> 1、initial state [ start = vam->vm_start < vam->vm_end < end ].
> 2、madvise_vma() using MADV_WILLNEED parameter;
> madvise_vma() -> madvise_willneed() -> return 0 && the value of [prev] is not updated.
>
> In function SYSCALL_DEFINE3(madvise,...)
> When [start = vam->vm_start] the program enters "for" loop,
> find_vma_prev() will set the pointer vma and the pointer prev(prev = vam->vm_prev).
> Normally ,madvise_vma() will always move the pointer prev ,but when use DAX mode,
> it will never update the value of [prev].
>
> =======================================================================
> SYSCALL_DEFINE3(madvise,...)
> {
> [...]
> //start = vam->start => prev=vma->prev
> vma = find_vma_prev(current->mm, start, &prev);
> [...]
> for(;;)
> {
> update [start = vma->vm_start]
>
> con0: if (start >= end) //false always;
> goto out;
> tmp = vma->vm_end;
>
> //do not update [prev] and always return 0;
> error = madvise_willneed();
>
> con1: if (error) //false always;
> goto out;
>
> //[ vam->vm_start < start = vam->vm_end <end ]
> update [start = tmp ]
>
> con2: if (start >= end) //false always ;
> goto out;
>
> //because of pointer [prev] did not change,[vma] keep as it was;
> update [ vma = prev->vm_next ]
> }
> [...]
> }
> =======================================================================
> After the first cycle ;it will always keep
> vam->vm_start < start = vam->vm_end < end && vma = prev->vm_next;
> since Circulation exit conditions (con{0,1,2}) will never meet ,the
> program stuck in infinite loop.
I find your changelog a bit hard to parse. What would you think about
the following:
"
MADV_WILLNEED has always been a noop for DAX (formerly XIP) mappings.
Unfortunately, madvise_willneed doesn't communicate this information
properly to the generic madvise syscall implementation. The calling
convention is quite subtle there. madvise_vma is supposed to either
return an error or update &prev, otherwise the main loop will never
advance to the next vma and it will keep looping forever without a way
to get out of the kernel.

It seems this has been broken since it was introduced. Nobody has noticed
because nobody seems to be using MADV_WILLNEED on these DAX mappings.

Fixes: fe77ba6f4f97 ("[PATCH] xip: madvice/fadvice: execute in place")
Cc: stable
"
> Signed-off-by: chenjie <chenjie6@huawei.com>
> Signed-off-by: guoxuenan <guoxuenan@huawei.com>
Other than that
Acked-by: Michal Hocko <mhocko@suse.com>
> ---
> mm/madvise.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 375cf32..751e97a 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -276,15 +276,14 @@ static long madvise_willneed(struct vm_area_struct *vma,
> {
> struct file *file = vma->vm_file;
>
> + *prev = vma;
> #ifdef CONFIG_SWAP
> if (!file) {
> - *prev = vma;
> force_swapin_readahead(vma, start, end);
> return 0;
> }
>
> if (shmem_mapping(file->f_mapping)) {
> - *prev = vma;
> force_shm_swapin_readahead(vma, start, end,
> file->f_mapping);
> return 0;
> @@ -299,7 +298,6 @@ static long madvise_willneed(struct vm_area_struct *vma,
> return 0;
> }
>
> - *prev = vma;
> start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> if (end > vma->vm_end)
> end = vma->vm_end;
> --
> 2.9.5
>
--
Michal Hocko
SUSE Labs
* [PATCH] mm,madvise: bugfix of madvise systemcall infinite loop under special circumstances.
@ 2017-11-27 11:53 guoxuenan
2017-11-27 11:58 ` Michal Hocko
0 siblings, 1 reply; 9+ messages in thread
From: guoxuenan @ 2017-11-27 11:53 UTC (permalink / raw)
To: akpm, mhocko, minchan, linux-mm, linux-kernel
Cc: yi.zhang, miaoxie, rppt, shli, aarcange, mgorman,
kirill.shutemov, rientjes, khandual, riel
From: chenjie <chenjie6@huawei.com>
The madvise() system call supports a set of "conventional" advice values.
The MADV_WILLNEED advice can trigger an infinite loop on direct access
(DAX) mappings.
Infinite loop situation:
1. Initial state: [ start = vma->vm_start < vma->vm_end < end ].
2. madvise_vma() is called with the MADV_WILLNEED advice:
madvise_vma() -> madvise_willneed() -> returns 0 without updating [prev].
In function SYSCALL_DEFINE3(madvise,...):
When [start = vma->vm_start], the program enters the "for" loop.
find_vma_prev() sets the pointers vma and prev (prev = vma->vm_prev).
Normally madvise_vma() always moves prev forward, but in DAX mode it
never updates the value of [prev].
=======================================================================
SYSCALL_DEFINE3(madvise,...)
{
    [...]
    // start = vma->vm_start => prev = vma->vm_prev
    vma = find_vma_prev(current->mm, start, &prev);
    [...]
    for (;;)
    {
        update [ start = vma->vm_start ]
con0:       if (start >= end)   // always false
            goto out;
        tmp = vma->vm_end;
        // does not update [prev] and always returns 0
        error = madvise_willneed();
con1:       if (error)          // always false
            goto out;
        // vma->vm_start < start = vma->vm_end < end
        update [ start = tmp ]
con2:       if (start >= end)   // always false
            goto out;
        // because the pointer [prev] did not change, [vma] stays as it was
        update [ vma = prev->vm_next ]
    }
    [...]
}
=======================================================================
After the first iteration, the following always holds:
vma->vm_start < start = vma->vm_end < end && vma = prev->vm_next;
Since none of the loop exit conditions (con{0,1,2}) is ever met, the
program is stuck in an infinite loop.
Signed-off-by: chenjie <chenjie6@huawei.com>
Signed-off-by: guoxuenan <guoxuenan@huawei.com>
---
mm/madvise.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/mm/madvise.c b/mm/madvise.c
index 375cf32..751e97a 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -276,15 +276,14 @@ static long madvise_willneed(struct vm_area_struct *vma,
{
struct file *file = vma->vm_file;
+ *prev = vma;
#ifdef CONFIG_SWAP
if (!file) {
- *prev = vma;
force_swapin_readahead(vma, start, end);
return 0;
}
if (shmem_mapping(file->f_mapping)) {
- *prev = vma;
force_shm_swapin_readahead(vma, start, end,
file->f_mapping);
return 0;
@@ -299,7 +298,6 @@ static long madvise_willneed(struct vm_area_struct *vma,
return 0;
}
- *prev = vma;
start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
if (end > vma->vm_end)
end = vma->vm_end;
--
2.9.5
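
The context line "start = ((start - vma->vm_start) >> PAGE_SHIFT) +
vma->vm_pgoff;" in the hunk above turns a virtual address inside the
mapping into a page index within the mapped file. A worked sketch of that
arithmetic, assuming 4 KiB pages; PAGE_SHIFT_SKETCH and
addr_to_file_page() are hypothetical names used only for illustration:

```c
#include <assert.h>

#define PAGE_SHIFT_SKETCH 12UL   /* assumption: 4 KiB pages */

/*
 * Convert a virtual address inside a mapping to a file page index.
 * vm_start is the first address of the mapping, vm_pgoff is the file
 * offset of the mapping expressed in pages.
 */
static unsigned long addr_to_file_page(unsigned long addr,
                                       unsigned long vm_start,
                                       unsigned long vm_pgoff)
{
    assert(addr >= vm_start);
    return ((addr - vm_start) >> PAGE_SHIFT_SKETCH) + vm_pgoff;
}
```

For a mapping that starts at 0x4000 and maps the file from page 10
onwards, address 0x7000 is three pages into the mapping and therefore
corresponds to file page 13.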
Thread overview: 9+ messages
2017-11-24 2:27 [PATCH] mm,madvise: bugfix of madvise systemcall infinite loop under special circumstances guoxuenan
2017-11-24 8:05 ` Michal Hocko
[not found] ` <829af987-4d65-382c-dbd4-0c81222ebb51@huawei.com>
[not found] ` <20171124130803.hafb3zbhy7gdqkvi@dhcp22.suse.cz>
2017-11-25 1:52 ` 郭雪楠
2017-11-27 2:54 ` 郭雪楠
2017-11-27 7:59 ` Michal Hocko
2017-11-27 11:53 guoxuenan
2017-11-27 11:58 ` Michal Hocko
2017-11-27 12:28 ` guoxuenan
2017-11-27 12:42 ` Mike Rapoport