All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm: fix sleeping function warning in alloc_swap_info
@ 2019-01-29  7:21 Jiufei Xue
  2019-01-29  8:53 ` Aaron Lu
  2019-01-29 11:43 ` Tetsuo Handa
  0 siblings, 2 replies; 21+ messages in thread
From: Jiufei Xue @ 2019-01-29  7:21 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, joseph.qi

Trinity reports BUG:

sleeping function called from invalid context at mm/vmalloc.c:1477
in_atomic(): 1, irqs_disabled(): 0, pid: 12269, name: trinity-c1

[ 2748.573460] Call Trace:
[ 2748.575935]  dump_stack+0x91/0xeb
[ 2748.578512]  ___might_sleep+0x21c/0x250
[ 2748.581090]  remove_vm_area+0x1d/0x90
[ 2748.583637]  __vunmap+0x76/0x100
[ 2748.586120]  __se_sys_swapon+0xb9a/0x1220
[ 2748.598973]  do_syscall_64+0x60/0x210
[ 2748.601439]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

This is triggered by calling kvfree() inside spinlock() section in
function alloc_swap_info().
Fix this by moving the kvfree() after spin_unlock().

Fixes: 873d7bcfd066 ("mm/swapfile.c: use kvzalloc for swap_info_struct allocation")
Cc: <stable@vger.kernel.org>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
---
 mm/swapfile.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index dbac1d49469d..d26c9eac3d64 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2810,7 +2810,7 @@ late_initcall(max_swapfiles_check);
 
 static struct swap_info_struct *alloc_swap_info(void)
 {
-	struct swap_info_struct *p;
+	struct swap_info_struct *p, *tmp = NULL;
 	unsigned int type;
 	int i;
 	int size = sizeof(*p) + nr_node_ids * sizeof(struct plist_node);
@@ -2840,7 +2840,7 @@ static struct swap_info_struct *alloc_swap_info(void)
 		smp_wmb();
 		nr_swapfiles++;
 	} else {
-		kvfree(p);
+		tmp = p;
 		p = swap_info[type];
 		/*
 		 * Do not memset this entry: a racing procfs swap_next()
@@ -2853,6 +2853,8 @@ static struct swap_info_struct *alloc_swap_info(void)
 		plist_node_init(&p->avail_lists[i], 0);
 	p->flags = SWP_USED;
 	spin_unlock(&swap_lock);
+	kvfree(tmp);
+
 	spin_lock_init(&p->lock);
 	spin_lock_init(&p->cont_lock);
 
-- 
2.19.1.856.g8858448bb


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info
  2019-01-29  7:21 [PATCH] mm: fix sleeping function warning in alloc_swap_info Jiufei Xue
@ 2019-01-29  8:53 ` Aaron Lu
  2019-01-29 10:43   ` Joseph Qi
  2019-01-29 11:43 ` Tetsuo Handa
  1 sibling, 1 reply; 21+ messages in thread
From: Aaron Lu @ 2019-01-29  8:53 UTC (permalink / raw)
  To: Jiufei Xue, akpm; +Cc: linux-mm, joseph.qi, Michal Hocko, Vasily Averin

On 2019/1/29 15:21, Jiufei Xue wrote:
> Trinity reports BUG:
> 
> sleeping function called from invalid context at mm/vmalloc.c:1477
> in_atomic(): 1, irqs_disabled(): 0, pid: 12269, name: trinity-c1
> 
> [ 2748.573460] Call Trace:
> [ 2748.575935]  dump_stack+0x91/0xeb
> [ 2748.578512]  ___might_sleep+0x21c/0x250
> [ 2748.581090]  remove_vm_area+0x1d/0x90
> [ 2748.583637]  __vunmap+0x76/0x100
> [ 2748.586120]  __se_sys_swapon+0xb9a/0x1220
> [ 2748.598973]  do_syscall_64+0x60/0x210
> [ 2748.601439]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> This is triggered by calling kvfree() inside spinlock() section in
> function alloc_swap_info().
> Fix this by moving the kvfree() after spin_unlock().

The fix looks good to me.

BTW, swap_info_struct's size has been reduced to its original size:
272 bytes by commit 66f71da9dd38("mm/swap: use nr_node_ids for
avail_lists in swap_info_struct"). I didn't use back kzalloc/kfree
in that commit since I don't see any any harm by keep using
kvzalloc/kvfree, but now looks like they're causing some trouble.

So what about using back kzalloc/kfree for swap_info_struct instead?
Can save one local variable and using kvzalloc/kvfree for a struct
that is 272 bytes doesn't really have any benefit.

Thanks,
Aaron

> 
> Fixes: 873d7bcfd066 ("mm/swapfile.c: use kvzalloc for swap_info_struct allocation")
> Cc: <stable@vger.kernel.org>
> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
> Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
> ---
>  mm/swapfile.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index dbac1d49469d..d26c9eac3d64 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -2810,7 +2810,7 @@ late_initcall(max_swapfiles_check);
>  
>  static struct swap_info_struct *alloc_swap_info(void)
>  {
> -	struct swap_info_struct *p;
> +	struct swap_info_struct *p, *tmp = NULL;
>  	unsigned int type;
>  	int i;
>  	int size = sizeof(*p) + nr_node_ids * sizeof(struct plist_node);
> @@ -2840,7 +2840,7 @@ static struct swap_info_struct *alloc_swap_info(void)
>  		smp_wmb();
>  		nr_swapfiles++;
>  	} else {
> -		kvfree(p);
> +		tmp = p;
>  		p = swap_info[type];
>  		/*
>  		 * Do not memset this entry: a racing procfs swap_next()
> @@ -2853,6 +2853,8 @@ static struct swap_info_struct *alloc_swap_info(void)
>  		plist_node_init(&p->avail_lists[i], 0);
>  	p->flags = SWP_USED;
>  	spin_unlock(&swap_lock);
> +	kvfree(tmp);
> +
>  	spin_lock_init(&p->lock);
>  	spin_lock_init(&p->cont_lock);
>  
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info
  2019-01-29  8:53 ` Aaron Lu
@ 2019-01-29 10:43   ` Joseph Qi
  2019-01-29 11:19     ` Aaron Lu
  0 siblings, 1 reply; 21+ messages in thread
From: Joseph Qi @ 2019-01-29 10:43 UTC (permalink / raw)
  To: Aaron Lu, Jiufei Xue, akpm; +Cc: linux-mm, Michal Hocko, Vasily Averin

Hi,

On 19/1/29 16:53, Aaron Lu wrote:
> On 2019/1/29 15:21, Jiufei Xue wrote:
>> Trinity reports BUG:
>>
>> sleeping function called from invalid context at mm/vmalloc.c:1477
>> in_atomic(): 1, irqs_disabled(): 0, pid: 12269, name: trinity-c1
>>
>> [ 2748.573460] Call Trace:
>> [ 2748.575935]  dump_stack+0x91/0xeb
>> [ 2748.578512]  ___might_sleep+0x21c/0x250
>> [ 2748.581090]  remove_vm_area+0x1d/0x90
>> [ 2748.583637]  __vunmap+0x76/0x100
>> [ 2748.586120]  __se_sys_swapon+0xb9a/0x1220
>> [ 2748.598973]  do_syscall_64+0x60/0x210
>> [ 2748.601439]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> This is triggered by calling kvfree() inside spinlock() section in
>> function alloc_swap_info().
>> Fix this by moving the kvfree() after spin_unlock().
> 
> The fix looks good to me.
> 
> BTW, swap_info_struct's size has been reduced to its original size:
> 272 bytes by commit 66f71da9dd38("mm/swap: use nr_node_ids for
> avail_lists in swap_info_struct"). I didn't use back kzalloc/kfree
> in that commit since I don't see any any harm by keep using
> kvzalloc/kvfree, but now looks like they're causing some trouble.
> 
> So what about using back kzalloc/kfree for swap_info_struct instead?
> Can save one local variable and using kvzalloc/kvfree for a struct
> that is 272 bytes doesn't really have any benefit.
> 
avail_lists in swap_info_struct is dynamic allocated.
So if we use back kzalloc/kfree, how to deal with the case that
nr_node_ids is big?

Thanks,
Joseph

> Thanks,
> Aaron
> 
>>
>> Fixes: 873d7bcfd066 ("mm/swapfile.c: use kvzalloc for swap_info_struct allocation")
>> Cc: <stable@vger.kernel.org>
>> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
>> Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
>> ---
>>  mm/swapfile.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index dbac1d49469d..d26c9eac3d64 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -2810,7 +2810,7 @@ late_initcall(max_swapfiles_check);
>>  
>>  static struct swap_info_struct *alloc_swap_info(void)
>>  {
>> -	struct swap_info_struct *p;
>> +	struct swap_info_struct *p, *tmp = NULL;
>>  	unsigned int type;
>>  	int i;
>>  	int size = sizeof(*p) + nr_node_ids * sizeof(struct plist_node);
>> @@ -2840,7 +2840,7 @@ static struct swap_info_struct *alloc_swap_info(void)
>>  		smp_wmb();
>>  		nr_swapfiles++;
>>  	} else {
>> -		kvfree(p);
>> +		tmp = p;
>>  		p = swap_info[type];
>>  		/*
>>  		 * Do not memset this entry: a racing procfs swap_next()
>> @@ -2853,6 +2853,8 @@ static struct swap_info_struct *alloc_swap_info(void)
>>  		plist_node_init(&p->avail_lists[i], 0);
>>  	p->flags = SWP_USED;
>>  	spin_unlock(&swap_lock);
>> +	kvfree(tmp);
>> +
>>  	spin_lock_init(&p->lock);
>>  	spin_lock_init(&p->cont_lock);
>>  
>>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info
  2019-01-29 10:43   ` Joseph Qi
@ 2019-01-29 11:19     ` Aaron Lu
  0 siblings, 0 replies; 21+ messages in thread
From: Aaron Lu @ 2019-01-29 11:19 UTC (permalink / raw)
  To: Joseph Qi; +Cc: Jiufei Xue, akpm, linux-mm, Michal Hocko, Vasily Averin

On Tue, Jan 29, 2019 at 06:43:53PM +0800, Joseph Qi wrote:
> Hi,
> 
> On 19/1/29 16:53, Aaron Lu wrote:
> > On 2019/1/29 15:21, Jiufei Xue wrote:
> >> Trinity reports BUG:
> >>
> >> sleeping function called from invalid context at mm/vmalloc.c:1477
> >> in_atomic(): 1, irqs_disabled(): 0, pid: 12269, name: trinity-c1
> >>
> >> [ 2748.573460] Call Trace:
> >> [ 2748.575935]  dump_stack+0x91/0xeb
> >> [ 2748.578512]  ___might_sleep+0x21c/0x250
> >> [ 2748.581090]  remove_vm_area+0x1d/0x90
> >> [ 2748.583637]  __vunmap+0x76/0x100
> >> [ 2748.586120]  __se_sys_swapon+0xb9a/0x1220
> >> [ 2748.598973]  do_syscall_64+0x60/0x210
> >> [ 2748.601439]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >>
> >> This is triggered by calling kvfree() inside spinlock() section in
> >> function alloc_swap_info().
> >> Fix this by moving the kvfree() after spin_unlock().
> > 
> > The fix looks good to me.
> > 
> > BTW, swap_info_struct's size has been reduced to its original size:
> > 272 bytes by commit 66f71da9dd38("mm/swap: use nr_node_ids for
> > avail_lists in swap_info_struct"). I didn't use back kzalloc/kfree
> > in that commit since I don't see any any harm by keep using
> > kvzalloc/kvfree, but now looks like they're causing some trouble.
> > 
> > So what about using back kzalloc/kfree for swap_info_struct instead?
> > Can save one local variable and using kvzalloc/kvfree for a struct
> > that is 272 bytes doesn't really have any benefit.
> > 
> avail_lists in swap_info_struct is dynamic allocated.
> So if we use back kzalloc/kfree, how to deal with the case that
> nr_node_ids is big?

Oh right, I missed that.

Acked-by: Aaron Lu <aaron.lu@linux.alibaba.com>
 
Thanks,
Aaron
 
> >>
> >> Fixes: 873d7bcfd066 ("mm/swapfile.c: use kvzalloc for swap_info_struct allocation")
> >> Cc: <stable@vger.kernel.org>
> >> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
> >> Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
> >> ---
> >>  mm/swapfile.c | 6 ++++--
> >>  1 file changed, 4 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/mm/swapfile.c b/mm/swapfile.c
> >> index dbac1d49469d..d26c9eac3d64 100644
> >> --- a/mm/swapfile.c
> >> +++ b/mm/swapfile.c
> >> @@ -2810,7 +2810,7 @@ late_initcall(max_swapfiles_check);
> >>  
> >>  static struct swap_info_struct *alloc_swap_info(void)
> >>  {
> >> -	struct swap_info_struct *p;
> >> +	struct swap_info_struct *p, *tmp = NULL;
> >>  	unsigned int type;
> >>  	int i;
> >>  	int size = sizeof(*p) + nr_node_ids * sizeof(struct plist_node);
> >> @@ -2840,7 +2840,7 @@ static struct swap_info_struct *alloc_swap_info(void)
> >>  		smp_wmb();
> >>  		nr_swapfiles++;
> >>  	} else {
> >> -		kvfree(p);
> >> +		tmp = p;
> >>  		p = swap_info[type];
> >>  		/*
> >>  		 * Do not memset this entry: a racing procfs swap_next()
> >> @@ -2853,6 +2853,8 @@ static struct swap_info_struct *alloc_swap_info(void)
> >>  		plist_node_init(&p->avail_lists[i], 0);
> >>  	p->flags = SWP_USED;
> >>  	spin_unlock(&swap_lock);
> >> +	kvfree(tmp);
> >> +
> >>  	spin_lock_init(&p->lock);
> >>  	spin_lock_init(&p->cont_lock);
> >>  
> >>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info
  2019-01-29  7:21 [PATCH] mm: fix sleeping function warning in alloc_swap_info Jiufei Xue
  2019-01-29  8:53 ` Aaron Lu
@ 2019-01-29 11:43 ` Tetsuo Handa
  2019-01-29 19:13   ` Andrew Morton
  1 sibling, 1 reply; 21+ messages in thread
From: Tetsuo Handa @ 2019-01-29 11:43 UTC (permalink / raw)
  To: Jiufei Xue, akpm; +Cc: linux-mm, joseph.qi

On 2019/01/29 16:21, Jiufei Xue wrote:
> Trinity reports BUG:
> 
> sleeping function called from invalid context at mm/vmalloc.c:1477
> in_atomic(): 1, irqs_disabled(): 0, pid: 12269, name: trinity-c1
> 
> [ 2748.573460] Call Trace:
> [ 2748.575935]  dump_stack+0x91/0xeb
> [ 2748.578512]  ___might_sleep+0x21c/0x250
> [ 2748.581090]  remove_vm_area+0x1d/0x90
> [ 2748.583637]  __vunmap+0x76/0x100
> [ 2748.586120]  __se_sys_swapon+0xb9a/0x1220
> [ 2748.598973]  do_syscall_64+0x60/0x210
> [ 2748.601439]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> This is triggered by calling kvfree() inside spinlock() section in
> function alloc_swap_info().
> Fix this by moving the kvfree() after spin_unlock().
> 

Excuse me? But isn't kvfree() safe to be called with spinlock held?

There was no context explanation regarding kvfree() for 4.18.20.
4.19.18 says

  Context: Any context except NMI.

and 4.20.5 says

  Context: Either preemptible task context or not-NMI interrupt.

. There might be users who didn't notice this change.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info
  2019-01-29 11:43 ` Tetsuo Handa
@ 2019-01-29 19:13   ` Andrew Morton
  2019-01-29 21:12     ` Tetsuo Handa
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2019-01-29 19:13 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Jiufei Xue, linux-mm, joseph.qi

On Tue, 29 Jan 2019 20:43:20 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:

> On 2019/01/29 16:21, Jiufei Xue wrote:
> > Trinity reports BUG:
> > 
> > sleeping function called from invalid context at mm/vmalloc.c:1477
> > in_atomic(): 1, irqs_disabled(): 0, pid: 12269, name: trinity-c1
> > 
> > [ 2748.573460] Call Trace:
> > [ 2748.575935]  dump_stack+0x91/0xeb
> > [ 2748.578512]  ___might_sleep+0x21c/0x250
> > [ 2748.581090]  remove_vm_area+0x1d/0x90
> > [ 2748.583637]  __vunmap+0x76/0x100
> > [ 2748.586120]  __se_sys_swapon+0xb9a/0x1220
> > [ 2748.598973]  do_syscall_64+0x60/0x210
> > [ 2748.601439]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 
> > This is triggered by calling kvfree() inside spinlock() section in
> > function alloc_swap_info().
> > Fix this by moving the kvfree() after spin_unlock().
> > 
> 
> Excuse me? But isn't kvfree() safe to be called with spinlock held?

Yes, I'm having trouble spotting where kvfree() can sleep.  Perhaps it
*used* to sleep on mutex_lock(vmap_purge_lock), but
try_purge_vmap_area_lazy() is using mutex_trylock().  Confused.

kvfree() darn well *shouldn't* sleep!


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info
  2019-01-29 19:13   ` Andrew Morton
@ 2019-01-29 21:12     ` Tetsuo Handa
  2019-01-29 21:51       ` Yang Shi
  0 siblings, 1 reply; 21+ messages in thread
From: Tetsuo Handa @ 2019-01-29 21:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jiufei Xue, linux-mm, joseph.qi, Linus Torvalds

On 2019/01/30 4:13, Andrew Morton wrote:
> On Tue, 29 Jan 2019 20:43:20 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> 
>> On 2019/01/29 16:21, Jiufei Xue wrote:
>>> Trinity reports BUG:
>>>
>>> sleeping function called from invalid context at mm/vmalloc.c:1477
>>> in_atomic(): 1, irqs_disabled(): 0, pid: 12269, name: trinity-c1
>>>
>>> [ 2748.573460] Call Trace:
>>> [ 2748.575935]  dump_stack+0x91/0xeb
>>> [ 2748.578512]  ___might_sleep+0x21c/0x250
>>> [ 2748.581090]  remove_vm_area+0x1d/0x90
>>> [ 2748.583637]  __vunmap+0x76/0x100
>>> [ 2748.586120]  __se_sys_swapon+0xb9a/0x1220
>>> [ 2748.598973]  do_syscall_64+0x60/0x210
>>> [ 2748.601439]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>
>>> This is triggered by calling kvfree() inside spinlock() section in
>>> function alloc_swap_info().
>>> Fix this by moving the kvfree() after spin_unlock().
>>>
>>
>> Excuse me? But isn't kvfree() safe to be called with spinlock held?
> 
> Yes, I'm having trouble spotting where kvfree() can sleep.  Perhaps it
> *used* to sleep on mutex_lock(vmap_purge_lock), but
> try_purge_vmap_area_lazy() is using mutex_trylock().  Confused.
> 
> kvfree() darn well *shouldn't* sleep!
> 

If I recall correctly, there was an attempt to allow vfree() to sleep
but that attempt failed, and the change to allow vfree() to sleep was
reverted. Thus, vfree() had been "Context: Any context except NMI.".

If we want to allow vfree() to sleep, at least we need to test with
kvmalloc() == vmalloc() (i.e. force kvmalloc()/kvfree() users to use
vmalloc()/vfree() path). For now, reverting the 
"Context: Either preemptible task context or not-NMI interrupt." change
will be needed for stable kernels.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info
  2019-01-29 21:12     ` Tetsuo Handa
@ 2019-01-29 21:51       ` Yang Shi
  2019-01-30  0:42         ` Tetsuo Handa
  0 siblings, 1 reply; 21+ messages in thread
From: Yang Shi @ 2019-01-29 21:51 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Andrew Morton, Jiufei Xue, Linux MM, joseph.qi, Linus Torvalds

On Tue, Jan 29, 2019 at 1:12 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2019/01/30 4:13, Andrew Morton wrote:
> > On Tue, 29 Jan 2019 20:43:20 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> >
> >> On 2019/01/29 16:21, Jiufei Xue wrote:
> >>> Trinity reports BUG:
> >>>
> >>> sleeping function called from invalid context at mm/vmalloc.c:1477
> >>> in_atomic(): 1, irqs_disabled(): 0, pid: 12269, name: trinity-c1
> >>>
> >>> [ 2748.573460] Call Trace:
> >>> [ 2748.575935]  dump_stack+0x91/0xeb
> >>> [ 2748.578512]  ___might_sleep+0x21c/0x250
> >>> [ 2748.581090]  remove_vm_area+0x1d/0x90
> >>> [ 2748.583637]  __vunmap+0x76/0x100
> >>> [ 2748.586120]  __se_sys_swapon+0xb9a/0x1220
> >>> [ 2748.598973]  do_syscall_64+0x60/0x210
> >>> [ 2748.601439]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >>>
> >>> This is triggered by calling kvfree() inside spinlock() section in
> >>> function alloc_swap_info().
> >>> Fix this by moving the kvfree() after spin_unlock().
> >>>
> >>
> >> Excuse me? But isn't kvfree() safe to be called with spinlock held?
> >
> > Yes, I'm having trouble spotting where kvfree() can sleep.  Perhaps it
> > *used* to sleep on mutex_lock(vmap_purge_lock), but
> > try_purge_vmap_area_lazy() is using mutex_trylock().  Confused.
> >
> > kvfree() darn well *shouldn't* sleep!
> >
>
> If I recall correctly, there was an attempt to allow vfree() to sleep
> but that attempt failed, and the change to allow vfree() to sleep was
> reverted. Thus, vfree() had been "Context: Any context except NMI.".
>
> If we want to allow vfree() to sleep, at least we need to test with
> kvmalloc() == vmalloc() (i.e. force kvmalloc()/kvfree() users to use
> vmalloc()/vfree() path). For now, reverting the
> "Context: Either preemptible task context or not-NMI interrupt." change
> will be needed for stable kernels.

So, the comment for vfree "May sleep if called *not* from interrupt
context." is wrong?

Thanks,
Yang

>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info
  2019-01-29 21:51       ` Yang Shi
@ 2019-01-30  0:42         ` Tetsuo Handa
  2019-01-30  1:01           ` Andrew Morton
  0 siblings, 1 reply; 21+ messages in thread
From: Tetsuo Handa @ 2019-01-30  0:42 UTC (permalink / raw)
  To: Yang Shi; +Cc: Andrew Morton, Jiufei Xue, Linux MM, joseph.qi, Linus Torvalds

Yang Shi wrote:
> On Tue, Jan 29, 2019 at 1:12 PM Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
> >
> > On 2019/01/30 4:13, Andrew Morton wrote:
> > > On Tue, 29 Jan 2019 20:43:20 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> > >
> > >> On 2019/01/29 16:21, Jiufei Xue wrote:
> > >>> Trinity reports BUG:
> > >>>
> > >>> sleeping function called from invalid context at mm/vmalloc.c:1477
> > >>> in_atomic(): 1, irqs_disabled(): 0, pid: 12269, name: trinity-c1
> > >>>
> > >>> [ 2748.573460] Call Trace:
> > >>> [ 2748.575935]  dump_stack+0x91/0xeb
> > >>> [ 2748.578512]  ___might_sleep+0x21c/0x250
> > >>> [ 2748.581090]  remove_vm_area+0x1d/0x90
> > >>> [ 2748.583637]  __vunmap+0x76/0x100
> > >>> [ 2748.586120]  __se_sys_swapon+0xb9a/0x1220
> > >>> [ 2748.598973]  do_syscall_64+0x60/0x210
> > >>> [ 2748.601439]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > >>>
> > >>> This is triggered by calling kvfree() inside spinlock() section in
> > >>> function alloc_swap_info().
> > >>> Fix this by moving the kvfree() after spin_unlock().
> > >>>
> > >>
> > >> Excuse me? But isn't kvfree() safe to be called with spinlock held?
> > >
> > > Yes, I'm having trouble spotting where kvfree() can sleep.  Perhaps it
> > > *used* to sleep on mutex_lock(vmap_purge_lock), but
> > > try_purge_vmap_area_lazy() is using mutex_trylock().  Confused.
> > >
> > > kvfree() darn well *shouldn't* sleep!
> > >
> >
> > If I recall correctly, there was an attempt to allow vfree() to sleep
> > but that attempt failed, and the change to allow vfree() to sleep was
> > reverted. Thus, vfree() had been "Context: Any context except NMI.".

That attempt was not reverted. Instead vfree_atomic() was added.

> >
> > If we want to allow vfree() to sleep, at least we need to test with
> > kvmalloc() == vmalloc() (i.e. force kvmalloc()/kvfree() users to use
> > vmalloc()/vfree() path). For now, reverting the
> > "Context: Either preemptible task context or not-NMI interrupt." change
> > will be needed for stable kernels.
> 
> So, the comment for vfree "May sleep if called *not* from interrupt
> context." is wrong?

Commit bf22e37a641327e3 ("mm: add vfree_atomic()") says

    We are going to use sleeping lock for freeing vmap.  However some
    vfree() users want to free memory from atomic (but not from interrupt)
    context.  For this we add vfree_atomic() - deferred variation of vfree()
    which can be used in any atomic context (except NMIs).

and commit 52414d3302577bb6 ("kvfree(): fix misleading comment") made

    - * Context: Any context except NMI.
    + * Context: Either preemptible task context or not-NMI interrupt.

change. But I think that we converted kmalloc() to kvmalloc() without checking
context of kvfree() callers. Therefore, I think that kvfree() needs to use
vfree_atomic() rather than just saying "vfree() might sleep if called not in
interrupt context."...


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info
  2019-01-30  0:42         ` Tetsuo Handa
@ 2019-01-30  1:01           ` Andrew Morton
  2019-01-30  1:11             ` Linus Torvalds
  2019-03-07 14:43             ` Aaron Lu
  0 siblings, 2 replies; 21+ messages in thread
From: Andrew Morton @ 2019-01-30  1:01 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Yang Shi, Jiufei Xue, Linux MM, joseph.qi, Linus Torvalds

On Wed, 30 Jan 2019 09:42:06 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote:

> > >
> > > If we want to allow vfree() to sleep, at least we need to test with
> > > kvmalloc() == vmalloc() (i.e. force kvmalloc()/kvfree() users to use
> > > vmalloc()/vfree() path). For now, reverting the
> > > "Context: Either preemptible task context or not-NMI interrupt." change
> > > will be needed for stable kernels.
> > 
> > So, the comment for vfree "May sleep if called *not* from interrupt
> > context." is wrong?
> 
> Commit bf22e37a641327e3 ("mm: add vfree_atomic()") says
> 
>     We are going to use sleeping lock for freeing vmap.  However some
>     vfree() users want to free memory from atomic (but not from interrupt)
>     context.  For this we add vfree_atomic() - deferred variation of vfree()
>     which can be used in any atomic context (except NMIs).
> 
> and commit 52414d3302577bb6 ("kvfree(): fix misleading comment") made
> 
>     - * Context: Any context except NMI.
>     + * Context: Either preemptible task context or not-NMI interrupt.
> 
> change. But I think that we converted kmalloc() to kvmalloc() without checking
> context of kvfree() callers. Therefore, I think that kvfree() needs to use
> vfree_atomic() rather than just saying "vfree() might sleep if called not in
> interrupt context."...

Whereabouts in the vfree() path can the kernel sleep?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info
  2019-01-30  1:01           ` Andrew Morton
@ 2019-01-30  1:11             ` Linus Torvalds
  2019-01-30  1:23               ` Linus Torvalds
  2019-03-07 14:43             ` Aaron Lu
  1 sibling, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2019-01-30  1:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Tetsuo Handa, Yang Shi, Jiufei Xue, Linux MM, joseph.qi

On Tue, Jan 29, 2019 at 5:01 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> >
> >     - * Context: Any context except NMI.
> >     + * Context: Either preemptible task context or not-NMI interrupt.
>
> Whereabouts in the vfree() path can the kernel sleep?

Note that it's not necessarily about *sleeping*.

One thing that vfree() really fundamentally should do is to flush
TLB's. And you must not do a cross-TLB flush with interrupts disabled.

NOTE! Right now, I think we do lazy TLB flushing, so the flush
actually is delayed until the vmalloc() when the address rolls around
in the vmalloc address space. But there really are very real and
obvious reasons why we might want to do it at vfree time.

So I'd honestly be a whole lot happier with vmalloc/vfree being
process context only. Or at least with with interrupts enabled (so
swirq/BH context would be fine, but an actual interrupt not so).

Again, this is not about sleeping. But the end result is almost the
same: we really should strive to not do vfree() in interrupt context.

                Linus


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info
  2019-01-30  1:11             ` Linus Torvalds
@ 2019-01-30  1:23               ` Linus Torvalds
  2019-01-30  2:54                 ` Tetsuo Handa
  0 siblings, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2019-01-30  1:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Tetsuo Handa, Yang Shi, Jiufei Xue, Linux MM, joseph.qi

On Tue, Jan 29, 2019 at 5:11 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Again, this is not about sleeping. But the end result is almost the
> same: we really should strive to not do vfree() in interrupt context.

Note that we currently test the wrong thing for this: we actually
check "in_interrupt()" for the deferred case, which certainly works in
practice, and protects from deadlocks (we need vmap_area_lock for the
free-area handling)

But it doesn't actually end up testing the "oops, interrupts are
disabled in process context" issue. The "might_sleep()" check _does_
check that, iirc.

Which - as mentioned - is fine because we currently don't actually do
the TLB flush synchronously, but it's worth noting again. "vfree()"
really is a *lot* different from "kfree()". It's unsafe in all kinds
of special ways, and the locking difference is just part of it.

So whatever might_sleep() has found might be a potential real issue at
some point...

              Linus


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info
  2019-01-30  1:23               ` Linus Torvalds
@ 2019-01-30  2:54                 ` Tetsuo Handa
  2019-01-30 17:18                   ` Linus Torvalds
  0 siblings, 1 reply; 21+ messages in thread
From: Tetsuo Handa @ 2019-01-30  2:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, Yang Shi, Jiufei Xue, Linux MM, joseph.qi

Andrew Morton wrote:
> > change. But I think that we converted kmalloc() to kvmalloc() without checking
> > context of kvfree() callers. Therefore, I think that kvfree() needs to use
> > vfree_atomic() rather than just saying "vfree() might sleep if called not in
> > interrupt context."...
> 
> Whereabouts in the vfree() path can the kernel sleep?

Indeed. Although __vunmap() must not be called from interrupt context because
mutex_trylock()/mutex_unlock() from try_purge_vmap_area_lazy() from
free_vmap_area_noflush() from free_unmap_vmap_area() from remove_vm_area() from
__vunmap() cannot be called from interrupt context, it seems that there is no
location that does sleeping operation.

Linus Torvalds wrote:
> Which - as mentioned - is fine because we currently don't actually do
> the TLB flush synchronously, but it's worth noting again. "vfree()"
> really is a *lot* different from "kfree()". It's unsafe in all kinds
> of special ways, and the locking difference is just part of it.
> 
> So whatever might_sleep() has found might be a potential real issue at
> some point...

Then, do we automatically defer vfree() to mm_percpu_wq context?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info
  2019-01-30  2:54                 ` Tetsuo Handa
@ 2019-01-30 17:18                   ` Linus Torvalds
  2019-01-30 22:13                     ` Andrew Morton
  0 siblings, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2019-01-30 17:18 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Andrew Morton, Yang Shi, Jiufei Xue, Linux MM, joseph.qi

On Tue, Jan 29, 2019 at 6:54 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> Then, do we automatically defer vfree() to mm_percpu_wq context?

We might do that, and say "if you call vfree with interrupts disabled,
it gets deferred".

That said, the deferred case should generally not be a common case
either. It has real issues, one of which is simply that particularly
on 32-bit architectures we can run out of vmalloc space even normally,
and if there are loads that do a lot of allocation and then deferred
frees, that problem could become really bad.

So I'd almost be happier having a warning if we end up doing the TLB
flush and defer. At least to find *what* people do.

And I do wonder if we should just always warn, and have that
"might_sleep()", and simply say "if you do vfree from interrupts or
with interrupts disabled, you really should be aware of these kinds of
issues, and you really should *show* that you are aware by using
vfree_atomic()".

                     Linus


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info
  2019-01-30 17:18                   ` Linus Torvalds
@ 2019-01-30 22:13                     ` Andrew Morton
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2019-01-30 22:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Tetsuo Handa, Yang Shi, Jiufei Xue, Linux MM, joseph.qi

On Wed, 30 Jan 2019 09:18:20 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Jan 29, 2019 at 6:54 PM Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
> >
> > Then, do we automatically defer vfree() to mm_percpu_wq context?
> 
> We might do that, and say "if you call vfree with interrupts disabled,
> it gets deferred".
> 
> That said, the deferred case should generally not be a common case
> either. It has real issues, one of which is simply that particularly
> on 32-bit architectures we can run out of vmalloc space even normally,
> and if there are loads that do a lot of allocation and then deferred
> frees, that problem could become really bad.
> 
> So I'd almost be happier having a warning if we end up doing the TLB
> flush and defer. At least to find *what* people do.
> 
> And I do wonder if we should just always warn, and have that
> "might_sleep()", and simply say "if you do vfree from interrupts or
> with interrupts disabled, you really should be aware of these kinds of
> issues, and you really should *show* that you are aware by using
> vfree_atomic()".
> 

Agree.  if (irqs_disabled()) {warn_once; punt_to_workqueue} is the way
to go here.  vfree() should be callable from spinlocked code and
might_sleep() is an inappropriate check.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info
  2019-01-30  1:01           ` Andrew Morton
  2019-01-30  1:11             ` Linus Torvalds
@ 2019-03-07 14:43             ` Aaron Lu
  2019-03-07 14:47               ` Andrey Ryabinin
  1 sibling, 1 reply; 21+ messages in thread
From: Aaron Lu @ 2019-03-07 14:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Tetsuo Handa, Yang Shi, Jiufei Xue, Linux MM, joseph.qi,
	Linus Torvalds, Andrey Ryabinin

On Tue, Jan 29, 2019 at 05:01:50PM -0800, Andrew Morton wrote:
> On Wed, 30 Jan 2019 09:42:06 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote:
> 
> > > >
> > > > If we want to allow vfree() to sleep, at least we need to test with
> > > > kvmalloc() == vmalloc() (i.e. force kvmalloc()/kvfree() users to use
> > > > vmalloc()/vfree() path). For now, reverting the
> > > > "Context: Either preemptible task context or not-NMI interrupt." change
> > > > will be needed for stable kernels.
> > > 
> > > So, the comment for vfree "May sleep if called *not* from interrupt
> > > context." is wrong?
> > 
> > Commit bf22e37a641327e3 ("mm: add vfree_atomic()") says
> > 
> >     We are going to use sleeping lock for freeing vmap.  However some
> >     vfree() users want to free memory from atomic (but not from interrupt)
> >     context.  For this we add vfree_atomic() - deferred variation of vfree()
> >     which can be used in any atomic context (except NMIs).
> > 
> > and commit 52414d3302577bb6 ("kvfree(): fix misleading comment") made
> > 
> >     - * Context: Any context except NMI.
> >     + * Context: Either preemptible task context or not-NMI interrupt.
> > 
> > change. But I think that we converted kmalloc() to kvmalloc() without checking
> > context of kvfree() callers. Therefore, I think that kvfree() needs to use
> > vfree_atomic() rather than just saying "vfree() might sleep if called not in
> > interrupt context."...
> 
> Whereabouts in the vfree() path can the kernel sleep?

(Sorry for the late reply.)

Adding Andrey Ryabinin, author of commit 52414d3302577bb6
("kvfree(): fix misleading comment"), maybe Andrey remembers
where vfree() can sleep.

In the meantime, does "cond_resched_lock(&vmap_area_lock);" in
__purge_vmap_area_lazy() count as a sleep point?
__purge_vmap_area_lazy() can be called if mutex_trylock on
vmap_purge_lock succeed.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info
  2019-03-07 14:43             ` Aaron Lu
@ 2019-03-07 14:47               ` Andrey Ryabinin
  2019-03-07 15:24                 ` Aaron Lu
  0 siblings, 1 reply; 21+ messages in thread
From: Andrey Ryabinin @ 2019-03-07 14:47 UTC (permalink / raw)
  To: Aaron Lu, Andrew Morton
  Cc: Tetsuo Handa, Yang Shi, Jiufei Xue, Linux MM, joseph.qi, Linus Torvalds



On 3/7/19 5:43 PM, Aaron Lu wrote:
> On Tue, Jan 29, 2019 at 05:01:50PM -0800, Andrew Morton wrote:
>> On Wed, 30 Jan 2019 09:42:06 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>
>>>>>
>>>>> If we want to allow vfree() to sleep, at least we need to test with
>>>>> kvmalloc() == vmalloc() (i.e. force kvmalloc()/kvfree() users to use
>>>>> vmalloc()/vfree() path). For now, reverting the
>>>>> "Context: Either preemptible task context or not-NMI interrupt." change
>>>>> will be needed for stable kernels.
>>>>
>>>> So, the comment for vfree "May sleep if called *not* from interrupt
>>>> context." is wrong?
>>>
>>> Commit bf22e37a641327e3 ("mm: add vfree_atomic()") says
>>>
>>>     We are going to use sleeping lock for freeing vmap.  However some
>>>     vfree() users want to free memory from atomic (but not from interrupt)
>>>     context.  For this we add vfree_atomic() - deferred variation of vfree()
>>>     which can be used in any atomic context (except NMIs).
>>>
>>> and commit 52414d3302577bb6 ("kvfree(): fix misleading comment") made
>>>
>>>     - * Context: Any context except NMI.
>>>     + * Context: Either preemptible task context or not-NMI interrupt.
>>>
>>> change. But I think that we converted kmalloc() to kvmalloc() without checking
>>> context of kvfree() callers. Therefore, I think that kvfree() needs to use
>>> vfree_atomic() rather than just saying "vfree() might sleep if called not in
>>> interrupt context."...
>>
>> Whereabouts in the vfree() path can the kernel sleep?
> 
> (Sorry for the late reply.)
> 
> Adding Andrey Ryabinin, author of commit 52414d3302577bb6
> ("kvfree(): fix misleading comment"), maybe Andrey remembers
> where vfree() can sleep.
> 
> In the meantime, does "cond_resched_lock(&vmap_area_lock);" in
> __purge_vmap_area_lazy() count as a sleep point?

Yes, this is the place (the only one) where vfree() can sleep.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info
  2019-03-07 14:47               ` Andrey Ryabinin
@ 2019-03-07 15:24                 ` Aaron Lu
  2019-03-07 16:33                   ` Andrey Ryabinin
  0 siblings, 1 reply; 21+ messages in thread
From: Aaron Lu @ 2019-03-07 15:24 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Andrew Morton, Tetsuo Handa, Yang Shi, Jiufei Xue, Linux MM,
	joseph.qi, Linus Torvalds

On Thu, Mar 07, 2019 at 05:47:13PM +0300, Andrey Ryabinin wrote:
> 
> 
> On 3/7/19 5:43 PM, Aaron Lu wrote:
> > On Tue, Jan 29, 2019 at 05:01:50PM -0800, Andrew Morton wrote:
> >> On Wed, 30 Jan 2019 09:42:06 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote:
> >>
> >>>>>
> >>>>> If we want to allow vfree() to sleep, at least we need to test with
> >>>>> kvmalloc() == vmalloc() (i.e. force kvmalloc()/kvfree() users to use
> >>>>> vmalloc()/vfree() path). For now, reverting the
> >>>>> "Context: Either preemptible task context or not-NMI interrupt." change
> >>>>> will be needed for stable kernels.
> >>>>
> >>>> So, the comment for vfree "May sleep if called *not* from interrupt
> >>>> context." is wrong?
> >>>
> >>> Commit bf22e37a641327e3 ("mm: add vfree_atomic()") says
> >>>
> >>>     We are going to use sleeping lock for freeing vmap.  However some
> >>>     vfree() users want to free memory from atomic (but not from interrupt)
> >>>     context.  For this we add vfree_atomic() - deferred variation of vfree()
> >>>     which can be used in any atomic context (except NMIs).
> >>>
> >>> and commit 52414d3302577bb6 ("kvfree(): fix misleading comment") made
> >>>
> >>>     - * Context: Any context except NMI.
> >>>     + * Context: Either preemptible task context or not-NMI interrupt.
> >>>
> >>> change. But I think that we converted kmalloc() to kvmalloc() without checking
> >>> context of kvfree() callers. Therefore, I think that kvfree() needs to use
> >>> vfree_atomic() rather than just saying "vfree() might sleep if called not in
> >>> interrupt context."...
> >>
> >> Whereabouts in the vfree() path can the kernel sleep?
> > 
> > (Sorry for the late reply.)
> > 
> > Adding Andrey Ryabinin, author of commit 52414d3302577bb6
> > ("kvfree(): fix misleading comment"), maybe Andrey remembers
> > where vfree() can sleep.
> > 
> > In the meantime, does "cond_resched_lock(&vmap_area_lock);" in
> > __purge_vmap_area_lazy() count as a sleep point?
> 
> Yes, this is the place (the only one) where vfree() can sleep.

OK, thanks for the quick confirm.

So what about this: use __vfree_deferred() when:
 - in_interrupt(), because we can't use mutex_trylock() as pointed out
   by Tetsuo;
 - in_atomic(), because cond_resched_lock();
 - irqs_disabled(), as smp_call_function_many() will deadlock.

An untested diff to show the idea(not sure if warn is needed):

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e86ba6e74b50..28d200f054b0 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1578,7 +1578,7 @@ void vfree_atomic(const void *addr)
 
 static void __vfree(const void *addr)
 {
-	if (unlikely(in_interrupt()))
+	if (unlikely(in_interrupt() || in_atomic() || irqs_disabled()))
 		__vfree_deferred(addr);
 	else
 		__vunmap(addr, 1);
@@ -1606,8 +1606,6 @@ void vfree(const void *addr)
 
 	kmemleak_free(addr);
 
-	might_sleep_if(!in_interrupt());
-
 	if (!addr)
 		return;
 


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info
  2019-03-07 15:24                 ` Aaron Lu
@ 2019-03-07 16:33                   ` Andrey Ryabinin
  2019-03-08  2:41                     ` Aaron Lu
  0 siblings, 1 reply; 21+ messages in thread
From: Andrey Ryabinin @ 2019-03-07 16:33 UTC (permalink / raw)
  To: Aaron Lu
  Cc: Andrew Morton, Tetsuo Handa, Yang Shi, Jiufei Xue, Linux MM,
	joseph.qi, Linus Torvalds



On 3/7/19 6:24 PM, Aaron Lu wrote:
> On Thu, Mar 07, 2019 at 05:47:13PM +0300, Andrey Ryabinin wrote:
>>
>>
>> On 3/7/19 5:43 PM, Aaron Lu wrote:
>>> On Tue, Jan 29, 2019 at 05:01:50PM -0800, Andrew Morton wrote:
>>>> On Wed, 30 Jan 2019 09:42:06 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>>>
>>>>>>>
>>>>>>> If we want to allow vfree() to sleep, at least we need to test with
>>>>>>> kvmalloc() == vmalloc() (i.e. force kvmalloc()/kvfree() users to use
>>>>>>> vmalloc()/vfree() path). For now, reverting the
>>>>>>> "Context: Either preemptible task context or not-NMI interrupt." change
>>>>>>> will be needed for stable kernels.
>>>>>>
>>>>>> So, the comment for vfree "May sleep if called *not* from interrupt
>>>>>> context." is wrong?
>>>>>
>>>>> Commit bf22e37a641327e3 ("mm: add vfree_atomic()") says
>>>>>
>>>>>     We are going to use sleeping lock for freeing vmap.  However some
>>>>>     vfree() users want to free memory from atomic (but not from interrupt)
>>>>>     context.  For this we add vfree_atomic() - deferred variation of vfree()
>>>>>     which can be used in any atomic context (except NMIs).
>>>>>
>>>>> and commit 52414d3302577bb6 ("kvfree(): fix misleading comment") made
>>>>>
>>>>>     - * Context: Any context except NMI.
>>>>>     + * Context: Either preemptible task context or not-NMI interrupt.
>>>>>
>>>>> change. But I think that we converted kmalloc() to kvmalloc() without checking
>>>>> context of kvfree() callers. Therefore, I think that kvfree() needs to use
>>>>> vfree_atomic() rather than just saying "vfree() might sleep if called not in
>>>>> interrupt context."...
>>>>
>>>> Whereabouts in the vfree() path can the kernel sleep?
>>>
>>> (Sorry for the late reply.)
>>>
>>> Adding Andrey Ryabinin, author of commit 52414d3302577bb6
>>> ("kvfree(): fix misleading comment"), maybe Andrey remembers
>>> where vfree() can sleep.
>>>
>>> In the meantime, does "cond_resched_lock(&vmap_area_lock);" in
>>> __purge_vmap_area_lazy() count as a sleep point?
>>
>> Yes, this is the place (the only one) where vfree() can sleep.
> 
> OK, thanks for the quick confirm.
> 
> So what about this: use __vfree_deferred() when:
>  - in_interrupt(), because we can't use mutex_trylock() as pointed out
>    by Tetsuo;
>  - in_atomic(), because cond_resched_lock();
>  - irqs_disabled(), as smp_call_function_many() will deadlock.
> 
> An untested diff to show the idea(not sure if warn is needed):
> 

It was discussed before. You're not the first one suggesting something like this.
There is the comment near in_atomic() explaining  well why and when your patch won't work.

The easiest way of making vfree() to be safe in atomic contexts is this patch:
http://lkml.kernel.org/r/20170330102719.13119-1-aryabinin@virtuozzo.com

But the final decision at that time was to fix caller so the call vfree from sleepable context instead:
 http://lkml.kernel.org/r/20170330152229.f2108e718114ed77acae7405@linux-foundation.org

 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index e86ba6e74b50..28d200f054b0 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1578,7 +1578,7 @@ void vfree_atomic(const void *addr)
>  
>  static void __vfree(const void *addr)
>  {
> -	if (unlikely(in_interrupt()))
> +	if (unlikely(in_interrupt() || in_atomic() || irqs_disabled()))
>  		__vfree_deferred(addr);
>  	else
>  		__vunmap(addr, 1);
> @@ -1606,8 +1606,6 @@ void vfree(const void *addr)
>  
>  	kmemleak_free(addr);
>  
> -	might_sleep_if(!in_interrupt());
> -
>  	if (!addr)
>  		return;
>  
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info
  2019-03-07 16:33                   ` Andrey Ryabinin
@ 2019-03-08  2:41                     ` Aaron Lu
  2019-03-11  1:43                       ` Jiufei Xue
  0 siblings, 1 reply; 21+ messages in thread
From: Aaron Lu @ 2019-03-08  2:41 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Andrew Morton, Tetsuo Handa, Yang Shi, Jiufei Xue, Linux MM,
	joseph.qi, Linus Torvalds

On 2019/3/8 0:33, Andrey Ryabinin wrote:
> 
> 
> On 3/7/19 6:24 PM, Aaron Lu wrote:
>> On Thu, Mar 07, 2019 at 05:47:13PM +0300, Andrey Ryabinin wrote:
>>>
>>>
>>> On 3/7/19 5:43 PM, Aaron Lu wrote:
>>>> On Tue, Jan 29, 2019 at 05:01:50PM -0800, Andrew Morton wrote:
>>>>> On Wed, 30 Jan 2019 09:42:06 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>>>>
>>>>>>>>
>>>>>>>> If we want to allow vfree() to sleep, at least we need to test with
>>>>>>>> kvmalloc() == vmalloc() (i.e. force kvmalloc()/kvfree() users to use
>>>>>>>> vmalloc()/vfree() path). For now, reverting the
>>>>>>>> "Context: Either preemptible task context or not-NMI interrupt." change
>>>>>>>> will be needed for stable kernels.
>>>>>>>
>>>>>>> So, the comment for vfree "May sleep if called *not* from interrupt
>>>>>>> context." is wrong?
>>>>>>
>>>>>> Commit bf22e37a641327e3 ("mm: add vfree_atomic()") says
>>>>>>
>>>>>>     We are going to use sleeping lock for freeing vmap.  However some
>>>>>>     vfree() users want to free memory from atomic (but not from interrupt)
>>>>>>     context.  For this we add vfree_atomic() - deferred variation of vfree()
>>>>>>     which can be used in any atomic context (except NMIs).
>>>>>>
>>>>>> and commit 52414d3302577bb6 ("kvfree(): fix misleading comment") made
>>>>>>
>>>>>>     - * Context: Any context except NMI.
>>>>>>     + * Context: Either preemptible task context or not-NMI interrupt.
>>>>>>
>>>>>> change. But I think that we converted kmalloc() to kvmalloc() without checking
>>>>>> context of kvfree() callers. Therefore, I think that kvfree() needs to use
>>>>>> vfree_atomic() rather than just saying "vfree() might sleep if called not in
>>>>>> interrupt context."...
>>>>>
>>>>> Whereabouts in the vfree() path can the kernel sleep?
>>>>
>>>> (Sorry for the late reply.)
>>>>
>>>> Adding Andrey Ryabinin, author of commit 52414d3302577bb6
>>>> ("kvfree(): fix misleading comment"), maybe Andrey remembers
>>>> where vfree() can sleep.
>>>>
>>>> In the meantime, does "cond_resched_lock(&vmap_area_lock);" in
>>>> __purge_vmap_area_lazy() count as a sleep point?
>>>
>>> Yes, this is the place (the only one) where vfree() can sleep.
>>
>> OK, thanks for the quick confirm.
>>
>> So what about this: use __vfree_deferred() when:
>>  - in_interrupt(), because we can't use mutex_trylock() as pointed out
>>    by Tetsuo;
>>  - in_atomic(), because cond_resched_lock();
>>  - irqs_disabled(), as smp_call_function_many() will deadlock.
>>
>> An untested diff to show the idea(not sure if warn is needed):
>>
> 
> It was discussed before. You're not the first one suggesting something like this.
> There is the comment near in_atomic() explaining  well why and when your patch won't work.

Thanks for the info.

> The easiest way of making vfree() to be safe in atomic contexts is this patch:
> http://lkml.kernel.org/r/20170330102719.13119-1-aryabinin@virtuozzo.com
> 
> But the final decision at that time was to fix caller so the call vfree from sleepable context instead:
>  http://lkml.kernel.org/r/20170330152229.f2108e718114ed77acae7405@linux-foundation.org

OK, if that is the final decision, then I think Jiufei's patch that
moves kvfree() out of the locked region is the right thing to do for
this issue here.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info
  2019-03-08  2:41                     ` Aaron Lu
@ 2019-03-11  1:43                       ` Jiufei Xue
  0 siblings, 0 replies; 21+ messages in thread
From: Jiufei Xue @ 2019-03-11  1:43 UTC (permalink / raw)
  To: Andrey Ryabinin, Andrew Morton
  Cc: Aaron Lu, Tetsuo Handa, Yang Shi, Jiufei Xue, Linux MM,
	joseph.qi, Linus Torvalds

Hi Andrew,
>> It was discussed before. You're not the first one suggesting something like this.
>> There is the comment near in_atomic() explaining  well why and when your patch won't work.
> Thanks for the info.
>
>> The easiest way of making vfree() to be safe in atomic contexts is this patch:
>> http://lkml.kernel.org/r/20170330102719.13119-1-aryabinin@virtuozzo.com
>>
>> But the final decision at that time was to fix caller so the call vfree from sleepable context instead:
>>  http://lkml.kernel.org/r/20170330152229.f2108e718114ed77acae7405@linux-foundation.org
> OK, if that is the final decision, then I think Jiufei's patch that
> moves kvfree() out of the locked region is the right thing to do for
> this issue here.
>
Is that the final decision we have made? Could you please look into
my patch again and give the decision?


Thanks,

Jiufei


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2019-03-11  1:43 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-29  7:21 [PATCH] mm: fix sleeping function warning in alloc_swap_info Jiufei Xue
2019-01-29  8:53 ` Aaron Lu
2019-01-29 10:43   ` Joseph Qi
2019-01-29 11:19     ` Aaron Lu
2019-01-29 11:43 ` Tetsuo Handa
2019-01-29 19:13   ` Andrew Morton
2019-01-29 21:12     ` Tetsuo Handa
2019-01-29 21:51       ` Yang Shi
2019-01-30  0:42         ` Tetsuo Handa
2019-01-30  1:01           ` Andrew Morton
2019-01-30  1:11             ` Linus Torvalds
2019-01-30  1:23               ` Linus Torvalds
2019-01-30  2:54                 ` Tetsuo Handa
2019-01-30 17:18                   ` Linus Torvalds
2019-01-30 22:13                     ` Andrew Morton
2019-03-07 14:43             ` Aaron Lu
2019-03-07 14:47               ` Andrey Ryabinin
2019-03-07 15:24                 ` Aaron Lu
2019-03-07 16:33                   ` Andrey Ryabinin
2019-03-08  2:41                     ` Aaron Lu
2019-03-11  1:43                       ` Jiufei Xue

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.