* [PATCH] mm: fix sleeping function warning in alloc_swap_info @ 2019-01-29 7:21 Jiufei Xue 2019-01-29 8:53 ` Aaron Lu 2019-01-29 11:43 ` Tetsuo Handa 0 siblings, 2 replies; 21+ messages in thread From: Jiufei Xue @ 2019-01-29 7:21 UTC (permalink / raw) To: akpm; +Cc: linux-mm, joseph.qi Trinity reports BUG: sleeping function called from invalid context at mm/vmalloc.c:1477 in_atomic(): 1, irqs_disabled(): 0, pid: 12269, name: trinity-c1 [ 2748.573460] Call Trace: [ 2748.575935] dump_stack+0x91/0xeb [ 2748.578512] ___might_sleep+0x21c/0x250 [ 2748.581090] remove_vm_area+0x1d/0x90 [ 2748.583637] __vunmap+0x76/0x100 [ 2748.586120] __se_sys_swapon+0xb9a/0x1220 [ 2748.598973] do_syscall_64+0x60/0x210 [ 2748.601439] entry_SYSCALL_64_after_hwframe+0x49/0xbe This is triggered by calling kvfree() inside spinlock() section in function alloc_swap_info(). Fix this by moving the kvfree() after spin_unlock(). Fixes: 873d7bcfd066 ("mm/swapfile.c: use kvzalloc for swap_info_struct allocation") Cc: <stable@vger.kernel.org> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com> --- mm/swapfile.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/mm/swapfile.c b/mm/swapfile.c index dbac1d49469d..d26c9eac3d64 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -2810,7 +2810,7 @@ late_initcall(max_swapfiles_check); static struct swap_info_struct *alloc_swap_info(void) { - struct swap_info_struct *p; + struct swap_info_struct *p, *tmp = NULL; unsigned int type; int i; int size = sizeof(*p) + nr_node_ids * sizeof(struct plist_node); @@ -2840,7 +2840,7 @@ static struct swap_info_struct *alloc_swap_info(void) smp_wmb(); nr_swapfiles++; } else { - kvfree(p); + tmp = p; p = swap_info[type]; /* * Do not memset this entry: a racing procfs swap_next() @@ -2853,6 +2853,8 @@ static struct swap_info_struct *alloc_swap_info(void) plist_node_init(&p->avail_lists[i], 0); p->flags = SWP_USED; spin_unlock(&swap_lock); + kvfree(tmp); + spin_lock_init(&p->lock); spin_lock_init(&p->cont_lock); -- 2.19.1.856.g8858448bb ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info 2019-01-29 7:21 [PATCH] mm: fix sleeping function warning in alloc_swap_info Jiufei Xue @ 2019-01-29 8:53 ` Aaron Lu 2019-01-29 10:43 ` Joseph Qi 2019-01-29 11:43 ` Tetsuo Handa 1 sibling, 1 reply; 21+ messages in thread From: Aaron Lu @ 2019-01-29 8:53 UTC (permalink / raw) To: Jiufei Xue, akpm; +Cc: linux-mm, joseph.qi, Michal Hocko, Vasily Averin On 2019/1/29 15:21, Jiufei Xue wrote: > Trinity reports BUG: > > sleeping function called from invalid context at mm/vmalloc.c:1477 > in_atomic(): 1, irqs_disabled(): 0, pid: 12269, name: trinity-c1 > > [ 2748.573460] Call Trace: > [ 2748.575935] dump_stack+0x91/0xeb > [ 2748.578512] ___might_sleep+0x21c/0x250 > [ 2748.581090] remove_vm_area+0x1d/0x90 > [ 2748.583637] __vunmap+0x76/0x100 > [ 2748.586120] __se_sys_swapon+0xb9a/0x1220 > [ 2748.598973] do_syscall_64+0x60/0x210 > [ 2748.601439] entry_SYSCALL_64_after_hwframe+0x49/0xbe > > This is triggered by calling kvfree() inside spinlock() section in > function alloc_swap_info(). > Fix this by moving the kvfree() after spin_unlock(). The fix looks good to me. BTW, swap_info_struct's size has been reduced to its original size: 272 bytes by commit 66f71da9dd38("mm/swap: use nr_node_ids for avail_lists in swap_info_struct"). I didn't use back kzalloc/kfree in that commit since I don't see any any harm by keep using kvzalloc/kvfree, but now looks like they're causing some trouble. So what about using back kzalloc/kfree for swap_info_struct instead? Can save one local variable and using kvzalloc/kvfree for a struct that is 272 bytes doesn't really have any benefit. Thanks, Aaron > > Fixes: 873d7bcfd066 ("mm/swapfile.c: use kvzalloc for swap_info_struct allocation") > Cc: <stable@vger.kernel.org> > Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> > Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com> > --- > mm/swapfile.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index dbac1d49469d..d26c9eac3d64 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -2810,7 +2810,7 @@ late_initcall(max_swapfiles_check); > > static struct swap_info_struct *alloc_swap_info(void) > { > - struct swap_info_struct *p; > + struct swap_info_struct *p, *tmp = NULL; > unsigned int type; > int i; > int size = sizeof(*p) + nr_node_ids * sizeof(struct plist_node); > @@ -2840,7 +2840,7 @@ static struct swap_info_struct *alloc_swap_info(void) > smp_wmb(); > nr_swapfiles++; > } else { > - kvfree(p); > + tmp = p; > p = swap_info[type]; > /* > * Do not memset this entry: a racing procfs swap_next() > @@ -2853,6 +2853,8 @@ static struct swap_info_struct *alloc_swap_info(void) > plist_node_init(&p->avail_lists[i], 0); > p->flags = SWP_USED; > spin_unlock(&swap_lock); > + kvfree(tmp); > + > spin_lock_init(&p->lock); > spin_lock_init(&p->cont_lock); > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info 2019-01-29 8:53 ` Aaron Lu @ 2019-01-29 10:43 ` Joseph Qi 2019-01-29 11:19 ` Aaron Lu 0 siblings, 1 reply; 21+ messages in thread From: Joseph Qi @ 2019-01-29 10:43 UTC (permalink / raw) To: Aaron Lu, Jiufei Xue, akpm; +Cc: linux-mm, Michal Hocko, Vasily Averin Hi, On 19/1/29 16:53, Aaron Lu wrote: > On 2019/1/29 15:21, Jiufei Xue wrote: >> Trinity reports BUG: >> >> sleeping function called from invalid context at mm/vmalloc.c:1477 >> in_atomic(): 1, irqs_disabled(): 0, pid: 12269, name: trinity-c1 >> >> [ 2748.573460] Call Trace: >> [ 2748.575935] dump_stack+0x91/0xeb >> [ 2748.578512] ___might_sleep+0x21c/0x250 >> [ 2748.581090] remove_vm_area+0x1d/0x90 >> [ 2748.583637] __vunmap+0x76/0x100 >> [ 2748.586120] __se_sys_swapon+0xb9a/0x1220 >> [ 2748.598973] do_syscall_64+0x60/0x210 >> [ 2748.601439] entry_SYSCALL_64_after_hwframe+0x49/0xbe >> >> This is triggered by calling kvfree() inside spinlock() section in >> function alloc_swap_info(). >> Fix this by moving the kvfree() after spin_unlock(). > > The fix looks good to me. > > BTW, swap_info_struct's size has been reduced to its original size: > 272 bytes by commit 66f71da9dd38("mm/swap: use nr_node_ids for > avail_lists in swap_info_struct"). I didn't use back kzalloc/kfree > in that commit since I don't see any any harm by keep using > kvzalloc/kvfree, but now looks like they're causing some trouble. > > So what about using back kzalloc/kfree for swap_info_struct instead? > Can save one local variable and using kvzalloc/kvfree for a struct > that is 272 bytes doesn't really have any benefit. > avail_lists in swap_info_struct is dynamic allocated. So if we use back kzalloc/kfree, how to deal with the case that nr_node_ids is big? Thanks, Joseph > Thanks, > Aaron > >> >> Fixes: 873d7bcfd066 ("mm/swapfile.c: use kvzalloc for swap_info_struct allocation") >> Cc: <stable@vger.kernel.org> >> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> >> Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com> >> --- >> mm/swapfile.c | 6 ++++-- >> 1 file changed, 4 insertions(+), 2 deletions(-) >> >> diff --git a/mm/swapfile.c b/mm/swapfile.c >> index dbac1d49469d..d26c9eac3d64 100644 >> --- a/mm/swapfile.c >> +++ b/mm/swapfile.c >> @@ -2810,7 +2810,7 @@ late_initcall(max_swapfiles_check); >> >> static struct swap_info_struct *alloc_swap_info(void) >> { >> - struct swap_info_struct *p; >> + struct swap_info_struct *p, *tmp = NULL; >> unsigned int type; >> int i; >> int size = sizeof(*p) + nr_node_ids * sizeof(struct plist_node); >> @@ -2840,7 +2840,7 @@ static struct swap_info_struct *alloc_swap_info(void) >> smp_wmb(); >> nr_swapfiles++; >> } else { >> - kvfree(p); >> + tmp = p; >> p = swap_info[type]; >> /* >> * Do not memset this entry: a racing procfs swap_next() >> @@ -2853,6 +2853,8 @@ static struct swap_info_struct *alloc_swap_info(void) >> plist_node_init(&p->avail_lists[i], 0); >> p->flags = SWP_USED; >> spin_unlock(&swap_lock); >> + kvfree(tmp); >> + >> spin_lock_init(&p->lock); >> spin_lock_init(&p->cont_lock); >> >> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info 2019-01-29 10:43 ` Joseph Qi @ 2019-01-29 11:19 ` Aaron Lu 0 siblings, 0 replies; 21+ messages in thread From: Aaron Lu @ 2019-01-29 11:19 UTC (permalink / raw) To: Joseph Qi; +Cc: Jiufei Xue, akpm, linux-mm, Michal Hocko, Vasily Averin On Tue, Jan 29, 2019 at 06:43:53PM +0800, Joseph Qi wrote: > Hi, > > On 19/1/29 16:53, Aaron Lu wrote: > > On 2019/1/29 15:21, Jiufei Xue wrote: > >> Trinity reports BUG: > >> > >> sleeping function called from invalid context at mm/vmalloc.c:1477 > >> in_atomic(): 1, irqs_disabled(): 0, pid: 12269, name: trinity-c1 > >> > >> [ 2748.573460] Call Trace: > >> [ 2748.575935] dump_stack+0x91/0xeb > >> [ 2748.578512] ___might_sleep+0x21c/0x250 > >> [ 2748.581090] remove_vm_area+0x1d/0x90 > >> [ 2748.583637] __vunmap+0x76/0x100 > >> [ 2748.586120] __se_sys_swapon+0xb9a/0x1220 > >> [ 2748.598973] do_syscall_64+0x60/0x210 > >> [ 2748.601439] entry_SYSCALL_64_after_hwframe+0x49/0xbe > >> > >> This is triggered by calling kvfree() inside spinlock() section in > >> function alloc_swap_info(). > >> Fix this by moving the kvfree() after spin_unlock(). > > > > The fix looks good to me. > > > > BTW, swap_info_struct's size has been reduced to its original size: > > 272 bytes by commit 66f71da9dd38("mm/swap: use nr_node_ids for > > avail_lists in swap_info_struct"). I didn't use back kzalloc/kfree > > in that commit since I don't see any any harm by keep using > > kvzalloc/kvfree, but now looks like they're causing some trouble. > > > > So what about using back kzalloc/kfree for swap_info_struct instead? > > Can save one local variable and using kvzalloc/kvfree for a struct > > that is 272 bytes doesn't really have any benefit. > > > avail_lists in swap_info_struct is dynamic allocated. > So if we use back kzalloc/kfree, how to deal with the case that > nr_node_ids is big? Oh right, I missed that. Acked-by: Aaron Lu <aaron.lu@linux.alibaba.com> Thanks, Aaron > >> > >> Fixes: 873d7bcfd066 ("mm/swapfile.c: use kvzalloc for swap_info_struct allocation") > >> Cc: <stable@vger.kernel.org> > >> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> > >> Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com> > >> --- > >> mm/swapfile.c | 6 ++++-- > >> 1 file changed, 4 insertions(+), 2 deletions(-) > >> > >> diff --git a/mm/swapfile.c b/mm/swapfile.c > >> index dbac1d49469d..d26c9eac3d64 100644 > >> --- a/mm/swapfile.c > >> +++ b/mm/swapfile.c > >> @@ -2810,7 +2810,7 @@ late_initcall(max_swapfiles_check); > >> > >> static struct swap_info_struct *alloc_swap_info(void) > >> { > >> - struct swap_info_struct *p; > >> + struct swap_info_struct *p, *tmp = NULL; > >> unsigned int type; > >> int i; > >> int size = sizeof(*p) + nr_node_ids * sizeof(struct plist_node); > >> @@ -2840,7 +2840,7 @@ static struct swap_info_struct *alloc_swap_info(void) > >> smp_wmb(); > >> nr_swapfiles++; > >> } else { > >> - kvfree(p); > >> + tmp = p; > >> p = swap_info[type]; > >> /* > >> * Do not memset this entry: a racing procfs swap_next() > >> @@ -2853,6 +2853,8 @@ static struct swap_info_struct *alloc_swap_info(void) > >> plist_node_init(&p->avail_lists[i], 0); > >> p->flags = SWP_USED; > >> spin_unlock(&swap_lock); > >> + kvfree(tmp); > >> + > >> spin_lock_init(&p->lock); > >> spin_lock_init(&p->cont_lock); > >> > >> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info 2019-01-29 7:21 [PATCH] mm: fix sleeping function warning in alloc_swap_info Jiufei Xue 2019-01-29 8:53 ` Aaron Lu @ 2019-01-29 11:43 ` Tetsuo Handa 2019-01-29 19:13 ` Andrew Morton 1 sibling, 1 reply; 21+ messages in thread From: Tetsuo Handa @ 2019-01-29 11:43 UTC (permalink / raw) To: Jiufei Xue, akpm; +Cc: linux-mm, joseph.qi On 2019/01/29 16:21, Jiufei Xue wrote: > Trinity reports BUG: > > sleeping function called from invalid context at mm/vmalloc.c:1477 > in_atomic(): 1, irqs_disabled(): 0, pid: 12269, name: trinity-c1 > > [ 2748.573460] Call Trace: > [ 2748.575935] dump_stack+0x91/0xeb > [ 2748.578512] ___might_sleep+0x21c/0x250 > [ 2748.581090] remove_vm_area+0x1d/0x90 > [ 2748.583637] __vunmap+0x76/0x100 > [ 2748.586120] __se_sys_swapon+0xb9a/0x1220 > [ 2748.598973] do_syscall_64+0x60/0x210 > [ 2748.601439] entry_SYSCALL_64_after_hwframe+0x49/0xbe > > This is triggered by calling kvfree() inside spinlock() section in > function alloc_swap_info(). > Fix this by moving the kvfree() after spin_unlock(). > Excuse me? But isn't kvfree() safe to be called with spinlock held? There was no context explanation regarding kvfree() for 4.18.20. 4.19.18 says Context: Any context except NMI. and 4.20.5 says Context: Either preemptible task context or not-NMI interrupt. . There might be users who didn't notice this change. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info 2019-01-29 11:43 ` Tetsuo Handa @ 2019-01-29 19:13 ` Andrew Morton 2019-01-29 21:12 ` Tetsuo Handa 0 siblings, 1 reply; 21+ messages in thread From: Andrew Morton @ 2019-01-29 19:13 UTC (permalink / raw) To: Tetsuo Handa; +Cc: Jiufei Xue, linux-mm, joseph.qi On Tue, 29 Jan 2019 20:43:20 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote: > On 2019/01/29 16:21, Jiufei Xue wrote: > > Trinity reports BUG: > > > > sleeping function called from invalid context at mm/vmalloc.c:1477 > > in_atomic(): 1, irqs_disabled(): 0, pid: 12269, name: trinity-c1 > > > > [ 2748.573460] Call Trace: > > [ 2748.575935] dump_stack+0x91/0xeb > > [ 2748.578512] ___might_sleep+0x21c/0x250 > > [ 2748.581090] remove_vm_area+0x1d/0x90 > > [ 2748.583637] __vunmap+0x76/0x100 > > [ 2748.586120] __se_sys_swapon+0xb9a/0x1220 > > [ 2748.598973] do_syscall_64+0x60/0x210 > > [ 2748.601439] entry_SYSCALL_64_after_hwframe+0x49/0xbe > > > > This is triggered by calling kvfree() inside spinlock() section in > > function alloc_swap_info(). > > Fix this by moving the kvfree() after spin_unlock(). > > > > Excuse me? But isn't kvfree() safe to be called with spinlock held? Yes, I'm having trouble spotting where kvfree() can sleep. Perhaps it *used* to sleep on mutex_lock(vmap_purge_lock), but try_purge_vmap_area_lazy() is using mutex_trylock(). Confused. kvfree() darn well *shouldn't* sleep! ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info 2019-01-29 19:13 ` Andrew Morton @ 2019-01-29 21:12 ` Tetsuo Handa 2019-01-29 21:51 ` Yang Shi 0 siblings, 1 reply; 21+ messages in thread From: Tetsuo Handa @ 2019-01-29 21:12 UTC (permalink / raw) To: Andrew Morton; +Cc: Jiufei Xue, linux-mm, joseph.qi, Linus Torvalds On 2019/01/30 4:13, Andrew Morton wrote: > On Tue, 29 Jan 2019 20:43:20 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote: > >> On 2019/01/29 16:21, Jiufei Xue wrote: >>> Trinity reports BUG: >>> >>> sleeping function called from invalid context at mm/vmalloc.c:1477 >>> in_atomic(): 1, irqs_disabled(): 0, pid: 12269, name: trinity-c1 >>> >>> [ 2748.573460] Call Trace: >>> [ 2748.575935] dump_stack+0x91/0xeb >>> [ 2748.578512] ___might_sleep+0x21c/0x250 >>> [ 2748.581090] remove_vm_area+0x1d/0x90 >>> [ 2748.583637] __vunmap+0x76/0x100 >>> [ 2748.586120] __se_sys_swapon+0xb9a/0x1220 >>> [ 2748.598973] do_syscall_64+0x60/0x210 >>> [ 2748.601439] entry_SYSCALL_64_after_hwframe+0x49/0xbe >>> >>> This is triggered by calling kvfree() inside spinlock() section in >>> function alloc_swap_info(). >>> Fix this by moving the kvfree() after spin_unlock(). >>> >> >> Excuse me? But isn't kvfree() safe to be called with spinlock held? > > Yes, I'm having trouble spotting where kvfree() can sleep. Perhaps it > *used* to sleep on mutex_lock(vmap_purge_lock), but > try_purge_vmap_area_lazy() is using mutex_trylock(). Confused. > > kvfree() darn well *shouldn't* sleep! > If I recall correctly, there was an attempt to allow vfree() to sleep but that attempt failed, and the change to allow vfree() to sleep was reverted. Thus, vfree() had been "Context: Any context except NMI.". If we want to allow vfree() to sleep, at least we need to test with kvmalloc() == vmalloc() (i.e. force kvmalloc()/kvfree() users to use vmalloc()/vfree() path). For now, reverting the "Context: Either preemptible task context or not-NMI interrupt." change will be needed for stable kernels. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info 2019-01-29 21:12 ` Tetsuo Handa @ 2019-01-29 21:51 ` Yang Shi 2019-01-30 0:42 ` Tetsuo Handa 0 siblings, 1 reply; 21+ messages in thread From: Yang Shi @ 2019-01-29 21:51 UTC (permalink / raw) To: Tetsuo Handa Cc: Andrew Morton, Jiufei Xue, Linux MM, joseph.qi, Linus Torvalds On Tue, Jan 29, 2019 at 1:12 PM Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > On 2019/01/30 4:13, Andrew Morton wrote: > > On Tue, 29 Jan 2019 20:43:20 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote: > > > >> On 2019/01/29 16:21, Jiufei Xue wrote: > >>> Trinity reports BUG: > >>> > >>> sleeping function called from invalid context at mm/vmalloc.c:1477 > >>> in_atomic(): 1, irqs_disabled(): 0, pid: 12269, name: trinity-c1 > >>> > >>> [ 2748.573460] Call Trace: > >>> [ 2748.575935] dump_stack+0x91/0xeb > >>> [ 2748.578512] ___might_sleep+0x21c/0x250 > >>> [ 2748.581090] remove_vm_area+0x1d/0x90 > >>> [ 2748.583637] __vunmap+0x76/0x100 > >>> [ 2748.586120] __se_sys_swapon+0xb9a/0x1220 > >>> [ 2748.598973] do_syscall_64+0x60/0x210 > >>> [ 2748.601439] entry_SYSCALL_64_after_hwframe+0x49/0xbe > >>> > >>> This is triggered by calling kvfree() inside spinlock() section in > >>> function alloc_swap_info(). > >>> Fix this by moving the kvfree() after spin_unlock(). > >>> > >> > >> Excuse me? But isn't kvfree() safe to be called with spinlock held? > > > > Yes, I'm having trouble spotting where kvfree() can sleep. Perhaps it > > *used* to sleep on mutex_lock(vmap_purge_lock), but > > try_purge_vmap_area_lazy() is using mutex_trylock(). Confused. > > > > kvfree() darn well *shouldn't* sleep! > > > > If I recall correctly, there was an attempt to allow vfree() to sleep > but that attempt failed, and the change to allow vfree() to sleep was > reverted. Thus, vfree() had been "Context: Any context except NMI.". > > If we want to allow vfree() to sleep, at least we need to test with > kvmalloc() == vmalloc() (i.e. force kvmalloc()/kvfree() users to use > vmalloc()/vfree() path). For now, reverting the > "Context: Either preemptible task context or not-NMI interrupt." change > will be needed for stable kernels. So, the comment for vfree "May sleep if called *not* from interrupt context." is wrong? Thanks, Yang > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info 2019-01-29 21:51 ` Yang Shi @ 2019-01-30 0:42 ` Tetsuo Handa 2019-01-30 1:01 ` Andrew Morton 0 siblings, 1 reply; 21+ messages in thread From: Tetsuo Handa @ 2019-01-30 0:42 UTC (permalink / raw) To: Yang Shi; +Cc: Andrew Morton, Jiufei Xue, Linux MM, joseph.qi, Linus Torvalds Yang Shi wrote: > On Tue, Jan 29, 2019 at 1:12 PM Tetsuo Handa > <penguin-kernel@i-love.sakura.ne.jp> wrote: > > > > On 2019/01/30 4:13, Andrew Morton wrote: > > > On Tue, 29 Jan 2019 20:43:20 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote: > > > > > >> On 2019/01/29 16:21, Jiufei Xue wrote: > > >>> Trinity reports BUG: > > >>> > > >>> sleeping function called from invalid context at mm/vmalloc.c:1477 > > >>> in_atomic(): 1, irqs_disabled(): 0, pid: 12269, name: trinity-c1 > > >>> > > >>> [ 2748.573460] Call Trace: > > >>> [ 2748.575935] dump_stack+0x91/0xeb > > >>> [ 2748.578512] ___might_sleep+0x21c/0x250 > > >>> [ 2748.581090] remove_vm_area+0x1d/0x90 > > >>> [ 2748.583637] __vunmap+0x76/0x100 > > >>> [ 2748.586120] __se_sys_swapon+0xb9a/0x1220 > > >>> [ 2748.598973] do_syscall_64+0x60/0x210 > > >>> [ 2748.601439] entry_SYSCALL_64_after_hwframe+0x49/0xbe > > >>> > > >>> This is triggered by calling kvfree() inside spinlock() section in > > >>> function alloc_swap_info(). > > >>> Fix this by moving the kvfree() after spin_unlock(). > > >>> > > >> > > >> Excuse me? But isn't kvfree() safe to be called with spinlock held? > > > > > > Yes, I'm having trouble spotting where kvfree() can sleep. Perhaps it > > > *used* to sleep on mutex_lock(vmap_purge_lock), but > > > try_purge_vmap_area_lazy() is using mutex_trylock(). Confused. > > > > > > kvfree() darn well *shouldn't* sleep! > > > > > > > If I recall correctly, there was an attempt to allow vfree() to sleep > > but that attempt failed, and the change to allow vfree() to sleep was > > reverted. Thus, vfree() had been "Context: Any context except NMI.". That attempt was not reverted. Instead vfree_atomic() was added. > > > > If we want to allow vfree() to sleep, at least we need to test with > > kvmalloc() == vmalloc() (i.e. force kvmalloc()/kvfree() users to use > > vmalloc()/vfree() path). For now, reverting the > > "Context: Either preemptible task context or not-NMI interrupt." change > > will be needed for stable kernels. > > So, the comment for vfree "May sleep if called *not* from interrupt > context." is wrong? Commit bf22e37a641327e3 ("mm: add vfree_atomic()") says We are going to use sleeping lock for freeing vmap. However some vfree() users want to free memory from atomic (but not from interrupt) context. For this we add vfree_atomic() - deferred variation of vfree() which can be used in any atomic context (except NMIs). and commit 52414d3302577bb6 ("kvfree(): fix misleading comment") made - * Context: Any context except NMI. + * Context: Either preemptible task context or not-NMI interrupt. change. But I think that we converted kmalloc() to kvmalloc() without checking context of kvfree() callers. Therefore, I think that kvfree() needs to use vfree_atomic() rather than just saying "vfree() might sleep if called not in interrupt context."... ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info 2019-01-30 0:42 ` Tetsuo Handa @ 2019-01-30 1:01 ` Andrew Morton 2019-01-30 1:11 ` Linus Torvalds 2019-03-07 14:43 ` Aaron Lu 0 siblings, 2 replies; 21+ messages in thread From: Andrew Morton @ 2019-01-30 1:01 UTC (permalink / raw) To: Tetsuo Handa; +Cc: Yang Shi, Jiufei Xue, Linux MM, joseph.qi, Linus Torvalds On Wed, 30 Jan 2019 09:42:06 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > > > > > If we want to allow vfree() to sleep, at least we need to test with > > > kvmalloc() == vmalloc() (i.e. force kvmalloc()/kvfree() users to use > > > vmalloc()/vfree() path). For now, reverting the > > > "Context: Either preemptible task context or not-NMI interrupt." change > > > will be needed for stable kernels. > > > > So, the comment for vfree "May sleep if called *not* from interrupt > > context." is wrong? > > Commit bf22e37a641327e3 ("mm: add vfree_atomic()") says > > We are going to use sleeping lock for freeing vmap. However some > vfree() users want to free memory from atomic (but not from interrupt) > context. For this we add vfree_atomic() - deferred variation of vfree() > which can be used in any atomic context (except NMIs). > > and commit 52414d3302577bb6 ("kvfree(): fix misleading comment") made > > - * Context: Any context except NMI. > + * Context: Either preemptible task context or not-NMI interrupt. > > change. But I think that we converted kmalloc() to kvmalloc() without checking > context of kvfree() callers. Therefore, I think that kvfree() needs to use > vfree_atomic() rather than just saying "vfree() might sleep if called not in > interrupt context."... Whereabouts in the vfree() path can the kernel sleep? ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info 2019-01-30 1:01 ` Andrew Morton @ 2019-01-30 1:11 ` Linus Torvalds 2019-01-30 1:23 ` Linus Torvalds 2019-03-07 14:43 ` Aaron Lu 1 sibling, 1 reply; 21+ messages in thread From: Linus Torvalds @ 2019-01-30 1:11 UTC (permalink / raw) To: Andrew Morton; +Cc: Tetsuo Handa, Yang Shi, Jiufei Xue, Linux MM, joseph.qi On Tue, Jan 29, 2019 at 5:01 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > - * Context: Any context except NMI. > > + * Context: Either preemptible task context or not-NMI interrupt. > > Whereabouts in the vfree() path can the kernel sleep? Note that it's not necessarily about *sleeping*. One thing that vfree() really fundamentally should do is to flush TLB's. And you must not do a cross-TLB flush with interrupts disabled. NOTE! Right now, I think we do lazy TLB flushing, so the flush actually is delayed until the vmalloc() when the address rolls around in the vmalloc address space. But there really are very real and obvious reasons why we might want to do it at vfree time. So I'd honestly be a whole lot happier with vmalloc/vfree being process context only. Or at least with with interrupts enabled (so swirq/BH context would be fine, but an actual interrupt not so). Again, this is not about sleeping. But the end result is almost the same: we really should strive to not do vfree() in interrupt context. Linus ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info 2019-01-30 1:11 ` Linus Torvalds @ 2019-01-30 1:23 ` Linus Torvalds 2019-01-30 2:54 ` Tetsuo Handa 0 siblings, 1 reply; 21+ messages in thread From: Linus Torvalds @ 2019-01-30 1:23 UTC (permalink / raw) To: Andrew Morton; +Cc: Tetsuo Handa, Yang Shi, Jiufei Xue, Linux MM, joseph.qi On Tue, Jan 29, 2019 at 5:11 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Again, this is not about sleeping. But the end result is almost the > same: we really should strive to not do vfree() in interrupt context. Note that we currently test the wrong thing for this: we actually check "in_interrupt()" for the deferred case, which certainly works in practice, and protects from deadlocks (we need vmap_area_lock for the free-area handling) But it doesn't actually end up testing the "oops, interrupts are disabled in process context" issue. The "might_sleep()" check _does_ check that, iirc. Which - as mentioned - is fine because we currently don't actually do the TLB flush synchronously, but it's worth noting again. "vfree()" really is a *lot* different from "kfree()". It's unsafe in all kinds of special ways, and the locking difference is just part of it. So whatever might_sleep() has found might be a potential real issue at some point... Linus ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info 2019-01-30 1:23 ` Linus Torvalds @ 2019-01-30 2:54 ` Tetsuo Handa 2019-01-30 17:18 ` Linus Torvalds 0 siblings, 1 reply; 21+ messages in thread From: Tetsuo Handa @ 2019-01-30 2:54 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andrew Morton, Yang Shi, Jiufei Xue, Linux MM, joseph.qi Andrew Morton wrote: > > change. But I think that we converted kmalloc() to kvmalloc() without checking > > context of kvfree() callers. Therefore, I think that kvfree() needs to use > > vfree_atomic() rather than just saying "vfree() might sleep if called not in > > interrupt context."... > > Whereabouts in the vfree() path can the kernel sleep? Indeed. Although __vunmap() must not be called from interrupt context because mutex_trylock()/mutex_unlock() from try_purge_vmap_area_lazy() from free_vmap_area_noflush() from free_unmap_vmap_area() from remove_vm_area() from __vunmap() cannot be called from interrupt context, it seems that there is no location that does sleeping operation. Linus Torvalds wrote: > Which - as mentioned - is fine because we currently don't actually do > the TLB flush synchronously, but it's worth noting again. "vfree()" > really is a *lot* different from "kfree()". It's unsafe in all kinds > of special ways, and the locking difference is just part of it. > > So whatever might_sleep() has found might be a potential real issue at > some point... Then, do we automatically defer vfree() to mm_percpu_wq context? ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info 2019-01-30 2:54 ` Tetsuo Handa @ 2019-01-30 17:18 ` Linus Torvalds 2019-01-30 22:13 ` Andrew Morton 0 siblings, 1 reply; 21+ messages in thread From: Linus Torvalds @ 2019-01-30 17:18 UTC (permalink / raw) To: Tetsuo Handa; +Cc: Andrew Morton, Yang Shi, Jiufei Xue, Linux MM, joseph.qi On Tue, Jan 29, 2019 at 6:54 PM Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > Then, do we automatically defer vfree() to mm_percpu_wq context? We might do that, and say "if you call vfree with interrupts disabled, it gets deferred". That said, the deferred case should generally not be a common case either. It has real issues, one of which is simply that particularly on 32-bit architectures we can run out of vmalloc space even normally, and if there are loads that do a lot of allocation and then deferred frees, that problem could become really bad. So I'd almost be happier having a warning if we end up doing the TLB flush and defer. At least to find *what* people do. And I do wonder if we should just always warn, and have that "might_sleep()", and simply say "if you do vfree from interrupts or with interrupts disabled, you really should be aware of these kinds of issues, and you really should *show* that you are aware by using vfree_atomic()". Linus ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info 2019-01-30 17:18 ` Linus Torvalds @ 2019-01-30 22:13 ` Andrew Morton 0 siblings, 0 replies; 21+ messages in thread From: Andrew Morton @ 2019-01-30 22:13 UTC (permalink / raw) To: Linus Torvalds; +Cc: Tetsuo Handa, Yang Shi, Jiufei Xue, Linux MM, joseph.qi On Wed, 30 Jan 2019 09:18:20 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Tue, Jan 29, 2019 at 6:54 PM Tetsuo Handa > <penguin-kernel@i-love.sakura.ne.jp> wrote: > > > > Then, do we automatically defer vfree() to mm_percpu_wq context? > > We might do that, and say "if you call vfree with interrupts disabled, > it gets deferred". > > That said, the deferred case should generally not be a common case > either. It has real issues, one of which is simply that particularly > on 32-bit architectures we can run out of vmalloc space even normally, > and if there are loads that do a lot of allocation and then deferred > frees, that problem could become really bad. > > So I'd almost be happier having a warning if we end up doing the TLB > flush and defer. At least to find *what* people do. > > And I do wonder if we should just always warn, and have that > "might_sleep()", and simply say "if you do vfree from interrupts or > with interrupts disabled, you really should be aware of these kinds of > issues, and you really should *show* that you are aware by using > vfree_atomic()". > Agree. if (irqs_disabled()) {warn_once; punt_to_workqueue} is the way to go here. vfree() should be callable from spinlocked code and might_sleep() is an inappropriate check. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info 2019-01-30 1:01 ` Andrew Morton 2019-01-30 1:11 ` Linus Torvalds @ 2019-03-07 14:43 ` Aaron Lu 2019-03-07 14:47 ` Andrey Ryabinin 1 sibling, 1 reply; 21+ messages in thread From: Aaron Lu @ 2019-03-07 14:43 UTC (permalink / raw) To: Andrew Morton Cc: Tetsuo Handa, Yang Shi, Jiufei Xue, Linux MM, joseph.qi, Linus Torvalds, Andrey Ryabinin On Tue, Jan 29, 2019 at 05:01:50PM -0800, Andrew Morton wrote: > On Wed, 30 Jan 2019 09:42:06 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > > > > > > > > If we want to allow vfree() to sleep, at least we need to test with > > > > kvmalloc() == vmalloc() (i.e. force kvmalloc()/kvfree() users to use > > > > vmalloc()/vfree() path). For now, reverting the > > > > "Context: Either preemptible task context or not-NMI interrupt." change > > > > will be needed for stable kernels. > > > > > > So, the comment for vfree "May sleep if called *not* from interrupt > > > context." is wrong? > > > > Commit bf22e37a641327e3 ("mm: add vfree_atomic()") says > > > > We are going to use sleeping lock for freeing vmap. However some > > vfree() users want to free memory from atomic (but not from interrupt) > > context. For this we add vfree_atomic() - deferred variation of vfree() > > which can be used in any atomic context (except NMIs). > > > > and commit 52414d3302577bb6 ("kvfree(): fix misleading comment") made > > > > - * Context: Any context except NMI. > > + * Context: Either preemptible task context or not-NMI interrupt. > > > > change. But I think that we converted kmalloc() to kvmalloc() without checking > > context of kvfree() callers. Therefore, I think that kvfree() needs to use > > vfree_atomic() rather than just saying "vfree() might sleep if called not in > > interrupt context."... > > Whereabouts in the vfree() path can the kernel sleep? (Sorry for the late reply.) Adding Andrey Ryabinin, author of commit 52414d3302577bb6 ("kvfree(): fix misleading comment"), maybe Andrey remembers where vfree() can sleep. In the meantime, does "cond_resched_lock(&vmap_area_lock);" in __purge_vmap_area_lazy() count as a sleep point? __purge_vmap_area_lazy() can be called if mutex_trylock on vmap_purge_lock succeed. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info 2019-03-07 14:43 ` Aaron Lu @ 2019-03-07 14:47 ` Andrey Ryabinin 2019-03-07 15:24 ` Aaron Lu 0 siblings, 1 reply; 21+ messages in thread From: Andrey Ryabinin @ 2019-03-07 14:47 UTC (permalink / raw) To: Aaron Lu, Andrew Morton Cc: Tetsuo Handa, Yang Shi, Jiufei Xue, Linux MM, joseph.qi, Linus Torvalds On 3/7/19 5:43 PM, Aaron Lu wrote: > On Tue, Jan 29, 2019 at 05:01:50PM -0800, Andrew Morton wrote: >> On Wed, 30 Jan 2019 09:42:06 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: >> >>>>> >>>>> If we want to allow vfree() to sleep, at least we need to test with >>>>> kvmalloc() == vmalloc() (i.e. force kvmalloc()/kvfree() users to use >>>>> vmalloc()/vfree() path). For now, reverting the >>>>> "Context: Either preemptible task context or not-NMI interrupt." change >>>>> will be needed for stable kernels. >>>> >>>> So, the comment for vfree "May sleep if called *not* from interrupt >>>> context." is wrong? >>> >>> Commit bf22e37a641327e3 ("mm: add vfree_atomic()") says >>> >>> We are going to use sleeping lock for freeing vmap. However some >>> vfree() users want to free memory from atomic (but not from interrupt) >>> context. For this we add vfree_atomic() - deferred variation of vfree() >>> which can be used in any atomic context (except NMIs). >>> >>> and commit 52414d3302577bb6 ("kvfree(): fix misleading comment") made >>> >>> - * Context: Any context except NMI. >>> + * Context: Either preemptible task context or not-NMI interrupt. >>> >>> change. But I think that we converted kmalloc() to kvmalloc() without checking >>> context of kvfree() callers. Therefore, I think that kvfree() needs to use >>> vfree_atomic() rather than just saying "vfree() might sleep if called not in >>> interrupt context."... >> >> Whereabouts in the vfree() path can the kernel sleep? > > (Sorry for the late reply.) > > Adding Andrey Ryabinin, author of commit 52414d3302577bb6 > ("kvfree(): fix misleading comment"), maybe Andrey remembers > where vfree() can sleep. > > In the meantime, does "cond_resched_lock(&vmap_area_lock);" in > __purge_vmap_area_lazy() count as a sleep point? Yes, this is the place (the only one) where vfree() can sleep. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info 2019-03-07 14:47 ` Andrey Ryabinin @ 2019-03-07 15:24 ` Aaron Lu 2019-03-07 16:33 ` Andrey Ryabinin 0 siblings, 1 reply; 21+ messages in thread From: Aaron Lu @ 2019-03-07 15:24 UTC (permalink / raw) To: Andrey Ryabinin Cc: Andrew Morton, Tetsuo Handa, Yang Shi, Jiufei Xue, Linux MM, joseph.qi, Linus Torvalds On Thu, Mar 07, 2019 at 05:47:13PM +0300, Andrey Ryabinin wrote: > > > On 3/7/19 5:43 PM, Aaron Lu wrote: > > On Tue, Jan 29, 2019 at 05:01:50PM -0800, Andrew Morton wrote: > >> On Wed, 30 Jan 2019 09:42:06 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > >> > >>>>> > >>>>> If we want to allow vfree() to sleep, at least we need to test with > >>>>> kvmalloc() == vmalloc() (i.e. force kvmalloc()/kvfree() users to use > >>>>> vmalloc()/vfree() path). For now, reverting the > >>>>> "Context: Either preemptible task context or not-NMI interrupt." change > >>>>> will be needed for stable kernels. > >>>> > >>>> So, the comment for vfree "May sleep if called *not* from interrupt > >>>> context." is wrong? > >>> > >>> Commit bf22e37a641327e3 ("mm: add vfree_atomic()") says > >>> > >>> We are going to use sleeping lock for freeing vmap. However some > >>> vfree() users want to free memory from atomic (but not from interrupt) > >>> context. For this we add vfree_atomic() - deferred variation of vfree() > >>> which can be used in any atomic context (except NMIs). > >>> > >>> and commit 52414d3302577bb6 ("kvfree(): fix misleading comment") made > >>> > >>> - * Context: Any context except NMI. > >>> + * Context: Either preemptible task context or not-NMI interrupt. > >>> > >>> change. But I think that we converted kmalloc() to kvmalloc() without checking > >>> context of kvfree() callers. Therefore, I think that kvfree() needs to use > >>> vfree_atomic() rather than just saying "vfree() might sleep if called not in > >>> interrupt context."... > >> > >> Whereabouts in the vfree() path can the kernel sleep? > > > > (Sorry for the late reply.) > > > > Adding Andrey Ryabinin, author of commit 52414d3302577bb6 > > ("kvfree(): fix misleading comment"), maybe Andrey remembers > > where vfree() can sleep. > > > > In the meantime, does "cond_resched_lock(&vmap_area_lock);" in > > __purge_vmap_area_lazy() count as a sleep point? > > Yes, this is the place (the only one) where vfree() can sleep. OK, thanks for the quick confirm. So what about this: use __vfree_deferred() when: - in_interrupt(), because we can't use mutex_trylock() as pointed out by Tetsuo; - in_atomic(), because cond_resched_lock(); - irqs_disabled(), as smp_call_function_many() will deadlock. An untested diff to show the idea(not sure if warn is needed): diff --git a/mm/vmalloc.c b/mm/vmalloc.c index e86ba6e74b50..28d200f054b0 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -1578,7 +1578,7 @@ void vfree_atomic(const void *addr) static void __vfree(const void *addr) { - if (unlikely(in_interrupt())) + if (unlikely(in_interrupt() || in_atomic() || irqs_disabled())) __vfree_deferred(addr); else __vunmap(addr, 1); @@ -1606,8 +1606,6 @@ void vfree(const void *addr) kmemleak_free(addr); - might_sleep_if(!in_interrupt()); - if (!addr) return; ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info 2019-03-07 15:24 ` Aaron Lu @ 2019-03-07 16:33 ` Andrey Ryabinin 2019-03-08 2:41 ` Aaron Lu 0 siblings, 1 reply; 21+ messages in thread From: Andrey Ryabinin @ 2019-03-07 16:33 UTC (permalink / raw) To: Aaron Lu Cc: Andrew Morton, Tetsuo Handa, Yang Shi, Jiufei Xue, Linux MM, joseph.qi, Linus Torvalds On 3/7/19 6:24 PM, Aaron Lu wrote: > On Thu, Mar 07, 2019 at 05:47:13PM +0300, Andrey Ryabinin wrote: >> >> >> On 3/7/19 5:43 PM, Aaron Lu wrote: >>> On Tue, Jan 29, 2019 at 05:01:50PM -0800, Andrew Morton wrote: >>>> On Wed, 30 Jan 2019 09:42:06 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: >>>> >>>>>>> >>>>>>> If we want to allow vfree() to sleep, at least we need to test with >>>>>>> kvmalloc() == vmalloc() (i.e. force kvmalloc()/kvfree() users to use >>>>>>> vmalloc()/vfree() path). For now, reverting the >>>>>>> "Context: Either preemptible task context or not-NMI interrupt." change >>>>>>> will be needed for stable kernels. >>>>>> >>>>>> So, the comment for vfree "May sleep if called *not* from interrupt >>>>>> context." is wrong? >>>>> >>>>> Commit bf22e37a641327e3 ("mm: add vfree_atomic()") says >>>>> >>>>> We are going to use sleeping lock for freeing vmap. However some >>>>> vfree() users want to free memory from atomic (but not from interrupt) >>>>> context. For this we add vfree_atomic() - deferred variation of vfree() >>>>> which can be used in any atomic context (except NMIs). >>>>> >>>>> and commit 52414d3302577bb6 ("kvfree(): fix misleading comment") made >>>>> >>>>> - * Context: Any context except NMI. >>>>> + * Context: Either preemptible task context or not-NMI interrupt. >>>>> >>>>> change. But I think that we converted kmalloc() to kvmalloc() without checking >>>>> context of kvfree() callers. Therefore, I think that kvfree() needs to use >>>>> vfree_atomic() rather than just saying "vfree() might sleep if called not in >>>>> interrupt context."... >>>> >>>> Whereabouts in the vfree() path can the kernel sleep? >>> >>> (Sorry for the late reply.) >>> >>> Adding Andrey Ryabinin, author of commit 52414d3302577bb6 >>> ("kvfree(): fix misleading comment"), maybe Andrey remembers >>> where vfree() can sleep. >>> >>> In the meantime, does "cond_resched_lock(&vmap_area_lock);" in >>> __purge_vmap_area_lazy() count as a sleep point? >> >> Yes, this is the place (the only one) where vfree() can sleep. > > OK, thanks for the quick confirm. > > So what about this: use __vfree_deferred() when: > - in_interrupt(), because we can't use mutex_trylock() as pointed out > by Tetsuo; > - in_atomic(), because cond_resched_lock(); > - irqs_disabled(), as smp_call_function_many() will deadlock. > > An untested diff to show the idea(not sure if warn is needed): > It was discussed before. You're not the first one suggesting something like this. There is the comment near in_atomic() explaining well why and when your patch won't work. The easiest way of making vfree() to be safe in atomic contexts is this patch: http://lkml.kernel.org/r/20170330102719.13119-1-aryabinin@virtuozzo.com But the final decision at that time was to fix caller so the call vfree from sleepable context instead: http://lkml.kernel.org/r/20170330152229.f2108e718114ed77acae7405@linux-foundation.org > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > index e86ba6e74b50..28d200f054b0 100644 > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -1578,7 +1578,7 @@ void vfree_atomic(const void *addr) > > static void __vfree(const void *addr) > { > - if (unlikely(in_interrupt())) > + if (unlikely(in_interrupt() || in_atomic() || irqs_disabled())) > __vfree_deferred(addr); > else > __vunmap(addr, 1); > @@ -1606,8 +1606,6 @@ void vfree(const void *addr) > > kmemleak_free(addr); > > - might_sleep_if(!in_interrupt()); > - > if (!addr) > return; > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info 2019-03-07 16:33 ` Andrey Ryabinin @ 2019-03-08 2:41 ` Aaron Lu 2019-03-11 1:43 ` Jiufei Xue 0 siblings, 1 reply; 21+ messages in thread From: Aaron Lu @ 2019-03-08 2:41 UTC (permalink / raw) To: Andrey Ryabinin Cc: Andrew Morton, Tetsuo Handa, Yang Shi, Jiufei Xue, Linux MM, joseph.qi, Linus Torvalds On 2019/3/8 0:33, Andrey Ryabinin wrote: > > > On 3/7/19 6:24 PM, Aaron Lu wrote: >> On Thu, Mar 07, 2019 at 05:47:13PM +0300, Andrey Ryabinin wrote: >>> >>> >>> On 3/7/19 5:43 PM, Aaron Lu wrote: >>>> On Tue, Jan 29, 2019 at 05:01:50PM -0800, Andrew Morton wrote: >>>>> On Wed, 30 Jan 2019 09:42:06 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: >>>>> >>>>>>>> >>>>>>>> If we want to allow vfree() to sleep, at least we need to test with >>>>>>>> kvmalloc() == vmalloc() (i.e. force kvmalloc()/kvfree() users to use >>>>>>>> vmalloc()/vfree() path). For now, reverting the >>>>>>>> "Context: Either preemptible task context or not-NMI interrupt." change >>>>>>>> will be needed for stable kernels. >>>>>>> >>>>>>> So, the comment for vfree "May sleep if called *not* from interrupt >>>>>>> context." is wrong? >>>>>> >>>>>> Commit bf22e37a641327e3 ("mm: add vfree_atomic()") says >>>>>> >>>>>> We are going to use sleeping lock for freeing vmap. However some >>>>>> vfree() users want to free memory from atomic (but not from interrupt) >>>>>> context. For this we add vfree_atomic() - deferred variation of vfree() >>>>>> which can be used in any atomic context (except NMIs). >>>>>> >>>>>> and commit 52414d3302577bb6 ("kvfree(): fix misleading comment") made >>>>>> >>>>>> - * Context: Any context except NMI. >>>>>> + * Context: Either preemptible task context or not-NMI interrupt. >>>>>> >>>>>> change. But I think that we converted kmalloc() to kvmalloc() without checking >>>>>> context of kvfree() callers. Therefore, I think that kvfree() needs to use >>>>>> vfree_atomic() rather than just saying "vfree() might sleep if called not in >>>>>> interrupt context."... >>>>> >>>>> Whereabouts in the vfree() path can the kernel sleep? >>>> >>>> (Sorry for the late reply.) >>>> >>>> Adding Andrey Ryabinin, author of commit 52414d3302577bb6 >>>> ("kvfree(): fix misleading comment"), maybe Andrey remembers >>>> where vfree() can sleep. >>>> >>>> In the meantime, does "cond_resched_lock(&vmap_area_lock);" in >>>> __purge_vmap_area_lazy() count as a sleep point? >>> >>> Yes, this is the place (the only one) where vfree() can sleep. >> >> OK, thanks for the quick confirm. >> >> So what about this: use __vfree_deferred() when: >> - in_interrupt(), because we can't use mutex_trylock() as pointed out >> by Tetsuo; >> - in_atomic(), because cond_resched_lock(); >> - irqs_disabled(), as smp_call_function_many() will deadlock. >> >> An untested diff to show the idea(not sure if warn is needed): >> > > It was discussed before. You're not the first one suggesting something like this. > There is the comment near in_atomic() explaining well why and when your patch won't work. Thanks for the info. > The easiest way of making vfree() to be safe in atomic contexts is this patch: > http://lkml.kernel.org/r/20170330102719.13119-1-aryabinin@virtuozzo.com > > But the final decision at that time was to fix caller so the call vfree from sleepable context instead: > http://lkml.kernel.org/r/20170330152229.f2108e718114ed77acae7405@linux-foundation.org OK, if that is the final decision, then I think Jiufei's patch that moves kvfree() out of the locked region is the right thing to do for this issue here. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] mm: fix sleeping function warning in alloc_swap_info 2019-03-08 2:41 ` Aaron Lu @ 2019-03-11 1:43 ` Jiufei Xue 0 siblings, 0 replies; 21+ messages in thread From: Jiufei Xue @ 2019-03-11 1:43 UTC (permalink / raw) To: Andrey Ryabinin, Andrew Morton Cc: Aaron Lu, Tetsuo Handa, Yang Shi, Jiufei Xue, Linux MM, joseph.qi, Linus Torvalds Hi Andrew, >> It was discussed before. You're not the first one suggesting something like this. >> There is the comment near in_atomic() explaining well why and when your patch won't work. > Thanks for the info. > >> The easiest way of making vfree() to be safe in atomic contexts is this patch: >> http://lkml.kernel.org/r/20170330102719.13119-1-aryabinin@virtuozzo.com >> >> But the final decision at that time was to fix caller so the call vfree from sleepable context instead: >> http://lkml.kernel.org/r/20170330152229.f2108e718114ed77acae7405@linux-foundation.org > OK, if that is the final decision, then I think Jiufei's patch that > moves kvfree() out of the locked region is the right thing to do for > this issue here. > Is that the final decision we have made? Could you please look into my patch again and give the decision? Thanks, Jiufei ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2019-03-11 1:43 UTC | newest] Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-01-29 7:21 [PATCH] mm: fix sleeping function warning in alloc_swap_info Jiufei Xue 2019-01-29 8:53 ` Aaron Lu 2019-01-29 10:43 ` Joseph Qi 2019-01-29 11:19 ` Aaron Lu 2019-01-29 11:43 ` Tetsuo Handa 2019-01-29 19:13 ` Andrew Morton 2019-01-29 21:12 ` Tetsuo Handa 2019-01-29 21:51 ` Yang Shi 2019-01-30 0:42 ` Tetsuo Handa 2019-01-30 1:01 ` Andrew Morton 2019-01-30 1:11 ` Linus Torvalds 2019-01-30 1:23 ` Linus Torvalds 2019-01-30 2:54 ` Tetsuo Handa 2019-01-30 17:18 ` Linus Torvalds 2019-01-30 22:13 ` Andrew Morton 2019-03-07 14:43 ` Aaron Lu 2019-03-07 14:47 ` Andrey Ryabinin 2019-03-07 15:24 ` Aaron Lu 2019-03-07 16:33 ` Andrey Ryabinin 2019-03-08 2:41 ` Aaron Lu 2019-03-11 1:43 ` Jiufei Xue
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.