All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm: Remove pointless might_sleep() in remove_vm_area().
@ 2017-03-24 10:53 Tetsuo Handa
  2017-03-24 12:22 ` Andrey Ryabinin
  0 siblings, 1 reply; 16+ messages in thread
From: Tetsuo Handa @ 2017-03-24 10:53 UTC (permalink / raw)
  To: linux-mm
  Cc: Tetsuo Handa, Matthew Wilcox, Christoph Hellwig, Jisheng Zhang,
	Andrey Ryabinin, Joel Fernandes, Chris Wilson, John Dias,
	Thomas Gleixner, H. Peter Anvin, Ingo Molnar

Commit 5803ed292e63a1bf ("mm: mark all calls into the vmalloc subsystem
as potentially sleeping") added might_sleep() to remove_vm_area() from
vfree(), and is causing

[    2.616064] BUG: sleeping function called from invalid context at mm/vmalloc.c:1480
[    2.616125] in_atomic(): 1, irqs_disabled(): 0, pid: 341, name: plymouthd
[    2.616156] 2 locks held by plymouthd/341:
[    2.616158]  #0:  (drm_global_mutex){+.+.+.}, at: [<ffffffffc01c274b>] drm_release+0x3b/0x3b0 [drm]
[    2.616256]  #1:  (&(&tfile->lock)->rlock){+.+...}, at: [<ffffffffc0173038>] ttm_object_file_release+0x28/0x90 [ttm]
[    2.616270] CPU: 2 PID: 341 Comm: plymouthd Not tainted 4.11.0-0.rc3.git0.1.kmallocwd.fc25.x86_64+debug #1
[    2.616271] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[    2.616273] Call Trace:
[    2.616281]  dump_stack+0x86/0xc3
[    2.616285]  ___might_sleep+0x17d/0x250
[    2.616289]  __might_sleep+0x4a/0x80
[    2.616293]  remove_vm_area+0x22/0x90
[    2.616296]  __vunmap+0x2e/0x110
[    2.616299]  vfree+0x42/0x90
[    2.616304]  kvfree+0x2c/0x40
[    2.616312]  drm_ht_remove+0x1a/0x30 [drm]
[    2.616317]  ttm_object_file_release+0x50/0x90 [ttm]
[    2.616324]  vmw_postclose+0x47/0x60 [vmwgfx]
[    2.616331]  drm_release+0x290/0x3b0 [drm]
[    2.616338]  __fput+0xf8/0x210
[    2.616342]  ____fput+0xe/0x10
[    2.616345]  task_work_run+0x85/0xc0
[    2.616351]  exit_to_usermode_loop+0xb4/0xc0
[    2.616355]  do_syscall_64+0x185/0x1f0
[    2.616359]  entry_SYSCALL64_slow_path+0x25/0x25

warning.

But commit f9e09977671b618a ("mm: turn vmap_purge_lock into a mutex") did
not make vfree() potentially sleeping because try_purge_vmap_area_lazy()
is still using mutex_trylock(). Thus, this is a false positive warning.

___might_sleep() via cond_resched_lock() in __purge_vmap_area_lazy() from
try_purge_vmap_area_lazy() from free_vmap_area_noflush() from
free_unmap_vmap_area() from remove_vm_area() which might trigger same
false positive warning is remaining. But so far we haven't heard about
warning from that path.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jisheng Zhang <jszhang@marvell.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Dias <joaodias@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
 mm/vmalloc.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 68eb002..32be3c4 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1477,8 +1477,6 @@ struct vm_struct *remove_vm_area(const void *addr)
 {
 	struct vmap_area *va;
 
-	might_sleep();
-
 	va = find_vmap_area((unsigned long)addr);
 	if (va && va->flags & VM_VM_AREA) {
 		struct vm_struct *vm = va->vm;
-- 
1.8.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] mm: Remove pointless might_sleep() in remove_vm_area().
  2017-03-24 10:53 [PATCH] mm: Remove pointless might_sleep() in remove_vm_area() Tetsuo Handa
@ 2017-03-24 12:22 ` Andrey Ryabinin
  2017-03-24 12:40   ` Tetsuo Handa
  0 siblings, 1 reply; 16+ messages in thread
From: Andrey Ryabinin @ 2017-03-24 12:22 UTC (permalink / raw)
  To: Tetsuo Handa, linux-mm
  Cc: Matthew Wilcox, Christoph Hellwig, Jisheng Zhang, Joel Fernandes,
	Chris Wilson, John Dias, Thomas Gleixner, H. Peter Anvin,
	Ingo Molnar

On 03/24/2017 01:53 PM, Tetsuo Handa wrote:
> Commit 5803ed292e63a1bf ("mm: mark all calls into the vmalloc subsystem
> as potentially sleeping") added might_sleep() to remove_vm_area() from
> vfree(), and is causing
> 
> [    2.616064] BUG: sleeping function called from invalid context at mm/vmalloc.c:1480
> [    2.616125] in_atomic(): 1, irqs_disabled(): 0, pid: 341, name: plymouthd
> [    2.616156] 2 locks held by plymouthd/341:
> [    2.616158]  #0:  (drm_global_mutex){+.+.+.}, at: [<ffffffffc01c274b>] drm_release+0x3b/0x3b0 [drm]
> [    2.616256]  #1:  (&(&tfile->lock)->rlock){+.+...}, at: [<ffffffffc0173038>] ttm_object_file_release+0x28/0x90 [ttm]
> [    2.616270] CPU: 2 PID: 341 Comm: plymouthd Not tainted 4.11.0-0.rc3.git0.1.kmallocwd.fc25.x86_64+debug #1
> [    2.616271] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
> [    2.616273] Call Trace:
> [    2.616281]  dump_stack+0x86/0xc3
> [    2.616285]  ___might_sleep+0x17d/0x250
> [    2.616289]  __might_sleep+0x4a/0x80
> [    2.616293]  remove_vm_area+0x22/0x90
> [    2.616296]  __vunmap+0x2e/0x110
> [    2.616299]  vfree+0x42/0x90
> [    2.616304]  kvfree+0x2c/0x40
> [    2.616312]  drm_ht_remove+0x1a/0x30 [drm]
> [    2.616317]  ttm_object_file_release+0x50/0x90 [ttm]
> [    2.616324]  vmw_postclose+0x47/0x60 [vmwgfx]
> [    2.616331]  drm_release+0x290/0x3b0 [drm]
> [    2.616338]  __fput+0xf8/0x210
> [    2.616342]  ____fput+0xe/0x10
> [    2.616345]  task_work_run+0x85/0xc0
> [    2.616351]  exit_to_usermode_loop+0xb4/0xc0
> [    2.616355]  do_syscall_64+0x185/0x1f0
> [    2.616359]  entry_SYSCALL64_slow_path+0x25/0x25
> 
> warning.
> 
> But commit f9e09977671b618a ("mm: turn vmap_purge_lock into a mutex") did
> not make vfree() potentially sleeping because try_purge_vmap_area_lazy()
> is still using mutex_trylock(). Thus, this is a false positive warning.
> 

Commit f9e09977671b618a did not made vfree() sleeping.
Commit 763b218ddfa "mm: add preempt points into __purge_vmap_area_lazy()"
did this, thus it's not a false positive.


> ___might_sleep() via cond_resched_lock() in __purge_vmap_area_lazy() from
> try_purge_vmap_area_lazy() from free_vmap_area_noflush() from
> free_unmap_vmap_area() from remove_vm_area() which might trigger same
> false positive warning is remaining. But so far we haven't heard about
> warning from that path.

And why that would be a false positive?



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] mm: Remove pointless might_sleep() in remove_vm_area().
  2017-03-24 12:22 ` Andrey Ryabinin
@ 2017-03-24 12:40   ` Tetsuo Handa
  2017-03-24 15:05     ` Andrey Ryabinin
  0 siblings, 1 reply; 16+ messages in thread
From: Tetsuo Handa @ 2017-03-24 12:40 UTC (permalink / raw)
  To: aryabinin, linux-mm
  Cc: willy, hch, jszhang, joelaf, chris, joaodias, tglx, hpa, mingo

Andrey Ryabinin wrote:
> On 03/24/2017 01:53 PM, Tetsuo Handa wrote:
> > Commit 5803ed292e63a1bf ("mm: mark all calls into the vmalloc subsystem
> > as potentially sleeping") added might_sleep() to remove_vm_area() from
> > vfree(), and is causing
> > 
> > [    2.616064] BUG: sleeping function called from invalid context at mm/vmalloc.c:1480
> > [    2.616125] in_atomic(): 1, irqs_disabled(): 0, pid: 341, name: plymouthd
> > [    2.616156] 2 locks held by plymouthd/341:
> > [    2.616158]  #0:  (drm_global_mutex){+.+.+.}, at: [<ffffffffc01c274b>] drm_release+0x3b/0x3b0 [drm]
> > [    2.616256]  #1:  (&(&tfile->lock)->rlock){+.+...}, at: [<ffffffffc0173038>] ttm_object_file_release+0x28/0x90 [ttm]
> > [    2.616270] CPU: 2 PID: 341 Comm: plymouthd Not tainted 4.11.0-0.rc3.git0.1.kmallocwd.fc25.x86_64+debug #1
> > [    2.616271] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
> > [    2.616273] Call Trace:
> > [    2.616281]  dump_stack+0x86/0xc3
> > [    2.616285]  ___might_sleep+0x17d/0x250
> > [    2.616289]  __might_sleep+0x4a/0x80
> > [    2.616293]  remove_vm_area+0x22/0x90
> > [    2.616296]  __vunmap+0x2e/0x110
> > [    2.616299]  vfree+0x42/0x90
> > [    2.616304]  kvfree+0x2c/0x40
> > [    2.616312]  drm_ht_remove+0x1a/0x30 [drm]
> > [    2.616317]  ttm_object_file_release+0x50/0x90 [ttm]
> > [    2.616324]  vmw_postclose+0x47/0x60 [vmwgfx]
> > [    2.616331]  drm_release+0x290/0x3b0 [drm]
> > [    2.616338]  __fput+0xf8/0x210
> > [    2.616342]  ____fput+0xe/0x10
> > [    2.616345]  task_work_run+0x85/0xc0
> > [    2.616351]  exit_to_usermode_loop+0xb4/0xc0
> > [    2.616355]  do_syscall_64+0x185/0x1f0
> > [    2.616359]  entry_SYSCALL64_slow_path+0x25/0x25
> > 
> > warning.
> > 
> > But commit f9e09977671b618a ("mm: turn vmap_purge_lock into a mutex") did
> > not make vfree() potentially sleeping because try_purge_vmap_area_lazy()
> > is still using mutex_trylock(). Thus, this is a false positive warning.
> > 
> 
> Commit f9e09977671b618a did not made vfree() sleeping.
> Commit 763b218ddfa "mm: add preempt points into __purge_vmap_area_lazy()"
> did this, thus it's not a false positive.
> 
> 
> > ___might_sleep() via cond_resched_lock() in __purge_vmap_area_lazy() from
> > try_purge_vmap_area_lazy() from free_vmap_area_noflush() from
> > free_unmap_vmap_area() from remove_vm_area() which might trigger same
> > false positive warning is remaining. But so far we haven't heard about
> > warning from that path.
> 
> And why that would be a false positive?
> 

#define cond_resched_lock(lock) ({                              \
	___might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET);\
	__cond_resched_lock(lock);                              \
	})

cond_resched_lock() calls ___might_sleep() even when
__cond_resched_lock() will not call preempt_schedule_common()
because should_resched() returns false due to preemption counter
being already elevated by holding &(&tfile->lock)->rlock spinlock.

If should_resched() is known to return false, calling
___might_sleep() from cond_resched_lock() is a false positive.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] mm: Remove pointless might_sleep() in remove_vm_area().
  2017-03-24 12:40   ` Tetsuo Handa
@ 2017-03-24 15:05     ` Andrey Ryabinin
  2017-03-24 16:17       ` Matthew Wilcox
  2017-03-24 22:47       ` Tetsuo Handa
  0 siblings, 2 replies; 16+ messages in thread
From: Andrey Ryabinin @ 2017-03-24 15:05 UTC (permalink / raw)
  To: Tetsuo Handa, linux-mm
  Cc: willy, hch, jszhang, joelaf, chris, joaodias, tglx, hpa, mingo



On 03/24/2017 03:40 PM, Tetsuo Handa wrote:
> Andrey Ryabinin wrote:
>> On 03/24/2017 01:53 PM, Tetsuo Handa wrote:
>>> Commit 5803ed292e63a1bf ("mm: mark all calls into the vmalloc subsystem
>>> as potentially sleeping") added might_sleep() to remove_vm_area() from
>>> vfree(), and is causing
>>>
>>> [    2.616064] BUG: sleeping function called from invalid context at mm/vmalloc.c:1480
>>> [    2.616125] in_atomic(): 1, irqs_disabled(): 0, pid: 341, name: plymouthd
>>> [    2.616156] 2 locks held by plymouthd/341:
>>> [    2.616158]  #0:  (drm_global_mutex){+.+.+.}, at: [<ffffffffc01c274b>] drm_release+0x3b/0x3b0 [drm]
>>> [    2.616256]  #1:  (&(&tfile->lock)->rlock){+.+...}, at: [<ffffffffc0173038>] ttm_object_file_release+0x28/0x90 [ttm]
>>> [    2.616270] CPU: 2 PID: 341 Comm: plymouthd Not tainted 4.11.0-0.rc3.git0.1.kmallocwd.fc25.x86_64+debug #1
>>> [    2.616271] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
>>> [    2.616273] Call Trace:
>>> [    2.616281]  dump_stack+0x86/0xc3
>>> [    2.616285]  ___might_sleep+0x17d/0x250
>>> [    2.616289]  __might_sleep+0x4a/0x80
>>> [    2.616293]  remove_vm_area+0x22/0x90
>>> [    2.616296]  __vunmap+0x2e/0x110
>>> [    2.616299]  vfree+0x42/0x90
>>> [    2.616304]  kvfree+0x2c/0x40
>>> [    2.616312]  drm_ht_remove+0x1a/0x30 [drm]
>>> [    2.616317]  ttm_object_file_release+0x50/0x90 [ttm]
>>> [    2.616324]  vmw_postclose+0x47/0x60 [vmwgfx]
>>> [    2.616331]  drm_release+0x290/0x3b0 [drm]
>>> [    2.616338]  __fput+0xf8/0x210
>>> [    2.616342]  ____fput+0xe/0x10
>>> [    2.616345]  task_work_run+0x85/0xc0
>>> [    2.616351]  exit_to_usermode_loop+0xb4/0xc0
>>> [    2.616355]  do_syscall_64+0x185/0x1f0
>>> [    2.616359]  entry_SYSCALL64_slow_path+0x25/0x25
>>>
>>> warning.
>>>
>>> But commit f9e09977671b618a ("mm: turn vmap_purge_lock into a mutex") did
>>> not make vfree() potentially sleeping because try_purge_vmap_area_lazy()
>>> is still using mutex_trylock(). Thus, this is a false positive warning.
>>>
>>
>> Commit f9e09977671b618a did not made vfree() sleeping.
>> Commit 763b218ddfa "mm: add preempt points into __purge_vmap_area_lazy()"
>> did this, thus it's not a false positive.
>>
>>
>>> ___might_sleep() via cond_resched_lock() in __purge_vmap_area_lazy() from
>>> try_purge_vmap_area_lazy() from free_vmap_area_noflush() from
>>> free_unmap_vmap_area() from remove_vm_area() which might trigger same
>>> false positive warning is remaining. But so far we haven't heard about
>>> warning from that path.
>>
>> And why that would be a false positive?
>>
> 
> #define cond_resched_lock(lock) ({                              \
> 	___might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET);\
> 	__cond_resched_lock(lock);                              \
> 	})
> 
> cond_resched_lock() calls ___might_sleep() even when
> __cond_resched_lock() will not call preempt_schedule_common()
> because should_resched() returns false due to preemption counter
> being already elevated by holding &(&tfile->lock)->rlock spinlock.
 
That is true only for preemptible kernel. On non-preempt kernel should_resched()
might return true under spin_lock.

Just fix the drm code. There is zero point in releasing memory under spinlock.



> If should_resched() is known to return false, calling
> ___might_sleep() from cond_resched_lock() is a false positive.
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] mm: Remove pointless might_sleep() in remove_vm_area().
  2017-03-24 15:05     ` Andrey Ryabinin
@ 2017-03-24 16:17       ` Matthew Wilcox
  2017-03-27 13:26         ` Andrey Ryabinin
  2017-03-24 22:47       ` Tetsuo Handa
  1 sibling, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2017-03-24 16:17 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Tetsuo Handa, linux-mm, hch, jszhang, joelaf, chris, joaodias,
	tglx, hpa, mingo

On Fri, Mar 24, 2017 at 06:05:45PM +0300, Andrey Ryabinin wrote:
> Just fix the drm code. There is zero point in releasing memory under spinlock.

I disagree.  The spinlock has to be held while deleting from the hash
table.  Sure, we could change the API to return the object removed, and
then force the caller to free the object that was removed from the hash
table outside the lock it's holding, but that's a really inelegant API.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] mm: Remove pointless might_sleep() in remove_vm_area().
  2017-03-24 15:05     ` Andrey Ryabinin
  2017-03-24 16:17       ` Matthew Wilcox
@ 2017-03-24 22:47       ` Tetsuo Handa
  2017-03-27 10:16         ` Tetsuo Handa
  1 sibling, 1 reply; 16+ messages in thread
From: Tetsuo Handa @ 2017-03-24 22:47 UTC (permalink / raw)
  To: aryabinin, linux-mm
  Cc: willy, hch, jszhang, joelaf, chris, joaodias, tglx, hpa, mingo

Andrey Ryabinin wrote:
> >>> ___might_sleep() via cond_resched_lock() in __purge_vmap_area_lazy() from
> >>> try_purge_vmap_area_lazy() from free_vmap_area_noflush() from
> >>> free_unmap_vmap_area() from remove_vm_area() which might trigger same
> >>> false positive warning is remaining. But so far we haven't heard about
> >>> warning from that path.
> >>
> >> And why that would be a false positive?
> >>
> > 
> > #define cond_resched_lock(lock) ({                              \
> > 	___might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET);\
> > 	__cond_resched_lock(lock);                              \
> > 	})
> > 
> > cond_resched_lock() calls ___might_sleep() even when
> > __cond_resched_lock() will not call preempt_schedule_common()
> > because should_resched() returns false due to preemption counter
> > being already elevated by holding &(&tfile->lock)->rlock spinlock.
>  
> That is true only for preemptible kernel. On non-preempt kernel should_resched()
> might return true under spin_lock.

Excuse me, but I did not catch what you mean.
I think

  spin_lock(&lock1);
  cond_resched_lock(&lock1); /* <= might sleep */
  spin_unlock(&lock1);

is true. Are you saying that

  spin_lock(&lock0);
  spin_lock(&lock1);
  cond_resched_lock(&lock1); /* <= might sleep */
  spin_unlock(&lock1);
  spin_unlock(&lock0);

is also true for non-preempt kernel?

> 
> Just fix the drm code. There is zero point in releasing memory under spinlock.

If above answer is "yes", I don't think fixing the drm code is what we can do now.
Rather, we need below change because commit 763b218ddfaf5676 ("mm: add preempt
points into __purge_vmap_area_lazy()") suddenly broke things without verifying
all callers.

>From 281cfbc722cd0f4ff83f880a2dd487fd42226e2d Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Fri, 24 Mar 2017 11:24:18 +0900
Subject: [PATCH] mm: Allow calling vfree() from non-schedulable context.

Commit 5803ed292e63a1bf ("mm: mark all calls into the vmalloc subsystem
as potentially sleeping") added might_sleep() to remove_vm_area() from
vfree(), and is causing

[    2.616064] BUG: sleeping function called from invalid context at mm/vmalloc.c:1480
[    2.616125] in_atomic(): 1, irqs_disabled(): 0, pid: 341, name: plymouthd
[    2.616156] 2 locks held by plymouthd/341:
[    2.616158]  #0:  (drm_global_mutex){+.+.+.}, at: [<ffffffffc01c274b>] drm_release+0x3b/0x3b0 [drm]
[    2.616256]  #1:  (&(&tfile->lock)->rlock){+.+...}, at: [<ffffffffc0173038>] ttm_object_file_release+0x28/0x90 [ttm]
[    2.616270] CPU: 2 PID: 341 Comm: plymouthd Not tainted 4.11.0-0.rc3.git0.1.kmallocwd.fc25.x86_64+debug #1
[    2.616271] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[    2.616273] Call Trace:
[    2.616281]  dump_stack+0x86/0xc3
[    2.616285]  ___might_sleep+0x17d/0x250
[    2.616289]  __might_sleep+0x4a/0x80
[    2.616293]  remove_vm_area+0x22/0x90
[    2.616296]  __vunmap+0x2e/0x110
[    2.616299]  vfree+0x42/0x90
[    2.616304]  kvfree+0x2c/0x40
[    2.616312]  drm_ht_remove+0x1a/0x30 [drm]
[    2.616317]  ttm_object_file_release+0x50/0x90 [ttm]
[    2.616324]  vmw_postclose+0x47/0x60 [vmwgfx]
[    2.616331]  drm_release+0x290/0x3b0 [drm]
[    2.616338]  __fput+0xf8/0x210
[    2.616342]  ____fput+0xe/0x10
[    2.616345]  task_work_run+0x85/0xc0
[    2.616351]  exit_to_usermode_loop+0xb4/0xc0
[    2.616355]  do_syscall_64+0x185/0x1f0
[    2.616359]  entry_SYSCALL64_slow_path+0x25/0x25

warning.

And commit 763b218ddfaf5676 ("mm: add preempt points into
__purge_vmap_area_lazy()") actually made vfree() potentially sleeping on
non-preemptible kernels. But we want to keep vfree() being callable from
non-schedulable context as with kfree() because vfree() is called via
kvfree().

This patch updates the condition to use __vfree_deferred() in order to
make sure that all vfree()/kvfree() users who did not notice that commit
will remain safe.

console_unlock() is a function which is prepared for being called from
non-schedulable context (e.g. spinlock held, inside RCU). It is using

  !oops_in_progress && preemptible() && !rcu_preempt_depth()

as a condition for whether it is safe to schedule. This patch uses that
condition with oops_in_progress check (which is not important for
__vunmap() case) removed.

Straightforward change will be

-		if (unlikely(in_interrupt()))
+		if (unlikely(in_interrupt() || !(preemptible() && !rcu_preempt_depth())))

in vfree(). But this patch optimizes the condition as below.

-  if (unlikely(in_interrupt() || !(preemptible() && !rcu_preempt_depth())))
+  if (unlikely(in_interrupt() || !preemptible() || rcu_preempt_depth()))

Then, since in_interrupt() and preemptible() are defined as

#define in_interrupt() (irq_count())
#define irq_count()    (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK))
#define preemptible()  (preempt_count() == 0 && !irqs_disabled())

if CONFIG_PREEMPT_COUNT=y, this condition is

-  if (unlikely(in_interrupt() || !preemptible() || rcu_preempt_depth()))
+  if (unlikely((preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) ||
+          !(preempt_count() == 0 && !irqs_disabled()) || rcu_preempt_depth()))

and is thus

-   if (unlikely((preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) ||
-           !(preempt_count() == 0 && !irqs_disabled()) || rcu_preempt_depth()))
+	    if (unlikely((preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) ||
+	            (preempt_count() != 0 || irqs_disabled()) || rcu_preempt_depth()))

and is thus

-   if (unlikely((preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) ||
-           (preempt_count() != 0 || irqs_disabled()) || rcu_preempt_depth()))
+	    if (unlikely((preempt_count() != 0 || irqs_disabled()) || rcu_preempt_depth()))

and is thus finally

-   if (unlikely((preempt_count() != 0 || irqs_disabled()) || rcu_preempt_depth()))
+   if (unlikely(preempt_count() || irqs_disabled() || rcu_preempt_depth()))

. This patch will not break CONFIG_PREEMPT_COUNT=n case because
in_interrupt() is evaluated as false because preempt_count() is always 0.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jisheng Zhang <jszhang@marvell.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Dias <joaodias@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@vger.kernel.org> # v4.10
---
 mm/vmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 0b05762..5f0eb81 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1589,7 +1589,7 @@ void vfree(const void *addr)
 
 	if (!addr)
 		return;
-	if (unlikely(in_interrupt()))
+	if (unlikely(preempt_count() || irqs_disabled() || rcu_preempt_depth()))
 		__vfree_deferred(addr);
 	else
 		__vunmap(addr, 1);
-- 
1.8.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] mm: Remove pointless might_sleep() in remove_vm_area().
  2017-03-24 22:47       ` Tetsuo Handa
@ 2017-03-27 10:16         ` Tetsuo Handa
  0 siblings, 0 replies; 16+ messages in thread
From: Tetsuo Handa @ 2017-03-27 10:16 UTC (permalink / raw)
  To: aryabinin, linux-mm
  Cc: willy, hch, jszhang, joelaf, chris, joaodias, tglx, hpa, mingo

Tetsuo Handa wrote:
> . This patch will not break CONFIG_PREEMPT_COUNT=n case because
> in_interrupt() is evaluated as false because preempt_count() is always 0.

> -	if (unlikely(in_interrupt()))
> +	if (unlikely(preempt_count() || irqs_disabled() || rcu_preempt_depth()))

Oops, I got confused. preemptible() is always 0 for CONFIG_PREEMPT_COUNT=n case.
I think above condition is wrong. Updated patch is shown below.

>From 3dd03c34ee45fbdb3c8fd31b558a76db3a562b22 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Mon, 27 Mar 2017 10:53:08 +0900
Subject: [PATCH v2] mm: Allow calling vfree() from non-schedulable context.

Commit 5803ed292e63a1bf ("mm: mark all calls into the vmalloc subsystem
as potentially sleeping") added might_sleep() to remove_vm_area() from
vfree(), and is causing

[    2.616064] BUG: sleeping function called from invalid context at mm/vmalloc.c:1480
[    2.616125] in_atomic(): 1, irqs_disabled(): 0, pid: 341, name: plymouthd
[    2.616156] 2 locks held by plymouthd/341:
[    2.616158]  #0:  (drm_global_mutex){+.+.+.}, at: [<ffffffffc01c274b>] drm_release+0x3b/0x3b0 [drm]
[    2.616256]  #1:  (&(&tfile->lock)->rlock){+.+...}, at: [<ffffffffc0173038>] ttm_object_file_release+0x28/0x90 [ttm]
[    2.616270] CPU: 2 PID: 341 Comm: plymouthd Not tainted 4.11.0-0.rc3.git0.1.kmallocwd.fc25.x86_64+debug #1
[    2.616271] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[    2.616273] Call Trace:
[    2.616281]  dump_stack+0x86/0xc3
[    2.616285]  ___might_sleep+0x17d/0x250
[    2.616289]  __might_sleep+0x4a/0x80
[    2.616293]  remove_vm_area+0x22/0x90
[    2.616296]  __vunmap+0x2e/0x110
[    2.616299]  vfree+0x42/0x90
[    2.616304]  kvfree+0x2c/0x40
[    2.616312]  drm_ht_remove+0x1a/0x30 [drm]
[    2.616317]  ttm_object_file_release+0x50/0x90 [ttm]
[    2.616324]  vmw_postclose+0x47/0x60 [vmwgfx]
[    2.616331]  drm_release+0x290/0x3b0 [drm]
[    2.616338]  __fput+0xf8/0x210
[    2.616342]  ____fput+0xe/0x10
[    2.616345]  task_work_run+0x85/0xc0
[    2.616351]  exit_to_usermode_loop+0xb4/0xc0
[    2.616355]  do_syscall_64+0x185/0x1f0
[    2.616359]  entry_SYSCALL64_slow_path+0x25/0x25

warning.

And commit 763b218ddfaf5676 ("mm: add preempt points into
__purge_vmap_area_lazy()") actually made vfree() potentially sleeping on
non-preemptible kernels. But we want to keep vfree() being callable from
non-schedulable context as with kfree() because vfree() is called via
kvfree().

This patch updates the condition to use __vfree_deferred() in order to
make sure that all vfree()/kvfree() users who did not notice that commit
will remain safe.

console_unlock() is a function which is prepared for being called from
non-schedulable context (e.g. spinlock held, inside RCU). It is using

  !oops_in_progress && preemptible() && !rcu_preempt_depth()

as a condition for whether it is safe to schedule. This patch uses that
condition with oops_in_progress check (which is not important for
__vunmap() case) removed.

Straightforward change will be

-	if (unlikely(in_interrupt()))
+	if (unlikely(in_interrupt() || !(preemptible() && !rcu_preempt_depth())))

in vfree(). But we can remove in_interrupt() check due to reasons below.

If CONFIG_PREEMPT_COUNT=y, in_interrupt() and preemptible() are defined as

  #define in_interrupt() (irq_count())
  #define irq_count()    (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK))
  #define preemptible()  (preempt_count() == 0 && !irqs_disabled())

and therefore this condition can be rewritten as below.

-	if (unlikely(in_interrupt() || !(preemptible() && !rcu_preempt_depth())))
+	if (unlikely((preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) ||
+		     !(preempt_count() == 0 && !irqs_disabled()) || rcu_preempt_depth()))

-	if (unlikely((preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) ||
-		     !(preempt_count() == 0 && !irqs_disabled()) || rcu_preempt_depth()))
+	if (unlikely((preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) ||
+		     (preempt_count() != 0 || irqs_disabled()) || rcu_preempt_depth()))

-	if (unlikely((preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) ||
-		     (preempt_count() != 0 || irqs_disabled()) || rcu_preempt_depth()))
+	if (unlikely((preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) ||
+		     preempt_count() != 0 || irqs_disabled() || rcu_preempt_depth()))

-	if (unlikely((preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) ||
-		     preempt_count() != 0 || irqs_disabled() || rcu_preempt_depth()))
+	if (unlikely(preempt_count() != 0 || irqs_disabled() || rcu_preempt_depth()))

-	if (unlikely(preempt_count() != 0 || irqs_disabled() || rcu_preempt_depth()))
+	if (unlikely(!(preempt_count() == 0 && !irqs_disabled()) || rcu_preempt_depth()))

-	if (unlikely(!(preempt_count() == 0 && !irqs_disabled()) || rcu_preempt_depth()))
+	if (unlikely(!preemptible() || rcu_preempt_depth()))

If CONFIG_PREEMPT_COUNT=n, preemptible() is defined as

  #define preemptible() 0

and therefore this condition can be rewritten as below.

-       if (unlikely(in_interrupt() || !(preemptible() && !rcu_preempt_depth())))
+       if (unlikely(in_interrupt() || !(0 && !rcu_preempt_depth())))

-       if (unlikely(in_interrupt() || !(0 && !rcu_preempt_depth())))
+       if (unlikely(in_interrupt() || !(0)))

-       if (unlikely(in_interrupt() || !(0)))
+       if (unlikely(in_interrupt() || 1))

-       if (unlikely(in_interrupt() || 1))
+       if (unlikely(1))

Also drop unlikely() part because caller holding spinlock or inside RCU is not
such uncommon cases.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jisheng Zhang <jszhang@marvell.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Dias <joaodias@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@vger.kernel.org> # v4.10
---
 mm/vmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 0b05762..36334ff 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1589,7 +1589,7 @@ void vfree(const void *addr)
 
 	if (!addr)
 		return;
-	if (unlikely(in_interrupt()))
+	if (!preemptible() || rcu_preempt_depth())
 		__vfree_deferred(addr);
 	else
 		__vunmap(addr, 1);
-- 
1.8.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] mm: Remove pointless might_sleep() in remove_vm_area().
  2017-03-24 16:17       ` Matthew Wilcox
@ 2017-03-27 13:26         ` Andrey Ryabinin
  2017-03-27 14:06           ` Matthew Wilcox
  2017-03-27 14:10           ` Thomas Hellstrom
  0 siblings, 2 replies; 16+ messages in thread
From: Andrey Ryabinin @ 2017-03-27 13:26 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Tetsuo Handa, linux-mm, hch, jszhang, joelaf, chris, joaodias,
	tglx, hpa, mingo, Thomas Hellstrom, dri-devel, David Airlie

[+CC drm folks, see the following threads:
	http://lkml.kernel.org/r/201703232349.BGB95898.QHLVFFOMtFOOJS@I-love.SAKURA.ne.jp
	http://lkml.kernel.org/r/1490352808-7187-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
]

On 03/24/2017 07:17 PM, Matthew Wilcox wrote:
> On Fri, Mar 24, 2017 at 06:05:45PM +0300, Andrey Ryabinin wrote:
>> Just fix the drm code. There is zero point in releasing memory under spinlock.
> 
> I disagree.  The spinlock has to be held while deleting from the hash
> table. 

And what makes you think so?

There are too places where spinlock held during drm_ht_remove();

1) The first one is an obvious crap in ttm_object_device_release():

void ttm_object_device_release(struct ttm_object_device **p_tdev)
{
	struct ttm_object_device *tdev = *p_tdev;

	*p_tdev = NULL;

	spin_lock(&tdev->object_lock);
	drm_ht_remove(&tdev->object_hash);
	spin_unlock(&tdev->object_lock);

	kfree(tdev);
}

Obviously this spin_lock has no use here and it can be removed. There should
be no concurrent access to tdev at this point, because that would mean immediate
use-afte-free.

2) The second case is in ttm_object_file_release() calls drm_ht_remove() under tfile->lock
And drm_ht_remove() does:
void drm_ht_remove(struct drm_open_hash *ht)
{
	if (ht->table) {
		kvfree(ht->table);
		ht->table = NULL;
	}
}

Let's assume that we have some other code accessing ht->table and racing
against ttm_object_file_release()->drm_ht_remove().
This would mean that such code must do the following:
  a) take spin_lock(&tfile->lock)
  b) check ht->table for being non-NULL and only after that it can dereference ht->table.

But I don't see any code checking ht->table for NULL. So if race against drm_ht_remove()
is possible, this code is already broken and this spin_lock doesn't save us from NULL-ptr
deref.

So, either we already protected from such scenarios (e.g. we are the only owners of tdev/tfile in
ttm_object_device_release()/ttm_object_file_release()) or this code is already terribly
broken. Anyways we can just move drm_ht_remove() out of spin_lock()/spin_unlock() section.

Did I miss anything? 


> Sure, we could change the API to return the object removed, and
> then force the caller to free the object that was removed from the hash
> table outside the lock it's holding, but that's a really inelegant API.
> 

This won't be required if I'm right.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] mm: Remove pointless might_sleep() in remove_vm_area().
  2017-03-27 13:26         ` Andrey Ryabinin
@ 2017-03-27 14:06           ` Matthew Wilcox
  2017-03-27 14:10           ` Thomas Hellstrom
  1 sibling, 0 replies; 16+ messages in thread
From: Matthew Wilcox @ 2017-03-27 14:06 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Tetsuo Handa, linux-mm, hch, jszhang, joelaf, chris, joaodias,
	tglx, hpa, mingo, Thomas Hellstrom, dri-devel, David Airlie

On Mon, Mar 27, 2017 at 04:26:02PM +0300, Andrey Ryabinin wrote:
> [+CC drm folks, see the following threads:
> 	http://lkml.kernel.org/r/201703232349.BGB95898.QHLVFFOMtFOOJS@I-love.SAKURA.ne.jp
> 	http://lkml.kernel.org/r/1490352808-7187-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
> ]
> 
> On 03/24/2017 07:17 PM, Matthew Wilcox wrote:
> > On Fri, Mar 24, 2017 at 06:05:45PM +0300, Andrey Ryabinin wrote:
> >> Just fix the drm code. There is zero point in releasing memory under spinlock.
> > 
> > I disagree.  The spinlock has to be held while deleting from the hash
> > table. 
> 
> And what makes you think so?

The bad naming of the function.  If somebody has a function called
'hashtable_remove' I naturally think it means "remove something from
the hash table".  This function should be called drm_ht_destroy().
And then, yes, it becomes obvious that there is no need to protect
destuction against usage because if anyone is still using the hashtable,
they're about to get a NULL pointer dereference.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] mm: Remove pointless might_sleep() in remove_vm_area().
  2017-03-27 13:26         ` Andrey Ryabinin
  2017-03-27 14:06           ` Matthew Wilcox
@ 2017-03-27 14:10           ` Thomas Hellstrom
  2017-03-27 14:29               ` Tetsuo Handa
  1 sibling, 1 reply; 16+ messages in thread
From: Thomas Hellstrom @ 2017-03-27 14:10 UTC (permalink / raw)
  To: Andrey Ryabinin, Matthew Wilcox
  Cc: Tetsuo Handa, linux-mm, hch, jszhang, joelaf, chris, joaodias,
	tglx, hpa, mingo, dri-devel, David Airlie

On 03/27/2017 03:26 PM, Andrey Ryabinin wrote:
> [+CC drm folks, see the following threads:
> 	https://urldefense.proofpoint.com/v2/url?u=http-3A__lkml.kernel.org_r_201703232349.BGB95898.QHLVFFOMtFOOJS-40I-2Dlove.SAKURA.ne.jp&d=DwIC-g&c=uilaK90D4TOVoH58JNXRgQ&r=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ&m=iraTSVk5qLTsuZU7WSBk97YAYoGmC7W5zjR2wwDRVVk&s=bX-RtB9qE168yR2AjMRvRvln1Pn6r8fmNUDQydGWIdk&e= 
> 	https://urldefense.proofpoint.com/v2/url?u=http-3A__lkml.kernel.org_r_1490352808-2D7187-2D1-2Dgit-2Dsend-2Demail-2Dpenguin-2Dkernel-40I-2Dlove.SAKURA.ne.jp&d=DwIC-g&c=uilaK90D4TOVoH58JNXRgQ&r=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ&m=iraTSVk5qLTsuZU7WSBk97YAYoGmC7W5zjR2wwDRVVk&s=jw45iN1ypIQrzl08wkan3QZkuU6Gu0riU4_PvZD8KXQ&e= 
> ]
>
> On 03/24/2017 07:17 PM, Matthew Wilcox wrote:
>> On Fri, Mar 24, 2017 at 06:05:45PM +0300, Andrey Ryabinin wrote:
>>> Just fix the drm code. There is zero point in releasing memory under spinlock.
>> I disagree.  The spinlock has to be held while deleting from the hash
>> table. 
> And what makes you think so?
>
> There are too places where spinlock held during drm_ht_remove();
>
> 1) The first one is an obvious crap in ttm_object_device_release():
>
> void ttm_object_device_release(struct ttm_object_device **p_tdev)
> {
> 	struct ttm_object_device *tdev = *p_tdev;
>
> 	*p_tdev = NULL;
>
> 	spin_lock(&tdev->object_lock);
> 	drm_ht_remove(&tdev->object_hash);
> 	spin_unlock(&tdev->object_lock);
>
> 	kfree(tdev);
> }
>
> Obviously this spin_lock has no use here and it can be removed. There should
> be no concurrent access to tdev at this point, because that would mean immediate
> use-afte-free.
>
> 2) The second case is in ttm_object_file_release() calls drm_ht_remove() under tfile->lock
> And drm_ht_remove() does:
> void drm_ht_remove(struct drm_open_hash *ht)
> {
> 	if (ht->table) {
> 		kvfree(ht->table);
> 		ht->table = NULL;
> 	}
> }
>
> Let's assume that we have some other code accessing ht->table and racing
> against ttm_object_file_release()->drm_ht_remove().
> This would mean that such code must do the following:
>   a) take spin_lock(&tfile->lock)
>   b) check ht->table for being non-NULL and only after that it can dereference ht->table.
>
> But I don't see any code checking ht->table for NULL. So if race against drm_ht_remove()
> is possible, this code is already broken and this spin_lock doesn't save us from NULL-ptr
> deref.
>
> So, either we already protected from such scenarios (e.g. we are the only owners of tdev/tfile in
> ttm_object_device_release()/ttm_object_file_release()) or this code is already terribly
> broken. Anyways we can just move drm_ht_remove() out of spin_lock()/spin_unlock() section.
>
> Did I miss anything? 
>
>
>> Sure, we could change the API to return the object removed, and
>> then force the caller to free the object that was removed from the hash
>> table outside the lock it's holding, but that's a really inelegant API.
>>
> This won't be required if I'm right.
>
Actually, I've already sent out a patch for internal review to remove
the spinlocks around drm_ht_free().
They are both in destructors so it should be harmless in this particular
case. The reason the locks are there is to avoid upsetting static code
analyzers that think the hash table pointer should be protected because
it is elsewhere in the code.

However, while it is common that acquiring a resource (in this case
vmalloc space) might sleep,  Sleeping while releasing it ishould, IMO in
general be avoided if at all possible. It's quite common to take a lock
around kref_put() and if the destructor needs to sleep that requires
unlocking in it, leading to bad looking code that upsets static
analyzers and requires extra locking cycles.

In addition, if the vfree sleeping is triggered by waiting for a thread
currently blocked by, for example a memory allocation, which is blocked
by the kernel running shrinkers, which call vfree() then we're in a
deadlock situation.

So to summarize. Yes, the drm callers can be fixed up, but IMO requiring
vfree() to be non-atomic is IMO not a good idea if avoidable.

Thanks,

Thomas



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] mm: Remove pointless might_sleep() in remove_vm_area().
  2017-03-27 14:10           ` Thomas Hellstrom
@ 2017-03-27 14:29               ` Tetsuo Handa
  0 siblings, 0 replies; 16+ messages in thread
From: Tetsuo Handa @ 2017-03-27 14:29 UTC (permalink / raw)
  To: linux-security-module

Thomas Hellstrom wrote:
> So to summarize. Yes, the drm callers can be fixed up, but IMO requiring
> vfree() to be non-atomic is IMO not a good idea if avoidable.

I agree.

I don't know about drm code. But I can find AppArmor code doing
kvfree() from dfa_free() from aa_dfa_free_kref() from kref_put() from
aa_put_dfa() from aa_free_profile() which says

 * If the profile was referenced from a task context, free_profile() will
 * be called from an rcu callback routine, so we must not sleep here.

which means that below changes broke things without properly auditing
all vfree()/kvfree() users.

  commit bf22e37a641327e3 ("mm: add vfree_atomic()")
  commit 0f110a9b956c1678 ("kernel/fork: use vfree_atomic() to free thread stack")
  commit 8d5341a6260a59cf ("x86/ldt: use vfree_atomic() to free ldt entries")
  commit 5803ed292e63a1bf ("mm: mark all calls into the vmalloc subsystem as potentially sleeping")
  commit f9e09977671b618a ("mm: turn vmap_purge_lock into a mutex")
  commit 763b218ddfaf5676 ("mm: add preempt points into __purge_vmap_area_lazy()")

Since above commits did not take appropriate proceedure for changing
non-blocking API to blocking API, we must fix vfree() part for 4.10 and 4.11.

Updated patch is at
http://lkml.kernel.org/r/201703271916.FBI69340.SQFtOFVJHOLOMF at I-love.SAKURA.ne.jp .
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] mm: Remove pointless might_sleep() in remove_vm_area().
@ 2017-03-27 14:29               ` Tetsuo Handa
  0 siblings, 0 replies; 16+ messages in thread
From: Tetsuo Handa @ 2017-03-27 14:29 UTC (permalink / raw)
  To: thellstrom, aryabinin, willy
  Cc: linux-mm, hch, jszhang, joelaf, chris, joaodias, tglx, hpa,
	mingo, dri-devel, airlied, linux-security-module

Thomas Hellstrom wrote:
> So to summarize. Yes, the drm callers can be fixed up, but IMO requiring
> vfree() to be non-atomic is IMO not a good idea if avoidable.

I agree.

I don't know about drm code. But I can find AppArmor code doing
kvfree() from dfa_free() from aa_dfa_free_kref() from kref_put() from
aa_put_dfa() from aa_free_profile() which says

 * If the profile was referenced from a task context, free_profile() will
 * be called from an rcu callback routine, so we must not sleep here.

which means that below changes broke things without properly auditing
all vfree()/kvfree() users.

  commit bf22e37a641327e3 ("mm: add vfree_atomic()")
  commit 0f110a9b956c1678 ("kernel/fork: use vfree_atomic() to free thread stack")
  commit 8d5341a6260a59cf ("x86/ldt: use vfree_atomic() to free ldt entries")
  commit 5803ed292e63a1bf ("mm: mark all calls into the vmalloc subsystem as potentially sleeping")
  commit f9e09977671b618a ("mm: turn vmap_purge_lock into a mutex")
  commit 763b218ddfaf5676 ("mm: add preempt points into __purge_vmap_area_lazy()")

Since above commits did not take appropriate proceedure for changing
non-blocking API to blocking API, we must fix vfree() part for 4.10 and 4.11.

Updated patch is at
http://lkml.kernel.org/r/201703271916.FBI69340.SQFtOFVJHOLOMF@I-love.SAKURA.ne.jp .

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] mm: Remove pointless might_sleep() in remove_vm_area().
  2017-03-27 14:29               ` Tetsuo Handa
@ 2017-03-27 15:02                 ` Andrey Ryabinin
  -1 siblings, 0 replies; 16+ messages in thread
From: Andrey Ryabinin @ 2017-03-27 15:02 UTC (permalink / raw)
  To: linux-security-module



On 03/27/2017 05:29 PM, Tetsuo Handa wrote:
> Thomas Hellstrom wrote:
>> So to summarize. Yes, the drm callers can be fixed up, but IMO requiring
>> vfree() to be non-atomic is IMO not a good idea if avoidable.
> 
> I agree.
> 
> I don't know about drm code. But I can find AppArmor code doing
> kvfree() from dfa_free() from aa_dfa_free_kref() from kref_put() from
> aa_put_dfa() from aa_free_profile() which says
> 
>  * If the profile was referenced from a task context, free_profile() will
>  * be called from an rcu callback routine, so we must not sleep here.
> 

It's safe to call vfree() from rcu callback as in any other interrupt context.
Commits you listed bellow didn't change anything in that respect.
They made impossible to call vfree() under stuff like preempt_disable()/spin_lock()


> which means that below changes broke things without properly auditing
> all vfree()/kvfree() users.
> 
>   commit bf22e37a641327e3 ("mm: add vfree_atomic()")
>   commit 0f110a9b956c1678 ("kernel/fork: use vfree_atomic() to free thread stack")
>   commit 8d5341a6260a59cf ("x86/ldt: use vfree_atomic() to free ldt entries")
>   commit 5803ed292e63a1bf ("mm: mark all calls into the vmalloc subsystem as potentially sleeping")
>   commit f9e09977671b618a ("mm: turn vmap_purge_lock into a mutex")
>   commit 763b218ddfaf5676 ("mm: add preempt points into __purge_vmap_area_lazy()")
> 
> Since above commits did not take appropriate proceedure for changing
> non-blocking API to blocking API, we must fix vfree() part for 4.10 and 4.11.
> 
> Updated patch is at
> http://lkml.kernel.org/r/201703271916.FBI69340.SQFtOFVJHOLOMF at I-love.SAKURA.ne.jp .
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] mm: Remove pointless might_sleep() in remove_vm_area().
@ 2017-03-27 15:02                 ` Andrey Ryabinin
  0 siblings, 0 replies; 16+ messages in thread
From: Andrey Ryabinin @ 2017-03-27 15:02 UTC (permalink / raw)
  To: Tetsuo Handa, thellstrom, willy
  Cc: linux-mm, hch, jszhang, joelaf, chris, joaodias, tglx, hpa,
	mingo, dri-devel, airlied, linux-security-module



On 03/27/2017 05:29 PM, Tetsuo Handa wrote:
> Thomas Hellstrom wrote:
>> So to summarize. Yes, the drm callers can be fixed up, but IMO requiring
>> vfree() to be non-atomic is IMO not a good idea if avoidable.
> 
> I agree.
> 
> I don't know about drm code. But I can find AppArmor code doing
> kvfree() from dfa_free() from aa_dfa_free_kref() from kref_put() from
> aa_put_dfa() from aa_free_profile() which says
> 
>  * If the profile was referenced from a task context, free_profile() will
>  * be called from an rcu callback routine, so we must not sleep here.
> 

It's safe to call vfree() from rcu callback as in any other interrupt context.
Commits you listed bellow didn't change anything in that respect.
They made impossible to call vfree() under stuff like preempt_disable()/spin_lock()


> which means that below changes broke things without properly auditing
> all vfree()/kvfree() users.
> 
>   commit bf22e37a641327e3 ("mm: add vfree_atomic()")
>   commit 0f110a9b956c1678 ("kernel/fork: use vfree_atomic() to free thread stack")
>   commit 8d5341a6260a59cf ("x86/ldt: use vfree_atomic() to free ldt entries")
>   commit 5803ed292e63a1bf ("mm: mark all calls into the vmalloc subsystem as potentially sleeping")
>   commit f9e09977671b618a ("mm: turn vmap_purge_lock into a mutex")
>   commit 763b218ddfaf5676 ("mm: add preempt points into __purge_vmap_area_lazy()")
> 
> Since above commits did not take appropriate proceedure for changing
> non-blocking API to blocking API, we must fix vfree() part for 4.10 and 4.11.
> 
> Updated patch is at
> http://lkml.kernel.org/r/201703271916.FBI69340.SQFtOFVJHOLOMF@I-love.SAKURA.ne.jp .
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] mm: Remove pointless might_sleep() in remove_vm_area().
  2017-03-27 15:02                 ` Andrey Ryabinin
@ 2017-03-28 10:07                   ` Tetsuo Handa
  -1 siblings, 0 replies; 16+ messages in thread
From: Tetsuo Handa @ 2017-03-28 10:07 UTC (permalink / raw)
  To: linux-security-module

Andrey Ryabinin wrote:
> It's safe to call vfree() from rcu callback as in any other interrupt context.
> Commits you listed bellow didn't change anything in that respect.
> They made impossible to call vfree() under stuff like preempt_disable()/spin_lock()

I still cannot catch. According to test results shown below, calling vfree() from
RCU callback is permitted but calling vfree() from RCU read section is no longer
permitted.

---------- test.c ----------
#include <linux/module.h>
static int test_init(void)
{
        rcu_read_lock();
        might_sleep();
        rcu_read_unlock();
        return -EINVAL;
}
module_init(test_init);
MODULE_LICENSE("GPL");
---------- test.c ----------

[   49.729561] test: loading out-of-tree module taints kernel.
[   49.738211] test: module verification failed: signature and/or required key missing - tainting kernel

[   49.757855] ===============================
[   49.764038] [ ERR: suspicious RCU usage.  ]
[   49.767222] 4.11.0-rc4+ #210 Tainted: G           OE
[   49.770216] -------------------------------
[   49.772668] ./include/linux/rcupdate.h:521 Illegal context switch in RCU read-side critical section!
[   49.777915]
other info that might help us debug this:

[   49.782562]
rcu_scheduler_active = 2, debug_locks = 0
[   49.786336] 1 lock held by insmod/2332:
[   49.788596]  #0:  (rcu_read_lock){......}, at: [<ffffffffc07c7005>] test_init+0x5/0x1000 [test]
[   49.793632]
stack backtrace:
[   49.796190] CPU: 2 PID: 2332 Comm: insmod Tainted: G           OE   4.11.0-rc4+ #210
[   49.798830] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[   49.802213] Call Trace:
[   49.803010]  dump_stack+0x85/0xc9
[   49.804079]  lockdep_rcu_suspicious+0xe7/0x120
[   49.805503]  ___might_sleep+0xac/0x250
[   49.806704]  __might_sleep+0x4a/0x90
[   49.807852]  ? 0xffffffffc07c7000
[   49.808921]  test_init+0x5c/0x1000 [test]
[   49.810212]  ? test_init+0x5/0x1000 [test]
[   49.811526]  do_one_initcall+0x51/0x1c0
[   49.813014]  ? do_init_module+0x27/0x1fc
[   49.814370]  ? rcu_read_lock_sched_held+0x98/0xa0
[   49.815882]  ? kmem_cache_alloc_trace+0x278/0x2e0
[   49.817389]  ? do_init_module+0x27/0x1fc
[   49.818651]  do_init_module+0x60/0x1fc
[   49.819880]  load_module+0x23b4/0x2a00
[   49.821087]  ? __symbol_put+0x70/0x70
[   49.822271]  ? vfs_read+0x12b/0x180
[   49.823401]  SYSC_finit_module+0xa6/0xf0
[   49.824667]  SyS_finit_module+0xe/0x10
[   49.825894]  do_syscall_64+0x6c/0x200
[   49.827074]  entry_SYSCALL64_slow_path+0x25/0x25
[   49.829239] RIP: 0033:0x7fdeb1f52bf9
[   49.830540] RSP: 002b:00007fff1f355b28 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
[   49.833255] RAX: ffffffffffffffda RBX: 00000000014e71f0 RCX: 00007fdeb1f52bf9
[   49.835586] RDX: 0000000000000000 RSI: 000000000041a2d8 RDI: 0000000000000003
[   49.838370] RBP: 000000000041a2d8 R08: 0000000000000000 R09: 00007fff1f355cc8
[   49.841099] R10: 0000000000000003 R11: 0000000000000206 R12: 0000000000000000
[   49.843994] R13: 00000000014e6130 R14: 0000000000000000 R15: 0000000000000000
[   49.846756] BUG: sleeping function called from invalid context at /data/linux/akari/test.c:5
[   49.850727] in_atomic(): 1, irqs_disabled(): 0, pid: 2332, name: insmod
[   49.853353] INFO: lockdep is turned off.
[   49.855088] CPU: 2 PID: 2332 Comm: insmod Tainted: G           OE   4.11.0-rc4+ #210
[   49.858013] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[   49.861940] Call Trace:
[   49.863217]  dump_stack+0x85/0xc9
[   49.864744]  ___might_sleep+0x184/0x250
[   49.866420]  __might_sleep+0x4a/0x90
[   49.868003]  ? 0xffffffffc07c7000
[   49.869502]  test_init+0x5c/0x1000 [test]
[   49.871236]  ? test_init+0x5/0x1000 [test]
[   49.872983]  do_one_initcall+0x51/0x1c0
[   49.874796]  ? do_init_module+0x27/0x1fc
[   49.876486]  ? rcu_read_lock_sched_held+0x98/0xa0
[   49.878413]  ? kmem_cache_alloc_trace+0x278/0x2e0
[   49.880330]  ? do_init_module+0x27/0x1fc
[   49.882006]  do_init_module+0x60/0x1fc
[   49.883623]  load_module+0x23b4/0x2a00
[   49.885225]  ? __symbol_put+0x70/0x70
[   49.886784]  ? vfs_read+0x12b/0x180
[   49.888291]  SYSC_finit_module+0xa6/0xf0
[   49.890025]  SyS_finit_module+0xe/0x10
[   49.891611]  do_syscall_64+0x6c/0x200
[   49.893168]  entry_SYSCALL64_slow_path+0x25/0x25
[   49.895027] RIP: 0033:0x7fdeb1f52bf9
[   49.896561] RSP: 002b:00007fff1f355b28 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
[   49.899355] RAX: ffffffffffffffda RBX: 00000000014e71f0 RCX: 00007fdeb1f52bf9
[   49.902020] RDX: 0000000000000000 RSI: 000000000041a2d8 RDI: 0000000000000003
[   49.904683] RBP: 000000000041a2d8 R08: 0000000000000000 R09: 00007fff1f355cc8
[   49.907510] R10: 0000000000000003 R11: 0000000000000206 R12: 0000000000000000
[   49.910180] R13: 00000000014e6130 R14: 0000000000000000 R15: 0000000000000000

---------- test.c ----------
#include <linux/module.h>
#include <linux/sched.h>
static int test_init(void)
{
        static DEFINE_SPINLOCK(lock);
        rcu_read_lock();
        spin_lock(&lock);
        cond_resched_lock(&lock);
        spin_unlock(&lock);
        rcu_read_unlock();
        return -EINVAL;
}
module_init(test_init);
MODULE_LICENSE("GPL");
---------- test.c ----------

[   66.548461] test: loading out-of-tree module taints kernel.
[   66.551894] test: module verification failed: signature and/or required key missing - tainting kernel

[   66.560299] ===============================
[   66.562838] [ ERR: suspicious RCU usage.  ]
[   66.565395] 4.11.0-rc4+ #210 Tainted: G           OE
[   66.568494] -------------------------------
[   66.571012] ./include/linux/rcupdate.h:521 Illegal context switch in RCU read-side critical section!
[   66.576384]
other info that might help us debug this:

[   66.579234]
rcu_scheduler_active = 2, debug_locks = 0
[   66.581505] 2 locks held by insmod/2336:
[   66.582897]  #0:  (rcu_read_lock){......}, at: [<ffffffffc0543005>] test_init+0x5/0x1000 [test]
[   66.586253]  #1:  (lock#4){+.+...}, at: [<ffffffffc0543055>] test_init+0x55/0x1000 [test]
[   66.589135]
stack backtrace:
[   66.590919] CPU: 0 PID: 2336 Comm: insmod Tainted: G           OE   4.11.0-rc4+ #210
[   66.593842] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[   66.597716] Call Trace:
[   66.598598]  dump_stack+0x85/0xc9
[   66.599749]  lockdep_rcu_suspicious+0xe7/0x120
[   66.601289]  ___might_sleep+0xac/0x250
[   66.602585]  ? 0xffffffffc0543000
[   66.603735]  test_init+0x6b/0x1000 [test]
[   66.605121]  ? test_init+0x5/0x1000 [test]
[   66.606520]  do_one_initcall+0x51/0x1c0
[   66.607850]  ? do_init_module+0x27/0x1fc
[   66.609245]  ? rcu_read_lock_sched_held+0x98/0xa0
[   66.610855]  ? kmem_cache_alloc_trace+0x278/0x2e0
[   66.612489]  ? do_init_module+0x27/0x1fc
[   66.613868]  do_init_module+0x60/0x1fc
[   66.615195]  load_module+0x23b4/0x2a00
[   66.616512]  ? __symbol_put+0x70/0x70
[   66.617816]  ? vfs_read+0x12b/0x180
[   66.619074]  SYSC_finit_module+0xa6/0xf0
[   66.620455]  SyS_finit_module+0xe/0x10
[   66.621780]  do_syscall_64+0x6c/0x200
[   66.623067]  entry_SYSCALL64_slow_path+0x25/0x25
[   66.624937] RIP: 0033:0x7f6e6702bbf9
[   66.626206] RSP: 002b:00007ffdc94c3588 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
[   66.628901] RAX: ffffffffffffffda RBX: 00000000007421f0 RCX: 00007f6e6702bbf9
[   66.631373] RDX: 0000000000000000 RSI: 000000000041a2d8 RDI: 0000000000000003
[   66.634430] RBP: 000000000041a2d8 R08: 0000000000000000 R09: 00007ffdc94c3728
[   66.637428] R10: 0000000000000003 R11: 0000000000000206 R12: 0000000000000000
[   66.640422] R13: 0000000000741130 R14: 0000000000000000 R15: 0000000000000000
[   66.643430] BUG: sleeping function called from invalid context at /data/linux/akari/test.c:8
[   66.646779] in_atomic(): 1, irqs_disabled(): 0, pid: 2336, name: insmod
[   66.649484] INFO: lockdep is turned off.
[   66.651284] CPU: 0 PID: 2336 Comm: insmod Tainted: G           OE   4.11.0-rc4+ #210
[   66.654377] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[   66.658592] Call Trace:
[   66.659939]  dump_stack+0x85/0xc9
[   66.661562]  ___might_sleep+0x184/0x250
[   66.663385]  ? 0xffffffffc0543000
[   66.664967]  test_init+0x6b/0x1000 [test]
[   66.666812]  ? test_init+0x5/0x1000 [test]
[   66.668636]  do_one_initcall+0x51/0x1c0
[   66.670399]  ? do_init_module+0x27/0x1fc
[   66.672170]  ? rcu_read_lock_sched_held+0x98/0xa0
[   66.674256]  ? kmem_cache_alloc_trace+0x278/0x2e0
[   66.676308]  ? do_init_module+0x27/0x1fc
[   66.678056]  do_init_module+0x60/0x1fc
[   66.679728]  load_module+0x23b4/0x2a00
[   66.681428]  ? __symbol_put+0x70/0x70
[   66.683100]  ? vfs_read+0x12b/0x180
[   66.684720]  SYSC_finit_module+0xa6/0xf0
[   66.686422]  SyS_finit_module+0xe/0x10
[   66.688109]  do_syscall_64+0x6c/0x200
[   66.689776]  entry_SYSCALL64_slow_path+0x25/0x25
[   66.692052] RIP: 0033:0x7f6e6702bbf9
[   66.693687] RSP: 002b:00007ffdc94c3588 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
[   66.696577] RAX: ffffffffffffffda RBX: 00000000007421f0 RCX: 00007f6e6702bbf9
[   66.699432] RDX: 0000000000000000 RSI: 000000000041a2d8 RDI: 0000000000000003
[   66.702261] RBP: 000000000041a2d8 R08: 0000000000000000 R09: 00007ffdc94c3728
[   66.705076] R10: 0000000000000003 R11: 0000000000000206 R12: 0000000000000000
[   66.707990] R13: 0000000000741130 R14: 0000000000000000 R15: 0000000000000000



Also, if we try below change like suggested at
http://lkml.kernel.org/r/20170323152949.GA29134 at bombadil.infradead.org ,
we get below warnings.
Aren't there vfree()/kvfree() users who are not ready to handle these changes?

----------
diff --git a/mm/util.c b/mm/util.c
index 656dc5e..2a2ef72 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -331,6 +331,12 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
 
 void kvfree(const void *addr)
 {
+	/* Detect errors before kvmalloc() falls back to vmalloc(). */
+	if (addr) {
+		WARN_ON(in_nmi());
+		if (likely(!in_interrupt()))
+			might_sleep();
+	}
 	if (is_vmalloc_addr(addr))
 		vfree(addr);
 	else
----------

[   23.635540] BUG: sleeping function called from invalid context at mm/util.c:338
[   23.638701] in_atomic(): 1, irqs_disabled(): 0, pid: 478, name: kworker/0:1H
[   23.641516] 3 locks held by kworker/0:1H/478:
[   23.643476]  #0:  ("xfs-log/%s"mp->m_fsname){.+.+..}, at: [<ffffffffb20d1e64>] process_one_work+0x194/0x6c0
[   23.647176]  #1:  ((&bp->b_ioend_work)){+.+...}, at: [<ffffffffb20d1e64>] process_one_work+0x194/0x6c0
[   23.650939]  #2:  (&(&pag->pagb_lock)->rlock){+.+...}, at: [<ffffffffc02b42ee>] xfs_extent_busy_clear+0x9e/0xe0 [xfs]
[   23.655132] CPU: 0 PID: 478 Comm: kworker/0:1H Not tainted 4.11.0-rc4+ #212
[   23.657974] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[   23.662041] Workqueue: xfs-log/sda1 xfs_buf_ioend_work [xfs]
[   23.664463] Call Trace:
[   23.665866]  dump_stack+0x85/0xc9
[   23.667538]  ___might_sleep+0x184/0x250
[   23.669371]  __might_sleep+0x4a/0x90
[   23.671137]  kvfree+0x41/0x90
[   23.672748]  xfs_extent_busy_clear_one+0x51/0x190 [xfs]
[   23.675110]  xfs_extent_busy_clear+0xbb/0xe0 [xfs]
[   23.677278]  xlog_cil_committed+0x241/0x420 [xfs]
[   23.679431]  xlog_state_do_callback+0x170/0x2d0 [xfs]
[   23.681717]  xlog_state_done_syncing+0x7f/0xa0 [xfs]
[   23.683971]  ? xfs_buf_ioend_work+0x15/0x20 [xfs]
[   23.686112]  xlog_iodone+0x86/0xc0 [xfs]
[   23.688007]  xfs_buf_ioend+0xd3/0x440 [xfs]
[   23.689999]  xfs_buf_ioend_work+0x15/0x20 [xfs]
[   23.692060]  process_one_work+0x21c/0x6c0
[   23.694177]  ? process_one_work+0x194/0x6c0
[   23.696120]  worker_thread+0x137/0x4b0
[   23.697973]  kthread+0x10f/0x150
[   23.699607]  ? process_one_work+0x6c0/0x6c0
[   23.701561]  ? kthread_create_on_node+0x70/0x70
[   23.703617]  ret_from_fork+0x31/0x40
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] mm: Remove pointless might_sleep() in remove_vm_area().
@ 2017-03-28 10:07                   ` Tetsuo Handa
  0 siblings, 0 replies; 16+ messages in thread
From: Tetsuo Handa @ 2017-03-28 10:07 UTC (permalink / raw)
  To: aryabinin, thellstrom, willy
  Cc: linux-mm, hch, jszhang, joelaf, chris, joaodias, tglx, hpa,
	mingo, dri-devel, airlied, linux-security-module

Andrey Ryabinin wrote:
> It's safe to call vfree() from rcu callback as in any other interrupt context.
> Commits you listed bellow didn't change anything in that respect.
> They made impossible to call vfree() under stuff like preempt_disable()/spin_lock()

I still cannot catch. According to test results shown below, calling vfree() from
RCU callback is permitted but calling vfree() from RCU read section is no longer
permitted.

---------- test.c ----------
#include <linux/module.h>
static int test_init(void)
{
        rcu_read_lock();
        might_sleep();
        rcu_read_unlock();
        return -EINVAL;
}
module_init(test_init);
MODULE_LICENSE("GPL");
---------- test.c ----------

[   49.729561] test: loading out-of-tree module taints kernel.
[   49.738211] test: module verification failed: signature and/or required key missing - tainting kernel

[   49.757855] ===============================
[   49.764038] [ ERR: suspicious RCU usage.  ]
[   49.767222] 4.11.0-rc4+ #210 Tainted: G           OE
[   49.770216] -------------------------------
[   49.772668] ./include/linux/rcupdate.h:521 Illegal context switch in RCU read-side critical section!
[   49.777915]
other info that might help us debug this:

[   49.782562]
rcu_scheduler_active = 2, debug_locks = 0
[   49.786336] 1 lock held by insmod/2332:
[   49.788596]  #0:  (rcu_read_lock){......}, at: [<ffffffffc07c7005>] test_init+0x5/0x1000 [test]
[   49.793632]
stack backtrace:
[   49.796190] CPU: 2 PID: 2332 Comm: insmod Tainted: G           OE   4.11.0-rc4+ #210
[   49.798830] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[   49.802213] Call Trace:
[   49.803010]  dump_stack+0x85/0xc9
[   49.804079]  lockdep_rcu_suspicious+0xe7/0x120
[   49.805503]  ___might_sleep+0xac/0x250
[   49.806704]  __might_sleep+0x4a/0x90
[   49.807852]  ? 0xffffffffc07c7000
[   49.808921]  test_init+0x5c/0x1000 [test]
[   49.810212]  ? test_init+0x5/0x1000 [test]
[   49.811526]  do_one_initcall+0x51/0x1c0
[   49.813014]  ? do_init_module+0x27/0x1fc
[   49.814370]  ? rcu_read_lock_sched_held+0x98/0xa0
[   49.815882]  ? kmem_cache_alloc_trace+0x278/0x2e0
[   49.817389]  ? do_init_module+0x27/0x1fc
[   49.818651]  do_init_module+0x60/0x1fc
[   49.819880]  load_module+0x23b4/0x2a00
[   49.821087]  ? __symbol_put+0x70/0x70
[   49.822271]  ? vfs_read+0x12b/0x180
[   49.823401]  SYSC_finit_module+0xa6/0xf0
[   49.824667]  SyS_finit_module+0xe/0x10
[   49.825894]  do_syscall_64+0x6c/0x200
[   49.827074]  entry_SYSCALL64_slow_path+0x25/0x25
[   49.829239] RIP: 0033:0x7fdeb1f52bf9
[   49.830540] RSP: 002b:00007fff1f355b28 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
[   49.833255] RAX: ffffffffffffffda RBX: 00000000014e71f0 RCX: 00007fdeb1f52bf9
[   49.835586] RDX: 0000000000000000 RSI: 000000000041a2d8 RDI: 0000000000000003
[   49.838370] RBP: 000000000041a2d8 R08: 0000000000000000 R09: 00007fff1f355cc8
[   49.841099] R10: 0000000000000003 R11: 0000000000000206 R12: 0000000000000000
[   49.843994] R13: 00000000014e6130 R14: 0000000000000000 R15: 0000000000000000
[   49.846756] BUG: sleeping function called from invalid context at /data/linux/akari/test.c:5
[   49.850727] in_atomic(): 1, irqs_disabled(): 0, pid: 2332, name: insmod
[   49.853353] INFO: lockdep is turned off.
[   49.855088] CPU: 2 PID: 2332 Comm: insmod Tainted: G           OE   4.11.0-rc4+ #210
[   49.858013] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[   49.861940] Call Trace:
[   49.863217]  dump_stack+0x85/0xc9
[   49.864744]  ___might_sleep+0x184/0x250
[   49.866420]  __might_sleep+0x4a/0x90
[   49.868003]  ? 0xffffffffc07c7000
[   49.869502]  test_init+0x5c/0x1000 [test]
[   49.871236]  ? test_init+0x5/0x1000 [test]
[   49.872983]  do_one_initcall+0x51/0x1c0
[   49.874796]  ? do_init_module+0x27/0x1fc
[   49.876486]  ? rcu_read_lock_sched_held+0x98/0xa0
[   49.878413]  ? kmem_cache_alloc_trace+0x278/0x2e0
[   49.880330]  ? do_init_module+0x27/0x1fc
[   49.882006]  do_init_module+0x60/0x1fc
[   49.883623]  load_module+0x23b4/0x2a00
[   49.885225]  ? __symbol_put+0x70/0x70
[   49.886784]  ? vfs_read+0x12b/0x180
[   49.888291]  SYSC_finit_module+0xa6/0xf0
[   49.890025]  SyS_finit_module+0xe/0x10
[   49.891611]  do_syscall_64+0x6c/0x200
[   49.893168]  entry_SYSCALL64_slow_path+0x25/0x25
[   49.895027] RIP: 0033:0x7fdeb1f52bf9
[   49.896561] RSP: 002b:00007fff1f355b28 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
[   49.899355] RAX: ffffffffffffffda RBX: 00000000014e71f0 RCX: 00007fdeb1f52bf9
[   49.902020] RDX: 0000000000000000 RSI: 000000000041a2d8 RDI: 0000000000000003
[   49.904683] RBP: 000000000041a2d8 R08: 0000000000000000 R09: 00007fff1f355cc8
[   49.907510] R10: 0000000000000003 R11: 0000000000000206 R12: 0000000000000000
[   49.910180] R13: 00000000014e6130 R14: 0000000000000000 R15: 0000000000000000

---------- test.c ----------
#include <linux/module.h>
#include <linux/sched.h>
static int test_init(void)
{
        static DEFINE_SPINLOCK(lock);
        rcu_read_lock();
        spin_lock(&lock);
        cond_resched_lock(&lock);
        spin_unlock(&lock);
        rcu_read_unlock();
        return -EINVAL;
}
module_init(test_init);
MODULE_LICENSE("GPL");
---------- test.c ----------

[   66.548461] test: loading out-of-tree module taints kernel.
[   66.551894] test: module verification failed: signature and/or required key missing - tainting kernel

[   66.560299] ===============================
[   66.562838] [ ERR: suspicious RCU usage.  ]
[   66.565395] 4.11.0-rc4+ #210 Tainted: G           OE
[   66.568494] -------------------------------
[   66.571012] ./include/linux/rcupdate.h:521 Illegal context switch in RCU read-side critical section!
[   66.576384]
other info that might help us debug this:

[   66.579234]
rcu_scheduler_active = 2, debug_locks = 0
[   66.581505] 2 locks held by insmod/2336:
[   66.582897]  #0:  (rcu_read_lock){......}, at: [<ffffffffc0543005>] test_init+0x5/0x1000 [test]
[   66.586253]  #1:  (lock#4){+.+...}, at: [<ffffffffc0543055>] test_init+0x55/0x1000 [test]
[   66.589135]
stack backtrace:
[   66.590919] CPU: 0 PID: 2336 Comm: insmod Tainted: G           OE   4.11.0-rc4+ #210
[   66.593842] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[   66.597716] Call Trace:
[   66.598598]  dump_stack+0x85/0xc9
[   66.599749]  lockdep_rcu_suspicious+0xe7/0x120
[   66.601289]  ___might_sleep+0xac/0x250
[   66.602585]  ? 0xffffffffc0543000
[   66.603735]  test_init+0x6b/0x1000 [test]
[   66.605121]  ? test_init+0x5/0x1000 [test]
[   66.606520]  do_one_initcall+0x51/0x1c0
[   66.607850]  ? do_init_module+0x27/0x1fc
[   66.609245]  ? rcu_read_lock_sched_held+0x98/0xa0
[   66.610855]  ? kmem_cache_alloc_trace+0x278/0x2e0
[   66.612489]  ? do_init_module+0x27/0x1fc
[   66.613868]  do_init_module+0x60/0x1fc
[   66.615195]  load_module+0x23b4/0x2a00
[   66.616512]  ? __symbol_put+0x70/0x70
[   66.617816]  ? vfs_read+0x12b/0x180
[   66.619074]  SYSC_finit_module+0xa6/0xf0
[   66.620455]  SyS_finit_module+0xe/0x10
[   66.621780]  do_syscall_64+0x6c/0x200
[   66.623067]  entry_SYSCALL64_slow_path+0x25/0x25
[   66.624937] RIP: 0033:0x7f6e6702bbf9
[   66.626206] RSP: 002b:00007ffdc94c3588 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
[   66.628901] RAX: ffffffffffffffda RBX: 00000000007421f0 RCX: 00007f6e6702bbf9
[   66.631373] RDX: 0000000000000000 RSI: 000000000041a2d8 RDI: 0000000000000003
[   66.634430] RBP: 000000000041a2d8 R08: 0000000000000000 R09: 00007ffdc94c3728
[   66.637428] R10: 0000000000000003 R11: 0000000000000206 R12: 0000000000000000
[   66.640422] R13: 0000000000741130 R14: 0000000000000000 R15: 0000000000000000
[   66.643430] BUG: sleeping function called from invalid context at /data/linux/akari/test.c:8
[   66.646779] in_atomic(): 1, irqs_disabled(): 0, pid: 2336, name: insmod
[   66.649484] INFO: lockdep is turned off.
[   66.651284] CPU: 0 PID: 2336 Comm: insmod Tainted: G           OE   4.11.0-rc4+ #210
[   66.654377] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[   66.658592] Call Trace:
[   66.659939]  dump_stack+0x85/0xc9
[   66.661562]  ___might_sleep+0x184/0x250
[   66.663385]  ? 0xffffffffc0543000
[   66.664967]  test_init+0x6b/0x1000 [test]
[   66.666812]  ? test_init+0x5/0x1000 [test]
[   66.668636]  do_one_initcall+0x51/0x1c0
[   66.670399]  ? do_init_module+0x27/0x1fc
[   66.672170]  ? rcu_read_lock_sched_held+0x98/0xa0
[   66.674256]  ? kmem_cache_alloc_trace+0x278/0x2e0
[   66.676308]  ? do_init_module+0x27/0x1fc
[   66.678056]  do_init_module+0x60/0x1fc
[   66.679728]  load_module+0x23b4/0x2a00
[   66.681428]  ? __symbol_put+0x70/0x70
[   66.683100]  ? vfs_read+0x12b/0x180
[   66.684720]  SYSC_finit_module+0xa6/0xf0
[   66.686422]  SyS_finit_module+0xe/0x10
[   66.688109]  do_syscall_64+0x6c/0x200
[   66.689776]  entry_SYSCALL64_slow_path+0x25/0x25
[   66.692052] RIP: 0033:0x7f6e6702bbf9
[   66.693687] RSP: 002b:00007ffdc94c3588 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
[   66.696577] RAX: ffffffffffffffda RBX: 00000000007421f0 RCX: 00007f6e6702bbf9
[   66.699432] RDX: 0000000000000000 RSI: 000000000041a2d8 RDI: 0000000000000003
[   66.702261] RBP: 000000000041a2d8 R08: 0000000000000000 R09: 00007ffdc94c3728
[   66.705076] R10: 0000000000000003 R11: 0000000000000206 R12: 0000000000000000
[   66.707990] R13: 0000000000741130 R14: 0000000000000000 R15: 0000000000000000



Also, if we try below change like suggested at
http://lkml.kernel.org/r/20170323152949.GA29134@bombadil.infradead.org ,
we get below warnings.
Aren't there vfree()/kvfree() users who are not ready to handle these changes?

----------
diff --git a/mm/util.c b/mm/util.c
index 656dc5e..2a2ef72 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -331,6 +331,12 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
 
 void kvfree(const void *addr)
 {
+	/* Detect errors before kvmalloc() falls back to vmalloc(). */
+	if (addr) {
+		WARN_ON(in_nmi());
+		if (likely(!in_interrupt()))
+			might_sleep();
+	}
 	if (is_vmalloc_addr(addr))
 		vfree(addr);
 	else
----------

[   23.635540] BUG: sleeping function called from invalid context at mm/util.c:338
[   23.638701] in_atomic(): 1, irqs_disabled(): 0, pid: 478, name: kworker/0:1H
[   23.641516] 3 locks held by kworker/0:1H/478:
[   23.643476]  #0:  ("xfs-log/%s"mp->m_fsname){.+.+..}, at: [<ffffffffb20d1e64>] process_one_work+0x194/0x6c0
[   23.647176]  #1:  ((&bp->b_ioend_work)){+.+...}, at: [<ffffffffb20d1e64>] process_one_work+0x194/0x6c0
[   23.650939]  #2:  (&(&pag->pagb_lock)->rlock){+.+...}, at: [<ffffffffc02b42ee>] xfs_extent_busy_clear+0x9e/0xe0 [xfs]
[   23.655132] CPU: 0 PID: 478 Comm: kworker/0:1H Not tainted 4.11.0-rc4+ #212
[   23.657974] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[   23.662041] Workqueue: xfs-log/sda1 xfs_buf_ioend_work [xfs]
[   23.664463] Call Trace:
[   23.665866]  dump_stack+0x85/0xc9
[   23.667538]  ___might_sleep+0x184/0x250
[   23.669371]  __might_sleep+0x4a/0x90
[   23.671137]  kvfree+0x41/0x90
[   23.672748]  xfs_extent_busy_clear_one+0x51/0x190 [xfs]
[   23.675110]  xfs_extent_busy_clear+0xbb/0xe0 [xfs]
[   23.677278]  xlog_cil_committed+0x241/0x420 [xfs]
[   23.679431]  xlog_state_do_callback+0x170/0x2d0 [xfs]
[   23.681717]  xlog_state_done_syncing+0x7f/0xa0 [xfs]
[   23.683971]  ? xfs_buf_ioend_work+0x15/0x20 [xfs]
[   23.686112]  xlog_iodone+0x86/0xc0 [xfs]
[   23.688007]  xfs_buf_ioend+0xd3/0x440 [xfs]
[   23.689999]  xfs_buf_ioend_work+0x15/0x20 [xfs]
[   23.692060]  process_one_work+0x21c/0x6c0
[   23.694177]  ? process_one_work+0x194/0x6c0
[   23.696120]  worker_thread+0x137/0x4b0
[   23.697973]  kthread+0x10f/0x150
[   23.699607]  ? process_one_work+0x6c0/0x6c0
[   23.701561]  ? kthread_create_on_node+0x70/0x70
[   23.703617]  ret_from_fork+0x31/0x40

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-03-28 10:09 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-24 10:53 [PATCH] mm: Remove pointless might_sleep() in remove_vm_area() Tetsuo Handa
2017-03-24 12:22 ` Andrey Ryabinin
2017-03-24 12:40   ` Tetsuo Handa
2017-03-24 15:05     ` Andrey Ryabinin
2017-03-24 16:17       ` Matthew Wilcox
2017-03-27 13:26         ` Andrey Ryabinin
2017-03-27 14:06           ` Matthew Wilcox
2017-03-27 14:10           ` Thomas Hellstrom
2017-03-27 14:29             ` Tetsuo Handa
2017-03-27 14:29               ` Tetsuo Handa
2017-03-27 15:02               ` Andrey Ryabinin
2017-03-27 15:02                 ` Andrey Ryabinin
2017-03-28 10:07                 ` Tetsuo Handa
2017-03-28 10:07                   ` Tetsuo Handa
2017-03-24 22:47       ` Tetsuo Handa
2017-03-27 10:16         ` Tetsuo Handa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.