linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* possible deadlock in uprobe_clear_state (2)
@ 2020-07-15 19:59 syzbot
  2020-07-15 23:36 ` [PATCH] binder: Don't use mmput() from shrinker function Tetsuo Handa
  0 siblings, 1 reply; 10+ messages in thread
From: syzbot @ 2020-07-15 19:59 UTC (permalink / raw)
  To: acme, alexander.shishkin, jolsa, linux-kernel, mark.rutland,
	mingo, namhyung, peterz, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    a581387e Merge tag 'io_uring-5.8-2020-07-10' of git://git...
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=111a004f100000
kernel config:  https://syzkaller.appspot.com/x/.config?x=66ad203c2bb6d8b
dashboard link: https://syzkaller.appspot.com/bug?extid=e5344baa319c9a96edec
compiler:       gcc (GCC) 10.1.0-syz 20200507

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+e5344baa319c9a96edec@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
5.8.0-rc4-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor.1/4807 is trying to acquire lock:
ffffffff89c3a988 (delayed_uprobe_lock){+.+.}-{3:3}, at: uprobe_clear_state+0x47/0x3b0 kernel/events/uprobes.c:1550

but task is already holding lock:
ffffffff89c6c360 (fs_reclaim){+.+.}-{0:0}, at: fs_reclaim_release mm/page_alloc.c:4202 [inline]
ffffffff89c6c360 (fs_reclaim){+.+.}-{0:0}, at: fs_reclaim_release mm/page_alloc.c:4198 [inline]
ffffffff89c6c360 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:4227 [inline]
ffffffff89c6c360 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:4244 [inline]
ffffffff89c6c360 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x1554/0x2780 mm/page_alloc.c:4650

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (fs_reclaim){+.+.}-{0:0}:
       __fs_reclaim_acquire mm/page_alloc.c:4183 [inline]
       fs_reclaim_acquire+0x2f/0x40 mm/page_alloc.c:4194
       slab_pre_alloc_hook mm/slab.h:564 [inline]
       slab_alloc mm/slab.c:3306 [inline]
       kmem_cache_alloc_trace+0x29/0x2d0 mm/slab.c:3549
       kmalloc include/linux/slab.h:555 [inline]
       kzalloc include/linux/slab.h:669 [inline]
       delayed_uprobe_add kernel/events/uprobes.c:304 [inline]
       update_ref_ctr+0x4b6/0x700 kernel/events/uprobes.c:438
       uprobe_write_opcode+0xe07/0x1650 kernel/events/uprobes.c:497
       install_breakpoint kernel/events/uprobes.c:915 [inline]
       install_breakpoint.isra.0+0x5a5/0x7c0 kernel/events/uprobes.c:897
       uprobe_mmap+0x5d7/0x1050 kernel/events/uprobes.c:1394
       mmap_region+0x5cf/0x1590 mm/mmap.c:1818
       do_mmap+0xca8/0x1170 mm/mmap.c:1545
       do_mmap_pgoff include/linux/mm.h:2596 [inline]
       vm_mmap_pgoff+0x197/0x200 mm/util.c:506
       ksys_mmap_pgoff+0x455/0x5a0 mm/mmap.c:1595
       do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (delayed_uprobe_lock){+.+.}-{3:3}:
       check_prev_add kernel/locking/lockdep.c:2496 [inline]
       check_prevs_add kernel/locking/lockdep.c:2601 [inline]
       validate_chain kernel/locking/lockdep.c:3218 [inline]
       __lock_acquire+0x2acb/0x56e0 kernel/locking/lockdep.c:4380
       lock_acquire+0x1f1/0xad0 kernel/locking/lockdep.c:4959
       __mutex_lock_common kernel/locking/mutex.c:956 [inline]
       __mutex_lock+0x134/0x10d0 kernel/locking/mutex.c:1103
       uprobe_clear_state+0x47/0x3b0 kernel/events/uprobes.c:1550
       __mmput+0x73/0x470 kernel/fork.c:1089
       mmput+0x53/0x60 kernel/fork.c:1114
       binder_alloc_free_page+0x441/0xf90 drivers/android/binder_alloc.c:950
       __list_lru_walk_one+0x178/0x5c0 mm/list_lru.c:222
       list_lru_walk_one mm/list_lru.c:266 [inline]
       list_lru_walk_node+0x67/0x2a0 mm/list_lru.c:295
       list_lru_walk include/linux/list_lru.h:215 [inline]
       binder_shrink_scan+0x123/0x190 drivers/android/binder_alloc.c:984
       do_shrink_slab+0x3c6/0xab0 mm/vmscan.c:518
       shrink_slab+0x16f/0x5c0 mm/vmscan.c:679
       shrink_node_memcgs mm/vmscan.c:2658 [inline]
       shrink_node+0x519/0x1b60 mm/vmscan.c:2770
       shrink_zones mm/vmscan.c:2973 [inline]
       do_try_to_free_pages+0x38b/0x1340 mm/vmscan.c:3026
       try_to_free_pages+0x29a/0x8b0 mm/vmscan.c:3265
       __perform_reclaim mm/page_alloc.c:4223 [inline]
       __alloc_pages_direct_reclaim mm/page_alloc.c:4244 [inline]
       __alloc_pages_slowpath.constprop.0+0x949/0x2780 mm/page_alloc.c:4650
       __alloc_pages_nodemask+0x68f/0x930 mm/page_alloc.c:4863
       alloc_pages_current+0x187/0x280 mm/mempolicy.c:2292
       alloc_pages include/linux/gfp.h:545 [inline]
       alloc_mmu_pages+0x7f/0x170 arch/x86/kvm/mmu/mmu.c:5671
       kvm_mmu_create+0x3cb/0x560 arch/x86/kvm/mmu/mmu.c:5704
       kvm_arch_vcpu_create+0x16d/0xb70 arch/x86/kvm/x86.c:9433
       kvm_vm_ioctl_create_vcpu arch/x86/kvm/../../../virt/kvm/kvm_main.c:3060 [inline]
       kvm_vm_ioctl+0x1547/0x23c0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3621
       vfs_ioctl fs/ioctl.c:48 [inline]
       ksys_ioctl+0x11a/0x180 fs/ioctl.c:753
       __do_sys_ioctl fs/ioctl.c:762 [inline]
       __se_sys_ioctl fs/ioctl.c:760 [inline]
       __x64_sys_ioctl+0x6f/0xb0 fs/ioctl.c:760
       do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(fs_reclaim);
                               lock(delayed_uprobe_lock);
                               lock(fs_reclaim);
  lock(delayed_uprobe_lock);

 *** DEADLOCK ***

3 locks held by syz-executor.1/4807:
 #0: ffffffff89c6c360 (fs_reclaim){+.+.}-{0:0}, at: fs_reclaim_release mm/page_alloc.c:4202 [inline]
 #0: ffffffff89c6c360 (fs_reclaim){+.+.}-{0:0}, at: fs_reclaim_release mm/page_alloc.c:4198 [inline]
 #0: ffffffff89c6c360 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:4227 [inline]
 #0: ffffffff89c6c360 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:4244 [inline]
 #0: ffffffff89c6c360 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x1554/0x2780 mm/page_alloc.c:4650
 #1: ffffffff89c46b90 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0xc7/0x5c0 mm/vmscan.c:669
 #2: ffff88808ea481e0 (&alloc->mutex){+.+.}-{3:3}, at: binder_alloc_free_page+0x4f/0xf90 drivers/android/binder_alloc.c:923

stack backtrace:
CPU: 0 PID: 4807 Comm: syz-executor.1 Not tainted 5.8.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x18f/0x20d lib/dump_stack.c:118
 check_noncircular+0x324/0x3e0 kernel/locking/lockdep.c:1827
 check_prev_add kernel/locking/lockdep.c:2496 [inline]
 check_prevs_add kernel/locking/lockdep.c:2601 [inline]
 validate_chain kernel/locking/lockdep.c:3218 [inline]
 __lock_acquire+0x2acb/0x56e0 kernel/locking/lockdep.c:4380
 lock_acquire+0x1f1/0xad0 kernel/locking/lockdep.c:4959
 __mutex_lock_common kernel/locking/mutex.c:956 [inline]
 __mutex_lock+0x134/0x10d0 kernel/locking/mutex.c:1103
 uprobe_clear_state+0x47/0x3b0 kernel/events/uprobes.c:1550
 __mmput+0x73/0x470 kernel/fork.c:1089
 mmput+0x53/0x60 kernel/fork.c:1114
 binder_alloc_free_page+0x441/0xf90 drivers/android/binder_alloc.c:950
 __list_lru_walk_one+0x178/0x5c0 mm/list_lru.c:222
 list_lru_walk_one mm/list_lru.c:266 [inline]
 list_lru_walk_node+0x67/0x2a0 mm/list_lru.c:295
 list_lru_walk include/linux/list_lru.h:215 [inline]
 binder_shrink_scan+0x123/0x190 drivers/android/binder_alloc.c:984
 do_shrink_slab+0x3c6/0xab0 mm/vmscan.c:518
 shrink_slab+0x16f/0x5c0 mm/vmscan.c:679
 shrink_node_memcgs mm/vmscan.c:2658 [inline]
 shrink_node+0x519/0x1b60 mm/vmscan.c:2770
 shrink_zones mm/vmscan.c:2973 [inline]
 do_try_to_free_pages+0x38b/0x1340 mm/vmscan.c:3026
 try_to_free_pages+0x29a/0x8b0 mm/vmscan.c:3265
 __perform_reclaim mm/page_alloc.c:4223 [inline]
 __alloc_pages_direct_reclaim mm/page_alloc.c:4244 [inline]
 __alloc_pages_slowpath.constprop.0+0x949/0x2780 mm/page_alloc.c:4650
 __alloc_pages_nodemask+0x68f/0x930 mm/page_alloc.c:4863
 alloc_pages_current+0x187/0x280 mm/mempolicy.c:2292
 alloc_pages include/linux/gfp.h:545 [inline]
 alloc_mmu_pages+0x7f/0x170 arch/x86/kvm/mmu/mmu.c:5671
 kvm_mmu_create+0x3cb/0x560 arch/x86/kvm/mmu/mmu.c:5704
 kvm_arch_vcpu_create+0x16d/0xb70 arch/x86/kvm/x86.c:9433
 kvm_vm_ioctl_create_vcpu arch/x86/kvm/../../../virt/kvm/kvm_main.c:3060 [inline]
 kvm_vm_ioctl+0x1547/0x23c0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3621
 vfs_ioctl fs/ioctl.c:48 [inline]
 ksys_ioctl+0x11a/0x180 fs/ioctl.c:753
 __do_sys_ioctl fs/ioctl.c:762 [inline]
 __se_sys_ioctl fs/ioctl.c:760 [inline]
 __x64_sys_ioctl+0x6f/0xb0 fs/ioctl.c:760
 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45cba9
Code: Bad RIP value.
RSP: 002b:00007f45444f8c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000004e85c0 RCX: 000000000045cba9
RDX: 0000000000000000 RSI: 000000000000ae41 RDI: 0000000000000004
RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000000003a3 R14: 00000000004c652f R15: 00007f45444f96d4


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] binder: Don't use mmput() from shrinker function.
  2020-07-15 19:59 possible deadlock in uprobe_clear_state (2) syzbot
@ 2020-07-15 23:36 ` Tetsuo Handa
  2020-07-16  8:35   ` Michal Hocko
  0 siblings, 1 reply; 10+ messages in thread
From: Tetsuo Handa @ 2020-07-15 23:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Arve Hjonnevag, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Michal Hocko
  Cc: syzbot, acme, alexander.shishkin, jolsa, linux-kernel,
	mark.rutland, mingo, namhyung, peterz, syzkaller-bugs,
	open list:ANDROID DRIVERS, linux-mm

syzbot is reporting that mmput() from shrinker function has a risk of
deadlock [1]. Don't start synchronous teardown of mm when called from
shrinker function.

[1] https://syzkaller.appspot.com/bug?id=bc9e7303f537c41b2b0cc2dfcea3fc42964c2d45

Reported-by: syzbot <syzbot+1068f09c44d151250c33@syzkaller.appspotmail.com>
Reported-by: syzbot <syzbot+e5344baa319c9a96edec@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 drivers/android/binder_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 42c672f1584e..cbe6aa77d50d 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -947,7 +947,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
 		trace_binder_unmap_user_end(alloc, index);
 	}
 	mmap_read_unlock(mm);
-	mmput(mm);
+	mmput_async(mm);
 
 	trace_binder_unmap_kernel_start(alloc, index);
 
-- 
2.18.4



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] binder: Don't use mmput() from shrinker function.
  2020-07-15 23:36 ` [PATCH] binder: Don't use mmput() from shrinker function Tetsuo Handa
@ 2020-07-16  8:35   ` Michal Hocko
  2020-07-16 13:41     ` Tetsuo Handa
  0 siblings, 1 reply; 10+ messages in thread
From: Michal Hocko @ 2020-07-16  8:35 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Greg Kroah-Hartman, Arve Hjonnevag, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, syzbot, acme,
	alexander.shishkin, jolsa, linux-kernel, mark.rutland, mingo,
	namhyung, peterz, syzkaller-bugs, open list:ANDROID DRIVERS,
	linux-mm

On Thu 16-07-20 08:36:52, Tetsuo Handa wrote:
> syzbot is reporting that mmput() from shrinker function has a risk of
> deadlock [1]. Don't start synchronous teardown of mm when called from
> shrinker function.

Please add the actual lock dependency to the changelog.

Anyway is this deadlock real? Mayve I have missed some details but the
call graph points to these two paths.
uprobe_mmap					do_shrink_slab	
  uprobes_mmap_hash #lock
  install_breakpoint				  binder_shrink_scan
    set_swbp					    binder_alloc_free_page
      uprobe_write_opcode			      __mmput
	update_ref_ctr				        uprobe_clear_state
    	  mutex_lock(&delayed_uprobe_lock)	          mutex_lock(&delayed_uprobe_lock);
	    allocation -> reclaim

But in order for this to happen the shrinker would have to do the last
put on the mm. But mm cannot go away from under uprobe_mmap so those two
paths cannot race with each other.

Unless I am missing something this is a false positive. I do not mind
using mmput_async from the shrinker as a workaround but the changelog
should be explicit about the fact.

> [1] https://syzkaller.appspot.com/bug?id=bc9e7303f537c41b2b0cc2dfcea3fc42964c2d45
> 
> Reported-by: syzbot <syzbot+1068f09c44d151250c33@syzkaller.appspotmail.com>
> Reported-by: syzbot <syzbot+e5344baa319c9a96edec@syzkaller.appspotmail.com>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> ---
>  drivers/android/binder_alloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> index 42c672f1584e..cbe6aa77d50d 100644
> --- a/drivers/android/binder_alloc.c
> +++ b/drivers/android/binder_alloc.c
> @@ -947,7 +947,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
>  		trace_binder_unmap_user_end(alloc, index);
>  	}
>  	mmap_read_unlock(mm);
> -	mmput(mm);
> +	mmput_async(mm);
>  
>  	trace_binder_unmap_kernel_start(alloc, index);
>  
> -- 
> 2.18.4
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] binder: Don't use mmput() from shrinker function.
  2020-07-16  8:35   ` Michal Hocko
@ 2020-07-16 13:41     ` Tetsuo Handa
  2020-07-16 13:54       ` Michal Hocko
  0 siblings, 1 reply; 10+ messages in thread
From: Tetsuo Handa @ 2020-07-16 13:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Greg Kroah-Hartman, Arve Hjonnevag, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, syzbot, acme,
	alexander.shishkin, jolsa, linux-kernel, mark.rutland, mingo,
	namhyung, peterz, syzkaller-bugs, open list:ANDROID DRIVERS,
	linux-mm

On 2020/07/16 17:35, Michal Hocko wrote:
> On Thu 16-07-20 08:36:52, Tetsuo Handa wrote:
>> syzbot is reporting that mmput() from shrinker function has a risk of
>> deadlock [1]. Don't start synchronous teardown of mm when called from
>> shrinker function.
> 
> Please add the actual lock dependency to the changelog.
> 
> Anyway is this deadlock real? Mayve I have missed some details but the
> call graph points to these two paths.
> uprobe_mmap					do_shrink_slab	
>   uprobes_mmap_hash #lock
>   install_breakpoint				  binder_shrink_scan
>     set_swbp					    binder_alloc_free_page
>       uprobe_write_opcode			      __mmput
> 	update_ref_ctr				        uprobe_clear_state
>     	  mutex_lock(&delayed_uprobe_lock)	          mutex_lock(&delayed_uprobe_lock);
> 	    allocation -> reclaim
> 

static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm, short d) {
  mutex_lock(&delayed_uprobe_lock);
  ret = delayed_uprobe_add(uprobe, mm1) {
    du = kzalloc(sizeof(*du), GFP_KERNEL) {
      do_shrink_slab() {
        binder_shrink_scan() {
          binder_alloc_free_page() {
            mmget_not_zero(mm2);
            mmput(mm2) {
              __mmput(mm2) {
                uprobe_clear_state(mm2) {
                  mutex_lock(&delayed_uprobe_lock);
                  delayed_uprobe_remove(NULL, mm2);
                  mutex_unlock(&delayed_uprobe_lock);
                }
              }
            }
          }
        }
      }
    }
  }
  mutex_unlock(&delayed_uprobe_lock);
}

> But in order for this to happen the shrinker would have to do the last
> put on the mm. But mm cannot go away from under uprobe_mmap so those two
> paths cannot race with each other.

and mm1 != mm2 is possible, isn't it?

> 
> Unless I am missing something this is a false positive. I do not mind
> using mmput_async from the shrinker as a workaround but the changelog
> should be explicit about the fact.
> 

binder_alloc_free_page() is already using mmput_async() 14 lines later.
It just took 18 months to hit this race for the third time, for it is
quite difficult to let the owner of mm2 to call mmput(mm2) between
binder_alloc_free_page() calls mmget_not_zero(mm2) and mmput(mm2).

The reason I added you is to see whether we can do

 void mmput(struct mm_struct *mm)
 {
 	might_sleep();
+	/* Calling mmput() from shrinker context can deadlock. */
+	WARN_ON(current->flags & PF_MEMALLOC);
 
 	if (atomic_dec_and_test(&mm->mm_users))
 		__mmput(mm);
 }

in order to catch this bug easier.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] binder: Don't use mmput() from shrinker function.
  2020-07-16 13:41     ` Tetsuo Handa
@ 2020-07-16 13:54       ` Michal Hocko
  2020-07-16 15:12         ` [PATCH v2] " Tetsuo Handa
  0 siblings, 1 reply; 10+ messages in thread
From: Michal Hocko @ 2020-07-16 13:54 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Greg Kroah-Hartman, Arve Hjonnevag, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, syzbot, acme,
	alexander.shishkin, jolsa, linux-kernel, mark.rutland, mingo,
	namhyung, peterz, syzkaller-bugs, open list:ANDROID DRIVERS,
	linux-mm

On Thu 16-07-20 22:41:14, Tetsuo Handa wrote:
> On 2020/07/16 17:35, Michal Hocko wrote:
[...]
> > But in order for this to happen the shrinker would have to do the last
> > put on the mm. But mm cannot go away from under uprobe_mmap so those two
> > paths cannot race with each other.
> 
> and mm1 != mm2 is possible, isn't it?

OK, I have missed that information. You are right. Can you make this
into the changelog please?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2] binder: Don't use mmput() from shrinker function.
  2020-07-16 13:54       ` Michal Hocko
@ 2020-07-16 15:12         ` Tetsuo Handa
  2020-07-16 15:17           ` Michal Hocko
  0 siblings, 1 reply; 10+ messages in thread
From: Tetsuo Handa @ 2020-07-16 15:12 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Arve Hjonnevag, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner
  Cc: Michal Hocko, syzbot, acme, alexander.shishkin, jolsa,
	linux-kernel, mark.rutland, mingo, namhyung, peterz,
	syzkaller-bugs, open list:ANDROID DRIVERS, linux-mm

syzbot is reporting that mmput() from shrinker function has a risk of
deadlock [1], for delayed_uprobe_add() from update_ref_ctr() calls
kzalloc(GFP_KERNEL) with delayed_uprobe_lock held, and
uprobe_clear_state() from __mmput() also holds delayed_uprobe_lock.

Commit a1b2289cef92ef0e ("android: binder: drop lru lock in isolate
callback") replaced mmput() with mmput_async() in order to avoid sleeping
with spinlock held. But this patch replaces mmput() with mmput_async() in
order not to start __mmput() from shrinker context.

[1] https://syzkaller.appspot.com/bug?id=bc9e7303f537c41b2b0cc2dfcea3fc42964c2d45

Reported-by: syzbot <syzbot+1068f09c44d151250c33@syzkaller.appspotmail.com>
Reported-by: syzbot <syzbot+e5344baa319c9a96edec@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 drivers/android/binder_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 42c672f1584e..cbe6aa77d50d 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -947,7 +947,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
 		trace_binder_unmap_user_end(alloc, index);
 	}
 	mmap_read_unlock(mm);
-	mmput(mm);
+	mmput_async(mm);
 
 	trace_binder_unmap_kernel_start(alloc, index);
 
-- 
2.18.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] binder: Don't use mmput() from shrinker function.
  2020-07-16 15:12         ` [PATCH v2] " Tetsuo Handa
@ 2020-07-16 15:17           ` Michal Hocko
  2020-07-16 16:29             ` Christian Brauner
  2020-07-16 23:53             ` Todd Kjos
  0 siblings, 2 replies; 10+ messages in thread
From: Michal Hocko @ 2020-07-16 15:17 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Greg Kroah-Hartman, Arve Hjonnevag, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, syzbot, acme,
	alexander.shishkin, jolsa, linux-kernel, mark.rutland, mingo,
	namhyung, peterz, syzkaller-bugs, open list:ANDROID DRIVERS,
	linux-mm

On Fri 17-07-20 00:12:15, Tetsuo Handa wrote:
> syzbot is reporting that mmput() from shrinker function has a risk of
> deadlock [1], for delayed_uprobe_add() from update_ref_ctr() calls
> kzalloc(GFP_KERNEL) with delayed_uprobe_lock held, and
> uprobe_clear_state() from __mmput() also holds delayed_uprobe_lock.
> 
> Commit a1b2289cef92ef0e ("android: binder: drop lru lock in isolate
> callback") replaced mmput() with mmput_async() in order to avoid sleeping
> with spinlock held. But this patch replaces mmput() with mmput_async() in
> order not to start __mmput() from shrinker context.
> 
> [1] https://syzkaller.appspot.com/bug?id=bc9e7303f537c41b2b0cc2dfcea3fc42964c2d45
> 
> Reported-by: syzbot <syzbot+1068f09c44d151250c33@syzkaller.appspotmail.com>
> Reported-by: syzbot <syzbot+e5344baa319c9a96edec@syzkaller.appspotmail.com>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

Reviewed-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  drivers/android/binder_alloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> index 42c672f1584e..cbe6aa77d50d 100644
> --- a/drivers/android/binder_alloc.c
> +++ b/drivers/android/binder_alloc.c
> @@ -947,7 +947,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
>  		trace_binder_unmap_user_end(alloc, index);
>  	}
>  	mmap_read_unlock(mm);
> -	mmput(mm);
> +	mmput_async(mm);
>  
>  	trace_binder_unmap_kernel_start(alloc, index);
>  
> -- 
> 2.18.4

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] binder: Don't use mmput() from shrinker function.
  2020-07-16 15:17           ` Michal Hocko
@ 2020-07-16 16:29             ` Christian Brauner
  2020-07-16 22:27               ` Tetsuo Handa
  2020-07-16 23:53             ` Todd Kjos
  1 sibling, 1 reply; 10+ messages in thread
From: Christian Brauner @ 2020-07-16 16:29 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, Greg Kroah-Hartman, Arve Hjonnevag, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner, syzbot, acme,
	alexander.shishkin, jolsa, linux-kernel, mark.rutland, mingo,
	namhyung, peterz, syzkaller-bugs, open list:ANDROID DRIVERS,
	linux-mm

On Thu, Jul 16, 2020 at 05:17:56PM +0200, Michal Hocko wrote:
> On Fri 17-07-20 00:12:15, Tetsuo Handa wrote:
> > syzbot is reporting that mmput() from shrinker function has a risk of
> > deadlock [1], for delayed_uprobe_add() from update_ref_ctr() calls
> > kzalloc(GFP_KERNEL) with delayed_uprobe_lock held, and
> > uprobe_clear_state() from __mmput() also holds delayed_uprobe_lock.
> > 
> > Commit a1b2289cef92ef0e ("android: binder: drop lru lock in isolate
> > callback") replaced mmput() with mmput_async() in order to avoid sleeping
> > with spinlock held. But this patch replaces mmput() with mmput_async() in
> > order not to start __mmput() from shrinker context.
> > 
> > [1] https://syzkaller.appspot.com/bug?id=bc9e7303f537c41b2b0cc2dfcea3fc42964c2d45
> > 
> > Reported-by: syzbot <syzbot+1068f09c44d151250c33@syzkaller.appspotmail.com>
> > Reported-by: syzbot <syzbot+e5344baa319c9a96edec@syzkaller.appspotmail.com>
> > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> 
> Reviewed-by: Michal Hocko <mhocko@suse.com>

Thanks for the careful review Michal!
Does this need a Cc: stable?

Otherwise

Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

Thanks!
Christian

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] binder: Don't use mmput() from shrinker function.
  2020-07-16 16:29             ` Christian Brauner
@ 2020-07-16 22:27               ` Tetsuo Handa
  0 siblings, 0 replies; 10+ messages in thread
From: Tetsuo Handa @ 2020-07-16 22:27 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Michal Hocko, Greg Kroah-Hartman, Arve Hjonnevag, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner, syzbot, acme,
	alexander.shishkin, jolsa, linux-kernel, mark.rutland, mingo,
	namhyung, peterz, syzkaller-bugs, open list:ANDROID DRIVERS,
	linux-mm

On 2020/07/17 1:29, Christian Brauner wrote:
> Does this need a Cc: stable?

Up to someone who applies this patch. I think this race is hard to hit.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] binder: Don't use mmput() from shrinker function.
  2020-07-16 15:17           ` Michal Hocko
  2020-07-16 16:29             ` Christian Brauner
@ 2020-07-16 23:53             ` Todd Kjos
  1 sibling, 0 replies; 10+ messages in thread
From: Todd Kjos @ 2020-07-16 23:53 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, Greg Kroah-Hartman, Arve Hjonnevag, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner, syzbot, acme,
	alexander.shishkin, jolsa, LKML, Mark Rutland, Ingo Molnar,
	namhyung, Peter Zijlstra, syzkaller-bugs,
	open list:ANDROID DRIVERS, linux-mm

On Thu, Jul 16, 2020 at 8:18 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 17-07-20 00:12:15, Tetsuo Handa wrote:
> > syzbot is reporting that mmput() from shrinker function has a risk of
> > deadlock [1], for delayed_uprobe_add() from update_ref_ctr() calls
> > kzalloc(GFP_KERNEL) with delayed_uprobe_lock held, and
> > uprobe_clear_state() from __mmput() also holds delayed_uprobe_lock.
> >
> > Commit a1b2289cef92ef0e ("android: binder: drop lru lock in isolate
> > callback") replaced mmput() with mmput_async() in order to avoid sleeping
> > with spinlock held. But this patch replaces mmput() with mmput_async() in
> > order not to start __mmput() from shrinker context.
> >
> > [1] https://syzkaller.appspot.com/bug?id=bc9e7303f537c41b2b0cc2dfcea3fc42964c2d45
> >
> > Reported-by: syzbot <syzbot+1068f09c44d151250c33@syzkaller.appspotmail.com>
> > Reported-by: syzbot <syzbot+e5344baa319c9a96edec@syzkaller.appspotmail.com>
> > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>
> Reviewed-by: Michal Hocko <mhocko@suse.com>

Acked-by: Todd Kjos <tkjos@google.com>

>
> Thanks!
>
> > ---
> >  drivers/android/binder_alloc.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> > index 42c672f1584e..cbe6aa77d50d 100644
> > --- a/drivers/android/binder_alloc.c
> > +++ b/drivers/android/binder_alloc.c
> > @@ -947,7 +947,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
> >               trace_binder_unmap_user_end(alloc, index);
> >       }
> >       mmap_read_unlock(mm);
> > -     mmput(mm);
> > +     mmput_async(mm);
> >
> >       trace_binder_unmap_kernel_start(alloc, index);
> >
> > --
> > 2.18.4
>
> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-07-16 23:53 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-15 19:59 possible deadlock in uprobe_clear_state (2) syzbot
2020-07-15 23:36 ` [PATCH] binder: Don't use mmput() from shrinker function Tetsuo Handa
2020-07-16  8:35   ` Michal Hocko
2020-07-16 13:41     ` Tetsuo Handa
2020-07-16 13:54       ` Michal Hocko
2020-07-16 15:12         ` [PATCH v2] " Tetsuo Handa
2020-07-16 15:17           ` Michal Hocko
2020-07-16 16:29             ` Christian Brauner
2020-07-16 22:27               ` Tetsuo Handa
2020-07-16 23:53             ` Todd Kjos

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).