All of lore.kernel.org
 help / color / mirror / Atom feed
* WARNING in percpu_ref_kill_and_confirm (2)
@ 2020-12-16 21:14 syzbot
  2020-12-16 21:46 ` Jens Axboe
       [not found] ` <20201217031742.1072-1-hdanton@sina.com>
  0 siblings, 2 replies; 4+ messages in thread
From: syzbot @ 2020-12-16 21:14 UTC (permalink / raw)
  To: axboe, io-uring, linux-fsdevel, linux-kernel, mingo, mingo,
	peterz, rostedt, syzkaller-bugs, viro, will

Hello,

syzbot found the following issue on:

HEAD commit:    7b1b868e Merge tag 'for-linus' of git://git.kernel.org/pub..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1156046b500000
kernel config:  https://syzkaller.appspot.com/x/.config?x=3416bb960d5c705d
dashboard link: https://syzkaller.appspot.com/bug?extid=c9937dfb2303a5f18640
compiler:       gcc (GCC) 10.1.0-syz 20200507
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1407c287500000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=10ed5f07500000

The issue was bisected to:

commit 4d004099a668c41522242aa146a38cc4eb59cb1e
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Fri Oct 2 09:04:21 2020 +0000

    lockdep: Fix lockdep recursion

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=14e9d433500000
final oops:     https://syzkaller.appspot.com/x/report.txt?x=16e9d433500000
console output: https://syzkaller.appspot.com/x/log.txt?x=12e9d433500000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+c9937dfb2303a5f18640@syzkaller.appspotmail.com
Fixes: 4d004099a668 ("lockdep: Fix lockdep recursion")

RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441309
RDX: 0000000000000002 RSI: 00000000200000c0 RDI: 0000000000003ad1
RBP: 000000000000f2ae R08: 0000000000000002 R09: 00000000004002c8
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004021d0
R13: 0000000000402260 R14: 0000000000000000 R15: 0000000000000000
------------[ cut here ]------------
percpu_ref_kill_and_confirm called more than once on io_ring_ctx_ref_free!
WARNING: CPU: 0 PID: 8476 at lib/percpu-refcount.c:382 percpu_ref_kill_and_confirm+0x126/0x180 lib/percpu-refcount.c:382
Modules linked in:
CPU: 0 PID: 8476 Comm: syz-executor389 Not tainted 5.10.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:percpu_ref_kill_and_confirm+0x126/0x180 lib/percpu-refcount.c:382
Code: 5d 08 48 8d 7b 08 48 89 fa 48 c1 ea 03 80 3c 02 00 75 5d 48 8b 53 08 48 c7 c6 00 4b 9d 89 48 c7 c7 60 4a 9d 89 e8 c6 97 f6 04 <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 89 ea 48 c1 ea 03 80 3c 02
RSP: 0018:ffffc9000b94fe10 EFLAGS: 00010086
RAX: 0000000000000000 RBX: ffff888011da4580 RCX: 0000000000000000
RDX: ffff88801fe84ec0 RSI: ffffffff8158c835 RDI: fffff52001729fb4
RBP: ffff88801539f000 R08: 0000000000000001 R09: ffff8880b9e2011b
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000293
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88802de28758
FS:  00000000014ab880(0000) GS:ffff8880b9e00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2a7046b000 CR3: 0000000023368000 CR4: 0000000000350ef0
Call Trace:
 percpu_ref_kill include/linux/percpu-refcount.h:149 [inline]
 io_ring_ctx_wait_and_kill+0x2b/0x450 fs/io_uring.c:8382
 io_uring_release+0x3e/0x50 fs/io_uring.c:8420
 __fput+0x285/0x920 fs/file_table.c:281
 task_work_run+0xdd/0x190 kernel/task_work.c:151
 tracehook_notify_resume include/linux/tracehook.h:188 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:164 [inline]
 exit_to_user_mode_prepare+0x17e/0x1a0 kernel/entry/common.c:191
 syscall_exit_to_user_mode+0x38/0x260 kernel/entry/common.c:266
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x441309
Code: e8 5c ae 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffed6545d38 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
RAX: fffffffffffffff4 RBX: 0000000000000000 RCX: 0000000000441309
RDX: 0000000000000002 RSI: 00000000200000c0 RDI: 0000000000003ad1
RBP: 000000000000f2ae R08: 0000000000000002 R09: 00000000004002c8
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004021d0
R13: 0000000000402260 R14: 0000000000000000 R15: 0000000000000000


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: WARNING in percpu_ref_kill_and_confirm (2)
  2020-12-16 21:14 WARNING in percpu_ref_kill_and_confirm (2) syzbot
@ 2020-12-16 21:46 ` Jens Axboe
  2020-12-16 22:33   ` Jens Axboe
       [not found] ` <20201217031742.1072-1-hdanton@sina.com>
  1 sibling, 1 reply; 4+ messages in thread
From: Jens Axboe @ 2020-12-16 21:46 UTC (permalink / raw)
  To: syzbot, io-uring, linux-fsdevel, linux-kernel, mingo, mingo,
	peterz, rostedt, syzkaller-bugs, viro, will, Tejun Heo

On 12/16/20 2:14 PM, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    7b1b868e Merge tag 'for-linus' of git://git.kernel.org/pub..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1156046b500000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=3416bb960d5c705d
> dashboard link: https://syzkaller.appspot.com/bug?extid=c9937dfb2303a5f18640
> compiler:       gcc (GCC) 10.1.0-syz 20200507
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1407c287500000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=10ed5f07500000
> 
> The issue was bisected to:
> 
> commit 4d004099a668c41522242aa146a38cc4eb59cb1e
> Author: Peter Zijlstra <peterz@infradead.org>
> Date:   Fri Oct 2 09:04:21 2020 +0000
> 
>     lockdep: Fix lockdep recursion

Ehhh no... This is timing dependent, so probably why it ends up pinpointing
something totally unrelated.

> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=14e9d433500000
> final oops:     https://syzkaller.appspot.com/x/report.txt?x=16e9d433500000
> console output: https://syzkaller.appspot.com/x/log.txt?x=12e9d433500000
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+c9937dfb2303a5f18640@syzkaller.appspotmail.com
> Fixes: 4d004099a668 ("lockdep: Fix lockdep recursion")
> 
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441309
> RDX: 0000000000000002 RSI: 00000000200000c0 RDI: 0000000000003ad1
> RBP: 000000000000f2ae R08: 0000000000000002 R09: 00000000004002c8
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004021d0
> R13: 0000000000402260 R14: 0000000000000000 R15: 0000000000000000
> ------------[ cut here ]------------
> percpu_ref_kill_and_confirm called more than once on io_ring_ctx_ref_free!
> WARNING: CPU: 0 PID: 8476 at lib/percpu-refcount.c:382 percpu_ref_kill_and_confirm+0x126/0x180 lib/percpu-refcount.c:382
> Modules linked in:
> CPU: 0 PID: 8476 Comm: syz-executor389 Not tainted 5.10.0-rc7-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:percpu_ref_kill_and_confirm+0x126/0x180 lib/percpu-refcount.c:382
> Code: 5d 08 48 8d 7b 08 48 89 fa 48 c1 ea 03 80 3c 02 00 75 5d 48 8b 53 08 48 c7 c6 00 4b 9d 89 48 c7 c7 60 4a 9d 89 e8 c6 97 f6 04 <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 89 ea 48 c1 ea 03 80 3c 02
> RSP: 0018:ffffc9000b94fe10 EFLAGS: 00010086
> RAX: 0000000000000000 RBX: ffff888011da4580 RCX: 0000000000000000
> RDX: ffff88801fe84ec0 RSI: ffffffff8158c835 RDI: fffff52001729fb4
> RBP: ffff88801539f000 R08: 0000000000000001 R09: ffff8880b9e2011b
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000293
> R13: 0000000000000000 R14: 0000000000000000 R15: ffff88802de28758
> FS:  00000000014ab880(0000) GS:ffff8880b9e00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2a7046b000 CR3: 0000000023368000 CR4: 0000000000350ef0
> Call Trace:
>  percpu_ref_kill include/linux/percpu-refcount.h:149 [inline]
>  io_ring_ctx_wait_and_kill+0x2b/0x450 fs/io_uring.c:8382
>  io_uring_release+0x3e/0x50 fs/io_uring.c:8420
>  __fput+0x285/0x920 fs/file_table.c:281
>  task_work_run+0xdd/0x190 kernel/task_work.c:151
>  tracehook_notify_resume include/linux/tracehook.h:188 [inline]
>  exit_to_user_mode_loop kernel/entry/common.c:164 [inline]
>  exit_to_user_mode_prepare+0x17e/0x1a0 kernel/entry/common.c:191
>  syscall_exit_to_user_mode+0x38/0x260 kernel/entry/common.c:266
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x441309
> Code: e8 5c ae 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007ffed6545d38 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
> RAX: fffffffffffffff4 RBX: 0000000000000000 RCX: 0000000000441309
> RDX: 0000000000000002 RSI: 00000000200000c0 RDI: 0000000000003ad1
> RBP: 000000000000f2ae R08: 0000000000000002 R09: 00000000004002c8
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004021d0
> R13: 0000000000402260 R14: 0000000000000000 R15: 0000000000000000

What I think happens here is that we switch mode, but exit before it's
done. Wonder if there's a wait to wait for that on exit. Tejun?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: WARNING in percpu_ref_kill_and_confirm (2)
  2020-12-16 21:46 ` Jens Axboe
@ 2020-12-16 22:33   ` Jens Axboe
  0 siblings, 0 replies; 4+ messages in thread
From: Jens Axboe @ 2020-12-16 22:33 UTC (permalink / raw)
  To: syzbot, io-uring, linux-fsdevel, linux-kernel, mingo, mingo,
	peterz, rostedt, syzkaller-bugs, viro, will, Tejun Heo

On 12/16/20 2:46 PM, Jens Axboe wrote:
> On 12/16/20 2:14 PM, syzbot wrote:
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit:    7b1b868e Merge tag 'for-linus' of git://git.kernel.org/pub..
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=1156046b500000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=3416bb960d5c705d
>> dashboard link: https://syzkaller.appspot.com/bug?extid=c9937dfb2303a5f18640
>> compiler:       gcc (GCC) 10.1.0-syz 20200507
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1407c287500000
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=10ed5f07500000
>>
>> The issue was bisected to:
>>
>> commit 4d004099a668c41522242aa146a38cc4eb59cb1e
>> Author: Peter Zijlstra <peterz@infradead.org>
>> Date:   Fri Oct 2 09:04:21 2020 +0000
>>
>>     lockdep: Fix lockdep recursion
> 
> Ehhh no... This is timing dependent, so probably why it ends up pinpointing
> something totally unrelated.
> 
>> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=14e9d433500000
>> final oops:     https://syzkaller.appspot.com/x/report.txt?x=16e9d433500000
>> console output: https://syzkaller.appspot.com/x/log.txt?x=12e9d433500000
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+c9937dfb2303a5f18640@syzkaller.appspotmail.com
>> Fixes: 4d004099a668 ("lockdep: Fix lockdep recursion")
>>
>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441309
>> RDX: 0000000000000002 RSI: 00000000200000c0 RDI: 0000000000003ad1
>> RBP: 000000000000f2ae R08: 0000000000000002 R09: 00000000004002c8
>> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004021d0
>> R13: 0000000000402260 R14: 0000000000000000 R15: 0000000000000000
>> ------------[ cut here ]------------
>> percpu_ref_kill_and_confirm called more than once on io_ring_ctx_ref_free!
>> WARNING: CPU: 0 PID: 8476 at lib/percpu-refcount.c:382 percpu_ref_kill_and_confirm+0x126/0x180 lib/percpu-refcount.c:382
>> Modules linked in:
>> CPU: 0 PID: 8476 Comm: syz-executor389 Not tainted 5.10.0-rc7-syzkaller #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>> RIP: 0010:percpu_ref_kill_and_confirm+0x126/0x180 lib/percpu-refcount.c:382
>> Code: 5d 08 48 8d 7b 08 48 89 fa 48 c1 ea 03 80 3c 02 00 75 5d 48 8b 53 08 48 c7 c6 00 4b 9d 89 48 c7 c7 60 4a 9d 89 e8 c6 97 f6 04 <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 89 ea 48 c1 ea 03 80 3c 02
>> RSP: 0018:ffffc9000b94fe10 EFLAGS: 00010086
>> RAX: 0000000000000000 RBX: ffff888011da4580 RCX: 0000000000000000
>> RDX: ffff88801fe84ec0 RSI: ffffffff8158c835 RDI: fffff52001729fb4
>> RBP: ffff88801539f000 R08: 0000000000000001 R09: ffff8880b9e2011b
>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000293
>> R13: 0000000000000000 R14: 0000000000000000 R15: ffff88802de28758
>> FS:  00000000014ab880(0000) GS:ffff8880b9e00000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007f2a7046b000 CR3: 0000000023368000 CR4: 0000000000350ef0
>> Call Trace:
>>  percpu_ref_kill include/linux/percpu-refcount.h:149 [inline]
>>  io_ring_ctx_wait_and_kill+0x2b/0x450 fs/io_uring.c:8382
>>  io_uring_release+0x3e/0x50 fs/io_uring.c:8420
>>  __fput+0x285/0x920 fs/file_table.c:281
>>  task_work_run+0xdd/0x190 kernel/task_work.c:151
>>  tracehook_notify_resume include/linux/tracehook.h:188 [inline]
>>  exit_to_user_mode_loop kernel/entry/common.c:164 [inline]
>>  exit_to_user_mode_prepare+0x17e/0x1a0 kernel/entry/common.c:191
>>  syscall_exit_to_user_mode+0x38/0x260 kernel/entry/common.c:266
>>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> RIP: 0033:0x441309
>> Code: e8 5c ae 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
>> RSP: 002b:00007ffed6545d38 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
>> RAX: fffffffffffffff4 RBX: 0000000000000000 RCX: 0000000000441309
>> RDX: 0000000000000002 RSI: 00000000200000c0 RDI: 0000000000003ad1
>> RBP: 000000000000f2ae R08: 0000000000000002 R09: 00000000004002c8
>> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004021d0
>> R13: 0000000000402260 R14: 0000000000000000 R15: 0000000000000000
> 
> What I think happens here is that we switch mode, but exit before it's
> done. Wonder if there's a wait to wait for that on exit. Tejun?

Sorry, that was another report... Looks like this one is exercising
memory failure, maybe there's an issue with the teardown path for
certain failures. I'll take a look.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: WARNING in percpu_ref_kill_and_confirm (2)
       [not found] ` <20201217031742.1072-1-hdanton@sina.com>
@ 2020-12-18 16:34   ` Pavel Begunkov
  0 siblings, 0 replies; 4+ messages in thread
From: Pavel Begunkov @ 2020-12-18 16:34 UTC (permalink / raw)
  To: Hillf Danton, syzbot; +Cc: axboe, io-uring, linux-kernel, syzkaller-bugs

On 17/12/2020 03:17, Hillf Danton wrote:
> Wed, 16 Dec 2020 13:14:11 -0800
>> syzbot found the following issue on:
>>
>> HEAD commit:    7b1b868e Merge tag 'for-linus' of git://git.kernel.org/pub..
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=1156046b500000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=3416bb960d5c705d
>> dashboard link: https://syzkaller.appspot.com/bug?extid=c9937dfb2303a5f18640
>> compiler:       gcc (GCC) 10.1.0-syz 20200507
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1407c287500000
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=10ed5f07500000
>>
>> The issue was bisected to:
>>
>> commit 4d004099a668c41522242aa146a38cc4eb59cb1e
>> Author: Peter Zijlstra <peterz@infradead.org>
>> Date:   Fri Oct 2 09:04:21 2020 +0000
>>
>>     lockdep: Fix lockdep recursion
>>
>> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=14e9d433500000
>> final oops:     https://syzkaller.appspot.com/x/report.txt?x=16e9d433500000
>> console output: https://syzkaller.appspot.com/x/log.txt?x=12e9d433500000
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+c9937dfb2303a5f18640@syzkaller.appspotmail.com
>> Fixes: 4d004099a668 ("lockdep: Fix lockdep recursion")
>>
>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441309
>> RDX: 0000000000000002 RSI: 00000000200000c0 RDI: 0000000000003ad1
>> RBP: 000000000000f2ae R08: 0000000000000002 R09: 00000000004002c8
>> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004021d0
>> R13: 0000000000402260 R14: 0000000000000000 R15: 0000000000000000
>> ------------[ cut here ]------------
>> percpu_ref_kill_and_confirm called more than once on io_ring_ctx_ref_free!
>> WARNING: CPU: 0 PID: 8476 at lib/percpu-refcount.c:382 percpu_ref_kill_and_confirm+0x126/0x180 lib/percpu-refcount.c:382
>> Modules linked in:
>> CPU: 0 PID: 8476 Comm: syz-executor389 Not tainted 5.10.0-rc7-syzkaller #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>> RIP: 0010:percpu_ref_kill_and_confirm+0x126/0x180 lib/percpu-refcount.c:382
>> Code: 5d 08 48 8d 7b 08 48 89 fa 48 c1 ea 03 80 3c 02 00 75 5d 48 8b 53 08 48 c7 c6 00 4b 9d 89 48 c7 c7 60 4a 9d 89 e8 c6 97 f6 04 <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 89 ea 48 c1 ea 03 80 3c 02
>> RSP: 0018:ffffc9000b94fe10 EFLAGS: 00010086
>> RAX: 0000000000000000 RBX: ffff888011da4580 RCX: 0000000000000000
>> RDX: ffff88801fe84ec0 RSI: ffffffff8158c835 RDI: fffff52001729fb4
>> RBP: ffff88801539f000 R08: 0000000000000001 R09: ffff8880b9e2011b
>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000293
>> R13: 0000000000000000 R14: 0000000000000000 R15: ffff88802de28758
>> FS:  00000000014ab880(0000) GS:ffff8880b9e00000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007f2a7046b000 CR3: 0000000023368000 CR4: 0000000000350ef0
>> Call Trace:
>>  percpu_ref_kill include/linux/percpu-refcount.h:149 [inline]
>>  io_ring_ctx_wait_and_kill+0x2b/0x450 fs/io_uring.c:8382
>>  io_uring_release+0x3e/0x50 fs/io_uring.c:8420
>>  __fput+0x285/0x920 fs/file_table.c:281
>>  task_work_run+0xdd/0x190 kernel/task_work.c:151
>>  tracehook_notify_resume include/linux/tracehook.h:188 [inline]
>>  exit_to_user_mode_loop kernel/entry/common.c:164 [inline]
>>  exit_to_user_mode_prepare+0x17e/0x1a0 kernel/entry/common.c:191
>>  syscall_exit_to_user_mode+0x38/0x260 kernel/entry/common.c:266
>>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> RIP: 0033:0x441309
>> Code: e8 5c ae 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
>> RSP: 002b:00007ffed6545d38 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
>> RAX: fffffffffffffff4 RBX: 0000000000000000 RCX: 0000000000441309
>> RDX: 0000000000000002 RSI: 00000000200000c0 RDI: 0000000000003ad1
>> RBP: 000000000000f2ae R08: 0000000000000002 R09: 00000000004002c8
>> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004021d0
>> R13: 0000000000402260 R14: 0000000000000000 R15: 0000000000000000
> 
> Avoid double kill by checking ctx health.

Let's focus on _how_ it can happen. Refs may be killed by
__io_uring_register(), but this one holds a ref to the file, so
io_uring_release() -> io_ring_ctx_wait_and_kill() shouldn't even happen.
And when io_ring_ctx_wait_and_kill() is called fdget() for that ring
wouldn't be possible. That's if no other bugs are present.

We want to solve a problem rather than mask it. So, can it really
happen or a problem is somewhere else?

> 
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -8379,7 +8379,13 @@ static void io_ring_exit_work(struct wor
>  static void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
>  {
>  	mutex_lock(&ctx->uring_lock);
> -	percpu_ref_kill(&ctx->refs);
> +	/*
> +	 * try to avoid killing dead ctx, see the comments for dropping
> +	 * ring mutex in __io_uring_register()
> +	 */
> +	if (!percpu_ref_is_dying(&ctx->refs))
> +		percpu_ref_kill(&ctx->refs);
> +
>  	mutex_unlock(&ctx->uring_lock);
>  
>  	io_kill_timeouts(ctx, NULL);
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-12-18 16:39 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-16 21:14 WARNING in percpu_ref_kill_and_confirm (2) syzbot
2020-12-16 21:46 ` Jens Axboe
2020-12-16 22:33   ` Jens Axboe
     [not found] ` <20201217031742.1072-1-hdanton@sina.com>
2020-12-18 16:34   ` Pavel Begunkov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.