mptcp.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* [syzbot] INFO: rcu detected stall in sys_recvmmsg
@ 2021-09-22  0:13 syzbot
  2021-09-22 15:13 ` Paolo Abeni
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: syzbot @ 2021-09-22  0:13 UTC (permalink / raw)
  To: axboe, christian.brauner, davem, dkadashev, kuba, linux-fsdevel,
	linux-kernel, mathew.j.martineau, matthieu.baerts, mptcp, netdev,
	syzkaller-bugs, torvalds, viro

Hello,

syzbot found the following issue on:

HEAD commit:    1f77990c4b79 Add linux-next specific files for 20210920
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1383891d300000
kernel config:  https://syzkaller.appspot.com/x/.config?x=ab1346371f2e6884
dashboard link: https://syzkaller.appspot.com/bug?extid=3360da629681aa0d22fe
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1625f1ab300000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17eb1b3b300000

The issue was bisected to:

commit 020250f31c4c75ac7687a673e29c00786582a5f4
Author: Dmitry Kadashev <dkadashev@gmail.com>
Date:   Thu Jul 8 06:34:43 2021 +0000

    namei: make do_linkat() take struct filename

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=15f5ef77300000
final oops:     https://syzkaller.appspot.com/x/report.txt?x=17f5ef77300000
console output: https://syzkaller.appspot.com/x/log.txt?x=13f5ef77300000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+3360da629681aa0d22fe@syzkaller.appspotmail.com
Fixes: 020250f31c4c ("namei: make do_linkat() take struct filename")

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 	0-...!: (10499 ticks this GP) idle=0af/1/0x4000000000000000 softirq=10678/10678 fqs=1 
	(t=10500 jiffies g=13089 q=109)
rcu: rcu_preempt kthread starved for 10497 jiffies! g13089 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:R  running task     stack:28696 pid:   14 ppid:     2 flags:0x00004000
Call Trace:
 context_switch kernel/sched/core.c:4955 [inline]
 __schedule+0x940/0x26f0 kernel/sched/core.c:6236
 schedule+0xd3/0x270 kernel/sched/core.c:6315
 schedule_timeout+0x14a/0x2a0 kernel/time/timer.c:1881
 rcu_gp_fqs_loop+0x186/0x810 kernel/rcu/tree.c:1955
 rcu_gp_kthread+0x1de/0x320 kernel/rcu/tree.c:2128
 kthread+0x405/0x4f0 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
rcu: Stack dump where RCU GP kthread last ran:
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 8510 Comm: syz-executor827 Not tainted 5.15.0-rc2-next-20210920-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:84 [inline]
RIP: 0010:memory_is_nonzero mm/kasan/generic.c:102 [inline]
RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:128 [inline]
RIP: 0010:memory_is_poisoned mm/kasan/generic.c:159 [inline]
RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline]
RIP: 0010:kasan_check_range+0xc8/0x180 mm/kasan/generic.c:189
Code: 38 00 74 ed 48 8d 50 08 eb 09 48 83 c0 01 48 39 d0 74 7a 80 38 00 74 f2 48 89 c2 b8 01 00 00 00 48 85 d2 75 56 5b 5d 41 5c c3 <48> 85 d2 74 5e 48 01 ea eb 09 48 83 c0 01 48 39 d0 74 50 80 38 00
RSP: 0018:ffffc9000cd676c8 EFLAGS: 00000283
RAX: ffffed100e9a110e RBX: ffffed100e9a110f RCX: ffffffff88ea062a
RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff888074d08870
RBP: ffffed100e9a110e R08: 0000000000000001 R09: ffff888074d08877
R10: ffffed100e9a110e R11: 0000000000000000 R12: ffff888074d08000
R13: ffff888074d08000 R14: ffff888074d08088 R15: ffff888074d08000
FS:  0000555556d8e300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000180 CR3: 0000000068909000 CR4: 00000000001506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
 test_and_clear_bit include/asm-generic/bitops/instrumented-atomic.h:83 [inline]
 mptcp_release_cb+0x14a/0x210 net/mptcp/protocol.c:3016
 release_sock+0xb4/0x1b0 net/core/sock.c:3204
 mptcp_wait_data net/mptcp/protocol.c:1770 [inline]
 mptcp_recvmsg+0xfd1/0x27b0 net/mptcp/protocol.c:2080
 inet6_recvmsg+0x11b/0x5e0 net/ipv6/af_inet6.c:659
 sock_recvmsg_nosec net/socket.c:944 [inline]
 ____sys_recvmsg+0x527/0x600 net/socket.c:2626
 ___sys_recvmsg+0x127/0x200 net/socket.c:2670
 do_recvmmsg+0x24d/0x6d0 net/socket.c:2764
 __sys_recvmmsg net/socket.c:2843 [inline]
 __do_sys_recvmmsg net/socket.c:2866 [inline]
 __se_sys_recvmmsg net/socket.c:2859 [inline]
 __x64_sys_recvmmsg+0x20b/0x260 net/socket.c:2859
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fc200d2dc39
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffc5758e5a8 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc200d2dc39
RDX: 0000000000000002 RSI: 00000000200017c0 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000f0b5ff
R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000003
R13: 00007ffc5758e5d0 R14: 00007ffc5758e5c0 R15: 0000000000000003
INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 1.444 msecs
NMI backtrace for cpu 0
CPU: 0 PID: 8509 Comm: syz-executor827 Not tainted 5.15.0-rc2-next-20210920-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:105
 nmi_trigger_cpumask_backtrace+0x1ae/0x220 lib/nmi_backtrace.c:62
 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
 rcu_dump_cpu_stacks+0x25e/0x3f0 kernel/rcu/tree_stall.h:343
 print_cpu_stall kernel/rcu/tree_stall.h:627 [inline]
 check_cpu_stall kernel/rcu/tree_stall.h:711 [inline]
 rcu_pending kernel/rcu/tree.c:3878 [inline]
 rcu_sched_clock_irq.cold+0x9d/0x746 kernel/rcu/tree.c:2597
 update_process_times+0x16d/0x200 kernel/time/timer.c:1785
 tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226
 tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1428
 __run_hrtimer kernel/time/hrtimer.c:1685 [inline]
 __hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1749
 hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline]
 __sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1103
 sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1097
 </IRQ>
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
RIP: 0010:lock_release+0x3f1/0x720 kernel/locking/lockdep.c:5633
Code: 7e 83 f8 01 0f 85 8d 01 00 00 9c 58 f6 c4 02 0f 85 78 01 00 00 48 f7 04 24 00 02 00 00 74 01 fb 48 b8 00 00 00 00 00 fc ff df <48> 01 c5 48 c7 45 00 00 00 00 00 c7 45 08 00 00 00 00 48 8b 84 24
RSP: 0018:ffffc9000cde7660 EFLAGS: 00000206
RAX: dffffc0000000000 RBX: 70dbee1ebc366ab9 RCX: ffffc9000cde76b0
RDX: 1ffff11003a0bc2b RSI: 0000000000000201 RDI: 0000000000000000
RBP: 1ffff920019bcece R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000443 R12: 0000000000000001
R13: 0000000000000002 R14: ffff88801d05e160 R15: ffff88801d05d700
 __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:174 [inline]
 _raw_spin_unlock_bh+0x12/0x30 kernel/locking/spinlock.c:210
 spin_unlock_bh include/linux/spinlock.h:413 [inline]
 unlock_sock_fast include/net/sock.h:1644 [inline]
 unlock_sock_fast include/net/sock.h:1636 [inline]
 mptcp_subflow_cleanup_rbuf net/mptcp/protocol.c:468 [inline]
 mptcp_cleanup_rbuf net/mptcp/protocol.c:499 [inline]
 mptcp_recvmsg+0xb8b/0x27b0 net/mptcp/protocol.c:2027
 inet6_recvmsg+0x11b/0x5e0 net/ipv6/af_inet6.c:659
 sock_recvmsg_nosec net/socket.c:944 [inline]
 ____sys_recvmsg+0x527/0x600 net/socket.c:2626
 ___sys_recvmsg+0x127/0x200 net/socket.c:2670
 do_recvmmsg+0x24d/0x6d0 net/socket.c:2764
 __sys_recvmmsg net/socket.c:2843 [inline]
 __do_sys_recvmmsg net/socket.c:2866 [inline]
 __se_sys_recvmmsg net/socket.c:2859 [inline]
 __x64_sys_recvmmsg+0x20b/0x260 net/socket.c:2859
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fc200d2dc39
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffc5758e5a8 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc200d2dc39
RDX: 0000000000000002 RSI: 00000000200017c0 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000f0b5ff
R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000003
R13: 00007ffc5758e5d0 R14: 00007ffc5758e5c0 R15: 0000000000000003
----------------
Code disassembly (best guess):
   0:	38 00                	cmp    %al,(%rax)
   2:	74 ed                	je     0xfffffff1
   4:	48 8d 50 08          	lea    0x8(%rax),%rdx
   8:	eb 09                	jmp    0x13
   a:	48 83 c0 01          	add    $0x1,%rax
   e:	48 39 d0             	cmp    %rdx,%rax
  11:	74 7a                	je     0x8d
  13:	80 38 00             	cmpb   $0x0,(%rax)
  16:	74 f2                	je     0xa
  18:	48 89 c2             	mov    %rax,%rdx
  1b:	b8 01 00 00 00       	mov    $0x1,%eax
  20:	48 85 d2             	test   %rdx,%rdx
  23:	75 56                	jne    0x7b
  25:	5b                   	pop    %rbx
  26:	5d                   	pop    %rbp
  27:	41 5c                	pop    %r12
  29:	c3                   	retq
* 2a:	48 85 d2             	test   %rdx,%rdx <-- trapping instruction
  2d:	74 5e                	je     0x8d
  2f:	48 01 ea             	add    %rbp,%rdx
  32:	eb 09                	jmp    0x3d
  34:	48 83 c0 01          	add    $0x1,%rax
  38:	48 39 d0             	cmp    %rdx,%rax
  3b:	74 50                	je     0x8d
  3d:	80 38 00             	cmpb   $0x0,(%rax)


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [syzbot] INFO: rcu detected stall in sys_recvmmsg
  2021-09-22  0:13 [syzbot] INFO: rcu detected stall in sys_recvmmsg syzbot
@ 2021-09-22 15:13 ` Paolo Abeni
  2021-09-22 15:15   ` Jens Axboe
  2021-09-22 17:25 ` Paolo Abeni
  2021-09-23 13:49 ` Paolo Abeni
  2 siblings, 1 reply; 7+ messages in thread
From: Paolo Abeni @ 2021-09-22 15:13 UTC (permalink / raw)
  To: syzbot, axboe, christian.brauner, davem, dkadashev, kuba,
	linux-fsdevel, linux-kernel, mathew.j.martineau, matthieu.baerts,
	mptcp, netdev, syzkaller-bugs, torvalds, viro

On Tue, 2021-09-21 at 17:13 -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    1f77990c4b79 Add linux-next specific files for 20210920
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1383891d300000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=ab1346371f2e6884
> dashboard link: https://syzkaller.appspot.com/bug?extid=3360da629681aa0d22fe
> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1625f1ab300000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17eb1b3b300000
> 
> The issue was bisected to:
> 
> commit 020250f31c4c75ac7687a673e29c00786582a5f4
> Author: Dmitry Kadashev <dkadashev@gmail.com>
> Date:   Thu Jul 8 06:34:43 2021 +0000
> 
>     namei: make do_linkat() take struct filename
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=15f5ef77300000
> final oops:     https://syzkaller.appspot.com/x/report.txt?x=17f5ef77300000
> console output: https://syzkaller.appspot.com/x/log.txt?x=13f5ef77300000
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+3360da629681aa0d22fe@syzkaller.appspotmail.com
> Fixes: 020250f31c4c ("namei: make do_linkat() take struct filename")

I'm unsure why the bisection points to this commit. This looks like an
MPTCP specific issue, due to bad handling of MSG_WAITALL recvmsg()
flag.

I'll test a fix with syzbot to confirm the above.

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [syzbot] INFO: rcu detected stall in sys_recvmmsg
  2021-09-22 15:13 ` Paolo Abeni
@ 2021-09-22 15:15   ` Jens Axboe
  0 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2021-09-22 15:15 UTC (permalink / raw)
  To: Paolo Abeni, syzbot, christian.brauner, davem, dkadashev, kuba,
	linux-fsdevel, linux-kernel, mathew.j.martineau, matthieu.baerts,
	mptcp, netdev, syzkaller-bugs, torvalds, viro

On 9/22/21 9:13 AM, Paolo Abeni wrote:
> On Tue, 2021-09-21 at 17:13 -0700, syzbot wrote:
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit:    1f77990c4b79 Add linux-next specific files for 20210920
>> git tree:       linux-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=1383891d300000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=ab1346371f2e6884
>> dashboard link: https://syzkaller.appspot.com/bug?extid=3360da629681aa0d22fe
>> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1625f1ab300000
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17eb1b3b300000
>>
>> The issue was bisected to:
>>
>> commit 020250f31c4c75ac7687a673e29c00786582a5f4
>> Author: Dmitry Kadashev <dkadashev@gmail.com>
>> Date:   Thu Jul 8 06:34:43 2021 +0000
>>
>>     namei: make do_linkat() take struct filename
>>
>> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=15f5ef77300000
>> final oops:     https://syzkaller.appspot.com/x/report.txt?x=17f5ef77300000
>> console output: https://syzkaller.appspot.com/x/log.txt?x=13f5ef77300000
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+3360da629681aa0d22fe@syzkaller.appspotmail.com
>> Fixes: 020250f31c4c ("namei: make do_linkat() take struct filename")
> 
> I'm unsure why the bisection points to this commit. This looks like an
> MPTCP specific issue, due to bad handling of MSG_WAITALL recvmsg()
> flag.

This seems to happen quite often, and I'm guessing it's mostly likely
due to the test being run not always triggering the issue. Hence you end
up with git bisection results that go off into the weeds.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [syzbot] INFO: rcu detected stall in sys_recvmmsg
  2021-09-22  0:13 [syzbot] INFO: rcu detected stall in sys_recvmmsg syzbot
  2021-09-22 15:13 ` Paolo Abeni
@ 2021-09-22 17:25 ` Paolo Abeni
  2021-09-22 18:42   ` syzbot
  2021-09-23 13:49 ` Paolo Abeni
  2 siblings, 1 reply; 7+ messages in thread
From: Paolo Abeni @ 2021-09-22 17:25 UTC (permalink / raw)
  To: syzbot, mptcp, syzkaller-bugs

On Tue, 2021-09-21 at 17:13 -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    1f77990c4b79 Add linux-next specific files for 20210920
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1383891d300000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=ab1346371f2e6884
> dashboard link: https://syzkaller.appspot.com/bug?extid=3360da629681aa0d22fe
> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1625f1ab300000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17eb1b3b300000
> 
> The issue was bisected to:
> 
> commit 020250f31c4c75ac7687a673e29c00786582a5f4
> Author: Dmitry Kadashev <dkadashev@gmail.com>
> Date:   Thu Jul 8 06:34:43 2021 +0000
> 
>     namei: make do_linkat() take struct filename
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=15f5ef77300000
> final oops:     https://syzkaller.appspot.com/x/report.txt?x=17f5ef77300000
> console output: https://syzkaller.appspot.com/x/log.txt?x=13f5ef77300000
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+3360da629681aa0d22fe@syzkaller.appspotmail.com
> Fixes: 020250f31c4c ("namei: make do_linkat() take struct filename")
> 
> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu: 	0-...!: (10499 ticks this GP) idle=0af/1/0x4000000000000000 softirq=10678/10678 fqs=1 
> 	(t=10500 jiffies g=13089 q=109)
> rcu: rcu_preempt kthread starved for 10497 jiffies! g13089 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
> rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> rcu: RCU grace-period kthread stack dump:
> task:rcu_preempt     state:R  running task     stack:28696 pid:   14 ppid:     2 flags:0x00004000
> Call Trace:
>  context_switch kernel/sched/core.c:4955 [inline]
>  __schedule+0x940/0x26f0 kernel/sched/core.c:6236
>  schedule+0xd3/0x270 kernel/sched/core.c:6315
>  schedule_timeout+0x14a/0x2a0 kernel/time/timer.c:1881
>  rcu_gp_fqs_loop+0x186/0x810 kernel/rcu/tree.c:1955
>  rcu_gp_kthread+0x1de/0x320 kernel/rcu/tree.c:2128
>  kthread+0x405/0x4f0 kernel/kthread.c:327
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> rcu: Stack dump where RCU GP kthread last ran:
> Sending NMI from CPU 0 to CPUs 1:
> NMI backtrace for cpu 1
> CPU: 1 PID: 8510 Comm: syz-executor827 Not tainted 5.15.0-rc2-next-20210920-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:84 [inline]
> RIP: 0010:memory_is_nonzero mm/kasan/generic.c:102 [inline]
> RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:128 [inline]
> RIP: 0010:memory_is_poisoned mm/kasan/generic.c:159 [inline]
> RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline]
> RIP: 0010:kasan_check_range+0xc8/0x180 mm/kasan/generic.c:189
> Code: 38 00 74 ed 48 8d 50 08 eb 09 48 83 c0 01 48 39 d0 74 7a 80 38 00 74 f2 48 89 c2 b8 01 00 00 00 48 85 d2 75 56 5b 5d 41 5c c3 <48> 85 d2 74 5e 48 01 ea eb 09 48 83 c0 01 48 39 d0 74 50 80 38 00
> RSP: 0018:ffffc9000cd676c8 EFLAGS: 00000283
> RAX: ffffed100e9a110e RBX: ffffed100e9a110f RCX: ffffffff88ea062a
> RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff888074d08870
> RBP: ffffed100e9a110e R08: 0000000000000001 R09: ffff888074d08877
> R10: ffffed100e9a110e R11: 0000000000000000 R12: ffff888074d08000
> R13: ffff888074d08000 R14: ffff888074d08088 R15: ffff888074d08000
> FS:  0000555556d8e300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000020000180 CR3: 0000000068909000 CR4: 00000000001506e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
>  test_and_clear_bit include/asm-generic/bitops/instrumented-atomic.h:83 [inline]
>  mptcp_release_cb+0x14a/0x210 net/mptcp/protocol.c:3016
>  release_sock+0xb4/0x1b0 net/core/sock.c:3204
>  mptcp_wait_data net/mptcp/protocol.c:1770 [inline]
>  mptcp_recvmsg+0xfd1/0x27b0 net/mptcp/protocol.c:2080
>  inet6_recvmsg+0x11b/0x5e0 net/ipv6/af_inet6.c:659
>  sock_recvmsg_nosec net/socket.c:944 [inline]
>  ____sys_recvmsg+0x527/0x600 net/socket.c:2626
>  ___sys_recvmsg+0x127/0x200 net/socket.c:2670
>  do_recvmmsg+0x24d/0x6d0 net/socket.c:2764
>  __sys_recvmmsg net/socket.c:2843 [inline]
>  __do_sys_recvmmsg net/socket.c:2866 [inline]
>  __se_sys_recvmmsg net/socket.c:2859 [inline]
>  __x64_sys_recvmmsg+0x20b/0x260 net/socket.c:2859
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7fc200d2dc39
> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007ffc5758e5a8 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc200d2dc39
> RDX: 0000000000000002 RSI: 00000000200017c0 RDI: 0000000000000003
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000f0b5ff
> R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000003
> R13: 00007ffc5758e5d0 R14: 00007ffc5758e5c0 R15: 0000000000000003
> INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 1.444 msecs
> NMI backtrace for cpu 0
> CPU: 0 PID: 8509 Comm: syz-executor827 Not tainted 5.15.0-rc2-next-20210920-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  <IRQ>
>  __dump_stack lib/dump_stack.c:88 [inline]
>  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
>  nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:105
>  nmi_trigger_cpumask_backtrace+0x1ae/0x220 lib/nmi_backtrace.c:62
>  trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
>  rcu_dump_cpu_stacks+0x25e/0x3f0 kernel/rcu/tree_stall.h:343
>  print_cpu_stall kernel/rcu/tree_stall.h:627 [inline]
>  check_cpu_stall kernel/rcu/tree_stall.h:711 [inline]
>  rcu_pending kernel/rcu/tree.c:3878 [inline]
>  rcu_sched_clock_irq.cold+0x9d/0x746 kernel/rcu/tree.c:2597
>  update_process_times+0x16d/0x200 kernel/time/timer.c:1785
>  tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226
>  tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1428
>  __run_hrtimer kernel/time/hrtimer.c:1685 [inline]
>  __hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1749
>  hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811
>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline]
>  __sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1103
>  sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1097
>  </IRQ>
>  asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
> RIP: 0010:lock_release+0x3f1/0x720 kernel/locking/lockdep.c:5633
> Code: 7e 83 f8 01 0f 85 8d 01 00 00 9c 58 f6 c4 02 0f 85 78 01 00 00 48 f7 04 24 00 02 00 00 74 01 fb 48 b8 00 00 00 00 00 fc ff df <48> 01 c5 48 c7 45 00 00 00 00 00 c7 45 08 00 00 00 00 48 8b 84 24
> RSP: 0018:ffffc9000cde7660 EFLAGS: 00000206
> RAX: dffffc0000000000 RBX: 70dbee1ebc366ab9 RCX: ffffc9000cde76b0
> RDX: 1ffff11003a0bc2b RSI: 0000000000000201 RDI: 0000000000000000
> RBP: 1ffff920019bcece R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000443 R12: 0000000000000001
> R13: 0000000000000002 R14: ffff88801d05e160 R15: ffff88801d05d700
>  __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:174 [inline]
>  _raw_spin_unlock_bh+0x12/0x30 kernel/locking/spinlock.c:210
>  spin_unlock_bh include/linux/spinlock.h:413 [inline]
>  unlock_sock_fast include/net/sock.h:1644 [inline]
>  unlock_sock_fast include/net/sock.h:1636 [inline]
>  mptcp_subflow_cleanup_rbuf net/mptcp/protocol.c:468 [inline]
>  mptcp_cleanup_rbuf net/mptcp/protocol.c:499 [inline]
>  mptcp_recvmsg+0xb8b/0x27b0 net/mptcp/protocol.c:2027
>  inet6_recvmsg+0x11b/0x5e0 net/ipv6/af_inet6.c:659
>  sock_recvmsg_nosec net/socket.c:944 [inline]
>  ____sys_recvmsg+0x527/0x600 net/socket.c:2626
>  ___sys_recvmsg+0x127/0x200 net/socket.c:2670
>  do_recvmmsg+0x24d/0x6d0 net/socket.c:2764
>  __sys_recvmmsg net/socket.c:2843 [inline]
>  __do_sys_recvmmsg net/socket.c:2866 [inline]
>  __se_sys_recvmmsg net/socket.c:2859 [inline]
>  __x64_sys_recvmmsg+0x20b/0x260 net/socket.c:2859
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7fc200d2dc39
> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007ffc5758e5a8 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc200d2dc39
> RDX: 0000000000000002 RSI: 00000000200017c0 RDI: 0000000000000003
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000f0b5ff
> R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000003
> R13: 00007ffc5758e5d0 R14: 00007ffc5758e5c0 R15: 0000000000000003
> ----------------
> Code disassembly (best guess):
>    0:	38 00                	cmp    %al,(%rax)
>    2:	74 ed                	je     0xfffffff1
>    4:	48 8d 50 08          	lea    0x8(%rax),%rdx
>    8:	eb 09                	jmp    0x13
>    a:	48 83 c0 01          	add    $0x1,%rax
>    e:	48 39 d0             	cmp    %rdx,%rax
>   11:	74 7a                	je     0x8d
>   13:	80 38 00             	cmpb   $0x0,(%rax)
>   16:	74 f2                	je     0xa
>   18:	48 89 c2             	mov    %rax,%rdx
>   1b:	b8 01 00 00 00       	mov    $0x1,%eax
>   20:	48 85 d2             	test   %rdx,%rdx
>   23:	75 56                	jne    0x7b
>   25:	5b                   	pop    %rbx
>   26:	5d                   	pop    %rbp
>   27:	41 5c                	pop    %r12
>   29:	c3                   	retq
> * 2a:	48 85 d2             	test   %rdx,%rdx <-- trapping instruction
>   2d:	74 5e                	je     0x8d
>   2f:	48 01 ea             	add    %rbp,%rdx
>   32:	eb 09                	jmp    0x3d
>   34:	48 83 c0 01          	add    $0x1,%rax
>   38:	48 39 d0             	cmp    %rdx,%rax
>   3b:	74 50                	je     0x8d
>   3d:	80 38 00             	cmpb   $0x0,(%rax)
> 
> 
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
> 
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> syzbot can test patches for this issue, for details see:
> https://goo.gl/tpsmEJ#testing-patches

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

for net-next we should likely consider dropping the DATA_READY flag at
all, but that looks too invasive and error prone for a fix
---
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index dbcebf56798f..6b334f9b6242 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1759,11 +1759,32 @@ static int mptcp_sendmsg(struct sock *sk,
struct msghdr *msg, size_t len)
 	return copied ? : ret;
 }
 
+static bool __mptcp_move_skbs(struct mptcp_sock *msk);
+
+static void mptcp_update_ready_flag(struct sock *sk)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+
+	if (skb_queue_empty_lockless(&sk->sk_receive_queue) &&
+	    skb_queue_empty(&msk->receive_queue)) {
+		/* entire backlog drained, clear DATA_READY. */
+		clear_bit(MPTCP_DATA_READY, &msk->flags);
+
+		/* .. race-breaker: ssk might have gotten new data
+		 * after last __mptcp_move_skbs() returned false.
+		 */
+		if (unlikely(__mptcp_move_skbs(msk)))
+			set_bit(MPTCP_DATA_READY, &msk->flags);
+	}
+}
+
 static void mptcp_wait_data(struct sock *sk, long *timeo)
 {
 	DEFINE_WAIT_FUNC(wait, woken_wake_function);
 	struct mptcp_sock *msk = mptcp_sk(sk);
 
+	mptcp_update_ready_flag(sk);
+
 	add_wait_queue(sk_sleep(sk), &wait);
 	sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
 
@@ -2080,17 +2101,7 @@ static int mptcp_recvmsg(struct sock *sk, struct
msghdr *msg, size_t len,
 		mptcp_wait_data(sk, &timeo);
 	}
 
-	if (skb_queue_empty_lockless(&sk->sk_receive_queue) &&
-	    skb_queue_empty(&msk->receive_queue)) {
-		/* entire backlog drained, clear DATA_READY. */
-		clear_bit(MPTCP_DATA_READY, &msk->flags);
-
-		/* .. race-breaker: ssk might have gotten new data
-		 * after last __mptcp_move_skbs() returned false.
-		 */
-		if (unlikely(__mptcp_move_skbs(msk)))
-			set_bit(MPTCP_DATA_READY, &msk->flags);
-	}
+	mptcp_update_ready_flag(sk);
 
 out_err:
 	if (cmsg_flags && copied >= 0) {



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [syzbot] INFO: rcu detected stall in sys_recvmmsg
  2021-09-22 17:25 ` Paolo Abeni
@ 2021-09-22 18:42   ` syzbot
  0 siblings, 0 replies; 7+ messages in thread
From: syzbot @ 2021-09-22 18:42 UTC (permalink / raw)
  To: mptcp, pabeni, syzkaller-bugs

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+3360da629681aa0d22fe@syzkaller.appspotmail.com

Tested on:

commit:         cf1d2c3e Merge tag 'nfsd-5.15-2' of git://git.kernel.o..
git tree:       upstream
kernel config:  https://syzkaller.appspot.com/x/.config?x=e917f3dfc452c977
dashboard link: https://syzkaller.appspot.com/bug?extid=3360da629681aa0d22fe
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch:          https://syzkaller.appspot.com/x/patch.diff?x=11e0894b300000

Note: testing is done by a robot and is best-effort only.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [syzbot] INFO: rcu detected stall in sys_recvmmsg
  2021-09-22  0:13 [syzbot] INFO: rcu detected stall in sys_recvmmsg syzbot
  2021-09-22 15:13 ` Paolo Abeni
  2021-09-22 17:25 ` Paolo Abeni
@ 2021-09-23 13:49 ` Paolo Abeni
  2021-09-23 15:06   ` syzbot
  2 siblings, 1 reply; 7+ messages in thread
From: Paolo Abeni @ 2021-09-23 13:49 UTC (permalink / raw)
  To: syzbot, mptcp, syzkaller-bugs

On Tue, 2021-09-21 at 17:13 -0700, syzbot wrote:
> syzbot found the following issue on:
> 
> HEAD commit:    1f77990c4b79 Add linux-next specific files for 20210920
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1383891d300000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=ab1346371f2e6884
> dashboard link: https://syzkaller.appspot.com/bug?extid=3360da629681aa0d22fe
> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1625f1ab300000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17eb1b3b300000
> 
> The issue was bisected to:
> 
> commit 020250f31c4c75ac7687a673e29c00786582a5f4
> Author: Dmitry Kadashev <dkadashev@gmail.com>
> Date:   Thu Jul 8 06:34:43 2021 +0000
> 
>     namei: make do_linkat() take struct filename
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=15f5ef77300000
> final oops:     https://syzkaller.appspot.com/x/report.txt?x=17f5ef77300000
> console output: https://syzkaller.appspot.com/x/log.txt?x=13f5ef77300000
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+3360da629681aa0d22fe@syzkaller.appspotmail.com
> Fixes: 020250f31c4c ("namei: make do_linkat() take struct filename")
> 
> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu: 	0-...!: (10499 ticks this GP) idle=0af/1/0x4000000000000000 softirq=10678/10678 fqs=1 
> 	(t=10500 jiffies g=13089 q=109)
> rcu: rcu_preempt kthread starved for 10497 jiffies! g13089 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
> rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> rcu: RCU grace-period kthread stack dump:
> task:rcu_preempt     state:R  running task     stack:28696 pid:   14 ppid:     2 flags:0x00004000
> Call Trace:
>  context_switch kernel/sched/core.c:4955 [inline]
>  __schedule+0x940/0x26f0 kernel/sched/core.c:6236
>  schedule+0xd3/0x270 kernel/sched/core.c:6315
>  schedule_timeout+0x14a/0x2a0 kernel/time/timer.c:1881
>  rcu_gp_fqs_loop+0x186/0x810 kernel/rcu/tree.c:1955
>  rcu_gp_kthread+0x1de/0x320 kernel/rcu/tree.c:2128
>  kthread+0x405/0x4f0 kernel/kthread.c:327
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> rcu: Stack dump where RCU GP kthread last ran:
> Sending NMI from CPU 0 to CPUs 1:
> NMI backtrace for cpu 1
> CPU: 1 PID: 8510 Comm: syz-executor827 Not tainted 5.15.0-rc2-next-20210920-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:84 [inline]
> RIP: 0010:memory_is_nonzero mm/kasan/generic.c:102 [inline]
> RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:128 [inline]
> RIP: 0010:memory_is_poisoned mm/kasan/generic.c:159 [inline]
> RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline]
> RIP: 0010:kasan_check_range+0xc8/0x180 mm/kasan/generic.c:189
> Code: 38 00 74 ed 48 8d 50 08 eb 09 48 83 c0 01 48 39 d0 74 7a 80 38 00 74 f2 48 89 c2 b8 01 00 00 00 48 85 d2 75 56 5b 5d 41 5c c3 <48> 85 d2 74 5e 48 01 ea eb 09 48 83 c0 01 48 39 d0 74 50 80 38 00
> RSP: 0018:ffffc9000cd676c8 EFLAGS: 00000283
> RAX: ffffed100e9a110e RBX: ffffed100e9a110f RCX: ffffffff88ea062a
> RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff888074d08870
> RBP: ffffed100e9a110e R08: 0000000000000001 R09: ffff888074d08877
> R10: ffffed100e9a110e R11: 0000000000000000 R12: ffff888074d08000
> R13: ffff888074d08000 R14: ffff888074d08088 R15: ffff888074d08000
> FS:  0000555556d8e300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000020000180 CR3: 0000000068909000 CR4: 00000000001506e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
>  test_and_clear_bit include/asm-generic/bitops/instrumented-atomic.h:83 [inline]
>  mptcp_release_cb+0x14a/0x210 net/mptcp/protocol.c:3016
>  release_sock+0xb4/0x1b0 net/core/sock.c:3204
>  mptcp_wait_data net/mptcp/protocol.c:1770 [inline]
>  mptcp_recvmsg+0xfd1/0x27b0 net/mptcp/protocol.c:2080
>  inet6_recvmsg+0x11b/0x5e0 net/ipv6/af_inet6.c:659
>  sock_recvmsg_nosec net/socket.c:944 [inline]
>  ____sys_recvmsg+0x527/0x600 net/socket.c:2626
>  ___sys_recvmsg+0x127/0x200 net/socket.c:2670
>  do_recvmmsg+0x24d/0x6d0 net/socket.c:2764
>  __sys_recvmmsg net/socket.c:2843 [inline]
>  __do_sys_recvmmsg net/socket.c:2866 [inline]
>  __se_sys_recvmmsg net/socket.c:2859 [inline]
>  __x64_sys_recvmmsg+0x20b/0x260 net/socket.c:2859
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7fc200d2dc39
> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007ffc5758e5a8 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc200d2dc39
> RDX: 0000000000000002 RSI: 00000000200017c0 RDI: 0000000000000003
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000f0b5ff
> R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000003
> R13: 00007ffc5758e5d0 R14: 00007ffc5758e5c0 R15: 0000000000000003
> INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 1.444 msecs
> NMI backtrace for cpu 0
> CPU: 0 PID: 8509 Comm: syz-executor827 Not tainted 5.15.0-rc2-next-20210920-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  <IRQ>
>  __dump_stack lib/dump_stack.c:88 [inline]
>  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
>  nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:105
>  nmi_trigger_cpumask_backtrace+0x1ae/0x220 lib/nmi_backtrace.c:62
>  trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
>  rcu_dump_cpu_stacks+0x25e/0x3f0 kernel/rcu/tree_stall.h:343
>  print_cpu_stall kernel/rcu/tree_stall.h:627 [inline]
>  check_cpu_stall kernel/rcu/tree_stall.h:711 [inline]
>  rcu_pending kernel/rcu/tree.c:3878 [inline]
>  rcu_sched_clock_irq.cold+0x9d/0x746 kernel/rcu/tree.c:2597
>  update_process_times+0x16d/0x200 kernel/time/timer.c:1785
>  tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226
>  tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1428
>  __run_hrtimer kernel/time/hrtimer.c:1685 [inline]
>  __hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1749
>  hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811
>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline]
>  __sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1103
>  sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1097
>  </IRQ>
>  asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
> RIP: 0010:lock_release+0x3f1/0x720 kernel/locking/lockdep.c:5633
> Code: 7e 83 f8 01 0f 85 8d 01 00 00 9c 58 f6 c4 02 0f 85 78 01 00 00 48 f7 04 24 00 02 00 00 74 01 fb 48 b8 00 00 00 00 00 fc ff df <48> 01 c5 48 c7 45 00 00 00 00 00 c7 45 08 00 00 00 00 48 8b 84 24
> RSP: 0018:ffffc9000cde7660 EFLAGS: 00000206
> RAX: dffffc0000000000 RBX: 70dbee1ebc366ab9 RCX: ffffc9000cde76b0
> RDX: 1ffff11003a0bc2b RSI: 0000000000000201 RDI: 0000000000000000
> RBP: 1ffff920019bcece R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000443 R12: 0000000000000001
> R13: 0000000000000002 R14: ffff88801d05e160 R15: ffff88801d05d700
>  __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:174 [inline]
>  _raw_spin_unlock_bh+0x12/0x30 kernel/locking/spinlock.c:210
>  spin_unlock_bh include/linux/spinlock.h:413 [inline]
>  unlock_sock_fast include/net/sock.h:1644 [inline]
>  unlock_sock_fast include/net/sock.h:1636 [inline]
>  mptcp_subflow_cleanup_rbuf net/mptcp/protocol.c:468 [inline]
>  mptcp_cleanup_rbuf net/mptcp/protocol.c:499 [inline]
>  mptcp_recvmsg+0xb8b/0x27b0 net/mptcp/protocol.c:2027
>  inet6_recvmsg+0x11b/0x5e0 net/ipv6/af_inet6.c:659
>  sock_recvmsg_nosec net/socket.c:944 [inline]
>  ____sys_recvmsg+0x527/0x600 net/socket.c:2626
>  ___sys_recvmsg+0x127/0x200 net/socket.c:2670
>  do_recvmmsg+0x24d/0x6d0 net/socket.c:2764
>  __sys_recvmmsg net/socket.c:2843 [inline]
>  __do_sys_recvmmsg net/socket.c:2866 [inline]
>  __se_sys_recvmmsg net/socket.c:2859 [inline]
>  __x64_sys_recvmmsg+0x20b/0x260 net/socket.c:2859
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7fc200d2dc39
> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007ffc5758e5a8 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc200d2dc39
> RDX: 0000000000000002 RSI: 00000000200017c0 RDI: 0000000000000003
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000f0b5ff
> R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000003
> R13: 00007ffc5758e5d0 R14: 00007ffc5758e5c0 R15: 0000000000000003
> ----------------
> Code disassembly (best guess):
>    0:	38 00                	cmp    %al,(%rax)
>    2:	74 ed                	je     0xfffffff1
>    4:	48 8d 50 08          	lea    0x8(%rax),%rdx
>    8:	eb 09                	jmp    0x13
>    a:	48 83 c0 01          	add    $0x1,%rax
>    e:	48 39 d0             	cmp    %rdx,%rax
>   11:	74 7a                	je     0x8d
>   13:	80 38 00             	cmpb   $0x0,(%rax)
>   16:	74 f2                	je     0xa
>   18:	48 89 c2             	mov    %rax,%rdx
>   1b:	b8 01 00 00 00       	mov    $0x1,%eax
>   20:	48 85 d2             	test   %rdx,%rdx
>   23:	75 56                	jne    0x7b
>   25:	5b                   	pop    %rbx
>   26:	5d                   	pop    %rbp
>   27:	41 5c                	pop    %r12
>   29:	c3                   	retq
> * 2a:	48 85 d2             	test   %rdx,%rdx <-- trapping instruction
>   2d:	74 5e                	je     0x8d
>   2f:	48 01 ea             	add    %rbp,%rdx
>   32:	eb 09                	jmp    0x3d
>   34:	48 83 c0 01          	add    $0x1,%rax
>   38:	48 39 d0             	cmp    %rdx,%rax
>   3b:	74 50                	je     0x8d
>   3d:	80 38 00             	cmpb   $0x0,(%rax)
> 
> 
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
> 
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> syzbot can test patches for this issue, for details see:
> https://goo.gl/tpsmEJ#testing-patches

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

DATA_READY removal is less invasive than I thought, and the code really
looks better, so checking such alternative fix, too.
---
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index e5df0b5971c8..cc996e8973b3 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -528,7 +528,6 @@ static bool mptcp_check_data_fin(struct sock *sk)
 
 		sk->sk_shutdown |= RCV_SHUTDOWN;
 		smp_mb__before_atomic(); /* SHUTDOWN must be visible first */
-		set_bit(MPTCP_DATA_READY, &msk->flags);
 
 		switch (sk->sk_state) {
 		case TCP_ESTABLISHED:
@@ -742,10 +741,9 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 
 	/* Wake-up the reader only for in-sequence data */
 	mptcp_data_lock(sk);
-	if (move_skbs_to_msk(msk, ssk)) {
-		set_bit(MPTCP_DATA_READY, &msk->flags);
+	if (move_skbs_to_msk(msk, ssk))
 		sk->sk_data_ready(sk);
-	}
+
 	mptcp_data_unlock(sk);
 }
 
@@ -847,7 +845,6 @@ static void mptcp_check_for_eof(struct mptcp_sock *msk)
 		sk->sk_shutdown |= RCV_SHUTDOWN;
 
 		smp_mb__before_atomic(); /* SHUTDOWN must be visible first */
-		set_bit(MPTCP_DATA_READY, &msk->flags);
 		sk->sk_data_ready(sk);
 	}
 
@@ -1759,21 +1756,6 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return copied ? : ret;
 }
 
-static void mptcp_wait_data(struct sock *sk, long *timeo)
-{
-	DEFINE_WAIT_FUNC(wait, woken_wake_function);
-	struct mptcp_sock *msk = mptcp_sk(sk);
-
-	add_wait_queue(sk_sleep(sk), &wait);
-	sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
-
-	sk_wait_event(sk, timeo,
-		      test_bit(MPTCP_DATA_READY, &msk->flags), &wait);
-
-	sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk);
-	remove_wait_queue(sk_sleep(sk), &wait);
-}
-
 static int __mptcp_recvmsg_mskq(struct mptcp_sock *msk,
 				struct msghdr *msg,
 				size_t len, int flags,
@@ -2077,19 +2059,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		}
 
 		pr_debug("block timeout %ld", timeo);
-		mptcp_wait_data(sk, &timeo);
-	}
-
-	if (skb_queue_empty_lockless(&sk->sk_receive_queue) &&
-	    skb_queue_empty(&msk->receive_queue)) {
-		/* entire backlog drained, clear DATA_READY. */
-		clear_bit(MPTCP_DATA_READY, &msk->flags);
-
-		/* .. race-breaker: ssk might have gotten new data
-		 * after last __mptcp_move_skbs() returned false.
-		 */
-		if (unlikely(__mptcp_move_skbs(msk)))
-			set_bit(MPTCP_DATA_READY, &msk->flags);
+		sk_wait_data(sk, &timeo, NULL);
 	}
 
 out_err:
@@ -2098,9 +2068,9 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 			tcp_recv_timestamp(msg, sk, &tss);
 	}
 
-	pr_debug("msk=%p data_ready=%d rx queue empty=%d copied=%d",
-		 msk, test_bit(MPTCP_DATA_READY, &msk->flags),
-		 skb_queue_empty_lockless(&sk->sk_receive_queue), copied);
+	pr_debug("msk=%p rx queue empty=%d:%d copied=%d",
+		 msk, skb_queue_empty_lockless(&sk->sk_receive_queue),
+		 skb_queue_empty(&msk->receive_queue), copied);
 	if (!(flags & MSG_PEEK))
 		mptcp_rcv_space_adjust(msk, copied);
 
@@ -2368,7 +2338,6 @@ static void mptcp_check_fastclose(struct mptcp_sock *msk)
 	inet_sk_state_store(sk, TCP_CLOSE);
 	sk->sk_shutdown = SHUTDOWN_MASK;
 	smp_mb__before_atomic(); /* SHUTDOWN must be visible first */
-	set_bit(MPTCP_DATA_READY, &msk->flags);
 	set_bit(MPTCP_WORK_CLOSE_SUBFLOW, &msk->flags);
 
 	mptcp_close_wake_up(sk);
@@ -3385,8 +3354,11 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 
 static __poll_t mptcp_check_readable(struct mptcp_sock *msk)
 {
-	return test_bit(MPTCP_DATA_READY, &msk->flags) ? EPOLLIN | EPOLLRDNORM :
-	       0;
+	if (skb_queue_empty_lockless(&((struct sock *)msk)->sk_receive_queue) &&
+            skb_queue_empty_lockless(&msk->receive_queue))
+		return 0;
+
+	return EPOLLIN | EPOLLRDNORM;
 }
 
 static __poll_t mptcp_check_writeable(struct mptcp_sock *msk)
@@ -3421,7 +3393,7 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock,
 	state = inet_sk_state_load(sk);
 	pr_debug("msk=%p state=%d flags=%lx", msk, state, msk->flags);
 	if (state == TCP_LISTEN)
-		return mptcp_check_readable(msk);
+		return test_bit(MPTCP_DATA_READY, &msk->flags) ? EPOLLIN | EPOLLRDNORM : 0;
 
 	if (state != TCP_SYN_SENT && state != TCP_SYN_RECV) {
 		mask |= mptcp_check_readable(msk);


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [syzbot] INFO: rcu detected stall in sys_recvmmsg
  2021-09-23 13:49 ` Paolo Abeni
@ 2021-09-23 15:06   ` syzbot
  0 siblings, 0 replies; 7+ messages in thread
From: syzbot @ 2021-09-23 15:06 UTC (permalink / raw)
  To: mptcp, pabeni, syzkaller-bugs

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+3360da629681aa0d22fe@syzkaller.appspotmail.com

Tested on:

commit:         58e2cf5d init: Revert accidental changes to print irqs..
git tree:       upstream
kernel config:  https://syzkaller.appspot.com/x/.config?x=e917f3dfc452c977
dashboard link: https://syzkaller.appspot.com/bug?extid=3360da629681aa0d22fe
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch:          https://syzkaller.appspot.com/x/patch.diff?x=1379f923300000

Note: testing is done by a robot and is best-effort only.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-09-23 15:06 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-22  0:13 [syzbot] INFO: rcu detected stall in sys_recvmmsg syzbot
2021-09-22 15:13 ` Paolo Abeni
2021-09-22 15:15   ` Jens Axboe
2021-09-22 17:25 ` Paolo Abeni
2021-09-22 18:42   ` syzbot
2021-09-23 13:49 ` Paolo Abeni
2021-09-23 15:06   ` syzbot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).