linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [syzbot] INFO: task hung in __lru_add_drain_all
       [not found] <20210904005650.2914-1-hdanton@sina.com>
@ 2021-09-04  2:26 ` syzbot
       [not found] ` <20210904080739.3026-1-hdanton@sina.com>
  1 sibling, 0 replies; 8+ messages in thread
From: syzbot @ 2021-09-04  2:26 UTC (permalink / raw)
  To: eric.dumazet, hdanton, linux-kernel, netdev, syzkaller-bugs, tglx

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: task hung in __lru_add_drain_all

INFO: task khugepaged:1665 blocked for more than 143 seconds.
      Not tainted 5.14.0-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:khugepaged      state:D stack:24384 pid: 1665 ppid:     2 flags:0x00004000
Call Trace:
 context_switch kernel/sched/core.c:4940 [inline]
 __schedule+0x940/0x26f0 kernel/sched/core.c:6287
 schedule+0xd3/0x270 kernel/sched/core.c:6366
 schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857
 do_wait_for_common kernel/sched/completion.c:85 [inline]
 __wait_for_common kernel/sched/completion.c:106 [inline]
 wait_for_common kernel/sched/completion.c:117 [inline]
 wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
 __flush_work+0x56e/0xb10 kernel/workqueue.c:3083
 __lru_add_drain_all+0x3fd/0x760 mm/swap.c:820
 khugepaged_do_scan mm/khugepaged.c:2214 [inline]
 khugepaged+0x10f/0x5630 mm/khugepaged.c:2275
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
INFO: lockdep is turned off.
NMI backtrace for cpu 0
CPU: 0 PID: 1658 Comm: khungtaskd Not tainted 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
 nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:105
 nmi_trigger_cpumask_backtrace+0x1ae/0x220 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline]
 watchdog+0xc1d/0xf50 kernel/hung_task.c:295
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 10 Comm: kworker/u4:1 Not tainted 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: phy12 ieee80211_iface_work
RIP: 0010:cmp_bss.part.0+0x1b6/0x860 net/wireless/scan.c:1320
Code: 45 31 c0 49 8d 75 1d 31 c9 e8 96 b7 ff ff 49 8d 7e 18 48 89 fa 48 89 c3 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 0f b6 04 02 <84> c0 74 08 3c 03 0f 8e bc 05 00 00 41 8b 56 18 45 31 c9 45 31 c0
RSP: 0018:ffffc90000cf6d00 EFLAGS: 00000a06
RAX: 0000000000000000 RBX: ffff88805f1b8e1d RCX: 0000000000000000
RDX: 1ffff1100c345993 RSI: ffffffff88697263 RDI: ffff888061a2cc98
RBP: ffff88801e32c468 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff88697224 R11: 0000000000000000 R12: 0000000000000000
R13: ffff88805f1b8e00 R14: ffff888061a2cc80 R15: fffff5200019edea
FS:  0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000052f7b0 CR3: 0000000034b9b000 CR4: 00000000001506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 cmp_bss net/wireless/scan.c:1505 [inline]
 rb_find_bss+0x17d/0x200 net/wireless/scan.c:1505
 cfg80211_bss_update+0xc6/0x2070 net/wireless/scan.c:1704
 cfg80211_inform_single_bss_frame_data+0x6e8/0xee0 net/wireless/scan.c:2411
 cfg80211_inform_bss_frame_data+0xa7/0xb10 net/wireless/scan.c:2444
 ieee80211_bss_info_update+0x376/0xb60 net/mac80211/scan.c:190
 ieee80211_rx_bss_info net/mac80211/ibss.c:1119 [inline]
 ieee80211_rx_mgmt_probe_beacon+0xcce/0x17c0 net/mac80211/ibss.c:1608
 ieee80211_ibss_rx_queued_mgmt+0xd37/0x1610 net/mac80211/ibss.c:1635
 ieee80211_iface_process_skb net/mac80211/iface.c:1439 [inline]
 ieee80211_iface_work+0xa65/0xd00 net/mac80211/iface.c:1493
 process_one_work+0x9bf/0x16b0 kernel/workqueue.c:2297
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2444
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
----------------
Code disassembly (best guess):
   0:	45 31 c0             	xor    %r8d,%r8d
   3:	49 8d 75 1d          	lea    0x1d(%r13),%rsi
   7:	31 c9                	xor    %ecx,%ecx
   9:	e8 96 b7 ff ff       	callq  0xffffb7a4
   e:	49 8d 7e 18          	lea    0x18(%r14),%rdi
  12:	48 89 fa             	mov    %rdi,%rdx
  15:	48 89 c3             	mov    %rax,%rbx
  18:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
  1f:	fc ff df
  22:	48 c1 ea 03          	shr    $0x3,%rdx
  26:	0f b6 04 02          	movzbl (%rdx,%rax,1),%eax
* 2a:	84 c0                	test   %al,%al <-- trapping instruction
  2c:	74 08                	je     0x36
  2e:	3c 03                	cmp    $0x3,%al
  30:	0f 8e bc 05 00 00    	jle    0x5f2
  36:	41 8b 56 18          	mov    0x18(%r14),%edx
  3a:	45 31 c9             	xor    %r9d,%r9d
  3d:	45 31 c0             	xor    %r8d,%r8d


Tested on:

commit:         f1583cb1 Merge tag 'linux-kselftest-next-5.15-rc1' of ..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=13a4f735300000
kernel config:  https://syzkaller.appspot.com/x/.config?x=9c582b69de20dde2
dashboard link: https://syzkaller.appspot.com/bug?extid=a9b681dcbc06eb2bca04
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [syzbot] INFO: task hung in __lru_add_drain_all
       [not found] ` <20210904080739.3026-1-hdanton@sina.com>
@ 2021-09-04  9:18   ` syzbot
       [not found]   ` <20210904104951.3084-1-hdanton@sina.com>
  2021-09-05 23:55   ` Thomas Gleixner
  2 siblings, 0 replies; 8+ messages in thread
From: syzbot @ 2021-09-04  9:18 UTC (permalink / raw)
  To: eric.dumazet, hdanton, linux-kernel, netdev, syzkaller-bugs, tglx

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: task hung in synchronize_rcu

INFO: task kworker/u4:3:57 blocked for more than 143 seconds.
      Not tainted 5.14.0-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u4:3    state:D stack:24944 pid:   57 ppid:     2 flags:0x00004000
Workqueue: events_unbound fsnotify_mark_destroy_workfn
Call Trace:
 context_switch kernel/sched/core.c:4940 [inline]
 __schedule+0x940/0x26f0 kernel/sched/core.c:6287
 schedule+0xd3/0x270 kernel/sched/core.c:6366
 schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857
 do_wait_for_common kernel/sched/completion.c:85 [inline]
 __wait_for_common kernel/sched/completion.c:106 [inline]
 wait_for_common kernel/sched/completion.c:117 [inline]
 wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
 __synchronize_srcu+0x1f4/0x290 kernel/rcu/srcutree.c:930
 fsnotify_mark_destroy_workfn+0xfd/0x340 fs/notify/mark.c:860
 process_one_work+0x9bf/0x16b0 kernel/workqueue.c:2297
 process_scheduled_works kernel/workqueue.c:2360 [inline]
 worker_thread+0x85c/0x11f0 kernel/workqueue.c:2446
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
INFO: lockdep is turned off.
NMI backtrace for cpu 1
CPU: 1 PID: 1649 Comm: khungtaskd Not tainted 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
 nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:105
 nmi_trigger_cpumask_backtrace+0x1ae/0x220 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline]
 watchdog+0xc1d/0xf50 kernel/hung_task.c:295
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 4027 Comm: syz-executor.5 Not tainted 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:preempt_count arch/x86/include/asm/preempt.h:27 [inline]
RIP: 0010:check_kcov_mode kernel/kcov.c:163 [inline]
RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x60 kernel/kcov.c:197
Code: 01 f0 4d 89 03 e9 63 fd ff ff b9 ff ff ff ff ba 08 00 00 00 4d 8b 03 48 0f bd ca 49 8b 45 00 48 63 c9 e9 64 ff ff ff 0f 1f 00 <65> 8b 05 39 d2 8b 7e 89 c1 48 8b 34 24 81 e1 00 01 00 00 65 48 8b
RSP: 0018:ffffc90014246c50 EFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff8880600a5580
RDX: 0000000000000000 RSI: ffff8880600a5580 RDI: 0000000000000003
RBP: 0000000000000000 R08: 000000000000002f R09: 0000000000000000
R10: ffffffff83f5086c R11: 0000000000000000 R12: ffffc90014246fd6
R13: ffff88802e928008 R14: ffffc90014246fe0 R15: ffffc90014246fd6
FS:  00007f0e855b9700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000052f7b0 CR3: 00000000532b1000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 number+0x16a/0xae0 lib/vsprintf.c:465
 vsnprintf+0xf09/0x14f0 lib/vsprintf.c:2863
 snprintf+0xbb/0xf0 lib/vsprintf.c:2930
 __dev_alloc_name net/core/dev.c:1126 [inline]
 dev_alloc_name_ns+0x3a4/0x6b0 net/core/dev.c:1154
 dev_get_valid_name+0x67/0x160 net/core/dev.c:1189
 register_netdevice+0x361/0x1500 net/core/dev.c:10214
 ipvlan_link_new+0x39b/0xc00 drivers/net/ipvlan/ipvlan_main.c:586
 __rtnl_newlink+0x106d/0x1750 net/core/rtnetlink.c:3458
 rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3506
 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5572
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
 netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929
 sock_sendmsg_nosec net/socket.c:704 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:724
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4665e9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f0e855b9188 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000000000056bf80 RCX: 00000000004665e9
RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003
RBP: 00000000004bfcc4 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf80
R13: 00007ffdedf0960f R14: 00007f0e855b9300 R15: 0000000000022000
----------------
Code disassembly (best guess):
   0:	01 f0                	add    %esi,%eax
   2:	4d 89 03             	mov    %r8,(%r11)
   5:	e9 63 fd ff ff       	jmpq   0xfffffd6d
   a:	b9 ff ff ff ff       	mov    $0xffffffff,%ecx
   f:	ba 08 00 00 00       	mov    $0x8,%edx
  14:	4d 8b 03             	mov    (%r11),%r8
  17:	48 0f bd ca          	bsr    %rdx,%rcx
  1b:	49 8b 45 00          	mov    0x0(%r13),%rax
  1f:	48 63 c9             	movslq %ecx,%rcx
  22:	e9 64 ff ff ff       	jmpq   0xffffff8b
  27:	0f 1f 00             	nopl   (%rax)
* 2a:	65 8b 05 39 d2 8b 7e 	mov    %gs:0x7e8bd239(%rip),%eax        # 0x7e8bd26a <-- trapping instruction
  31:	89 c1                	mov    %eax,%ecx
  33:	48 8b 34 24          	mov    (%rsp),%rsi
  37:	81 e1 00 01 00 00    	and    $0x100,%ecx
  3d:	65                   	gs
  3e:	48                   	rex.W
  3f:	8b                   	.byte 0x8b


Tested on:

commit:         f1583cb1 Merge tag 'linux-kselftest-next-5.15-rc1' of ..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=123f2ab9300000
kernel config:  https://syzkaller.appspot.com/x/.config?x=9c582b69de20dde2
dashboard link: https://syzkaller.appspot.com/bug?extid=a9b681dcbc06eb2bca04
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1
patch:          https://syzkaller.appspot.com/x/patch.diff?x=12e12515300000


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [syzbot] INFO: task hung in __lru_add_drain_all
       [not found]   ` <20210904104951.3084-1-hdanton@sina.com>
@ 2021-09-04 17:04     ` Paul E. McKenney
  0 siblings, 0 replies; 8+ messages in thread
From: Paul E. McKenney @ 2021-09-04 17:04 UTC (permalink / raw)
  To: Hillf Danton
  Cc: syzbot, eric.dumazet, Jan Kara, tglx, linux-kernel, netdev,
	syzkaller-bugs

On Sat, Sep 04, 2021 at 06:49:51PM +0800, Hillf Danton wrote:
> On Sat, 04 Sep 2021 02:18:16 -0700
> > syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> > INFO: task hung in synchronize_rcu
> 
> Could you please take a look at this report, Jan and Paul?
> 
> Thanks
> Hillf
> > 
> > INFO: task kworker/u4:3:57 blocked for more than 143 seconds.
> >       Not tainted 5.14.0-syzkaller #0
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > task:kworker/u4:3    state:D stack:24944 pid:   57 ppid:     2 flags:0x00004000
> > Workqueue: events_unbound fsnotify_mark_destroy_workfn
> > Call Trace:
> >  context_switch kernel/sched/core.c:4940 [inline]
> >  __schedule+0x940/0x26f0 kernel/sched/core.c:6287
> >  schedule+0xd3/0x270 kernel/sched/core.c:6366
> >  schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857
> >  do_wait_for_common kernel/sched/completion.c:85 [inline]
> >  __wait_for_common kernel/sched/completion.c:106 [inline]
> >  wait_for_common kernel/sched/completion.c:117 [inline]
> >  wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
> >  __synchronize_srcu+0x1f4/0x290 kernel/rcu/srcutree.c:930
> >  fsnotify_mark_destroy_workfn+0xfd/0x340 fs/notify/mark.c:860
> >  process_one_work+0x9bf/0x16b0 kernel/workqueue.c:2297
> >  process_scheduled_works kernel/workqueue.c:2360 [inline]
> >  worker_thread+0x85c/0x11f0 kernel/workqueue.c:2446
> >  kthread+0x3e5/0x4d0 kernel/kthread.c:319
> >  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> > INFO: lockdep is turned off.
> > NMI backtrace for cpu 1
> > CPU: 1 PID: 1649 Comm: khungtaskd Not tainted 5.14.0-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > Call Trace:
> >  __dump_stack lib/dump_stack.c:88 [inline]
> >  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
> >  nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:105
> >  nmi_trigger_cpumask_backtrace+0x1ae/0x220 lib/nmi_backtrace.c:62
> >  trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
> >  check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline]
> >  watchdog+0xc1d/0xf50 kernel/hung_task.c:295
> >  kthread+0x3e5/0x4d0 kernel/kthread.c:319
> >  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> > Sending NMI from CPU 1 to CPUs 0:
> > NMI backtrace for cpu 0
> > CPU: 0 PID: 4027 Comm: syz-executor.5 Not tainted 5.14.0-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > RIP: 0010:preempt_count arch/x86/include/asm/preempt.h:27 [inline]
> > RIP: 0010:check_kcov_mode kernel/kcov.c:163 [inline]
> > RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x60 kernel/kcov.c:197
> > Code: 01 f0 4d 89 03 e9 63 fd ff ff b9 ff ff ff ff ba 08 00 00 00 4d 8b 03 48 0f bd ca 49 8b 45 00 48 63 c9 e9 64 ff ff ff 0f 1f 00 <65> 8b 05 39 d2 8b 7e 89 c1 48 8b 34 24 81 e1 00 01 00 00 65 48 8b
> > RSP: 0018:ffffc90014246c50 EFLAGS: 00000246
> > RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff8880600a5580
> > RDX: 0000000000000000 RSI: ffff8880600a5580 RDI: 0000000000000003
> > RBP: 0000000000000000 R08: 000000000000002f R09: 0000000000000000
> > R10: ffffffff83f5086c R11: 0000000000000000 R12: ffffc90014246fd6
> > R13: ffff88802e928008 R14: ffffc90014246fe0 R15: ffffc90014246fd6
> > FS:  00007f0e855b9700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 000000000052f7b0 CR3: 00000000532b1000 CR4: 00000000001506f0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> >  number+0x16a/0xae0 lib/vsprintf.c:465
> >  vsnprintf+0xf09/0x14f0 lib/vsprintf.c:2863
> >  snprintf+0xbb/0xf0 lib/vsprintf.c:2930

The above three return addresses usually mean that the kernel is printing
console messages faster than they can be accommodated.  For example,
serial lines (even ones traditionally considered to be fast) get you here
quite easily.  Running a bunch of guest OSes all spewing console messages
can overrun many combinations of file systems and mass-storage devices
(and don't even get me started on NFS!).

My advice is:

1.	Get a faster path for your console output.

2.	Reduce the amount of console output being produced.

3.	Figure out what console output is overrunning your
	console setup and fix whatever is spewing so much
	output.

							Thanx, Paul

> >  __dev_alloc_name net/core/dev.c:1126 [inline]
> >  dev_alloc_name_ns+0x3a4/0x6b0 net/core/dev.c:1154
> >  dev_get_valid_name+0x67/0x160 net/core/dev.c:1189
> >  register_netdevice+0x361/0x1500 net/core/dev.c:10214
> >  ipvlan_link_new+0x39b/0xc00 drivers/net/ipvlan/ipvlan_main.c:586
> >  __rtnl_newlink+0x106d/0x1750 net/core/rtnetlink.c:3458
> >  rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3506
> >  rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5572
> >  netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
> >  netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
> >  netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
> >  netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929
> >  sock_sendmsg_nosec net/socket.c:704 [inline]
> >  sock_sendmsg+0xcf/0x120 net/socket.c:724
> >  ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
> >  ___sys_sendmsg+0xf3/0x170 net/socket.c:2463
> >  __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
> >  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> >  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
> >  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > RIP: 0033:0x4665e9
> > Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
> > RSP: 002b:00007f0e855b9188 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> > RAX: ffffffffffffffda RBX: 000000000056bf80 RCX: 00000000004665e9
> > RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003
> > RBP: 00000000004bfcc4 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf80
> > R13: 00007ffdedf0960f R14: 00007f0e855b9300 R15: 0000000000022000
> > ----------------
> > Code disassembly (best guess):
> >    0:	01 f0                	add    %esi,%eax
> >    2:	4d 89 03             	mov    %r8,(%r11)
> >    5:	e9 63 fd ff ff       	jmpq   0xfffffd6d
> >    a:	b9 ff ff ff ff       	mov    $0xffffffff,%ecx
> >    f:	ba 08 00 00 00       	mov    $0x8,%edx
> >   14:	4d 8b 03             	mov    (%r11),%r8
> >   17:	48 0f bd ca          	bsr    %rdx,%rcx
> >   1b:	49 8b 45 00          	mov    0x0(%r13),%rax
> >   1f:	48 63 c9             	movslq %ecx,%rcx
> >   22:	e9 64 ff ff ff       	jmpq   0xffffff8b
> >   27:	0f 1f 00             	nopl   (%rax)
> > * 2a:	65 8b 05 39 d2 8b 7e 	mov    %gs:0x7e8bd239(%rip),%eax        # 0x7e8bd26a <-- trapping instruction
> >   31:	89 c1                	mov    %eax,%ecx
> >   33:	48 8b 34 24          	mov    (%rsp),%rsi
> >   37:	81 e1 00 01 00 00    	and    $0x100,%ecx
> >   3d:	65                   	gs
> >   3e:	48                   	rex.W
> >   3f:	8b                   	.byte 0x8b
> > 
> > 
> > Tested on:
> > 
> > commit:         f1583cb1 Merge tag 'linux-kselftest-next-5.15-rc1' of ..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=123f2ab9300000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=9c582b69de20dde2
> > dashboard link: https://syzkaller.appspot.com/bug?extid=a9b681dcbc06eb2bca04
> > compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1
> > patch:          https://syzkaller.appspot.com/x/patch.diff?x=12e12515300000
> > 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [syzbot] INFO: task hung in __lru_add_drain_all
       [not found] ` <20210904080739.3026-1-hdanton@sina.com>
  2021-09-04  9:18   ` syzbot
       [not found]   ` <20210904104951.3084-1-hdanton@sina.com>
@ 2021-09-05 23:55   ` Thomas Gleixner
  2 siblings, 0 replies; 8+ messages in thread
From: Thomas Gleixner @ 2021-09-05 23:55 UTC (permalink / raw)
  To: Hillf Danton, syzbot
  Cc: eric.dumazet, hdanton, linux-kernel, netdev, syzkaller-bugs

On Sat, Sep 04 2021 at 16:07, Hillf Danton wrote:
>
> See if ieee80211_iface_work is burning more CPU cycles than thought, given the
> bound workqueue work blocked for more than 143 seconds.
>
> #syz test git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>
> --- a/net/mac80211/iface.c
> +++ b/net/mac80211/iface.c
> @@ -1494,6 +1494,7 @@ static void ieee80211_iface_work(struct
>  
>  		kfree_skb(skb);
>  		kcov_remote_stop();
> +		cond_resched();
>  	}
>  
>  	/* process status queue */
> @@ -1504,6 +1505,7 @@ static void ieee80211_iface_work(struct
>  		kfree_skb(skb);
>  
>  		kcov_remote_stop();
> +		cond_resched();
>  	}
>  
>  	/* then other type-dependent work */
> --

Again. What are you trying to achieve here? 

ieee80211_iface_work() is a work function invoked from a worker thread
in preemptible task context.

The kernel config used for this has CONFIG_PREEMPT=y, which means that
the context in which you are sprinkling cond_resched() is already fully
preemtible and the only reason for this fail would be a fatal bug in the
scheduler core or in the preemption mechanism. Pretty unlikely to go
unnoticed for anything else than for this particular reproducer.

Can you please stop waisting precious compute power?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [syzbot] INFO: task hung in __lru_add_drain_all
       [not found] <20210903111011.2811-1-hdanton@sina.com>
  2021-09-03 12:04 ` syzbot
@ 2021-09-05 23:36 ` Thomas Gleixner
  1 sibling, 0 replies; 8+ messages in thread
From: Thomas Gleixner @ 2021-09-05 23:36 UTC (permalink / raw)
  To: Hillf Danton, syzbot; +Cc: linux-kernel, netdev, syzkaller-bugs, eric.dumazet

Hillf,

On Fri, Sep 03 2021 at 19:10, Hillf Danton wrote:
>
> See if ksoftirqd is preventing bound workqueue work from running.

What?

> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -521,6 +521,7 @@ asmlinkage __visible void __softirq_entr
>  	bool in_hardirq;
>  	__u32 pending;
>  	int softirq_bit;
> +	bool is_ksoftirqd = __this_cpu_read(ksoftirqd) == current;
>  
>  	/*
>  	 * Mask out PF_MEMALLOC as the current task context is borrowed for the
> @@ -565,6 +566,8 @@ restart:
>  		}
>  		h++;
>  		pending >>= softirq_bit;
> +		if (is_ksoftirqd && in_task())

Can you please explain how this would ever be true?

 #define in_task()	(!(in_nmi() | in_hardirq() | in_serving_softirq()))

in_task() is guaranteed to be false here, because in_serving_softirq()
is guaranteed to be true simply because this is the softirq processing
context.

> +			cond_resched();

___do_softirq() returns after 2 msec of softirq processing whether it is
invoked on return from interrupt or in ksoftirqd context. On return from
interrupt this wakes ksoftirqd and returns. In ksoftirqd this is a
rescheduling point.

But that only works when the action handlers, e.g. net_rx_action(),
behave well and respect that limit as well.

net_rx_action() has it's own time limit: netdev_budget_usecs

That defaults to: 2 * USEC_PER_SEC / HZ 

The config has HZ=100, so this loop should terminate after

    2 * 1e6 / 100 = 20000us = 20ms

The provided C-reproducer does not change that default.

But again this loop can only terminate if napi_poll() and the
subsequently invoked callchain behaves well.

So instead of sending obviously useless "debug" patches, why are you not
grabbing the kernel config and the reproducer and figure out what the
root cause is?

Enable tracing, add some trace_printks and let ftrace_dump_on_oops spill
it out when the problem triggers. That will pinpoint the issue.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [syzbot] INFO: task hung in __lru_add_drain_all
       [not found] <20210905022044.3176-1-hdanton@sina.com>
@ 2021-09-05  5:21 ` syzbot
  0 siblings, 0 replies; 8+ messages in thread
From: syzbot @ 2021-09-05  5:21 UTC (permalink / raw)
  To: hdanton, linux-kernel, syzkaller-bugs

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: task hung in synchronize_rcu

INFO: task kworker/u4:4:135 blocked for more than 143 seconds.
      Not tainted 5.14.0-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u4:4    state:D stack:26552 pid:  135 ppid:     2 flags:0x00004000
Workqueue: events_unbound fsnotify_connector_destroy_workfn
Call Trace:
 context_switch kernel/sched/core.c:4940 [inline]
 __schedule+0x940/0x26f0 kernel/sched/core.c:6287
 schedule+0xd3/0x270 kernel/sched/core.c:6366
 schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857
 do_wait_for_common kernel/sched/completion.c:85 [inline]
 __wait_for_common kernel/sched/completion.c:106 [inline]
 wait_for_common kernel/sched/completion.c:117 [inline]
 wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
 __synchronize_srcu+0x1f4/0x290 kernel/rcu/srcutree.c:930
 fsnotify_connector_destroy_workfn+0x49/0xa0 fs/notify/mark.c:164
 process_one_work+0x9bf/0x16b0 kernel/workqueue.c:2297
 process_scheduled_works kernel/workqueue.c:2360 [inline]
 worker_thread+0x85c/0x11f0 kernel/workqueue.c:2446
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
INFO: task kworker/u4:6:852 blocked for more than 144 seconds.
      Not tainted 5.14.0-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u4:6    state:D stack:25784 pid:  852 ppid:     2 flags:0x00004000
Workqueue: events_unbound fsnotify_mark_destroy_workfn
Call Trace:
 context_switch kernel/sched/core.c:4940 [inline]
 __schedule+0x940/0x26f0 kernel/sched/core.c:6287
 schedule+0xd3/0x270 kernel/sched/core.c:6366
 schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857
 do_wait_for_common kernel/sched/completion.c:85 [inline]
 __wait_for_common kernel/sched/completion.c:106 [inline]
 wait_for_common kernel/sched/completion.c:117 [inline]
 wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
 __synchronize_srcu+0x1f4/0x290 kernel/rcu/srcutree.c:930
 fsnotify_mark_destroy_workfn+0xfd/0x340 fs/notify/mark.c:860
 process_one_work+0x9bf/0x16b0 kernel/workqueue.c:2297
 process_scheduled_works kernel/workqueue.c:2360 [inline]
 worker_thread+0x85c/0x11f0 kernel/workqueue.c:2446
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
INFO: lockdep is turned off.
NMI backtrace for cpu 1
CPU: 1 PID: 1634 Comm: khungtaskd Not tainted 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
 nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:105
 nmi_trigger_cpumask_backtrace+0x1ae/0x220 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline]
 watchdog+0xc1d/0xf50 kernel/hung_task.c:295
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 28 Comm: kworker/u4:2 Not tainted 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: bat_events batadv_nc_worker
RIP: 0010:lock_acquire+0x33/0x510 kernel/locking/lockdep.c:5593
Code: df 41 57 41 89 f7 41 56 49 89 fe 41 55 41 89 d5 41 54 41 89 cc 55 44 89 c5 53 48 81 ec b0 00 00 00 48 8d 5c 24 10 4c 89 0c 24 <48> c7 44 24 10 b3 8a b5 41 48 c1 eb 03 48 c7 44 24 18 ca d8 14 8b
RSP: 0018:ffffc90000e2fba0 EFLAGS: 00000286
RAX: dffffc0000000000 RBX: ffffc90000e2fbb0 RCX: 0000000000000002
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8b97f960
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff88d220e1 R11: 0000000000000000 R12: 0000000000000002
R13: 0000000000000000 R14: ffffffff8b97f960 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc6d467a718 CR3: 000000002bece000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 rcu_lock_acquire include/linux/rcupdate.h:267 [inline]
 rcu_read_lock include/linux/rcupdate.h:687 [inline]
 batadv_nc_purge_orig_hash net/batman-adv/network-coding.c:410 [inline]
 batadv_nc_worker+0x12d/0xfa0 net/batman-adv/network-coding.c:721
 process_one_work+0x9bf/0x16b0 kernel/workqueue.c:2297
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2444
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
----------------
Code disassembly (best guess):
   0:	df 41 57             	filds  0x57(%rcx)
   3:	41 89 f7             	mov    %esi,%r15d
   6:	41 56                	push   %r14
   8:	49 89 fe             	mov    %rdi,%r14
   b:	41 55                	push   %r13
   d:	41 89 d5             	mov    %edx,%r13d
  10:	41 54                	push   %r12
  12:	41 89 cc             	mov    %ecx,%r12d
  15:	55                   	push   %rbp
  16:	44 89 c5             	mov    %r8d,%ebp
  19:	53                   	push   %rbx
  1a:	48 81 ec b0 00 00 00 	sub    $0xb0,%rsp
  21:	48 8d 5c 24 10       	lea    0x10(%rsp),%rbx
  26:	4c 89 0c 24          	mov    %r9,(%rsp)
* 2a:	48 c7 44 24 10 b3 8a 	movq   $0x41b58ab3,0x10(%rsp) <-- trapping instruction
  31:	b5 41
  33:	48 c1 eb 03          	shr    $0x3,%rbx
  37:	48 c7 44 24 18 ca d8 	movq   $0xffffffff8b14d8ca,0x18(%rsp)
  3e:	14 8b


Tested on:

commit:         49624efa Merge tag 'denywrite-for-5.15' of git://githu..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16810735300000
kernel config:  https://syzkaller.appspot.com/x/.config?x=faed7df0f442c217
dashboard link: https://syzkaller.appspot.com/bug?extid=a9b681dcbc06eb2bca04
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1
patch:          https://syzkaller.appspot.com/x/patch.diff?x=11d0a286300000


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [syzbot] INFO: task hung in __lru_add_drain_all
       [not found] <20210903111011.2811-1-hdanton@sina.com>
@ 2021-09-03 12:04 ` syzbot
  2021-09-05 23:36 ` Thomas Gleixner
  1 sibling, 0 replies; 8+ messages in thread
From: syzbot @ 2021-09-03 12:04 UTC (permalink / raw)
  To: eric.dumazet, hdanton, linux-kernel, netdev, syzkaller-bugs, tglx

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: task hung in synchronize_rcu

INFO: task kworker/u4:3:57 blocked for more than 143 seconds.
      Not tainted 5.14.0-rc3-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u4:3    state:D stack:26128 pid:   57 ppid:     2 flags:0x00004000
Workqueue: events_unbound fsnotify_mark_destroy_workfn
Call Trace:
 context_switch kernel/sched/core.c:4683 [inline]
 __schedule+0x93a/0x26f0 kernel/sched/core.c:5940
 schedule+0xd3/0x270 kernel/sched/core.c:6019
 schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1855
 do_wait_for_common kernel/sched/completion.c:85 [inline]
 __wait_for_common kernel/sched/completion.c:106 [inline]
 wait_for_common kernel/sched/completion.c:117 [inline]
 wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
 __synchronize_srcu+0x1f4/0x290 kernel/rcu/srcutree.c:930
 fsnotify_mark_destroy_workfn+0xfd/0x340 fs/notify/mark.c:832
 process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

Showing all locks held in the system:
3 locks held by kworker/0:1/7:
2 locks held by ksoftirqd/0/13:
2 locks held by kworker/u4:3/57:
 #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:620 [inline]
 #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
 #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:617 [inline]
 #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
 #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x871/0x1630 kernel/workqueue.c:2247
 #1: ffffc90000f9fdb0 ((reaper_work).work){+.+.}-{0:0}, at: process_one_work+0x8a5/0x1630 kernel/workqueue.c:2251
2 locks held by kworker/u4:4/129:
 #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:620 [inline]
 #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
 #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:617 [inline]
 #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
 #0: ffff888010869138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x871/0x1630 kernel/workqueue.c:2247
 #1: ffffc900013afdb0 (connector_reaper_work){+.+.}-{0:0}, at: process_one_work+0x8a5/0x1630 kernel/workqueue.c:2251
1 lock held by khungtaskd/1647:
 #0: ffffffff8b97ba40 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446
1 lock held by khugepaged/1662:
 #0: ffffffff8ba5e948 (lock#6){+.+.}-{3:3}, at: __lru_add_drain_all+0x65/0x760 mm/swap.c:791
1 lock held by systemd-udevd/4873:
1 lock held by in:imklog/8269:
 #0: ffff888011d9cd70 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:974
3 locks held by kworker/1:7/10141:
 #0: ffff88802757d138 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: ffff88802757d138 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:620 [inline]
 #0: ffff88802757d138 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
 #0: ffff88802757d138 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:617 [inline]
 #0: ffff88802757d138 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
 #0: ffff88802757d138 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: process_one_work+0x871/0x1630 kernel/workqueue.c:2247
 #1: ffffc9000a747db0 ((work_completion)(&(&ifa->dad_work)->work)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x1630 kernel/workqueue.c:2251
 #2: ffffffff8d0cc6a8 (rtnl_mutex){+.+.}-{3:3}, at: addrconf_dad_work+0xa3/0x1340 net/ipv6/addrconf.c:4031
4 locks held by kworker/u4:7/16665:

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 1647 Comm: khungtaskd Not tainted 5.14.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
 nmi_cpu_backtrace.cold+0x44/0xd7 lib/nmi_backtrace.c:105
 nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline]
 watchdog+0xd0a/0xfc0 kernel/hung_task.c:295
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 13 Comm: ksoftirqd/0 Not tainted 5.14.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:match_held_lock+0xe/0xc0 kernel/locking/lockdep.c:5077
Code: 48 c7 c7 40 db 8b 89 e8 66 7a be ff e8 cd 1a cd ff 31 c0 5d c3 0f 1f 80 00 00 00 00 53 48 89 fb 48 83 ec 08 48 39 77 10 74 6a <66> f7 47 22 f0 ff 74 5a 48 8b 46 08 48 89 f7 48 85 c0 74 42 8b 15
RSP: 0018:ffffc90000d27410 EFLAGS: 00000093
RAX: 0000000000000005 RBX: ffff888010a68a18 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffffffff8d0cc6a8 RDI: ffff888010a68a18
RBP: ffffffff8d0cc6a8 R08: ffffc90000d276d8 R09: ffffc90000d276c8
R10: ffffffff87d55c4a R11: 0000000000000000 R12: ffff888010a68000
R13: ffff888010a689f0 R14: 00000000ffffffff R15: ffff888010a68a18
FS:  0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000052f7b0 CR3: 000000001d209000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 __lock_is_held kernel/locking/lockdep.c:5368 [inline]
 lock_is_held_type+0xa7/0x140 kernel/locking/lockdep.c:5668
 lock_is_held include/linux/lockdep.h:283 [inline]
 lockdep_rtnl_is_held+0x17/0x30 net/core/rtnetlink.c:137
 __in6_dev_get include/net/addrconf.h:313 [inline]
 ip6_ignore_linkdown include/net/addrconf.h:404 [inline]
 find_match.part.0+0x78/0xcc0 net/ipv6/route.c:737
 find_match net/ipv6/route.c:824 [inline]
 __find_rr_leaf+0x17f/0xd20 net/ipv6/route.c:825
 find_rr_leaf net/ipv6/route.c:846 [inline]
 rt6_select net/ipv6/route.c:890 [inline]
 fib6_table_lookup+0x649/0xa20 net/ipv6/route.c:2174
 ip6_pol_route+0x1c5/0x11d0 net/ipv6/route.c:2210
 pol_lookup_func include/net/ip6_fib.h:579 [inline]
 fib6_rule_lookup+0x111/0x6f0 net/ipv6/fib6_rules.c:115
 ip6_route_input_lookup net/ipv6/route.c:2280 [inline]
 ip6_route_input+0x63c/0xb30 net/ipv6/route.c:2576
 ip6_rcv_finish_core.constprop.0.isra.0+0x168/0x570 net/ipv6/ip6_input.c:63
 ip6_rcv_finish net/ipv6/ip6_input.c:74 [inline]
 NF_HOOK include/linux/netfilter.h:307 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ipv6_rcv+0x229/0x3c0 net/ipv6/ip6_input.c:297
 __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5498
 __netif_receive_skb+0x24/0x1b0 net/core/dev.c:5612
 process_backlog+0x2a5/0x6c0 net/core/dev.c:6492
 __napi_poll+0xaf/0x440 net/core/dev.c:7047
 napi_poll net/core/dev.c:7114 [inline]
 net_rx_action+0x801/0xb40 net/core/dev.c:7201
 __do_softirq+0x292/0xa27 kernel/softirq.c:559
 run_ksoftirqd kernel/softirq.c:923 [inline]
 run_ksoftirqd+0x2d/0x60 kernel/softirq.c:915
 smpboot_thread_fn+0x645/0x9c0 kernel/smpboot.c:164
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
----------------
Code disassembly (best guess):
   0:	48 c7 c7 40 db 8b 89 	mov    $0xffffffff898bdb40,%rdi
   7:	e8 66 7a be ff       	callq  0xffbe7a72
   c:	e8 cd 1a cd ff       	callq  0xffcd1ade
  11:	31 c0                	xor    %eax,%eax
  13:	5d                   	pop    %rbp
  14:	c3                   	retq
  15:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  1c:	53                   	push   %rbx
  1d:	48 89 fb             	mov    %rdi,%rbx
  20:	48 83 ec 08          	sub    $0x8,%rsp
  24:	48 39 77 10          	cmp    %rsi,0x10(%rdi)
  28:	74 6a                	je     0x94
* 2a:	66 f7 47 22 f0 ff    	testw  $0xfff0,0x22(%rdi) <-- trapping instruction
  30:	74 5a                	je     0x8c
  32:	48 8b 46 08          	mov    0x8(%rsi),%rax
  36:	48 89 f7             	mov    %rsi,%rdi
  39:	48 85 c0             	test   %rax,%rax
  3c:	74 42                	je     0x80
  3e:	8b                   	.byte 0x8b
  3f:	15                   	.byte 0x15


Tested on:

commit:         c7d10223 Merge tag 'net-5.14-rc4' of git://git.kernel...
git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git
console output: https://syzkaller.appspot.com/x/log.txt?x=15c81b33300000
kernel config:  https://syzkaller.appspot.com/x/.config?x=bfd78f4abd4edaa6
dashboard link: https://syzkaller.appspot.com/bug?extid=a9b681dcbc06eb2bca04
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1
patch:          https://syzkaller.appspot.com/x/patch.diff?x=13983115300000


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [syzbot] INFO: task hung in __lru_add_drain_all
@ 2021-09-03  8:32 syzbot
  0 siblings, 0 replies; 8+ messages in thread
From: syzbot @ 2021-09-03  8:32 UTC (permalink / raw)
  To: akpm, bp, frederic, hpa, jmattson, joro, kvm, linux-kernel,
	linux-mm, mark.rutland, masahiroy, mingo, netdev, npiggin,
	pbonzini, peterz, rafael.j.wysocki, rostedt, seanjc, sedat.dilek,
	syzkaller-bugs, tglx, vitor, vkuznets, wanpengli, will, x86

Hello,

syzbot found the following issue on:

HEAD commit:    c7d102232649 Merge tag 'net-5.14-rc4' of git://git.kernel...
git tree:       net
console output: https://syzkaller.appspot.com/x/log.txt?x=131efc8a300000
kernel config:  https://syzkaller.appspot.com/x/.config?x=bfd78f4abd4edaa6
dashboard link: https://syzkaller.appspot.com/bug?extid=a9b681dcbc06eb2bca04
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13e91804300000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1508bb1e300000

The issue was bisected to:

commit 997acaf6b4b59c6a9c259740312a69ea549cc684
Author: Mark Rutland <mark.rutland@arm.com>
Date:   Mon Jan 11 15:37:07 2021 +0000

    lockdep: report broken irq restoration

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=102f551a300000
final oops:     https://syzkaller.appspot.com/x/report.txt?x=122f551a300000
console output: https://syzkaller.appspot.com/x/log.txt?x=142f551a300000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+a9b681dcbc06eb2bca04@syzkaller.appspotmail.com
Fixes: 997acaf6b4b5 ("lockdep: report broken irq restoration")

INFO: task khugepaged:1663 blocked for more than 143 seconds.
      Not tainted 5.14.0-rc3-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:khugepaged      state:D stack:28464 pid: 1663 ppid:     2 flags:0x00004000
Call Trace:
 context_switch kernel/sched/core.c:4683 [inline]
 __schedule+0x93a/0x26f0 kernel/sched/core.c:5940
 schedule+0xd3/0x270 kernel/sched/core.c:6019
 schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1855
 do_wait_for_common kernel/sched/completion.c:85 [inline]
 __wait_for_common kernel/sched/completion.c:106 [inline]
 wait_for_common kernel/sched/completion.c:117 [inline]
 wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
 __flush_work+0x50e/0xad0 kernel/workqueue.c:3053
 __lru_add_drain_all+0x3fd/0x760 mm/swap.c:842
 khugepaged_do_scan mm/khugepaged.c:2214 [inline]
 khugepaged+0x10f/0x5590 mm/khugepaged.c:2275
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

Showing all locks held in the system:
1 lock held by khungtaskd/1647:
 #0: ffffffff8b97ba40 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446
1 lock held by khugepaged/1663:
 #0: ffffffff8ba5e948 (lock#6){+.+.}-{3:3}, at: __lru_add_drain_all+0x65/0x760 mm/swap.c:791
1 lock held by in:imklog/8274:
 #0: ffff88801b6b4ff0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:974
3 locks held by kworker/1:4/8673:
3 locks held by kworker/1:5/8674:
2 locks held by syz-executor764/31172:

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 1647 Comm: khungtaskd Not tainted 5.14.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
 nmi_cpu_backtrace.cold+0x44/0xd7 lib/nmi_backtrace.c:105
 nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline]
 watchdog+0xd0a/0xfc0 kernel/hung_task.c:295
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 19 Comm: ksoftirqd/1 Not tainted 5.14.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__find_rr_leaf+0x2d4/0xd20 net/ipv6/route.c:794
Code: 08 0f 84 58 01 00 00 e8 aa 3d a0 f9 48 83 3c 24 00 74 43 e8 9e 3d a0 f9 48 8d bb 80 00 00 00 48 89 f8 48 c1 e8 03 0f b6 04 28 <84> c0 74 08 3c 03 0f 8e 49 08 00 00 44 8b a3 80 00 00 00 44 8b 74
RSP: 0018:ffffc90000d97508 EFLAGS: 00000a06
RAX: 0000000000000000 RBX: ffff88804f143400 RCX: 0000000000000100
RDX: ffff888010a7d4c0 RSI: ffffffff87d55dc2 RDI: ffff88804f143480
RBP: dffffc0000000000 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff87d55d8a R11: 0000000000000000 R12: 0000000000000001
R13: ffffc90000d977f0 R14: 0000000000000000 R15: ffff88804f142aa0
FS:  0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000200000c0 CR3: 000000001ff23000 CR4: 00000000001506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 find_rr_leaf net/ipv6/route.c:846 [inline]
 rt6_select net/ipv6/route.c:890 [inline]
 fib6_table_lookup+0x649/0xa20 net/ipv6/route.c:2174
 ip6_pol_route+0x1c5/0x11d0 net/ipv6/route.c:2210
 pol_lookup_func include/net/ip6_fib.h:579 [inline]
 fib6_rule_lookup+0x111/0x6f0 net/ipv6/fib6_rules.c:115
 ip6_route_input_lookup net/ipv6/route.c:2280 [inline]
 ip6_route_input+0x63c/0xb30 net/ipv6/route.c:2576
 ip6_rcv_finish_core.constprop.0.isra.0+0x168/0x570 net/ipv6/ip6_input.c:63
 ip6_rcv_finish net/ipv6/ip6_input.c:74 [inline]
 NF_HOOK include/linux/netfilter.h:307 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ipv6_rcv+0x229/0x3c0 net/ipv6/ip6_input.c:297
 __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5498
 __netif_receive_skb+0x24/0x1b0 net/core/dev.c:5612
 process_backlog+0x2a5/0x6c0 net/core/dev.c:6492
 __napi_poll+0xaf/0x440 net/core/dev.c:7047
 napi_poll net/core/dev.c:7114 [inline]
 net_rx_action+0x801/0xb40 net/core/dev.c:7201
 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
 run_ksoftirqd kernel/softirq.c:920 [inline]
 run_ksoftirqd+0x2d/0x60 kernel/softirq.c:912
 smpboot_thread_fn+0x645/0x9c0 kernel/smpboot.c:164
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-09-05 23:55 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20210904005650.2914-1-hdanton@sina.com>
2021-09-04  2:26 ` [syzbot] INFO: task hung in __lru_add_drain_all syzbot
     [not found] ` <20210904080739.3026-1-hdanton@sina.com>
2021-09-04  9:18   ` syzbot
     [not found]   ` <20210904104951.3084-1-hdanton@sina.com>
2021-09-04 17:04     ` Paul E. McKenney
2021-09-05 23:55   ` Thomas Gleixner
     [not found] <20210905022044.3176-1-hdanton@sina.com>
2021-09-05  5:21 ` syzbot
     [not found] <20210903111011.2811-1-hdanton@sina.com>
2021-09-03 12:04 ` syzbot
2021-09-05 23:36 ` Thomas Gleixner
2021-09-03  8:32 syzbot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).