Netdev Archive on lore.kernel.org
 help / color / Atom feed
From: Cong Wang <xiyou.wangcong@gmail.com>
To: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: syzbot <syzbot+0f1827363a305f74996f@syzkaller.appspotmail.com>,
	David Miller <davem@davemloft.net>,
	linux-can@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
	Marc Kleine-Budde <mkl@pengutronix.de>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	syzkaller-bugs <syzkaller-bugs@googlegroups.com>,
	Kirill Tkhai <ktkhai@virtuozzo.com>
Subject: Re: INFO: task hung in unregister_netdevice_notifier (3)
Date: Mon, 15 Jul 2019 11:58:57 -0700
Message-ID: <CAM_iQpXDDyXPn8C6Z1R-GdcarFpiRd3-S_E6GQrEaVdvjFfxeA@mail.gmail.com> (raw)
In-Reply-To: <be6c249e-3b99-8388-5b13-547645b2fac9@hartkopp.net>

On Mon, Jul 15, 2019 at 10:23 AM Oliver Hartkopp <socketcan@hartkopp.net> wrote:
>
> Hello all,
>
> On 14.07.19 06:07, syzbot wrote:
> > syzbot has found a reproducer for the following crash on:
>
> the internal users of the CAN networking subsystem like CAN_BCM and
> CAN_RAW hold a number of CAN identifier subscriptions ('filters') for
> CAN netdevices (only type ARPHRD_CAN) in their socket data structures.
>
> The per-socket netdevice notifier is used to manage the ad-hoc removal
> of these filters at netdevice removal time.
>
> What I can see in the console output at
>
> https://syzkaller.appspot.com/x/log.txt?x=10e45f0fa00000
>
> seems to be a race between an unknown register_netdevice_notifier() call
> ("A") and the unregister_netdevice_notifier() ("B") likely invoked by
> bcm_release() ("C"):
>
> [ 1047.294207][ T1049]  schedule+0xa8/0x270
> [ 1047.318401][ T1049]  rwsem_down_write_slowpath+0x70a/0xf70
> [ 1047.324114][ T1049]  ? downgrade_write+0x3c0/0x3c0
> [ 1047.438644][ T1049]  ? mark_held_locks+0xf0/0xf0
> [ 1047.443483][ T1049]  ? lock_acquire+0x190/0x410
> [ 1047.448191][ T1049]  ? unregister_netdevice_notifier+0x7e/0x390
> [ 1047.547227][ T1049]  down_write+0x13c/0x150
> [ 1047.579535][ T1049]  ? down_write+0x13c/0x150
> [ 1047.584106][ T1049]  ? __down_timeout+0x2d0/0x2d0
> [ 1047.635356][ T1049]  ? mark_held_locks+0xf0/0xf0
> [ 1047.640721][ T1049]  unregister_netdevice_notifier+0x7e/0x390  <- "B"
> [ 1047.646667][ T1049]  ? __sock_release+0x89/0x280
> [ 1047.709126][ T1049]  ? register_netdevice_notifier+0x630/0x630 <- "A"
> [ 1047.715203][ T1049]  ? __kasan_check_write+0x14/0x20
> [ 1047.775138][ T1049]  bcm_release+0x93/0x5e0                    <- "C"
> [ 1047.795337][ T1049]  __sock_release+0xce/0x280
> [ 1047.829016][ T1049]  sock_close+0x1e/0x30
>
> The question to me is now:
>
> Is the problem located in an (un)register_netdevice_notifier race OR is
> it generally a bad idea to call unregister_netdevice_notifier() in a
> sock release?

To me it doesn't look like anything wrong in CAN. If you look at a few
more reports from syzbot for the same bug, it actually indicates
something else is holding the pernet_ops_rwsem which caused this
hung task.

When NMI is kicked, it shows nf_ct_iterate_cleanup() was getting
stuck:


NMI backtrace for cpu 0
CPU: 0 PID: 1044 Comm: khungtaskd Not tainted 5.2.0-rc5+ #42
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 nmi_cpu_backtrace.cold+0x63/0xa4 lib/nmi_backtrace.c:101
 nmi_trigger_cpumask_backtrace+0x1be/0x236 lib/nmi_backtrace.c:62
 arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:205 [inline]
 watchdog+0x9b7/0xec0 kernel/hung_task.c:289
 kthread+0x354/0x420 kernel/kthread.c:255
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 6877 Comm: kworker/u4:1 Not tainted 5.2.0-rc5+ #42
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Workqueue: netns cleanup_net
RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x50 kernel/kcov.c:95
Code: 89 25 e4 bc 15 09 41 bc f4 ff ff ff e8 6d 04 ea ff 48 c7 05 ce
bc 15 09 00 00 00 00 e9 a4 e9 ff ff 90 90 90 90 90 90 90 90 90 <55> 48
89 e5 48 8b 75 08 65 48 8b 04 25 c0 fd 01 00 65 8b 15 f0 3a
RSP: 0018:ffff888088fe7ac0 EFLAGS: 00000046
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff817637c6
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
RBP: ffff888088fe7af8 R08: ffff888096406340 R09: fffffbfff118bda9
R10: fffffbfff118bda8 R11: ffffffff88c5ed43 R12: ffffffff85b86d11
R13: ffff888096406340 R14: 0000000000000001 R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff600400 CR3: 0000000099e59000 CR4: 00000000001406e0
Call Trace:
 __local_bh_enable_ip+0x11a/0x270 kernel/softirq.c:171
 local_bh_enable include/linux/bottom_half.h:32 [inline]
 get_next_corpse net/netfilter/nf_conntrack_core.c:2015 [inline]
 nf_ct_iterate_cleanup+0x217/0x4e0 net/netfilter/nf_conntrack_core.c:2038
 nf_conntrack_cleanup_net_list+0x7a/0x240 net/netfilter/nf_conntrack_core.c:2225
 nf_conntrack_pernet_exit+0x159/0x1a0
net/netfilter/nf_conntrack_standalone.c:1151
 ops_exit_list.isra.0+0xfc/0x150 net/core/net_namespace.c:168
 cleanup_net+0x4e2/0xa40 net/core/net_namespace.c:575
 process_one_work+0x989/0x1790 kernel/workqueue.c:2269
 worker_thread+0x98/0xe40 kernel/workqueue.c:2415
 kthread+0x354/0x420 kernel/kthread.c:255
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

      reply index

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-17  8:11 syzbot
2019-07-14  4:07 ` syzbot
2019-07-15 17:16   ` Oliver Hartkopp
2019-07-15 18:58     ` Cong Wang [this message]

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAM_iQpXDDyXPn8C6Z1R-GdcarFpiRd3-S_E6GQrEaVdvjFfxeA@mail.gmail.com \
    --to=xiyou.wangcong@gmail.com \
    --cc=davem@davemloft.net \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-can@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mkl@pengutronix.de \
    --cc=netdev@vger.kernel.org \
    --cc=socketcan@hartkopp.net \
    --cc=syzbot+0f1827363a305f74996f@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Netdev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/netdev/0 netdev/git/0.git
	git clone --mirror https://lore.kernel.org/netdev/1 netdev/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 netdev netdev/ https://lore.kernel.org/netdev \
		netdev@vger.kernel.org netdev@archiver.kernel.org
	public-inbox-index netdev

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.netdev


AGPL code for this site: git clone https://public-inbox.org/ public-inbox