linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Huang, Joseph" <Joseph.Huang@garmin.com>
To: Nikolay Aleksandrov <nikolay@nvidia.com>,
	Jakub Kicinski <kuba@kernel.org>
Cc: "Roopa Prabhu" <roopa@nvidia.com>,
	"David S. Miller" <davem@davemloft.net>,
	"bridge@lists.linux-foundation.org"
	<bridge@lists.linux-foundation.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Linus Lüssing" <linus.luessing@c0d3.blue>
Subject: RE: [PATCH] bridge: Fix a deadlock when enabling multicast snooping
Date: Thu, 3 Dec 2020 21:53:08 +0000	[thread overview]
Message-ID: <c82ce96d74ed4d3897d2e68a258f7834@garmin.com> (raw)
In-Reply-To: <bd84ca4c-c694-6fd2-81ef-08e9253c18a4@nvidia.com>

> From: Nikolay Aleksandrov <nikolay@nvidia.com>
> Sent: Thursday, December 3, 2020 3:47 PM
> To: Jakub Kicinski <kuba@kernel.org>; Huang, Joseph
> <Joseph.Huang@garmin.com>
> Cc: Roopa Prabhu <roopa@nvidia.com>; David S. Miller
> <davem@davemloft.net>; bridge@lists.linux-foundation.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Linus Lüssing
> <linus.luessing@c0d3.blue>
> Subject: Re: [PATCH] bridge: Fix a deadlock when enabling multicast
> snooping
>
> On 03/12/2020 20:28, Jakub Kicinski wrote:
> > On Tue, 1 Dec 2020 16:40:47 -0500 Joseph Huang wrote:
> >> When enabling multicast snooping, bridge module deadlocks on
> >> multicast_lock if 1) IPv6 is enabled, and 2) there is an existing
> >> querier on the same L2 network.
> >>
> >> The deadlock was caused by the following sequence: While holding the
> >> lock, br_multicast_open calls br_multicast_join_snoopers, which
> >> eventually causes IP stack to (attempt to) send out a Listener Report (in
> igmp6_join_group).
> >> Since the destination Ethernet address is a multicast address,
> >> br_dev_xmit feeds the packet back to the bridge via br_multicast_rcv,
> >> which in turn calls br_multicast_add_group, which then deadlocks on
> multicast_lock.
> >>
> >> The fix is to move the call br_multicast_join_snoopers outside of the
> >> critical section. This works since br_multicast_join_snoopers only
> >> deals with IP and does not modify any multicast data structures of
> >> the bridge, so there's no need to hold the lock.
> >>
> >> Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address")
> >>
> >> Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
> >
> > Nik, Linus - how does this one look?
> >
>
> Hi,
> Thanks, somehow I missed this one too. Need to check my email config. :) I
> believe I see how it can happen, although it's not straight-forward to follow.
> A selftest for this case would be great, and any traces (e.g. hung task) would
> help a lot as well.
> Correct me if I'm wrong but the sequence is something like:
> br_multicast_join_snoopers -> ipv6_dev_mc_inc -> __ipv6_dev_mc_inc ->
> igmp6_group_added
> -> MLDv1 (mode) igmp6_join_group() -> Again MLDv1 mode
> -> igmp6_join_group() -> igmp6_join_group
> -> igmp6_send() on the bridge device -> br_dev_xmit and onto the bridge
> -> mcast processing code
> which uses the multicast_lock spinlock. Right?

That is correct.

Here's a stack trace from a typical run:

echo -n 1 > /sys/devices/virtual/net/gmn0/bridge/multicast_snooping
[  936.146754] rcu: INFO: rcu_preempt self-detected stall on CPU
[  936.152534] rcu: 0-....: (5594 ticks this GP) idle=75a/1/0x4000000000000002 softirq=2787/2789 fqs=2625
[  936.162026] (t=5253 jiffies g=4205 q=12)
[  936.166041] Task dump for CPU 0:
[  936.169272] sh              R  running task        0  1315   1295 0x00000002
[  936.176332] Call trace:
[  936.178797]  dump_backtrace+0x0/0x140
[  936.182469]  show_stack+0x14/0x20
[  936.185793]  sched_show_task+0x108/0x138
[  936.189727]  dump_cpu_task+0x40/0x50
[  936.193313]  rcu_dump_cpu_stacks+0x94/0xd0
[  936.197420]  rcu_sched_clock_irq+0x75c/0x9c0
[  936.201698]  update_process_times+0x2c/0x68
[  936.205893]  tick_sched_handle.isra.0+0x30/0x50
[  936.210432]  tick_sched_timer+0x48/0x98
[  936.214272]  __hrtimer_run_queues+0x110/0x1b0
[  936.218635]  hrtimer_interrupt+0xe4/0x240
[  936.222656]  arch_timer_handler_phys+0x30/0x40
[  936.227106]  handle_percpu_devid_irq+0x80/0x140
[  936.231654]  generic_handle_irq+0x24/0x38
[  936.235669]  __handle_domain_irq+0x60/0xb8
[  936.239774]  gic_handle_irq+0x5c/0x148
[  936.243535]  el1_irq+0xb8/0x180
[  936.246689]  queued_spin_lock_slowpath+0x118/0x3b0
[  936.251495]  _raw_spin_lock+0x5c/0x68
[  936.255221]  br_multicast_add_group+0x40/0x170 [bridge]
[  936.260491]  br_multicast_rcv+0x7ac/0xe30 [bridge]
[  936.265322]  br_dev_xmit+0x140/0x368 [bridge]
[  936.269689]  dev_hard_start_xmit+0x94/0x158
[  936.273876]  __dev_queue_xmit+0x5ac/0x7f8
[  936.277890]  dev_queue_xmit+0x10/0x18
[  936.281563]  neigh_resolve_output+0xec/0x198
[  936.285845]  ip6_finish_output2+0x240/0x710
[  936.290039]  __ip6_finish_output+0x130/0x170
[  936.294318]  ip6_output+0x6c/0x1c8
[  936.297731]  NF_HOOK.constprop.0+0xd8/0xe8
[  936.301834]  igmp6_send+0x358/0x558
[  936.305326]  igmp6_join_group.part.0+0x30/0xf0
[  936.309774]  igmp6_group_added+0xfc/0x110
[  936.313787]  __ipv6_dev_mc_inc+0x1a4/0x290
[  936.317885]  ipv6_dev_mc_inc+0x10/0x18
[  936.321677]  br_multicast_open+0xbc/0x110 [bridge]
[  936.326506]  br_multicast_toggle+0xec/0x140 [bridge]


>
> One question - shouldn't leaving have the same problem? I.e.
> br_multicast_toggle -> br_multicast_leave_snoopers
> -> br_ip6_multicast_leave_snoopers -> ipv6_dev_mc_dec ->
> -> igmp6_group_dropped -> igmp6_leave_group ->
> MLDv1 mode && last reporter -> igmp6_send() ?
>
> I think it was saved by the fact that !br_opt_get(br,
> BROPT_MULTICAST_ENABLED) would be true and the multicast lock won't be
> acquired in the br_dev_xmit path? If so, I'd appreciate a comment about that
> because it's not really trivial to find out. :)

That's a really good point. Leave should have deadlocked as well, but when I
tested the patch, I was able to turn on/off multicast snooping multiple times
without any problem.

Is it because this line in igmp6_leave_group?

if (ma->mca_flags & MAF_LAST_REPORTER)
igmp6_send(&ma->mca_addr, ma->idev->dev,
ICMPV6_MGM_REDUCTION);

Perhaps MAF_LAST_REPORTER was not set, so igmp6_send was not called?

>
> Anyhow, the patch is fine as-is too:
> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
>
> Thanks,
>  Nik

Thanks,
Joseph

________________________________

CONFIDENTIALITY NOTICE: This email and any attachments are for the sole use of the intended recipient(s) and contain information that may be Garmin confidential and/or Garmin legally privileged. If you have received this email in error, please notify the sender by reply email and delete the message. Any disclosure, copying, distribution or use of this communication (including attachments) by someone other than the intended recipient is prohibited. Thank you.

  reply	other threads:[~2020-12-03 22:21 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-01 21:40 [PATCH] bridge: Fix a deadlock when enabling multicast snooping Joseph Huang
2020-12-03 18:28 ` Jakub Kicinski
2020-12-03 20:46   ` Nikolay Aleksandrov
2020-12-03 21:53     ` Huang, Joseph [this message]
2020-12-03 22:42       ` Huang, Joseph
2020-12-03 23:34         ` Nikolay Aleksandrov
2020-12-04 21:39 ` [PATCH v2] " Joseph Huang
2020-12-04 22:11   ` Nikolay Aleksandrov
2020-12-04 22:30     ` Huang, Joseph
2020-12-04 23:56 ` [PATCH v3] " Joseph Huang
2020-12-05  8:56   ` Nikolay Aleksandrov
2020-12-08  1:20     ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c82ce96d74ed4d3897d2e68a258f7834@garmin.com \
    --to=joseph.huang@garmin.com \
    --cc=bridge@lists.linux-foundation.org \
    --cc=davem@davemloft.net \
    --cc=kuba@kernel.org \
    --cc=linus.luessing@c0d3.blue \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=nikolay@nvidia.com \
    --cc=roopa@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).