stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Ivan Vecera <ivecera@redhat.com>,
	Petr Oros <poros@redhat.com>,
	Dave Ertman <david.m.ertman@intel.com>,
	Jakub Kicinski <kuba@kernel.org>
Subject: [PATCH 5.16 28/28] ice: Fix race condition during interface enslave
Date: Thu, 17 Mar 2022 13:46:19 +0100	[thread overview]
Message-ID: <20220317124527.562546441@linuxfoundation.org> (raw)
In-Reply-To: <20220317124526.768423926@linuxfoundation.org>

From: Ivan Vecera <ivecera@redhat.com>

commit 5cb1ebdbc4342b1c2ce89516e19808d64417bdbc upstream.

Commit 5dbbbd01cbba83 ("ice: Avoid RTNL lock when re-creating
auxiliary device") changes a process of re-creation of aux device
so ice_plug_aux_dev() is called from ice_service_task() context.
This unfortunately opens a race window that can result in dead-lock
when interface has left LAG and immediately enters LAG again.

Reproducer:
```
#!/bin/sh

ip link add lag0 type bond mode 1 miimon 100
ip link set lag0

for n in {1..10}; do
        echo Cycle: $n
        ip link set ens7f0 master lag0
        sleep 1
        ip link set ens7f0 nomaster
done
```

This results in:
[20976.208697] Workqueue: ice ice_service_task [ice]
[20976.213422] Call Trace:
[20976.215871]  __schedule+0x2d1/0x830
[20976.219364]  schedule+0x35/0xa0
[20976.222510]  schedule_preempt_disabled+0xa/0x10
[20976.227043]  __mutex_lock.isra.7+0x310/0x420
[20976.235071]  enum_all_gids_of_dev_cb+0x1c/0x100 [ib_core]
[20976.251215]  ib_enum_roce_netdev+0xa4/0xe0 [ib_core]
[20976.256192]  ib_cache_setup_one+0x33/0xa0 [ib_core]
[20976.261079]  ib_register_device+0x40d/0x580 [ib_core]
[20976.266139]  irdma_ib_register_device+0x129/0x250 [irdma]
[20976.281409]  irdma_probe+0x2c1/0x360 [irdma]
[20976.285691]  auxiliary_bus_probe+0x45/0x70
[20976.289790]  really_probe+0x1f2/0x480
[20976.298509]  driver_probe_device+0x49/0xc0
[20976.302609]  bus_for_each_drv+0x79/0xc0
[20976.306448]  __device_attach+0xdc/0x160
[20976.310286]  bus_probe_device+0x9d/0xb0
[20976.314128]  device_add+0x43c/0x890
[20976.321287]  __auxiliary_device_add+0x43/0x60
[20976.325644]  ice_plug_aux_dev+0xb2/0x100 [ice]
[20976.330109]  ice_service_task+0xd0c/0xed0 [ice]
[20976.342591]  process_one_work+0x1a7/0x360
[20976.350536]  worker_thread+0x30/0x390
[20976.358128]  kthread+0x10a/0x120
[20976.365547]  ret_from_fork+0x1f/0x40
...
[20976.438030] task:ip              state:D stack:    0 pid:213658 ppid:213627 flags:0x00004084
[20976.446469] Call Trace:
[20976.448921]  __schedule+0x2d1/0x830
[20976.452414]  schedule+0x35/0xa0
[20976.455559]  schedule_preempt_disabled+0xa/0x10
[20976.460090]  __mutex_lock.isra.7+0x310/0x420
[20976.464364]  device_del+0x36/0x3c0
[20976.467772]  ice_unplug_aux_dev+0x1a/0x40 [ice]
[20976.472313]  ice_lag_event_handler+0x2a2/0x520 [ice]
[20976.477288]  notifier_call_chain+0x47/0x70
[20976.481386]  __netdev_upper_dev_link+0x18b/0x280
[20976.489845]  bond_enslave+0xe05/0x1790 [bonding]
[20976.494475]  do_setlink+0x336/0xf50
[20976.502517]  __rtnl_newlink+0x529/0x8b0
[20976.543441]  rtnl_newlink+0x43/0x60
[20976.546934]  rtnetlink_rcv_msg+0x2b1/0x360
[20976.559238]  netlink_rcv_skb+0x4c/0x120
[20976.563079]  netlink_unicast+0x196/0x230
[20976.567005]  netlink_sendmsg+0x204/0x3d0
[20976.570930]  sock_sendmsg+0x4c/0x50
[20976.574423]  ____sys_sendmsg+0x1eb/0x250
[20976.586807]  ___sys_sendmsg+0x7c/0xc0
[20976.606353]  __sys_sendmsg+0x57/0xa0
[20976.609930]  do_syscall_64+0x5b/0x1a0
[20976.613598]  entry_SYSCALL_64_after_hwframe+0x65/0xca

1. Command 'ip link ... set nomaster' causes that ice_plug_aux_dev()
   is called from ice_service_task() context, aux device is created
   and associated device->lock is taken.
2. Command 'ip link ... set master...' calls ice's notifier under
   RTNL lock and that notifier calls ice_unplug_aux_dev(). That
   function tries to take aux device->lock but this is already taken
   by ice_plug_aux_dev() in step 1
3. Later ice_plug_aux_dev() tries to take RTNL lock but this is already
   taken in step 2
4. Dead-lock

The patch fixes this issue by following changes:
- Bit ICE_FLAG_PLUG_AUX_DEV is kept to be set during ice_plug_aux_dev()
  call in ice_service_task()
- The bit is checked in ice_clear_rdma_cap() and only if it is not set
  then ice_unplug_aux_dev() is called. If it is set (in other words
  plugging of aux device was requested and ice_plug_aux_dev() is
  potentially running) then the function only clears the bit
- Once ice_plug_aux_dev() call (in ice_service_task) is finished
  the bit ICE_FLAG_PLUG_AUX_DEV is cleared but it is also checked
  whether it was already cleared by ice_clear_rdma_cap(). If so then
  aux device is unplugged.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Co-developed-by: Petr Oros <poros@redhat.com>
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Dave Ertman <david.m.ertman@intel.com>
Link: https://lore.kernel.org/r/20220310171641.3863659-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/intel/ice/ice.h      |   11 ++++++++++-
 drivers/net/ethernet/intel/ice/ice_main.c |   12 +++++++++++-
 2 files changed, 21 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -891,7 +891,16 @@ static inline void ice_set_rdma_cap(stru
  */
 static inline void ice_clear_rdma_cap(struct ice_pf *pf)
 {
-	ice_unplug_aux_dev(pf);
+	/* We can directly unplug aux device here only if the flag bit
+	 * ICE_FLAG_PLUG_AUX_DEV is not set because ice_unplug_aux_dev()
+	 * could race with ice_plug_aux_dev() called from
+	 * ice_service_task(). In this case we only clear that bit now and
+	 * aux device will be unplugged later once ice_plug_aux_device()
+	 * called from ice_service_task() finishes (see ice_service_task()).
+	 */
+	if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
+		ice_unplug_aux_dev(pf);
+
 	clear_bit(ICE_FLAG_RDMA_ENA, pf->flags);
 	clear_bit(ICE_FLAG_AUX_ENA, pf->flags);
 }
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -2237,9 +2237,19 @@ static void ice_service_task(struct work
 		return;
 	}
 
-	if (test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
+	if (test_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) {
+		/* Plug aux device per request */
 		ice_plug_aux_dev(pf);
 
+		/* Mark plugging as done but check whether unplug was
+		 * requested during ice_plug_aux_dev() call
+		 * (e.g. from ice_clear_rdma_cap()) and if so then
+		 * plug aux device.
+		 */
+		if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
+			ice_unplug_aux_dev(pf);
+	}
+
 	if (test_and_clear_bit(ICE_FLAG_MTU_CHANGED, pf->flags)) {
 		struct iidc_event *event;
 



  parent reply	other threads:[~2022-03-17 12:54 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-17 12:45 [PATCH 5.16 00/28] 5.16.16-rc1 review Greg Kroah-Hartman
2022-03-17 12:45 ` [PATCH 5.16 01/28] Revert "xfrm: state and policy should fail if XFRMA_IF_ID 0" Greg Kroah-Hartman
2022-03-17 12:45 ` [PATCH 5.16 02/28] arm64: dts: rockchip: fix dma-controller node names on rk356x Greg Kroah-Hartman
2022-03-17 12:45 ` [PATCH 5.16 03/28] arm64: dts: rockchip: fix rk3399-puma-haikou USB OTG mode Greg Kroah-Hartman
2022-03-17 12:45 ` [PATCH 5.16 04/28] xfrm: Check if_id in xfrm_migrate Greg Kroah-Hartman
2022-03-17 12:45 ` [PATCH 5.16 05/28] xfrm: Fix xfrm migrate issues when address family changes Greg Kroah-Hartman
2022-03-17 12:45 ` [PATCH 5.16 06/28] arm64: dts: rockchip: fix rk3399-puma eMMC HS400 signal integrity Greg Kroah-Hartman
2022-03-17 12:45 ` [PATCH 5.16 07/28] arm64: dts: rockchip: align pl330 node name with dtschema Greg Kroah-Hartman
2022-03-17 12:45 ` [PATCH 5.16 08/28] arm64: dts: rockchip: reorder rk3399 hdmi clocks Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 09/28] arm64: dts: agilex: use the compatible "intel,socfpga-agilex-hsotg" Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 10/28] ARM: dts: rockchip: reorder rk322x hmdi clocks Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 11/28] ARM: dts: rockchip: fix a typo on rk3288 crypto-controller Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 12/28] mac80211: refuse aggregations sessions before authorized Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 13/28] MIPS: smp: fill in sibling and core maps earlier Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 14/28] ARM: 9178/1: fix unmet dependency on BITREVERSE for HAVE_ARCH_BITREVERSE Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 15/28] Bluetooth: hci_core: Fix leaking sent_cmd skb Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 16/28] can: rcar_canfd: rcar_canfd_channel_probe(): register the CAN device when fully ready Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 17/28] atm: firestream: check the return value of ioremap() in fs_init() Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 18/28] netfilter: egress: silence egress hook lockdep splats Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 19/28] Input: goodix - use the new soc_intel_is_byt() helper Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 20/28] Input: goodix - workaround Cherry Trail devices with a bogus ACPI Interrupt() resource Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 21/28] iwlwifi: dont advertise TWT support Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 22/28] drm/vrr: Set VRR capable prop only if it is attached to connector Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 23/28] nl80211: Update bss channel on channel switch for P2P_CLIENT Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 24/28] tcp: make tcp_read_sock() more robust Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 25/28] sfc: extend the locking on mcdi->seqno Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 26/28] bnx2: Fix an error message Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 27/28] kselftest/vm: fix tests build with old libc Greg Kroah-Hartman
2022-03-17 12:46 ` Greg Kroah-Hartman [this message]
2022-03-17 17:01 ` [PATCH 5.16 00/28] 5.16.16-rc1 review Fox Chen
2022-03-17 22:17 ` Florian Fainelli
2022-03-18  2:20 ` Guenter Roeck
2022-03-18  9:10 ` Naresh Kamboju
2022-03-18 11:09 ` Bagas Sanjaya
2022-03-18 13:14 ` Jon Hunter
2022-03-18 13:48 ` Ron Economos
2022-03-18 13:52 ` Rudi Heitbaum
2022-03-18 22:56 ` Justin Forbes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220317124527.562546441@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=david.m.ertman@intel.com \
    --cc=ivecera@redhat.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=poros@redhat.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).