All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Reinette Chatre <reinette.chatre@intel.com>,
	Xiaochen Shen <xiaochen.shen@intel.com>,
	Borislav Petkov <bp@suse.de>, Tony Luck <tony.luck@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.19 04/70] x86/resctrl: Fix a deadlock due to inaccurate reference
Date: Mon,  3 Feb 2020 16:19:16 +0000	[thread overview]
Message-ID: <20200203161912.983146506@linuxfoundation.org> (raw)
In-Reply-To: <20200203161912.158976871@linuxfoundation.org>

From: Xiaochen Shen <xiaochen.shen@intel.com>

commit 334b0f4e9b1b4a1d475f803419d202f6c5e4d18e upstream.

There is a race condition which results in a deadlock when rmdir and
mkdir execute concurrently:

$ ls /sys/fs/resctrl/c1/mon_groups/m1/
cpus  cpus_list  mon_data  tasks

Thread 1: rmdir /sys/fs/resctrl/c1
Thread 2: mkdir /sys/fs/resctrl/c1/mon_groups/m1

3 locks held by mkdir/48649:
 #0:  (sb_writers#17){.+.+}, at: [<ffffffffb4ca2aa0>] mnt_want_write+0x20/0x50
 #1:  (&type->i_mutex_dir_key#8/1){+.+.}, at: [<ffffffffb4c8c13b>] filename_create+0x7b/0x170
 #2:  (rdtgroup_mutex){+.+.}, at: [<ffffffffb4a4389d>] rdtgroup_kn_lock_live+0x3d/0x70

4 locks held by rmdir/48652:
 #0:  (sb_writers#17){.+.+}, at: [<ffffffffb4ca2aa0>] mnt_want_write+0x20/0x50
 #1:  (&type->i_mutex_dir_key#8/1){+.+.}, at: [<ffffffffb4c8c3cf>] do_rmdir+0x13f/0x1e0
 #2:  (&type->i_mutex_dir_key#8){++++}, at: [<ffffffffb4c86d5d>] vfs_rmdir+0x4d/0x120
 #3:  (rdtgroup_mutex){+.+.}, at: [<ffffffffb4a4389d>] rdtgroup_kn_lock_live+0x3d/0x70

Thread 1 is deleting control group "c1". Holding rdtgroup_mutex,
kernfs_remove() removes all kernfs nodes under directory "c1"
recursively, then waits for sub kernfs node "mon_groups" to drop active
reference.

Thread 2 is trying to create a subdirectory "m1" in the "mon_groups"
directory. The wrapper kernfs_iop_mkdir() takes an active reference to
the "mon_groups" directory but the code drops the active reference to
the parent directory "c1" instead.

As a result, Thread 1 is blocked on waiting for active reference to drop
and never release rdtgroup_mutex, while Thread 2 is also blocked on
trying to get rdtgroup_mutex.

Thread 1 (rdtgroup_rmdir)   Thread 2 (rdtgroup_mkdir)
(rmdir /sys/fs/resctrl/c1)  (mkdir /sys/fs/resctrl/c1/mon_groups/m1)
-------------------------   -------------------------
                            kernfs_iop_mkdir
                              /*
                               * kn: "m1", parent_kn: "mon_groups",
                               * prgrp_kn: parent_kn->parent: "c1",
                               *
                               * "mon_groups", parent_kn->active++: 1
                               */
                              kernfs_get_active(parent_kn)
kernfs_iop_rmdir
  /* "c1", kn->active++ */
  kernfs_get_active(kn)

  rdtgroup_kn_lock_live
    atomic_inc(&rdtgrp->waitcount)
    /* "c1", kn->active-- */
    kernfs_break_active_protection(kn)
    mutex_lock

  rdtgroup_rmdir_ctrl
    free_all_child_rdtgrp
      sentry->flags = RDT_DELETED

    rdtgroup_ctrl_remove
      rdtgrp->flags = RDT_DELETED
      kernfs_get(kn)
      kernfs_remove(rdtgrp->kn)
        __kernfs_remove
          /* "mon_groups", sub_kn */
          atomic_add(KN_DEACTIVATED_BIAS, &sub_kn->active)
          kernfs_drain(sub_kn)
            /*
             * sub_kn->active == KN_DEACTIVATED_BIAS + 1,
             * waiting on sub_kn->active to drop, but it
             * never drops in Thread 2 which is blocked
             * on getting rdtgroup_mutex.
             */
Thread 1 hangs here ---->
            wait_event(sub_kn->active == KN_DEACTIVATED_BIAS)
            ...
                              rdtgroup_mkdir
                                rdtgroup_mkdir_mon(parent_kn, prgrp_kn)
                                  mkdir_rdt_prepare(parent_kn, prgrp_kn)
                                    rdtgroup_kn_lock_live(prgrp_kn)
                                      atomic_inc(&rdtgrp->waitcount)
                                      /*
                                       * "c1", prgrp_kn->active--
                                       *
                                       * The active reference on "c1" is
                                       * dropped, but not matching the
                                       * actual active reference taken
                                       * on "mon_groups", thus causing
                                       * Thread 1 to wait forever while
                                       * holding rdtgroup_mutex.
                                       */
                                      kernfs_break_active_protection(
                                                               prgrp_kn)
                                      /*
                                       * Trying to get rdtgroup_mutex
                                       * which is held by Thread 1.
                                       */
Thread 2 hangs here ---->             mutex_lock
                                      ...

The problem is that the creation of a subdirectory in the "mon_groups"
directory incorrectly releases the active protection of its parent
directory instead of itself before it starts waiting for rdtgroup_mutex.
This is triggered by the rdtgroup_mkdir() flow calling
rdtgroup_kn_lock_live()/rdtgroup_kn_unlock() with kernfs node of the
parent control group ("c1") as argument. It should be called with kernfs
node "mon_groups" instead. What is currently missing is that the
kn->priv of "mon_groups" is NULL instead of pointing to the rdtgrp.

Fix it by pointing kn->priv to rdtgrp when "mon_groups" is created. Then
it could be passed to rdtgroup_kn_lock_live()/rdtgroup_kn_unlock()
instead. And then it operates on the same rdtgroup structure but handles
the active reference of kernfs node "mon_groups" to prevent deadlock.
The same changes are also made to the "mon_data" directories.

This results in some unused function parameters that will be cleaned up
in follow-up patch as the focus here is on the fix only in support of
backporting efforts.

Backporting notes:

Since upstream commit fa7d949337cc ("x86/resctrl: Rename and move rdt
files to a separate directory"), the file
arch/x86/kernel/cpu/intel_rdt_rdtgroup.c has been renamed and moved to
arch/x86/kernel/cpu/resctrl/rdtgroup.c.
Apply the change against file arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
for older stable trees.

Fixes: c7d9aac61311 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring")
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1578500886-21771-4-git-send-email-xiaochen.shen@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 77770caeea242..11c5accfa4db5 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -2005,7 +2005,7 @@ static struct dentry *rdt_mount(struct file_system_type *fs_type,
 
 	if (rdt_mon_capable) {
 		ret = mongroup_create_dir(rdtgroup_default.kn,
-					  NULL, "mon_groups",
+					  &rdtgroup_default, "mon_groups",
 					  &kn_mongrp);
 		if (ret) {
 			dentry = ERR_PTR(ret);
@@ -2415,7 +2415,7 @@ static int mkdir_mondata_all(struct kernfs_node *parent_kn,
 	/*
 	 * Create the mon_data directory first.
 	 */
-	ret = mongroup_create_dir(parent_kn, NULL, "mon_data", &kn);
+	ret = mongroup_create_dir(parent_kn, prgrp, "mon_data", &kn);
 	if (ret)
 		return ret;
 
@@ -2568,7 +2568,7 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
 	uint files = 0;
 	int ret;
 
-	prdtgrp = rdtgroup_kn_lock_live(prgrp_kn);
+	prdtgrp = rdtgroup_kn_lock_live(parent_kn);
 	rdt_last_cmd_clear();
 	if (!prdtgrp) {
 		ret = -ENODEV;
@@ -2643,7 +2643,7 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
 	kernfs_activate(kn);
 
 	/*
-	 * The caller unlocks the prgrp_kn upon success.
+	 * The caller unlocks the parent_kn upon success.
 	 */
 	return 0;
 
@@ -2654,7 +2654,7 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
 out_free_rgrp:
 	kfree(rdtgrp);
 out_unlock:
-	rdtgroup_kn_unlock(prgrp_kn);
+	rdtgroup_kn_unlock(parent_kn);
 	return ret;
 }
 
@@ -2692,7 +2692,7 @@ static int rdtgroup_mkdir_mon(struct kernfs_node *parent_kn,
 	 */
 	list_add_tail(&rdtgrp->mon.crdtgrp_list, &prgrp->mon.crdtgrp_list);
 
-	rdtgroup_kn_unlock(prgrp_kn);
+	rdtgroup_kn_unlock(parent_kn);
 	return ret;
 }
 
@@ -2735,7 +2735,7 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
 		 * Create an empty mon_groups directory to hold the subset
 		 * of tasks and cpus to monitor.
 		 */
-		ret = mongroup_create_dir(kn, NULL, "mon_groups", NULL);
+		ret = mongroup_create_dir(kn, rdtgrp, "mon_groups", NULL);
 		if (ret) {
 			rdt_last_cmd_puts("kernfs subdir error\n");
 			goto out_del_list;
@@ -2751,7 +2751,7 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
 out_common_fail:
 	mkdir_rdt_prepare_clean(rdtgrp);
 out_unlock:
-	rdtgroup_kn_unlock(prgrp_kn);
+	rdtgroup_kn_unlock(parent_kn);
 	return ret;
 }
 
-- 
2.20.1




  parent reply	other threads:[~2020-02-03 16:31 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-03 16:19 [PATCH 4.19 00/70] 4.19.102-stable review Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 01/70] vfs: fix do_last() regression Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 02/70] x86/resctrl: Fix use-after-free when deleting resource groups Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 03/70] x86/resctrl: Fix use-after-free due to inaccurate refcount of rdtgroup Greg Kroah-Hartman
2020-02-03 16:19 ` Greg Kroah-Hartman [this message]
2020-02-03 16:19 ` [PATCH 4.19 05/70] crypto: pcrypt - Fix user-after-free on module unload Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 06/70] rsi: add hci detach for hibernation and poweroff Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 07/70] rsi: fix use-after-free on failed probe and unbind Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 08/70] perf c2c: Fix return type for histogram sorting comparision functions Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 09/70] PM / devfreq: Add new name attribute for sysfs Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 10/70] tools lib: Fix builds when glibc contains strlcpy() Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 11/70] arm64: kbuild: remove compressed images on make ARCH=arm64 (dist)clean Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 12/70] ext4: validate the debug_want_extra_isize mount option at parse time Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 13/70] mm/mempolicy.c: fix out of bounds write in mpol_parse_str() Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 14/70] reiserfs: Fix memory leak of journal device string Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 15/70] media: digitv: dont continue if remote control state cant be read Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 16/70] media: af9005: uninitialized variable printked Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 17/70] media: vp7045: do not read uninitialized values if usb transfer fails Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 18/70] media: gspca: zero usb_buf Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 19/70] media: dvb-usb/dvb-usb-urb.c: initialize actlen to 0 Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 20/70] tomoyo: Use atomic_t for statistics counter Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 21/70] ttyprintk: fix a potential deadlock in interrupt context issue Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 22/70] Bluetooth: Fix race condition in hci_release_sock() Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 23/70] cgroup: Prevent double killing of css when enabling threaded cgroup Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 24/70] media: si470x-i2c: Move free() past last use of radio Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 25/70] ARM: dts: sun8i: a83t: Correct USB3503 GPIOs polarity Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 26/70] ARM: dts: am57xx-beagle-x15/am57xx-idk: Remove "gpios" for endpoint dt nodes Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 27/70] ARM: dts: beagle-x15-common: Model 5V0 regulator Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 28/70] soc: ti: wkup_m3_ipc: Fix race condition with rproc_boot Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 29/70] tools lib traceevent: Fix memory leakage in filter_event Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 30/70] rseq: Unregister rseq for clone CLONE_VM Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 31/70] clk: sunxi-ng: h6-r: Fix AR100/R_APB2 parent order Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 32/70] mac80211: mesh: restrict airtime metric to peered established plinks Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 33/70] clk: mmp2: Fix the order of timer mux parents Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 34/70] ASoC: rt5640: Fix NULL dereference on module unload Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 35/70] ixgbevf: Remove limit of 10 entries for unicast filter list Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 36/70] ixgbe: Fix calculation of queue with VFs and flow director on interface flap Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 37/70] igb: Fix SGMII SFP module discovery for 100FX/LX Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 38/70] platform/x86: GPD pocket fan: Allow somewhat lower/higher temperature limits Greg Kroah-Hartman
2020-02-04 19:12   ` Pavel Machek
2020-02-03 16:19 ` [PATCH 4.19 39/70] ASoC: sti: fix possible sleep-in-atomic Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 40/70] qmi_wwan: Add support for Quectel RM500Q Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 41/70] parisc: Use proper printk format for resource_size_t Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 42/70] wireless: fix enabling channel 12 for custom regulatory domain Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 43/70] cfg80211: Fix radar event during another phy CAC Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 44/70] mac80211: Fix TKIP replay protection immediately after key setup Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 45/70] wireless: wext: avoid gcc -O3 warning Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 46/70] netfilter: nft_tunnel: ERSPAN_VERSION must not be null Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 47/70] net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 48/70] bnxt_en: Fix ipv6 RFS filter matching logic Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 49/70] riscv: delete temporary files Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 50/70] iwlwifi: mvm: fix NVM check for 3168 devices Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 51/70] iwlwifi: Dont ignore the cap field upon mcc update Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 52/70] Input: aiptek - use descriptors of current altsetting Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 53/70] ARM: dts: am335x-boneblack-common: fix memory size Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 54/70] vti[6]: fix packet tx through bpf_redirect() Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 55/70] xfrm interface: " Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 56/70] xfrm: interface: do not confirm neighbor when do pmtu update Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 57/70] scsi: fnic: do not queue commands during fwreset Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 58/70] ARM: 8955/1: virt: Relax arch timer version check during early boot Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 59/70] tee: optee: Fix compilation issue with nommu Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 60/70] airo: Fix possible info leak in AIROOLDIOCTL/SIOCDEVPRIVATE Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 61/70] airo: Add missing CAP_NET_ADMIN check " Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 62/70] r8152: get default setting of WOL before initializing Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 63/70] ARM: dts: am43x-epos-evm: set data pin directions for spi0 and spi1 Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 64/70] qlcnic: Fix CPU soft lockup while collecting firmware dump Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 65/70] powerpc/fsl/dts: add fsl,erratum-a011043 Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 66/70] net/fsl: treat fsl,erratum-a011043 Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 67/70] net: fsl/fman: rename IF_MODE_XGMII to IF_MODE_10G Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 68/70] seq_tab_next() should increase position index Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 69/70] l2t_seq_next " Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 70/70] net: Fix skb->csum update in inet_proto_csum_replace16() Greg Kroah-Hartman
     [not found] ` <20200203161912.158976871-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
2020-02-03 21:39   ` [PATCH 4.19 00/70] 4.19.102-stable review Jon Hunter
2020-02-03 21:39     ` Jon Hunter
2020-02-04 12:37 ` Pavel Machek
2020-02-05 14:42   ` Greg Kroah-Hartman
2020-02-04 13:37 ` Naresh Kamboju
2020-02-04 17:19 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200203161912.983146506@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=bp@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=reinette.chatre@intel.com \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=xiaochen.shen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.