All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Zhihao Cheng <chengzhihao1@huawei.com>,
	Zhang Yi <yi.zhang@huawei.com>, Brian Foster <bfoster@redhat.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Eric Biederman <ebiederm@xmission.com>,
	Matthew Wilcox <willy@infradead.org>, Baoquan He <bhe@redhat.com>,
	Kalesh Singh <kaleshsingh@google.com>,
	Yu Kuai <yukuai3@huawei.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Suraj Jitindar Singh <surajjs@amazon.com>
Subject: [PATCH 5.10 72/83] proc: fix a dentry lock race between release_task and lookup
Date: Wed, 20 Sep 2023 13:32:02 +0200	[thread overview]
Message-ID: <20230920112829.502498246@linuxfoundation.org> (raw)
In-Reply-To: <20230920112826.634178162@linuxfoundation.org>

5.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Zhihao Cheng <chengzhihao1@huawei.com>

commit d919a1e79bac890421537cf02ae773007bf55e6b upstream.

Commit 7bc3e6e55acf06 ("proc: Use a list of inodes to flush from proc")
moved proc_flush_task() behind __exit_signal().  Then, process systemd can
take long period high cpu usage during releasing task in following
concurrent processes:

  systemd                                 ps
kernel_waitid                 stat(/proc/tgid)
  do_wait                       filename_lookup
    wait_consider_task            lookup_fast
      release_task
        __exit_signal
          __unhash_process
            detach_pid
              __change_pid // remove task->pid_links
                                     d_revalidate -> pid_revalidate  // 0
                                     d_invalidate(/proc/tgid)
                                       shrink_dcache_parent(/proc/tgid)
                                         d_walk(/proc/tgid)
                                           spin_lock_nested(/proc/tgid/fd)
                                           // iterating opened fd
        proc_flush_pid                                    |
           d_invalidate (/proc/tgid/fd)                   |
              shrink_dcache_parent(/proc/tgid/fd)         |
                shrink_dentry_list(subdirs)               ↓
                  shrink_lock_dentry(/proc/tgid/fd) --> race on dentry lock

Function d_invalidate() will remove dentry from hash firstly, but why does
proc_flush_pid() process dentry '/proc/tgid/fd' before dentry
'/proc/tgid'?  That's because proc_pid_make_inode() adds proc inode in
reverse order by invoking hlist_add_head_rcu().  But proc should not add
any inodes under '/proc/tgid' except '/proc/tgid/task/pid', fix it by
adding inode into 'pid->inodes' only if the inode is /proc/tgid or
/proc/tgid/task/pid.

Performance regression:
Create 200 tasks, each task open one file for 50,000 times. Kill all
tasks when opened files exceed 10,000,000 (cat /proc/sys/fs/file-nr).

Before fix:
$ time killall -wq aa
  real    4m40.946s   # During this period, we can see 'ps' and 'systemd'
			taking high cpu usage.

After fix:
$ time killall -wq aa
  real    1m20.732s   # During this period, we can see 'systemd' taking
			high cpu usage.

Link: https://lkml.kernel.org/r/20220713130029.4133533-1-chengzhihao1@huawei.com
Fixes: 7bc3e6e55acf06 ("proc: Use a list of inodes to flush from proc")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216054
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Suggested-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Context adjustments ]
Signed-off-by: Suraj Jitindar Singh <surajjs@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/proc/base.c |   46 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 38 insertions(+), 8 deletions(-)

--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1881,7 +1881,7 @@ void proc_pid_evict_inode(struct proc_in
 	put_pid(pid);
 }
 
-struct inode *proc_pid_make_inode(struct super_block * sb,
+struct inode *proc_pid_make_inode(struct super_block *sb,
 				  struct task_struct *task, umode_t mode)
 {
 	struct inode * inode;
@@ -1910,11 +1910,6 @@ struct inode *proc_pid_make_inode(struct
 
 	/* Let the pid remember us for quick removal */
 	ei->pid = pid;
-	if (S_ISDIR(mode)) {
-		spin_lock(&pid->lock);
-		hlist_add_head_rcu(&ei->sibling_inodes, &pid->inodes);
-		spin_unlock(&pid->lock);
-	}
 
 	task_dump_owner(task, 0, &inode->i_uid, &inode->i_gid);
 	security_task_to_inode(task, inode);
@@ -1927,6 +1922,39 @@ out_unlock:
 	return NULL;
 }
 
+/*
+ * Generating an inode and adding it into @pid->inodes, so that task will
+ * invalidate inode's dentry before being released.
+ *
+ * This helper is used for creating dir-type entries under '/proc' and
+ * '/proc/<tgid>/task'. Other entries(eg. fd, stat) under '/proc/<tgid>'
+ * can be released by invalidating '/proc/<tgid>' dentry.
+ * In theory, dentries under '/proc/<tgid>/task' can also be released by
+ * invalidating '/proc/<tgid>' dentry, we reserve it to handle single
+ * thread exiting situation: Any one of threads should invalidate its
+ * '/proc/<tgid>/task/<pid>' dentry before released.
+ */
+static struct inode *proc_pid_make_base_inode(struct super_block *sb,
+				struct task_struct *task, umode_t mode)
+{
+	struct inode *inode;
+	struct proc_inode *ei;
+	struct pid *pid;
+
+	inode = proc_pid_make_inode(sb, task, mode);
+	if (!inode)
+		return NULL;
+
+	/* Let proc_flush_pid find this directory inode */
+	ei = PROC_I(inode);
+	pid = ei->pid;
+	spin_lock(&pid->lock);
+	hlist_add_head_rcu(&ei->sibling_inodes, &pid->inodes);
+	spin_unlock(&pid->lock);
+
+	return inode;
+}
+
 int pid_getattr(const struct path *path, struct kstat *stat,
 		u32 request_mask, unsigned int query_flags)
 {
@@ -3341,7 +3369,8 @@ static struct dentry *proc_pid_instantia
 {
 	struct inode *inode;
 
-	inode = proc_pid_make_inode(dentry->d_sb, task, S_IFDIR | S_IRUGO | S_IXUGO);
+	inode = proc_pid_make_base_inode(dentry->d_sb, task,
+					 S_IFDIR | S_IRUGO | S_IXUGO);
 	if (!inode)
 		return ERR_PTR(-ENOENT);
 
@@ -3637,7 +3666,8 @@ static struct dentry *proc_task_instanti
 	struct task_struct *task, const void *ptr)
 {
 	struct inode *inode;
-	inode = proc_pid_make_inode(dentry->d_sb, task, S_IFDIR | S_IRUGO | S_IXUGO);
+	inode = proc_pid_make_base_inode(dentry->d_sb, task,
+					 S_IFDIR | S_IRUGO | S_IXUGO);
 	if (!inode)
 		return ERR_PTR(-ENOENT);
 



  parent reply	other threads:[~2023-09-20 12:23 UTC|newest]

Thread overview: 98+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-20 11:30 [PATCH 5.10 00/83] 5.10.196-rc1 review Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 01/83] autofs: fix memory leak of waitqueues in autofs_catatonic_mode Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 02/83] btrfs: output extra debug info if we failed to find an inline backref Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 03/83] locks: fix KASAN: use-after-free in trace_event_raw_event_filelock_lock Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 04/83] ACPICA: Add AML_NO_OPERAND_RESOLVE flag to Timer Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 05/83] kernel/fork: beware of __put_task_struct() calling context Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 06/83] rcuscale: Move rcu_scale_writer() schedule_timeout_uninterruptible() to _idle() Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 07/83] scftorture: Forgive memory-allocation failure if KASAN Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 08/83] ACPI: video: Add backlight=native DMI quirk for Lenovo Ideapad Z470 Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 09/83] perf/smmuv3: Enable HiSilicon Erratum 162001900 quirk for HIP08/09 Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 10/83] ACPI: video: Add backlight=native DMI quirk for Apple iMac12,1 and iMac12,2 Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 11/83] hw_breakpoint: fix single-stepping when using bpf_overflow_handler Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 12/83] devlink: remove reload failed checks in params get/set callbacks Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 13/83] crypto: lrw,xts - Replace strlcpy with strscpy Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 14/83] wifi: ath9k: fix fortify warnings Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 15/83] wifi: ath9k: fix printk specifier Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 16/83] wifi: mwifiex: fix fortify warning Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 17/83] wifi: wil6210: fix fortify warnings Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 18/83] crypto: lib/mpi - avoid null pointer deref in mpi_cmp_ui() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 19/83] tpm_tis: Resend command to recover from data transfer errors Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 20/83] mmc: sdhci-esdhc-imx: improve ESDHC_FLAG_ERR010450 Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 21/83] alx: fix OOB-read compiler warning Greg Kroah-Hartman
2023-09-20 11:31   ` Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 22/83] netfilter: ebtables: fix fortify warnings in size_entry_mwt() Greg Kroah-Hartman
2023-09-20 11:31   ` Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 23/83] wifi: mac80211_hwsim: drop short frames Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 24/83] libbpf: Free btf_vmlinux when closing bpf_object Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 25/83] drm/bridge: tc358762: Instruct DSI host to generate HSE packets Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 26/83] samples/hw_breakpoint: Fix kernel BUG invalid opcode: 0000 Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 27/83] ASoC: Intel: sof_sdw: Update BT offload config for soundwire config Greg Kroah-Hartman
2023-09-20 20:13   ` Salvatore Bonaccorso
2023-09-22  4:39     ` Salvatore Bonaccorso
2023-09-23  8:23       ` Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 28/83] ALSA: hda: intel-dsp-cfg: add LunarLake support Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 29/83] drm/exynos: fix a possible null-pointer dereference due to data race in exynos_drm_crtc_atomic_disable() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 30/83] bus: ti-sysc: Configure uart quirks for k3 SoC Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 31/83] md: raid1: fix potential OOB in raid1_remove_disk() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 32/83] ext2: fix datatype of block number in ext2_xattr_set2() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 33/83] fs/jfs: prevent double-free in dbUnmount() after failed jfs_remount() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 34/83] jfs: fix invalid free of JFS_IP(ipimap)->i_imap in diUnmount Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 35/83] powerpc/pseries: fix possible memory leak in ibmebus_bus_init() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 36/83] media: dvb-usb-v2: af9035: Fix null-ptr-deref in af9035_i2c_master_xfer Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 37/83] media: dw2102: Fix null-ptr-deref in dw2102_i2c_transfer() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 38/83] media: af9005: Fix null-ptr-deref in af9005_i2c_xfer Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 39/83] media: anysee: fix null-ptr-deref in anysee_master_xfer Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 40/83] media: az6007: Fix null-ptr-deref in az6007_i2c_xfer() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 41/83] media: dvb-usb-v2: gl861: Fix null-ptr-deref in gl861_i2c_master_xfer Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 42/83] media: tuners: qt1010: replace BUG_ON with a regular error Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 43/83] media: pci: cx23885: replace BUG with error return Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 44/83] usb: gadget: fsl_qe_udc: validate endpoint index for ch9 udc Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 45/83] scsi: target: iscsi: Fix buffer overflow in lio_target_nacl_info_show() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 46/83] serial: cpm_uart: Avoid suspicious locking Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 47/83] media: pci: ipu3-cio2: Initialise timing struct to avoid a compiler warning Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 48/83] kobject: Add sanity check for kset->kobj.ktype in kset_register() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 49/83] interconnect: Fix locking for runpm vs reclaim Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 50/83] mtd: rawnand: brcmnand: Allow SoC to provide I/O operations Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 51/83] mtd: rawnand: brcmnand: Fix ECC level field setting for v7.2 controller Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 52/83] perf jevents: Make build dependency on test JSONs Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 53/83] perf tools: Add an option to build without libbfd Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 54/83] perf jevents: Switch build to use jevents.py Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 55/83] perf build: Update build rule for generated files Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 56/83] btrfs: move btrfs_pinned_by_swapfile prototype into volumes.h Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 57/83] btrfs: add a helper to read the superblock metadata_uuid Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 58/83] btrfs: compare the correct fsid/metadata_uuid in btrfs_validate_super Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 59/83] drm: gm12u320: Fix the timeout usage for usb_bulk_msg() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 60/83] scsi: qla2xxx: Fix NULL vs IS_ERR() bug for debugfs_create_dir() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 61/83] selftests: tracing: Fix to unmount tracefs for recovering environment Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 62/83] scsi: lpfc: Fix the NULL vs IS_ERR() bug for debugfs_create_file() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 63/83] x86/boot/compressed: Reserve more memory for page tables Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 64/83] samples/hw_breakpoint: fix building without module unloading Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 65/83] md/raid1: fix error: ISO C90 forbids mixed declarations Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 66/83] attr: block mode changes of symlinks Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 67/83] ovl: fix incorrect fdput() on aio completion Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 68/83] btrfs: fix lockdep splat and potential deadlock after failure running delayed items Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 69/83] btrfs: release path before inode lookup during the ino lookup ioctl Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 70/83] drm/amdgpu: fix amdgpu_cs_p1_user_fence Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 71/83] net/sched: Retire rsvp classifier Greg Kroah-Hartman
2023-09-20 11:32 ` Greg Kroah-Hartman [this message]
2023-09-20 11:32 ` [PATCH 5.10 73/83] mm/filemap: fix infinite loop in generic_file_buffered_read() Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 74/83] drm/amd/display: enable cursor degamma for DCN3+ DRM legacy gamma Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 75/83] tracing: Have current_trace inc the trace array ref count Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 76/83] tracing: Have option files " Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 77/83] nfsd: fix change_info in NFSv4 RENAME replies Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 78/83] tracefs: Add missing lockdown check to tracefs_create_dir() Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 79/83] i2c: aspeed: Reset the i2c controller when timeout occurs Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 80/83] ata: libata: disallow dev-initiated LPM transitions to unsupported states Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 81/83] scsi: megaraid_sas: Fix deadlock on firmware crashdump Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 82/83] scsi: pm8001: Setup IRQs on resume Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 83/83] ext4: fix rec_len verify error Greg Kroah-Hartman
2023-09-20 18:13 ` [PATCH 5.10 00/83] 5.10.196-rc1 review Florian Fainelli
     [not found]   ` <CABbpPsz0Brmfw3zg2Y_r54Hx2mN1Ly=AtghKNE9chmtSP_-Y6A@mail.gmail.com>
2023-09-20 18:47     ` Florian Fainelli
2023-09-20 21:35 ` Shuah Khan
2023-09-21  6:07 ` Dominique Martinet
2023-09-21 13:32 ` Naresh Kamboju
2023-09-21 15:59 ` Guenter Roeck
2023-09-23  8:22   ` Greg Kroah-Hartman
2023-09-21 20:38 ` Joel Fernandes
2023-09-22  9:46 ` Jon Hunter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230920112829.502498246@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bfoster@redhat.com \
    --cc=bhe@redhat.com \
    --cc=chengzhihao1@huawei.com \
    --cc=ebiederm@xmission.com \
    --cc=kaleshsingh@google.com \
    --cc=patches@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    --cc=surajjs@amazon.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    --cc=yi.zhang@huawei.com \
    --cc=yukuai3@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.