linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, kernel test robot <oliver.sang@intel.com>,
	Carel Si <beibei.si@intel.com>, Jann Horn <jannh@google.com>,
	Miklos Szeredi <mszeredi@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 5.15 13/41] fget: clarify and improve __fget_files() implementation
Date: Fri, 14 Jan 2022 09:16:13 +0100	[thread overview]
Message-ID: <20220114081545.603027941@linuxfoundation.org> (raw)
In-Reply-To: <20220114081545.158363487@linuxfoundation.org>

From: Linus Torvalds <torvalds@linux-foundation.org>

commit e386dfc56f837da66d00a078e5314bc8382fab83 upstream.

Commit 054aa8d439b9 ("fget: check that the fd still exists after getting
a ref to it") fixed a race with getting a reference to a file just as it
was being closed.  It was a fairly minimal patch, and I didn't think
re-checking the file pointer lookup would be a measurable overhead,
since it was all right there and cached.

But I was wrong, as pointed out by the kernel test robot.

The 'poll2' case of the will-it-scale.per_thread_ops benchmark regressed
quite noticeably.  Admittedly it seems to be a very artificial test:
doing "poll()" system calls on regular files in a very tight loop in
multiple threads.

That means that basically all the time is spent just looking up file
descriptors without ever doing anything useful with them (not that doing
'poll()' on a regular file is useful to begin with).  And as a result it
shows the extra "re-check fd" cost as a sore thumb.

Happily, the regression is fixable by just writing the code to loook up
the fd to be better and clearer.  There's still a cost to verify the
file pointer, but now it's basically in the noise even for that
benchmark that does nothing else - and the code is more understandable
and has better comments too.

[ Side note: this patch is also a classic case of one that looks very
  messy with the default greedy Myers diff - it's much more legible with
  either the patience of histogram diff algorithm ]

Link: https://lore.kernel.org/lkml/20211210053743.GA36420@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/lkml/20211213083154.GA20853@linux.intel.com/
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Carel Si <beibei.si@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/file.c |   72 ++++++++++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 56 insertions(+), 16 deletions(-)

--- a/fs/file.c
+++ b/fs/file.c
@@ -841,28 +841,68 @@ void do_close_on_exec(struct files_struc
 	spin_unlock(&files->file_lock);
 }
 
-static struct file *__fget_files(struct files_struct *files, unsigned int fd,
-				 fmode_t mask, unsigned int refs)
+static inline struct file *__fget_files_rcu(struct files_struct *files,
+	unsigned int fd, fmode_t mask, unsigned int refs)
 {
-	struct file *file;
+	for (;;) {
+		struct file *file;
+		struct fdtable *fdt = rcu_dereference_raw(files->fdt);
+		struct file __rcu **fdentry;
 
-	rcu_read_lock();
-loop:
-	file = files_lookup_fd_rcu(files, fd);
-	if (file) {
-		/* File object ref couldn't be taken.
-		 * dup2() atomicity guarantee is the reason
-		 * we loop to catch the new file (or NULL pointer)
+		if (unlikely(fd >= fdt->max_fds))
+			return NULL;
+
+		fdentry = fdt->fd + array_index_nospec(fd, fdt->max_fds);
+		file = rcu_dereference_raw(*fdentry);
+		if (unlikely(!file))
+			return NULL;
+
+		if (unlikely(file->f_mode & mask))
+			return NULL;
+
+		/*
+		 * Ok, we have a file pointer. However, because we do
+		 * this all locklessly under RCU, we may be racing with
+		 * that file being closed.
+		 *
+		 * Such a race can take two forms:
+		 *
+		 *  (a) the file ref already went down to zero,
+		 *      and get_file_rcu_many() fails. Just try
+		 *      again:
 		 */
-		if (file->f_mode & mask)
-			file = NULL;
-		else if (!get_file_rcu_many(file, refs))
-			goto loop;
-		else if (files_lookup_fd_raw(files, fd) != file) {
+		if (unlikely(!get_file_rcu_many(file, refs)))
+			continue;
+
+		/*
+		 *  (b) the file table entry has changed under us.
+		 *       Note that we don't need to re-check the 'fdt->fd'
+		 *       pointer having changed, because it always goes
+		 *       hand-in-hand with 'fdt'.
+		 *
+		 * If so, we need to put our refs and try again.
+		 */
+		if (unlikely(rcu_dereference_raw(files->fdt) != fdt) ||
+		    unlikely(rcu_dereference_raw(*fdentry) != file)) {
 			fput_many(file, refs);
-			goto loop;
+			continue;
 		}
+
+		/*
+		 * Ok, we have a ref to the file, and checked that it
+		 * still exists.
+		 */
+		return file;
 	}
+}
+
+static struct file *__fget_files(struct files_struct *files, unsigned int fd,
+				 fmode_t mask, unsigned int refs)
+{
+	struct file *file;
+
+	rcu_read_lock();
+	file = __fget_files_rcu(files, fd, mask, refs);
 	rcu_read_unlock();
 
 	return file;



  parent reply	other threads:[~2022-01-14  8:20 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-14  8:16 [PATCH 5.15 00/41] 5.15.15-rc1 review Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 01/41] s390/kexec: handle R_390_PLT32DBL rela in arch_kexec_apply_relocations_add() Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 02/41] workqueue: Fix unbind_workers() VS wq_worker_running() race Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 03/41] staging: r8188eu: switch the led off during deinit Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 04/41] bpf: Fix out of bounds access from invalid *_or_null type verification Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 05/41] Bluetooth: btusb: Add protocol for MediaTek bluetooth devices(MT7922) Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 06/41] Bluetooth: btusb: Add the new support ID for Realtek RTL8852A Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 07/41] Bluetooth: btusb: Add support for IMC Networks Mediatek Chip(MT7921) Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 08/41] Bbluetooth: btusb: Add another Bluetooth part for Realtek 8852AE Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 09/41] Bluetooth: btusb: fix memory leak in btusb_mtk_submit_wmt_recv_urb() Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 10/41] Bluetooth: btusb: enable Mediatek to support AOSP extension Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 11/41] Bluetooth: btusb: Add one more Bluetooth part for the Realtek RTL8852AE Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 12/41] Bluetooth: btusb: Add the new support IDs for WCN6855 Greg Kroah-Hartman
2022-01-14  8:16 ` Greg Kroah-Hartman [this message]
2022-01-14  8:16 ` [PATCH 5.15 14/41] Bluetooth: btusb: Add one more Bluetooth part " Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 15/41] Bluetooth: btusb: Add two more Bluetooth parts " Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 16/41] Bluetooth: btusb: Add support for Foxconn MT7922A Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 17/41] Bluetooth: btintel: Fix broken LED quirk for legacy ROM devices Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 18/41] Bluetooth: btusb: Add support for Foxconn QCA 0xe0d0 Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 19/41] Bluetooth: bfusb: fix division by zero in send path Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 20/41] ARM: dts: exynos: Fix BCM4330 Bluetooth reset polarity in I9100 Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 21/41] USB: core: Fix bug in resuming hubs handling of wakeup requests Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 22/41] USB: Fix "slab-out-of-bounds Write" bug in usb_hcd_poll_rh_status Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 23/41] ath11k: Fix buffer overflow when scanning with extraie Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 24/41] mmc: sdhci-pci: Add PCI ID for Intel ADL Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 25/41] Bluetooth: add quirk disabling LE Read Transmit Power Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 26/41] Bluetooth: btbcm: disable read tx power for some Macs with the T2 Security chip Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 27/41] Bluetooth: btbcm: disable read tx power for MacBook Air 8,1 and 8,2 Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 28/41] veth: Do not record rx queue hint in veth_xmit Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 29/41] mfd: intel-lpss: Fix too early PM enablement in the ACPI ->probe() Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 30/41] x86/mce: Remove noinstr annotation from mce_setup() Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 31/41] can: gs_usb: fix use of uninitialized variable, detach device on reception of invalid USB data Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 32/41] can: isotp: convert struct tpcon::{idx,len} to unsigned int Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 33/41] can: gs_usb: gs_can_start_xmit(): zero-initialize hf->{flags,reserved} Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 34/41] random: fix data race on crng_node_pool Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 35/41] random: fix data race on crng init time Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 36/41] random: fix crash on multiple early calls to add_bootloader_randomness() Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 37/41] platform/x86/intel: hid: add quirk to support Surface Go 3 Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 38/41] media: Revert "media: uvcvideo: Set unique vdev name based in type" Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 39/41] staging: wlan-ng: Avoid bitwise vs logical OR warning in hfa384x_usb_throttlefn() Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 40/41] drm/i915: Avoid bitwise vs logical OR warning in snb_wm_latency_quirk() Greg Kroah-Hartman
2022-01-14  8:16 ` [PATCH 5.15 41/41] staging: greybus: fix stack size warning with UBSAN Greg Kroah-Hartman
2022-01-14 17:43 ` [PATCH 5.15 00/41] 5.15.15-rc1 review Naresh Kamboju
2022-01-14 19:59 ` Ron Economos
2022-01-15  8:14   ` Greg Kroah-Hartman
2022-01-15 11:52     ` Ron Economos
2022-01-15 12:15       ` Greg Kroah-Hartman
2022-01-15 12:31         ` Ron Economos
2022-01-14 22:29 ` Florian Fainelli
2022-01-14 23:32 ` Fox Chen
2022-01-15  0:24 ` Shuah Khan
2022-01-15 11:03 ` Sudip Mukherjee
2022-01-15 14:47 ` Andrei Rabusov
2022-01-15 16:39 ` Guenter Roeck
2022-01-15 16:48 ` Jeffrin Jose T

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220114081545.603027941@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=beibei.si@intel.com \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mszeredi@redhat.com \
    --cc=oliver.sang@intel.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).