linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Jann Horn <jannh@google.com>
Cc: kernel test robot <oliver.sang@intel.com>,
	Miklos Szeredi <mszeredi@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	lkp@lists.01.org, kernel test robot <lkp@intel.com>,
	"Huang, Ying" <ying.huang@intel.com>,
	Feng Tang <feng.tang@intel.com>,
	Zhengjun Xing <zhengjun.xing@linux.intel.com>,
	fengwei.yin@intel.com
Subject: Re: [fget] 054aa8d439: will-it-scale.per_thread_ops -5.7% regression
Date: Fri, 10 Dec 2021 13:59:08 -0800	[thread overview]
Message-ID: <CAHk-=wh5iFv1MOx6r8zyGYkYGfgfxqcPSrUDwfuOCdis+VR+BQ@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=wg1uxUTmdEYgTcxWGQ-s6vb_V_Jux+Z+qwoAcVGkCTDYA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1944 bytes --]

On Fri, Dec 10, 2021 at 1:25 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> We could make a special light-weight version of files_lookup_fd_raw(),
> I guess. We don't need the *whole* "look it up again".  We don't need
> to re-check the array bounds, and we don't need to do the nospec
> lookup - we would have triggered a NULL file pointer if that happened
> the first time around.
>
> So all we'd need to do is "check that fdt is the same, and check that
> fdt->fd[fd] is the same".

This is an ENTIRELY UNTESTED patch to do that.

It basically rewrites __fget_files() from scratch: it really wants to
do the fd array lookup by hand, in order to cache the intermediate fdt
pointer, and in order to cache the intermediate speculation-safe fd
array index etc.

It's not a very complicated function, and rewriting it actually cleans
up the loop to not need the ugly goto.

I made it use a helper wrapper function for the rcu locking, so that
the "meat" of the function can just use plain "return NULL" for the
error cases.

However, not only is it entirely untested, this rewrite also means
that gcc has now decided that the result is so simple and clear that
it will inline it into all the callers.

I guess that's a good sign - writing the code in a way that makes the
compiler say "now it's so trivial that it should be inlined" is
certainly not a bad thing. But it makes it hard to really compare the
asm.

I did try a version with "noinline" just to make it more comparable,
and hey, it all looked sane to me there too.

I added more comments about what is going on.

Again - this is UNTESTED. I've looked at the code, I've looked at the
diff, and I've looked at the code it generates. It all looks fine to
me. But I've looked at it so much that I suspect that I'd be entirely
blind to any completely obvious bug by now.

Comments?

Oliver, does this make any difference in the performance department?

                 Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 2333 bytes --]

 fs/file.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 53 insertions(+), 16 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index ad4a8bf3cf10..70662fb1ab32 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -841,28 +841,65 @@ void do_close_on_exec(struct files_struct *files)
 	spin_unlock(&files->file_lock);
 }
 
+static inline struct file *__fget_files_rcu(struct files_struct *files,
+	unsigned int fd, fmode_t mask, unsigned int refs)
+{
+	for (;;) {
+		struct file *file;
+		struct fdtable *fdt = rcu_dereference_raw(files->fdt);
+		struct file __rcu **fdentry;
+
+		if (unlikely(fd >= fdt->max_fds))
+			return NULL;
+
+		fdentry = fdt->fd + array_index_nospec(fd, fdt->max_fds);
+		file = rcu_dereference_raw(*fdentry);
+		if (unlikely(!file))
+			return NULL;
+
+		if (unlikely(file->f_mode & mask))
+			return NULL;
+
+		/*
+		 * Ok, we have a file pointer. However, because we do
+		 * this all locklessly under RCU, we may be racing with
+		 * that file being closed.
+		 *
+		 * Such a race can take two forms:
+		 *
+		 *  (a) the file ref already went down to zero,
+		 *      and get_file_rcu_many() fails. Just try
+		 *      again:
+		 */
+		if (unlikely(!get_file_rcu_many(file, refs)))
+			continue;
+
+		/*
+		 *  (b) the file table entry has changed under us.
+		 *
+		 * If so, we need to put our refs and try again.
+		 */
+		if (unlikely(rcu_dereference_raw(files->fdt) != fdt) ||
+		    unlikely(rcu_dereference_raw(*fdentry) != file)) {
+			fput_many(file, refs);
+			continue;
+		}
+
+		/*
+		 * Ok, we have a ref to the file, and checked that it
+		 * still exists.
+		 */
+		return file;
+	}
+}
+
 static struct file *__fget_files(struct files_struct *files, unsigned int fd,
 				 fmode_t mask, unsigned int refs)
 {
 	struct file *file;
 
 	rcu_read_lock();
-loop:
-	file = files_lookup_fd_rcu(files, fd);
-	if (file) {
-		/* File object ref couldn't be taken.
-		 * dup2() atomicity guarantee is the reason
-		 * we loop to catch the new file (or NULL pointer)
-		 */
-		if (file->f_mode & mask)
-			file = NULL;
-		else if (!get_file_rcu_many(file, refs))
-			goto loop;
-		else if (files_lookup_fd_raw(files, fd) != file) {
-			fput_many(file, refs);
-			goto loop;
-		}
-	}
+	file = __fget_files_rcu(files, fd, mask, refs);
 	rcu_read_unlock();
 
 	return file;

  reply	other threads:[~2021-12-10 21:59 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-10  5:37 [fget] 054aa8d439: will-it-scale.per_thread_ops -5.7% regression kernel test robot
2021-12-10 18:33 ` Linus Torvalds
2021-12-10 20:29   ` Jann Horn
2021-12-10 21:25     ` Linus Torvalds
2021-12-10 21:59       ` Linus Torvalds [this message]
2021-12-10 23:29         ` Jann Horn
2021-12-11  1:01           ` Linus Torvalds
2021-12-11  1:32         ` Linus Torvalds
2021-12-13  8:31         ` [LKP] " Carel Si
2021-12-13 18:37           ` Linus Torvalds
2021-12-13 19:44             ` Linus Torvalds
2021-12-15 12:54               ` Greg KH
2021-12-13 10:57   ` Carel Si

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wh5iFv1MOx6r8zyGYkYGfgfxqcPSrUDwfuOCdis+VR+BQ@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=feng.tang@intel.com \
    --cc=fengwei.yin@intel.com \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkp@intel.com \
    --cc=lkp@lists.01.org \
    --cc=mszeredi@redhat.com \
    --cc=oliver.sang@intel.com \
    --cc=ying.huang@intel.com \
    --cc=zhengjun.xing@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).