archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <>
To: kernel test robot <>
Cc: Jann Horn <>,
	Miklos Szeredi <>,
	LKML <>,, kernel test robot <>,
	"Huang, Ying" <>,
	Feng Tang <>,
	Zhengjun Xing <>,
Subject: Re: [fget] 054aa8d439: will-it-scale.per_thread_ops -5.7% regression
Date: Fri, 10 Dec 2021 10:33:43 -0800	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <20211210053743.GA36420@xsang-OptiPlex-9020>

[-- Attachment #1: Type: text/plain, Size: 3751 bytes --]

On Thu, Dec 9, 2021 at 9:38 PM kernel test robot <> wrote:
> FYI, we noticed a -5.7% regression of will-it-scale.per_thread_ops due to commit:
> 054aa8d439b9  ("fget: check that the fd still exists after getting a ref to it")

Well, some downside of the new checks was expected, that's just much
more than I really like or would have thought.

But it's exactly where you'd expect:

>      27.16 ± 10%      +4.3       31.51 ±  2%  perf-profile.calltrace.cycles-pp.__fget_light.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      22.91 ± 10%      +4.4       27.34 ±  2%  perf-profile.calltrace.cycles-pp.__fget_files.__fget_light.do_sys_poll.__x64_sys_poll.do_syscall_64
>      26.33 ± 10%      +4.4       30.70 ±  2%  perf-profile.children.cycles-pp.__fget_light
>      22.92 ± 10%      +4.4       27.35 ±  2%  perf-profile.children.cycles-pp.__fget_files
>      22.70 ± 10%      +4.4       27.11 ±  2%  perf-profile.self.cycles-pp.__fget_files

although there's odd spikes in dTLB-loads etc.

I checked whether it's some unexpected code generation issue, but the
new "re-check file table after refcount update" really looks very
cheap when I look at what gcc generates, there's nothing really
unexpected there.

What did change was:

 (a) some branches go other ways, which might well affect branch
prediction and just be unlucky. It might be that just marking the
mismatch case "unlikely()" will help.

 (b) the obvious few new instructions (re-load and check file table
pointer, re-load and check file pointer)

 (c) that __fget_files() function is now no longer a leaf function in
a simple config case, since it calls "fput_many" in the error case.

And that (c) is worth mentioning simply because it means that the
function goes from not having any stack frame at all, to having to
save/restore four registers. So now it has the usual push/pop

It may also be that the test-case actually does a lot of threaded
open/close/poll, and either actually triggers the re-lookup looping
case (unlikely) or just sees a lot of cacheline bouncing that now got
worse due to the re-check of the file pointer.

So this regression looks real, and the issue seems to be that
__fget_files() really is _that_ important for this do_sys_poll()
benchmark, and even just the handful of extra instructions end up
being meaningful.

Oliver - I'm attaching the obvious "unlikely9)" oneliner in case it's
just "gcc thought the retry loop was the common case" issue and bad
branch prediction.

And it would perhaps be interesting to get an actual instruction-level
profile of that __fget_files() thing for that benchmark, if that
pinpoints exactly what is going on and in case that would be easy to
get on that machine.

Because it might just be truly horrendously bad luck, with the 32-byte
stack frame meaning that the kernel stack goes one more page down
(just jhandwaving from the dTLB number spike), and this all being just
random bad luck on that particular benchmark.

Of course, the thing about poll() is that for that case, we *don't*
really need the "re-check the file descriptor" code at all, since the
resulting fd isn't going to be installed as a new fd, and it doesn't
matter for the socket garbage collector logic.

So maybe it was a mistake to put that re-check in the generic fdget()
code - yes, it should be cheap, but it's also some of the most hot
code in the kernel on some loads.

But if we move it elsewhere, we'd need to come up with some list of
"these cases need it". Some are obvious: dup, dup2, unix domain file
passing. It's the non-obvious ones I'd worry about.



[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 512 bytes --]

 fs/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/file.c b/fs/file.c
index ad4a8bf3cf10..f802360e240d 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -858,7 +858,7 @@ static struct file *__fget_files(struct files_struct *files, unsigned int fd,
 			file = NULL;
 		else if (!get_file_rcu_many(file, refs))
 			goto loop;
-		else if (files_lookup_fd_raw(files, fd) != file) {
+		else if (unlikely(files_lookup_fd_raw(files, fd) != file)) {
 			fput_many(file, refs);
 			goto loop;

  reply	other threads:[~2021-12-10 18:34 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-10  5:37 [fget] 054aa8d439: will-it-scale.per_thread_ops -5.7% regression kernel test robot
2021-12-10 18:33 ` Linus Torvalds [this message]
2021-12-10 20:29   ` Jann Horn
2021-12-10 21:25     ` Linus Torvalds
2021-12-10 21:59       ` Linus Torvalds
2021-12-10 23:29         ` Jann Horn
2021-12-11  1:01           ` Linus Torvalds
2021-12-11  1:32         ` Linus Torvalds
2021-12-13  8:31         ` [LKP] " Carel Si
2021-12-13 18:37           ` Linus Torvalds
2021-12-13 19:44             ` Linus Torvalds
2021-12-15 12:54               ` Greg KH
2021-12-13 10:57   ` Carel Si

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='' \ \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).