All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mateusz Guzik <mjguzik@gmail.com>
To: brauner@kernel.org
Cc: viro@zeniv.linux.org.uk, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, torvalds@linux-foundation.org,
	Mateusz Guzik <mjguzik@gmail.com>
Subject: [PATCH v2] vfs: shave work on failed file open
Date: Tue, 26 Sep 2023 18:22:28 +0200	[thread overview]
Message-ID: <20230926162228.68666-1-mjguzik@gmail.com> (raw)

Failed opens (mostly ENOENT) legitimately happen a lot, for example here
are stats from stracing kernel build for few seconds (strace -fc make):

  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ------------------
    0.76    0.076233           5     15040      3688 openat

(this is tons of header files tried in different paths)

In the common case of there being nothing to close (only the file object
to free) there is a lot of overhead which can be avoided.

This is most notably delegation of freeing to task_work, which comes
with an enormous cost (see 021a160abf62 ("fs: use __fput_sync in
close(2)" for an example).

Benchmarked with will-it-scale with a custom testcase based on
tests/open1.c, stuffed into tests/openneg.c:
[snip]
        while (1) {
                int fd = open("/tmp/nonexistent", O_RDONLY);
                assert(fd == -1);

                (*iterations)++;
        }
[/snip]

Sapphire Rapids, openneg_processes -t 1 (ops/s):
before:	1950013
after:	2914973 (+49%)

file refcount is checked as a safety belt against buggy consumers with
an atomic cmpxchg. Technically it is not necessary, but it happens to
not be measurable due to several other atomics which immediately follow.
Optmizing them away to make this atomic into a problem is left as an
exercise for the reader.

v2:
- unexport fput_badopen and move to fs/internal.h
- handle the refcount with cmpxchg, adjust commentary accordingly
- tweak the commit message

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
 fs/file_table.c | 35 +++++++++++++++++++++++++++++++++++
 fs/internal.h   |  2 ++
 fs/namei.c      |  2 +-
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index ee21b3da9d08..6cbd5bc551d0 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -82,6 +82,16 @@ static inline void file_free(struct file *f)
 	call_rcu(&f->f_rcuhead, file_free_rcu);
 }
 
+static inline void file_free_badopen(struct file *f)
+{
+	BUG_ON(f->f_mode & (FMODE_BACKING | FMODE_OPENED));
+	security_file_free(f);
+	put_cred(f->f_cred);
+	if (likely(!(f->f_mode & FMODE_NOACCOUNT)))
+		percpu_counter_dec(&nr_files);
+	kmem_cache_free(filp_cachep, f);
+}
+
 /*
  * Return the total number of open files in the system
  */
@@ -468,6 +478,31 @@ void __fput_sync(struct file *file)
 EXPORT_SYMBOL(fput);
 EXPORT_SYMBOL(__fput_sync);
 
+/*
+ * Clean up after failing to open (e.g., open(2) returns with -ENOENT).
+ *
+ * This represents opportunities to shave on work in the common case of
+ * FMODE_OPENED not being set:
+ * 1. there is nothing to close, just the file object to free and consequently
+ *    no need to delegate to task_work
+ * 2. as nobody else had seen the file then there is no need to delegate
+ *    freeing to RCU
+ */
+void fput_badopen(struct file *file)
+{
+	if (unlikely(file->f_mode & (FMODE_BACKING | FMODE_OPENED))) {
+		fput(file);
+		return;
+	}
+
+	if (WARN_ON_ONCE(atomic_long_cmpxchg(&file->f_count, 1, 0) != 1)) {
+		fput(file);
+		return;
+	}
+
+	file_free_badopen(file);
+}
+
 void __init files_init(void)
 {
 	filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0,
diff --git a/fs/internal.h b/fs/internal.h
index d64ae03998cc..93da6d815e90 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -95,6 +95,8 @@ struct file *alloc_empty_file(int flags, const struct cred *cred);
 struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred);
 struct file *alloc_empty_backing_file(int flags, const struct cred *cred);
 
+void fput_badopen(struct file *);
+
 static inline void put_file_access(struct file *file)
 {
 	if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) {
diff --git a/fs/namei.c b/fs/namei.c
index 567ee547492b..67579fe30b28 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3802,7 +3802,7 @@ static struct file *path_openat(struct nameidata *nd,
 		WARN_ON(1);
 		error = -EINVAL;
 	}
-	fput(file);
+	fput_badopen(file);
 	if (error == -EOPENSTALE) {
 		if (flags & LOOKUP_RCU)
 			error = -ECHILD;
-- 
2.39.2


             reply	other threads:[~2023-09-26 16:22 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-26 16:22 Mateusz Guzik [this message]
2023-09-26 19:00 ` [PATCH v2] vfs: shave work on failed file open Linus Torvalds
2023-09-26 19:28   ` Mateusz Guzik
2023-09-27 14:09     ` Christian Brauner
2023-09-27 14:34       ` Mateusz Guzik
2023-09-27 17:48       ` Linus Torvalds
2023-09-27 17:56         ` Mateusz Guzik
2023-09-27 18:05           ` Linus Torvalds
2023-09-27 18:32             ` Mateusz Guzik
2023-09-27 20:27               ` Linus Torvalds
2023-09-27 21:06                 ` Mateusz Guzik
2023-09-27 21:18                   ` Linus Torvalds
2023-09-27 21:30                     ` Mateusz Guzik
2023-09-28 13:25                 ` Christian Brauner
2023-09-28 14:05                   ` Christian Brauner
2023-09-28 14:43                     ` Jann Horn
2023-09-28 17:21                       ` Linus Torvalds
2023-09-29  9:20                         ` Christian Brauner
2023-09-29 13:31                           ` Jann Horn
2023-09-29 19:57                             ` Christian Brauner
2023-09-29 21:23                               ` Mateusz Guzik
2023-09-29 21:39                                 ` Mateusz Guzik
2023-09-29 23:57                                   ` Linus Torvalds
2023-09-30  9:04                                     ` Christian Brauner
2023-10-03 16:45                                       ` Nathan Chancellor
2023-10-03 16:45                                         ` Nathan Chancellor
2023-10-10  3:06                                       ` Al Viro
2023-10-10  8:29                                         ` Christian Brauner
2023-09-29 22:24                                 ` Matthew Wilcox
2023-09-29 23:02                                   ` Jann Horn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230926162228.68666-1-mjguzik@gmail.com \
    --to=mjguzik@gmail.com \
    --cc=brauner@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.