linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHSET 0/4] fs: Support for LOOKUP_CACHED / RESOLVE_CACHED
@ 2020-12-17 16:19 Jens Axboe
  2020-12-17 16:19 ` [PATCH 1/4] fs: make unlazy_walk() error handling consistent Jens Axboe
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Jens Axboe @ 2020-12-17 16:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: torvalds, viro

Hi,

Here's v3 of the LOOKUP_CACHED change. It's exposed as both a flag for
openat2(), and it's used internally by io_uring to speed up (and make more
efficient) the openat/openat2 support there. As posted in the v3 thread,
performance numbers for various levels of the filename lookup already
being cached:

Cached		5.10-git	5.10-git+LOOKUP_CACHED	Speedup
---------------------------------------------------------------
33%		1,014,975	900,474			1.1x
89%		 545,466	292,937			1.9x
100%		 435,636	151,475			2.9x

The more cache hot we are, the faster the inline LOOKUP_CACHED
optimization helps. This is unsurprising and expected, as a thread
offload becomes a more dominant part of the total overhead. If we look
at io_uring tracing, doing an IORING_OP_OPENAT on a file that isn't in
the dentry cache will yield:

275.550481: io_uring_create: ring 00000000ddda6278, fd 3 sq size 8, cq size 16, flags 0
275.550491: io_uring_submit_sqe: ring 00000000ddda6278, op 18, data 0x0, non block 1, sq_thread 0
275.550498: io_uring_queue_async_work: ring 00000000ddda6278, request 00000000c0267d17, flags 69760, normal queue, work 000000003d683991
275.550502: io_uring_cqring_wait: ring 00000000ddda6278, min_events 1
275.550556: io_uring_complete: ring 00000000ddda6278, user_data 0x0, result 4

which shows a failed nonblock lookup, then punt to worker, and then we
complete with fd == 4. This takes 65 usec in total. Re-running the same
test case again:

281.253956: io_uring_create: ring 0000000008207252, fd 3 sq size 8, cq size 16, flags 0
281.253967: io_uring_submit_sqe: ring 0000000008207252, op 18, data 0x0, non block 1, sq_thread 0
281.253973: io_uring_complete: ring 0000000008207252, user_data 0x0, result 4

shows the same request completing inline, also returning fd == 4. This
takes 6 usec.

Using openat2, we see that an attempted RESOLVE_CACHED open of an uncached
file will fail with -EAGAIN, and a subsequent attempt will too as it's
still not cached. ls the file and retry, and we successfully open it
with RESOLVE_CACHED:

[test@archlinux ~]$ ./openat2-cached /etc/nanorc
open: -1
openat2: Resource temporarily unavailable
[test@archlinux ~]$ ./openat2-cached /etc/nanorc
open: -1
openat2: Resource temporarily unavailable
[test@archlinux ~]$ ls -al /etc/nanorc
-rw-r--r-- 1 root root 10066 Dec 17 16:15 /etc/nanorc
[test@archlinux ~]$ ./openat2-cached /etc/nanorc
open: 3

Minor polish since v3:

- Rename LOOKUP_NONBLOCK -> LOOKUP_CACHED, and ditto for the RESOLVE_
  flag. This better explains what the feature does, making it more self
  explanatory in terms of both code readability and for the user visible
  part.

- Remove dead LOOKUP_NONBLOCK check after we've dropped LOOKUP_RCU
  already, spotted by Al.

- Add O_TMPFILE to the checks upfront, so we can drop the checking in
  do_tmpfile().

-- 
Jens Axboe




^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] fs: make unlazy_walk() error handling consistent
  2020-12-17 16:19 [PATCHSET 0/4] fs: Support for LOOKUP_CACHED / RESOLVE_CACHED Jens Axboe
@ 2020-12-17 16:19 ` Jens Axboe
  2020-12-26  2:41   ` Jens Axboe
  2020-12-17 16:19 ` [PATCH 2/4] fs: add support for LOOKUP_CACHED Jens Axboe
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2020-12-17 16:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: torvalds, viro, Jens Axboe

Most callers check for non-zero return, and assume it's -ECHILD (which
it always will be). One caller uses the actual error return. Clean this
up and make it fully consistent, by having unlazy_walk() return a bool
instead. Rename it to try_to_unlazy() and return true on success, and
failure on error. That's easier to read.

No functional changes in this patch.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/namei.c | 43 +++++++++++++++++--------------------------
 1 file changed, 17 insertions(+), 26 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 03d0e11e4f36..c31ddddcef3c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -669,17 +669,17 @@ static bool legitimize_root(struct nameidata *nd)
  */
 
 /**
- * unlazy_walk - try to switch to ref-walk mode.
+ * try_to_unlazy - try to switch to ref-walk mode.
  * @nd: nameidata pathwalk data
- * Returns: 0 on success, -ECHILD on failure
+ * Returns: true on success, false on failure
  *
- * unlazy_walk attempts to legitimize the current nd->path and nd->root
+ * try_to_unlazy attempts to legitimize the current nd->path and nd->root
  * for ref-walk mode.
  * Must be called from rcu-walk context.
- * Nothing should touch nameidata between unlazy_walk() failure and
+ * Nothing should touch nameidata between try_to_unlazy() failure and
  * terminate_walk().
  */
-static int unlazy_walk(struct nameidata *nd)
+static bool try_to_unlazy(struct nameidata *nd)
 {
 	struct dentry *parent = nd->path.dentry;
 
@@ -694,14 +694,14 @@ static int unlazy_walk(struct nameidata *nd)
 		goto out;
 	rcu_read_unlock();
 	BUG_ON(nd->inode != parent->d_inode);
-	return 0;
+	return true;
 
 out1:
 	nd->path.mnt = NULL;
 	nd->path.dentry = NULL;
 out:
 	rcu_read_unlock();
-	return -ECHILD;
+	return false;
 }
 
 /**
@@ -792,7 +792,7 @@ static int complete_walk(struct nameidata *nd)
 		 */
 		if (!(nd->flags & (LOOKUP_ROOT | LOOKUP_IS_SCOPED)))
 			nd->root.mnt = NULL;
-		if (unlikely(unlazy_walk(nd)))
+		if (!try_to_unlazy(nd))
 			return -ECHILD;
 	}
 
@@ -1466,7 +1466,7 @@ static struct dentry *lookup_fast(struct nameidata *nd,
 		unsigned seq;
 		dentry = __d_lookup_rcu(parent, &nd->last, &seq);
 		if (unlikely(!dentry)) {
-			if (unlazy_walk(nd))
+			if (!try_to_unlazy(nd))
 				return ERR_PTR(-ECHILD);
 			return NULL;
 		}
@@ -1567,10 +1567,8 @@ static inline int may_lookup(struct nameidata *nd)
 {
 	if (nd->flags & LOOKUP_RCU) {
 		int err = inode_permission(nd->inode, MAY_EXEC|MAY_NOT_BLOCK);
-		if (err != -ECHILD)
+		if (err != -ECHILD || !try_to_unlazy(nd))
 			return err;
-		if (unlazy_walk(nd))
-			return -ECHILD;
 	}
 	return inode_permission(nd->inode, MAY_EXEC);
 }
@@ -1592,7 +1590,7 @@ static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
 		// unlazy even if we fail to grab the link - cleanup needs it
 		bool grabbed_link = legitimize_path(nd, link, seq);
 
-		if (unlazy_walk(nd) != 0 || !grabbed_link)
+		if (!try_to_unlazy(nd) != 0 || !grabbed_link)
 			return -ECHILD;
 
 		if (nd_alloc_stack(nd))
@@ -1634,7 +1632,7 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 		touch_atime(&last->link);
 		cond_resched();
 	} else if (atime_needs_update(&last->link, inode)) {
-		if (unlikely(unlazy_walk(nd)))
+		if (!try_to_unlazy(nd))
 			return ERR_PTR(-ECHILD);
 		touch_atime(&last->link);
 	}
@@ -1651,11 +1649,8 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 		get = inode->i_op->get_link;
 		if (nd->flags & LOOKUP_RCU) {
 			res = get(NULL, inode, &last->done);
-			if (res == ERR_PTR(-ECHILD)) {
-				if (unlikely(unlazy_walk(nd)))
-					return ERR_PTR(-ECHILD);
+			if (res == ERR_PTR(-ECHILD) && try_to_unlazy(nd))
 				res = get(link->dentry, inode, &last->done);
-			}
 		} else {
 			res = get(link->dentry, inode, &last->done);
 		}
@@ -2193,7 +2188,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 		}
 		if (unlikely(!d_can_lookup(nd->path.dentry))) {
 			if (nd->flags & LOOKUP_RCU) {
-				if (unlazy_walk(nd))
+				if (!try_to_unlazy(nd))
 					return -ECHILD;
 			}
 			return -ENOTDIR;
@@ -3127,7 +3122,6 @@ static const char *open_last_lookups(struct nameidata *nd,
 	struct inode *inode;
 	struct dentry *dentry;
 	const char *res;
-	int error;
 
 	nd->flags |= op->intent;
 
@@ -3151,9 +3145,8 @@ static const char *open_last_lookups(struct nameidata *nd,
 	} else {
 		/* create side of things */
 		if (nd->flags & LOOKUP_RCU) {
-			error = unlazy_walk(nd);
-			if (unlikely(error))
-				return ERR_PTR(error);
+			if (!try_to_unlazy(nd))
+				return ERR_PTR(-ECHILD);
 		}
 		audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
 		/* trailing slashes? */
@@ -3162,9 +3155,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 	}
 
 	if (open_flag & (O_CREAT | O_TRUNC | O_WRONLY | O_RDWR)) {
-		error = mnt_want_write(nd->path.mnt);
-		if (!error)
-			got_write = true;
+		got_write = !mnt_want_write(nd->path.mnt);
 		/*
 		 * do _not_ fail yet - we might not need that or fail with
 		 * a different error; let lookup_open() decide; we'll be
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/4] fs: add support for LOOKUP_CACHED
  2020-12-17 16:19 [PATCHSET 0/4] fs: Support for LOOKUP_CACHED / RESOLVE_CACHED Jens Axboe
  2020-12-17 16:19 ` [PATCH 1/4] fs: make unlazy_walk() error handling consistent Jens Axboe
@ 2020-12-17 16:19 ` Jens Axboe
  2020-12-17 16:19 ` [PATCH 3/4] fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED Jens Axboe
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2020-12-17 16:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: torvalds, viro, Jens Axboe

io_uring always punts opens to async context, since there's no control
over whether the lookup blocks or not. Add LOOKUP_CACHED to support
just doing the fast RCU based lookups, which we know will not block. If
we can do a cached path resolution of the filename, then we don't have
to always punt lookups for a worker.

During path resolution, we always do LOOKUP_RCU first. If that fails and
we terminate LOOKUP_RCU, then fail a LOOKUP_CACHED attempt as well.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/namei.c            | 9 +++++++++
 include/linux/namei.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/fs/namei.c b/fs/namei.c
index c31ddddcef3c..e051eb0fa7f9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -686,6 +686,8 @@ static bool try_to_unlazy(struct nameidata *nd)
 	BUG_ON(!(nd->flags & LOOKUP_RCU));
 
 	nd->flags &= ~LOOKUP_RCU;
+	if (nd->flags & LOOKUP_CACHED)
+		goto out1;
 	if (unlikely(!legitimize_links(nd)))
 		goto out1;
 	if (unlikely(!legitimize_path(nd, &nd->path, nd->seq)))
@@ -722,6 +724,8 @@ static int unlazy_child(struct nameidata *nd, struct dentry *dentry, unsigned se
 	BUG_ON(!(nd->flags & LOOKUP_RCU));
 
 	nd->flags &= ~LOOKUP_RCU;
+	if (nd->flags & LOOKUP_CACHED)
+		goto out2;
 	if (unlikely(!legitimize_links(nd)))
 		goto out2;
 	if (unlikely(!legitimize_mnt(nd->path.mnt, nd->m_seq)))
@@ -792,6 +796,7 @@ static int complete_walk(struct nameidata *nd)
 		 */
 		if (!(nd->flags & (LOOKUP_ROOT | LOOKUP_IS_SCOPED)))
 			nd->root.mnt = NULL;
+		nd->flags &= ~LOOKUP_CACHED;
 		if (!try_to_unlazy(nd))
 			return -ECHILD;
 	}
@@ -2202,6 +2207,10 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 	int error;
 	const char *s = nd->name->name;
 
+	/* LOOKUP_CACHED requires RCU, ask caller to retry */
+	if ((flags & (LOOKUP_RCU | LOOKUP_CACHED)) == LOOKUP_CACHED)
+		return ERR_PTR(-EAGAIN);
+
 	if (!*s)
 		flags &= ~LOOKUP_RCU;
 	if (flags & LOOKUP_RCU)
diff --git a/include/linux/namei.h b/include/linux/namei.h
index a4bb992623c4..b9605b2b46e7 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -46,6 +46,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT};
 #define LOOKUP_NO_XDEV		0x040000 /* No mountpoint crossing. */
 #define LOOKUP_BENEATH		0x080000 /* No escaping from starting point. */
 #define LOOKUP_IN_ROOT		0x100000 /* Treat dirfd as fs root. */
+#define LOOKUP_CACHED		0x200000 /* Only do cached lookup */
 /* LOOKUP_* flags which do scope-related checks based on the dirfd. */
 #define LOOKUP_IS_SCOPED (LOOKUP_BENEATH | LOOKUP_IN_ROOT)
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/4] fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED
  2020-12-17 16:19 [PATCHSET 0/4] fs: Support for LOOKUP_CACHED / RESOLVE_CACHED Jens Axboe
  2020-12-17 16:19 ` [PATCH 1/4] fs: make unlazy_walk() error handling consistent Jens Axboe
  2020-12-17 16:19 ` [PATCH 2/4] fs: add support for LOOKUP_CACHED Jens Axboe
@ 2020-12-17 16:19 ` Jens Axboe
  2020-12-17 16:19 ` [PATCH 4/4] io_uring: enable LOOKUP_CACHED path resolution for filename lookups Jens Axboe
  2020-12-17 18:09 ` [PATCHSET 0/4] fs: Support for LOOKUP_CACHED / RESOLVE_CACHED Linus Torvalds
  4 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2020-12-17 16:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: torvalds, viro, Jens Axboe

Now that we support non-blocking path resolution internally, expose it
via openat2() in the struct open_how ->resolve flags. This allows
applications using openat2() to limit path resolution to the extent that
it is already cached.

If the lookup cannot be satisfied in a non-blocking manner, openat2(2)
will return -1/-EAGAIN.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/open.c                    | 6 ++++++
 include/linux/fcntl.h        | 2 +-
 include/uapi/linux/openat2.h | 4 ++++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/open.c b/fs/open.c
index 1e06e443a565..ca5444733acd 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1091,6 +1091,12 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op)
 		lookup_flags |= LOOKUP_BENEATH;
 	if (how->resolve & RESOLVE_IN_ROOT)
 		lookup_flags |= LOOKUP_IN_ROOT;
+	if (how->resolve & RESOLVE_CACHED) {
+		/* Don't bother even trying for create/truncate/tmpfile open */
+		if (flags & (O_TRUNC | O_CREAT | O_TMPFILE))
+			return -EAGAIN;
+		lookup_flags |= LOOKUP_CACHED;
+	}
 
 	op->lookup_flags = lookup_flags;
 	return 0;
diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
index 921e750843e6..766fcd973beb 100644
--- a/include/linux/fcntl.h
+++ b/include/linux/fcntl.h
@@ -19,7 +19,7 @@
 /* List of all valid flags for the how->resolve argument: */
 #define VALID_RESOLVE_FLAGS \
 	(RESOLVE_NO_XDEV | RESOLVE_NO_MAGICLINKS | RESOLVE_NO_SYMLINKS | \
-	 RESOLVE_BENEATH | RESOLVE_IN_ROOT)
+	 RESOLVE_BENEATH | RESOLVE_IN_ROOT | RESOLVE_CACHED)
 
 /* List of all open_how "versions". */
 #define OPEN_HOW_SIZE_VER0	24 /* sizeof first published struct */
diff --git a/include/uapi/linux/openat2.h b/include/uapi/linux/openat2.h
index 58b1eb711360..a5feb7604948 100644
--- a/include/uapi/linux/openat2.h
+++ b/include/uapi/linux/openat2.h
@@ -35,5 +35,9 @@ struct open_how {
 #define RESOLVE_IN_ROOT		0x10 /* Make all jumps to "/" and ".."
 					be scoped inside the dirfd
 					(similar to chroot(2)). */
+#define RESOLVE_CACHED		0x20 /* Only complete if resolution can be
+					completed through cached lookup. May
+					return -EAGAIN if that's not
+					possible. */
 
 #endif /* _UAPI_LINUX_OPENAT2_H */
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/4] io_uring: enable LOOKUP_CACHED path resolution for filename lookups
  2020-12-17 16:19 [PATCHSET 0/4] fs: Support for LOOKUP_CACHED / RESOLVE_CACHED Jens Axboe
                   ` (2 preceding siblings ...)
  2020-12-17 16:19 ` [PATCH 3/4] fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED Jens Axboe
@ 2020-12-17 16:19 ` Jens Axboe
  2020-12-17 18:09 ` [PATCHSET 0/4] fs: Support for LOOKUP_CACHED / RESOLVE_CACHED Linus Torvalds
  4 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2020-12-17 16:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: torvalds, viro, Jens Axboe

Instead of being pessimistic and assume that path lookup will block, use
LOOKUP_CACHED to attempt just a cached lookup. This ensures that the
fast path is always done inline, and we only punt to async context if
IO is needed to satisfy the lookup.

For forced nonblock open attempts, mark the file O_NONBLOCK over the
actual ->open() call as well. We can safely clear this again before
doing fd_install(), so it'll never be user visible that we fiddled with
it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/io_uring.c | 47 +++++++++++++++++++++++++++--------------------
 1 file changed, 27 insertions(+), 20 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 6f9392c35eef..5a703c8a4521 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -487,7 +487,6 @@ struct io_sr_msg {
 struct io_open {
 	struct file			*file;
 	int				dfd;
-	bool				ignore_nonblock;
 	struct filename			*filename;
 	struct open_how			how;
 	unsigned long			nofile;
@@ -3996,7 +3995,6 @@ static int __io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe
 		return ret;
 	}
 	req->open.nofile = rlimit(RLIMIT_NOFILE);
-	req->open.ignore_nonblock = false;
 	req->flags |= REQ_F_NEED_CLEANUP;
 	return 0;
 }
@@ -4038,39 +4036,48 @@ static int io_openat2(struct io_kiocb *req, bool force_nonblock)
 {
 	struct open_flags op;
 	struct file *file;
+	bool nonblock_set;
+	bool resolve_nonblock;
 	int ret;
 
-	if (force_nonblock && !req->open.ignore_nonblock)
-		return -EAGAIN;
-
 	ret = build_open_flags(&req->open.how, &op);
 	if (ret)
 		goto err;
+	nonblock_set = op.open_flag & O_NONBLOCK;
+	resolve_nonblock = req->open.how.resolve & RESOLVE_CACHED;
+	if (force_nonblock) {
+		/*
+		 * Don't bother trying for O_TRUNC, O_CREAT, or O_TMPFILE open,
+		 * it'll always -EAGAIN
+		 */
+		if (req->open.how.flags & (O_TRUNC | O_CREAT | O_TMPFILE))
+			return -EAGAIN;
+		op.lookup_flags |= LOOKUP_CACHED;
+		op.open_flag |= O_NONBLOCK;
+	}
 
 	ret = __get_unused_fd_flags(req->open.how.flags, req->open.nofile);
 	if (ret < 0)
 		goto err;
 
 	file = do_filp_open(req->open.dfd, req->open.filename, &op);
+	/* only retry if RESOLVE_CACHED wasn't already set by application */
+	if ((!resolve_nonblock && force_nonblock) && file == ERR_PTR(-EAGAIN)) {
+		/*
+		 * We could hang on to this 'fd', but seems like marginal
+		 * gain for something that is now known to be a slower path.
+		 * So just put it, and we'll get a new one when we retry.
+		 */
+		put_unused_fd(ret);
+		return -EAGAIN;
+	}
+
 	if (IS_ERR(file)) {
 		put_unused_fd(ret);
 		ret = PTR_ERR(file);
-		/*
-		 * A work-around to ensure that /proc/self works that way
-		 * that it should - if we get -EOPNOTSUPP back, then assume
-		 * that proc_self_get_link() failed us because we're in async
-		 * context. We should be safe to retry this from the task
-		 * itself with force_nonblock == false set, as it should not
-		 * block on lookup. Would be nice to know this upfront and
-		 * avoid the async dance, but doesn't seem feasible.
-		 */
-		if (ret == -EOPNOTSUPP && io_wq_current_is_worker()) {
-			req->open.ignore_nonblock = true;
-			refcount_inc(&req->refs);
-			io_req_task_queue(req);
-			return 0;
-		}
 	} else {
+		if (force_nonblock && !nonblock_set)
+			file->f_flags &= ~O_NONBLOCK;
 		fsnotify_open(file);
 		fd_install(ret, file);
 	}
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCHSET 0/4] fs: Support for LOOKUP_CACHED / RESOLVE_CACHED
  2020-12-17 16:19 [PATCHSET 0/4] fs: Support for LOOKUP_CACHED / RESOLVE_CACHED Jens Axboe
                   ` (3 preceding siblings ...)
  2020-12-17 16:19 ` [PATCH 4/4] io_uring: enable LOOKUP_CACHED path resolution for filename lookups Jens Axboe
@ 2020-12-17 18:09 ` Linus Torvalds
  2020-12-17 19:23   ` Jens Axboe
  4 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2020-12-17 18:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, Al Viro

On Thu, Dec 17, 2020 at 8:19 AM Jens Axboe <axboe@kernel.dk> wrote:
> [..]
> which shows a failed nonblock lookup, then punt to worker, and then we
> complete with fd == 4. This takes 65 usec in total. Re-running the same
> test case again:
> [..]
> shows the same request completing inline, also returning fd == 4. This
> takes 6 usec.

I think this example needs to be part of the git history - either move
it into the individual commits, or we just need to make sure it
doesn't get lost as a cover letter and ends up part of the merge
message.

Despite having seen the patch series so many times now, I'm still just
really impressed by how small and neat it is, considering the above
numbers, and considering just how problematic this case was
historically (ie I remember all the discussions we had about
nonblocking opens back in the days).

So I continue to go "this is the RightWay(tm)" just from that.

              Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCHSET 0/4] fs: Support for LOOKUP_CACHED / RESOLVE_CACHED
  2020-12-17 18:09 ` [PATCHSET 0/4] fs: Support for LOOKUP_CACHED / RESOLVE_CACHED Linus Torvalds
@ 2020-12-17 19:23   ` Jens Axboe
  0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2020-12-17 19:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, Al Viro

On 12/17/20 11:09 AM, Linus Torvalds wrote:
> On Thu, Dec 17, 2020 at 8:19 AM Jens Axboe <axboe@kernel.dk> wrote:
>> [..]
>> which shows a failed nonblock lookup, then punt to worker, and then we
>> complete with fd == 4. This takes 65 usec in total. Re-running the same
>> test case again:
>> [..]
>> shows the same request completing inline, also returning fd == 4. This
>> takes 6 usec.
> 
> I think this example needs to be part of the git history - either move
> it into the individual commits, or we just need to make sure it
> doesn't get lost as a cover letter and ends up part of the merge
> message.

I agree, and I think the best approach will depend a bit on whether
Al takes the core bits, and then I'm left with just the io_uring part,
or if the series goes in as a whole. In any case, I'll make sure it
gets in as either the merge message, or (at the least) in the io_uring
enablement of it.

> Despite having seen the patch series so many times now, I'm still just
> really impressed by how small and neat it is, considering the above
> numbers, and considering just how problematic this case was
> historically (ie I remember all the discussions we had about
> nonblocking opens back in the days).
> 
> So I continue to go "this is the RightWay(tm)" just from that.

Well, that's largely thanks to your and Al's feedback. I did attempt
this one time earlier, and the break through just wasn't there in
terms of condensing it properly. I'm very happy with it now.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/4] fs: make unlazy_walk() error handling consistent
  2020-12-17 16:19 ` [PATCH 1/4] fs: make unlazy_walk() error handling consistent Jens Axboe
@ 2020-12-26  2:41   ` Jens Axboe
  2020-12-26  4:50     ` Al Viro
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2020-12-26  2:41 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: torvalds, viro

On 12/17/20 9:19 AM, Jens Axboe wrote:
> Most callers check for non-zero return, and assume it's -ECHILD (which
> it always will be). One caller uses the actual error return. Clean this
> up and make it fully consistent, by having unlazy_walk() return a bool
> instead. Rename it to try_to_unlazy() and return true on success, and
> failure on error. That's easier to read.

Al, were you planning on queuing this one up for 5.11 still? I'm fine
with holding for 5.12 as well, would just like to know what your plans
are. Latter goes for the whole series too, fwiw.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/4] fs: make unlazy_walk() error handling consistent
  2020-12-26  2:41   ` Jens Axboe
@ 2020-12-26  4:50     ` Al Viro
  2020-12-26 17:33       ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Al Viro @ 2020-12-26  4:50 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, torvalds

On Fri, Dec 25, 2020 at 07:41:17PM -0700, Jens Axboe wrote:
> On 12/17/20 9:19 AM, Jens Axboe wrote:
> > Most callers check for non-zero return, and assume it's -ECHILD (which
> > it always will be). One caller uses the actual error return. Clean this
> > up and make it fully consistent, by having unlazy_walk() return a bool
> > instead. Rename it to try_to_unlazy() and return true on success, and
> > failure on error. That's easier to read.
> 
> Al, were you planning on queuing this one up for 5.11 still? I'm fine
> with holding for 5.12 as well, would just like to know what your plans
> are. Latter goes for the whole series too, fwiw.

Seeing that it has not sat in -next at all, what I'm going to do is
to put it into 5.11-rc1-based branch.  It's really been too late for
something like that for this cycle and IME a topic branch started
before the merges for previous cycle are over is too likely to require
backmerges, if not outright rebases.  So let's branch it at -rc1 and
it'll go into #for-next from the very beginning.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/4] fs: make unlazy_walk() error handling consistent
  2020-12-26  4:50     ` Al Viro
@ 2020-12-26 17:33       ` Jens Axboe
  2020-12-26 17:58         ` Matthew Wilcox
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2020-12-26 17:33 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, torvalds

On 12/25/20 9:50 PM, Al Viro wrote:
> On Fri, Dec 25, 2020 at 07:41:17PM -0700, Jens Axboe wrote:
>> On 12/17/20 9:19 AM, Jens Axboe wrote:
>>> Most callers check for non-zero return, and assume it's -ECHILD (which
>>> it always will be). One caller uses the actual error return. Clean this
>>> up and make it fully consistent, by having unlazy_walk() return a bool
>>> instead. Rename it to try_to_unlazy() and return true on success, and
>>> failure on error. That's easier to read.
>>
>> Al, were you planning on queuing this one up for 5.11 still? I'm fine
>> with holding for 5.12 as well, would just like to know what your plans
>> are. Latter goes for the whole series too, fwiw.
> 
> Seeing that it has not sat in -next at all, what I'm going to do is
> to put it into 5.11-rc1-based branch.  It's really been too late for
> something like that for this cycle and IME a topic branch started
> before the merges for previous cycle are over is too likely to require
> backmerges, if not outright rebases.  So let's branch it at -rc1 and
> it'll go into #for-next from the very beginning.

That sounds fine, thanks Al. Will you queue up 1-3 in a stable branch?
Then I can just pull that into mine for the io_uring bits. I'll also
then post the stat part of this too, for separate review.

BTW, I wrote a man page addition for openat2(2) for RESOLVE_CACHED:

commit 3b381a6bc50da79eee6a3a1da330480d0cb46302
Author: Jens Axboe <axboe@kernel.dk>
Date:   Thu Dec 17 14:15:15 2020 -0700

    man2/openat2.2: add RESOLVE_CACHED
    
    RESOLVE_CACHED allows an application to attempt a cache-only open
    of a file. If this isn't possible, the request will fail with
    -1/EAGAIN and the caller should retry without RESOLVE_CACHED set.
    This will generally happen from a different context, where a slower
    open operation can be performed.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

diff --git a/man2/openat2.2 b/man2/openat2.2
index 3bda20620574..f5af3eee2cda 100644
--- a/man2/openat2.2
+++ b/man2/openat2.2
@@ -385,6 +385,17 @@ This may occur if, for example,
 a system pathname that is used by an application is modified
 (e.g., in a new distribution release)
 so that a pathname component (now) contains a bind mount.
+.TP
+.B RESOLVE_CACHED
+Make the open operation fail unless all path components are already present
+in the kernels lookup cache.
+If any kind of revalidation or IO is needed to satisfy the lookup,
+.BR openat2 ()
+fails with the error
+.B EAGAIN.
+This is useful in providing a fast path open that can be performed without
+resorting to thread offload, or other mechanism that an application might
+use to offload slower operations.
 .RE
 .IP
 If any bits other than those listed above are set in
@@ -421,6 +432,14 @@ The caller may choose to retry the
 .BR openat2 ()
 call.
 .TP
+.B EAGAIN
+.BR RESOLVE_CACHED
+was set, and the open operation cannot be performed cached.
+The caller should retry without
+.B RESOLVE_CACHED
+set in
+.I how.resolve
+.TP
 .B EINVAL
 An unknown flag or invalid value was specified in
 .IR how .

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/4] fs: make unlazy_walk() error handling consistent
  2020-12-26 17:33       ` Jens Axboe
@ 2020-12-26 17:58         ` Matthew Wilcox
  2020-12-26 18:24           ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Matthew Wilcox @ 2020-12-26 17:58 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Al Viro, linux-fsdevel, torvalds

On Sat, Dec 26, 2020 at 10:33:25AM -0700, Jens Axboe wrote:
> +.TP
> +.B RESOLVE_CACHED
> +Make the open operation fail unless all path components are already present
> +in the kernels lookup cache.
> +If any kind of revalidation or IO is needed to satisfy the lookup,

Usually spelled I/O in manpages.

> +.BR openat2 ()
> +fails with the error
> +.B EAGAIN.
> +This is useful in providing a fast path open that can be performed without
> +resorting to thread offload, or other mechanism that an application might
> +use to offload slower operations.

That almost reads backwards ... how about this?

This provides a fast path open that can be used when an application does
not wish to block.  It allows the application to hand off the lookup to
a separate thread which can block.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/4] fs: make unlazy_walk() error handling consistent
  2020-12-26 17:58         ` Matthew Wilcox
@ 2020-12-26 18:24           ` Jens Axboe
  0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2020-12-26 18:24 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Al Viro, linux-fsdevel, torvalds

On 12/26/20 10:58 AM, Matthew Wilcox wrote:
> On Sat, Dec 26, 2020 at 10:33:25AM -0700, Jens Axboe wrote:
>> +.TP
>> +.B RESOLVE_CACHED
>> +Make the open operation fail unless all path components are already present
>> +in the kernels lookup cache.
>> +If any kind of revalidation or IO is needed to satisfy the lookup,
> 
> Usually spelled I/O in manpages.

Changed, thanks!

>> +.BR openat2 ()
>> +fails with the error
>> +.B EAGAIN.
>> +This is useful in providing a fast path open that can be performed without
>> +resorting to thread offload, or other mechanism that an application might
>> +use to offload slower operations.
> 
> That almost reads backwards ... how about this?
> 
> This provides a fast path open that can be used when an application does
> not wish to block.  It allows the application to hand off the lookup to
> a separate thread which can block.

I deliberately did not want to include anything about blocking, would
prefer to just keep it about having the necessary data/state cached.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] fs: make unlazy_walk() error handling consistent
  2020-12-14 19:13 [PATCHSET v3 0/4] fs: Support for LOOKUP_NONBLOCK / RESOLVE_NONBLOCK Jens Axboe
@ 2020-12-14 19:13 ` Jens Axboe
  0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2020-12-14 19:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: torvalds, viro, Jens Axboe

Most callers check for non-zero return, and assume it's -ECHILD (which
it always will be). One caller uses the actual error return. Clean this
up and make it fully consistent, by having unlazy_walk() return a bool
instead. Rename it to try_to_unlazy() and return true on success, and
failure on error. That's easier to read.

No functional changes in this patch.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/namei.c | 43 +++++++++++++++++--------------------------
 1 file changed, 17 insertions(+), 26 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 03d0e11e4f36..7eb7830da298 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -669,17 +669,17 @@ static bool legitimize_root(struct nameidata *nd)
  */
 
 /**
- * unlazy_walk - try to switch to ref-walk mode.
+ * try_to_unlazy - try to switch to ref-walk mode.
  * @nd: nameidata pathwalk data
- * Returns: 0 on success, -ECHILD on failure
+ * Returns: true on success, false on failure
  *
- * unlazy_walk attempts to legitimize the current nd->path and nd->root
+ * try_to_unlazyk attempts to legitimize the current nd->path and nd->root
  * for ref-walk mode.
  * Must be called from rcu-walk context.
- * Nothing should touch nameidata between unlazy_walk() failure and
+ * Nothing should touch nameidata between try_to_unlazy() failure and
  * terminate_walk().
  */
-static int unlazy_walk(struct nameidata *nd)
+static bool try_to_unlazy(struct nameidata *nd)
 {
 	struct dentry *parent = nd->path.dentry;
 
@@ -694,14 +694,14 @@ static int unlazy_walk(struct nameidata *nd)
 		goto out;
 	rcu_read_unlock();
 	BUG_ON(nd->inode != parent->d_inode);
-	return 0;
+	return true;
 
 out1:
 	nd->path.mnt = NULL;
 	nd->path.dentry = NULL;
 out:
 	rcu_read_unlock();
-	return -ECHILD;
+	return false;
 }
 
 /**
@@ -792,7 +792,7 @@ static int complete_walk(struct nameidata *nd)
 		 */
 		if (!(nd->flags & (LOOKUP_ROOT | LOOKUP_IS_SCOPED)))
 			nd->root.mnt = NULL;
-		if (unlikely(unlazy_walk(nd)))
+		if (!try_to_unlazy(nd))
 			return -ECHILD;
 	}
 
@@ -1466,7 +1466,7 @@ static struct dentry *lookup_fast(struct nameidata *nd,
 		unsigned seq;
 		dentry = __d_lookup_rcu(parent, &nd->last, &seq);
 		if (unlikely(!dentry)) {
-			if (unlazy_walk(nd))
+			if (!try_to_unlazy(nd))
 				return ERR_PTR(-ECHILD);
 			return NULL;
 		}
@@ -1567,10 +1567,8 @@ static inline int may_lookup(struct nameidata *nd)
 {
 	if (nd->flags & LOOKUP_RCU) {
 		int err = inode_permission(nd->inode, MAY_EXEC|MAY_NOT_BLOCK);
-		if (err != -ECHILD)
+		if (err != -ECHILD || !try_to_unlazy(nd))
 			return err;
-		if (unlazy_walk(nd))
-			return -ECHILD;
 	}
 	return inode_permission(nd->inode, MAY_EXEC);
 }
@@ -1592,7 +1590,7 @@ static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
 		// unlazy even if we fail to grab the link - cleanup needs it
 		bool grabbed_link = legitimize_path(nd, link, seq);
 
-		if (unlazy_walk(nd) != 0 || !grabbed_link)
+		if (!try_to_unlazy(nd) != 0 || !grabbed_link)
 			return -ECHILD;
 
 		if (nd_alloc_stack(nd))
@@ -1634,7 +1632,7 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 		touch_atime(&last->link);
 		cond_resched();
 	} else if (atime_needs_update(&last->link, inode)) {
-		if (unlikely(unlazy_walk(nd)))
+		if (!try_to_unlazy(nd))
 			return ERR_PTR(-ECHILD);
 		touch_atime(&last->link);
 	}
@@ -1651,11 +1649,8 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 		get = inode->i_op->get_link;
 		if (nd->flags & LOOKUP_RCU) {
 			res = get(NULL, inode, &last->done);
-			if (res == ERR_PTR(-ECHILD)) {
-				if (unlikely(unlazy_walk(nd)))
-					return ERR_PTR(-ECHILD);
+			if (res == ERR_PTR(-ECHILD) && try_to_unlazy(nd))
 				res = get(link->dentry, inode, &last->done);
-			}
 		} else {
 			res = get(link->dentry, inode, &last->done);
 		}
@@ -2193,7 +2188,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 		}
 		if (unlikely(!d_can_lookup(nd->path.dentry))) {
 			if (nd->flags & LOOKUP_RCU) {
-				if (unlazy_walk(nd))
+				if (!try_to_unlazy(nd))
 					return -ECHILD;
 			}
 			return -ENOTDIR;
@@ -3127,7 +3122,6 @@ static const char *open_last_lookups(struct nameidata *nd,
 	struct inode *inode;
 	struct dentry *dentry;
 	const char *res;
-	int error;
 
 	nd->flags |= op->intent;
 
@@ -3151,9 +3145,8 @@ static const char *open_last_lookups(struct nameidata *nd,
 	} else {
 		/* create side of things */
 		if (nd->flags & LOOKUP_RCU) {
-			error = unlazy_walk(nd);
-			if (unlikely(error))
-				return ERR_PTR(error);
+			if (!try_to_unlazy(nd))
+				return ERR_PTR(-ECHILD);
 		}
 		audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
 		/* trailing slashes? */
@@ -3162,9 +3155,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 	}
 
 	if (open_flag & (O_CREAT | O_TRUNC | O_WRONLY | O_RDWR)) {
-		error = mnt_want_write(nd->path.mnt);
-		if (!error)
-			got_write = true;
+		got_write = !mnt_want_write(nd->path.mnt);
 		/*
 		 * do _not_ fail yet - we might not need that or fail with
 		 * a different error; let lookup_open() decide; we'll be
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-12-26 18:26 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-17 16:19 [PATCHSET 0/4] fs: Support for LOOKUP_CACHED / RESOLVE_CACHED Jens Axboe
2020-12-17 16:19 ` [PATCH 1/4] fs: make unlazy_walk() error handling consistent Jens Axboe
2020-12-26  2:41   ` Jens Axboe
2020-12-26  4:50     ` Al Viro
2020-12-26 17:33       ` Jens Axboe
2020-12-26 17:58         ` Matthew Wilcox
2020-12-26 18:24           ` Jens Axboe
2020-12-17 16:19 ` [PATCH 2/4] fs: add support for LOOKUP_CACHED Jens Axboe
2020-12-17 16:19 ` [PATCH 3/4] fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED Jens Axboe
2020-12-17 16:19 ` [PATCH 4/4] io_uring: enable LOOKUP_CACHED path resolution for filename lookups Jens Axboe
2020-12-17 18:09 ` [PATCHSET 0/4] fs: Support for LOOKUP_CACHED / RESOLVE_CACHED Linus Torvalds
2020-12-17 19:23   ` Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2020-12-14 19:13 [PATCHSET v3 0/4] fs: Support for LOOKUP_NONBLOCK / RESOLVE_NONBLOCK Jens Axboe
2020-12-14 19:13 ` [PATCH 1/4] fs: make unlazy_walk() error handling consistent Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).