linux-toolchains.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 43/45] namei: initialize parameters passed to step_into()
       [not found] <20220701142310.2188015-1-glider@google.com>
@ 2022-07-01 14:23 ` Alexander Potapenko
  2022-07-02 17:23   ` Linus Torvalds
  2022-08-08 16:37   ` Alexander Potapenko
  0 siblings, 2 replies; 33+ messages in thread
From: Alexander Potapenko @ 2022-07-01 14:23 UTC (permalink / raw)
  To: glider
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, linux-mm, linux-arch,
	linux-kernel, Evgenii Stepanov, Linus Torvalds,
	Nathan Chancellor, Nick Desaulniers, Segher Boessenkool,
	Vitaly Buka, linux-toolchains

Under certain circumstances initialization of `unsigned seq` and
`struct inode *inode` passed into step_into() may be skipped.
In particular, if the call to lookup_fast() in walk_component()
returns NULL, and lookup_slow() returns a valid dentry, then the
`seq` and `inode` will remain uninitialized until the call to
step_into() (see [1] for more info).

Right now step_into() does not use these uninitialized values,
yet passing uninitialized values to functions is considered undefined
behavior (see [2]). To fix that, we initialize `seq` and `inode` at
definition.

[1] https://github.com/ClangBuiltLinux/linux/issues/1648#issuecomment-1146608063
[2] https://lore.kernel.org/linux-toolchains/CAHk-=whjz3wO8zD+itoerphWem+JZz4uS3myf6u1Wd6epGRgmQ@mail.gmail.com/

Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marco Elver <elver@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Buka <vitalybuka@google.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-toolchains@vger.kernel.org
Signed-off-by: Alexander Potapenko <glider@google.com>
---
Link: https://linux-review.googlesource.com/id/I94d4e8cc1f0ecc7174659e9506ce96aaf2201d0a
---
 fs/namei.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 1f28d3f463c3b..6b39dfd3b41bc 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1995,8 +1995,8 @@ static const char *handle_dots(struct nameidata *nd, int type)
 static const char *walk_component(struct nameidata *nd, int flags)
 {
 	struct dentry *dentry;
-	struct inode *inode;
-	unsigned seq;
+	struct inode *inode = NULL;
+	unsigned seq = 0;
 	/*
 	 * "." and ".." are special - ".." especially so because it has
 	 * to be able to know about the current root directory and
@@ -3393,8 +3393,8 @@ static const char *open_last_lookups(struct nameidata *nd,
 	struct dentry *dir = nd->path.dentry;
 	int open_flag = op->open_flag;
 	bool got_write = false;
-	unsigned seq;
-	struct inode *inode;
+	unsigned seq = 0;
+	struct inode *inode = NULL;
 	struct dentry *dentry;
 	const char *res;
 
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-01 14:23 ` [PATCH v4 43/45] namei: initialize parameters passed to step_into() Alexander Potapenko
@ 2022-07-02 17:23   ` Linus Torvalds
  2022-07-03  3:59     ` Al Viro
  2022-07-04  2:52     ` Al Viro
  2022-08-08 16:37   ` Alexander Potapenko
  1 sibling, 2 replies; 33+ messages in thread
From: Linus Torvalds @ 2022-07-02 17:23 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Fri, Jul 1, 2022 at 7:25 AM Alexander Potapenko <glider@google.com> wrote:
>
> Under certain circumstances initialization of `unsigned seq` and
> `struct inode *inode` passed into step_into() may be skipped.
> In particular, if the call to lookup_fast() in walk_component()
> returns NULL, and lookup_slow() returns a valid dentry, then the
> `seq` and `inode` will remain uninitialized until the call to
> step_into() (see [1] for more info).

So while I think this needs to be fixed, I think I'd really prefer to
make the initialization and/or usage rules stricter or at least
clearer.

For example, looking around, I think "handle_dotdot()" has the exact
same kind of issue, where follow_dotdot[_rcu|() doesn't initialize
seq/inode for certain cases, and it's *really* hard to see exactly
what the rules are.

It turns out that the rules are that seq/inode only get initialized if
these routines return a non-NULL and non-error result.

Now, that is true for all of these cases - both follow_dotdot*() and
lookup_fast(). Possibly others.

But the reason follow_dotdot*() doesn't cause the same issue is that
the caller actually does the checks that avoid it, and doesn't pass
down the uninitialized cases.

Now, the other part of the rule is that they only get _used_ for
LOOKUP_RCU cases, where they are used to validate the lookup after
we've finalized things.

Of course, sometimes the "only get used for LOOKUP_RCU" is very very
unclear, because even without being an RCU lookup, step_into() will
save it into nd->inode/seq. So the values were "used", and
initializing them makes them valid, but then *that* copy must not then
be used unless RCU was set.

Also, sometimes the LOOKUP_RCU check is in the caller, and has
actually been cleared, so by the time the actual use comes around, you
just have to trust that it was a RCU lookup (ie
legitimize_links/root()).

So it all seems to work, and this patch then gets rid of one
particular odd case, but I think this patch basically hides the
compiler warning without really clarifying the code or the rules.

Anyway, what I'm building up to here is that I think we should
*document* this a bit more. and then make those initializations then
be about that documentation. I also get the feeling that
"nd->inode/nd->seq" should also be initialized.

Right now we have those quite subtle rules about "set vs use", and
while a lot of the uses are conditional on LOOKUP_RCU, that makes the
code correct, but doesn't solve the "pass uninitialized values as
arguments" case.

I also think it's very unclear when nd->inode/nd->seq are initialized,
and the compiler warning only caught the case where they were *set*
(but by arguments that weren't initialized), but didn't necessarily
catch the case where they weren't set at all in the first place and
then passed around.

End result:

 - I think I'd like path_init() (or set_nameidata) to actually
initialize nd->inode and nd->seq unconditionally too.

   Right now, they get initialized only for that LOOKUP_RCU case.
Pretty much exactly the same issue as the one this patch tries to
solve, except the compiler didn't notice because it's all indirect
through those structure fields and it just didn't track far enough.

 - I suspect it would be good to initialize them to actual invalid
values (rather than NULL/0 - particularly the sequence number)

 - I look at that follow_dotdot*() caller case, and think "that looks
very similar to the lookup_fast() case, but then we have *very*
different initialization rules".

Al - can you please take a quick look?

                    Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-02 17:23   ` Linus Torvalds
@ 2022-07-03  3:59     ` Al Viro
  2022-07-04  2:52     ` Al Viro
  1 sibling, 0 replies; 33+ messages in thread
From: Al Viro @ 2022-07-03  3:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Sat, Jul 02, 2022 at 10:23:16AM -0700, Linus Torvalds wrote:
> On Fri, Jul 1, 2022 at 7:25 AM Alexander Potapenko <glider@google.com> wrote:
> >
> > Under certain circumstances initialization of `unsigned seq` and
> > `struct inode *inode` passed into step_into() may be skipped.
> > In particular, if the call to lookup_fast() in walk_component()
> > returns NULL, and lookup_slow() returns a valid dentry, then the
> > `seq` and `inode` will remain uninitialized until the call to
> > step_into() (see [1] for more info).

> So while I think this needs to be fixed, I think I'd really prefer to
> make the initialization and/or usage rules stricter or at least
> clearer.

Disclaimer: the bits below are nowhere near what I consider a decent
explanation; this might serve as the first approximation, but I really
need to get some sleep before I get it into coherent shape.  4 hours
of sleep today...

The rules are
	* no pathname resolution without successful path_init().
IOW, path_init() failure is an instant fuck off.
	* path_init() success sets nd->inode.  In all cases.
	* nd->inode must be set - LOOKUP_RCU or not, we simply cannot
proceed without it.

	* in non-RCU mode nd->inode must be equal to nd->path.dentry->d_inode.
	* in RCU mode nd->inode must be equal to a value observed in
nd->path.dentry->d_inode while nd->path.dentry->d_seq had been equal to
nd->seq.

	* step_into() gets a dentry/inode/seq triple.  In non-RCU
mode inode and seq are ignored; in RCU mode they must satisfy the
same relationship we have for nd->path.dentry/nd->inode/nd->seq.

> Of course, sometimes the "only get used for LOOKUP_RCU" is very very
> unclear, because even without being an RCU lookup, step_into() will
> save it into nd->inode/seq. So the values were "used", and
> initializing them makes them valid, but then *that* copy must not then
> be used unless RCU was set.

You are misreading that (and I admit that it badly needs documentation).
The whole point of step_into() is to move over to new place.  nd->inode
*MUST* be set on success, no matter what.

>  - I look at that follow_dotdot*() caller case, and think "that looks
> very similar to the lookup_fast() case, but then we have *very*
> different initialization rules".

follow_dotdot() might as well lose inodep and seqp arguments - everything
would've worked just as well without those.  We would've gotten the same
complaints about uninitialized values passed to step_into(), though.

This
                if (unlikely(!parent))
                        error = step_into(nd, WALK_NOFOLLOW,
                                         nd->path.dentry, nd->inode, nd->seq);
in handle_dots() probably contributes to confusion - it's the "we
have stepped on .. in the root, just jump into whatever's mounted on
it" case.  In non-RCU case it looks like a use of nd->seq in non-RCU
mode; however, in that case step_into() will end up ignoring the
last two arguments.

I'll post something more coherent after I get some sleep.  Sorry... ;-/

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-02 17:23   ` Linus Torvalds
  2022-07-03  3:59     ` Al Viro
@ 2022-07-04  2:52     ` Al Viro
  2022-07-04  8:20       ` Alexander Potapenko
  2022-07-04 17:36       ` Linus Torvalds
  1 sibling, 2 replies; 33+ messages in thread
From: Al Viro @ 2022-07-04  2:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Sat, Jul 02, 2022 at 10:23:16AM -0700, Linus Torvalds wrote:

> Al - can you please take a quick look?

FWIW, trying to write a coherent documentation had its usual effect...
The thing is, we don't really need to fetch the inode that early.
All we really care about is that in RCU mode ->d_seq gets sampled
before we fetch ->d_inode *and* we don't treat "it looks negative"
as hard -ENOENT in case of ->d_seq mismatch.

Which can be bloody well left to step_into().  So we don't need
to pass it inode argument at all - just dentry and seq.  Makes
a bunch of functions simpler as well...

It does *not* deal with the "uninitialized" seq argument in
!RCU case; I'll handle that in the followup, but that's a separate
story, IMO (and very clearly a false positive).

Cumulative diff follows; splitup is in #work.namei.  Comments?

diff --git a/fs/namei.c b/fs/namei.c
index 1f28d3f463c3..7f4f61ade9e3 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1467,7 +1467,7 @@ EXPORT_SYMBOL(follow_down);
  * we meet a managed dentry that would need blocking.
  */
 static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
-			       struct inode **inode, unsigned *seqp)
+			       unsigned *seqp)
 {
 	struct dentry *dentry = path->dentry;
 	unsigned int flags = dentry->d_flags;
@@ -1497,13 +1497,6 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 				dentry = path->dentry = mounted->mnt.mnt_root;
 				nd->state |= ND_JUMPED;
 				*seqp = read_seqcount_begin(&dentry->d_seq);
-				*inode = dentry->d_inode;
-				/*
-				 * We don't need to re-check ->d_seq after this
-				 * ->d_inode read - there will be an RCU delay
-				 * between mount hash removal and ->mnt_root
-				 * becoming unpinned.
-				 */
 				flags = dentry->d_flags;
 				continue;
 			}
@@ -1515,8 +1508,7 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 }
 
 static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
-			  struct path *path, struct inode **inode,
-			  unsigned int *seqp)
+			  struct path *path, unsigned int *seqp)
 {
 	bool jumped;
 	int ret;
@@ -1525,9 +1517,7 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 	path->dentry = dentry;
 	if (nd->flags & LOOKUP_RCU) {
 		unsigned int seq = *seqp;
-		if (unlikely(!*inode))
-			return -ENOENT;
-		if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
+		if (likely(__follow_mount_rcu(nd, path, seqp)))
 			return 0;
 		if (!try_to_unlazy_next(nd, dentry, seq))
 			return -ECHILD;
@@ -1547,7 +1537,6 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 		if (path->mnt != nd->path.mnt)
 			mntput(path->mnt);
 	} else {
-		*inode = d_backing_inode(path->dentry);
 		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
 	}
 	return ret;
@@ -1607,9 +1596,7 @@ static struct dentry *__lookup_hash(const struct qstr *name,
 	return dentry;
 }
 
-static struct dentry *lookup_fast(struct nameidata *nd,
-				  struct inode **inode,
-			          unsigned *seqp)
+static struct dentry *lookup_fast(struct nameidata *nd, unsigned *seqp)
 {
 	struct dentry *dentry, *parent = nd->path.dentry;
 	int status = 1;
@@ -1628,22 +1615,11 @@ static struct dentry *lookup_fast(struct nameidata *nd,
 			return NULL;
 		}
 
-		/*
-		 * This sequence count validates that the inode matches
-		 * the dentry name information from lookup.
-		 */
-		*inode = d_backing_inode(dentry);
-		if (unlikely(read_seqcount_retry(&dentry->d_seq, seq)))
-			return ERR_PTR(-ECHILD);
-
-		/*
+	        /*
 		 * This sequence count validates that the parent had no
 		 * changes while we did the lookup of the dentry above.
-		 *
-		 * The memory barrier in read_seqcount_begin of child is
-		 *  enough, we can use __read_seqcount_retry here.
 		 */
-		if (unlikely(__read_seqcount_retry(&parent->d_seq, nd->seq)))
+		if (unlikely(read_seqcount_retry(&parent->d_seq, nd->seq)))
 			return ERR_PTR(-ECHILD);
 
 		*seqp = seq;
@@ -1838,13 +1814,21 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
  * for the common case.
  */
 static const char *step_into(struct nameidata *nd, int flags,
-		     struct dentry *dentry, struct inode *inode, unsigned seq)
+		     struct dentry *dentry, unsigned seq)
 {
 	struct path path;
-	int err = handle_mounts(nd, dentry, &path, &inode, &seq);
+	struct inode *inode;
+	int err = handle_mounts(nd, dentry, &path, &seq);
 
 	if (err < 0)
 		return ERR_PTR(err);
+	inode = path.dentry->d_inode;
+	if (unlikely(!inode)) {
+		if ((nd->flags & LOOKUP_RCU) &&
+		    read_seqcount_retry(&path.dentry->d_seq, seq))
+			return ERR_PTR(-ECHILD);
+		return ERR_PTR(-ENOENT);
+	}
 	if (likely(!d_is_symlink(path.dentry)) ||
 	   ((flags & WALK_TRAILING) && !(nd->flags & LOOKUP_FOLLOW)) ||
 	   (flags & WALK_NOFOLLOW)) {
@@ -1870,9 +1854,7 @@ static const char *step_into(struct nameidata *nd, int flags,
 	return pick_link(nd, &path, inode, seq, flags);
 }
 
-static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
-					struct inode **inodep,
-					unsigned *seqp)
+static struct dentry *follow_dotdot_rcu(struct nameidata *nd, unsigned *seqp)
 {
 	struct dentry *parent, *old;
 
@@ -1895,7 +1877,6 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 	}
 	old = nd->path.dentry;
 	parent = old->d_parent;
-	*inodep = parent->d_inode;
 	*seqp = read_seqcount_begin(&parent->d_seq);
 	if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
 		return ERR_PTR(-ECHILD);
@@ -1910,9 +1891,7 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 	return NULL;
 }
 
-static struct dentry *follow_dotdot(struct nameidata *nd,
-				 struct inode **inodep,
-				 unsigned *seqp)
+static struct dentry *follow_dotdot(struct nameidata *nd, unsigned *seqp)
 {
 	struct dentry *parent;
 
@@ -1937,7 +1916,6 @@ static struct dentry *follow_dotdot(struct nameidata *nd,
 		return ERR_PTR(-ENOENT);
 	}
 	*seqp = 0;
-	*inodep = parent->d_inode;
 	return parent;
 
 in_root:
@@ -1952,7 +1930,6 @@ static const char *handle_dots(struct nameidata *nd, int type)
 	if (type == LAST_DOTDOT) {
 		const char *error = NULL;
 		struct dentry *parent;
-		struct inode *inode;
 		unsigned seq;
 
 		if (!nd->root.mnt) {
@@ -1961,17 +1938,17 @@ static const char *handle_dots(struct nameidata *nd, int type)
 				return error;
 		}
 		if (nd->flags & LOOKUP_RCU)
-			parent = follow_dotdot_rcu(nd, &inode, &seq);
+			parent = follow_dotdot_rcu(nd, &seq);
 		else
-			parent = follow_dotdot(nd, &inode, &seq);
+			parent = follow_dotdot(nd, &seq);
 		if (IS_ERR(parent))
 			return ERR_CAST(parent);
 		if (unlikely(!parent))
 			error = step_into(nd, WALK_NOFOLLOW,
-					 nd->path.dentry, nd->inode, nd->seq);
+					 nd->path.dentry, nd->seq);
 		else
 			error = step_into(nd, WALK_NOFOLLOW,
-					 parent, inode, seq);
+					 parent, seq);
 		if (unlikely(error))
 			return error;
 
@@ -1995,7 +1972,6 @@ static const char *handle_dots(struct nameidata *nd, int type)
 static const char *walk_component(struct nameidata *nd, int flags)
 {
 	struct dentry *dentry;
-	struct inode *inode;
 	unsigned seq;
 	/*
 	 * "." and ".." are special - ".." especially so because it has
@@ -2007,7 +1983,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 			put_link(nd);
 		return handle_dots(nd, nd->last_type);
 	}
-	dentry = lookup_fast(nd, &inode, &seq);
+	dentry = lookup_fast(nd, &seq);
 	if (IS_ERR(dentry))
 		return ERR_CAST(dentry);
 	if (unlikely(!dentry)) {
@@ -2017,7 +1993,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 	}
 	if (!(flags & WALK_MORE) && nd->depth)
 		put_link(nd);
-	return step_into(nd, flags, dentry, inode, seq);
+	return step_into(nd, flags, dentry, seq);
 }
 
 /*
@@ -2473,8 +2449,7 @@ static int handle_lookup_down(struct nameidata *nd)
 {
 	if (!(nd->flags & LOOKUP_RCU))
 		dget(nd->path.dentry);
-	return PTR_ERR(step_into(nd, WALK_NOFOLLOW,
-			nd->path.dentry, nd->inode, nd->seq));
+	return PTR_ERR(step_into(nd, WALK_NOFOLLOW, nd->path.dentry, nd->seq));
 }
 
 /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
@@ -3394,7 +3369,6 @@ static const char *open_last_lookups(struct nameidata *nd,
 	int open_flag = op->open_flag;
 	bool got_write = false;
 	unsigned seq;
-	struct inode *inode;
 	struct dentry *dentry;
 	const char *res;
 
@@ -3410,7 +3384,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 		if (nd->last.name[nd->last.len])
 			nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 		/* we _can_ be in RCU mode here */
-		dentry = lookup_fast(nd, &inode, &seq);
+		dentry = lookup_fast(nd, &seq);
 		if (IS_ERR(dentry))
 			return ERR_CAST(dentry);
 		if (likely(dentry))
@@ -3464,7 +3438,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 finish_lookup:
 	if (nd->depth)
 		put_link(nd);
-	res = step_into(nd, WALK_TRAILING, dentry, inode, seq);
+	res = step_into(nd, WALK_TRAILING, dentry, seq);
 	if (unlikely(res))
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
 	return res;

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04  2:52     ` Al Viro
@ 2022-07-04  8:20       ` Alexander Potapenko
  2022-07-04 13:44         ` Al Viro
  2022-07-04 17:36       ` Linus Torvalds
  1 sibling, 1 reply; 33+ messages in thread
From: Alexander Potapenko @ 2022-07-04  8:20 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 4, 2022 at 4:53 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sat, Jul 02, 2022 at 10:23:16AM -0700, Linus Torvalds wrote:
>
> > Al - can you please take a quick look?
>
> FWIW, trying to write a coherent documentation had its usual effect...
> The thing is, we don't really need to fetch the inode that early.
> All we really care about is that in RCU mode ->d_seq gets sampled
> before we fetch ->d_inode *and* we don't treat "it looks negative"
> as hard -ENOENT in case of ->d_seq mismatch.
>
> Which can be bloody well left to step_into().  So we don't need
> to pass it inode argument at all - just dentry and seq.  Makes
> a bunch of functions simpler as well...
>
> It does *not* deal with the "uninitialized" seq argument in
> !RCU case; I'll handle that in the followup, but that's a separate
> story, IMO (and very clearly a false positive).

I can confirm that your patch fixes KMSAN reports on inode, yet the
following reports still persist:

=====================================================
BUG: KMSAN: uninit-value in walk_component+0x5e7/0x6c0 fs/namei.c:1996
 walk_component+0x5e7/0x6c0 fs/namei.c:1996
 lookup_last fs/namei.c:2445
 path_lookupat+0x27d/0x6f0 fs/namei.c:2468
 filename_lookup+0x24c/0x800 fs/namei.c:2497
 kern_path+0x79/0x3a0 fs/namei.c:2587
 init_stat+0x72/0x13f fs/init.c:132
 clean_path+0x74/0x24c init/initramfs.c:339
 do_name+0x12d/0xc17 init/initramfs.c:371
 write_buffer init/initramfs.c:457
 unpack_to_rootfs+0x49a/0xd9e init/initramfs.c:510
 do_populate_rootfs+0x57/0x40f init/initramfs.c:699
 async_run_entry_fn+0x8f/0x400 kernel/async.c:127
 process_one_work+0xb27/0x13e0 kernel/workqueue.c:2289
 worker_thread+0x1076/0x1d60 kernel/workqueue.c:2436
 kthread+0x31b/0x430 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 ??:?

Local variable seq created at:
 walk_component+0x46/0x6c0 fs/namei.c:1981
 lookup_last fs/namei.c:2445
 path_lookupat+0x27d/0x6f0 fs/namei.c:2468

CPU: 0 PID: 10 Comm: kworker/u9:0 Tainted: G    B
5.19.0-rc4-00059-gcf2d25715943-dirty #103
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: events_unbound async_run_entry_fn
=====================================================

What makes you think they are false positives? Is the scenario I
described above:

"""
In particular, if the call to lookup_fast() in walk_component()
returns NULL, and lookup_slow() returns a valid dentry, then the
`seq` and `inode` will remain uninitialized until the call to
step_into()
"""

impossible?

> Cumulative diff follows; splitup is in #work.namei.  Comments?
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 1f28d3f463c3..7f4f61ade9e3 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1467,7 +1467,7 @@ EXPORT_SYMBOL(follow_down);
>   * we meet a managed dentry that would need blocking.
>   */
>  static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
> -                              struct inode **inode, unsigned *seqp)
> +                              unsigned *seqp)
>  {
>         struct dentry *dentry = path->dentry;
>         unsigned int flags = dentry->d_flags;
> @@ -1497,13 +1497,6 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
>                                 dentry = path->dentry = mounted->mnt.mnt_root;
>                                 nd->state |= ND_JUMPED;
>                                 *seqp = read_seqcount_begin(&dentry->d_seq);
> -                               *inode = dentry->d_inode;
> -                               /*
> -                                * We don't need to re-check ->d_seq after this
> -                                * ->d_inode read - there will be an RCU delay
> -                                * between mount hash removal and ->mnt_root
> -                                * becoming unpinned.
> -                                */
>                                 flags = dentry->d_flags;
>                                 continue;
>                         }
> @@ -1515,8 +1508,7 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
>  }
>
>  static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
> -                         struct path *path, struct inode **inode,
> -                         unsigned int *seqp)
> +                         struct path *path, unsigned int *seqp)
>  {
>         bool jumped;
>         int ret;
> @@ -1525,9 +1517,7 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
>         path->dentry = dentry;
>         if (nd->flags & LOOKUP_RCU) {
>                 unsigned int seq = *seqp;
> -               if (unlikely(!*inode))
> -                       return -ENOENT;
> -               if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
> +               if (likely(__follow_mount_rcu(nd, path, seqp)))
>                         return 0;
>                 if (!try_to_unlazy_next(nd, dentry, seq))
>                         return -ECHILD;
> @@ -1547,7 +1537,6 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
>                 if (path->mnt != nd->path.mnt)
>                         mntput(path->mnt);
>         } else {
> -               *inode = d_backing_inode(path->dentry);
>                 *seqp = 0; /* out of RCU mode, so the value doesn't matter */
>         }
>         return ret;
> @@ -1607,9 +1596,7 @@ static struct dentry *__lookup_hash(const struct qstr *name,
>         return dentry;
>  }
>
> -static struct dentry *lookup_fast(struct nameidata *nd,
> -                                 struct inode **inode,
> -                                 unsigned *seqp)
> +static struct dentry *lookup_fast(struct nameidata *nd, unsigned *seqp)
>  {
>         struct dentry *dentry, *parent = nd->path.dentry;
>         int status = 1;
> @@ -1628,22 +1615,11 @@ static struct dentry *lookup_fast(struct nameidata *nd,
>                         return NULL;
>                 }
>
> -               /*
> -                * This sequence count validates that the inode matches
> -                * the dentry name information from lookup.
> -                */
> -               *inode = d_backing_inode(dentry);
> -               if (unlikely(read_seqcount_retry(&dentry->d_seq, seq)))
> -                       return ERR_PTR(-ECHILD);
> -
> -               /*
> +               /*
>                  * This sequence count validates that the parent had no
>                  * changes while we did the lookup of the dentry above.
> -                *
> -                * The memory barrier in read_seqcount_begin of child is
> -                *  enough, we can use __read_seqcount_retry here.
>                  */
> -               if (unlikely(__read_seqcount_retry(&parent->d_seq, nd->seq)))
> +               if (unlikely(read_seqcount_retry(&parent->d_seq, nd->seq)))
>                         return ERR_PTR(-ECHILD);
>
>                 *seqp = seq;
> @@ -1838,13 +1814,21 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
>   * for the common case.
>   */
>  static const char *step_into(struct nameidata *nd, int flags,
> -                    struct dentry *dentry, struct inode *inode, unsigned seq)
> +                    struct dentry *dentry, unsigned seq)
>  {
>         struct path path;
> -       int err = handle_mounts(nd, dentry, &path, &inode, &seq);
> +       struct inode *inode;
> +       int err = handle_mounts(nd, dentry, &path, &seq);
>
>         if (err < 0)
>                 return ERR_PTR(err);
> +       inode = path.dentry->d_inode;
> +       if (unlikely(!inode)) {
> +               if ((nd->flags & LOOKUP_RCU) &&
> +                   read_seqcount_retry(&path.dentry->d_seq, seq))
> +                       return ERR_PTR(-ECHILD);
> +               return ERR_PTR(-ENOENT);
> +       }
>         if (likely(!d_is_symlink(path.dentry)) ||
>            ((flags & WALK_TRAILING) && !(nd->flags & LOOKUP_FOLLOW)) ||
>            (flags & WALK_NOFOLLOW)) {
> @@ -1870,9 +1854,7 @@ static const char *step_into(struct nameidata *nd, int flags,
>         return pick_link(nd, &path, inode, seq, flags);
>  }
>
> -static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
> -                                       struct inode **inodep,
> -                                       unsigned *seqp)
> +static struct dentry *follow_dotdot_rcu(struct nameidata *nd, unsigned *seqp)
>  {
>         struct dentry *parent, *old;
>
> @@ -1895,7 +1877,6 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
>         }
>         old = nd->path.dentry;
>         parent = old->d_parent;
> -       *inodep = parent->d_inode;
>         *seqp = read_seqcount_begin(&parent->d_seq);
>         if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
>                 return ERR_PTR(-ECHILD);
> @@ -1910,9 +1891,7 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
>         return NULL;
>  }
>
> -static struct dentry *follow_dotdot(struct nameidata *nd,
> -                                struct inode **inodep,
> -                                unsigned *seqp)
> +static struct dentry *follow_dotdot(struct nameidata *nd, unsigned *seqp)
>  {
>         struct dentry *parent;
>
> @@ -1937,7 +1916,6 @@ static struct dentry *follow_dotdot(struct nameidata *nd,
>                 return ERR_PTR(-ENOENT);
>         }
>         *seqp = 0;
> -       *inodep = parent->d_inode;
>         return parent;
>
>  in_root:
> @@ -1952,7 +1930,6 @@ static const char *handle_dots(struct nameidata *nd, int type)
>         if (type == LAST_DOTDOT) {
>                 const char *error = NULL;
>                 struct dentry *parent;
> -               struct inode *inode;
>                 unsigned seq;
>
>                 if (!nd->root.mnt) {
> @@ -1961,17 +1938,17 @@ static const char *handle_dots(struct nameidata *nd, int type)
>                                 return error;
>                 }
>                 if (nd->flags & LOOKUP_RCU)
> -                       parent = follow_dotdot_rcu(nd, &inode, &seq);
> +                       parent = follow_dotdot_rcu(nd, &seq);
>                 else
> -                       parent = follow_dotdot(nd, &inode, &seq);
> +                       parent = follow_dotdot(nd, &seq);
>                 if (IS_ERR(parent))
>                         return ERR_CAST(parent);
>                 if (unlikely(!parent))
>                         error = step_into(nd, WALK_NOFOLLOW,
> -                                        nd->path.dentry, nd->inode, nd->seq);
> +                                        nd->path.dentry, nd->seq);
>                 else
>                         error = step_into(nd, WALK_NOFOLLOW,
> -                                        parent, inode, seq);
> +                                        parent, seq);
>                 if (unlikely(error))
>                         return error;
>
> @@ -1995,7 +1972,6 @@ static const char *handle_dots(struct nameidata *nd, int type)
>  static const char *walk_component(struct nameidata *nd, int flags)
>  {
>         struct dentry *dentry;
> -       struct inode *inode;
>         unsigned seq;
>         /*
>          * "." and ".." are special - ".." especially so because it has
> @@ -2007,7 +1983,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
>                         put_link(nd);
>                 return handle_dots(nd, nd->last_type);
>         }
> -       dentry = lookup_fast(nd, &inode, &seq);
> +       dentry = lookup_fast(nd, &seq);
>         if (IS_ERR(dentry))
>                 return ERR_CAST(dentry);
>         if (unlikely(!dentry)) {
> @@ -2017,7 +1993,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
>         }
>         if (!(flags & WALK_MORE) && nd->depth)
>                 put_link(nd);
> -       return step_into(nd, flags, dentry, inode, seq);
> +       return step_into(nd, flags, dentry, seq);
>  }
>
>  /*
> @@ -2473,8 +2449,7 @@ static int handle_lookup_down(struct nameidata *nd)
>  {
>         if (!(nd->flags & LOOKUP_RCU))
>                 dget(nd->path.dentry);
> -       return PTR_ERR(step_into(nd, WALK_NOFOLLOW,
> -                       nd->path.dentry, nd->inode, nd->seq));
> +       return PTR_ERR(step_into(nd, WALK_NOFOLLOW, nd->path.dentry, nd->seq));
>  }
>
>  /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
> @@ -3394,7 +3369,6 @@ static const char *open_last_lookups(struct nameidata *nd,
>         int open_flag = op->open_flag;
>         bool got_write = false;
>         unsigned seq;
> -       struct inode *inode;
>         struct dentry *dentry;
>         const char *res;
>
> @@ -3410,7 +3384,7 @@ static const char *open_last_lookups(struct nameidata *nd,
>                 if (nd->last.name[nd->last.len])
>                         nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
>                 /* we _can_ be in RCU mode here */
> -               dentry = lookup_fast(nd, &inode, &seq);
> +               dentry = lookup_fast(nd, &seq);
>                 if (IS_ERR(dentry))
>                         return ERR_CAST(dentry);
>                 if (likely(dentry))
> @@ -3464,7 +3438,7 @@ static const char *open_last_lookups(struct nameidata *nd,
>  finish_lookup:
>         if (nd->depth)
>                 put_link(nd);
> -       res = step_into(nd, WALK_TRAILING, dentry, inode, seq);
> +       res = step_into(nd, WALK_TRAILING, dentry, seq);
>         if (unlikely(res))
>                 nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
>         return res;



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

Diese E-Mail ist vertraulich. Falls Sie diese fälschlicherweise
erhalten haben sollten, leiten Sie diese bitte nicht an jemand anderes
weiter, löschen Sie alle Kopien und Anhänge davon und lassen Sie mich
bitte wissen, dass die E-Mail an die falsche Person gesendet wurde.


This e-mail is confidential. If you received this communication by
mistake, please don't forward it to anyone else, please erase all
copies and attachments, and please let me know that it has gone to the
wrong person.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04  8:20       ` Alexander Potapenko
@ 2022-07-04 13:44         ` Al Viro
  2022-07-04 13:55           ` Al Viro
                             ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Al Viro @ 2022-07-04 13:44 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Linus Torvalds, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 04, 2022 at 10:20:53AM +0200, Alexander Potapenko wrote:

> What makes you think they are false positives? Is the scenario I
> described above:
> 
> """
> In particular, if the call to lookup_fast() in walk_component()
> returns NULL, and lookup_slow() returns a valid dentry, then the
> `seq` and `inode` will remain uninitialized until the call to
> step_into()
> """
> 
> impossible?

Suppose step_into() has been called in non-RCU mode.  The first
thing it does is
	int err = handle_mounts(nd, dentry, &path, &seq);
	if (err < 0) 
		return ERR_PTR(err);

And handle_mounts() in non-RCU mode is
	path->mnt = nd->path.mnt;
	path->dentry = dentry;
	if (nd->flags & LOOKUP_RCU) {
		[unreachable code]
	}
	[code not touching seqp]
	if (unlikely(ret)) {
		[code not touching seqp]
	} else {
		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
	}
	return ret;

In other words, the value seq argument of step_into() used to have ends up
being never fetched and, in case step_into() gets past that if (err < 0)
that value is replaced with zero before any further accesses.

So it's a false positive; yes, strictly speaking compiler is allowd
to do anything whatsoever if it manages to prove that the value is
uninitialized.  Realistically, though, especially since unsigned int
is not allowed any trapping representations...

If you want an test stripped of VFS specifics, consider this:

int g(int n, _Bool flag)
{
	if (!flag)
		n = 0;
	return n + 1;
}

int f(int n, _Bool flag)
{
	int x;

	if (flag)
		x = n + 2;
	return g(x, flag);
}

Do your tools trigger on it?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 13:44         ` Al Viro
@ 2022-07-04 13:55           ` Al Viro
  2022-07-04 15:49           ` Alexander Potapenko
  2022-07-04 16:00           ` Al Viro
  2 siblings, 0 replies; 33+ messages in thread
From: Al Viro @ 2022-07-04 13:55 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Linus Torvalds, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 04, 2022 at 02:44:00PM +0100, Al Viro wrote:
> On Mon, Jul 04, 2022 at 10:20:53AM +0200, Alexander Potapenko wrote:
> 
> > What makes you think they are false positives? Is the scenario I
> > described above:
> > 
> > """
> > In particular, if the call to lookup_fast() in walk_component()
> > returns NULL, and lookup_slow() returns a valid dentry, then the
> > `seq` and `inode` will remain uninitialized until the call to
> > step_into()
> > """
> > 
> > impossible?
> 
> Suppose step_into() has been called in non-RCU mode.  The first
> thing it does is
> 	int err = handle_mounts(nd, dentry, &path, &seq);
> 	if (err < 0) 
> 		return ERR_PTR(err);
> 
> And handle_mounts() in non-RCU mode is
> 	path->mnt = nd->path.mnt;
> 	path->dentry = dentry;
> 	if (nd->flags & LOOKUP_RCU) {
> 		[unreachable code]
> 	}
> 	[code not touching seqp]
> 	if (unlikely(ret)) {
> 		[code not touching seqp]
> 	} else {
> 		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
> 	}
> 	return ret;

Make that
 	[code assigning ret a non-negative value and never using seqp]
 	if (unlikely(ret)) {
 		[code never using seqp or ret]
 	} else {
 		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
 	}
 	return ret;

so if (err < 0) in the caller is equivalent to if (err).

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 13:44         ` Al Viro
  2022-07-04 13:55           ` Al Viro
@ 2022-07-04 15:49           ` Alexander Potapenko
  2022-07-04 16:03             ` Greg Kroah-Hartman
  2022-07-04 18:23             ` Segher Boessenkool
  2022-07-04 16:00           ` Al Viro
  2 siblings, 2 replies; 33+ messages in thread
From: Alexander Potapenko @ 2022-07-04 15:49 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 4, 2022 at 3:44 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Mon, Jul 04, 2022 at 10:20:53AM +0200, Alexander Potapenko wrote:
>
> > What makes you think they are false positives? Is the scenario I
> > described above:
> >
> > """
> > In particular, if the call to lookup_fast() in walk_component()
> > returns NULL, and lookup_slow() returns a valid dentry, then the
> > `seq` and `inode` will remain uninitialized until the call to
> > step_into()
> > """
> >
> > impossible?
>
> Suppose step_into() has been called in non-RCU mode.  The first
> thing it does is
>         int err = handle_mounts(nd, dentry, &path, &seq);
>         if (err < 0)
>                 return ERR_PTR(err);
>
> And handle_mounts() in non-RCU mode is
>         path->mnt = nd->path.mnt;
>         path->dentry = dentry;
>         if (nd->flags & LOOKUP_RCU) {
>                 [unreachable code]
>         }
>         [code not touching seqp]
>         if (unlikely(ret)) {
>                 [code not touching seqp]
>         } else {
>                 *seqp = 0; /* out of RCU mode, so the value doesn't matter */
>         }
>         return ret;
>
> In other words, the value seq argument of step_into() used to have ends up
> being never fetched and, in case step_into() gets past that if (err < 0)
> that value is replaced with zero before any further accesses.

Oh, I see. That is actually what had been discussed here:
https://lore.kernel.org/linux-toolchains/20220614144853.3693273-1-glider@google.com/
Indeed, step_into() in its current implementation does not use `seq`
(which is noted in the patch description ;)), but the question is
whether we want to catch such cases regardless of that.
One of the reasons to do so is standard compliance - passing an
uninitialized value to a function is UB in C11, as Segher pointed out
here: https://lore.kernel.org/linux-toolchains/20220614214039.GA25951@gate.crashing.org/
The compilers may not be smart enough to take advantage of this _yet_,
but I wouldn't underestimate their ability to evolve (especially that
of Clang).
I also believe it's fragile to rely on the callee to ignore certain
parameters: it may be doing so today, but if someone changes
step_into() tomorrow we may miss it.

If I am reading Linus's message here (and the following one from him
in the same thread):
https://lore.kernel.org/linux-toolchains/CAHk-=whjz3wO8zD+itoerphWem+JZz4uS3myf6u1Wd6epGRgmQ@mail.gmail.com/
correctly, we should be reporting uninitialized values passed to
functions, unless those values dissolve after inlining.
While this is a bit of a vague criterion, at least for Clang we always
know that KMSAN instrumentation is applied after inlining, so the
reports we see are due to values that are actually passed between
functions.

> So it's a false positive; yes, strictly speaking compiler is allowd
> to do anything whatsoever if it manages to prove that the value is
> uninitialized.  Realistically, though, especially since unsigned int
> is not allowed any trapping representations...
>
> If you want an test stripped of VFS specifics, consider this:
>
> int g(int n, _Bool flag)
> {
>         if (!flag)
>                 n = 0;
>         return n + 1;
> }
>
> int f(int n, _Bool flag)
> {
>         int x;
>
>         if (flag)
>                 x = n + 2;
>         return g(x, flag);
> }
>
> Do your tools trigger on it?

Currently KMSAN has two modes of operation controlled by
CONFIG_KMSAN_CHECK_PARAM_RETVAL.
When enabled, that config enforces checks of function parameters at
call sites (by applying Clang's -fsanitize-memory-param-retval flag).
In that mode the tool would report the attempt to call `g(x, false)`
if g() survives inlining.
In the case CONFIG_KMSAN_CHECK_PARAM_RETVAL=n, KMSAN won't be
reporting the error.

Based on the mentioned discussion I decided to make
CONFIG_KMSAN_CHECK_PARAM_RETVAL=y the default option.
So far it only reported a handful of errors (i.e. enforcing this rule
shouldn't be very problematic for the kernel), but it simplifies
handling of calls between instrumented and non-instrumented functions
that occur e.g. at syscall entry points: knowing that passed-by-value
arguments are checked at call sites, we can assume they are
initialized in the callees.


HTH,
Alex

-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

Diese E-Mail ist vertraulich. Falls Sie diese fälschlicherweise
erhalten haben sollten, leiten Sie diese bitte nicht an jemand anderes
weiter, löschen Sie alle Kopien und Anhänge davon und lassen Sie mich
bitte wissen, dass die E-Mail an die falsche Person gesendet wurde.


This e-mail is confidential. If you received this communication by
mistake, please don't forward it to anyone else, please erase all
copies and attachments, and please let me know that it has gone to the
wrong person.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 13:44         ` Al Viro
  2022-07-04 13:55           ` Al Viro
  2022-07-04 15:49           ` Alexander Potapenko
@ 2022-07-04 16:00           ` Al Viro
  2022-07-04 16:47             ` Alexander Potapenko
  2 siblings, 1 reply; 33+ messages in thread
From: Al Viro @ 2022-07-04 16:00 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Linus Torvalds, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 04, 2022 at 02:44:00PM +0100, Al Viro wrote:
> On Mon, Jul 04, 2022 at 10:20:53AM +0200, Alexander Potapenko wrote:
> 
> > What makes you think they are false positives? Is the scenario I
> > described above:
> > 
> > """
> > In particular, if the call to lookup_fast() in walk_component()
> > returns NULL, and lookup_slow() returns a valid dentry, then the
> > `seq` and `inode` will remain uninitialized until the call to
> > step_into()
> > """
> > 
> > impossible?
> 
> Suppose step_into() has been called in non-RCU mode.  The first
> thing it does is
> 	int err = handle_mounts(nd, dentry, &path, &seq);
> 	if (err < 0) 
> 		return ERR_PTR(err);
> 
> And handle_mounts() in non-RCU mode is
> 	path->mnt = nd->path.mnt;
> 	path->dentry = dentry;
> 	if (nd->flags & LOOKUP_RCU) {
> 		[unreachable code]
> 	}
> 	[code not touching seqp]
> 	if (unlikely(ret)) {
> 		[code not touching seqp]
> 	} else {
> 		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
> 	}
> 	return ret;
> 
> In other words, the value seq argument of step_into() used to have ends up
> being never fetched and, in case step_into() gets past that if (err < 0)
> that value is replaced with zero before any further accesses.
> 
> So it's a false positive; yes, strictly speaking compiler is allowd
> to do anything whatsoever if it manages to prove that the value is
> uninitialized.  Realistically, though, especially since unsigned int
> is not allowed any trapping representations...

FWIW, update (and yet untested) branch is in #work.namei.  Compared to the
previous, we store sampled ->d_seq of the next dentry in nd->next_seq,
rather than bothering with local variables.  AFAICS, it ends up with
better code that way.  And both ->seq and ->next_seq are zeroed at the
moments when we switch to non-RCU mode (as well as non-RCU path_init()).

IMO it looks saner that way.  NOTE: it still needs to be tested and probably
reordered and massaged; it's not for merge at the moment.  Current cumulative
diff follows:

diff --git a/fs/namei.c b/fs/namei.c
index 1f28d3f463c3..b12c6b53579d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -567,7 +567,7 @@ struct nameidata {
 	struct path	root;
 	struct inode	*inode; /* path.dentry.d_inode */
 	unsigned int	flags, state;
-	unsigned	seq, m_seq, r_seq;
+	unsigned	seq, next_seq, m_seq, r_seq;
 	int		last_type;
 	unsigned	depth;
 	int		total_link_count;
@@ -772,6 +772,7 @@ static bool try_to_unlazy(struct nameidata *nd)
 		goto out;
 	if (unlikely(!legitimize_root(nd)))
 		goto out;
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	BUG_ON(nd->inode != parent->d_inode);
 	return true;
@@ -780,6 +781,7 @@ static bool try_to_unlazy(struct nameidata *nd)
 	nd->path.mnt = NULL;
 	nd->path.dentry = NULL;
 out:
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	return false;
 }
@@ -788,7 +790,6 @@ static bool try_to_unlazy(struct nameidata *nd)
  * try_to_unlazy_next - try to switch to ref-walk mode.
  * @nd: nameidata pathwalk data
  * @dentry: next dentry to step into
- * @seq: seq number to check @dentry against
  * Returns: true on success, false on failure
  *
  * Similar to try_to_unlazy(), but here we have the next dentry already
@@ -797,7 +798,7 @@ static bool try_to_unlazy(struct nameidata *nd)
  * Nothing should touch nameidata between try_to_unlazy_next() failure and
  * terminate_walk().
  */
-static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsigned seq)
+static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry)
 {
 	BUG_ON(!(nd->flags & LOOKUP_RCU));
 
@@ -818,7 +819,7 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi
 	 */
 	if (unlikely(!lockref_get_not_dead(&dentry->d_lockref)))
 		goto out;
-	if (unlikely(read_seqcount_retry(&dentry->d_seq, seq)))
+	if (unlikely(read_seqcount_retry(&dentry->d_seq, nd->next_seq)))
 		goto out_dput;
 	/*
 	 * Sequence counts matched. Now make sure that the root is
@@ -826,6 +827,7 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi
 	 */
 	if (unlikely(!legitimize_root(nd)))
 		goto out_dput;
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	return true;
 
@@ -834,9 +836,11 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi
 out1:
 	nd->path.dentry = NULL;
 out:
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	return false;
 out_dput:
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	dput(dentry);
 	return false;
@@ -1466,8 +1470,7 @@ EXPORT_SYMBOL(follow_down);
  * Try to skip to top of mountpoint pile in rcuwalk mode.  Fail if
  * we meet a managed dentry that would need blocking.
  */
-static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
-			       struct inode **inode, unsigned *seqp)
+static bool __follow_mount_rcu(struct nameidata *nd, struct path *path)
 {
 	struct dentry *dentry = path->dentry;
 	unsigned int flags = dentry->d_flags;
@@ -1496,14 +1499,7 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 				path->mnt = &mounted->mnt;
 				dentry = path->dentry = mounted->mnt.mnt_root;
 				nd->state |= ND_JUMPED;
-				*seqp = read_seqcount_begin(&dentry->d_seq);
-				*inode = dentry->d_inode;
-				/*
-				 * We don't need to re-check ->d_seq after this
-				 * ->d_inode read - there will be an RCU delay
-				 * between mount hash removal and ->mnt_root
-				 * becoming unpinned.
-				 */
+				nd->next_seq = read_seqcount_begin(&dentry->d_seq);
 				flags = dentry->d_flags;
 				continue;
 			}
@@ -1515,8 +1511,7 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 }
 
 static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
-			  struct path *path, struct inode **inode,
-			  unsigned int *seqp)
+			  struct path *path)
 {
 	bool jumped;
 	int ret;
@@ -1524,16 +1519,15 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 	path->mnt = nd->path.mnt;
 	path->dentry = dentry;
 	if (nd->flags & LOOKUP_RCU) {
-		unsigned int seq = *seqp;
-		if (unlikely(!*inode))
-			return -ENOENT;
-		if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
+		unsigned int seq = nd->next_seq;
+		if (likely(__follow_mount_rcu(nd, path)))
 			return 0;
-		if (!try_to_unlazy_next(nd, dentry, seq))
-			return -ECHILD;
-		// *path might've been clobbered by __follow_mount_rcu()
+		// *path and nd->next_seq might've been just clobbered
 		path->mnt = nd->path.mnt;
 		path->dentry = dentry;
+		nd->next_seq = seq;
+		if (!try_to_unlazy_next(nd, dentry))
+			return -ECHILD;
 	}
 	ret = traverse_mounts(path, &jumped, &nd->total_link_count, nd->flags);
 	if (jumped) {
@@ -1546,9 +1540,6 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 		dput(path->dentry);
 		if (path->mnt != nd->path.mnt)
 			mntput(path->mnt);
-	} else {
-		*inode = d_backing_inode(path->dentry);
-		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
 	}
 	return ret;
 }
@@ -1607,9 +1598,7 @@ static struct dentry *__lookup_hash(const struct qstr *name,
 	return dentry;
 }
 
-static struct dentry *lookup_fast(struct nameidata *nd,
-				  struct inode **inode,
-			          unsigned *seqp)
+static struct dentry *lookup_fast(struct nameidata *nd)
 {
 	struct dentry *dentry, *parent = nd->path.dentry;
 	int status = 1;
@@ -1620,37 +1609,24 @@ static struct dentry *lookup_fast(struct nameidata *nd,
 	 * going to fall back to non-racy lookup.
 	 */
 	if (nd->flags & LOOKUP_RCU) {
-		unsigned seq;
-		dentry = __d_lookup_rcu(parent, &nd->last, &seq);
+		dentry = __d_lookup_rcu(parent, &nd->last, &nd->next_seq);
 		if (unlikely(!dentry)) {
 			if (!try_to_unlazy(nd))
 				return ERR_PTR(-ECHILD);
 			return NULL;
 		}
 
-		/*
-		 * This sequence count validates that the inode matches
-		 * the dentry name information from lookup.
-		 */
-		*inode = d_backing_inode(dentry);
-		if (unlikely(read_seqcount_retry(&dentry->d_seq, seq)))
-			return ERR_PTR(-ECHILD);
-
-		/*
+	        /*
 		 * This sequence count validates that the parent had no
 		 * changes while we did the lookup of the dentry above.
-		 *
-		 * The memory barrier in read_seqcount_begin of child is
-		 *  enough, we can use __read_seqcount_retry here.
 		 */
-		if (unlikely(__read_seqcount_retry(&parent->d_seq, nd->seq)))
+		if (unlikely(read_seqcount_retry(&parent->d_seq, nd->seq)))
 			return ERR_PTR(-ECHILD);
 
-		*seqp = seq;
 		status = d_revalidate(dentry, nd->flags);
 		if (likely(status > 0))
 			return dentry;
-		if (!try_to_unlazy_next(nd, dentry, seq))
+		if (!try_to_unlazy_next(nd, dentry))
 			return ERR_PTR(-ECHILD);
 		if (status == -ECHILD)
 			/* we'd been told to redo it in non-rcu mode */
@@ -1731,7 +1707,7 @@ static inline int may_lookup(struct user_namespace *mnt_userns,
 	return inode_permission(mnt_userns, nd->inode, MAY_EXEC);
 }
 
-static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
+static int reserve_stack(struct nameidata *nd, struct path *link)
 {
 	if (unlikely(nd->total_link_count++ >= MAXSYMLINKS))
 		return -ELOOP;
@@ -1746,7 +1722,7 @@ static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
 	if (nd->flags & LOOKUP_RCU) {
 		// we need to grab link before we do unlazy.  And we can't skip
 		// unlazy even if we fail to grab the link - cleanup needs it
-		bool grabbed_link = legitimize_path(nd, link, seq);
+		bool grabbed_link = legitimize_path(nd, link, nd->next_seq);
 
 		if (!try_to_unlazy(nd) || !grabbed_link)
 			return -ECHILD;
@@ -1760,11 +1736,11 @@ static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
 enum {WALK_TRAILING = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
 
 static const char *pick_link(struct nameidata *nd, struct path *link,
-		     struct inode *inode, unsigned seq, int flags)
+		     struct inode *inode, int flags)
 {
 	struct saved *last;
 	const char *res;
-	int error = reserve_stack(nd, link, seq);
+	int error = reserve_stack(nd, link);
 
 	if (unlikely(error)) {
 		if (!(nd->flags & LOOKUP_RCU))
@@ -1774,7 +1750,7 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 	last = nd->stack + nd->depth++;
 	last->link = *link;
 	clear_delayed_call(&last->done);
-	last->seq = seq;
+	last->seq = nd->next_seq;
 
 	if (flags & WALK_TRAILING) {
 		error = may_follow_link(nd, inode);
@@ -1836,15 +1812,25 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
  * to do this check without having to look at inode->i_op,
  * so we keep a cache of "no, this doesn't need follow_link"
  * for the common case.
+ *
+ * NOTE: nd->next_seq must be the validated sampled dentry->d_seq
  */
 static const char *step_into(struct nameidata *nd, int flags,
-		     struct dentry *dentry, struct inode *inode, unsigned seq)
+		     struct dentry *dentry)
 {
 	struct path path;
-	int err = handle_mounts(nd, dentry, &path, &inode, &seq);
+	struct inode *inode;
+	int err = handle_mounts(nd, dentry, &path);
 
 	if (err < 0)
 		return ERR_PTR(err);
+	inode = path.dentry->d_inode;
+	if (unlikely(!inode)) {
+		if ((nd->flags & LOOKUP_RCU) &&
+		    read_seqcount_retry(&path.dentry->d_seq, nd->next_seq))
+			return ERR_PTR(-ECHILD);
+		return ERR_PTR(-ENOENT);
+	}
 	if (likely(!d_is_symlink(path.dentry)) ||
 	   ((flags & WALK_TRAILING) && !(nd->flags & LOOKUP_FOLLOW)) ||
 	   (flags & WALK_NOFOLLOW)) {
@@ -1856,23 +1842,21 @@ static const char *step_into(struct nameidata *nd, int flags,
 		}
 		nd->path = path;
 		nd->inode = inode;
-		nd->seq = seq;
+		nd->seq = nd->next_seq;
 		return NULL;
 	}
 	if (nd->flags & LOOKUP_RCU) {
 		/* make sure that d_is_symlink above matches inode */
-		if (read_seqcount_retry(&path.dentry->d_seq, seq))
+		if (read_seqcount_retry(&path.dentry->d_seq, nd->next_seq))
 			return ERR_PTR(-ECHILD);
 	} else {
 		if (path.mnt == nd->path.mnt)
 			mntget(path.mnt);
 	}
-	return pick_link(nd, &path, inode, seq, flags);
+	return pick_link(nd, &path, inode, flags);
 }
 
-static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
-					struct inode **inodep,
-					unsigned *seqp)
+static struct dentry *follow_dotdot_rcu(struct nameidata *nd)
 {
 	struct dentry *parent, *old;
 
@@ -1895,8 +1879,7 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 	}
 	old = nd->path.dentry;
 	parent = old->d_parent;
-	*inodep = parent->d_inode;
-	*seqp = read_seqcount_begin(&parent->d_seq);
+	nd->next_seq = read_seqcount_begin(&parent->d_seq);
 	if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
 		return ERR_PTR(-ECHILD);
 	if (unlikely(!path_connected(nd->path.mnt, parent)))
@@ -1907,12 +1890,11 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 		return ERR_PTR(-ECHILD);
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-ECHILD);
-	return NULL;
+	nd->next_seq = nd->seq;
+	return nd->path.dentry;
 }
 
-static struct dentry *follow_dotdot(struct nameidata *nd,
-				 struct inode **inodep,
-				 unsigned *seqp)
+static struct dentry *follow_dotdot(struct nameidata *nd)
 {
 	struct dentry *parent;
 
@@ -1936,15 +1918,12 @@ static struct dentry *follow_dotdot(struct nameidata *nd,
 		dput(parent);
 		return ERR_PTR(-ENOENT);
 	}
-	*seqp = 0;
-	*inodep = parent->d_inode;
 	return parent;
 
 in_root:
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-EXDEV);
-	dget(nd->path.dentry);
-	return NULL;
+	return dget(nd->path.dentry);
 }
 
 static const char *handle_dots(struct nameidata *nd, int type)
@@ -1952,8 +1931,6 @@ static const char *handle_dots(struct nameidata *nd, int type)
 	if (type == LAST_DOTDOT) {
 		const char *error = NULL;
 		struct dentry *parent;
-		struct inode *inode;
-		unsigned seq;
 
 		if (!nd->root.mnt) {
 			error = ERR_PTR(set_root(nd));
@@ -1961,17 +1938,12 @@ static const char *handle_dots(struct nameidata *nd, int type)
 				return error;
 		}
 		if (nd->flags & LOOKUP_RCU)
-			parent = follow_dotdot_rcu(nd, &inode, &seq);
+			parent = follow_dotdot_rcu(nd);
 		else
-			parent = follow_dotdot(nd, &inode, &seq);
+			parent = follow_dotdot(nd);
 		if (IS_ERR(parent))
 			return ERR_CAST(parent);
-		if (unlikely(!parent))
-			error = step_into(nd, WALK_NOFOLLOW,
-					 nd->path.dentry, nd->inode, nd->seq);
-		else
-			error = step_into(nd, WALK_NOFOLLOW,
-					 parent, inode, seq);
+		error = step_into(nd, WALK_NOFOLLOW, parent);
 		if (unlikely(error))
 			return error;
 
@@ -1995,8 +1967,6 @@ static const char *handle_dots(struct nameidata *nd, int type)
 static const char *walk_component(struct nameidata *nd, int flags)
 {
 	struct dentry *dentry;
-	struct inode *inode;
-	unsigned seq;
 	/*
 	 * "." and ".." are special - ".." especially so because it has
 	 * to be able to know about the current root directory and
@@ -2007,7 +1977,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 			put_link(nd);
 		return handle_dots(nd, nd->last_type);
 	}
-	dentry = lookup_fast(nd, &inode, &seq);
+	dentry = lookup_fast(nd);
 	if (IS_ERR(dentry))
 		return ERR_CAST(dentry);
 	if (unlikely(!dentry)) {
@@ -2017,7 +1987,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 	}
 	if (!(flags & WALK_MORE) && nd->depth)
 		put_link(nd);
-	return step_into(nd, flags, dentry, inode, seq);
+	return step_into(nd, flags, dentry);
 }
 
 /*
@@ -2372,6 +2342,8 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 		flags &= ~LOOKUP_RCU;
 	if (flags & LOOKUP_RCU)
 		rcu_read_lock();
+	else
+		nd->seq = nd->next_seq = 0;
 
 	nd->flags = flags;
 	nd->state |= ND_JUMPED;
@@ -2473,8 +2445,8 @@ static int handle_lookup_down(struct nameidata *nd)
 {
 	if (!(nd->flags & LOOKUP_RCU))
 		dget(nd->path.dentry);
-	return PTR_ERR(step_into(nd, WALK_NOFOLLOW,
-			nd->path.dentry, nd->inode, nd->seq));
+	nd->next_seq = nd->seq;
+	return PTR_ERR(step_into(nd, WALK_NOFOLLOW, nd->path.dentry));
 }
 
 /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
@@ -3393,8 +3365,6 @@ static const char *open_last_lookups(struct nameidata *nd,
 	struct dentry *dir = nd->path.dentry;
 	int open_flag = op->open_flag;
 	bool got_write = false;
-	unsigned seq;
-	struct inode *inode;
 	struct dentry *dentry;
 	const char *res;
 
@@ -3410,7 +3380,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 		if (nd->last.name[nd->last.len])
 			nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 		/* we _can_ be in RCU mode here */
-		dentry = lookup_fast(nd, &inode, &seq);
+		dentry = lookup_fast(nd);
 		if (IS_ERR(dentry))
 			return ERR_CAST(dentry);
 		if (likely(dentry))
@@ -3464,7 +3434,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 finish_lookup:
 	if (nd->depth)
 		put_link(nd);
-	res = step_into(nd, WALK_TRAILING, dentry, inode, seq);
+	res = step_into(nd, WALK_TRAILING, dentry);
 	if (unlikely(res))
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
 	return res;

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 15:49           ` Alexander Potapenko
@ 2022-07-04 16:03             ` Greg Kroah-Hartman
  2022-07-04 16:33               ` Alexander Potapenko
  2022-07-04 18:23             ` Segher Boessenkool
  1 sibling, 1 reply; 33+ messages in thread
From: Greg Kroah-Hartman @ 2022-07-04 16:03 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Al Viro, Linus Torvalds, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Herbert Xu,
	Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim,
	Kees Cook, Marco Elver, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 04, 2022 at 05:49:13PM +0200, Alexander Potapenko wrote:
> This e-mail is confidential. If you received this communication by
> mistake, please don't forward it to anyone else, please erase all
> copies and attachments, and please let me know that it has gone to the
> wrong person.

This is not compatible with Linux kernel development, sorry.

Now deleted.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 16:03             ` Greg Kroah-Hartman
@ 2022-07-04 16:33               ` Alexander Potapenko
  0 siblings, 0 replies; 33+ messages in thread
From: Alexander Potapenko @ 2022-07-04 16:33 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Al Viro, Linus Torvalds, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Herbert Xu,
	Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim,
	Kees Cook, Marco Elver, Mark Rutland, Matthew Wilcox,
	Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek,
	Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum,
	Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 4, 2022 at 6:03 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Mon, Jul 04, 2022 at 05:49:13PM +0200, Alexander Potapenko wrote:
> > This e-mail is confidential. If you received this communication by
> > mistake, please don't forward it to anyone else, please erase all
> > copies and attachments, and please let me know that it has gone to the
> > wrong person.
>
> This is not compatible with Linux kernel development, sorry.
>
> Now deleted.

Sorry, I shouldn't have added those to public emails.
Apologies for the inconvenience.

-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 16:00           ` Al Viro
@ 2022-07-04 16:47             ` Alexander Potapenko
  0 siblings, 0 replies; 33+ messages in thread
From: Alexander Potapenko @ 2022-07-04 16:47 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 4, 2022 at 6:00 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Mon, Jul 04, 2022 at 02:44:00PM +0100, Al Viro wrote:
> > On Mon, Jul 04, 2022 at 10:20:53AM +0200, Alexander Potapenko wrote:
> >
> > > What makes you think they are false positives? Is the scenario I
> > > described above:
> > >
> > > """
> > > In particular, if the call to lookup_fast() in walk_component()
> > > returns NULL, and lookup_slow() returns a valid dentry, then the
> > > `seq` and `inode` will remain uninitialized until the call to
> > > step_into()
> > > """
> > >
> > > impossible?
> >
> > Suppose step_into() has been called in non-RCU mode.  The first
> > thing it does is
> >       int err = handle_mounts(nd, dentry, &path, &seq);
> >       if (err < 0)
> >               return ERR_PTR(err);
> >
> > And handle_mounts() in non-RCU mode is
> >       path->mnt = nd->path.mnt;
> >       path->dentry = dentry;
> >       if (nd->flags & LOOKUP_RCU) {
> >               [unreachable code]
> >       }
> >       [code not touching seqp]
> >       if (unlikely(ret)) {
> >               [code not touching seqp]
> >       } else {
> >               *seqp = 0; /* out of RCU mode, so the value doesn't matter */
> >       }
> >       return ret;
> >
> > In other words, the value seq argument of step_into() used to have ends up
> > being never fetched and, in case step_into() gets past that if (err < 0)
> > that value is replaced with zero before any further accesses.
> >
> > So it's a false positive; yes, strictly speaking compiler is allowd
> > to do anything whatsoever if it manages to prove that the value is
> > uninitialized.  Realistically, though, especially since unsigned int
> > is not allowed any trapping representations...
>
> FWIW, update (and yet untested) branch is in #work.namei.  Compared to the
> previous, we store sampled ->d_seq of the next dentry in nd->next_seq,
> rather than bothering with local variables.  AFAICS, it ends up with
> better code that way.  And both ->seq and ->next_seq are zeroed at the
> moments when we switch to non-RCU mode (as well as non-RCU path_init()).
>
> IMO it looks saner that way.  NOTE: it still needs to be tested and probably
> reordered and massaged; it's not for merge at the moment.  Current cumulative
> diff follows:

I confirm all KMSAN reports are gone as a result of applying this patch.


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04  2:52     ` Al Viro
  2022-07-04  8:20       ` Alexander Potapenko
@ 2022-07-04 17:36       ` Linus Torvalds
  2022-07-04 19:02         ` Al Viro
  1 sibling, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2022-07-04 17:36 UTC (permalink / raw)
  To: Al Viro
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Sun, Jul 3, 2022 at 7:53 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> FWIW, trying to write a coherent documentation had its usual effect...
> The thing is, we don't really need to fetch the inode that early.

Hmm. I like the patch, but as I was reading through it, I had a question...

In particular, I'd like it even more if each step when the sequence
number is updated also had a comment about what then protects the
previous sequence number up to and over that new sequence point.

For example, in __follow_mount_rcu(), when we jump to a new mount
point, and that sequence has

                *seqp = read_seqcount_begin(&dentry->d_seq);

to reset the sequence number to the new path we jumped into.

But I don't actually see what checks the previous sequence number in
that path. We just reset it to the new one.

In contrast, in lookup_fast(), we get the new sequence number from
__d_lookup_rcu(), and then after getting the new one and before
"instantiating" it, we will revalidate the parent sequence number.

So lookup_fast() has that "chain of sequence numbers".

For __follow_mount_rcu it looks like validating the previous sequence
number is left to the caller, which then does try_to_unlazy_next().

So when reading this code, my reaction was that it really would have
been much nicer to have that kind of clear "handoff" of one sequence
number domain to the next that lookup_fast() has.

IOW, I think it would be lovely to clarify the sequence number handoff.

I only quickly scanned your second patch for this, it does seem to at
least collect it all into try_to_unlazy_next().

So maybe you already looked at exactly this, but it would be good to
be quite explicit about the sequence number logic because it's "a bit
opaque" to say the least.

                   Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 15:49           ` Alexander Potapenko
  2022-07-04 16:03             ` Greg Kroah-Hartman
@ 2022-07-04 18:23             ` Segher Boessenkool
  1 sibling, 0 replies; 33+ messages in thread
From: Segher Boessenkool @ 2022-07-04 18:23 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Al Viro, Linus Torvalds, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Vitaly Buka, linux-toolchains

On Mon, Jul 04, 2022 at 05:49:13PM +0200, Alexander Potapenko wrote:
> One of the reasons to do so is standard compliance - passing an
> uninitialized value to a function is UB in C11, as Segher pointed out
> here: https://lore.kernel.org/linux-toolchains/20220614214039.GA25951@gate.crashing.org/
> The compilers may not be smart enough to take advantage of this _yet_,
> but I wouldn't underestimate their ability to evolve (especially that
> of Clang).

GCC doesn't currently detect this UB, and doesn't even warn or error for
this, although that shouldn't be hard to do: it is all completely local.
An error is warranted here, and you won't get UB ever either then.

> I also believe it's fragile to rely on the callee to ignore certain
> parameters: it may be doing so today, but if someone changes
> step_into() tomorrow we may miss it.

There isn't any choice usually, this is C, do you want varargs?  :-)

But yes, you always should only pass "safe" values; callers should do
their part, and not assume the callee will do in the future as it does
now.  Defensive programming is mostly about defending your own sanity!


Segher

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 17:36       ` Linus Torvalds
@ 2022-07-04 19:02         ` Al Viro
  2022-07-04 19:16           ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Al Viro @ 2022-07-04 19:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 04, 2022 at 10:36:05AM -0700, Linus Torvalds wrote:

> For example, in __follow_mount_rcu(), when we jump to a new mount
> point, and that sequence has
> 
>                 *seqp = read_seqcount_begin(&dentry->d_seq);
> 
> to reset the sequence number to the new path we jumped into.
> 
> But I don't actually see what checks the previous sequence number in
> that path. We just reset it to the new one.

Theoretically it could be a problem.  We have /mnt/foo/bar and
/mnt/baz/bar.  Something's mounted on /mnt/foo, hiding /mnt/foo/bar.
We start a pathwalk for /mnt/baz/bar,
someone umounts /mnt/foo and swaps /mnt/foo to /mnt/baz before
we get there.  We are doomed to get -ECHILD from an attempt to
legitimize in the end, no matter what.  However, we might get
a hard error (-ENOENT, for example) before that, if we pick up
the old mount that used to be on top of /mnt/foo (now /mnt/baz)
and had been detached before the damn thing had become /mnt/baz
and notice that there's no "bar" in its root.

It used to be impossible (rename would've failed if the target had
been non-empty and had we managed to empty it first, well, there's
your point when -ENOENT would've been accurate).  With exchange...
Yes, it's a possible race.

Might need to add
                                if (read_seqretry(&mount_lock, nd->m_seq))
					return false;
in there.  And yes, it's a nice demonstration of how subtle and
brittle RCU pathwalk is - nobody noticed this bit of fun back when
RENAME_EXCHANGE had been added...  It got a lot more readable these
days, but...

> For __follow_mount_rcu it looks like validating the previous sequence
> number is left to the caller, which then does try_to_unlazy_next().

Not really - the caller goes there only if we have __follow_mount_rcu()
say "it's too tricky for me, get out of RCU mode and deal with it
there".

Anyway, I've thrown a mount_lock check in there, running xfstests to
see how it goes...

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 19:02         ` Al Viro
@ 2022-07-04 19:16           ` Linus Torvalds
  2022-07-04 19:55             ` Al Viro
  0 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2022-07-04 19:16 UTC (permalink / raw)
  To: Al Viro
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 4, 2022 at 12:03 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Anyway, I've thrown a mount_lock check in there, running xfstests to
> see how it goes...

So my reaction had been that it would be good to just do something like this:

  diff --git a/fs/namei.c b/fs/namei.c
  index 1f28d3f463c3..25c4bcc91142 100644
  --- a/fs/namei.c
  +++ b/fs/namei.c
  @@ -1493,11 +1493,18 @@ static bool __follow_mount_rcu(struct n...
      if (flags & DCACHE_MOUNTED) {
          struct mount *mounted = __lookup_mnt(path->mnt, dentry);
          if (mounted) {
  +           struct dentry *old_dentry = dentry;
  +           unsigned old_seq = *seqp;
  +
              path->mnt = &mounted->mnt;
              dentry = path->dentry = mounted->mnt.mnt_root;
              nd->state |= ND_JUMPED;
              *seqp = read_seqcount_begin(&dentry->d_seq);
              *inode = dentry->d_inode;
  +
  +           if (read_seqcount_retry(&old_dentry->d_seq, old_seq))
  +               return false;
  +
              /*
               * We don't need to re-check ->d_seq after this
               * ->d_inode read - there will be an RCU delay

but the above is just whitespace-damaged random monkey-scribbling by
yours truly.

More like a "shouldn't we do something like this" than a serious
patch, in other words.

IOW, it has *NOT* had a lot of real thought behind it. Purely a
"shouldn't we always clearly check the old sequence number after we've
picked up the new one?"

                   Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 19:16           ` Linus Torvalds
@ 2022-07-04 19:55             ` Al Viro
  2022-07-04 20:24               ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Al Viro @ 2022-07-04 19:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 04, 2022 at 12:16:24PM -0700, Linus Torvalds wrote:
> On Mon, Jul 4, 2022 at 12:03 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Anyway, I've thrown a mount_lock check in there, running xfstests to
> > see how it goes...
> 
> So my reaction had been that it would be good to just do something like this:
> 
>   diff --git a/fs/namei.c b/fs/namei.c
>   index 1f28d3f463c3..25c4bcc91142 100644
>   --- a/fs/namei.c
>   +++ b/fs/namei.c
>   @@ -1493,11 +1493,18 @@ static bool __follow_mount_rcu(struct n...
>       if (flags & DCACHE_MOUNTED) {
>           struct mount *mounted = __lookup_mnt(path->mnt, dentry);
>           if (mounted) {
>   +           struct dentry *old_dentry = dentry;
>   +           unsigned old_seq = *seqp;
>   +
>               path->mnt = &mounted->mnt;
>               dentry = path->dentry = mounted->mnt.mnt_root;
>               nd->state |= ND_JUMPED;
>               *seqp = read_seqcount_begin(&dentry->d_seq);
>               *inode = dentry->d_inode;
>   +
>   +           if (read_seqcount_retry(&old_dentry->d_seq, old_seq))
>   +               return false;
>   +
>               /*
>                * We don't need to re-check ->d_seq after this
>                * ->d_inode read - there will be an RCU delay
> 
> but the above is just whitespace-damaged random monkey-scribbling by
> yours truly.
> 
> More like a "shouldn't we do something like this" than a serious
> patch, in other words.
> 
> IOW, it has *NOT* had a lot of real thought behind it. Purely a
> "shouldn't we always clearly check the old sequence number after we've
> picked up the new one?"

You are checking the wrong thing here.  It's really about mount_lock -
->d_seq is *not* bumped when we or attach in some namespace.  If there's
a mismatch, RCU pathwalk is doomed anyway (it'll fail any form of unlazy)
and we might as well bugger off.  If it *does* match, we know that both
mountpoint and root had been pinned since before the pathwalk, remain
pinned as of that check and had a mount connecting them all along.
IOW, if we could have arrived to this dentry at any point, we would have
gotten that dentry as the next step.

We sample into nd->m_seq in path_init() and we want it to stay unchanged
all along.  If it does, all mountpoints and roots we observe are pinned
and their association with each other is stable.

It's not dentry -> dentry, it's dentry -> mount -> dentry.  The following
would've been safe:

	find mountpoint
	sample ->d_seq
	verify whatever had lead us to mountpoint

	sample mount_lock
	find mount
	verify mountpoint's ->d_seq

	find root of mounted
	sample its ->d_seq
	verify mount_lock

Correct?  Now, note that the last step done against the value we'd sampled
in path_init() guarantees that mount hash had not changed through all of
that.  Which is to say, we can pretend that we'd found mount before ->d_seq
of mountpoint might've changed, leaving us with

	find mountpoint
	sample ->d_seq
	verify whatever had lead us to mountpoint
	find mount
	find root of mounted
	sample its ->d_seq
	verify mount_lock

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 19:55             ` Al Viro
@ 2022-07-04 20:24               ` Linus Torvalds
  2022-07-04 20:46                 ` Al Viro
  2022-07-04 20:47                 ` [PATCH v4 43/45] namei: initialize parameters passed to step_into() Linus Torvalds
  0 siblings, 2 replies; 33+ messages in thread
From: Linus Torvalds @ 2022-07-04 20:24 UTC (permalink / raw)
  To: Al Viro
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 4, 2022 at 12:55 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> You are checking the wrong thing here.  It's really about mount_lock -
> ->d_seq is *not* bumped when we or attach in some namespace.

I think we're talking past each other.

Yes, we need to check the mount sequence lock too, because we're doing
that mount traversal.

But I think we *also* need to check the dentry sequence count, because
the dentry itself could have been moved to another parent.

The two are entirely independent, aren't they?

And the dentry sequence point check should go along with the "we're
now updating the sequence point from the old dentry to the new".

The mount point check should go around the "check dentry mount point",
but it's a separate issue from the whole "we are now jumping to a
different dentry, we should check that the previous dentry hasn't
changed".

                    Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 20:24               ` Linus Torvalds
@ 2022-07-04 20:46                 ` Al Viro
  2022-07-04 20:51                   ` Linus Torvalds
  2022-07-04 20:47                 ` [PATCH v4 43/45] namei: initialize parameters passed to step_into() Linus Torvalds
  1 sibling, 1 reply; 33+ messages in thread
From: Al Viro @ 2022-07-04 20:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 04, 2022 at 01:24:48PM -0700, Linus Torvalds wrote:
> On Mon, Jul 4, 2022 at 12:55 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > You are checking the wrong thing here.  It's really about mount_lock -
> > ->d_seq is *not* bumped when we or attach in some namespace.
> 
> I think we're talking past each other.

We might be.
 
> Yes, we need to check the mount sequence lock too, because we're doing
> that mount traversal.
> 
> But I think we *also* need to check the dentry sequence count, because
> the dentry itself could have been moved to another parent.

Why is that a problem?  It could have been moved to another parent,
but so it could after we'd crossed to the mounted and we wouldn't have
noticed (or cared).

What the chain of seqcount checks gives us is that with some timings
it would be possible to traverse that path, not that it had remained
valid through the entire pathwalk.

What I'm suggesting is to treat transition from mountpoint to mount
as happening instantly, with transition from mount to root sealed by
mount_lock check.

If that succeeds, there had been possible history in which refwalk
would have passed through the same dentry/mount/dentry and arrived
to the root dentry when it had the sampled ->d_seq value.

Sure, mountpoint might be moved since we'd reached it.  And the mount
would move with it, so we can pretend that we'd won the race and got
into the mount before it had the mountpoint had been moved.

Am I missing something fundamental about the things the sequence of
sampling and verifications gives us?  I'd always thought it's about
verifying that resulting history would be possible for a non-RCU
pathwalk with the right timings.  What am I missing?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 20:24               ` Linus Torvalds
  2022-07-04 20:46                 ` Al Viro
@ 2022-07-04 20:47                 ` Linus Torvalds
  1 sibling, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2022-07-04 20:47 UTC (permalink / raw)
  To: Al Viro
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 4, 2022 at 1:24 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The mount point check should go around the "check dentry mount point",
> but it's a separate issue from the whole "we are now jumping to a
> different dentry, we should check that the previous dentry hasn't
> changed".

Maybe it doesn't really matter, because we never actually end up
dereferencing the previous dentry (exactly since we're following the
mount point on it).

It feels like the sequence point checks are basically tied to the
"we're looking at the inode that the dentry pointed to", and because
the mount-point traversal doesn't need to look at the inode, the
sequence point check also isn't done.

But it feels wrong to traverse a dentry under RCU - even if we don't
then look at the inode itself - without having verified that the
dentry is still valid.

Yes, the d_seq lock protects against the inode going away (aka
"unlink()") and that cannot happen when it's a mount-point.

But it _also_ ends up changing for __d_move() when the name of the
dentry changes.

And I think that name change is relevant even to "look up a mount
point", exactly because we used that name to look up the dentry in the
first place, so if the name is changing, we shouldn't traverse that
mount point.

But I may have just confused myself terminally here.

             Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 20:46                 ` Al Viro
@ 2022-07-04 20:51                   ` Linus Torvalds
  2022-07-04 21:04                     ` Al Viro
  0 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2022-07-04 20:51 UTC (permalink / raw)
  To: Al Viro
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 4, 2022 at 1:46 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Why is that a problem?  It could have been moved to another parent,
> but so it could after we'd crossed to the mounted and we wouldn't have
> noticed (or cared).

Yeah, see my other email.

I agree that it might be a "we don't actually care" situation, where
all we care about that the name was valid at one point (when we picked
up that sequence point). So maybe we don't care about closing it.

But even if so, I think it might warrant a comment, because I still
feel like we're basically "throwing away" our previous sequence point
information without ever checking it.

Maybe all we ever care about is basically "this sequence point
protects the dentry inode pointer for the next lookup", and when it
comes to mount points that ends up being immaterial.

             Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-04 20:51                   ` Linus Torvalds
@ 2022-07-04 21:04                     ` Al Viro
  2022-07-04 23:13                       ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
  0 siblings, 1 reply; 33+ messages in thread
From: Al Viro @ 2022-07-04 21:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 04, 2022 at 01:51:16PM -0700, Linus Torvalds wrote:
> On Mon, Jul 4, 2022 at 1:46 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Why is that a problem?  It could have been moved to another parent,
> > but so it could after we'd crossed to the mounted and we wouldn't have
> > noticed (or cared).
> 
> Yeah, see my other email.
> 
> I agree that it might be a "we don't actually care" situation, where
> all we care about that the name was valid at one point (when we picked
> up that sequence point). So maybe we don't care about closing it.
> 
> But even if so, I think it might warrant a comment, because I still
> feel like we're basically "throwing away" our previous sequence point
> information without ever checking it.
> 
> Maybe all we ever care about is basically "this sequence point
> protects the dentry inode pointer for the next lookup", and when it
> comes to mount points that ends up being immaterial.

	There is a problem, actually, but it's in a different place...
OK, let me try to write something resembling a formal proof and see
what falls out.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged
  2022-07-04 21:04                     ` Al Viro
@ 2022-07-04 23:13                       ` Al Viro
  2022-07-04 23:14                         ` [PATCH 2/7] follow_dotdot{,_rcu}(): change calling conventions Al Viro
                                           ` (6 more replies)
  0 siblings, 7 replies; 33+ messages in thread
From: Al Viro @ 2022-07-04 23:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

Validate mount_lock seqcount as soon as we cross into mount in RCU
mode.  Sure, ->mnt_root is pinned and will remain so until we
do rcu_read_unlock() anyway, and we will eventually fail to unlazy if
the mount_lock had been touched, but we might run into a hard error
(e.g. -ENOENT) before trying to unlazy.  And it's possible to end
up with RCU pathwalk racing with rename() and umount() in a way
that would fail with -ENOENT while non-RCU pathwalk would've
succeeded with any timings.

Once upon a time we hadn't needed that, but analysis had been subtle,
brittle and went out of window as soon as RENAME_EXCHANGE had been
added.

It's narrow, hard to hit and won't get you anything other than
stray -ENOENT that could be arranged in much easier way with the
same priveleges, but it's a bug all the same.

Cc: stable@kernel.org
X-sky-is-falling: unlikely
Fixes: da1ce0670c14 "vfs: add cross-rename"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/namei.c b/fs/namei.c
index 1f28d3f463c3..4dbf55b37ec6 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1505,6 +1505,8 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 				 * becoming unpinned.
 				 */
 				flags = dentry->d_flags;
+				if (read_seqretry(&mount_lock, nd->m_seq))
+					return false;
 				continue;
 			}
 			if (read_seqretry(&mount_lock, nd->m_seq))
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 2/7] follow_dotdot{,_rcu}(): change calling conventions
  2022-07-04 23:13                       ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
@ 2022-07-04 23:14                         ` Al Viro
  2022-07-04 23:14                         ` [PATCH 3/7] namei: stash the sampled ->d_seq into nameidata Al Viro
                                           ` (5 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Al Viro @ 2022-07-04 23:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

Instead of returning NULL when we are in root, just make it return
the current position (and set *seqp and *inodep accordingly).
That collapses the calls of step_into() in handle_dots()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 4dbf55b37ec6..ecdb9ac21ece 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1909,7 +1909,9 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 		return ERR_PTR(-ECHILD);
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-ECHILD);
-	return NULL;
+	*seqp = nd->seq;
+	*inodep = nd->path.dentry->d_inode;
+	return nd->path.dentry;
 }
 
 static struct dentry *follow_dotdot(struct nameidata *nd,
@@ -1945,8 +1947,9 @@ static struct dentry *follow_dotdot(struct nameidata *nd,
 in_root:
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-EXDEV);
-	dget(nd->path.dentry);
-	return NULL;
+	*seqp = 0;
+	*inodep = nd->path.dentry->d_inode;
+	return dget(nd->path.dentry);
 }
 
 static const char *handle_dots(struct nameidata *nd, int type)
@@ -1968,12 +1971,7 @@ static const char *handle_dots(struct nameidata *nd, int type)
 			parent = follow_dotdot(nd, &inode, &seq);
 		if (IS_ERR(parent))
 			return ERR_CAST(parent);
-		if (unlikely(!parent))
-			error = step_into(nd, WALK_NOFOLLOW,
-					 nd->path.dentry, nd->inode, nd->seq);
-		else
-			error = step_into(nd, WALK_NOFOLLOW,
-					 parent, inode, seq);
+		error = step_into(nd, WALK_NOFOLLOW, parent, inode, seq);
 		if (unlikely(error))
 			return error;
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 3/7] namei: stash the sampled ->d_seq into nameidata
  2022-07-04 23:13                       ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
  2022-07-04 23:14                         ` [PATCH 2/7] follow_dotdot{,_rcu}(): change calling conventions Al Viro
@ 2022-07-04 23:14                         ` Al Viro
  2022-07-04 23:15                         ` [PATCH 4/7] step_into(): lose inode argument Al Viro
                                           ` (4 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Al Viro @ 2022-07-04 23:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

New field: nd->next_seq.  Set to 0 outside of RCU mode, holds the sampled
value for the next dentry to be considered.  Used instead of an arseload
of local variables, arguments, etc.

step_into() has lost seq argument; nd->next_seq is used, so dentry passed
to it must be the one ->next_seq is about.

There are two requirements for RCU pathwalk:
	1) it should not give a hard failure (other than -ECHILD) unless
non-RCU pathwalk might fail that way given suitable timings.
	2) it should not succeed unless non-RCU pathwalk might succeed
with the same end location given suitable timings.

The use of seq numbers is the way we achieve that.  Invariant we want
to maintain is:
	if RCU pathwalk can reach the state with given nd->path, nd->inode
and nd->seq after having traversed some part of pathname, it must be possible
for non-RCU pathwalk to reach the same nd->path and nd->inode after having
traversed the same part of pathname, and observe the nd->path.dentry->d_seq
equal to what RCU pathwalk has in nd->seq

	For transition from parent to child, we sample child's ->d_seq
and verify that parent's ->d_seq remains unchanged.
	For transitions from child to parent we sample parent's ->d_seq
and verify that child's ->d_seq has not changed.
	For transition from mountpoint to root of mounted we sample
the ->d_seq of root and verify that nobody has touched mount_lock since
the beginning of pathwalk.  That guarantees that mount we'd found had
been there all along, with these mountpoint and root of mounted.
It would be possible for a non-RCU pathwalk to reach the previous state,
find the same mount and observe its root at the moment we'd sampled
->d_seq of that
	For transitions from root of mounted to mountpoint we sample
->d_seq of mountpoint and verify that mount_lock had not been touched
since the beginning of pathwalk.  The same reasoning as in the
previous case applies.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 104 ++++++++++++++++++++++++++---------------------------
 1 file changed, 52 insertions(+), 52 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index ecdb9ac21ece..c7c9e88add85 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -567,7 +567,7 @@ struct nameidata {
 	struct path	root;
 	struct inode	*inode; /* path.dentry.d_inode */
 	unsigned int	flags, state;
-	unsigned	seq, m_seq, r_seq;
+	unsigned	seq, next_seq, m_seq, r_seq;
 	int		last_type;
 	unsigned	depth;
 	int		total_link_count;
@@ -772,6 +772,7 @@ static bool try_to_unlazy(struct nameidata *nd)
 		goto out;
 	if (unlikely(!legitimize_root(nd)))
 		goto out;
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	BUG_ON(nd->inode != parent->d_inode);
 	return true;
@@ -780,6 +781,7 @@ static bool try_to_unlazy(struct nameidata *nd)
 	nd->path.mnt = NULL;
 	nd->path.dentry = NULL;
 out:
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	return false;
 }
@@ -788,7 +790,6 @@ static bool try_to_unlazy(struct nameidata *nd)
  * try_to_unlazy_next - try to switch to ref-walk mode.
  * @nd: nameidata pathwalk data
  * @dentry: next dentry to step into
- * @seq: seq number to check @dentry against
  * Returns: true on success, false on failure
  *
  * Similar to try_to_unlazy(), but here we have the next dentry already
@@ -797,7 +798,7 @@ static bool try_to_unlazy(struct nameidata *nd)
  * Nothing should touch nameidata between try_to_unlazy_next() failure and
  * terminate_walk().
  */
-static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsigned seq)
+static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry)
 {
 	BUG_ON(!(nd->flags & LOOKUP_RCU));
 
@@ -818,7 +819,7 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi
 	 */
 	if (unlikely(!lockref_get_not_dead(&dentry->d_lockref)))
 		goto out;
-	if (unlikely(read_seqcount_retry(&dentry->d_seq, seq)))
+	if (unlikely(read_seqcount_retry(&dentry->d_seq, nd->next_seq)))
 		goto out_dput;
 	/*
 	 * Sequence counts matched. Now make sure that the root is
@@ -826,6 +827,7 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi
 	 */
 	if (unlikely(!legitimize_root(nd)))
 		goto out_dput;
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	return true;
 
@@ -834,9 +836,11 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi
 out1:
 	nd->path.dentry = NULL;
 out:
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	return false;
 out_dput:
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	dput(dentry);
 	return false;
@@ -1467,7 +1471,7 @@ EXPORT_SYMBOL(follow_down);
  * we meet a managed dentry that would need blocking.
  */
 static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
-			       struct inode **inode, unsigned *seqp)
+			       struct inode **inode)
 {
 	struct dentry *dentry = path->dentry;
 	unsigned int flags = dentry->d_flags;
@@ -1496,7 +1500,7 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 				path->mnt = &mounted->mnt;
 				dentry = path->dentry = mounted->mnt.mnt_root;
 				nd->state |= ND_JUMPED;
-				*seqp = read_seqcount_begin(&dentry->d_seq);
+				nd->next_seq = read_seqcount_begin(&dentry->d_seq);
 				*inode = dentry->d_inode;
 				/*
 				 * We don't need to re-check ->d_seq after this
@@ -1505,6 +1509,8 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 				 * becoming unpinned.
 				 */
 				flags = dentry->d_flags;
+				// makes sure that non-RCU pathwalk could reach
+				// this state.
 				if (read_seqretry(&mount_lock, nd->m_seq))
 					return false;
 				continue;
@@ -1517,8 +1523,7 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 }
 
 static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
-			  struct path *path, struct inode **inode,
-			  unsigned int *seqp)
+			  struct path *path, struct inode **inode)
 {
 	bool jumped;
 	int ret;
@@ -1526,16 +1531,15 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 	path->mnt = nd->path.mnt;
 	path->dentry = dentry;
 	if (nd->flags & LOOKUP_RCU) {
-		unsigned int seq = *seqp;
-		if (unlikely(!*inode))
-			return -ENOENT;
-		if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
+		unsigned int seq = nd->next_seq;
+		if (likely(__follow_mount_rcu(nd, path, inode)))
 			return 0;
-		if (!try_to_unlazy_next(nd, dentry, seq))
-			return -ECHILD;
-		// *path might've been clobbered by __follow_mount_rcu()
+		// *path and nd->next_seq might've been clobbered
 		path->mnt = nd->path.mnt;
 		path->dentry = dentry;
+		nd->next_seq = seq;
+		if (!try_to_unlazy_next(nd, dentry))
+			return -ECHILD;
 	}
 	ret = traverse_mounts(path, &jumped, &nd->total_link_count, nd->flags);
 	if (jumped) {
@@ -1550,7 +1554,6 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 			mntput(path->mnt);
 	} else {
 		*inode = d_backing_inode(path->dentry);
-		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
 	}
 	return ret;
 }
@@ -1610,8 +1613,7 @@ static struct dentry *__lookup_hash(const struct qstr *name,
 }
 
 static struct dentry *lookup_fast(struct nameidata *nd,
-				  struct inode **inode,
-			          unsigned *seqp)
+				  struct inode **inode)
 {
 	struct dentry *dentry, *parent = nd->path.dentry;
 	int status = 1;
@@ -1622,8 +1624,7 @@ static struct dentry *lookup_fast(struct nameidata *nd,
 	 * going to fall back to non-racy lookup.
 	 */
 	if (nd->flags & LOOKUP_RCU) {
-		unsigned seq;
-		dentry = __d_lookup_rcu(parent, &nd->last, &seq);
+		dentry = __d_lookup_rcu(parent, &nd->last, &nd->next_seq);
 		if (unlikely(!dentry)) {
 			if (!try_to_unlazy(nd))
 				return ERR_PTR(-ECHILD);
@@ -1635,7 +1636,7 @@ static struct dentry *lookup_fast(struct nameidata *nd,
 		 * the dentry name information from lookup.
 		 */
 		*inode = d_backing_inode(dentry);
-		if (unlikely(read_seqcount_retry(&dentry->d_seq, seq)))
+		if (unlikely(read_seqcount_retry(&dentry->d_seq, nd->next_seq)))
 			return ERR_PTR(-ECHILD);
 
 		/*
@@ -1648,11 +1649,10 @@ static struct dentry *lookup_fast(struct nameidata *nd,
 		if (unlikely(__read_seqcount_retry(&parent->d_seq, nd->seq)))
 			return ERR_PTR(-ECHILD);
 
-		*seqp = seq;
 		status = d_revalidate(dentry, nd->flags);
 		if (likely(status > 0))
 			return dentry;
-		if (!try_to_unlazy_next(nd, dentry, seq))
+		if (!try_to_unlazy_next(nd, dentry))
 			return ERR_PTR(-ECHILD);
 		if (status == -ECHILD)
 			/* we'd been told to redo it in non-rcu mode */
@@ -1733,7 +1733,7 @@ static inline int may_lookup(struct user_namespace *mnt_userns,
 	return inode_permission(mnt_userns, nd->inode, MAY_EXEC);
 }
 
-static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
+static int reserve_stack(struct nameidata *nd, struct path *link)
 {
 	if (unlikely(nd->total_link_count++ >= MAXSYMLINKS))
 		return -ELOOP;
@@ -1748,7 +1748,7 @@ static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
 	if (nd->flags & LOOKUP_RCU) {
 		// we need to grab link before we do unlazy.  And we can't skip
 		// unlazy even if we fail to grab the link - cleanup needs it
-		bool grabbed_link = legitimize_path(nd, link, seq);
+		bool grabbed_link = legitimize_path(nd, link, nd->next_seq);
 
 		if (!try_to_unlazy(nd) || !grabbed_link)
 			return -ECHILD;
@@ -1762,11 +1762,11 @@ static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
 enum {WALK_TRAILING = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
 
 static const char *pick_link(struct nameidata *nd, struct path *link,
-		     struct inode *inode, unsigned seq, int flags)
+		     struct inode *inode, int flags)
 {
 	struct saved *last;
 	const char *res;
-	int error = reserve_stack(nd, link, seq);
+	int error = reserve_stack(nd, link);
 
 	if (unlikely(error)) {
 		if (!(nd->flags & LOOKUP_RCU))
@@ -1776,7 +1776,7 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 	last = nd->stack + nd->depth++;
 	last->link = *link;
 	clear_delayed_call(&last->done);
-	last->seq = seq;
+	last->seq = nd->next_seq;
 
 	if (flags & WALK_TRAILING) {
 		error = may_follow_link(nd, inode);
@@ -1838,12 +1838,14 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
  * to do this check without having to look at inode->i_op,
  * so we keep a cache of "no, this doesn't need follow_link"
  * for the common case.
+ *
+ * NOTE: dentry must be what nd->next_seq had been sampled from.
  */
 static const char *step_into(struct nameidata *nd, int flags,
-		     struct dentry *dentry, struct inode *inode, unsigned seq)
+		     struct dentry *dentry, struct inode *inode)
 {
 	struct path path;
-	int err = handle_mounts(nd, dentry, &path, &inode, &seq);
+	int err = handle_mounts(nd, dentry, &path, &inode);
 
 	if (err < 0)
 		return ERR_PTR(err);
@@ -1858,23 +1860,22 @@ static const char *step_into(struct nameidata *nd, int flags,
 		}
 		nd->path = path;
 		nd->inode = inode;
-		nd->seq = seq;
+		nd->seq = nd->next_seq;
 		return NULL;
 	}
 	if (nd->flags & LOOKUP_RCU) {
 		/* make sure that d_is_symlink above matches inode */
-		if (read_seqcount_retry(&path.dentry->d_seq, seq))
+		if (read_seqcount_retry(&path.dentry->d_seq, nd->next_seq))
 			return ERR_PTR(-ECHILD);
 	} else {
 		if (path.mnt == nd->path.mnt)
 			mntget(path.mnt);
 	}
-	return pick_link(nd, &path, inode, seq, flags);
+	return pick_link(nd, &path, inode, flags);
 }
 
 static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
-					struct inode **inodep,
-					unsigned *seqp)
+					struct inode **inodep)
 {
 	struct dentry *parent, *old;
 
@@ -1891,6 +1892,7 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 		nd->path = path;
 		nd->inode = path.dentry->d_inode;
 		nd->seq = seq;
+		// makes sure that non-RCU pathwalk could reach this state
 		if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
 			return ERR_PTR(-ECHILD);
 		/* we know that mountpoint was pinned */
@@ -1898,7 +1900,8 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 	old = nd->path.dentry;
 	parent = old->d_parent;
 	*inodep = parent->d_inode;
-	*seqp = read_seqcount_begin(&parent->d_seq);
+	nd->next_seq = read_seqcount_begin(&parent->d_seq);
+	// makes sure that non-RCU pathwalk could reach this state
 	if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
 		return ERR_PTR(-ECHILD);
 	if (unlikely(!path_connected(nd->path.mnt, parent)))
@@ -1909,14 +1912,13 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 		return ERR_PTR(-ECHILD);
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-ECHILD);
-	*seqp = nd->seq;
+	nd->next_seq = nd->seq;
 	*inodep = nd->path.dentry->d_inode;
 	return nd->path.dentry;
 }
 
 static struct dentry *follow_dotdot(struct nameidata *nd,
-				 struct inode **inodep,
-				 unsigned *seqp)
+				 struct inode **inodep)
 {
 	struct dentry *parent;
 
@@ -1940,14 +1942,12 @@ static struct dentry *follow_dotdot(struct nameidata *nd,
 		dput(parent);
 		return ERR_PTR(-ENOENT);
 	}
-	*seqp = 0;
 	*inodep = parent->d_inode;
 	return parent;
 
 in_root:
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-EXDEV);
-	*seqp = 0;
 	*inodep = nd->path.dentry->d_inode;
 	return dget(nd->path.dentry);
 }
@@ -1958,7 +1958,6 @@ static const char *handle_dots(struct nameidata *nd, int type)
 		const char *error = NULL;
 		struct dentry *parent;
 		struct inode *inode;
-		unsigned seq;
 
 		if (!nd->root.mnt) {
 			error = ERR_PTR(set_root(nd));
@@ -1966,12 +1965,12 @@ static const char *handle_dots(struct nameidata *nd, int type)
 				return error;
 		}
 		if (nd->flags & LOOKUP_RCU)
-			parent = follow_dotdot_rcu(nd, &inode, &seq);
+			parent = follow_dotdot_rcu(nd, &inode);
 		else
-			parent = follow_dotdot(nd, &inode, &seq);
+			parent = follow_dotdot(nd, &inode);
 		if (IS_ERR(parent))
 			return ERR_CAST(parent);
-		error = step_into(nd, WALK_NOFOLLOW, parent, inode, seq);
+		error = step_into(nd, WALK_NOFOLLOW, parent, inode);
 		if (unlikely(error))
 			return error;
 
@@ -1996,7 +1995,6 @@ static const char *walk_component(struct nameidata *nd, int flags)
 {
 	struct dentry *dentry;
 	struct inode *inode;
-	unsigned seq;
 	/*
 	 * "." and ".." are special - ".." especially so because it has
 	 * to be able to know about the current root directory and
@@ -2007,7 +2005,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 			put_link(nd);
 		return handle_dots(nd, nd->last_type);
 	}
-	dentry = lookup_fast(nd, &inode, &seq);
+	dentry = lookup_fast(nd, &inode);
 	if (IS_ERR(dentry))
 		return ERR_CAST(dentry);
 	if (unlikely(!dentry)) {
@@ -2017,7 +2015,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 	}
 	if (!(flags & WALK_MORE) && nd->depth)
 		put_link(nd);
-	return step_into(nd, flags, dentry, inode, seq);
+	return step_into(nd, flags, dentry, inode);
 }
 
 /*
@@ -2372,6 +2370,8 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 		flags &= ~LOOKUP_RCU;
 	if (flags & LOOKUP_RCU)
 		rcu_read_lock();
+	else
+		nd->seq = nd->next_seq = 0;
 
 	nd->flags = flags;
 	nd->state |= ND_JUMPED;
@@ -2473,8 +2473,9 @@ static int handle_lookup_down(struct nameidata *nd)
 {
 	if (!(nd->flags & LOOKUP_RCU))
 		dget(nd->path.dentry);
+	nd->next_seq = nd->seq;
 	return PTR_ERR(step_into(nd, WALK_NOFOLLOW,
-			nd->path.dentry, nd->inode, nd->seq));
+			nd->path.dentry, nd->inode));
 }
 
 /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
@@ -3393,7 +3394,6 @@ static const char *open_last_lookups(struct nameidata *nd,
 	struct dentry *dir = nd->path.dentry;
 	int open_flag = op->open_flag;
 	bool got_write = false;
-	unsigned seq;
 	struct inode *inode;
 	struct dentry *dentry;
 	const char *res;
@@ -3410,7 +3410,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 		if (nd->last.name[nd->last.len])
 			nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 		/* we _can_ be in RCU mode here */
-		dentry = lookup_fast(nd, &inode, &seq);
+		dentry = lookup_fast(nd, &inode);
 		if (IS_ERR(dentry))
 			return ERR_CAST(dentry);
 		if (likely(dentry))
@@ -3464,7 +3464,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 finish_lookup:
 	if (nd->depth)
 		put_link(nd);
-	res = step_into(nd, WALK_TRAILING, dentry, inode, seq);
+	res = step_into(nd, WALK_TRAILING, dentry, inode);
 	if (unlikely(res))
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
 	return res;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 4/7] step_into(): lose inode argument
  2022-07-04 23:13                       ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
  2022-07-04 23:14                         ` [PATCH 2/7] follow_dotdot{,_rcu}(): change calling conventions Al Viro
  2022-07-04 23:14                         ` [PATCH 3/7] namei: stash the sampled ->d_seq into nameidata Al Viro
@ 2022-07-04 23:15                         ` Al Viro
  2022-07-04 23:15                         ` [PATCH 5/7] follow_dotdot{,_rcu}(): don't bother with inode Al Viro
                                           ` (3 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Al Viro @ 2022-07-04 23:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

make handle_mounts() always fetch it.  This is just the first step -
the callers of step_into() will stop trying to calculate the sucker,
etc.

The passed value should be equal to dentry->d_inode in all cases;
in RCU mode - fetched after we'd sampled ->d_seq.  Might as well
fetch it here.  We do need to validate ->d_seq, which duplicates
the check currently done in lookup_fast(); that duplication will
go away shortly.

After that change handle_mounts() always ignores the initial value of
*inode and always sets it on success.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c7c9e88add85..dddbebf92b48 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1532,6 +1532,11 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 	path->dentry = dentry;
 	if (nd->flags & LOOKUP_RCU) {
 		unsigned int seq = nd->next_seq;
+		*inode = dentry->d_inode;
+		if (read_seqcount_retry(&dentry->d_seq, seq))
+			return -ECHILD;
+		if (unlikely(!*inode))
+			return -ENOENT;
 		if (likely(__follow_mount_rcu(nd, path, inode)))
 			return 0;
 		// *path and nd->next_seq might've been clobbered
@@ -1842,9 +1847,10 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
  * NOTE: dentry must be what nd->next_seq had been sampled from.
  */
 static const char *step_into(struct nameidata *nd, int flags,
-		     struct dentry *dentry, struct inode *inode)
+		     struct dentry *dentry)
 {
 	struct path path;
+	struct inode *inode;
 	int err = handle_mounts(nd, dentry, &path, &inode);
 
 	if (err < 0)
@@ -1970,7 +1976,7 @@ static const char *handle_dots(struct nameidata *nd, int type)
 			parent = follow_dotdot(nd, &inode);
 		if (IS_ERR(parent))
 			return ERR_CAST(parent);
-		error = step_into(nd, WALK_NOFOLLOW, parent, inode);
+		error = step_into(nd, WALK_NOFOLLOW, parent);
 		if (unlikely(error))
 			return error;
 
@@ -2015,7 +2021,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 	}
 	if (!(flags & WALK_MORE) && nd->depth)
 		put_link(nd);
-	return step_into(nd, flags, dentry, inode);
+	return step_into(nd, flags, dentry);
 }
 
 /*
@@ -2474,8 +2480,7 @@ static int handle_lookup_down(struct nameidata *nd)
 	if (!(nd->flags & LOOKUP_RCU))
 		dget(nd->path.dentry);
 	nd->next_seq = nd->seq;
-	return PTR_ERR(step_into(nd, WALK_NOFOLLOW,
-			nd->path.dentry, nd->inode));
+	return PTR_ERR(step_into(nd, WALK_NOFOLLOW, nd->path.dentry));
 }
 
 /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
@@ -3464,7 +3469,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 finish_lookup:
 	if (nd->depth)
 		put_link(nd);
-	res = step_into(nd, WALK_TRAILING, dentry, inode);
+	res = step_into(nd, WALK_TRAILING, dentry);
 	if (unlikely(res))
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
 	return res;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 5/7] follow_dotdot{,_rcu}(): don't bother with inode
  2022-07-04 23:13                       ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
                                           ` (2 preceding siblings ...)
  2022-07-04 23:15                         ` [PATCH 4/7] step_into(): lose inode argument Al Viro
@ 2022-07-04 23:15                         ` Al Viro
  2022-07-04 23:16                         ` [PATCH 6/7] lookup_fast(): " Al Viro
                                           ` (2 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Al Viro @ 2022-07-04 23:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

step_into() will fetch it, TYVM.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index dddbebf92b48..fe95fe39634c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1880,8 +1880,7 @@ static const char *step_into(struct nameidata *nd, int flags,
 	return pick_link(nd, &path, inode, flags);
 }
 
-static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
-					struct inode **inodep)
+static struct dentry *follow_dotdot_rcu(struct nameidata *nd)
 {
 	struct dentry *parent, *old;
 
@@ -1905,7 +1904,6 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 	}
 	old = nd->path.dentry;
 	parent = old->d_parent;
-	*inodep = parent->d_inode;
 	nd->next_seq = read_seqcount_begin(&parent->d_seq);
 	// makes sure that non-RCU pathwalk could reach this state
 	if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
@@ -1919,12 +1917,10 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-ECHILD);
 	nd->next_seq = nd->seq;
-	*inodep = nd->path.dentry->d_inode;
 	return nd->path.dentry;
 }
 
-static struct dentry *follow_dotdot(struct nameidata *nd,
-				 struct inode **inodep)
+static struct dentry *follow_dotdot(struct nameidata *nd)
 {
 	struct dentry *parent;
 
@@ -1948,13 +1944,11 @@ static struct dentry *follow_dotdot(struct nameidata *nd,
 		dput(parent);
 		return ERR_PTR(-ENOENT);
 	}
-	*inodep = parent->d_inode;
 	return parent;
 
 in_root:
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-EXDEV);
-	*inodep = nd->path.dentry->d_inode;
 	return dget(nd->path.dentry);
 }
 
@@ -1963,7 +1957,6 @@ static const char *handle_dots(struct nameidata *nd, int type)
 	if (type == LAST_DOTDOT) {
 		const char *error = NULL;
 		struct dentry *parent;
-		struct inode *inode;
 
 		if (!nd->root.mnt) {
 			error = ERR_PTR(set_root(nd));
@@ -1971,9 +1964,9 @@ static const char *handle_dots(struct nameidata *nd, int type)
 				return error;
 		}
 		if (nd->flags & LOOKUP_RCU)
-			parent = follow_dotdot_rcu(nd, &inode);
+			parent = follow_dotdot_rcu(nd);
 		else
-			parent = follow_dotdot(nd, &inode);
+			parent = follow_dotdot(nd);
 		if (IS_ERR(parent))
 			return ERR_CAST(parent);
 		error = step_into(nd, WALK_NOFOLLOW, parent);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 6/7] lookup_fast(): don't bother with inode
  2022-07-04 23:13                       ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
                                           ` (3 preceding siblings ...)
  2022-07-04 23:15                         ` [PATCH 5/7] follow_dotdot{,_rcu}(): don't bother with inode Al Viro
@ 2022-07-04 23:16                         ` Al Viro
  2022-07-04 23:17                         ` [PATCH 7/7] step_into(): move fetching ->d_inode past handle_mounts() Al Viro
  2022-07-04 23:19                         ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
  6 siblings, 0 replies; 33+ messages in thread
From: Al Viro @ 2022-07-04 23:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

Note that validation of ->d_seq after ->d_inode fetch is gone, along
with fetching of ->d_inode itself.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 24 +++++-------------------
 1 file changed, 5 insertions(+), 19 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index fe95fe39634c..cdb61d09df79 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1617,8 +1617,7 @@ static struct dentry *__lookup_hash(const struct qstr *name,
 	return dentry;
 }
 
-static struct dentry *lookup_fast(struct nameidata *nd,
-				  struct inode **inode)
+static struct dentry *lookup_fast(struct nameidata *nd)
 {
 	struct dentry *dentry, *parent = nd->path.dentry;
 	int status = 1;
@@ -1636,22 +1635,11 @@ static struct dentry *lookup_fast(struct nameidata *nd,
 			return NULL;
 		}
 
-		/*
-		 * This sequence count validates that the inode matches
-		 * the dentry name information from lookup.
-		 */
-		*inode = d_backing_inode(dentry);
-		if (unlikely(read_seqcount_retry(&dentry->d_seq, nd->next_seq)))
-			return ERR_PTR(-ECHILD);
-
-		/*
+	        /*
 		 * This sequence count validates that the parent had no
 		 * changes while we did the lookup of the dentry above.
-		 *
-		 * The memory barrier in read_seqcount_begin of child is
-		 *  enough, we can use __read_seqcount_retry here.
 		 */
-		if (unlikely(__read_seqcount_retry(&parent->d_seq, nd->seq)))
+		if (unlikely(read_seqcount_retry(&parent->d_seq, nd->seq)))
 			return ERR_PTR(-ECHILD);
 
 		status = d_revalidate(dentry, nd->flags);
@@ -1993,7 +1981,6 @@ static const char *handle_dots(struct nameidata *nd, int type)
 static const char *walk_component(struct nameidata *nd, int flags)
 {
 	struct dentry *dentry;
-	struct inode *inode;
 	/*
 	 * "." and ".." are special - ".." especially so because it has
 	 * to be able to know about the current root directory and
@@ -2004,7 +1991,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 			put_link(nd);
 		return handle_dots(nd, nd->last_type);
 	}
-	dentry = lookup_fast(nd, &inode);
+	dentry = lookup_fast(nd);
 	if (IS_ERR(dentry))
 		return ERR_CAST(dentry);
 	if (unlikely(!dentry)) {
@@ -3392,7 +3379,6 @@ static const char *open_last_lookups(struct nameidata *nd,
 	struct dentry *dir = nd->path.dentry;
 	int open_flag = op->open_flag;
 	bool got_write = false;
-	struct inode *inode;
 	struct dentry *dentry;
 	const char *res;
 
@@ -3408,7 +3394,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 		if (nd->last.name[nd->last.len])
 			nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 		/* we _can_ be in RCU mode here */
-		dentry = lookup_fast(nd, &inode);
+		dentry = lookup_fast(nd);
 		if (IS_ERR(dentry))
 			return ERR_CAST(dentry);
 		if (likely(dentry))
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 7/7] step_into(): move fetching ->d_inode past handle_mounts()
  2022-07-04 23:13                       ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
                                           ` (4 preceding siblings ...)
  2022-07-04 23:16                         ` [PATCH 6/7] lookup_fast(): " Al Viro
@ 2022-07-04 23:17                         ` Al Viro
  2022-07-04 23:19                         ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
  6 siblings, 0 replies; 33+ messages in thread
From: Al Viro @ 2022-07-04 23:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

... and lose messing with it in __follow_mount_rcu()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 31 +++++++++++--------------------
 1 file changed, 11 insertions(+), 20 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index cdb61d09df79..f2c99e75b578 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1470,8 +1470,7 @@ EXPORT_SYMBOL(follow_down);
  * Try to skip to top of mountpoint pile in rcuwalk mode.  Fail if
  * we meet a managed dentry that would need blocking.
  */
-static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
-			       struct inode **inode)
+static bool __follow_mount_rcu(struct nameidata *nd, struct path *path)
 {
 	struct dentry *dentry = path->dentry;
 	unsigned int flags = dentry->d_flags;
@@ -1501,13 +1500,6 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 				dentry = path->dentry = mounted->mnt.mnt_root;
 				nd->state |= ND_JUMPED;
 				nd->next_seq = read_seqcount_begin(&dentry->d_seq);
-				*inode = dentry->d_inode;
-				/*
-				 * We don't need to re-check ->d_seq after this
-				 * ->d_inode read - there will be an RCU delay
-				 * between mount hash removal and ->mnt_root
-				 * becoming unpinned.
-				 */
 				flags = dentry->d_flags;
 				// makes sure that non-RCU pathwalk could reach
 				// this state.
@@ -1523,7 +1515,7 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 }
 
 static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
-			  struct path *path, struct inode **inode)
+			  struct path *path)
 {
 	bool jumped;
 	int ret;
@@ -1532,12 +1524,7 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 	path->dentry = dentry;
 	if (nd->flags & LOOKUP_RCU) {
 		unsigned int seq = nd->next_seq;
-		*inode = dentry->d_inode;
-		if (read_seqcount_retry(&dentry->d_seq, seq))
-			return -ECHILD;
-		if (unlikely(!*inode))
-			return -ENOENT;
-		if (likely(__follow_mount_rcu(nd, path, inode)))
+		if (likely(__follow_mount_rcu(nd, path)))
 			return 0;
 		// *path and nd->next_seq might've been clobbered
 		path->mnt = nd->path.mnt;
@@ -1557,8 +1544,6 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 		dput(path->dentry);
 		if (path->mnt != nd->path.mnt)
 			mntput(path->mnt);
-	} else {
-		*inode = d_backing_inode(path->dentry);
 	}
 	return ret;
 }
@@ -1839,15 +1824,21 @@ static const char *step_into(struct nameidata *nd, int flags,
 {
 	struct path path;
 	struct inode *inode;
-	int err = handle_mounts(nd, dentry, &path, &inode);
+	int err = handle_mounts(nd, dentry, &path);
 
 	if (err < 0)
 		return ERR_PTR(err);
+	inode = path.dentry->d_inode;
 	if (likely(!d_is_symlink(path.dentry)) ||
 	   ((flags & WALK_TRAILING) && !(nd->flags & LOOKUP_FOLLOW)) ||
 	   (flags & WALK_NOFOLLOW)) {
 		/* not a symlink or should not follow */
-		if (!(nd->flags & LOOKUP_RCU)) {
+		if (nd->flags & LOOKUP_RCU) {
+			if (read_seqcount_retry(&path.dentry->d_seq, nd->next_seq))
+				return ERR_PTR(-ECHILD);
+			if (unlikely(!inode))
+				return ERR_PTR(-ENOENT);
+		} else {
 			dput(nd->path.dentry);
 			if (nd->path.mnt != path.mnt)
 				mntput(nd->path.mnt);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged
  2022-07-04 23:13                       ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
                                           ` (5 preceding siblings ...)
  2022-07-04 23:17                         ` [PATCH 7/7] step_into(): move fetching ->d_inode past handle_mounts() Al Viro
@ 2022-07-04 23:19                         ` Al Viro
  2022-07-05  0:06                           ` Linus Torvalds
  6 siblings, 1 reply; 33+ messages in thread
From: Al Viro @ 2022-07-04 23:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

	Just in case - this series is RFC only; it's very lightly
tested.  Might be carved into too many steps, at that.  Cumulative
delta follows; might or might not be more convenient for review.

diff --git a/fs/namei.c b/fs/namei.c
index 1f28d3f463c3..f2c99e75b578 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -567,7 +567,7 @@ struct nameidata {
 	struct path	root;
 	struct inode	*inode; /* path.dentry.d_inode */
 	unsigned int	flags, state;
-	unsigned	seq, m_seq, r_seq;
+	unsigned	seq, next_seq, m_seq, r_seq;
 	int		last_type;
 	unsigned	depth;
 	int		total_link_count;
@@ -772,6 +772,7 @@ static bool try_to_unlazy(struct nameidata *nd)
 		goto out;
 	if (unlikely(!legitimize_root(nd)))
 		goto out;
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	BUG_ON(nd->inode != parent->d_inode);
 	return true;
@@ -780,6 +781,7 @@ static bool try_to_unlazy(struct nameidata *nd)
 	nd->path.mnt = NULL;
 	nd->path.dentry = NULL;
 out:
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	return false;
 }
@@ -788,7 +790,6 @@ static bool try_to_unlazy(struct nameidata *nd)
  * try_to_unlazy_next - try to switch to ref-walk mode.
  * @nd: nameidata pathwalk data
  * @dentry: next dentry to step into
- * @seq: seq number to check @dentry against
  * Returns: true on success, false on failure
  *
  * Similar to try_to_unlazy(), but here we have the next dentry already
@@ -797,7 +798,7 @@ static bool try_to_unlazy(struct nameidata *nd)
  * Nothing should touch nameidata between try_to_unlazy_next() failure and
  * terminate_walk().
  */
-static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsigned seq)
+static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry)
 {
 	BUG_ON(!(nd->flags & LOOKUP_RCU));
 
@@ -818,7 +819,7 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi
 	 */
 	if (unlikely(!lockref_get_not_dead(&dentry->d_lockref)))
 		goto out;
-	if (unlikely(read_seqcount_retry(&dentry->d_seq, seq)))
+	if (unlikely(read_seqcount_retry(&dentry->d_seq, nd->next_seq)))
 		goto out_dput;
 	/*
 	 * Sequence counts matched. Now make sure that the root is
@@ -826,6 +827,7 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi
 	 */
 	if (unlikely(!legitimize_root(nd)))
 		goto out_dput;
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	return true;
 
@@ -834,9 +836,11 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi
 out1:
 	nd->path.dentry = NULL;
 out:
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	return false;
 out_dput:
+	nd->seq = nd->next_seq = 0;
 	rcu_read_unlock();
 	dput(dentry);
 	return false;
@@ -1466,8 +1470,7 @@ EXPORT_SYMBOL(follow_down);
  * Try to skip to top of mountpoint pile in rcuwalk mode.  Fail if
  * we meet a managed dentry that would need blocking.
  */
-static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
-			       struct inode **inode, unsigned *seqp)
+static bool __follow_mount_rcu(struct nameidata *nd, struct path *path)
 {
 	struct dentry *dentry = path->dentry;
 	unsigned int flags = dentry->d_flags;
@@ -1496,15 +1499,12 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 				path->mnt = &mounted->mnt;
 				dentry = path->dentry = mounted->mnt.mnt_root;
 				nd->state |= ND_JUMPED;
-				*seqp = read_seqcount_begin(&dentry->d_seq);
-				*inode = dentry->d_inode;
-				/*
-				 * We don't need to re-check ->d_seq after this
-				 * ->d_inode read - there will be an RCU delay
-				 * between mount hash removal and ->mnt_root
-				 * becoming unpinned.
-				 */
+				nd->next_seq = read_seqcount_begin(&dentry->d_seq);
 				flags = dentry->d_flags;
+				// makes sure that non-RCU pathwalk could reach
+				// this state.
+				if (read_seqretry(&mount_lock, nd->m_seq))
+					return false;
 				continue;
 			}
 			if (read_seqretry(&mount_lock, nd->m_seq))
@@ -1515,8 +1515,7 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 }
 
 static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
-			  struct path *path, struct inode **inode,
-			  unsigned int *seqp)
+			  struct path *path)
 {
 	bool jumped;
 	int ret;
@@ -1524,16 +1523,15 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 	path->mnt = nd->path.mnt;
 	path->dentry = dentry;
 	if (nd->flags & LOOKUP_RCU) {
-		unsigned int seq = *seqp;
-		if (unlikely(!*inode))
-			return -ENOENT;
-		if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
+		unsigned int seq = nd->next_seq;
+		if (likely(__follow_mount_rcu(nd, path)))
 			return 0;
-		if (!try_to_unlazy_next(nd, dentry, seq))
-			return -ECHILD;
-		// *path might've been clobbered by __follow_mount_rcu()
+		// *path and nd->next_seq might've been clobbered
 		path->mnt = nd->path.mnt;
 		path->dentry = dentry;
+		nd->next_seq = seq;
+		if (!try_to_unlazy_next(nd, dentry))
+			return -ECHILD;
 	}
 	ret = traverse_mounts(path, &jumped, &nd->total_link_count, nd->flags);
 	if (jumped) {
@@ -1546,9 +1544,6 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 		dput(path->dentry);
 		if (path->mnt != nd->path.mnt)
 			mntput(path->mnt);
-	} else {
-		*inode = d_backing_inode(path->dentry);
-		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
 	}
 	return ret;
 }
@@ -1607,9 +1602,7 @@ static struct dentry *__lookup_hash(const struct qstr *name,
 	return dentry;
 }
 
-static struct dentry *lookup_fast(struct nameidata *nd,
-				  struct inode **inode,
-			          unsigned *seqp)
+static struct dentry *lookup_fast(struct nameidata *nd)
 {
 	struct dentry *dentry, *parent = nd->path.dentry;
 	int status = 1;
@@ -1620,37 +1613,24 @@ static struct dentry *lookup_fast(struct nameidata *nd,
 	 * going to fall back to non-racy lookup.
 	 */
 	if (nd->flags & LOOKUP_RCU) {
-		unsigned seq;
-		dentry = __d_lookup_rcu(parent, &nd->last, &seq);
+		dentry = __d_lookup_rcu(parent, &nd->last, &nd->next_seq);
 		if (unlikely(!dentry)) {
 			if (!try_to_unlazy(nd))
 				return ERR_PTR(-ECHILD);
 			return NULL;
 		}
 
-		/*
-		 * This sequence count validates that the inode matches
-		 * the dentry name information from lookup.
-		 */
-		*inode = d_backing_inode(dentry);
-		if (unlikely(read_seqcount_retry(&dentry->d_seq, seq)))
-			return ERR_PTR(-ECHILD);
-
-		/*
+	        /*
 		 * This sequence count validates that the parent had no
 		 * changes while we did the lookup of the dentry above.
-		 *
-		 * The memory barrier in read_seqcount_begin of child is
-		 *  enough, we can use __read_seqcount_retry here.
 		 */
-		if (unlikely(__read_seqcount_retry(&parent->d_seq, nd->seq)))
+		if (unlikely(read_seqcount_retry(&parent->d_seq, nd->seq)))
 			return ERR_PTR(-ECHILD);
 
-		*seqp = seq;
 		status = d_revalidate(dentry, nd->flags);
 		if (likely(status > 0))
 			return dentry;
-		if (!try_to_unlazy_next(nd, dentry, seq))
+		if (!try_to_unlazy_next(nd, dentry))
 			return ERR_PTR(-ECHILD);
 		if (status == -ECHILD)
 			/* we'd been told to redo it in non-rcu mode */
@@ -1731,7 +1711,7 @@ static inline int may_lookup(struct user_namespace *mnt_userns,
 	return inode_permission(mnt_userns, nd->inode, MAY_EXEC);
 }
 
-static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
+static int reserve_stack(struct nameidata *nd, struct path *link)
 {
 	if (unlikely(nd->total_link_count++ >= MAXSYMLINKS))
 		return -ELOOP;
@@ -1746,7 +1726,7 @@ static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
 	if (nd->flags & LOOKUP_RCU) {
 		// we need to grab link before we do unlazy.  And we can't skip
 		// unlazy even if we fail to grab the link - cleanup needs it
-		bool grabbed_link = legitimize_path(nd, link, seq);
+		bool grabbed_link = legitimize_path(nd, link, nd->next_seq);
 
 		if (!try_to_unlazy(nd) || !grabbed_link)
 			return -ECHILD;
@@ -1760,11 +1740,11 @@ static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
 enum {WALK_TRAILING = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
 
 static const char *pick_link(struct nameidata *nd, struct path *link,
-		     struct inode *inode, unsigned seq, int flags)
+		     struct inode *inode, int flags)
 {
 	struct saved *last;
 	const char *res;
-	int error = reserve_stack(nd, link, seq);
+	int error = reserve_stack(nd, link);
 
 	if (unlikely(error)) {
 		if (!(nd->flags & LOOKUP_RCU))
@@ -1774,7 +1754,7 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 	last = nd->stack + nd->depth++;
 	last->link = *link;
 	clear_delayed_call(&last->done);
-	last->seq = seq;
+	last->seq = nd->next_seq;
 
 	if (flags & WALK_TRAILING) {
 		error = may_follow_link(nd, inode);
@@ -1836,43 +1816,50 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
  * to do this check without having to look at inode->i_op,
  * so we keep a cache of "no, this doesn't need follow_link"
  * for the common case.
+ *
+ * NOTE: dentry must be what nd->next_seq had been sampled from.
  */
 static const char *step_into(struct nameidata *nd, int flags,
-		     struct dentry *dentry, struct inode *inode, unsigned seq)
+		     struct dentry *dentry)
 {
 	struct path path;
-	int err = handle_mounts(nd, dentry, &path, &inode, &seq);
+	struct inode *inode;
+	int err = handle_mounts(nd, dentry, &path);
 
 	if (err < 0)
 		return ERR_PTR(err);
+	inode = path.dentry->d_inode;
 	if (likely(!d_is_symlink(path.dentry)) ||
 	   ((flags & WALK_TRAILING) && !(nd->flags & LOOKUP_FOLLOW)) ||
 	   (flags & WALK_NOFOLLOW)) {
 		/* not a symlink or should not follow */
-		if (!(nd->flags & LOOKUP_RCU)) {
+		if (nd->flags & LOOKUP_RCU) {
+			if (read_seqcount_retry(&path.dentry->d_seq, nd->next_seq))
+				return ERR_PTR(-ECHILD);
+			if (unlikely(!inode))
+				return ERR_PTR(-ENOENT);
+		} else {
 			dput(nd->path.dentry);
 			if (nd->path.mnt != path.mnt)
 				mntput(nd->path.mnt);
 		}
 		nd->path = path;
 		nd->inode = inode;
-		nd->seq = seq;
+		nd->seq = nd->next_seq;
 		return NULL;
 	}
 	if (nd->flags & LOOKUP_RCU) {
 		/* make sure that d_is_symlink above matches inode */
-		if (read_seqcount_retry(&path.dentry->d_seq, seq))
+		if (read_seqcount_retry(&path.dentry->d_seq, nd->next_seq))
 			return ERR_PTR(-ECHILD);
 	} else {
 		if (path.mnt == nd->path.mnt)
 			mntget(path.mnt);
 	}
-	return pick_link(nd, &path, inode, seq, flags);
+	return pick_link(nd, &path, inode, flags);
 }
 
-static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
-					struct inode **inodep,
-					unsigned *seqp)
+static struct dentry *follow_dotdot_rcu(struct nameidata *nd)
 {
 	struct dentry *parent, *old;
 
@@ -1889,14 +1876,15 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 		nd->path = path;
 		nd->inode = path.dentry->d_inode;
 		nd->seq = seq;
+		// makes sure that non-RCU pathwalk could reach this state
 		if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
 			return ERR_PTR(-ECHILD);
 		/* we know that mountpoint was pinned */
 	}
 	old = nd->path.dentry;
 	parent = old->d_parent;
-	*inodep = parent->d_inode;
-	*seqp = read_seqcount_begin(&parent->d_seq);
+	nd->next_seq = read_seqcount_begin(&parent->d_seq);
+	// makes sure that non-RCU pathwalk could reach this state
 	if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
 		return ERR_PTR(-ECHILD);
 	if (unlikely(!path_connected(nd->path.mnt, parent)))
@@ -1907,12 +1895,11 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 		return ERR_PTR(-ECHILD);
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-ECHILD);
-	return NULL;
+	nd->next_seq = nd->seq;
+	return nd->path.dentry;
 }
 
-static struct dentry *follow_dotdot(struct nameidata *nd,
-				 struct inode **inodep,
-				 unsigned *seqp)
+static struct dentry *follow_dotdot(struct nameidata *nd)
 {
 	struct dentry *parent;
 
@@ -1936,15 +1923,12 @@ static struct dentry *follow_dotdot(struct nameidata *nd,
 		dput(parent);
 		return ERR_PTR(-ENOENT);
 	}
-	*seqp = 0;
-	*inodep = parent->d_inode;
 	return parent;
 
 in_root:
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-EXDEV);
-	dget(nd->path.dentry);
-	return NULL;
+	return dget(nd->path.dentry);
 }
 
 static const char *handle_dots(struct nameidata *nd, int type)
@@ -1952,8 +1936,6 @@ static const char *handle_dots(struct nameidata *nd, int type)
 	if (type == LAST_DOTDOT) {
 		const char *error = NULL;
 		struct dentry *parent;
-		struct inode *inode;
-		unsigned seq;
 
 		if (!nd->root.mnt) {
 			error = ERR_PTR(set_root(nd));
@@ -1961,17 +1943,12 @@ static const char *handle_dots(struct nameidata *nd, int type)
 				return error;
 		}
 		if (nd->flags & LOOKUP_RCU)
-			parent = follow_dotdot_rcu(nd, &inode, &seq);
+			parent = follow_dotdot_rcu(nd);
 		else
-			parent = follow_dotdot(nd, &inode, &seq);
+			parent = follow_dotdot(nd);
 		if (IS_ERR(parent))
 			return ERR_CAST(parent);
-		if (unlikely(!parent))
-			error = step_into(nd, WALK_NOFOLLOW,
-					 nd->path.dentry, nd->inode, nd->seq);
-		else
-			error = step_into(nd, WALK_NOFOLLOW,
-					 parent, inode, seq);
+		error = step_into(nd, WALK_NOFOLLOW, parent);
 		if (unlikely(error))
 			return error;
 
@@ -1995,8 +1972,6 @@ static const char *handle_dots(struct nameidata *nd, int type)
 static const char *walk_component(struct nameidata *nd, int flags)
 {
 	struct dentry *dentry;
-	struct inode *inode;
-	unsigned seq;
 	/*
 	 * "." and ".." are special - ".." especially so because it has
 	 * to be able to know about the current root directory and
@@ -2007,7 +1982,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 			put_link(nd);
 		return handle_dots(nd, nd->last_type);
 	}
-	dentry = lookup_fast(nd, &inode, &seq);
+	dentry = lookup_fast(nd);
 	if (IS_ERR(dentry))
 		return ERR_CAST(dentry);
 	if (unlikely(!dentry)) {
@@ -2017,7 +1992,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 	}
 	if (!(flags & WALK_MORE) && nd->depth)
 		put_link(nd);
-	return step_into(nd, flags, dentry, inode, seq);
+	return step_into(nd, flags, dentry);
 }
 
 /*
@@ -2372,6 +2347,8 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 		flags &= ~LOOKUP_RCU;
 	if (flags & LOOKUP_RCU)
 		rcu_read_lock();
+	else
+		nd->seq = nd->next_seq = 0;
 
 	nd->flags = flags;
 	nd->state |= ND_JUMPED;
@@ -2473,8 +2450,8 @@ static int handle_lookup_down(struct nameidata *nd)
 {
 	if (!(nd->flags & LOOKUP_RCU))
 		dget(nd->path.dentry);
-	return PTR_ERR(step_into(nd, WALK_NOFOLLOW,
-			nd->path.dentry, nd->inode, nd->seq));
+	nd->next_seq = nd->seq;
+	return PTR_ERR(step_into(nd, WALK_NOFOLLOW, nd->path.dentry));
 }
 
 /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
@@ -3393,8 +3370,6 @@ static const char *open_last_lookups(struct nameidata *nd,
 	struct dentry *dir = nd->path.dentry;
 	int open_flag = op->open_flag;
 	bool got_write = false;
-	unsigned seq;
-	struct inode *inode;
 	struct dentry *dentry;
 	const char *res;
 
@@ -3410,7 +3385,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 		if (nd->last.name[nd->last.len])
 			nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 		/* we _can_ be in RCU mode here */
-		dentry = lookup_fast(nd, &inode, &seq);
+		dentry = lookup_fast(nd);
 		if (IS_ERR(dentry))
 			return ERR_CAST(dentry);
 		if (likely(dentry))
@@ -3464,7 +3439,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 finish_lookup:
 	if (nd->depth)
 		put_link(nd);
-	res = step_into(nd, WALK_TRAILING, dentry, inode, seq);
+	res = step_into(nd, WALK_TRAILING, dentry);
 	if (unlikely(res))
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
 	return res;

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged
  2022-07-04 23:19                         ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
@ 2022-07-05  0:06                           ` Linus Torvalds
  2022-07-05  3:48                             ` Al Viro
  0 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2022-07-05  0:06 UTC (permalink / raw)
  To: Al Viro
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 4, 2022 at 4:19 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> -       unsigned        seq, m_seq, r_seq;
> +       unsigned        seq, next_seq, m_seq, r_seq;

So the main thing I react to here is how "next_seq" is in the "struct
nameidata", but then it always goes together with a "struct dentry"
that you end up having to pass separately (and that is *not* in that
"struct nameidata").

Now, saving the associated dentry (as "next_dentry") in the nd would
solve that, but ends up benign ugly since everything then wants to
look at the dentry anyway, so while it would solve the inconsistency,
it would be ugly.

I wonder if the solution might not be to create a new structure like

        struct rcu_dentry {
                struct dentry *dentry;
                unsigned seq;
        };

and in fact then we could make __d_lookup_rcu() return one of these
things (we already rely on that "returning a two-word structure is
efficient" elsewhere).

That would then make that "this dentry goes with this sequence number"
be a very clear thing, and I actually thjink that it would make
__d_lookup_rcu() have a cleaner calling convention too, ie we'd go
from

        dentry = __d_lookup_rcu(parent, &nd->last, &nd->next_seq);

rto

       dseq = __d_lookup_rcu(parent, &nd->last);

and it would even improve code generation because it now returns the
dentry and the sequence number in registers, instead of returning one
in a register and one in memory.

I did *not* look at how it would change some of the other places, but
I do like the notion of "keep the dentry and the sequence number that
goes with it together".

That "keep dentry as a local, keep the sequence number that goes with
it as a field in the 'nd'" really does seem an odd thing. So I'm
throwing the above out as a "maybe we could do this instead..".

Not a huge deal. That oddity or not, I think the patch series is an improvement.

I do have a minor gripe with this too:

> +       nd->seq = nd->next_seq = 0;

I'm not convinced "0" is a good value.

It's not supposed to match anything, but it *could* match a valid
sequence number. Wouldn't it be better to pick something that is
explicitly invalid and has the low bit set (ie 1 or -1).

We don't seem to have a SEQ_INVAL or anything like that, but it does
seem that if the intent is to make it clear it's not a real sequence
number any more at that point, then 0 isn't great.

But again, this is more of a stylistic detail thing than a real complaint.

             Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged
  2022-07-05  0:06                           ` Linus Torvalds
@ 2022-07-05  3:48                             ` Al Viro
  0 siblings, 0 replies; 33+ messages in thread
From: Al Viro @ 2022-07-05  3:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Potapenko, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev, Linux-MM, linux-arch,
	Linux Kernel Mailing List, Evgenii Stepanov, Nathan Chancellor,
	Nick Desaulniers, Segher Boessenkool, Vitaly Buka,
	linux-toolchains

On Mon, Jul 04, 2022 at 05:06:17PM -0700, Linus Torvalds wrote:

> I wonder if the solution might not be to create a new structure like
> 
>         struct rcu_dentry {
>                 struct dentry *dentry;
>                 unsigned seq;
>         };
> 
> and in fact then we could make __d_lookup_rcu() return one of these
> things (we already rely on that "returning a two-word structure is
> efficient" elsewhere).
>
> That would then make that "this dentry goes with this sequence number"
> be a very clear thing, and I actually thjink that it would make
> __d_lookup_rcu() have a cleaner calling convention too, ie we'd go
> from
> 
>         dentry = __d_lookup_rcu(parent, &nd->last, &nd->next_seq);
> 
> rto
> 
>        dseq = __d_lookup_rcu(parent, &nd->last);
> 
> and it would even improve code generation because it now returns the
> dentry and the sequence number in registers, instead of returning one
> in a register and one in memory.
> 
> I did *not* look at how it would change some of the other places, but
> I do like the notion of "keep the dentry and the sequence number that
> goes with it together".
> 
> That "keep dentry as a local, keep the sequence number that goes with
> it as a field in the 'nd'" really does seem an odd thing. So I'm
> throwing the above out as a "maybe we could do this instead..".

I looked into that; turns out to be quite messy, unfortunately.  For one
thing, the distance between the places where we get the seq count and
the place where we consume it is large; worse, there's a bunch of paths
where we are in non-RCU mode converging to the same consumer and those
need a 0/1/-1/whatever paired with dentry.  Gets very clumsy...

There might be a clever way to deal with pairs cleanly, but I don't see it
at the moment.  I'll look into that some more, but...

BTW, how good gcc and clang are at figuring out that e.g.

static int foo(int n)
{
	if (likely(n >= 0))
		return 0;
	....
}

....
	if (foo(n))
		whatever();

should be treated as
	if (unlikely(foo(n)))
		whatever();

They certainly do it just fine if the damn thing is inlined (e.g.
all those unlikely(read_seqcount_retry(....)) can and should lose
unlikely), but do they manage that for non-inlined functions in
the same compilation unit?  Relatively recent gcc seems to...

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into()
  2022-07-01 14:23 ` [PATCH v4 43/45] namei: initialize parameters passed to step_into() Alexander Potapenko
  2022-07-02 17:23   ` Linus Torvalds
@ 2022-08-08 16:37   ` Alexander Potapenko
  1 sibling, 0 replies; 33+ messages in thread
From: Alexander Potapenko @ 2022-08-08 16:37 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
	Andrey Konovalov, Andy Lutomirski, Arnd Bergmann,
	Borislav Petkov, Christoph Hellwig, Christoph Lameter,
	David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
	Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe,
	Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland,
	Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Thomas Gleixner, Vasily Gorbik,
	Vegard Nossum, Vlastimil Babka, kasan-dev,
	Linux Memory Management List, Linux-Arch, LKML, Evgenii Stepanov,
	Linus Torvalds, Nathan Chancellor, Nick Desaulniers,
	Segher Boessenkool, Vitaly Buka, linux-toolchains

On Fri, Jul 1, 2022 at 4:25 PM Alexander Potapenko <glider@google.com> wrote:
>
> Under certain circumstances initialization of `unsigned seq` and
> `struct inode *inode` passed into step_into() may be skipped.
> In particular, if the call to lookup_fast() in walk_component()
> returns NULL, and lookup_slow() returns a valid dentry, then the
> `seq` and `inode` will remain uninitialized until the call to
> step_into() (see [1] for more info).
>
> Right now step_into() does not use these uninitialized values,
> yet passing uninitialized values to functions is considered undefined
> behavior (see [2]). To fix that, we initialize `seq` and `inode` at
> definition.

Given that Al Viro has a patch series in flight to address the
problem, I am going to drop this patch from KMSAN v5 series.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2022-08-08 16:38 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20220701142310.2188015-1-glider@google.com>
2022-07-01 14:23 ` [PATCH v4 43/45] namei: initialize parameters passed to step_into() Alexander Potapenko
2022-07-02 17:23   ` Linus Torvalds
2022-07-03  3:59     ` Al Viro
2022-07-04  2:52     ` Al Viro
2022-07-04  8:20       ` Alexander Potapenko
2022-07-04 13:44         ` Al Viro
2022-07-04 13:55           ` Al Viro
2022-07-04 15:49           ` Alexander Potapenko
2022-07-04 16:03             ` Greg Kroah-Hartman
2022-07-04 16:33               ` Alexander Potapenko
2022-07-04 18:23             ` Segher Boessenkool
2022-07-04 16:00           ` Al Viro
2022-07-04 16:47             ` Alexander Potapenko
2022-07-04 17:36       ` Linus Torvalds
2022-07-04 19:02         ` Al Viro
2022-07-04 19:16           ` Linus Torvalds
2022-07-04 19:55             ` Al Viro
2022-07-04 20:24               ` Linus Torvalds
2022-07-04 20:46                 ` Al Viro
2022-07-04 20:51                   ` Linus Torvalds
2022-07-04 21:04                     ` Al Viro
2022-07-04 23:13                       ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
2022-07-04 23:14                         ` [PATCH 2/7] follow_dotdot{,_rcu}(): change calling conventions Al Viro
2022-07-04 23:14                         ` [PATCH 3/7] namei: stash the sampled ->d_seq into nameidata Al Viro
2022-07-04 23:15                         ` [PATCH 4/7] step_into(): lose inode argument Al Viro
2022-07-04 23:15                         ` [PATCH 5/7] follow_dotdot{,_rcu}(): don't bother with inode Al Viro
2022-07-04 23:16                         ` [PATCH 6/7] lookup_fast(): " Al Viro
2022-07-04 23:17                         ` [PATCH 7/7] step_into(): move fetching ->d_inode past handle_mounts() Al Viro
2022-07-04 23:19                         ` [PATCH 1/7] __follow_mount_rcu(): verify that mount_lock remains unchanged Al Viro
2022-07-05  0:06                           ` Linus Torvalds
2022-07-05  3:48                             ` Al Viro
2022-07-04 20:47                 ` [PATCH v4 43/45] namei: initialize parameters passed to step_into() Linus Torvalds
2022-08-08 16:37   ` Alexander Potapenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).