linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCHSET] sanitized pathwalk machinery (v2)
@ 2020-02-23  1:11 Al Viro
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                   ` (2 more replies)
  0 siblings, 3 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:11 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

	This is a slightly extended repost of the patchset posted on
Jan 19.  Current branch is in vfs.git#work.do_last, the main
difference from the last time around being a bit of do_last()
untangling added in the end of series.  #work.openat2 is already
in mainline, which simplifies the series - now it's a straight
branch with no merges.

	Individual patches are in the followups.  Branch survives
the local testing (including ltp and xfstests).

	Review and testing would be _very_ welcome; it does a lot
of massage, so there had been a plenty of opportunities to fuck up
and fail to spot that.  The same goes for profiling - it doesn't
seem to slow the things down, but that needs to be verified.

part 1: follow_automount() cleanups and fixes.

	Quite a bit of that function had been about working around the
wrong calling conventions of finish_automount().  The problem is that
finish_automount() misuses the primitive intended for mount(2) and
friends, where we want to mount on top of the pile, even if something
has managed to add to that while we'd been trying to lock the namespace.
For automount that's not the right thing to do - there we want to discard
whatever it was going to attach and just cross into what got mounted
there in the meanwhile (most likely - the results of the same automount
triggered by somebody else).  Current mainline kinda-sorta manages to do
that, but it's unreliable and very convoluted.  Much simpler approach
is to stop using lock_mount() in finish_automount() and have it bail
out if something turns out to have been mounted on top where we wanted
to attach.  That allows to get rid of a lot of PITA in the caller.
Another simplification comes from not trying to cross into the results
of automount - simply ride through the next iteration of the loop and
let it move into overmount.

	Another thing in the same series is divorcing follow_automount()
from nameidata; that'll play later when we get to unifying follow_down()
with the guts of follow_managed().

	4 commits, the second one fixes a hard-to-hit race.  The first
is a prereq for it.

1/34	do_add_mount(): lift lock_mount/unlock_mount into callers
2/34	fix automount/automount race properly
3/34	follow_automount(): get rid of dead^Wstillborn code
4/34	follow_automount() doesn't need the entire nameidata

part 2: unifying mount traversals in pathwalk.

	Handling of mount traversal (follow_managed()) is currently called
in a bunch of places.  Each of them is shortly followed by a call of
step_into() or an open-coded equivalent thereof.  However, the locations
of those step_into() calls are far from preceding follow_managed();
moreover, that preceding call might happen on different paths that
converge to given step_into() call.  It's harder to analyse that it should
be (especially when it comes to liveness analysis) and it forces rather
ugly calling conventions on lookup_fast()/atomic_open()/lookup_open().
The series below massages the code to the point when the calls of
follow_managed() (and __follow_mount_rcu()) move into the beginning of
step_into().

5/34	make build_open_flags() treat O_CREAT | O_EXCL as implying O_NOFOLLOW
	gets EEXIST handling in do_last() past the step_into() call there.
6/34	handle_mounts(): start building a sane wrapper for follow_managed()
	rather than mangling follow_managed() itself (and creating conflicts
	with openat2 series), add a wrapper that will absorb the required
	interface changes.
7/34	atomic_open(): saner calling conventions (return dentry on success)
	struct path passed to it is pure out parameter; only dentry part
	ever varies, though - mnt is always nd->path.mnt.  Just return
	the dentry on success, and ERR_PTR(-E...) on failure.
8/34	lookup_open(): saner calling conventions (return dentry on success)
	propagate the same change one level up the call chain.
9/34	do_last(): collapse the call of path_to_nameidata()
	struct path filled in lookup_open() call is eventually given to
	handle_mounts(); the only use it has before that is path_to_nameidata()
	call in "->atomic_open() has actually opened it" case, and there
	path_to_nameidata() is an overkill - we are guaranteed to replace
	only nd->path.dentry.  So have the struct path filled only immediately
	prior to handle_mounts().
10/34	handle_mounts(): pass dentry in, turn path into a pure out argument
	now all callers of handle_mount() are directly preceded by filling
	struct path it gets.  path->mnt is nd->path.mnt in all cases, so we can
	pass just the dentry instead and fill path in handle_mount() itself.
	Some boilerplate gone, path is pure out argument of handle_mount()
	now.
11/34	lookup_fast(): consolidate the RCU success case
	massage to gather what will become an RCU case equivalent of
	handle_mounts(); basically, that's what we do if revalidate succeeds
	in RCU case of lookup_fast(), including unlazy and fallback to
	handle_mounts() if __follow_mount_rcu() says "it's too tricky".
12/34	teach handle_mounts() to handle RCU mode
	... and take that into handle_mount() itself.  The other caller of
	__follow_mount_rcu() is fine with the same fallback (it just didn't
	bother since it's in the very beginning of pathwalk), switched to
	handle_mount() as well.
13/34	lookup_fast(): take mount traversal into callers
	Now we are getting somewhere - both RCU and non-RCU success cases of
	lookup_fast() are ended with the same return handle_mounts(...);
	move that to the callers - there it will merge with the identical calls
	that had been on the paths where we had to do slow lookups.
	lookup_fast() returns dentry now.
14/34	new step_into() flag: WALK_NOFOLLOW
	use step_into() instead of open-coding it in handle_lookup_down().
	Add a flag for "don't follow symlinks regardless of LOOKUP_FOLLOW" for
	that (and eventually, I hope, for .. handling).
	Now *all* calls of handle_mounts() and step_into() are right next to
	each other.
15/34	fold handle_mounts() into step_into()
	... and we can move the call of handle_mounts() into step_into(),
	getting a slightly saner calling conventions out of that.
16/34	LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat()
	another payoff from 14/17 - we can teach path_lookupat() to do
	what path_mountpointat() used to.  And kill the latter, along with
	its wrappers.
17/34	expand the only remaining call of path_lookup_conditional()
	minor cleanup - RIP path_lookup_conditional().  Only one caller left.

Changes so far:
	* mount traversal is taken into step_into().
	* lookup_fast(), atomic_open() and lookup_open() calling conventions
are slightly changed.  All of them return dentry now, instead of returning
an int and filling struct path on success.  For lookup_fast() the old
"0 for cache miss, 1 for cache hit" is replaced with "NULL stands for cache
miss, dentry - for hit".
	* step_into() can be called in RCU mode as well.  Takes nameidata,
WALK_... flags, dentry and, in RCU case, corresponding inode and seq value.
Handles mount traversals, decides whether it's a symlink to be followed.
Error => returns -E...; symlink to follow => returns 1, puts symlink on stack;
non-symlink or symlink not to follow => returns 0, moves nd->path to new location.
	* LOOKUP_MOUNTPOINT introduced; user_path_mountpoint_at() and friends
became calls of user_path_at() et.al. with LOOKUP_MOUNTPOINT in flags.

part 3: untangling the symlink handling.

	Right now when we decide to follow a symlink it happens this way:
	* step_into() decides that it has been given a symlink that needs to
be followed.
	* it calls pick_link(), which pushes the symlink on stack and
returns 1 on success / -E... on error.  Symlink's mount/dentry/seq is
stored on stack and the inode is stashed in nd->link_inode.
	* step_into() passes that 1 to its callers, which proceed to pass it
up the call chain for several layers.  In all cases we get to get_link()
call shortly afterwards.
	* get_link() is called, picks the inode stashed in nd->link_inode
by the pick_link(), does some checks, touches the atime, etc.
	* get_link() either picks the link body out of inode or calls
->get_link().  If it's an absolute symlink, we move to the root and return
the relative portion of the body; if it's a relative one - just return the
body.  If it's a procfs-style one, the call of nd_jump_link() has been
made and we'd moved to whatever location is desired.  And return NULL,
same as we do for symlink to "/".
	* the caller proceeds to deal with the string returned to it.

	The sequence is the same in all cases (nested symlink, trailing
symlink on lookup, trailing symlink on open), but its pieces are not close
to each other and the bit between the call of pick_link() and (inevitable)
call of get_link() afterwards is not easy to follow.  Moreover, a bunch
of functions (walk_component/lookup_last/do_last) ends up with the same
conventions for return values as step_into().  And those conventions
(see above) are not pretty - 0/1/-E... is asking for mistakes, especially
when returned 1 is used only to direct control flow on a rather twisted
way to matching get_link() call.  And that path can be seriously twisted.
E.g. when we are trying to open /dev/stdin, we get the following sequence:
	* path_init() has put us into root and returned "/dev/stdin"
	* link_path_walk() has eventually reached /dev and left
<LAST_NORM, "stdin"> in nd->last_type/nd->last
	* we call do_last(), which sees that we have LAST_NORM and calls
lookup_fast().  Let's assume that everything is in dcache; we get the
dentry of /dev/stdin and proceed to finish_lookup:, where we call step_into()
	* it's a symlink, we have LOOKUP_FOLLOW, so we decide to pick the
damn thing.  Into the stack it goes and we return 1.
	* do_last() sees 1 and returns it.
	* trailing_symlink() is called (in the top-level loop) and it
calls get_link().  OK, we get "/proc/self/fd/0" for body, move to
root again and return "proc/self/fd/0".
	* link_path_walk() is given that string, eventually leading us into
/proc/self/fd, with <LAST_NORM, "0"> left as the component to handle.
	* do_last() is called, and similar to the previous case we
eventually reach the call of step_into() with dentry of /proc/self/fd/0.
	* _now_ we can discard /dev/stdin from the stack (we'd been
using its body until now).  It's dropped (from step_into()) and we get
to look at what we'd been given.  A symlink to follow, so on the stack
it goes and we return 1.
	* again, do_last() passes 1 to caller
	* trailing_symlink() is called and calls get_link().
	* this time it's a procfs symlink and its ->get_link() method
moves us to the mount/dentry of our stdin.  And returns NULL.  But the
fun doesn't stop yet.
	* trailing_symlink() returns "" to the caller
	* link_path_walk() is called on that and does nothing
whatsoever.
	* do_last() is called and sees LAST_BIND left by the get_link().
It calls handle_dots()
	* handle_dots() drops the symlink from stack and returns
	* do_last() *FINALLY* proceeds to the point after its call of
step_into() (finish_open:) and gets around to opening the damn thing.

	Making sense of the control flow through all of that is not fun,
to put it mildly; debugging anything in that area can be a massive PITA,
and this example has touched only one of 3 cases.  Arguably, the worst
one, but...  Anyway, it turns out that this code can be massaged to
considerably saner shape - both in terms of control flow and wrt calling
conventions.

18/34	merging pick_link() with get_link(), part 1
	prep work: move the "hardening" crap from trailing_symlink() into
get_link() (conditional on the absense of LOOKUP_PARENT in nd->flags).
We'll be moving the calls of get_link() around quite a bit through that
series, and the next step will be to eliminate trailing_symlink().
19/34	merging pick_link() with get_link(), part 2
	fold trailing_symlink() into lookup_last() and do_last().
Now these are returning strings; it's not the final calling conventions,
but it's almost there.  NULL => old 0, we are done.  ERR_PTR(-E...) =>
old -E..., we'd failed.  string => old 1, and the string is the symlink
body to follow.  Just as for trailing_symlink(), "/" and procfs ones
(where get_link() returns NULL) yield "", so the ugly song and dance
with no-op trip through link_path_walk()/handle_dots() still remains.
20/34	merging pick_link() with get_link(), part 3
	elimination of that round-trip.  In *all* cases having
get_link() return NULL on such symlinks means that we'll proceed to
drop the symlink from stack and get back to the point near that
get_link() call - basically, where we would be if it hadn't been
a symlink at all.  The path by which we are getting there depends
upon the call site; the end result is the same in all cases - such
symlinks (procfs ones and symlink to "/") are fully processed by
the time get_link() returns, so we could as well drop them from the
stack right in get_link().  Makes life simpler in terms of control
flow analysis...
	And now the calling conventions for do_last() and lookup_last()
have reached the final shape - ERR_PTR(-E...) for error, NULL for
"we are done", string for "traverse this".
21/34	merging pick_link() with get_link(), part 4
	now all calls of walk_component() are followed by the same
boilerplate - "if it has returned 1, call get_link() and if that
has returned NULL treat that as if walk_component() has returned 0".
Eliminate by folding that into walk_component() itself.  Now
walk_component() return value conventions have joined those of
do_last()/lookup_last().
22/34	merging pick_link() with get_link(), part 5
	same as for the previous, only this time the boilerplate
migrates one level down, into step_into().  Only one caller of
get_link() left, step_into() has joined the same return value
conventions.
23/34	merging pick_link() with get_link(), part 6
	move that thing into pick_link().  Now all traces of
"return 1 if we are following a symlink" are gone.
24/34	finally fold get_link() into pick_link()
	ta-da - expand get_link() into the only caller.  As a side
benefit, we get rid of stashing the inode in nd->link_inode - it
was done only to carry that piece of information from pick_link()
to eventual get_link().  That's not the main benefit, though - the
control flow became considerably easier to reason about.

For what it's worth, the example above (/dev/stdin) becomes
	* path_init() has put us into root and returned "/dev/stdin"
	* link_path_walk() has eventually reached /dev and left
<LAST_NORM, "stdin"> in nd->last_type/nd->last
	* we call do_last(), which sees that we have LAST_NORM and calls
lookup_fast().  Let's assume that everything is in dcache; we get the
dentry of /dev/stdin and proceed to finish_lookup:, where we call step_into()
	* it's a symlink, we have LOOKUP_FOLLOW, so we decide to pick the
damn thing.  On the stack it goes and we get its body.  Which is
"/proc/self/fd/0", so we move to root and return "proc/self/fd/0".
	* do_last() sees non-NULL and returns it - whether it's an error
or a pathname to traverse, we hadn't reached something we'll be opening.
	* link_path_walk() is given that string, eventually leading us into
/proc/self/fd, with <LAST_NORM, "0"> left as the component to handle.
	* do_last() is called, and similar to the previous case we
eventually reach the call of step_into() with dentry of /proc/self/fd/0.
	* _now_ we can discard /dev/stdin from the stack (we'd been
using its body until now).  It's dropped (from step_into()) and we get
to look at what we'd been given.  A symlink to follow, so on the stack
it goes.   This time it's a procfs symlink and its ->get_link() method
moves us to the mount/dentry of our stdin.  And returns NULL.  So we
drop symlink from stack and return that NULL to caller.
	* that NULL is returned by step_into(), same as if we had just
moved to a non-symlink.
	* do_last() proceeds to open the damn thing.

part 4.  some mount traversal cleanups.

25/34	massage __follow_mount_rcu() a bit
	make it more similar to non-RCU counterpart
26/34	new helper: traverse_mounts()
	the guts of follow_managed() are very similar to
follow_down().  The calling conventions are different (follow_managed()
works with nameidata, follow_down() - with standalone struct path),
but the core loop is pretty much the same in both.  Turned that loop
into a common helper (traverse_mounts()) and since follow_managed()
becomes a very thin wrapper around it, expand follow_managed() at its
only call site (in handle_mounts()),

part 5.  do_last() untangling.

Control flow in do_last() is an atrocity, and liveness analysis in there
is rather painful.  What follows is a massage of that thing into (hopefully)
more straightforward shape; by the end of the series it's still unpleasant,
but at least easier to follow.

A major source of headache is treatment of "we'd already managed to
open it in ->atomic_open()" and "we'd just created that sucker" cases -
that's what gives complicated control flow graph.  As it is, we
have the following horror:
        |
 /------*				ends with . or ..?
 |      |
 |  /---*				found in dcache, no O_CREAT?
 |  |   |
 |  |   #				call lookup_open() here.
 |  |   |
 |  |   *---------------\		already opened in ->atomic_open()?
 |  |   |               #
 |  |   *---\           |		freshly created file?
 |  |   |   #           |
 |  \---+   |           |		finish_lookup:
 |      |   |           |
 |      *--------------------->		is it a symlink?
 |      |   |           |
 \------+   |           |		finish_open:
        #   |           |
        +--/            |		finish_open_created:
        #               |
        +---------------/		opened:
        #
To make it even more unpleasant, there is quite a bit of similar,
but not entirely identical logics on parallel branches, some of it
buried in lookup_open() *and* atomic_open() called by it.  Keeping
track of that has been hard and that had lead to more than one bug.

27/34	atomic_open(): return the right dentry in FMODE_OPENED case
	As it is, several invariants do not hold in "we'd already opened
it in ->atomic_open()" case.  In particular, nd->path.dentry might be
pointing to the wrong place by the time we return to do_last() - on
that codepath we don't care anymore.  That both makes it harder to reason
about and serves as an obstacle to transformations that would untangle
that mess.  Fortunately, it's not hard to regularize.
28/34	atomic_open(): lift the call of may_open() into do_last()
	may_open() is called before vfs_open() in "hadn't opened in
->atomic_open()" case.  Rightfully so, since vfs_open() for e.g.
devices can have side effects.  In "opened in ->atomic_open()" case
we have to do it after the actual opening - the whole point is to
combine open with lookup and we only get the information needed for
may_open() after the combined lookup/open has happened.  That's
OK - no side effects are possible in that case.  However, we don't
have to keep that call of may_open() inside fs/namei.c:atomic_open();
as the matter of fact, lifting it into do_last() allows to simplify
life there...
29/34	do_last(): merge the may_open() calls
	... since now we have the "it's already opened" case in
do_last() rejoin the main path at earlier point.

At that point the horror graph from above has become
        |
 /------*				ends with . or ..?
 |      |
 |  /---*				found in dcache, no O_CREAT?
 |  |   |
 |  |   #				call lookup_open() here.
 |  |   |
 |  |   *---------------\		already opened in ->atomic_open()?
 |  |   |               #
 |  |   *---\           |		freshly created file?
 |  |   |   #           |
 |  \---+   |           |		finish_lookup:
 |      |   |           |
 |      *--------------------->		is it a symlink?
 |      |   |           |
 \------+   |           |		finish_open:
        #   |           |
        +--/------------/		finish_open_created:
        |
30/34	do_last(): don't bother with keeping got_write in FMODE_OPENED case
	Another source of unpleasantness is an attempt to be clever and
keep track of write access status; the thing is, it doesn't really buy
us anything - we could as well drop it right after the lookup_open() and
only regain it for truncation, should such be needed.  Makes for much
simpler cleanups on failures and sets the things up for unification of
"already opened" and "new file" branches with the main path...
31/34	do_last(): rejoing the common path earlier in FMODE_{OPENED,CREATED} case
	... which we do here.
32/34	do_last(): simplify the liveness analysis past finish_open_created
	It also makes possible to shrink the liveness intervals for local
variables.
33/34	do_last(): rejoin the common path even earlier in FMODE_{OPENED,CREATED} case
	Further unification of parallel branches.

At that point we get
        |
 /------*				ends with . or ..?
 |      |
 |  /---*				found in dcache, no O_CREAT?
 |  |   |
 |  |   #				call lookup_open() here.
 |  |   |
 |  |   *---\				opened by ->atomic_open() or freshly creatd?	
 |  |   |   | 
 |  \---+   |				finish_lookup:
 |      |   |           
 |      *--------------------->		is it a symlink?
 |      |   |           
 \------+   |				finish_open:
        |   |
        +--/				finish_open_created:
        |
with very little work done between finish_open: and finish_open_created:,
as well as on any of the side branches.  Moreover, we have a pretty clear
separation: most of the work on _opening_ is after finish_open_created
(some of it - conditional), while the work on lookups and creation is all
before that point.  Even better, most of the local variables are used
either only before or only after that cutoff point.

34/34	split the lookup-related parts of do_last() into a separate helper
	... which allows to separate the lookup-related parts from
open-related ones.

	I'm not saying I'm entirely happy with the resulting state of
do_last() clusterfuck, but it got a lot easier to follow and reason
about.  There are more cleanups possible (and needed) in there, though -
there will be followups.

^ permalink raw reply	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers
  2020-02-23  1:11 [RFC][PATCHSET] sanitized pathwalk machinery (v2) Al Viro
@ 2020-02-23  1:15 ` Al Viro
  2020-02-23  1:15   ` [RFC][PATCH v2 02/34] fix automount/automount race properly Al Viro
                     ` (32 more replies)
  2020-02-25  1:24 ` [RFC][PATCHSET] sanitized pathwalk machinery (v2) Al Viro
  2020-03-01 21:51 ` [RFC][PATCHSET] sanitized pathwalk machinery (v3) Al Viro
  2 siblings, 33 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:15 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

preparation to finish_automount() fix (next commit)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 47 ++++++++++++++++++++++++-----------------------
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 85b5f7bea82e..dcd015fafe01 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2697,45 +2697,32 @@ static int do_move_mount_old(struct path *path, const char *old_name)
 /*
  * add a mount into a namespace's mount tree
  */
-static int do_add_mount(struct mount *newmnt, struct path *path, int mnt_flags)
+static int do_add_mount(struct mount *newmnt, struct mountpoint *mp,
+			struct path *path, int mnt_flags)
 {
-	struct mountpoint *mp;
-	struct mount *parent;
-	int err;
+	struct mount *parent = real_mount(path->mnt);
 
 	mnt_flags &= ~MNT_INTERNAL_FLAGS;
 
-	mp = lock_mount(path);
-	if (IS_ERR(mp))
-		return PTR_ERR(mp);
-
-	parent = real_mount(path->mnt);
-	err = -EINVAL;
 	if (unlikely(!check_mnt(parent))) {
 		/* that's acceptable only for automounts done in private ns */
 		if (!(mnt_flags & MNT_SHRINKABLE))
-			goto unlock;
+			return -EINVAL;
 		/* ... and for those we'd better have mountpoint still alive */
 		if (!parent->mnt_ns)
-			goto unlock;
+			return -EINVAL;
 	}
 
 	/* Refuse the same filesystem on the same mount point */
-	err = -EBUSY;
 	if (path->mnt->mnt_sb == newmnt->mnt.mnt_sb &&
 	    path->mnt->mnt_root == path->dentry)
-		goto unlock;
+		return -EBUSY;
 
-	err = -EINVAL;
 	if (d_is_symlink(newmnt->mnt.mnt_root))
-		goto unlock;
+		return -EINVAL;
 
 	newmnt->mnt.mnt_flags = mnt_flags;
-	err = graft_tree(newmnt, parent, mp);
-
-unlock:
-	unlock_mount(mp);
-	return err;
+	return graft_tree(newmnt, parent, mp);
 }
 
 static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags);
@@ -2748,6 +2735,7 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 			   unsigned int mnt_flags)
 {
 	struct vfsmount *mnt;
+	struct mountpoint *mp;
 	struct super_block *sb = fc->root->d_sb;
 	int error;
 
@@ -2768,7 +2756,13 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 
 	mnt_warn_timestamp_expiry(mountpoint, mnt);
 
-	error = do_add_mount(real_mount(mnt), mountpoint, mnt_flags);
+	mp = lock_mount(mountpoint);
+	if (IS_ERR(mp)) {
+		mntput(mnt);
+		return PTR_ERR(mp);
+	}
+	error = do_add_mount(real_mount(mnt), mp, mountpoint, mnt_flags);
+	unlock_mount(mp);
 	if (error < 0)
 		mntput(mnt);
 	return error;
@@ -2830,6 +2824,7 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
 int finish_automount(struct vfsmount *m, struct path *path)
 {
 	struct mount *mnt = real_mount(m);
+	struct mountpoint *mp;
 	int err;
 	/* The new mount record should have at least 2 refs to prevent it being
 	 * expired before we get a chance to add it
@@ -2842,7 +2837,13 @@ int finish_automount(struct vfsmount *m, struct path *path)
 		goto fail;
 	}
 
-	err = do_add_mount(mnt, path, path->mnt->mnt_flags | MNT_SHRINKABLE);
+	mp = lock_mount(path);
+	if (IS_ERR(mp)) {
+		err = PTR_ERR(mp);
+		goto fail;
+	}
+	err = do_add_mount(mnt, mp, path, path->mnt->mnt_flags | MNT_SHRINKABLE);
+	unlock_mount(mp);
 	if (!err)
 		return 0;
 fail:
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 02/34] fix automount/automount race properly
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
@ 2020-02-23  1:15   ` Al Viro
  2020-02-23  2:07     ` Linus Torvalds
  2020-02-23  1:15   ` [RFC][PATCH v2 03/34] follow_automount(): get rid of dead^Wstillborn code Al Viro
                     ` (31 subsequent siblings)
  32 siblings, 1 reply; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:15 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Protection against automount/automount races (two threads hitting the same
referral point at the same time) is based upon do_add_mount() prevention of
identical overmounts - trying to overmount the root of mounted tree with
the same tree fails with -EBUSY.  It's unreliable (the other thread might've
mounted something on top of the automount it has triggered) *and* causes
no end of headache for follow_automount() and its caller, since
finish_automount() behaves like do_new_mount() - if the mountpoint to be is
overmounted, it mounts on top what's overmounting it.  It's not only wrong
(we want to go into what's overmounting the automount point and quietly
discard what we planned to mount there), it introduces the possibility of
original parent mount getting dropped.  That's what 8aef18845266 (VFS: Fix
vfsmount overput on simultaneous automount) deals with, but it can't do
anything about the reliability of conflict detection - if something had
been overmounted the other thread's automount (e.g. that other thread
having stepped into automount in mount(2)), we don't get that -EBUSY and
the result is
	 referral point under automounted NFS under explicit overmount
under another copy of automounted NFS

What we need is finish_automount() *NOT* digging into overmounts - if it
finds one, it should just quietly discard the thing it was asked to mount.
And don't bother with actually crossing into the results of finish_automount() -
the same loop that calls follow_automount() will do that just fine on the
next iteration.

IOW, instead of calling lock_mount() have finish_automount() do it manually,
_without_ the "move into overmount and retry" part.  And leave crossing into
the results to the caller of follow_automount(), which simplifies it a lot.

Moral: if you end up with a lot of glue working around the calling conventions
of something, perhaps these calling conventions are simply wrong...

Fixes: 8aef18845266 (VFS: Fix vfsmount overput on simultaneous automount)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c     | 29 ++++-------------------------
 fs/namespace.c | 41 ++++++++++++++++++++++++++++++++++-------
 2 files changed, 38 insertions(+), 32 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index db6565c99825..626eddb33508 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1208,11 +1208,9 @@ EXPORT_SYMBOL(follow_up);
  * - return -EISDIR to tell follow_managed() to stop and return the path we
  *   were called with.
  */
-static int follow_automount(struct path *path, struct nameidata *nd,
-			    bool *need_mntput)
+static int follow_automount(struct path *path, struct nameidata *nd)
 {
 	struct vfsmount *mnt;
-	int err;
 
 	if (!path->dentry->d_op || !path->dentry->d_op->d_automount)
 		return -EREMOTE;
@@ -1253,29 +1251,10 @@ static int follow_automount(struct path *path, struct nameidata *nd,
 		return PTR_ERR(mnt);
 	}
 
-	if (!mnt) /* mount collision */
-		return 0;
-
-	if (!*need_mntput) {
-		/* lock_mount() may release path->mnt on error */
-		mntget(path->mnt);
-		*need_mntput = true;
-	}
-	err = finish_automount(mnt, path);
-
-	switch (err) {
-	case -EBUSY:
-		/* Someone else made a mount here whilst we were busy */
+	if (!mnt)
 		return 0;
-	case 0:
-		path_put(path);
-		path->mnt = mnt;
-		path->dentry = dget(mnt->mnt_root);
-		return 0;
-	default:
-		return err;
-	}
 
+	return finish_automount(mnt, path);
 }
 
 /*
@@ -1333,7 +1312,7 @@ static int follow_managed(struct path *path, struct nameidata *nd)
 
 		/* Handle an automount point */
 		if (flags & DCACHE_NEED_AUTOMOUNT) {
-			ret = follow_automount(path, nd, &need_mntput);
+			ret = follow_automount(path, nd);
 			if (ret < 0)
 				break;
 			continue;
diff --git a/fs/namespace.c b/fs/namespace.c
index dcd015fafe01..6228fd1ef94f 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2823,6 +2823,7 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
 
 int finish_automount(struct vfsmount *m, struct path *path)
 {
+	struct dentry *dentry = path->dentry;
 	struct mount *mnt = real_mount(m);
 	struct mountpoint *mp;
 	int err;
@@ -2832,21 +2833,47 @@ int finish_automount(struct vfsmount *m, struct path *path)
 	BUG_ON(mnt_get_count(mnt) < 2);
 
 	if (m->mnt_sb == path->mnt->mnt_sb &&
-	    m->mnt_root == path->dentry) {
+	    m->mnt_root == dentry) {
 		err = -ELOOP;
-		goto fail;
+		goto discard;
 	}
 
-	mp = lock_mount(path);
+	/*
+	 * we don't want to use lock_mount() - in this case finding something
+	 * that overmounts our mountpoint to be means "quitely drop what we've
+	 * got", not "try to mount it on top".
+	 */
+	inode_lock(dentry->d_inode);
+	if (unlikely(cant_mount(dentry))) {
+		err = -ENOENT;
+		goto discard1;
+	}
+	namespace_lock();
+	rcu_read_lock();
+	if (unlikely(__lookup_mnt(path->mnt, dentry))) {
+		rcu_read_unlock();
+		err = 0;
+		goto discard2;
+	}
+	rcu_read_unlock();
+	mp = get_mountpoint(dentry);
 	if (IS_ERR(mp)) {
 		err = PTR_ERR(mp);
-		goto fail;
+		goto discard2;
 	}
+
 	err = do_add_mount(mnt, mp, path, path->mnt->mnt_flags | MNT_SHRINKABLE);
 	unlock_mount(mp);
-	if (!err)
-		return 0;
-fail:
+	if (unlikely(err))
+		goto discard;
+	mntput(m);
+	return 0;
+
+discard2:
+	namespace_unlock();
+discard1:
+	inode_unlock(dentry->d_inode);
+discard:
 	/* remove m from any expiration list it may be on */
 	if (!list_empty(&mnt->mnt_expire)) {
 		namespace_lock();
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 03/34] follow_automount(): get rid of dead^Wstillborn code
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
  2020-02-23  1:15   ` [RFC][PATCH v2 02/34] fix automount/automount race properly Al Viro
@ 2020-02-23  1:15   ` Al Viro
  2020-02-23  1:15   ` [RFC][PATCH v2 04/34] follow_automount() doesn't need the entire nameidata Al Viro
                     ` (30 subsequent siblings)
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:15 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

1) no instances of ->d_automount() have ever made use of the "return
ERR_PTR(-EISDIR) if you don't feel like mounting anything" - that's
a rudiment of plans that got superseded before the thing went into
the tree.  Despite the comment in follow_automount(), autofs has
never done that.

2) if there's no ->d_automount() in dentry_operations, filesystems
should not set DCACHE_NEED_AUTOMOUNT in the first place.  None have
ever done so...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c     | 28 +++-------------------------
 fs/namespace.c |  9 ++++++++-
 2 files changed, 11 insertions(+), 26 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 626eddb33508..39dd56f5171f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1210,10 +1210,7 @@ EXPORT_SYMBOL(follow_up);
  */
 static int follow_automount(struct path *path, struct nameidata *nd)
 {
-	struct vfsmount *mnt;
-
-	if (!path->dentry->d_op || !path->dentry->d_op->d_automount)
-		return -EREMOTE;
+	struct dentry *dentry = path->dentry;
 
 	/* We don't want to mount if someone's just doing a stat -
 	 * unless they're stat'ing a directory and appended a '/' to
@@ -1228,33 +1225,14 @@ static int follow_automount(struct path *path, struct nameidata *nd)
 	 */
 	if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
 			   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
-	    path->dentry->d_inode)
+	    dentry->d_inode)
 		return -EISDIR;
 
 	nd->total_link_count++;
 	if (nd->total_link_count >= 40)
 		return -ELOOP;
 
-	mnt = path->dentry->d_op->d_automount(path);
-	if (IS_ERR(mnt)) {
-		/*
-		 * The filesystem is allowed to return -EISDIR here to indicate
-		 * it doesn't want to automount.  For instance, autofs would do
-		 * this so that its userspace daemon can mount on this dentry.
-		 *
-		 * However, we can only permit this if it's a terminal point in
-		 * the path being looked up; if it wasn't then the remainder of
-		 * the path is inaccessible and we should say so.
-		 */
-		if (PTR_ERR(mnt) == -EISDIR && (nd->flags & LOOKUP_PARENT))
-			return -EREMOTE;
-		return PTR_ERR(mnt);
-	}
-
-	if (!mnt)
-		return 0;
-
-	return finish_automount(mnt, path);
+	return finish_automount(dentry->d_op->d_automount(path), path);
 }
 
 /*
diff --git a/fs/namespace.c b/fs/namespace.c
index 6228fd1ef94f..a9e556224945 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2824,9 +2824,16 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
 int finish_automount(struct vfsmount *m, struct path *path)
 {
 	struct dentry *dentry = path->dentry;
-	struct mount *mnt = real_mount(m);
 	struct mountpoint *mp;
+	struct mount *mnt;
 	int err;
+
+	if (!m)
+		return 0;
+	if (IS_ERR(m))
+		return PTR_ERR(m);
+
+	mnt = real_mount(m);
 	/* The new mount record should have at least 2 refs to prevent it being
 	 * expired before we get a chance to add it
 	 */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 04/34] follow_automount() doesn't need the entire nameidata
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
  2020-02-23  1:15   ` [RFC][PATCH v2 02/34] fix automount/automount race properly Al Viro
  2020-02-23  1:15   ` [RFC][PATCH v2 03/34] follow_automount(): get rid of dead^Wstillborn code Al Viro
@ 2020-02-23  1:15   ` Al Viro
  2020-02-23  1:15   ` [RFC][PATCH v2 05/34] make build_open_flags() treat O_CREAT | O_EXCL as implying O_NOFOLLOW Al Viro
                     ` (29 subsequent siblings)
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:15 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Only the address of ->total_link_count and the flags.
And fix an off-by-one is ELOOP detection - make it
consistent with symlink following, where we check if
the pre-increment value has reached 40, rather than
check the post-increment one.

[kudos to Christian Brauner for spotted braino]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 39dd56f5171f..6721c5f7e9d5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1208,7 +1208,7 @@ EXPORT_SYMBOL(follow_up);
  * - return -EISDIR to tell follow_managed() to stop and return the path we
  *   were called with.
  */
-static int follow_automount(struct path *path, struct nameidata *nd)
+static int follow_automount(struct path *path, int *count, unsigned lookup_flags)
 {
 	struct dentry *dentry = path->dentry;
 
@@ -1223,13 +1223,12 @@ static int follow_automount(struct path *path, struct nameidata *nd)
 	 * as being automount points.  These will need the attentions
 	 * of the daemon to instantiate them before they can be used.
 	 */
-	if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
+	if (!(lookup_flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
 			   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
 	    dentry->d_inode)
 		return -EISDIR;
 
-	nd->total_link_count++;
-	if (nd->total_link_count >= 40)
+	if (count && (*count)++ >= MAXSYMLINKS)
 		return -ELOOP;
 
 	return finish_automount(dentry->d_op->d_automount(path), path);
@@ -1290,7 +1289,8 @@ static int follow_managed(struct path *path, struct nameidata *nd)
 
 		/* Handle an automount point */
 		if (flags & DCACHE_NEED_AUTOMOUNT) {
-			ret = follow_automount(path, nd);
+			ret = follow_automount(path, &nd->total_link_count,
+						nd->flags);
 			if (ret < 0)
 				break;
 			continue;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 05/34] make build_open_flags() treat O_CREAT | O_EXCL as implying O_NOFOLLOW
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (2 preceding siblings ...)
  2020-02-23  1:15   ` [RFC][PATCH v2 04/34] follow_automount() doesn't need the entire nameidata Al Viro
@ 2020-02-23  1:15   ` Al Viro
  2020-02-23  1:15   ` [RFC][PATCH v2 06/34] handle_mounts(): start building a sane wrapper for follow_managed() Al Viro
                     ` (28 subsequent siblings)
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:15 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

O_CREAT | O_EXCL means "-EEXIST if we run into a trailing symlink".
As it is, we might or might not have LOOKUP_FOLLOW in op->intent
in that case - that depends upon having O_NOFOLLOW in open flags.
It doesn't matter, since we won't be checking it in that case -
do_last() bails out earlier.

However, making sure it's not set (i.e. acting as if we had an explicit
O_NOFOLLOW) makes the behaviour more explicit and allows to reorder the
check for O_CREAT | O_EXCL in do_last() with the call of step_into()
immediately following it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 15 +++++----------
 fs/open.c  |  4 +++-
 2 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 6721c5f7e9d5..6938d20aa73a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3396,22 +3396,17 @@ static int do_last(struct nameidata *nd,
 	if (unlikely(error < 0))
 		return error;
 
-	/*
-	 * create/update audit record if it already exists.
-	 */
-	audit_inode(nd->name, path.dentry, 0);
-
-	if (unlikely((open_flag & (O_EXCL | O_CREAT)) == (O_EXCL | O_CREAT))) {
-		path_to_nameidata(&path, nd);
-		return -EEXIST;
-	}
-
 	seq = 0;	/* out of RCU mode, so the value doesn't matter */
 	inode = d_backing_inode(path.dentry);
 finish_lookup:
 	error = step_into(nd, &path, 0, inode, seq);
 	if (unlikely(error))
 		return error;
+
+	if (unlikely((open_flag & (O_EXCL | O_CREAT)) == (O_EXCL | O_CREAT))) {
+		audit_inode(nd->name, nd->path.dentry, 0);
+		return -EEXIST;
+	}
 finish_open:
 	/* Why this, you ask?  _Now_ we might have grown LOOKUP_JUMPED... */
 	error = complete_walk(nd);
diff --git a/fs/open.c b/fs/open.c
index 0788b3715731..e5227cd533f4 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1049,8 +1049,10 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op)
 
 	if (flags & O_CREAT) {
 		op->intent |= LOOKUP_CREATE;
-		if (flags & O_EXCL)
+		if (flags & O_EXCL) {
 			op->intent |= LOOKUP_EXCL;
+			flags |= O_NOFOLLOW;
+		}
 	}
 
 	if (flags & O_DIRECTORY)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 06/34] handle_mounts(): start building a sane wrapper for follow_managed()
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (3 preceding siblings ...)
  2020-02-23  1:15   ` [RFC][PATCH v2 05/34] make build_open_flags() treat O_CREAT | O_EXCL as implying O_NOFOLLOW Al Viro
@ 2020-02-23  1:15   ` Al Viro
  2020-02-23  1:15   ` [RFC][PATCH v2 07/34] atomic_open(): saner calling conventions (return dentry on success) Al Viro
                     ` (27 subsequent siblings)
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:15 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

All callers of follow_managed() follow it on success with the same steps -
d_backing_inode(path->dentry) is calculated and stored into some struct inode *
variable and, in all but one case, an unsigned variable (nd->seq to be) is
zeroed.  The single exception is lookup_fast() and there zeroing is correct
thing to do - not doing it is a pointless microoptimization.

	Add a wrapper for follow_managed() that would do that combination.
It's mostly a vehicle for code massage - it will be changing quite a bit,
and the current calling conventions are by no means final.  Right now it
takes path, nameidata and (as out params) inode and seq, similar to
__follow_mount_rcu().  Which will soon get folded into it...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 6938d20aa73a..c104ec75faef 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1385,6 +1385,18 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 		!(path->dentry->d_flags & DCACHE_NEED_AUTOMOUNT);
 }
 
+static inline int handle_mounts(struct path *path, struct nameidata *nd,
+			  struct inode **inode, unsigned int *seqp)
+{
+	int ret = follow_managed(path, nd);
+
+	if (likely(ret >= 0)) {
+		*inode = d_backing_inode(path->dentry);
+		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
+	}
+	return ret;
+}
+
 static int follow_dotdot_rcu(struct nameidata *nd)
 {
 	struct inode *inode = nd->inode;
@@ -1607,7 +1619,6 @@ static int lookup_fast(struct nameidata *nd,
 	struct vfsmount *mnt = nd->path.mnt;
 	struct dentry *dentry, *parent = nd->path.dentry;
 	int status = 1;
-	int err;
 
 	/*
 	 * Rename seqlock is not required here because in the off chance
@@ -1677,10 +1688,7 @@ static int lookup_fast(struct nameidata *nd,
 
 	path->mnt = mnt;
 	path->dentry = dentry;
-	err = follow_managed(path, nd);
-	if (likely(err > 0))
-		*inode = d_backing_inode(path->dentry);
-	return err;
+	return handle_mounts(path, nd, inode, seqp);
 }
 
 /* Fast lookup failed, do it the slow way */
@@ -1875,12 +1883,9 @@ static int walk_component(struct nameidata *nd, int flags)
 			return PTR_ERR(path.dentry);
 
 		path.mnt = nd->path.mnt;
-		err = follow_managed(&path, nd);
+		err = handle_mounts(&path, nd, &inode, &seq);
 		if (unlikely(err < 0))
 			return err;
-
-		seq = 0;	/* we are already out of RCU mode */
-		inode = d_backing_inode(path.dentry);
 	}
 
 	return step_into(nd, &path, flags, inode, seq);
@@ -2365,11 +2370,9 @@ static int handle_lookup_down(struct nameidata *nd)
 			return -ECHILD;
 	} else {
 		dget(path.dentry);
-		err = follow_managed(&path, nd);
+		err = handle_mounts(&path, nd, &inode, &seq);
 		if (unlikely(err < 0))
 			return err;
-		inode = d_backing_inode(path.dentry);
-		seq = 0;
 	}
 	path_to_nameidata(&path, nd);
 	nd->inode = inode;
@@ -3392,12 +3395,9 @@ static int do_last(struct nameidata *nd,
 		got_write = false;
 	}
 
-	error = follow_managed(&path, nd);
+	error = handle_mounts(&path, nd, &inode, &seq);
 	if (unlikely(error < 0))
 		return error;
-
-	seq = 0;	/* out of RCU mode, so the value doesn't matter */
-	inode = d_backing_inode(path.dentry);
 finish_lookup:
 	error = step_into(nd, &path, 0, inode, seq);
 	if (unlikely(error))
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 07/34] atomic_open(): saner calling conventions (return dentry on success)
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (4 preceding siblings ...)
  2020-02-23  1:15   ` [RFC][PATCH v2 06/34] handle_mounts(): start building a sane wrapper for follow_managed() Al Viro
@ 2020-02-23  1:15   ` Al Viro
  2020-02-23  1:16   ` [RFC][PATCH v2 08/34] lookup_open(): " Al Viro
                     ` (26 subsequent siblings)
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:15 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Currently it either returns -E... or puts (nd->path.mnt,dentry)
into *path and returns 0.  Make it return ERR_PTR(-E...) or
dentry; adjust the caller.  Fewer arguments and it's easier
to keep track of *path contents that way.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 41 ++++++++++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c104ec75faef..5f8b791a6d6e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3087,10 +3087,10 @@ static int may_o_create(const struct path *dir, struct dentry *dentry, umode_t m
  *
  * Returns an error code otherwise.
  */
-static int atomic_open(struct nameidata *nd, struct dentry *dentry,
-			struct path *path, struct file *file,
-			const struct open_flags *op,
-			int open_flag, umode_t mode)
+static struct dentry *atomic_open(struct nameidata *nd, struct dentry *dentry,
+				  struct file *file,
+				  const struct open_flags *op,
+				  int open_flag, umode_t mode)
 {
 	struct dentry *const DENTRY_NOT_SET = (void *) -1UL;
 	struct inode *dir =  nd->path.dentry->d_inode;
@@ -3131,17 +3131,15 @@ static int atomic_open(struct nameidata *nd, struct dentry *dentry,
 			}
 			if (file->f_mode & FMODE_CREATED)
 				fsnotify_create(dir, dentry);
-			if (unlikely(d_is_negative(dentry))) {
+			if (unlikely(d_is_negative(dentry)))
 				error = -ENOENT;
-			} else {
-				path->dentry = dentry;
-				path->mnt = nd->path.mnt;
-				return 0;
-			}
 		}
 	}
-	dput(dentry);
-	return error;
+	if (error) {
+		dput(dentry);
+		dentry = ERR_PTR(error);
+	}
+	return dentry;
 }
 
 /*
@@ -3236,11 +3234,20 @@ static int lookup_open(struct nameidata *nd, struct path *path,
 	}
 
 	if (dir_inode->i_op->atomic_open) {
-		error = atomic_open(nd, dentry, path, file, op, open_flag,
-				    mode);
-		if (unlikely(error == -ENOENT) && create_error)
-			error = create_error;
-		return error;
+		dentry = atomic_open(nd, dentry, file, op, open_flag, mode);
+		if (IS_ERR(dentry)) {
+			error = PTR_ERR(dentry);
+			if (unlikely(error == -ENOENT) && create_error)
+				error = create_error;
+			return error;
+		}
+		if (file->f_mode & FMODE_OPENED) {
+			dput(dentry);
+			return 0;
+		}
+		path->mnt = nd->path.mnt;
+		path->dentry = dentry;
+		return 0;
 	}
 
 no_open:
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 08/34] lookup_open(): saner calling conventions (return dentry on success)
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (5 preceding siblings ...)
  2020-02-23  1:15   ` [RFC][PATCH v2 07/34] atomic_open(): saner calling conventions (return dentry on success) Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  1:16   ` [RFC][PATCH v2 09/34] do_last(): collapse the call of path_to_nameidata() Al Viro
                     ` (25 subsequent siblings)
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

same story as for atomic_open() in the previous commit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 44 +++++++++++++++++++-------------------------
 1 file changed, 19 insertions(+), 25 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 5f8b791a6d6e..4946d006ba20 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3157,10 +3157,9 @@ static struct dentry *atomic_open(struct nameidata *nd, struct dentry *dentry,
  *
  * An error code is returned on failure.
  */
-static int lookup_open(struct nameidata *nd, struct path *path,
-			struct file *file,
-			const struct open_flags *op,
-			bool got_write)
+static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
+				  const struct open_flags *op,
+				  bool got_write)
 {
 	struct dentry *dir = nd->path.dentry;
 	struct inode *dir_inode = dir->d_inode;
@@ -3171,7 +3170,7 @@ static int lookup_open(struct nameidata *nd, struct path *path,
 	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 
 	if (unlikely(IS_DEADDIR(dir_inode)))
-		return -ENOENT;
+		return ERR_PTR(-ENOENT);
 
 	file->f_mode &= ~FMODE_CREATED;
 	dentry = d_lookup(dir, &nd->last);
@@ -3179,7 +3178,7 @@ static int lookup_open(struct nameidata *nd, struct path *path,
 		if (!dentry) {
 			dentry = d_alloc_parallel(dir, &nd->last, &wq);
 			if (IS_ERR(dentry))
-				return PTR_ERR(dentry);
+				return dentry;
 		}
 		if (d_in_lookup(dentry))
 			break;
@@ -3195,7 +3194,7 @@ static int lookup_open(struct nameidata *nd, struct path *path,
 	}
 	if (dentry->d_inode) {
 		/* Cached positive dentry: will open in f_op->open */
-		goto out_no_open;
+		return dentry;
 	}
 
 	/*
@@ -3236,18 +3235,10 @@ static int lookup_open(struct nameidata *nd, struct path *path,
 	if (dir_inode->i_op->atomic_open) {
 		dentry = atomic_open(nd, dentry, file, op, open_flag, mode);
 		if (IS_ERR(dentry)) {
-			error = PTR_ERR(dentry);
-			if (unlikely(error == -ENOENT) && create_error)
-				error = create_error;
-			return error;
+			if (dentry == ERR_PTR(-ENOENT) && create_error)
+				dentry = ERR_PTR(create_error);
 		}
-		if (file->f_mode & FMODE_OPENED) {
-			dput(dentry);
-			return 0;
-		}
-		path->mnt = nd->path.mnt;
-		path->dentry = dentry;
-		return 0;
+		return dentry;
 	}
 
 no_open:
@@ -3283,14 +3274,11 @@ static int lookup_open(struct nameidata *nd, struct path *path,
 		error = create_error;
 		goto out_dput;
 	}
-out_no_open:
-	path->dentry = dentry;
-	path->mnt = nd->path.mnt;
-	return 0;
+	return dentry;
 
 out_dput:
 	dput(dentry);
-	return error;
+	return ERR_PTR(error);
 }
 
 /*
@@ -3309,6 +3297,7 @@ static int do_last(struct nameidata *nd,
 	unsigned seq;
 	struct inode *inode;
 	struct path path;
+	struct dentry *dentry;
 	int error;
 
 	nd->flags &= ~LOOKUP_PARENT;
@@ -3365,14 +3354,18 @@ static int do_last(struct nameidata *nd,
 		inode_lock(dir->d_inode);
 	else
 		inode_lock_shared(dir->d_inode);
-	error = lookup_open(nd, &path, file, op, got_write);
+	dentry = lookup_open(nd, file, op, got_write);
 	if (open_flag & O_CREAT)
 		inode_unlock(dir->d_inode);
 	else
 		inode_unlock_shared(dir->d_inode);
 
-	if (error)
+	if (IS_ERR(dentry)) {
+		error = PTR_ERR(dentry);
 		goto out;
+	}
+	path.mnt = nd->path.mnt;
+	path.dentry = dentry;
 
 	if (file->f_mode & FMODE_OPENED) {
 		if ((file->f_mode & FMODE_CREATED) ||
@@ -3380,6 +3373,7 @@ static int do_last(struct nameidata *nd,
 			will_truncate = false;
 
 		audit_inode(nd->name, file->f_path.dentry, 0);
+		dput(dentry);
 		goto opened;
 	}
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 09/34] do_last(): collapse the call of path_to_nameidata()
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (6 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 08/34] lookup_open(): " Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  1:16   ` [RFC][PATCH v2 10/34] handle_mounts(): pass dentry in, turn path into a pure out argument Al Viro
                     ` (24 subsequent siblings)
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

... and shift filling struct path to just before the call of
handle_mounts().  All callers of handle_mounts() are
immediately preceded by path->mnt = nd->path.mnt now.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 4946d006ba20..f26af0559abf 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3364,8 +3364,6 @@ static int do_last(struct nameidata *nd,
 		error = PTR_ERR(dentry);
 		goto out;
 	}
-	path.mnt = nd->path.mnt;
-	path.dentry = dentry;
 
 	if (file->f_mode & FMODE_OPENED) {
 		if ((file->f_mode & FMODE_CREATED) ||
@@ -3382,7 +3380,8 @@ static int do_last(struct nameidata *nd,
 		open_flag &= ~O_TRUNC;
 		will_truncate = false;
 		acc_mode = 0;
-		path_to_nameidata(&path, nd);
+		dput(nd->path.dentry);
+		nd->path.dentry = dentry;
 		goto finish_open_created;
 	}
 
@@ -3396,6 +3395,8 @@ static int do_last(struct nameidata *nd,
 		got_write = false;
 	}
 
+	path.mnt = nd->path.mnt;
+	path.dentry = dentry;
 	error = handle_mounts(&path, nd, &inode, &seq);
 	if (unlikely(error < 0))
 		return error;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 10/34] handle_mounts(): pass dentry in, turn path into a pure out argument
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (7 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 09/34] do_last(): collapse the call of path_to_nameidata() Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  1:16   ` [RFC][PATCH v2 11/34] lookup_fast(): consolidate the RCU success case Al Viro
                     ` (23 subsequent siblings)
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

All callers are equivalent to
	path->dentry = dentry;
	path->mnt = nd->path.mnt;
	err = handle_mounts(path, ...)
Pass dentry as an explicit argument, fill *path in handle_mounts()
itself.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 37 ++++++++++++++++++-------------------
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index f26af0559abf..033f91a72bb5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1385,11 +1385,15 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 		!(path->dentry->d_flags & DCACHE_NEED_AUTOMOUNT);
 }
 
-static inline int handle_mounts(struct path *path, struct nameidata *nd,
-			  struct inode **inode, unsigned int *seqp)
+static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
+			  struct path *path, struct inode **inode,
+			  unsigned int *seqp)
 {
-	int ret = follow_managed(path, nd);
+	int ret;
 
+	path->mnt = nd->path.mnt;
+	path->dentry = dentry;
+	ret = follow_managed(path, nd);
 	if (likely(ret >= 0)) {
 		*inode = d_backing_inode(path->dentry);
 		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
@@ -1685,10 +1689,7 @@ static int lookup_fast(struct nameidata *nd,
 		dput(dentry);
 		return status;
 	}
-
-	path->mnt = mnt;
-	path->dentry = dentry;
-	return handle_mounts(path, nd, inode, seqp);
+	return handle_mounts(nd, dentry, path, inode, seqp);
 }
 
 /* Fast lookup failed, do it the slow way */
@@ -1859,6 +1860,7 @@ static inline int step_into(struct nameidata *nd, struct path *path,
 static int walk_component(struct nameidata *nd, int flags)
 {
 	struct path path;
+	struct dentry *dentry;
 	struct inode *inode;
 	unsigned seq;
 	int err;
@@ -1877,13 +1879,11 @@ static int walk_component(struct nameidata *nd, int flags)
 	if (unlikely(err <= 0)) {
 		if (err < 0)
 			return err;
-		path.dentry = lookup_slow(&nd->last, nd->path.dentry,
-					  nd->flags);
-		if (IS_ERR(path.dentry))
-			return PTR_ERR(path.dentry);
+		dentry = lookup_slow(&nd->last, nd->path.dentry, nd->flags);
+		if (IS_ERR(dentry))
+			return PTR_ERR(dentry);
 
-		path.mnt = nd->path.mnt;
-		err = handle_mounts(&path, nd, &inode, &seq);
+		err = handle_mounts(nd, dentry, &path, &inode, &seq);
 		if (unlikely(err < 0))
 			return err;
 	}
@@ -2355,7 +2355,7 @@ static inline int lookup_last(struct nameidata *nd)
 
 static int handle_lookup_down(struct nameidata *nd)
 {
-	struct path path = nd->path;
+	struct path path;
 	struct inode *inode = nd->inode;
 	unsigned seq = nd->seq;
 	int err;
@@ -2366,11 +2366,12 @@ static int handle_lookup_down(struct nameidata *nd)
 		 * at the very beginning of walk, so we lose nothing
 		 * if we simply redo everything in non-RCU mode
 		 */
+		path = nd->path;
 		if (unlikely(!__follow_mount_rcu(nd, &path, &inode, &seq)))
 			return -ECHILD;
 	} else {
-		dget(path.dentry);
-		err = handle_mounts(&path, nd, &inode, &seq);
+		dget(nd->path.dentry);
+		err = handle_mounts(nd, nd->path.dentry, &path, &inode, &seq);
 		if (unlikely(err < 0))
 			return err;
 	}
@@ -3395,9 +3396,7 @@ static int do_last(struct nameidata *nd,
 		got_write = false;
 	}
 
-	path.mnt = nd->path.mnt;
-	path.dentry = dentry;
-	error = handle_mounts(&path, nd, &inode, &seq);
+	error = handle_mounts(nd, dentry, &path, &inode, &seq);
 	if (unlikely(error < 0))
 		return error;
 finish_lookup:
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 11/34] lookup_fast(): consolidate the RCU success case
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (8 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 10/34] handle_mounts(): pass dentry in, turn path into a pure out argument Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  1:16   ` [RFC][PATCH v2 12/34] teach handle_mounts() to handle RCU mode Al Viro
                     ` (22 subsequent siblings)
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

1) in case of __follow_mount_rcu() failure, lookup_fast() proceeds
to call unlazy_child() and, should it succeed, handle_mounts().
Note that we have status > 0 (or we wouldn't be calling
__follow_mount_rcu() at all), so all stuff conditional upon
non-positive status won't be even touched.

Consolidate just that sequence after the call of __follow_mount_rcu().

2) calling d_is_negative() and keeping its result is pointless -
we either don't get past checking ->d_seq (and don't use the results of
d_is_negative() at all), or we are guaranteed that ->d_inode and
type bits of ->d_flags had been consistent at the time of d_is_negative()
call.  IOW, we could only get to the use of its result if it's
equal to !inode.  The same ->d_seq check guarantees that after that point
this CPU won't observe ->d_flags values older than ->d_inode update.
So 'negative' variable is completely pointless these days.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 033f91a72bb5..53e859b80b4c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1631,7 +1631,6 @@ static int lookup_fast(struct nameidata *nd,
 	 */
 	if (nd->flags & LOOKUP_RCU) {
 		unsigned seq;
-		bool negative;
 		dentry = __d_lookup_rcu(parent, &nd->last, &seq);
 		if (unlikely(!dentry)) {
 			if (unlazy_walk(nd))
@@ -1644,7 +1643,6 @@ static int lookup_fast(struct nameidata *nd,
 		 * the dentry name information from lookup.
 		 */
 		*inode = d_backing_inode(dentry);
-		negative = d_is_negative(dentry);
 		if (unlikely(read_seqcount_retry(&dentry->d_seq, seq)))
 			return -ECHILD;
 
@@ -1665,12 +1663,15 @@ static int lookup_fast(struct nameidata *nd,
 			 * Note: do negative dentry check after revalidation in
 			 * case that drops it.
 			 */
-			if (unlikely(negative))
+			if (unlikely(!inode))
 				return -ENOENT;
 			path->mnt = mnt;
 			path->dentry = dentry;
 			if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
 				return 1;
+			if (unlazy_child(nd, dentry, seq))
+				return -ECHILD;
+			return handle_mounts(nd, dentry, path, inode, seqp);
 		}
 		if (unlazy_child(nd, dentry, seq))
 			return -ECHILD;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 12/34] teach handle_mounts() to handle RCU mode
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (9 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 11/34] lookup_fast(): consolidate the RCU success case Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  1:16   ` [RFC][PATCH v2 13/34] lookup_fast(): take mount traversal into callers Al Viro
                     ` (21 subsequent siblings)
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

... and make the callers of __follow_mount_rcu() use handle_mounts().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 46 +++++++++++++++++-----------------------------
 1 file changed, 17 insertions(+), 29 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 53e859b80b4c..3215b0da6e91 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1393,6 +1393,18 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 
 	path->mnt = nd->path.mnt;
 	path->dentry = dentry;
+	if (nd->flags & LOOKUP_RCU) {
+		unsigned int seq = *seqp;
+		if (unlikely(!*inode))
+			return -ENOENT;
+		if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
+			return 1;
+		if (unlazy_child(nd, dentry, seq))
+			return -ECHILD;
+		// *path might've been clobbered by __follow_mount_rcu()
+		path->mnt = nd->path.mnt;
+		path->dentry = dentry;
+	}
 	ret = follow_managed(path, nd);
 	if (likely(ret >= 0)) {
 		*inode = d_backing_inode(path->dentry);
@@ -1620,7 +1632,6 @@ static int lookup_fast(struct nameidata *nd,
 		       struct path *path, struct inode **inode,
 		       unsigned *seqp)
 {
-	struct vfsmount *mnt = nd->path.mnt;
 	struct dentry *dentry, *parent = nd->path.dentry;
 	int status = 1;
 
@@ -1658,21 +1669,8 @@ static int lookup_fast(struct nameidata *nd,
 
 		*seqp = seq;
 		status = d_revalidate(dentry, nd->flags);
-		if (likely(status > 0)) {
-			/*
-			 * Note: do negative dentry check after revalidation in
-			 * case that drops it.
-			 */
-			if (unlikely(!inode))
-				return -ENOENT;
-			path->mnt = mnt;
-			path->dentry = dentry;
-			if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
-				return 1;
-			if (unlazy_child(nd, dentry, seq))
-				return -ECHILD;
+		if (likely(status > 0))
 			return handle_mounts(nd, dentry, path, inode, seqp);
-		}
 		if (unlazy_child(nd, dentry, seq))
 			return -ECHILD;
 		if (unlikely(status == -ECHILD))
@@ -2361,21 +2359,11 @@ static int handle_lookup_down(struct nameidata *nd)
 	unsigned seq = nd->seq;
 	int err;
 
-	if (nd->flags & LOOKUP_RCU) {
-		/*
-		 * don't bother with unlazy_walk on failure - we are
-		 * at the very beginning of walk, so we lose nothing
-		 * if we simply redo everything in non-RCU mode
-		 */
-		path = nd->path;
-		if (unlikely(!__follow_mount_rcu(nd, &path, &inode, &seq)))
-			return -ECHILD;
-	} else {
+	if (!(nd->flags & LOOKUP_RCU))
 		dget(nd->path.dentry);
-		err = handle_mounts(nd, nd->path.dentry, &path, &inode, &seq);
-		if (unlikely(err < 0))
-			return err;
-	}
+	err = handle_mounts(nd, nd->path.dentry, &path, &inode, &seq);
+	if (unlikely(err < 0))
+		return err;
 	path_to_nameidata(&path, nd);
 	nd->inode = inode;
 	nd->seq = seq;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 13/34] lookup_fast(): take mount traversal into callers
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (10 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 12/34] teach handle_mounts() to handle RCU mode Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  1:16   ` [RFC][PATCH v2 14/34] new step_into() flag: WALK_NOFOLLOW Al Viro
                     ` (20 subsequent siblings)
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Current calling conventions: -E... on error, 0 on cache miss,
result of handle_mounts(nd, dentry, path, inode, seqp) on
success.  Turn that into returning ERR_PTR(-E...), NULL and dentry
resp.; deal with handle_mounts() in the callers.  The thing
is, they already do that in cache miss handling case, so we
just need to supply dentry to them and unify the mount traversal
in those cases.  Fewer arguments that way, and we get closer
to merging handle_mounts() and step_into().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 50 ++++++++++++++++++++++++--------------------------
 1 file changed, 24 insertions(+), 26 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3215b0da6e91..4c9b633e1981 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1628,9 +1628,9 @@ static struct dentry *__lookup_hash(const struct qstr *name,
 	return dentry;
 }
 
-static int lookup_fast(struct nameidata *nd,
-		       struct path *path, struct inode **inode,
-		       unsigned *seqp)
+static struct dentry *lookup_fast(struct nameidata *nd,
+				  struct inode **inode,
+			          unsigned *seqp)
 {
 	struct dentry *dentry, *parent = nd->path.dentry;
 	int status = 1;
@@ -1645,8 +1645,8 @@ static int lookup_fast(struct nameidata *nd,
 		dentry = __d_lookup_rcu(parent, &nd->last, &seq);
 		if (unlikely(!dentry)) {
 			if (unlazy_walk(nd))
-				return -ECHILD;
-			return 0;
+				return ERR_PTR(-ECHILD);
+			return NULL;
 		}
 
 		/*
@@ -1655,7 +1655,7 @@ static int lookup_fast(struct nameidata *nd,
 		 */
 		*inode = d_backing_inode(dentry);
 		if (unlikely(read_seqcount_retry(&dentry->d_seq, seq)))
-			return -ECHILD;
+			return ERR_PTR(-ECHILD);
 
 		/*
 		 * This sequence count validates that the parent had no
@@ -1665,30 +1665,30 @@ static int lookup_fast(struct nameidata *nd,
 		 *  enough, we can use __read_seqcount_retry here.
 		 */
 		if (unlikely(__read_seqcount_retry(&parent->d_seq, nd->seq)))
-			return -ECHILD;
+			return ERR_PTR(-ECHILD);
 
 		*seqp = seq;
 		status = d_revalidate(dentry, nd->flags);
 		if (likely(status > 0))
-			return handle_mounts(nd, dentry, path, inode, seqp);
+			return dentry;
 		if (unlazy_child(nd, dentry, seq))
-			return -ECHILD;
+			return ERR_PTR(-ECHILD);
 		if (unlikely(status == -ECHILD))
 			/* we'd been told to redo it in non-rcu mode */
 			status = d_revalidate(dentry, nd->flags);
 	} else {
 		dentry = __d_lookup(parent, &nd->last);
 		if (unlikely(!dentry))
-			return 0;
+			return NULL;
 		status = d_revalidate(dentry, nd->flags);
 	}
 	if (unlikely(status <= 0)) {
 		if (!status)
 			d_invalidate(dentry);
 		dput(dentry);
-		return status;
+		return ERR_PTR(status);
 	}
-	return handle_mounts(nd, dentry, path, inode, seqp);
+	return dentry;
 }
 
 /* Fast lookup failed, do it the slow way */
@@ -1874,19 +1874,18 @@ static int walk_component(struct nameidata *nd, int flags)
 			put_link(nd);
 		return err;
 	}
-	err = lookup_fast(nd, &path, &inode, &seq);
-	if (unlikely(err <= 0)) {
-		if (err < 0)
-			return err;
+	dentry = lookup_fast(nd, &inode, &seq);
+	if (IS_ERR(dentry))
+		return PTR_ERR(dentry);
+	if (unlikely(!dentry)) {
 		dentry = lookup_slow(&nd->last, nd->path.dentry, nd->flags);
 		if (IS_ERR(dentry))
 			return PTR_ERR(dentry);
-
-		err = handle_mounts(nd, dentry, &path, &inode, &seq);
-		if (unlikely(err < 0))
-			return err;
 	}
 
+	err = handle_mounts(nd, dentry, &path, &inode, &seq);
+	if (unlikely(err < 0))
+			return err;
 	return step_into(nd, &path, flags, inode, seq);
 }
 
@@ -3304,13 +3303,12 @@ static int do_last(struct nameidata *nd,
 		if (nd->last.name[nd->last.len])
 			nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 		/* we _can_ be in RCU mode here */
-		error = lookup_fast(nd, &path, &inode, &seq);
-		if (likely(error > 0))
+		dentry = lookup_fast(nd, &inode, &seq);
+		if (IS_ERR(dentry))
+			return PTR_ERR(dentry);
+		if (likely(dentry))
 			goto finish_lookup;
 
-		if (error < 0)
-			return error;
-
 		BUG_ON(nd->inode != dir->d_inode);
 		BUG_ON(nd->flags & LOOKUP_RCU);
 	} else {
@@ -3385,10 +3383,10 @@ static int do_last(struct nameidata *nd,
 		got_write = false;
 	}
 
+finish_lookup:
 	error = handle_mounts(nd, dentry, &path, &inode, &seq);
 	if (unlikely(error < 0))
 		return error;
-finish_lookup:
 	error = step_into(nd, &path, 0, inode, seq);
 	if (unlikely(error))
 		return error;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 14/34] new step_into() flag: WALK_NOFOLLOW
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (11 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 13/34] lookup_fast(): take mount traversal into callers Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  2:14     ` Linus Torvalds
  2020-02-23  1:16   ` [RFC][PATCH v2 15/34] fold handle_mounts() into step_into() Al Viro
                     ` (19 subsequent siblings)
  32 siblings, 1 reply; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Tells step_into() not to follow symlinks, regardless of LOOKUP_FOLLOW.
Allows to switch handle_lookup_down() to of step_into(), getting
all follow_managed() and step_into() calls paired.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 4c9b633e1981..fe48a8d00ae5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1827,7 +1827,7 @@ static int pick_link(struct nameidata *nd, struct path *link,
 	return 1;
 }
 
-enum {WALK_FOLLOW = 1, WALK_MORE = 2};
+enum {WALK_FOLLOW = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
 
 /*
  * Do we need to follow links? We _really_ want to be able
@@ -1841,7 +1841,8 @@ static inline int step_into(struct nameidata *nd, struct path *path,
 	if (!(flags & WALK_MORE) && nd->depth)
 		put_link(nd);
 	if (likely(!d_is_symlink(path->dentry)) ||
-	   !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW)) {
+	   !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW) ||
+	   flags & WALK_NOFOLLOW) {
 		/* not a symlink or should not follow */
 		path_to_nameidata(path, nd);
 		nd->inode = inode;
@@ -2363,10 +2364,7 @@ static int handle_lookup_down(struct nameidata *nd)
 	err = handle_mounts(nd, nd->path.dentry, &path, &inode, &seq);
 	if (unlikely(err < 0))
 		return err;
-	path_to_nameidata(&path, nd);
-	nd->inode = inode;
-	nd->seq = seq;
-	return 0;
+	return step_into(nd, &path, WALK_NOFOLLOW, inode, seq);
 }
 
 /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 15/34] fold handle_mounts() into step_into()
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (12 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 14/34] new step_into() flag: WALK_NOFOLLOW Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  1:16   ` [RFC][PATCH v2 16/34] LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat() Al Viro
                     ` (18 subsequent siblings)
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

The following is true:
	* calls of handle_mounts() and step_into() are always
paired in sequences like
	err = handle_mounts(nd, dentry, &path, &inode, &seq);
	if (unlikely(err < 0))
		return err;
	err = step_into(nd, &path, flags, inode, seq);
	* in all such sequences path is uninitialized before and
unused after this pair of calls
	* in all such sequences inode and seq are unused afterwards.

So the call of handle_mounts() can be shifted inside step_into(),
turning 'path' into a local variable in the combined function.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 41 +++++++++++++++--------------------------
 1 file changed, 15 insertions(+), 26 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index fe48a8d00ae5..28835ee7168a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1835,31 +1835,35 @@ enum {WALK_FOLLOW = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
  * so we keep a cache of "no, this doesn't need follow_link"
  * for the common case.
  */
-static inline int step_into(struct nameidata *nd, struct path *path,
-			    int flags, struct inode *inode, unsigned seq)
+static int step_into(struct nameidata *nd, int flags,
+		     struct dentry *dentry, struct inode *inode, unsigned seq)
 {
+	struct path path;
+	int err = handle_mounts(nd, dentry, &path, &inode, &seq);
+
+	if (err < 0)
+		return err;
 	if (!(flags & WALK_MORE) && nd->depth)
 		put_link(nd);
-	if (likely(!d_is_symlink(path->dentry)) ||
+	if (likely(!d_is_symlink(path.dentry)) ||
 	   !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW) ||
 	   flags & WALK_NOFOLLOW) {
 		/* not a symlink or should not follow */
-		path_to_nameidata(path, nd);
+		path_to_nameidata(&path, nd);
 		nd->inode = inode;
 		nd->seq = seq;
 		return 0;
 	}
 	/* make sure that d_is_symlink above matches inode */
 	if (nd->flags & LOOKUP_RCU) {
-		if (read_seqcount_retry(&path->dentry->d_seq, seq))
+		if (read_seqcount_retry(&path.dentry->d_seq, seq))
 			return -ECHILD;
 	}
-	return pick_link(nd, path, inode, seq);
+	return pick_link(nd, &path, inode, seq);
 }
 
 static int walk_component(struct nameidata *nd, int flags)
 {
-	struct path path;
 	struct dentry *dentry;
 	struct inode *inode;
 	unsigned seq;
@@ -1883,11 +1887,7 @@ static int walk_component(struct nameidata *nd, int flags)
 		if (IS_ERR(dentry))
 			return PTR_ERR(dentry);
 	}
-
-	err = handle_mounts(nd, dentry, &path, &inode, &seq);
-	if (unlikely(err < 0))
-			return err;
-	return step_into(nd, &path, flags, inode, seq);
+	return step_into(nd, flags, dentry, inode, seq);
 }
 
 /*
@@ -2354,17 +2354,10 @@ static inline int lookup_last(struct nameidata *nd)
 
 static int handle_lookup_down(struct nameidata *nd)
 {
-	struct path path;
-	struct inode *inode = nd->inode;
-	unsigned seq = nd->seq;
-	int err;
-
 	if (!(nd->flags & LOOKUP_RCU))
 		dget(nd->path.dentry);
-	err = handle_mounts(nd, nd->path.dentry, &path, &inode, &seq);
-	if (unlikely(err < 0))
-		return err;
-	return step_into(nd, &path, WALK_NOFOLLOW, inode, seq);
+	return step_into(nd, WALK_NOFOLLOW,
+			nd->path.dentry, nd->inode, nd->seq);
 }
 
 /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
@@ -3283,7 +3276,6 @@ static int do_last(struct nameidata *nd,
 	int acc_mode = op->acc_mode;
 	unsigned seq;
 	struct inode *inode;
-	struct path path;
 	struct dentry *dentry;
 	int error;
 
@@ -3382,10 +3374,7 @@ static int do_last(struct nameidata *nd,
 	}
 
 finish_lookup:
-	error = handle_mounts(nd, dentry, &path, &inode, &seq);
-	if (unlikely(error < 0))
-		return error;
-	error = step_into(nd, &path, 0, inode, seq);
+	error = step_into(nd, 0, dentry, inode, seq);
 	if (unlikely(error))
 		return error;
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 16/34] LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat()
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (13 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 15/34] fold handle_mounts() into step_into() Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  1:16   ` [RFC][PATCH v2 17/34] expand the only remaining call of path_lookup_conditional() Al Viro
                     ` (17 subsequent siblings)
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

New LOOKUP flag, telling path_lookupat() to act as path_mountpointat().
IOW, traverse mounts at the final point and skip revalidation of the
location where it ends up.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/autofs/dev-ioctl.c |  6 ++--
 fs/internal.h         |  1 -
 fs/namei.c            | 89 ++++-----------------------------------------------
 fs/namespace.c        |  4 +--
 include/linux/namei.h |  2 +-
 5 files changed, 12 insertions(+), 90 deletions(-)

diff --git a/fs/autofs/dev-ioctl.c b/fs/autofs/dev-ioctl.c
index a3cdb0036c5d..f3a0f412b43b 100644
--- a/fs/autofs/dev-ioctl.c
+++ b/fs/autofs/dev-ioctl.c
@@ -186,7 +186,7 @@ static int find_autofs_mount(const char *pathname,
 	struct path path;
 	int err;
 
-	err = kern_path_mountpoint(AT_FDCWD, pathname, &path, 0);
+	err = kern_path(pathname, LOOKUP_MOUNTPOINT, &path);
 	if (err)
 		return err;
 	err = -ENOENT;
@@ -519,8 +519,8 @@ static int autofs_dev_ioctl_ismountpoint(struct file *fp,
 
 	if (!fp || param->ioctlfd == -1) {
 		if (autofs_type_any(type))
-			err = kern_path_mountpoint(AT_FDCWD,
-						   name, &path, LOOKUP_FOLLOW);
+			err = kern_path(name, LOOKUP_FOLLOW | LOOKUP_MOUNTPOINT,
+					&path);
 		else
 			err = find_autofs_mount(name, &path,
 						test_by_type, &type);
diff --git a/fs/internal.h b/fs/internal.h
index f3f280b952a3..b108a8eb75ca 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -61,7 +61,6 @@ extern int finish_clean_context(struct fs_context *fc);
  */
 extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
 			   struct path *path, struct path *root);
-extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
 extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
 			   const char *, unsigned int, struct path *);
 long do_mknodat(int dfd, const char __user *filename, umode_t mode,
diff --git a/fs/namei.c b/fs/namei.c
index 28835ee7168a..6f1f46b931a6 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2382,6 +2382,10 @@ static int path_lookupat(struct nameidata *nd, unsigned flags, struct path *path
 	if (!err && nd->flags & LOOKUP_DIRECTORY)
 		if (!d_can_lookup(nd->path.dentry))
 			err = -ENOTDIR;
+	if (!err && unlikely(nd->flags & LOOKUP_MOUNTPOINT)) {
+		err = handle_lookup_down(nd);
+		nd->flags &= ~LOOKUP_JUMPED; // no d_weak_revalidate(), please...
+	}
 	if (!err) {
 		*path = nd->path;
 		nd->path.mnt = NULL;
@@ -2410,7 +2414,8 @@ int filename_lookup(int dfd, struct filename *name, unsigned flags,
 		retval = path_lookupat(&nd, flags | LOOKUP_REVAL, path);
 
 	if (likely(!retval))
-		audit_inode(name, path->dentry, 0);
+		audit_inode(name, path->dentry,
+			    flags & LOOKUP_MOUNTPOINT ? AUDIT_INODE_NOEVAL : 0);
 	restore_nameidata();
 	putname(name);
 	return retval;
@@ -2688,88 +2693,6 @@ int user_path_at_empty(int dfd, const char __user *name, unsigned flags,
 }
 EXPORT_SYMBOL(user_path_at_empty);
 
-/**
- * path_mountpoint - look up a path to be umounted
- * @nd:		lookup context
- * @flags:	lookup flags
- * @path:	pointer to container for result
- *
- * Look up the given name, but don't attempt to revalidate the last component.
- * Returns 0 and "path" will be valid on success; Returns error otherwise.
- */
-static int
-path_mountpoint(struct nameidata *nd, unsigned flags, struct path *path)
-{
-	const char *s = path_init(nd, flags);
-	int err;
-
-	while (!(err = link_path_walk(s, nd)) &&
-		(err = lookup_last(nd)) > 0) {
-		s = trailing_symlink(nd);
-	}
-	if (!err && (nd->flags & LOOKUP_RCU))
-		err = unlazy_walk(nd);
-	if (!err)
-		err = handle_lookup_down(nd);
-	if (!err) {
-		*path = nd->path;
-		nd->path.mnt = NULL;
-		nd->path.dentry = NULL;
-	}
-	terminate_walk(nd);
-	return err;
-}
-
-static int
-filename_mountpoint(int dfd, struct filename *name, struct path *path,
-			unsigned int flags)
-{
-	struct nameidata nd;
-	int error;
-	if (IS_ERR(name))
-		return PTR_ERR(name);
-	set_nameidata(&nd, dfd, name);
-	error = path_mountpoint(&nd, flags | LOOKUP_RCU, path);
-	if (unlikely(error == -ECHILD))
-		error = path_mountpoint(&nd, flags, path);
-	if (unlikely(error == -ESTALE))
-		error = path_mountpoint(&nd, flags | LOOKUP_REVAL, path);
-	if (likely(!error))
-		audit_inode(name, path->dentry, AUDIT_INODE_NOEVAL);
-	restore_nameidata();
-	putname(name);
-	return error;
-}
-
-/**
- * user_path_mountpoint_at - lookup a path from userland in order to umount it
- * @dfd:	directory file descriptor
- * @name:	pathname from userland
- * @flags:	lookup flags
- * @path:	pointer to container to hold result
- *
- * A umount is a special case for path walking. We're not actually interested
- * in the inode in this situation, and ESTALE errors can be a problem. We
- * simply want track down the dentry and vfsmount attached at the mountpoint
- * and avoid revalidating the last component.
- *
- * Returns 0 and populates "path" on success.
- */
-int
-user_path_mountpoint_at(int dfd, const char __user *name, unsigned int flags,
-			struct path *path)
-{
-	return filename_mountpoint(dfd, getname(name), path, flags);
-}
-
-int
-kern_path_mountpoint(int dfd, const char *name, struct path *path,
-			unsigned int flags)
-{
-	return filename_mountpoint(dfd, getname_kernel(name), path, flags);
-}
-EXPORT_SYMBOL(kern_path_mountpoint);
-
 int __check_sticky(struct inode *dir, struct inode *inode)
 {
 	kuid_t fsuid = current_fsuid();
diff --git a/fs/namespace.c b/fs/namespace.c
index a9e556224945..ef3f2a33992c 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1669,7 +1669,7 @@ int ksys_umount(char __user *name, int flags)
 	struct path path;
 	struct mount *mnt;
 	int retval;
-	int lookup_flags = 0;
+	int lookup_flags = LOOKUP_MOUNTPOINT;
 
 	if (flags & ~(MNT_FORCE | MNT_DETACH | MNT_EXPIRE | UMOUNT_NOFOLLOW))
 		return -EINVAL;
@@ -1680,7 +1680,7 @@ int ksys_umount(char __user *name, int flags)
 	if (!(flags & UMOUNT_NOFOLLOW))
 		lookup_flags |= LOOKUP_FOLLOW;
 
-	retval = user_path_mountpoint_at(AT_FDCWD, name, lookup_flags, &path);
+	retval = user_path_at(AT_FDCWD, name, lookup_flags, &path);
 	if (retval)
 		goto out;
 	mnt = real_mount(path.mnt);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 0dd980d7318f..d9576a051808 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -23,6 +23,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
 #define LOOKUP_AUTOMOUNT	0x0004  /* force terminal automount */
 #define LOOKUP_EMPTY		0x4000	/* accept empty path [user_... only] */
 #define LOOKUP_DOWN		0x8000	/* follow mounts in the starting point */
+#define LOOKUP_MOUNTPOINT	0x0080	/* follow mounts in the end */
 
 #define LOOKUP_REVAL		0x0020	/* tell ->d_revalidate() to trust no cache */
 #define LOOKUP_RCU		0x0040	/* RCU pathwalk mode; semi-internal */
@@ -64,7 +65,6 @@ extern struct dentry *kern_path_create(int, const char *, struct path *, unsigne
 extern struct dentry *user_path_create(int, const char __user *, struct path *, unsigned int);
 extern void done_path_create(struct path *, struct dentry *);
 extern struct dentry *kern_path_locked(const char *, struct path *);
-extern int kern_path_mountpoint(int, const char *, struct path *, unsigned int);
 
 extern struct dentry *try_lookup_one_len(const char *, struct dentry *, int);
 extern struct dentry *lookup_one_len(const char *, struct dentry *, int);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 17/34] expand the only remaining call of path_lookup_conditional()
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (14 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 16/34] LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat() Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  1:16   ` [RFC][PATCH v2 18/34] merging pick_link() with get_link(), part 1 Al Viro
                     ` (16 subsequent siblings)
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 6f1f46b931a6..3eed5784942a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -858,13 +858,6 @@ static int set_root(struct nameidata *nd)
 	return 0;
 }
 
-static void path_put_conditional(struct path *path, struct nameidata *nd)
-{
-	dput(path->dentry);
-	if (path->mnt != nd->path.mnt)
-		mntput(path->mnt);
-}
-
 static inline void path_to_nameidata(const struct path *path,
 					struct nameidata *nd)
 {
@@ -1312,8 +1305,11 @@ static int follow_managed(struct path *path, struct nameidata *nd)
 		ret = 1;
 	if (ret > 0 && unlikely(d_flags_negative(flags)))
 		ret = -ENOENT;
-	if (unlikely(ret < 0))
-		path_put_conditional(path, nd);
+	if (unlikely(ret < 0)) {
+		dput(path->dentry);
+		if (path->mnt != nd->path.mnt)
+			mntput(path->mnt);
+	}
 	return ret;
 }
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 18/34] merging pick_link() with get_link(), part 1
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (15 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 17/34] expand the only remaining call of path_lookup_conditional() Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  1:16   ` [RFC][PATCH v2 19/34] merging pick_link() with get_link(), part 2 Al Viro
                     ` (15 subsequent siblings)
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Move restoring LOOKUP_PARENT and zeroing nd->stack.name[0] past
the call of get_link() (nothing _currently_ uses them in there).
That allows to moved the call of may_follow_link() into get_link()
as well, since now the presence of LOOKUP_PARENT distinguishes
the callers from each other (link_path_walk() has it, trailing_symlink()
doesn't).

Preparations for folding trailing_symlink() into callers (lookup_last()
and do_last()) and changing the calling conventions of those.  Next
stage after that will have get_link() call migrate into walk_component(),
then - into step_into().  It's tricky enough to warrant doing that
in stages, unfortunately...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3eed5784942a..3594f6a4998b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1115,6 +1115,12 @@ const char *get_link(struct nameidata *nd)
 	int error;
 	const char *res;
 
+	if (!(nd->flags & LOOKUP_PARENT)) {
+		error = may_follow_link(nd);
+		if (unlikely(error))
+			return ERR_PTR(error);
+	}
+
 	if (unlikely(nd->flags & LOOKUP_NO_SYMLINKS))
 		return ERR_PTR(-ELOOP);
 
@@ -2329,13 +2335,9 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 
 static const char *trailing_symlink(struct nameidata *nd)
 {
-	const char *s;
-	int error = may_follow_link(nd);
-	if (unlikely(error))
-		return ERR_PTR(error);
+	const char *s = get_link(nd);
 	nd->flags |= LOOKUP_PARENT;
 	nd->stack[0].name = NULL;
-	s = get_link(nd);
 	return s ? s : "";
 }
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 19/34] merging pick_link() with get_link(), part 2
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (16 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 18/34] merging pick_link() with get_link(), part 1 Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  1:16   ` [RFC][PATCH v2 20/34] merging pick_link() with get_link(), part 3 Al Viro
                     ` (14 subsequent siblings)
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Fold trailing_symlink() into lookup_last() and do_last(), change
the calling conventions of those two.  Rules change:
	success, we are done => NULL instead of 0
	error	=> ERR_PTR(-E...) instead of -E...
	got a symlink to follow => return the path to be followed instead of 1

The loops calling those (in path_lookupat() and path_openat()) adjusted.

A subtle change of control flow here: originally a pure-jump trailing
symlink ("/" or procfs one) would've passed through the upper level
loop once more, with "" for path to traverse.  That would've brought
us back to the lookup_last/do_last entry and we would've hit LAST_BIND
case (LAST_BIND left from get_link() called by trailing_symlink())
and pretty much skip to the point right after where we'd left the
sucker back when we picked that trailing symlink.

Now we don't bother with that extra pass through the upper level
loop - if get_link() says "I've just done a pure jump, nothing
else to do", we just treat that as non-symlink case.

Boilerplate added on that step will go away shortly - it'll migrate
into walk_component() and then to step_into(), collapsing into the
change of calling conventions for those.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 68 ++++++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 40 insertions(+), 28 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3594f6a4998b..888b1e5b994e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2333,21 +2333,26 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 	return s;
 }
 
-static const char *trailing_symlink(struct nameidata *nd)
-{
-	const char *s = get_link(nd);
-	nd->flags |= LOOKUP_PARENT;
-	nd->stack[0].name = NULL;
-	return s ? s : "";
-}
-
-static inline int lookup_last(struct nameidata *nd)
+static inline const char *lookup_last(struct nameidata *nd)
 {
+	int err;
 	if (nd->last_type == LAST_NORM && nd->last.name[nd->last.len])
 		nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 
 	nd->flags &= ~LOOKUP_PARENT;
-	return walk_component(nd, 0);
+	err = walk_component(nd, 0);
+	if (unlikely(err)) {
+		const char *s;
+		if (err < 0)
+			return PTR_ERR(err);
+		s = get_link(nd);
+		if (s) {
+			nd->flags |= LOOKUP_PARENT;
+			nd->stack[0].name = NULL;
+			return s;
+		}
+	}
+	return NULL;
 }
 
 static int handle_lookup_down(struct nameidata *nd)
@@ -2370,10 +2375,9 @@ static int path_lookupat(struct nameidata *nd, unsigned flags, struct path *path
 			s = ERR_PTR(err);
 	}
 
-	while (!(err = link_path_walk(s, nd))
-		&& ((err = lookup_last(nd)) > 0)) {
-		s = trailing_symlink(nd);
-	}
+	while (!(err = link_path_walk(s, nd)) &&
+	       (s = lookup_last(nd)) != NULL)
+		;
 	if (!err)
 		err = complete_walk(nd);
 
@@ -3185,7 +3189,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 /*
  * Handle the last step of open()
  */
-static int do_last(struct nameidata *nd,
+static const char *do_last(struct nameidata *nd,
 		   struct file *file, const struct open_flags *op)
 {
 	struct dentry *dir = nd->path.dentry;
@@ -3206,7 +3210,7 @@ static int do_last(struct nameidata *nd,
 	if (nd->last_type != LAST_NORM) {
 		error = handle_dots(nd, nd->last_type);
 		if (unlikely(error))
-			return error;
+			return ERR_PTR(error);
 		goto finish_open;
 	}
 
@@ -3216,7 +3220,7 @@ static int do_last(struct nameidata *nd,
 		/* we _can_ be in RCU mode here */
 		dentry = lookup_fast(nd, &inode, &seq);
 		if (IS_ERR(dentry))
-			return PTR_ERR(dentry);
+			return ERR_CAST(dentry);
 		if (likely(dentry))
 			goto finish_lookup;
 
@@ -3231,12 +3235,12 @@ static int do_last(struct nameidata *nd,
 		 */
 		error = complete_walk(nd);
 		if (error)
-			return error;
+			return ERR_PTR(error);
 
 		audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
 		/* trailing slashes? */
 		if (unlikely(nd->last.name[nd->last.len]))
-			return -EISDIR;
+			return ERR_PTR(-EISDIR);
 	}
 
 	if (open_flag & (O_CREAT | O_TRUNC | O_WRONLY | O_RDWR)) {
@@ -3296,18 +3300,28 @@ static int do_last(struct nameidata *nd,
 
 finish_lookup:
 	error = step_into(nd, 0, dentry, inode, seq);
-	if (unlikely(error))
-		return error;
+	if (unlikely(error)) {
+		const char *s;
+		if (error < 0)
+			return ERR_PTR(error);
+		s = get_link(nd);
+		if (s) {
+			nd->flags |= LOOKUP_PARENT;
+			nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
+			nd->stack[0].name = NULL;
+			return s;
+		}
+	}
 
 	if (unlikely((open_flag & (O_EXCL | O_CREAT)) == (O_EXCL | O_CREAT))) {
 		audit_inode(nd->name, nd->path.dentry, 0);
-		return -EEXIST;
+		return ERR_PTR(-EEXIST);
 	}
 finish_open:
 	/* Why this, you ask?  _Now_ we might have grown LOOKUP_JUMPED... */
 	error = complete_walk(nd);
 	if (error)
-		return error;
+		return ERR_PTR(error);
 	audit_inode(nd->name, nd->path.dentry, 0);
 	if (open_flag & O_CREAT) {
 		error = -EISDIR;
@@ -3349,7 +3363,7 @@ static int do_last(struct nameidata *nd,
 	}
 	if (got_write)
 		mnt_drop_write(nd->path.mnt);
-	return error;
+	return ERR_PTR(error);
 }
 
 struct dentry *vfs_tmpfile(struct dentry *dentry, umode_t mode, int open_flag)
@@ -3452,10 +3466,8 @@ static struct file *path_openat(struct nameidata *nd,
 	} else {
 		const char *s = path_init(nd, flags);
 		while (!(error = link_path_walk(s, nd)) &&
-			(error = do_last(nd, file, op)) > 0) {
-			nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
-			s = trailing_symlink(nd);
-		}
+		       (s = do_last(nd, file, op)) != NULL)
+			;
 		terminate_walk(nd);
 	}
 	if (likely(!error)) {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 20/34] merging pick_link() with get_link(), part 3
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (17 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 19/34] merging pick_link() with get_link(), part 2 Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  1:16   ` [RFC][PATCH v2 21/34] merging pick_link() with get_link(), part 4 Al Viro
                     ` (13 subsequent siblings)
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

After a pure jump ("/" or procfs-style symlink) we don't need to
hold the link anymore.  link_path_walk() dropped it if such case
had been detected, lookup_last/do_last() (i.e. old trailing_symlink())
left it on the stack - it ended up calling terminate_walk() shortly
anyway, which would've purged the entire stack.

Do it in get_link() itself instead.  Simpler logics that way...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 888b1e5b994e..46cd3e5cb052 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1154,7 +1154,9 @@ const char *get_link(struct nameidata *nd)
 		} else {
 			res = get(dentry, inode, &last->done);
 		}
-		if (IS_ERR_OR_NULL(res))
+		if (!res)
+			goto all_done;
+		if (IS_ERR(res))
 			return res;
 	}
 	if (*res == '/') {
@@ -1164,9 +1166,11 @@ const char *get_link(struct nameidata *nd)
 		while (unlikely(*++res == '/'))
 			;
 	}
-	if (!*res)
-		res = NULL;
-	return res;
+	if (*res)
+		return res;
+all_done: // pure jump
+	put_link(nd);
+	return NULL;
 }
 
 /*
@@ -2211,11 +2215,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 
 			if (IS_ERR(s))
 				return PTR_ERR(s);
-			err = 0;
-			if (unlikely(!s)) {
-				/* jumped */
-				put_link(nd);
-			} else {
+			if (likely(s)) {
 				nd->stack[nd->depth - 1].name = name;
 				name = s;
 				continue;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 21/34] merging pick_link() with get_link(), part 4
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (18 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 20/34] merging pick_link() with get_link(), part 3 Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  2:19     ` Linus Torvalds
  2020-02-23  1:16   ` [RFC][PATCH v2 22/34] merging pick_link() with get_link(), part 5 Al Viro
                     ` (12 subsequent siblings)
  32 siblings, 1 reply; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Move the call of get_link() into walk_component().  Change the
calling conventions for walk_component() to returning the link
body to follow (if any).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 60 +++++++++++++++++++++++++++---------------------------------
 1 file changed, 27 insertions(+), 33 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 46cd3e5cb052..09e9f9969fd3 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1868,7 +1868,7 @@ static int step_into(struct nameidata *nd, int flags,
 	return pick_link(nd, &path, inode, seq);
 }
 
-static int walk_component(struct nameidata *nd, int flags)
+static const char *walk_component(struct nameidata *nd, int flags)
 {
 	struct dentry *dentry;
 	struct inode *inode;
@@ -1883,17 +1883,23 @@ static int walk_component(struct nameidata *nd, int flags)
 		err = handle_dots(nd, nd->last_type);
 		if (!(flags & WALK_MORE) && nd->depth)
 			put_link(nd);
-		return err;
+		return ERR_PTR(err);
 	}
 	dentry = lookup_fast(nd, &inode, &seq);
 	if (IS_ERR(dentry))
-		return PTR_ERR(dentry);
+		return ERR_CAST(dentry);
 	if (unlikely(!dentry)) {
 		dentry = lookup_slow(&nd->last, nd->path.dentry, nd->flags);
 		if (IS_ERR(dentry))
-			return PTR_ERR(dentry);
+			return ERR_CAST(dentry);
 	}
-	return step_into(nd, flags, dentry, inode, seq);
+	err = step_into(nd, flags, dentry, inode, seq);
+	if (!err)
+		return NULL;
+	else if (err > 0)
+		return get_link(nd);
+	else
+		return ERR_PTR(err);
 }
 
 /*
@@ -2145,6 +2151,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 
 	/* At this point we know we have a real path component. */
 	for(;;) {
+		const char *link;
 		u64 hash_len;
 		int type;
 
@@ -2202,24 +2209,18 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 			if (!name)
 				return 0;
 			/* last component of nested symlink */
-			err = walk_component(nd, WALK_FOLLOW);
+			link = walk_component(nd, WALK_FOLLOW);
 		} else {
 			/* not the last component */
-			err = walk_component(nd, WALK_FOLLOW | WALK_MORE);
+			link = walk_component(nd, WALK_FOLLOW | WALK_MORE);
 		}
-		if (err < 0)
-			return err;
-
-		if (err) {
-			const char *s = get_link(nd);
-
-			if (IS_ERR(s))
-				return PTR_ERR(s);
-			if (likely(s)) {
-				nd->stack[nd->depth - 1].name = name;
-				name = s;
-				continue;
-			}
+		if (unlikely(link)) {
+			if (IS_ERR(link))
+				return PTR_ERR(link);
+			/* a symlink to follow */
+			nd->stack[nd->depth - 1].name = name;
+			name = link;
+			continue;
 		}
 		if (unlikely(!d_can_lookup(nd->path.dentry))) {
 			if (nd->flags & LOOKUP_RCU) {
@@ -2335,24 +2336,17 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 
 static inline const char *lookup_last(struct nameidata *nd)
 {
-	int err;
+	const char *link;
 	if (nd->last_type == LAST_NORM && nd->last.name[nd->last.len])
 		nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 
 	nd->flags &= ~LOOKUP_PARENT;
-	err = walk_component(nd, 0);
-	if (unlikely(err)) {
-		const char *s;
-		if (err < 0)
-			return PTR_ERR(err);
-		s = get_link(nd);
-		if (s) {
-			nd->flags |= LOOKUP_PARENT;
-			nd->stack[0].name = NULL;
-			return s;
-		}
+	link = walk_component(nd, 0);
+	if (link) {
+		nd->flags |= LOOKUP_PARENT;
+		nd->stack[0].name = NULL;
 	}
-	return NULL;
+	return link;
 }
 
 static int handle_lookup_down(struct nameidata *nd)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 22/34] merging pick_link() with get_link(), part 5
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (19 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 21/34] merging pick_link() with get_link(), part 4 Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  2:22     ` Linus Torvalds
  2020-02-23  1:16   ` [RFC][PATCH v2 23/34] merging pick_link() with get_link(), part 6 Al Viro
                     ` (11 subsequent siblings)
  32 siblings, 1 reply; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

move get_link() call into step_into().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 45 +++++++++++++++++++--------------------------
 1 file changed, 19 insertions(+), 26 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 09e9f9969fd3..1e8548a547ff 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1841,14 +1841,14 @@ enum {WALK_FOLLOW = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
  * so we keep a cache of "no, this doesn't need follow_link"
  * for the common case.
  */
-static int step_into(struct nameidata *nd, int flags,
+static const char *step_into(struct nameidata *nd, int flags,
 		     struct dentry *dentry, struct inode *inode, unsigned seq)
 {
 	struct path path;
 	int err = handle_mounts(nd, dentry, &path, &inode, &seq);
 
 	if (err < 0)
-		return err;
+		return ERR_PTR(err);
 	if (!(flags & WALK_MORE) && nd->depth)
 		put_link(nd);
 	if (likely(!d_is_symlink(path.dentry)) ||
@@ -1858,14 +1858,18 @@ static int step_into(struct nameidata *nd, int flags,
 		path_to_nameidata(&path, nd);
 		nd->inode = inode;
 		nd->seq = seq;
-		return 0;
+		return NULL;
 	}
 	/* make sure that d_is_symlink above matches inode */
 	if (nd->flags & LOOKUP_RCU) {
 		if (read_seqcount_retry(&path.dentry->d_seq, seq))
-			return -ECHILD;
+			return ERR_PTR(-ECHILD);
 	}
-	return pick_link(nd, &path, inode, seq);
+	err = pick_link(nd, &path, inode, seq);
+	if (err > 0)
+		return get_link(nd);
+	else
+		return ERR_PTR(err);
 }
 
 static const char *walk_component(struct nameidata *nd, int flags)
@@ -1893,13 +1897,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 		if (IS_ERR(dentry))
 			return ERR_CAST(dentry);
 	}
-	err = step_into(nd, flags, dentry, inode, seq);
-	if (!err)
-		return NULL;
-	else if (err > 0)
-		return get_link(nd);
-	else
-		return ERR_PTR(err);
+	return step_into(nd, flags, dentry, inode, seq);
 }
 
 /*
@@ -2353,8 +2351,8 @@ static int handle_lookup_down(struct nameidata *nd)
 {
 	if (!(nd->flags & LOOKUP_RCU))
 		dget(nd->path.dentry);
-	return step_into(nd, WALK_NOFOLLOW,
-			nd->path.dentry, nd->inode, nd->seq);
+	return PTR_ERR(step_into(nd, WALK_NOFOLLOW,
+			nd->path.dentry, nd->inode, nd->seq));
 }
 
 /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
@@ -3196,6 +3194,7 @@ static const char *do_last(struct nameidata *nd,
 	unsigned seq;
 	struct inode *inode;
 	struct dentry *dentry;
+	const char *link;
 	int error;
 
 	nd->flags &= ~LOOKUP_PARENT;
@@ -3293,18 +3292,12 @@ static const char *do_last(struct nameidata *nd,
 	}
 
 finish_lookup:
-	error = step_into(nd, 0, dentry, inode, seq);
-	if (unlikely(error)) {
-		const char *s;
-		if (error < 0)
-			return ERR_PTR(error);
-		s = get_link(nd);
-		if (s) {
-			nd->flags |= LOOKUP_PARENT;
-			nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
-			nd->stack[0].name = NULL;
-			return s;
-		}
+	link = step_into(nd, 0, dentry, inode, seq);
+	if (unlikely(link)) {
+		nd->flags |= LOOKUP_PARENT;
+		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
+		nd->stack[0].name = NULL;
+		return link;
 	}
 
 	if (unlikely((open_flag & (O_EXCL | O_CREAT)) == (O_EXCL | O_CREAT))) {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 23/34] merging pick_link() with get_link(), part 6
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (20 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 22/34] merging pick_link() with get_link(), part 5 Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  2:24     ` Linus Torvalds
  2020-02-23  1:16   ` [RFC][PATCH v2 24/34] finally fold get_link() into pick_link() Al Viro
                     ` (10 subsequent siblings)
  32 siblings, 1 reply; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

move the only remaining call of get_link() into pick_link()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 1e8548a547ff..fef2c447219d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1793,14 +1793,14 @@ static inline int handle_dots(struct nameidata *nd, int type)
 	return 0;
 }
 
-static int pick_link(struct nameidata *nd, struct path *link,
+static const char *pick_link(struct nameidata *nd, struct path *link,
 		     struct inode *inode, unsigned seq)
 {
 	int error;
 	struct saved *last;
 	if (unlikely(nd->total_link_count++ >= MAXSYMLINKS)) {
 		path_to_nameidata(link, nd);
-		return -ELOOP;
+		return ERR_PTR(-ELOOP);
 	}
 	if (!(nd->flags & LOOKUP_RCU)) {
 		if (link->mnt == nd->path.mnt)
@@ -1821,7 +1821,7 @@ static int pick_link(struct nameidata *nd, struct path *link,
 		}
 		if (error) {
 			path_put(link);
-			return error;
+			return ERR_PTR(error);
 		}
 	}
 
@@ -1830,7 +1830,7 @@ static int pick_link(struct nameidata *nd, struct path *link,
 	clear_delayed_call(&last->done);
 	nd->link_inode = inode;
 	last->seq = seq;
-	return 1;
+	return get_link(nd);
 }
 
 enum {WALK_FOLLOW = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
@@ -1865,11 +1865,7 @@ static const char *step_into(struct nameidata *nd, int flags,
 		if (read_seqcount_retry(&path.dentry->d_seq, seq))
 			return ERR_PTR(-ECHILD);
 	}
-	err = pick_link(nd, &path, inode, seq);
-	if (err > 0)
-		return get_link(nd);
-	else
-		return ERR_PTR(err);
+	return pick_link(nd, &path, inode, seq);
 }
 
 static const char *walk_component(struct nameidata *nd, int flags)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 24/34] finally fold get_link() into pick_link()
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (21 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 23/34] merging pick_link() with get_link(), part 6 Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  2:31     ` Linus Torvalds
  2020-02-23  1:16   ` [RFC][PATCH v2 25/34] massage __follow_mount_rcu() a bit Al Viro
                     ` (9 subsequent siblings)
  32 siblings, 1 reply; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

kill nd->link_inode, while we are at it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 135 ++++++++++++++++++++++++++++---------------------------------
 1 file changed, 61 insertions(+), 74 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index fef2c447219d..9b6c3e3edd75 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -503,7 +503,6 @@ struct nameidata {
 	} *stack, internal[EMBEDDED_LEVELS];
 	struct filename	*name;
 	struct nameidata *saved;
-	struct inode	*link_inode;
 	unsigned	root_seq;
 	int		dfd;
 } __randomize_layout;
@@ -962,9 +961,8 @@ int sysctl_protected_regular __read_mostly;
  *
  * Returns 0 if following the symlink is allowed, -ve on error.
  */
-static inline int may_follow_link(struct nameidata *nd)
+static inline int may_follow_link(struct nameidata *nd, const struct inode *inode)
 {
-	const struct inode *inode;
 	const struct inode *parent;
 	kuid_t puid;
 
@@ -972,7 +970,6 @@ static inline int may_follow_link(struct nameidata *nd)
 		return 0;
 
 	/* Allowed if owner and follower match. */
-	inode = nd->link_inode;
 	if (uid_eq(current_cred()->fsuid, inode->i_uid))
 		return 0;
 
@@ -1106,73 +1103,6 @@ static int may_create_in_sticky(umode_t dir_mode, kuid_t dir_uid,
 	return 0;
 }
 
-static __always_inline
-const char *get_link(struct nameidata *nd)
-{
-	struct saved *last = nd->stack + nd->depth - 1;
-	struct dentry *dentry = last->link.dentry;
-	struct inode *inode = nd->link_inode;
-	int error;
-	const char *res;
-
-	if (!(nd->flags & LOOKUP_PARENT)) {
-		error = may_follow_link(nd);
-		if (unlikely(error))
-			return ERR_PTR(error);
-	}
-
-	if (unlikely(nd->flags & LOOKUP_NO_SYMLINKS))
-		return ERR_PTR(-ELOOP);
-
-	if (!(nd->flags & LOOKUP_RCU)) {
-		touch_atime(&last->link);
-		cond_resched();
-	} else if (atime_needs_update(&last->link, inode)) {
-		if (unlikely(unlazy_walk(nd)))
-			return ERR_PTR(-ECHILD);
-		touch_atime(&last->link);
-	}
-
-	error = security_inode_follow_link(dentry, inode,
-					   nd->flags & LOOKUP_RCU);
-	if (unlikely(error))
-		return ERR_PTR(error);
-
-	nd->last_type = LAST_BIND;
-	res = READ_ONCE(inode->i_link);
-	if (!res) {
-		const char * (*get)(struct dentry *, struct inode *,
-				struct delayed_call *);
-		get = inode->i_op->get_link;
-		if (nd->flags & LOOKUP_RCU) {
-			res = get(NULL, inode, &last->done);
-			if (res == ERR_PTR(-ECHILD)) {
-				if (unlikely(unlazy_walk(nd)))
-					return ERR_PTR(-ECHILD);
-				res = get(dentry, inode, &last->done);
-			}
-		} else {
-			res = get(dentry, inode, &last->done);
-		}
-		if (!res)
-			goto all_done;
-		if (IS_ERR(res))
-			return res;
-	}
-	if (*res == '/') {
-		error = nd_jump_root(nd);
-		if (unlikely(error))
-			return ERR_PTR(error);
-		while (unlikely(*++res == '/'))
-			;
-	}
-	if (*res)
-		return res;
-all_done: // pure jump
-	put_link(nd);
-	return NULL;
-}
-
 /*
  * follow_up - Find the mountpoint of path's vfsmount
  *
@@ -1796,8 +1726,10 @@ static inline int handle_dots(struct nameidata *nd, int type)
 static const char *pick_link(struct nameidata *nd, struct path *link,
 		     struct inode *inode, unsigned seq)
 {
-	int error;
 	struct saved *last;
+	const char *res;
+	int error;
+
 	if (unlikely(nd->total_link_count++ >= MAXSYMLINKS)) {
 		path_to_nameidata(link, nd);
 		return ERR_PTR(-ELOOP);
@@ -1828,9 +1760,64 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 	last = nd->stack + nd->depth++;
 	last->link = *link;
 	clear_delayed_call(&last->done);
-	nd->link_inode = inode;
 	last->seq = seq;
-	return get_link(nd);
+
+	if (!(nd->flags & LOOKUP_PARENT)) {
+		error = may_follow_link(nd, inode);
+		if (unlikely(error))
+			return ERR_PTR(error);
+	}
+
+	if (unlikely(nd->flags & LOOKUP_NO_SYMLINKS))
+		return ERR_PTR(-ELOOP);
+
+	if (!(nd->flags & LOOKUP_RCU)) {
+		touch_atime(&last->link);
+		cond_resched();
+	} else if (atime_needs_update(&last->link, inode)) {
+		if (unlikely(unlazy_walk(nd)))
+			return ERR_PTR(-ECHILD);
+		touch_atime(&last->link);
+	}
+
+	error = security_inode_follow_link(link->dentry, inode,
+					   nd->flags & LOOKUP_RCU);
+	if (unlikely(error))
+		return ERR_PTR(error);
+
+	nd->last_type = LAST_BIND;
+	res = READ_ONCE(inode->i_link);
+	if (!res) {
+		const char * (*get)(struct dentry *, struct inode *,
+				struct delayed_call *);
+		get = inode->i_op->get_link;
+		if (nd->flags & LOOKUP_RCU) {
+			res = get(NULL, inode, &last->done);
+			if (res == ERR_PTR(-ECHILD)) {
+				if (unlikely(unlazy_walk(nd)))
+					return ERR_PTR(-ECHILD);
+				res = get(link->dentry, inode, &last->done);
+			}
+		} else {
+			res = get(link->dentry, inode, &last->done);
+		}
+		if (!res)
+			goto all_done;
+		if (IS_ERR(res))
+			return res;
+	}
+	if (*res == '/') {
+		error = nd_jump_root(nd);
+		if (unlikely(error))
+			return ERR_PTR(error);
+		while (unlikely(*++res == '/'))
+			;
+	}
+	if (*res)
+		return res;
+all_done: // pure jump
+	put_link(nd);
+	return NULL;
 }
 
 enum {WALK_FOLLOW = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 25/34] massage __follow_mount_rcu() a bit
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (22 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 24/34] finally fold get_link() into pick_link() Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  1:16   ` [RFC][PATCH v2 26/34] new helper: traverse_mounts() Al Viro
                     ` (8 subsequent siblings)
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

make the loop more similar to that in follow_managed(), with
explicit tracking of flags, etc.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 70 +++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 35 insertions(+), 35 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 9b6c3e3edd75..e83071d25fae 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1269,12 +1269,6 @@ int follow_down_one(struct path *path)
 }
 EXPORT_SYMBOL(follow_down_one);
 
-static inline int managed_dentry_rcu(const struct path *path)
-{
-	return (path->dentry->d_flags & DCACHE_MANAGE_TRANSIT) ?
-		path->dentry->d_op->d_manage(path, true) : 0;
-}
-
 /*
  * Try to skip to top of mountpoint pile in rcuwalk mode.  Fail if
  * we meet a managed dentry that would need blocking.
@@ -1282,43 +1276,49 @@ static inline int managed_dentry_rcu(const struct path *path)
 static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 			       struct inode **inode, unsigned *seqp)
 {
+	struct dentry *dentry = path->dentry;
+	unsigned int flags = dentry->d_flags;
+
+	if (likely(!(flags & DCACHE_MANAGED_DENTRY)))
+		return true;
+
+	if (unlikely(nd->flags & LOOKUP_NO_XDEV))
+		return false;
+
 	for (;;) {
-		struct mount *mounted;
 		/*
 		 * Don't forget we might have a non-mountpoint managed dentry
 		 * that wants to block transit.
 		 */
-		switch (managed_dentry_rcu(path)) {
-		case -ECHILD:
-		default:
-			return false;
-		case -EISDIR:
-			return true;
-		case 0:
-			break;
+		if (unlikely(flags & DCACHE_MANAGE_TRANSIT)) {
+			int res = dentry->d_op->d_manage(path, true);
+			if (res)
+				return res == -EISDIR;
+			flags = dentry->d_flags;
 		}
 
-		if (!d_mountpoint(path->dentry))
-			return !(path->dentry->d_flags & DCACHE_NEED_AUTOMOUNT);
-
-		mounted = __lookup_mnt(path->mnt, path->dentry);
-		if (!mounted)
-			break;
-		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-			return false;
-		path->mnt = &mounted->mnt;
-		path->dentry = mounted->mnt.mnt_root;
-		nd->flags |= LOOKUP_JUMPED;
-		*seqp = read_seqcount_begin(&path->dentry->d_seq);
-		/*
-		 * Update the inode too. We don't need to re-check the
-		 * dentry sequence number here after this d_inode read,
-		 * because a mount-point is always pinned.
-		 */
-		*inode = path->dentry->d_inode;
+		if (flags & DCACHE_MOUNTED) {
+			struct mount *mounted = __lookup_mnt(path->mnt, dentry);
+			if (mounted) {
+				path->mnt = &mounted->mnt;
+				dentry = path->dentry = mounted->mnt.mnt_root;
+				nd->flags |= LOOKUP_JUMPED;
+				*seqp = read_seqcount_begin(&dentry->d_seq);
+				*inode = dentry->d_inode;
+				/*
+				 * We don't need to re-check ->d_seq after this
+				 * ->d_inode read - there will be an RCU delay
+				 * between mount hash removal and ->mnt_root
+				 * becoming unpinned.
+				 */
+				flags = dentry->d_flags;
+				continue;
+			}
+			if (read_seqretry(&mount_lock, nd->m_seq))
+				return false;
+		}
+		return !(flags & DCACHE_NEED_AUTOMOUNT);
 	}
-	return !read_seqretry(&mount_lock, nd->m_seq) &&
-		!(path->dentry->d_flags & DCACHE_NEED_AUTOMOUNT);
 }
 
 static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 26/34] new helper: traverse_mounts()
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (23 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 25/34] massage __follow_mount_rcu() a bit Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  1:16   ` [RFC][PATCH v2 27/34] atomic_open(): return the right dentry in FMODE_OPENED case Al Viro
                     ` (7 subsequent siblings)
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

common guts of follow_down() and follow_managed() taken to a new
helper - traverse_mounts().  The remnants of follow_managed()
are folded into its sole remaining caller (handle_mounts()).
Calling conventions of handle_mounts() slightly sanitized -
instead of the weird "1 for success, -E... for failure" that used
to be imposed by the calling conventions of walk_component() et.al.
we can use the normal "0 for success, -E... for failure".

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 177 +++++++++++++++++++++++++------------------------------------
 1 file changed, 72 insertions(+), 105 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index e83071d25fae..a7730bbee162 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1168,91 +1168,79 @@ static int follow_automount(struct path *path, int *count, unsigned lookup_flags
 }
 
 /*
- * Handle a dentry that is managed in some way.
- * - Flagged for transit management (autofs)
- * - Flagged as mountpoint
- * - Flagged as automount point
- *
- * This may only be called in refwalk mode.
- * On success path->dentry is known positive.
- *
- * Serialization is taken care of in namespace.c
+ * mount traversal - out-of-line part.  One note on ->d_flags accesses -
+ * dentries are pinned but not locked here, so negative dentry can go
+ * positive right under us.  Use of smp_load_acquire() provides a barrier
+ * sufficient for ->d_inode and ->d_flags consistency.
  */
-static int follow_managed(struct path *path, struct nameidata *nd)
+static int __traverse_mounts(struct path *path, unsigned flags, bool *jumped,
+			     int *count, unsigned lookup_flags)
 {
-	struct vfsmount *mnt = path->mnt; /* held by caller, must be left alone */
-	unsigned flags;
+	struct vfsmount *mnt = path->mnt;
 	bool need_mntput = false;
 	int ret = 0;
 
-	/* Given that we're not holding a lock here, we retain the value in a
-	 * local variable for each dentry as we look at it so that we don't see
-	 * the components of that value change under us */
-	while (flags = smp_load_acquire(&path->dentry->d_flags),
-	       unlikely(flags & DCACHE_MANAGED_DENTRY)) {
+	while (flags & DCACHE_MANAGED_DENTRY) {
 		/* Allow the filesystem to manage the transit without i_mutex
 		 * being held. */
 		if (flags & DCACHE_MANAGE_TRANSIT) {
-			BUG_ON(!path->dentry->d_op);
-			BUG_ON(!path->dentry->d_op->d_manage);
 			ret = path->dentry->d_op->d_manage(path, false);
 			flags = smp_load_acquire(&path->dentry->d_flags);
 			if (ret < 0)
 				break;
 		}
 
-		/* Transit to a mounted filesystem. */
-		if (flags & DCACHE_MOUNTED) {
+		if (flags & DCACHE_MOUNTED) {	// something's mounted on it..
 			struct vfsmount *mounted = lookup_mnt(path);
-			if (mounted) {
+			if (mounted) {		// ... in our namespace
 				dput(path->dentry);
 				if (need_mntput)
 					mntput(path->mnt);
 				path->mnt = mounted;
 				path->dentry = dget(mounted->mnt_root);
+				// here we know it's positive
+				flags = path->dentry->d_flags;
 				need_mntput = true;
 				continue;
 			}
-
-			/* Something is mounted on this dentry in another
-			 * namespace and/or whatever was mounted there in this
-			 * namespace got unmounted before lookup_mnt() could
-			 * get it */
 		}
 
-		/* Handle an automount point */
-		if (flags & DCACHE_NEED_AUTOMOUNT) {
-			ret = follow_automount(path, &nd->total_link_count,
-						nd->flags);
-			if (ret < 0)
-				break;
-			continue;
-		}
+		if (!(flags & DCACHE_NEED_AUTOMOUNT))
+			break;
 
-		/* We didn't change the current path point */
-		break;
+		// uncovered automount point
+		ret = follow_automount(path, count, lookup_flags);
+		flags = smp_load_acquire(&path->dentry->d_flags);
+		if (ret < 0)
+			break;
 	}
 
-	if (need_mntput) {
-		if (path->mnt == mnt)
-			mntput(path->mnt);
-		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-			ret = -EXDEV;
-		else
-			nd->flags |= LOOKUP_JUMPED;
-	}
-	if (ret == -EISDIR || !ret)
-		ret = 1;
-	if (ret > 0 && unlikely(d_flags_negative(flags)))
+	if (ret == -EISDIR)
+		ret = 0;
+	// possible if you race with several mount --move
+	if (need_mntput && path->mnt == mnt)
+		mntput(path->mnt);
+	if (!ret && unlikely(d_flags_negative(flags)))
 		ret = -ENOENT;
-	if (unlikely(ret < 0)) {
-		dput(path->dentry);
-		if (path->mnt != nd->path.mnt)
-			mntput(path->mnt);
-	}
+	*jumped = need_mntput;
 	return ret;
 }
 
+static inline int traverse_mounts(struct path *path, bool *jumped,
+				  int *count, unsigned lookup_flags)
+{
+	unsigned flags = smp_load_acquire(&path->dentry->d_flags);
+
+	/* fastpath */
+	if (likely(!(flags & DCACHE_MANAGED_DENTRY))) {
+		*jumped = false;
+		if (unlikely(d_flags_negative(flags)))
+			return -ENOENT;
+		return 0;
+	}
+	return __traverse_mounts(path, flags, jumped, count, lookup_flags);
+}
+
 int follow_down_one(struct path *path)
 {
 	struct vfsmount *mounted;
@@ -1270,6 +1258,23 @@ int follow_down_one(struct path *path)
 EXPORT_SYMBOL(follow_down_one);
 
 /*
+ * Follow down to the covering mount currently visible to userspace.  At each
+ * point, the filesystem owning that dentry may be queried as to whether the
+ * caller is permitted to proceed or not.
+ */
+int follow_down(struct path *path)
+{
+	struct vfsmount *mnt = path->mnt;
+	bool jumped;
+	int ret = traverse_mounts(path, &jumped, NULL, 0);
+
+	if (path->mnt != mnt)
+		mntput(mnt);
+	return ret;
+}
+EXPORT_SYMBOL(follow_down);
+
+/*
  * Try to skip to top of mountpoint pile in rcuwalk mode.  Fail if
  * we meet a managed dentry that would need blocking.
  */
@@ -1325,6 +1330,7 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 			  struct path *path, struct inode **inode,
 			  unsigned int *seqp)
 {
+	bool jumped;
 	int ret;
 
 	path->mnt = nd->path.mnt;
@@ -1334,15 +1340,25 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 		if (unlikely(!*inode))
 			return -ENOENT;
 		if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
-			return 1;
+			return 0;
 		if (unlazy_child(nd, dentry, seq))
 			return -ECHILD;
 		// *path might've been clobbered by __follow_mount_rcu()
 		path->mnt = nd->path.mnt;
 		path->dentry = dentry;
 	}
-	ret = follow_managed(path, nd);
-	if (likely(ret >= 0)) {
+	ret = traverse_mounts(path, &jumped, &nd->total_link_count, nd->flags);
+	if (jumped) {
+		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
+			ret = -EXDEV;
+		else
+			nd->flags |= LOOKUP_JUMPED;
+	}
+	if (unlikely(ret)) {
+		dput(path->dentry);
+		if (path->mnt != nd->path.mnt)
+			mntput(path->mnt);
+	} else {
 		*inode = d_backing_inode(path->dentry);
 		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
 	}
@@ -1411,55 +1427,6 @@ static int follow_dotdot_rcu(struct nameidata *nd)
 }
 
 /*
- * Follow down to the covering mount currently visible to userspace.  At each
- * point, the filesystem owning that dentry may be queried as to whether the
- * caller is permitted to proceed or not.
- */
-int follow_down(struct path *path)
-{
-	unsigned managed;
-	int ret;
-
-	while (managed = READ_ONCE(path->dentry->d_flags),
-	       unlikely(managed & DCACHE_MANAGED_DENTRY)) {
-		/* Allow the filesystem to manage the transit without i_mutex
-		 * being held.
-		 *
-		 * We indicate to the filesystem if someone is trying to mount
-		 * something here.  This gives autofs the chance to deny anyone
-		 * other than its daemon the right to mount on its
-		 * superstructure.
-		 *
-		 * The filesystem may sleep at this point.
-		 */
-		if (managed & DCACHE_MANAGE_TRANSIT) {
-			BUG_ON(!path->dentry->d_op);
-			BUG_ON(!path->dentry->d_op->d_manage);
-			ret = path->dentry->d_op->d_manage(path, false);
-			if (ret < 0)
-				return ret == -EISDIR ? 0 : ret;
-		}
-
-		/* Transit to a mounted filesystem. */
-		if (managed & DCACHE_MOUNTED) {
-			struct vfsmount *mounted = lookup_mnt(path);
-			if (!mounted)
-				break;
-			dput(path->dentry);
-			mntput(path->mnt);
-			path->mnt = mounted;
-			path->dentry = dget(mounted->mnt_root);
-			continue;
-		}
-
-		/* Don't handle automount points here */
-		break;
-	}
-	return 0;
-}
-EXPORT_SYMBOL(follow_down);
-
-/*
  * Skip to top of mountpoint pile in refwalk mode for follow_dotdot()
  */
 static void follow_mount(struct path *path)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 27/34] atomic_open(): return the right dentry in FMODE_OPENED case
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (24 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 26/34] new helper: traverse_mounts() Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  1:16   ` [RFC][PATCH v2 28/34] atomic_open(): lift the call of may_open() into do_last() Al Viro
                     ` (6 subsequent siblings)
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

->atomic_open() might have used a different alias than the one we'd
passed to it; in "not opened" case we take care of that, in "opened"
one we don't.  Currently we don't care downstream of "opened" case
which alias to return; however, that will change shortly when we
get to unifying may_open() calls.

It's not hard to get right in all cases, anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index a7730bbee162..c2244ee4b2f0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2956,11 +2956,15 @@ static struct dentry *atomic_open(struct nameidata *nd, struct dentry *dentry,
 	d_lookup_done(dentry);
 	if (!error) {
 		if (file->f_mode & FMODE_OPENED) {
+			int acc_mode = op->acc_mode;
+			if (unlikely(dentry != file->f_path.dentry)) {
+				dput(dentry);
+				dentry = dget(file->f_path.dentry);
+			}
 			/*
 			 * We didn't have the inode before the open, so check open
 			 * permission here.
 			 */
-			int acc_mode = op->acc_mode;
 			if (file->f_mode & FMODE_CREATED) {
 				WARN_ON(!(open_flag & O_CREAT));
 				fsnotify_create(dir, dentry);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 28/34] atomic_open(): lift the call of may_open() into do_last()
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (25 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 27/34] atomic_open(): return the right dentry in FMODE_OPENED case Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  1:16   ` [RFC][PATCH v2 29/34] do_last(): merge the may_open() calls Al Viro
                     ` (5 subsequent siblings)
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

there we'll be able to merge it with its counterparts in other
cases, and there's no reason to do it before the parent has
been unlocked

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c2244ee4b2f0..16786be13050 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2956,23 +2956,12 @@ static struct dentry *atomic_open(struct nameidata *nd, struct dentry *dentry,
 	d_lookup_done(dentry);
 	if (!error) {
 		if (file->f_mode & FMODE_OPENED) {
-			int acc_mode = op->acc_mode;
 			if (unlikely(dentry != file->f_path.dentry)) {
 				dput(dentry);
 				dentry = dget(file->f_path.dentry);
 			}
-			/*
-			 * We didn't have the inode before the open, so check open
-			 * permission here.
-			 */
-			if (file->f_mode & FMODE_CREATED) {
-				WARN_ON(!(open_flag & O_CREAT));
+			if (file->f_mode & FMODE_CREATED)
 				fsnotify_create(dir, dentry);
-				acc_mode = 0;
-			}
-			error = may_open(&file->f_path, acc_mode, open_flag);
-			if (WARN_ON(error > 0))
-				error = -EINVAL;
 		} else if (WARN_ON(file->f_path.dentry == DENTRY_NOT_SET)) {
 			error = -EIO;
 		} else {
@@ -3216,12 +3205,19 @@ static const char *do_last(struct nameidata *nd,
 	}
 
 	if (file->f_mode & FMODE_OPENED) {
-		if ((file->f_mode & FMODE_CREATED) ||
-		    !S_ISREG(file_inode(file)->i_mode))
+		if (file->f_mode & FMODE_CREATED) {
+			open_flag &= ~O_TRUNC;
+			will_truncate = false;
+			acc_mode = 0;
+		} else if (!S_ISREG(file_inode(file)->i_mode))
 			will_truncate = false;
 
 		audit_inode(nd->name, file->f_path.dentry, 0);
-		dput(dentry);
+		dput(nd->path.dentry);
+		nd->path.dentry = dentry;
+		error = may_open(&nd->path, acc_mode, open_flag);
+		if (error)
+			goto out;
 		goto opened;
 	}
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 29/34] do_last(): merge the may_open() calls
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (26 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 28/34] atomic_open(): lift the call of may_open() into do_last() Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  1:16   ` [RFC][PATCH v2 30/34] do_last(): don't bother with keeping got_write in FMODE_OPENED case Al Viro
                     ` (4 subsequent siblings)
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

have FMODE_OPENED case rejoin the main path at earlier point

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 16786be13050..f79e020f08fe 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3215,10 +3215,7 @@ static const char *do_last(struct nameidata *nd,
 		audit_inode(nd->name, file->f_path.dentry, 0);
 		dput(nd->path.dentry);
 		nd->path.dentry = dentry;
-		error = may_open(&nd->path, acc_mode, open_flag);
-		if (error)
-			goto out;
-		goto opened;
+		goto finish_open_created;
 	}
 
 	if (file->f_mode & FMODE_CREATED) {
@@ -3285,11 +3282,10 @@ static const char *do_last(struct nameidata *nd,
 	error = may_open(&nd->path, acc_mode, open_flag);
 	if (error)
 		goto out;
-	BUG_ON(file->f_mode & FMODE_OPENED); /* once it's opened, it's opened */
-	error = vfs_open(&nd->path, file);
+	if (!(file->f_mode & FMODE_OPENED))
+		error = vfs_open(&nd->path, file);
 	if (error)
 		goto out;
-opened:
 	error = ima_file_check(file, op->acc_mode);
 	if (!error && will_truncate)
 		error = handle_truncate(file);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 30/34] do_last(): don't bother with keeping got_write in FMODE_OPENED case
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (27 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 29/34] do_last(): merge the may_open() calls Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  1:16   ` [RFC][PATCH v2 31/34] do_last(): rejoing the common path earlier in FMODE_{OPENED,CREATED} case Al Viro
                     ` (3 subsequent siblings)
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

it's easier to drop it right after lookup_open() and regain if
needed (i.e. if we will need to truncate).  On the non-FMODE_OPENED
path we do that anyway.  In case of FMODE_CREATED we won't be
needing it.  And it's easier to prove correctness that way,
especially since the initial failure to get write access is not
always fatal; proving that we'll never end up truncating in that
case is rather convoluted.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 31 +++++++++++--------------------
 1 file changed, 11 insertions(+), 20 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index f79e020f08fe..56285466aa55 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3199,11 +3199,14 @@ static const char *do_last(struct nameidata *nd,
 	else
 		inode_unlock_shared(dir->d_inode);
 
-	if (IS_ERR(dentry)) {
-		error = PTR_ERR(dentry);
-		goto out;
+	if (got_write) {
+		mnt_drop_write(nd->path.mnt);
+		got_write = false;
 	}
 
+	if (IS_ERR(dentry))
+		return ERR_CAST(dentry);
+
 	if (file->f_mode & FMODE_OPENED) {
 		if (file->f_mode & FMODE_CREATED) {
 			open_flag &= ~O_TRUNC;
@@ -3228,16 +3231,6 @@ static const char *do_last(struct nameidata *nd,
 		goto finish_open_created;
 	}
 
-	/*
-	 * If atomic_open() acquired write access it is dropped now due to
-	 * possible mount and symlink following (this might be optimized away if
-	 * necessary...)
-	 */
-	if (got_write) {
-		mnt_drop_write(nd->path.mnt);
-		got_write = false;
-	}
-
 finish_lookup:
 	link = step_into(nd, 0, dentry, inode, seq);
 	if (unlikely(link)) {
@@ -3258,27 +3251,25 @@ static const char *do_last(struct nameidata *nd,
 		return ERR_PTR(error);
 	audit_inode(nd->name, nd->path.dentry, 0);
 	if (open_flag & O_CREAT) {
-		error = -EISDIR;
 		if (d_is_dir(nd->path.dentry))
-			goto out;
+			return ERR_PTR(-EISDIR);
 		error = may_create_in_sticky(dir_mode, dir_uid,
 					     d_backing_inode(nd->path.dentry));
 		if (unlikely(error))
-			goto out;
+			return ERR_PTR(error);
 	}
-	error = -ENOTDIR;
 	if ((nd->flags & LOOKUP_DIRECTORY) && !d_can_lookup(nd->path.dentry))
-		goto out;
+		return ERR_PTR(-ENOTDIR);
 	if (!d_is_reg(nd->path.dentry))
 		will_truncate = false;
 
+finish_open_created:
 	if (will_truncate) {
 		error = mnt_want_write(nd->path.mnt);
 		if (error)
-			goto out;
+			return ERR_PTR(error);
 		got_write = true;
 	}
-finish_open_created:
 	error = may_open(&nd->path, acc_mode, open_flag);
 	if (error)
 		goto out;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 31/34] do_last(): rejoing the common path earlier in FMODE_{OPENED,CREATED} case
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (28 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 30/34] do_last(): don't bother with keeping got_write in FMODE_OPENED case Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  1:16   ` [RFC][PATCH v2 32/34] do_last(): simplify the liveness analysis past finish_open_created Al Viro
                     ` (2 subsequent siblings)
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 56285466aa55..51283caaf7c4 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3208,13 +3208,6 @@ static const char *do_last(struct nameidata *nd,
 		return ERR_CAST(dentry);
 
 	if (file->f_mode & FMODE_OPENED) {
-		if (file->f_mode & FMODE_CREATED) {
-			open_flag &= ~O_TRUNC;
-			will_truncate = false;
-			acc_mode = 0;
-		} else if (!S_ISREG(file_inode(file)->i_mode))
-			will_truncate = false;
-
 		audit_inode(nd->name, file->f_path.dentry, 0);
 		dput(nd->path.dentry);
 		nd->path.dentry = dentry;
@@ -3222,10 +3215,6 @@ static const char *do_last(struct nameidata *nd,
 	}
 
 	if (file->f_mode & FMODE_CREATED) {
-		/* Don't check for write permission, don't truncate */
-		open_flag &= ~O_TRUNC;
-		will_truncate = false;
-		acc_mode = 0;
 		dput(nd->path.dentry);
 		nd->path.dentry = dentry;
 		goto finish_open_created;
@@ -3260,10 +3249,16 @@ static const char *do_last(struct nameidata *nd,
 	}
 	if ((nd->flags & LOOKUP_DIRECTORY) && !d_can_lookup(nd->path.dentry))
 		return ERR_PTR(-ENOTDIR);
-	if (!d_is_reg(nd->path.dentry))
-		will_truncate = false;
 
 finish_open_created:
+	if (file->f_mode & FMODE_CREATED) {
+		/* Don't check for write permission, don't truncate */
+		open_flag &= ~O_TRUNC;
+		will_truncate = false;
+		acc_mode = 0;
+	} else if (!d_is_reg(nd->path.dentry)) {
+		will_truncate = false;
+	}
 	if (will_truncate) {
 		error = mnt_want_write(nd->path.mnt);
 		if (error)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 32/34] do_last(): simplify the liveness analysis past finish_open_created
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (29 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 31/34] do_last(): rejoing the common path earlier in FMODE_{OPENED,CREATED} case Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  1:16   ` [RFC][PATCH v2 33/34] do_last(): rejoin the common path even earlier in FMODE_{OPENED,CREATED} case Al Viro
  2020-02-23  1:16   ` [RFC][PATCH v2 34/34] split the lookup-related parts of do_last() into a separate helper Al Viro
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Don't mess with got_write there - it is guaranteed to be false on
entry and it will be set true if and only if we decide to go for
truncation and manage to get write access for that.

Don't carry acc_mode through the entire thing - it's only used
in that part.  And don't bother with gotos in there - compiler is
quite capable of optimizing that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 28 +++++++++++-----------------
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 51283caaf7c4..ce6f2864a335 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3131,9 +3131,9 @@ static const char *do_last(struct nameidata *nd,
 	kuid_t dir_uid = nd->inode->i_uid;
 	umode_t dir_mode = nd->inode->i_mode;
 	int open_flag = op->open_flag;
-	bool will_truncate = (open_flag & O_TRUNC) != 0;
+	bool do_truncate;
 	bool got_write = false;
-	int acc_mode = op->acc_mode;
+	int acc_mode;
 	unsigned seq;
 	struct inode *inode;
 	struct dentry *dentry;
@@ -3251,36 +3251,30 @@ static const char *do_last(struct nameidata *nd,
 		return ERR_PTR(-ENOTDIR);
 
 finish_open_created:
+	do_truncate = false;
+	acc_mode = op->acc_mode;
 	if (file->f_mode & FMODE_CREATED) {
 		/* Don't check for write permission, don't truncate */
 		open_flag &= ~O_TRUNC;
-		will_truncate = false;
 		acc_mode = 0;
-	} else if (!d_is_reg(nd->path.dentry)) {
-		will_truncate = false;
-	}
-	if (will_truncate) {
+	} else if (d_is_reg(nd->path.dentry) && open_flag & O_TRUNC) {
 		error = mnt_want_write(nd->path.mnt);
 		if (error)
 			return ERR_PTR(error);
-		got_write = true;
+		do_truncate = true;
 	}
 	error = may_open(&nd->path, acc_mode, open_flag);
-	if (error)
-		goto out;
-	if (!(file->f_mode & FMODE_OPENED))
+	if (!error && !(file->f_mode & FMODE_OPENED))
 		error = vfs_open(&nd->path, file);
-	if (error)
-		goto out;
-	error = ima_file_check(file, op->acc_mode);
-	if (!error && will_truncate)
+	if (!error)
+		error = ima_file_check(file, op->acc_mode);
+	if (!error && do_truncate)
 		error = handle_truncate(file);
-out:
 	if (unlikely(error > 0)) {
 		WARN_ON(1);
 		error = -EINVAL;
 	}
-	if (got_write)
+	if (do_truncate)
 		mnt_drop_write(nd->path.mnt);
 	return ERR_PTR(error);
 }
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 33/34] do_last(): rejoin the common path even earlier in FMODE_{OPENED,CREATED} case
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (30 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 32/34] do_last(): simplify the liveness analysis past finish_open_created Al Viro
@ 2020-02-23  1:16   ` Al Viro
  2020-02-23  1:16   ` [RFC][PATCH v2 34/34] split the lookup-related parts of do_last() into a separate helper Al Viro
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

... getting may_create_in_sticky() checks in FMODE_OPENED case as well.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index ce6f2864a335..37cbe7806677 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3207,14 +3207,7 @@ static const char *do_last(struct nameidata *nd,
 	if (IS_ERR(dentry))
 		return ERR_CAST(dentry);
 
-	if (file->f_mode & FMODE_OPENED) {
-		audit_inode(nd->name, file->f_path.dentry, 0);
-		dput(nd->path.dentry);
-		nd->path.dentry = dentry;
-		goto finish_open_created;
-	}
-
-	if (file->f_mode & FMODE_CREATED) {
+	if (file->f_mode & (FMODE_OPENED | FMODE_CREATED)) {
 		dput(nd->path.dentry);
 		nd->path.dentry = dentry;
 		goto finish_open_created;
@@ -3238,7 +3231,9 @@ static const char *do_last(struct nameidata *nd,
 	error = complete_walk(nd);
 	if (error)
 		return ERR_PTR(error);
-	audit_inode(nd->name, nd->path.dentry, 0);
+finish_open_created:
+	if (!(file->f_mode & FMODE_CREATED))
+		audit_inode(nd->name, nd->path.dentry, 0);
 	if (open_flag & O_CREAT) {
 		if (d_is_dir(nd->path.dentry))
 			return ERR_PTR(-EISDIR);
@@ -3250,7 +3245,6 @@ static const char *do_last(struct nameidata *nd,
 	if ((nd->flags & LOOKUP_DIRECTORY) && !d_can_lookup(nd->path.dentry))
 		return ERR_PTR(-ENOTDIR);
 
-finish_open_created:
 	do_truncate = false;
 	acc_mode = op->acc_mode;
 	if (file->f_mode & FMODE_CREATED) {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v2 34/34] split the lookup-related parts of do_last() into a separate helper
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (31 preceding siblings ...)
  2020-02-23  1:16   ` [RFC][PATCH v2 33/34] do_last(): rejoin the common path even earlier in FMODE_{OPENED,CREATED} case Al Viro
@ 2020-02-23  1:16   ` Al Viro
  32 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23  1:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 51 +++++++++++++++++++++++++++++----------------------
 1 file changed, 29 insertions(+), 22 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 37cbe7806677..96182a947ca1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3121,19 +3121,12 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 	return ERR_PTR(error);
 }
 
-/*
- * Handle the last step of open()
- */
-static const char *do_last(struct nameidata *nd,
+static const char *open_last_lookups(struct nameidata *nd,
 		   struct file *file, const struct open_flags *op)
 {
 	struct dentry *dir = nd->path.dentry;
-	kuid_t dir_uid = nd->inode->i_uid;
-	umode_t dir_mode = nd->inode->i_mode;
 	int open_flag = op->open_flag;
-	bool do_truncate;
 	bool got_write = false;
-	int acc_mode;
 	unsigned seq;
 	struct inode *inode;
 	struct dentry *dentry;
@@ -3145,9 +3138,9 @@ static const char *do_last(struct nameidata *nd,
 
 	if (nd->last_type != LAST_NORM) {
 		error = handle_dots(nd, nd->last_type);
-		if (unlikely(error))
-			return ERR_PTR(error);
-		goto finish_open;
+		if (likely(!error))
+			error = complete_walk(nd);
+		return ERR_PTR(error);
 	}
 
 	if (!(open_flag & O_CREAT)) {
@@ -3160,7 +3153,6 @@ static const char *do_last(struct nameidata *nd,
 		if (likely(dentry))
 			goto finish_lookup;
 
-		BUG_ON(nd->inode != dir->d_inode);
 		BUG_ON(nd->flags & LOOKUP_RCU);
 	} else {
 		/* create side of things */
@@ -3170,7 +3162,7 @@ static const char *do_last(struct nameidata *nd,
 		 * about to look up
 		 */
 		error = complete_walk(nd);
-		if (error)
+		if (unlikely(error))
 			return ERR_PTR(error);
 
 		audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
@@ -3199,10 +3191,8 @@ static const char *do_last(struct nameidata *nd,
 	else
 		inode_unlock_shared(dir->d_inode);
 
-	if (got_write) {
+	if (got_write)
 		mnt_drop_write(nd->path.mnt);
-		got_write = false;
-	}
 
 	if (IS_ERR(dentry))
 		return ERR_CAST(dentry);
@@ -3210,7 +3200,7 @@ static const char *do_last(struct nameidata *nd,
 	if (file->f_mode & (FMODE_OPENED | FMODE_CREATED)) {
 		dput(nd->path.dentry);
 		nd->path.dentry = dentry;
-		goto finish_open_created;
+		return NULL;
 	}
 
 finish_lookup:
@@ -3226,12 +3216,29 @@ static const char *do_last(struct nameidata *nd,
 		audit_inode(nd->name, nd->path.dentry, 0);
 		return ERR_PTR(-EEXIST);
 	}
-finish_open:
+
 	/* Why this, you ask?  _Now_ we might have grown LOOKUP_JUMPED... */
-	error = complete_walk(nd);
-	if (error)
-		return ERR_PTR(error);
-finish_open_created:
+	return ERR_PTR(complete_walk(nd));
+}
+
+/*
+ * Handle the last step of open()
+ */
+static const char *do_last(struct nameidata *nd,
+		   struct file *file, const struct open_flags *op)
+{
+	kuid_t dir_uid = nd->inode->i_uid;
+	umode_t dir_mode = nd->inode->i_mode;
+	int open_flag = op->open_flag;
+	bool do_truncate;
+	int acc_mode;
+	const char *link;
+	int error;
+
+	link = open_last_lookups(nd, file, op);
+	if (unlikely(link))
+		return link;
+
 	if (!(file->f_mode & FMODE_CREATED))
 		audit_inode(nd->name, nd->path.dentry, 0);
 	if (open_flag & O_CREAT) {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCH v2 02/34] fix automount/automount race properly
  2020-02-23  1:15   ` [RFC][PATCH v2 02/34] fix automount/automount race properly Al Viro
@ 2020-02-23  2:07     ` Linus Torvalds
  2020-02-27 19:43       ` Al Viro
  0 siblings, 1 reply; 198+ messages in thread
From: Linus Torvalds @ 2020-02-23  2:07 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linux Kernel Mailing List

On Sat, Feb 22, 2020 at 5:16 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> +
> +discard2:
> +       namespace_unlock();
> +discard1:
> +       inode_unlock(dentry->d_inode);
> +discard:
>         /* remove m from any expiration list it may be on */

Would you mind re-naming those labels?

I realize that the numbering may help show that the error handling is
done in the reverse order, but I bet that a nice name could so that
too.

              Linus

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCH v2 14/34] new step_into() flag: WALK_NOFOLLOW
  2020-02-23  1:16   ` [RFC][PATCH v2 14/34] new step_into() flag: WALK_NOFOLLOW Al Viro
@ 2020-02-23  2:14     ` Linus Torvalds
  2020-02-23 22:07       ` Al Viro
  0 siblings, 1 reply; 198+ messages in thread
From: Linus Torvalds @ 2020-02-23  2:14 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linux Kernel Mailing List

On Sat, Feb 22, 2020 at 5:20 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
>         if (likely(!d_is_symlink(path->dentry)) ||
> -          !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW)) {
> +          !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW) ||
> +          flags & WALK_NOFOLLOW) {

Humor me, and don't mix bitwise ops with logical boolean ops without
parentheses, ok?

And yes, the old code did it too, so it's not a new thing.

But as it gets even more complex, let's just generally strive for doing

   (a & b) || (c & d)

instead of

   a & b || c & d

to make it easier to mentally see the grouping.

               Linus

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCH v2 21/34] merging pick_link() with get_link(), part 4
  2020-02-23  1:16   ` [RFC][PATCH v2 21/34] merging pick_link() with get_link(), part 4 Al Viro
@ 2020-02-23  2:19     ` Linus Torvalds
  0 siblings, 0 replies; 198+ messages in thread
From: Linus Torvalds @ 2020-02-23  2:19 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linux Kernel Mailing List

Ok, so far I haven't seen anything bad. But I keep noticing these odd
stylistic things...

On Sat, Feb 22, 2020 at 5:22 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> -       return step_into(nd, flags, dentry, inode, seq);
> +       err = step_into(nd, flags, dentry, inode, seq);
> +       if (!err)
> +               return NULL;
> +       else if (err > 0)
> +               return get_link(nd);
> +       else
> +               return ERR_PTR(err);
>  }

What?

Those "else" statements make no sense.

Each if-statement has a "return" in it. It's done. The else part is
not adding anything but confusion.

IOW, this should be

        if (!err)
                return NULL;
        if (err > 0)
                return get_link(nd);
        return ERR_PTR(err);

with not an 'else' in sight.

              Linus

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCH v2 22/34] merging pick_link() with get_link(), part 5
  2020-02-23  1:16   ` [RFC][PATCH v2 22/34] merging pick_link() with get_link(), part 5 Al Viro
@ 2020-02-23  2:22     ` Linus Torvalds
  0 siblings, 0 replies; 198+ messages in thread
From: Linus Torvalds @ 2020-02-23  2:22 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linux Kernel Mailing List

Ok, one step back:

On Sat, Feb 22, 2020 at 5:22 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> +       if (err > 0)
> +               return get_link(nd);
> +       else
> +               return ERR_PTR(err);
>  }

.. and two steps forward, as you then entirely remove the code I just
complained about.

> -       err = step_into(nd, flags, dentry, inode, seq);
> -       if (!err)
> -               return NULL;
> -       else if (err > 0)
> -               return get_link(nd);
> -       else
> -               return ERR_PTR(err);
> +       return step_into(nd, flags, dentry, inode, seq);

I'm waiting with bated breath if the next patch will then remove the
new odd if-return-else thing. But I'm not going to peek, it's going to
be a surprise.

              Linus

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCH v2 23/34] merging pick_link() with get_link(), part 6
  2020-02-23  1:16   ` [RFC][PATCH v2 23/34] merging pick_link() with get_link(), part 6 Al Viro
@ 2020-02-23  2:24     ` Linus Torvalds
  0 siblings, 0 replies; 198+ messages in thread
From: Linus Torvalds @ 2020-02-23  2:24 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linux Kernel Mailing List

On Sat, Feb 22, 2020 at 5:23 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> -       err = pick_link(nd, &path, inode, seq);
> -       if (err > 0)
> -               return get_link(nd);
> -       else
> -               return ERR_PTR(err);
> +       return pick_link(nd, &path, inode, seq);

Yay! It's like Christmas.

Except honestly, I think I'd have been happier if the intermediate
points didn't have that odd syntax in them. Even if it then gets
removed in the end.

               Linus

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCH v2 24/34] finally fold get_link() into pick_link()
  2020-02-23  1:16   ` [RFC][PATCH v2 24/34] finally fold get_link() into pick_link() Al Viro
@ 2020-02-23  2:31     ` Linus Torvalds
  0 siblings, 0 replies; 198+ messages in thread
From: Linus Torvalds @ 2020-02-23  2:31 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linux Kernel Mailing List

On Sat, Feb 22, 2020 at 5:23 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> kill nd->link_inode, while we are at it

I like that part, but that pick_link() function is now the function from hell.

It's now something like a hundred lines of fairly dense code, isn't it?

Oh well. Maybe it's easier to follow since it's straight-line. So this
is more of an observation than a complaint.

             Linus

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCH v2 14/34] new step_into() flag: WALK_NOFOLLOW
  2020-02-23  2:14     ` Linus Torvalds
@ 2020-02-23 22:07       ` Al Viro
  0 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-23 22:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, Linux Kernel Mailing List

On Sat, Feb 22, 2020 at 06:14:45PM -0800, Linus Torvalds wrote:
> On Sat, Feb 22, 2020 at 5:20 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> >         if (likely(!d_is_symlink(path->dentry)) ||
> > -          !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW)) {
> > +          !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW) ||
> > +          flags & WALK_NOFOLLOW) {
> 
> Humor me, and don't mix bitwise ops with logical boolean ops without
> parentheses, ok?
> 
> And yes, the old code did it too, so it's not a new thing.
> 
> But as it gets even more complex, let's just generally strive for doing
> 
>    (a & b) || (c & d)
> 
> instead of
> 
>    a & b || c & d
> 
> to make it easier to mentally see the grouping.

Can do...  FWIW, the only case where the normal "'and' is multiplication,
'or' is addition" doesn't give the right result is
	x | y && z
written instead of
	x | (y && z)
where you would be better off rewriting the expression anyway.

FWIW, one of the things in the local pile is this:

commit cc1b6724b32de1be108cf6a5f28dbb5aa424b42f
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sun Jan 19 12:44:18 2020 -0500

    namei: invert the meaning of WALK_FOLLOW
    
    old flags & WALK_FOLLOW <=> new !(flags & WALK_TRAILING)
    That's what that flag had really been used for.
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

diff --git a/fs/namei.c b/fs/namei.c
index 6eb708014d4b..7d938241157f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1786,7 +1786,7 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 	return NULL;
 }
 
-enum {WALK_FOLLOW = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
+enum {WALK_TRAILING = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
 
 /*
  * Do we need to follow links? We _really_ want to be able
@@ -1805,7 +1805,7 @@ static const char *step_into(struct nameidata *nd, int flags,
 	if (!(flags & WALK_MORE) && nd->depth)
 		put_link(nd);
 	if (likely(!d_is_symlink(path.dentry)) ||
-	   !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW) ||
+	   (flags & WALK_TRAILING && !(nd->flags & LOOKUP_FOLLOW)) ||
 	   flags & WALK_NOFOLLOW) {
 		/* not a symlink or should not follow */
 		path_to_nameidata(&path, nd);
@@ -2157,10 +2157,10 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 			if (!name)
 				return 0;
 			/* last component of nested symlink */
-			link = walk_component(nd, WALK_FOLLOW);
+			link = walk_component(nd, 0);
 		} else {
 			/* not the last component */
-			link = walk_component(nd, WALK_FOLLOW | WALK_MORE);
+			link = walk_component(nd, WALK_MORE);
 		}
 		if (unlikely(link)) {
 			if (IS_ERR(link))
@@ -2288,7 +2288,7 @@ static inline const char *lookup_last(struct nameidata *nd)
 		nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 
 	nd->flags &= ~LOOKUP_PARENT;
-	link = walk_component(nd, 0);
+	link = walk_component(nd, WALK_TRAILING);
 	if (link) {
 		nd->flags |= LOOKUP_PARENT;
 		nd->stack[0].name = NULL;
@@ -3241,7 +3241,7 @@ static const char *do_last(struct nameidata *nd,
 	}
 
 finish_lookup:
-	link = step_into(nd, 0, dentry, inode, seq);
+	link = step_into(nd, WALK_TRAILING, dentry, inode, seq);
 	if (unlikely(link)) {
 		nd->flags |= LOOKUP_PARENT;
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);


and I can simply fold adding extra parens into it.  Or I can fold that into
the patch you'd been replying to  - I'm still uncertain about the series
containing WALK_TRAILING...

^ permalink raw reply related	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCHSET] sanitized pathwalk machinery (v2)
  2020-02-23  1:11 [RFC][PATCHSET] sanitized pathwalk machinery (v2) Al Viro
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
@ 2020-02-25  1:24 ` Al Viro
  2020-02-25  2:03   ` Al Viro
  2020-02-28  1:24   ` Aleksa Sarai
  2020-03-01 21:51 ` [RFC][PATCHSET] sanitized pathwalk machinery (v3) Al Viro
  2 siblings, 2 replies; 198+ messages in thread
From: Al Viro @ 2020-02-25  1:24 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds, Aleksa Sarai

On Sun, Feb 23, 2020 at 01:12:21AM +0000, Al Viro wrote:
> 	This is a slightly extended repost of the patchset posted on
> Jan 19.  Current branch is in vfs.git#work.do_last, the main
> difference from the last time around being a bit of do_last()
> untangling added in the end of series.  #work.openat2 is already
> in mainline, which simplifies the series - now it's a straight
> branch with no merges.

Whee...  While trying to massage ".." handling towards the use of
regular mount crossing semantics, I've found something interesting.
Namely, if you start in a directory with overmounted parent,
LOOKUP_NO_XDEV resolution of ../something will bloody well cross
into the overmount.

Reason: follow_dotdot() (and its RCU counterpart) check for LOOKUP_NO_XDEV
when crossing into underlying fs, but not when crossing into overmount
of the parent.

Interpretation of .. is basically

loop:	if we are in root					// uncommon
		next = current position
	else if we are in root of a mounted filesystem		// more rare
		move to underlying mountpoint
		goto loop
	else
		next = parent directory of current position	// most common

	while next is overmounted				// _VERY_ uncommon
		next = whatever's mounted on next

	move to next

The second loop should've been sharing code with the normal mountpoint
crossing.  It doesn't, which has already lead to interesting inconsistencies
(e.g. autofs generally expects ->d_manage() to be called before crossing
into it; here it's not done).  LOOKUP_NO_XDEV has just added one more...

Incidentally, another inconsistency is LOOKUP_BENEATH treatment in case
when we have walked out of the subtree by way of e.g. procfs symlink and
then ran into .. in the absolute root (that's
                if (!follow_up(&nd->path))
                        break;
in follow_dotdot()).  Shouldn't that give the same reaction as ..
in root (EXDEV on LOOKUP_BENEATH, that is)?  It doesn't...

Another one is about LOOKUP_NO_XDEV again: suppose you have process'
root directly overmounted and cwd in the root of whatever's overmounting
it.  Resolution of .. will stay in cwd - we have no parent within the
chroot jail we are in, so we move to whatever's overmounting that root.
Which is the original location.  Should we fail on LOOKUP_NO_XDEV here?
Plain .. in the root of chroot jail (not overmounted by anything) does
*not*...

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCHSET] sanitized pathwalk machinery (v2)
  2020-02-25  1:24 ` [RFC][PATCHSET] sanitized pathwalk machinery (v2) Al Viro
@ 2020-02-25  2:03   ` Al Viro
  2020-02-28  1:24   ` Aleksa Sarai
  1 sibling, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-25  2:03 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds, Aleksa Sarai

On Tue, Feb 25, 2020 at 01:24:57AM +0000, Al Viro wrote:

> Incidentally, another inconsistency is LOOKUP_BENEATH treatment in case
> when we have walked out of the subtree by way of e.g. procfs symlink and
> then ran into .. in the absolute root (that's
>                 if (!follow_up(&nd->path))
>                         break;
> in follow_dotdot()).  Shouldn't that give the same reaction as ..
> in root (EXDEV on LOOKUP_BENEATH, that is)?  It doesn't...
> 
> Another one is about LOOKUP_NO_XDEV again: suppose you have process'
> root directly overmounted and cwd in the root of whatever's overmounting
> it.  Resolution of .. will stay in cwd - we have no parent within the
> chroot jail we are in, so we move to whatever's overmounting that root.
> Which is the original location.  Should we fail on LOOKUP_NO_XDEV here?
> Plain .. in the root of chroot jail (not overmounted by anything) does
> *not*...

FWIW, my preference would be the following (for non-RCU case; RCU
one is similar)

get_parent(nd)
{
	if (path_equal(&nd->path, &nd->root))
		return NULL;
	if (nd->path.dentry != nd->path.mnt->mnt_root)
		return dget_parent(nd->path.dentry);
	m = real_mount(nd->path.mnt);
        read_seqlock_excl(&mount_lock);
	while (mnt_has_parent(m)) {
		d = m->mnt_mountpoint;
		m = m->mnt_parent;
		if (&m->mnt == nd->root.mnt && d == nd->root.path) // root
			break;
		if (m->mnt_root != d) {
			if (unlikely(nd->flags & LOOKUP_NO_XDEV)) {
				read_sequnlock_excl(&mount_lock);
				return ERR_PTR(-EXDEV);
			}
			mntget(&m->mnt);
			dget(d);
			read_sequnlock_excl(&mount_lock);
			path_put(&nd->path);
			nd->path.mnt = &m->mnt;
			nd->path.dentry = d;
			nd->inode = d->d_inode;
			return dget_parent(d);
		}
	}
	read_sequnlock_excl(&mount_lock);
	return NULL;
}

with follow_dotdot() doing
	parent = get_parent(nd);
	if (unlikely(IS_ERR(parent)))
		return PTR_ERR(parent);
	if (unlikely(!parent)) {	.. in root is a rare case
		bugger off if LOOKUP_BENEATH
		parent = dget(nd->path.dentry);
	} else if (unlikely(!path_connected(nd->path.mnt, parent))) {
		dput(parent);
		return -ENOENT;
	}
	dput(nd->path.dentry);
	nd->path.dentry = parent;
	follow_mount(&nd->path);

... with the last part replaced with
	step_into(nd, WALK_NOFOLLOW, dentry, NULL, 0);
later in this series, with similar in RCU case (only there we would want
inode and seq supplied, as usual, so it would be get_parent_rcu(nd, &inode,
&seq)).

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCH v2 02/34] fix automount/automount race properly
  2020-02-23  2:07     ` Linus Torvalds
@ 2020-02-27 19:43       ` Al Viro
  2020-02-27 20:00         ` Linus Torvalds
  0 siblings, 1 reply; 198+ messages in thread
From: Al Viro @ 2020-02-27 19:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, Linux Kernel Mailing List

On Sat, Feb 22, 2020 at 06:07:39PM -0800, Linus Torvalds wrote:
> On Sat, Feb 22, 2020 at 5:16 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > +
> > +discard2:
> > +       namespace_unlock();
> > +discard1:
> > +       inode_unlock(dentry->d_inode);
> > +discard:
> >         /* remove m from any expiration list it may be on */
> 
> Would you mind re-naming those labels?
> 
> I realize that the numbering may help show that the error handling is
> done in the reverse order, but I bet that a nice name could so that
> too.

Umm...  A bit of reordering in the beginning eliminates discard1, suggesting
s/discard2/discard_locked/...  Incremental would be

diff --git a/fs/namespace.c b/fs/namespace.c
index 6228fd1ef94f..777c3116e62e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2844,22 +2844,22 @@ int finish_automount(struct vfsmount *m, struct path *path)
 	 * got", not "try to mount it on top".
 	 */
 	inode_lock(dentry->d_inode);
+	namespace_lock();
 	if (unlikely(cant_mount(dentry))) {
 		err = -ENOENT;
-		goto discard1;
+		goto discard_locked;
 	}
-	namespace_lock();
 	rcu_read_lock();
 	if (unlikely(__lookup_mnt(path->mnt, dentry))) {
 		rcu_read_unlock();
 		err = 0;
-		goto discard2;
+		goto discard_locked;
 	}
 	rcu_read_unlock();
 	mp = get_mountpoint(dentry);
 	if (IS_ERR(mp)) {
 		err = PTR_ERR(mp);
-		goto discard2;
+		goto discard_locked;
 	}
 
 	err = do_add_mount(mnt, mp, path, path->mnt->mnt_flags | MNT_SHRINKABLE);
@@ -2869,9 +2869,8 @@ int finish_automount(struct vfsmount *m, struct path *path)
 	mntput(m);
 	return 0;
 
-discard2:
+discard_locked:
 	namespace_unlock();
-discard1:
 	inode_unlock(dentry->d_inode);
 discard:
 	/* remove m from any expiration list it may be on */

^ permalink raw reply related	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCH v2 02/34] fix automount/automount race properly
  2020-02-27 19:43       ` Al Viro
@ 2020-02-27 20:00         ` Linus Torvalds
  0 siblings, 0 replies; 198+ messages in thread
From: Linus Torvalds @ 2020-02-27 20:00 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linux Kernel Mailing List

On Thu, Feb 27, 2020 at 11:43 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Umm...  A bit of reordering in the beginning eliminates discard1, suggesting
> s/discard2/discard_locked/...  Incremental would be

Ack, thanks.

          Linus

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCHSET] sanitized pathwalk machinery (v2)
  2020-02-25  1:24 ` [RFC][PATCHSET] sanitized pathwalk machinery (v2) Al Viro
  2020-02-25  2:03   ` Al Viro
@ 2020-02-28  1:24   ` Aleksa Sarai
  2020-02-28  3:02     ` Al Viro
  1 sibling, 1 reply; 198+ messages in thread
From: Aleksa Sarai @ 2020-02-28  1:24 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 3039 bytes --]

On 2020-02-25, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Sun, Feb 23, 2020 at 01:12:21AM +0000, Al Viro wrote:
> > 	This is a slightly extended repost of the patchset posted on
> > Jan 19.  Current branch is in vfs.git#work.do_last, the main
> > difference from the last time around being a bit of do_last()
> > untangling added in the end of series.  #work.openat2 is already
> > in mainline, which simplifies the series - now it's a straight
> > branch with no merges.
> 
> Whee...  While trying to massage ".." handling towards the use of
> regular mount crossing semantics, I've found something interesting.
> Namely, if you start in a directory with overmounted parent,
> LOOKUP_NO_XDEV resolution of ../something will bloody well cross
> into the overmount.

Oh boy...

> Reason: follow_dotdot() (and its RCU counterpart) check for LOOKUP_NO_XDEV
> when crossing into underlying fs, but not when crossing into overmount
> of the parent.
> 
> Interpretation of .. is basically
> 
> loop:	if we are in root					// uncommon
> 		next = current position
> 	else if we are in root of a mounted filesystem		// more rare
> 		move to underlying mountpoint
> 		goto loop
> 	else
> 		next = parent directory of current position	// most common
> 
> 	while next is overmounted				// _VERY_ uncommon
> 		next = whatever's mounted on next
> 
> 	move to next
> 
> The second loop should've been sharing code with the normal mountpoint
> crossing.  It doesn't, which has already lead to interesting inconsistencies
> (e.g. autofs generally expects ->d_manage() to be called before crossing
> into it; here it's not done).  LOOKUP_NO_XDEV has just added one more...

You're quite right -- LOOKUP_NO_XDEV should block that and I missed it.

> Incidentally, another inconsistency is LOOKUP_BENEATH treatment in case
> when we have walked out of the subtree by way of e.g. procfs symlink and
> then ran into .. in the absolute root (that's
>                 if (!follow_up(&nd->path))
>                         break;
> in follow_dotdot()).  Shouldn't that give the same reaction as ..
> in root (EXDEV on LOOKUP_BENEATH, that is)?  It doesn't...

You can't go through procfs symlinks with LOOKUP_BENEATH, but if it's
possible to do that kind of jump then it should also be blocked (but I
would say that I'd prefer "block any kind of weird jump").

> Another one is about LOOKUP_NO_XDEV again: suppose you have process'
> root directly overmounted and cwd in the root of whatever's overmounting
> it.  Resolution of .. will stay in cwd - we have no parent within the
> chroot jail we are in, so we move to whatever's overmounting that root.
> Which is the original location.  Should we fail on LOOKUP_NO_XDEV here?
> Plain .. in the root of chroot jail (not overmounted by anything) does
> *not*...

I think LOOKUP_NO_XDEV should block that since you end up crossing a
mountpoint.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCHSET] sanitized pathwalk machinery (v2)
  2020-02-28  1:24   ` Aleksa Sarai
@ 2020-02-28  3:02     ` Al Viro
  0 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-02-28  3:02 UTC (permalink / raw)
  To: Aleksa Sarai; +Cc: linux-fsdevel, linux-kernel, Linus Torvalds

On Fri, Feb 28, 2020 at 12:24:51PM +1100, Aleksa Sarai wrote:

> > Another one is about LOOKUP_NO_XDEV again: suppose you have process'
> > root directly overmounted and cwd in the root of whatever's overmounting
> > it.  Resolution of .. will stay in cwd - we have no parent within the
> > chroot jail we are in, so we move to whatever's overmounting that root.
> > Which is the original location.  Should we fail on LOOKUP_NO_XDEV here?
> > Plain .. in the root of chroot jail (not overmounted by anything) does
> > *not*...
> 
> I think LOOKUP_NO_XDEV should block that since you end up crossing a
> mountpoint.

You are not.  Your process' root is overmounted and your current directory
is on that overmount.  You attempt to resolve ".." there.

# cd /
# unshare -m
# stat .
  File: .
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: 801h/2049d      Inode: 2           Links: 25
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2020-02-26 20:51:06.623409892 -0500
Modify: 2020-02-26 20:43:51.284020000 -0500
Change: 2020-02-26 20:43:51.284020000 -0500
 Birth: -
# mkdir /tmp/foo
# mount -t tmpfs none /tmp/foo/
# for i in *; do test -d $i && mkdir /tmp/foo/$i && mount --rbind $i /tmp/foo/$i; done
# cd /tmp/foo/
# mount --move . /
# /bin/pwd
/
# ls
bin   dev  home  lib32  libx32      ltp    mnt  proc  run   srv  tmp  var
boot  etc  lib   lib64  lost+found  media  opt  root  sbin  sys  usr
# ls /
253_metadump  etc             lib32       media  run   usr
315.full      home            lib64       mnt    sbin  var
bin           initrd.img      libx32      opt    srv   vmlinuz
boot          initrd.img.old  lost+found  proc   sys   vmlinuz.old
dev           lib             ltp         root   tmp
# stat ..
  File: ..
  Size: 500             Blocks: 0          IO Block: 4096   directory
Device: 19h/25d Inode: 1875746     Links: 25
Access: (1777/drwxrwxrwt)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2020-02-27 21:48:45.649705410 -0500
Modify: 2020-02-27 21:46:40.829607188 -0500
Change: 2020-02-27 21:46:40.829607188 -0500
 Birth: -
# stat .
  File: .
  Size: 500             Blocks: 0          IO Block: 4096   directory
Device: 19h/25d Inode: 1875746     Links: 25
Access: (1777/drwxrwxrwt)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2020-02-27 21:48:45.649705410 -0500
Modify: 2020-02-27 21:46:40.829607188 -0500
Change: 2020-02-27 21:46:40.829607188 -0500
 Birth: -


See what's going on?  We have a tmpfs overmounting root; current directory
is the root of tmpfs; process root is on ext3.  Lookups for . and .. yield
exactly the same result here - we stay in the root of tmpfs.

Anyway, see #work.dotdot in vfs.git

^ permalink raw reply	[flat|nested] 198+ messages in thread

* [RFC][PATCHSET] sanitized pathwalk machinery (v3)
  2020-02-23  1:11 [RFC][PATCHSET] sanitized pathwalk machinery (v2) Al Viro
  2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
  2020-02-25  1:24 ` [RFC][PATCHSET] sanitized pathwalk machinery (v2) Al Viro
@ 2020-03-01 21:51 ` Al Viro
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                     ` (2 more replies)
  2 siblings, 3 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:51 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

	Extended since the last repost.  The branch is in #work.dotdot;
#work.do_last is its beginning (about 2/3 of the total), slightly
reworked since the last time.

	Individual patches are in the followups.  Branch survives
the local testing (including ltp and xfstests).

	Review and testing would be _very_ welcome; it does a lot
of massage, so there had been a plenty of opportunities to fuck up
and fail to spot that.  The same goes for profiling - it doesn't
seem to slow the things down, but that needs to be verified.  If
nobody screams, in a couple of days I'm going to push the beginning
of that thing (#work.do_last) into -next, with the rest to follow
it there next weekend or so.

	Changes since v2: several followup cleanups in the end of
symlinks-related series (part 3) and large series on ".." handling
added in the end (part 6).  The only thing suggested in review and
not done yet is splitup of pick_link() - it's going to happen, but
I want the tricky parts handled first; pick_link() is trivial
as far as control flow goes.

part 1: follow_automount() cleanups and fixes.

	Quite a bit of that function had been about working around the
wrong calling conventions of finish_automount().  The problem is that
finish_automount() misuses the primitive intended for mount(2) and
friends, where we want to mount on top of the pile, even if something
has managed to add to that while we'd been trying to lock the namespace.
For automount that's not the right thing to do - there we want to discard
whatever it was going to attach and just cross into what got mounted
there in the meanwhile (most likely - the results of the same automount
triggered by somebody else).  Current mainline kinda-sorta manages to do
that, but it's unreliable and very convoluted.  Much simpler approach
is to stop using lock_mount() in finish_automount() and have it bail
out if something turns out to have been mounted on top where we wanted
to attach.  That allows to get rid of a lot of PITA in the caller.
Another simplification comes from not trying to cross into the results
of automount - simply ride through the next iteration of the loop and
let it move into overmount.

	Another thing in the same series is divorcing follow_automount()
from nameidata; that'll play later when we get to unifying follow_down()
with the guts of follow_managed().

	4 commits, the second one fixes a hard-to-hit race.  The first
is a prereq for it.

1/55	do_add_mount(): lift lock_mount/unlock_mount into callers
2/55	fix automount/automount race properly
3/55	follow_automount(): get rid of dead^Wstillborn code
4/55	follow_automount() doesn't need the entire nameidata

part 2: unifying mount traversals in pathwalk.

	Handling of mount traversal (follow_managed()) is currently called
in a bunch of places.  Each of them is shortly followed by a call of
step_into() or an open-coded equivalent thereof.  However, the locations
of those step_into() calls are far from preceding follow_managed();
moreover, that preceding call might happen on different paths that
converge to given step_into() call.  It's harder to analyse that it should
be (especially when it comes to liveness analysis) and it forces rather
ugly calling conventions on lookup_fast()/atomic_open()/lookup_open().
The series below massages the code to the point when the calls of
follow_managed() (and __follow_mount_rcu()) move into the beginning of
step_into().

5/55	make build_open_flags() treat O_CREAT | O_EXCL as implying O_NOFOLLOW
	gets EEXIST handling in do_last() past the step_into() call there.
6/55	handle_mounts(): start building a sane wrapper for follow_managed()
	rather than mangling follow_managed() itself (and creating conflicts
	with openat2 series), add a wrapper that will absorb the required
	interface changes.
7/55	atomic_open(): saner calling conventions (return dentry on success)
	struct path passed to it is pure out parameter; only dentry part
	ever varies, though - mnt is always nd->path.mnt.  Just return
	the dentry on success, and ERR_PTR(-E...) on failure.
8/55	lookup_open(): saner calling conventions (return dentry on success)
	propagate the same change one level up the call chain.
9/55	do_last(): collapse the call of path_to_nameidata()
	struct path filled in lookup_open() call is eventually given to
	handle_mounts(); the only use it has before that is path_to_nameidata()
	call in "->atomic_open() has actually opened it" case, and there
	path_to_nameidata() is an overkill - we are guaranteed to replace
	only nd->path.dentry.  So have the struct path filled only immediately
	prior to handle_mounts().
10/55	handle_mounts(): pass dentry in, turn path into a pure out argument
	now all callers of handle_mount() are directly preceded by filling
	struct path it gets.  path->mnt is nd->path.mnt in all cases, so we can
	pass just the dentry instead and fill path in handle_mount() itself.
	Some boilerplate gone, path is pure out argument of handle_mount()
	now.
11/55	lookup_fast(): consolidate the RCU success case
	massage to gather what will become an RCU case equivalent of
	handle_mounts(); basically, that's what we do if revalidate succeeds
	in RCU case of lookup_fast(), including unlazy and fallback to
	handle_mounts() if __follow_mount_rcu() says "it's too tricky".
12/55	teach handle_mounts() to handle RCU mode
	... and take that into handle_mount() itself.  The other caller of
	__follow_mount_rcu() is fine with the same fallback (it just didn't
	bother since it's in the very beginning of pathwalk), switched to
	handle_mount() as well.
13/55	lookup_fast(): take mount traversal into callers
	Now we are getting somewhere - both RCU and non-RCU success cases of
	lookup_fast() are ended with the same return handle_mounts(...);
	move that to the callers - there it will merge with the identical calls
	that had been on the paths where we had to do slow lookups.
	lookup_fast() returns dentry now.
14/55	new step_into() flag: WALK_NOFOLLOW
	use step_into() instead of open-coding it in handle_lookup_down().
	Add a flag for "don't follow symlinks regardless of LOOKUP_FOLLOW" for
	that (and eventually, I hope, for .. handling).
	Now *all* calls of handle_mounts() and step_into() are right next to
	each other.
15/55	fold handle_mounts() into step_into()
	... and we can move the call of handle_mounts() into step_into(),
	getting a slightly saner calling conventions out of that.
16/55	LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat()
	another payoff from 14/17 - we can teach path_lookupat() to do
	what path_mountpointat() used to.  And kill the latter, along with
	its wrappers.
17/55	expand the only remaining call of path_lookup_conditional()
	minor cleanup - RIP path_lookup_conditional().  Only one caller left.

Changes so far:
	* mount traversal is taken into step_into().
	* lookup_fast(), atomic_open() and lookup_open() calling conventions
are slightly changed.  All of them return dentry now, instead of returning
an int and filling struct path on success.  For lookup_fast() the old
"0 for cache miss, 1 for cache hit" is replaced with "NULL stands for cache
miss, dentry - for hit".
	* step_into() can be called in RCU mode as well.  Takes nameidata,
WALK_... flags, dentry and, in RCU case, corresponding inode and seq value.
Handles mount traversals, decides whether it's a symlink to be followed.
Error => returns -E...; symlink to follow => returns 1, puts symlink on stack;
non-symlink or symlink not to follow => returns 0, moves nd->path to new location.
	* LOOKUP_MOUNTPOINT introduced; user_path_mountpoint_at() and friends
became calls of user_path_at() et.al. with LOOKUP_MOUNTPOINT in flags.

part 3: untangling the symlink handling.

	Right now when we decide to follow a symlink it happens this way:
	* step_into() decides that it has been given a symlink that needs to
be followed.
	* it calls pick_link(), which pushes the symlink on stack and
returns 1 on success / -E... on error.  Symlink's mount/dentry/seq is
stored on stack and the inode is stashed in nd->link_inode.
	* step_into() passes that 1 to its callers, which proceed to pass it
up the call chain for several layers.  In all cases we get to get_link()
call shortly afterwards.
	* get_link() is called, picks the inode stashed in nd->link_inode
by the pick_link(), does some checks, touches the atime, etc.
	* get_link() either picks the link body out of inode or calls
->get_link().  If it's an absolute symlink, we move to the root and return
the relative portion of the body; if it's a relative one - just return the
body.  If it's a procfs-style one, the call of nd_jump_link() has been
made and we'd moved to whatever location is desired.  And return NULL,
same as we do for symlink to "/".
	* the caller proceeds to deal with the string returned to it.

	The sequence is the same in all cases (nested symlink, trailing
symlink on lookup, trailing symlink on open), but its pieces are not close
to each other and the bit between the call of pick_link() and (inevitable)
call of get_link() afterwards is not easy to follow.  Moreover, a bunch
of functions (walk_component/lookup_last/do_last) ends up with the same
conventions for return values as step_into().  And those conventions
(see above) are not pretty - 0/1/-E... is asking for mistakes, especially
when returned 1 is used only to direct control flow on a rather twisted
way to matching get_link() call.  And that path can be seriously twisted.
E.g. when we are trying to open /dev/stdin, we get the following sequence:
	* path_init() has put us into root and returned "/dev/stdin"
	* link_path_walk() has eventually reached /dev and left
<LAST_NORM, "stdin"> in nd->last_type/nd->last
	* we call do_last(), which sees that we have LAST_NORM and calls
lookup_fast().  Let's assume that everything is in dcache; we get the
dentry of /dev/stdin and proceed to finish_lookup:, where we call step_into()
	* it's a symlink, we have LOOKUP_FOLLOW, so we decide to pick the
damn thing.  Into the stack it goes and we return 1.
	* do_last() sees 1 and returns it.
	* trailing_symlink() is called (in the top-level loop) and it
calls get_link().  OK, we get "/proc/self/fd/0" for body, move to
root again and return "proc/self/fd/0".
	* link_path_walk() is given that string, eventually leading us into
/proc/self/fd, with <LAST_NORM, "0"> left as the component to handle.
	* do_last() is called, and similar to the previous case we
eventually reach the call of step_into() with dentry of /proc/self/fd/0.
	* _now_ we can discard /dev/stdin from the stack (we'd been
using its body until now).  It's dropped (from step_into()) and we get
to look at what we'd been given.  A symlink to follow, so on the stack
it goes and we return 1.
	* again, do_last() passes 1 to caller
	* trailing_symlink() is called and calls get_link().
	* this time it's a procfs symlink and its ->get_link() method
moves us to the mount/dentry of our stdin.  And returns NULL.  But the
fun doesn't stop yet.
	* trailing_symlink() returns "" to the caller
	* link_path_walk() is called on that and does nothing
whatsoever.
	* do_last() is called and sees LAST_BIND left by the get_link().
It calls handle_dots()
	* handle_dots() drops the symlink from stack and returns
	* do_last() *FINALLY* proceeds to the point after its call of
step_into() (finish_open:) and gets around to opening the damn thing.

	Making sense of the control flow through all of that is not fun,
to put it mildly; debugging anything in that area can be a massive PITA,
and this example has touched only one of 3 cases.  Arguably, the worst
one, but...  Anyway, it turns out that this code can be massaged to
considerably saner shape - both in terms of control flow and wrt calling
conventions.

18/55	merging pick_link() with get_link(), part 1
	prep work: move the "hardening" crap from trailing_symlink() into
get_link() (conditional on the absense of LOOKUP_PARENT in nd->flags).
We'll be moving the calls of get_link() around quite a bit through that
series, and the next step will be to eliminate trailing_symlink().
19/55	merging pick_link() with get_link(), part 2
	fold trailing_symlink() into lookup_last() and do_last().
Now these are returning strings; it's not the final calling conventions,
but it's almost there.  NULL => old 0, we are done.  ERR_PTR(-E...) =>
old -E..., we'd failed.  string => old 1, and the string is the symlink
body to follow.  Just as for trailing_symlink(), "/" and procfs ones
(where get_link() returns NULL) yield "", so the ugly song and dance
with no-op trip through link_path_walk()/handle_dots() still remains.
20/55	merging pick_link() with get_link(), part 3
	elimination of that round-trip.  In *all* cases having
get_link() return NULL on such symlinks means that we'll proceed to
drop the symlink from stack and get back to the point near that
get_link() call - basically, where we would be if it hadn't been
a symlink at all.  The path by which we are getting there depends
upon the call site; the end result is the same in all cases - such
symlinks (procfs ones and symlink to "/") are fully processed by
the time get_link() returns, so we could as well drop them from the
stack right in get_link().  Makes life simpler in terms of control
flow analysis...
	And now the calling conventions for do_last() and lookup_last()
have reached the final shape - ERR_PTR(-E...) for error, NULL for
"we are done", string for "traverse this".
21/55	merging pick_link() with get_link(), part 4
	now all calls of walk_component() are followed by the same
boilerplate - "if it has returned 1, call get_link() and if that
has returned NULL treat that as if walk_component() has returned 0".
Eliminate by folding that into walk_component() itself.  Now
walk_component() return value conventions have joined those of
do_last()/lookup_last().
22/55	merging pick_link() with get_link(), part 5
	same as for the previous, only this time the boilerplate
migrates one level down, into step_into().  Only one caller of
get_link() left, step_into() has joined the same return value
conventions.
23/55	merging pick_link() with get_link(), part 6
	move that thing into pick_link().  Now all traces of
"return 1 if we are following a symlink" are gone.
24/55	finally fold get_link() into pick_link()
	ta-da - expand get_link() into the only caller.  As a side
benefit, we get rid of stashing the inode in nd->link_inode - it
was done only to carry that piece of information from pick_link()
to eventual get_link().  That's not the main benefit, though - the
control flow became considerably easier to reason about.

For what it's worth, the example above (/dev/stdin) becomes
	* path_init() has put us into root and returned "/dev/stdin"
	* link_path_walk() has eventually reached /dev and left
<LAST_NORM, "stdin"> in nd->last_type/nd->last
	* we call do_last(), which sees that we have LAST_NORM and calls
lookup_fast().  Let's assume that everything is in dcache; we get the
dentry of /dev/stdin and proceed to finish_lookup:, where we call step_into()
	* it's a symlink, we have LOOKUP_FOLLOW, so we decide to pick the
damn thing.  On the stack it goes and we get its body.  Which is
"/proc/self/fd/0", so we move to root and return "proc/self/fd/0".
	* do_last() sees non-NULL and returns it - whether it's an error
or a pathname to traverse, we hadn't reached something we'll be opening.
	* link_path_walk() is given that string, eventually leading us into
/proc/self/fd, with <LAST_NORM, "0"> left as the component to handle.
	* do_last() is called, and similar to the previous case we
eventually reach the call of step_into() with dentry of /proc/self/fd/0.
	* _now_ we can discard /dev/stdin from the stack (we'd been
using its body until now).  It's dropped (from step_into()) and we get
to look at what we'd been given.  A symlink to follow, so on the stack
it goes.   This time it's a procfs symlink and its ->get_link() method
moves us to the mount/dentry of our stdin.  And returns NULL.  So we
drop symlink from stack and return that NULL to caller.
	* that NULL is returned by step_into(), same as if we had just
moved to a non-symlink.
	* do_last() proceeds to open the damn thing.

Some low-hanging fruits become available: LAST_BIND can be removed and
the predicate controlling may_follow_link() we'd moved into pick_link()
in #18 can be made more straightforward:
25/55	LAST_BIND removal
	The only reason to keep it had just been eliminated - it was
needed to route the control flow through that weird last iteration
through the loop.  With that iteration gone...
26/55	invert the meaning of WALK_FOLLOW
27/55	pick_link(): check for WALK_TRAILING, not LOOKUP_PARENT
	In #18 the checks specific to trailing symlinks got moved into
pick_link(), where they were made conditional upon LOOKUP_PARENT in
nd->flags.  That works, but it's more subtle than I would like it to
be - it depends upon the dynamic state (nd->flags) which gets changed
through the pathwalk and it's sensitive to exact locations where we
flip LOOKUP_PARENT.  Now we have a more robust way to do that - the
call chains that end up in pick_link() with LOOKUP_PARENT in nd->flags
are those that had WALK_TRAILING passed to the immediate caller of
pick_link() (step_into()).  So we can pass WALK_... down to pick_link()
and turn the check into explicit "if we are passed WALK_TRAILING,
it's a trailing symlink and we need to apply the checks in may_follow()".
	We could, in principle, reorder these two commits into the
very beginning of symlink series; that would make #18 slightly simpler
at the cost of (marginally) more boilerplate to carry through the
get_link() call moves.  Not sure if it's worth doing, though...

28/55	link_path_walk(): simplify stack handling
	Another cleanup that becomes possible is handling of the stack(s).
We use nd->stack to store two things: pinning down the symlinks we are
resolving and resuming the name traversal when a nested symlink is finished.
    
Currently, nd->depth is used to keep track of both.  It's 0 when we call
link_path_walk() for the first time (for the pathname itself) and 1 on
all subsequent calls (for trailing symlinks, if any).  That's fine,
as far as pinning symlinks goes - when handling a trailing symlink,
the string we are interpreting is the body of symlink pinned down in
nd->stack[0].  It's rather inconvenient with respect to handling nested
symlinks, though - when we run out of a string we are currently interpreting,
we need to decide whether it's a nested symlink (in which case we need
to pick the string saved back when we started to interpret that nested
symlink and resume its traversal) or not (in which case we are done with
link_path_walk()).
    
Current solution is a bit of a kludge - in handling of trailing symlink
(in lookup_last() and open_last_lookups() we clear nd->stack[0].name.
That allows link_path_walk() to use the following rules when running out of
a string to interpret:
	* if nd->depth is zero, we are at the end of pathname itself.
	* if nd->depth is positive, check the saved string; for
nested symlink it will be non-NULL, for trailing symlink - NULL.
    
It works, but it's rather non-obvious.  Note that we have two sets:
the set of symlinks currently being traversed and the set of postponed
pathname tails.  The former is stored in nd->stack[0..nd->depth-1].link
and it's valid throught the pathname resolution; the latter is valid only
during an individual call of link_path_walk() and it occupies
nd->stack[0..nd->depth-1].name for the first call of link_path_walk() and
nd->stack[1..nd->depth-1].name for subsequent ones.  The kludge is basically
a way to recognize the second set becoming empty.
    
The things get simpler if we keep track of the second set's size explicitly
and always store it in nd->stack[0..depth-1].name.  We access the second set
only inside link_path_walk(), so its size can live in a local variable;
that way the check becomes trivial without the need of that kludge.

part 4.  some mount traversal cleanups.

29/55	massage __follow_mount_rcu() a bit
	make it more similar to non-RCU counterpart
30/55	new helper: traverse_mounts()
	the guts of follow_managed() are very similar to
follow_down().  The calling conventions are different (follow_managed()
works with nameidata, follow_down() - with standalone struct path),
but the core loop is pretty much the same in both.  Turned that loop
into a common helper (traverse_mounts()) and since follow_managed()
becomes a very thin wrapper around it, expand follow_managed() at its
only call site (in handle_mounts()),

part 5.  do_last() untangling.

Control flow in do_last() is an atrocity, and liveness analysis in there
is rather painful.  What follows is a massage of that thing into (hopefully)
more straightforward shape; by the end of the series it's still unpleasant,
but at least easier to follow.

A major source of headache is treatment of "we'd already managed to
open it in ->atomic_open()" and "we'd just created that sucker" cases -
that's what gives complicated control flow graph.  As it is, we
have the following horror:
        #
 /------*				ends with . or ..?
 #      |
 |      #
 |  /---*				found in dcache, no O_CREAT?
 |  |   |
 |  |   #				call lookup_open() here.
 |  |   |
 |  |   *---------------\		already opened in ->atomic_open()?
 |  |   |               #
 |  |   *---\           |		freshly created file?
 |  |   |   #           |
 |  \---+   |           |		finish_lookup:
 |      #   |           |
 |      *--------------------->		is it a symlink?
 \------+   |           |		finish_open:
        #   |           |
        +--/            |		finish_open_created:
        #               |
        +---------------/		opened:
        #
To make it even more unpleasant, there is quite a bit of similar,
but not entirely identical logics on parallel branches, some of it
buried in lookup_open() *and* atomic_open() called by it.  Keeping
track of that has been hard and that had lead to more than one bug.

31/55	atomic_open(): return the right dentry in FMODE_OPENED case
	As it is, several invariants do not hold in "we'd already opened
it in ->atomic_open()" case.  In particular, nd->path.dentry might be
pointing to the wrong place by the time we return to do_last() - on
that codepath we don't care anymore.  That both makes it harder to reason
about and serves as an obstacle to transformations that would untangle
that mess.  Fortunately, it's not hard to regularize.
32/55	atomic_open(): lift the call of may_open() into do_last()
	may_open() is called before vfs_open() in "hadn't opened in
->atomic_open()" case.  Rightfully so, since vfs_open() for e.g.
devices can have side effects.  In "opened in ->atomic_open()" case
we have to do it after the actual opening - the whole point is to
combine open with lookup and we only get the information needed for
may_open() after the combined lookup/open has happened.  That's
OK - no side effects are possible in that case.  However, we don't
have to keep that call of may_open() inside fs/namei.c:atomic_open();
as the matter of fact, lifting it into do_last() allows to simplify
life there...
33/55	do_last(): merge the may_open() calls
	... since now we have the "it's already opened" case in
do_last() rejoin the main path at earlier point.

At that point the horror graph from above has become
        #
 /------*				ends with . or ..?
 #      |
 |      #
 |  /---*				found in dcache, no O_CREAT?
 |  |   |
 |  |   #				call lookup_open() here.
 |  |   |
 |  |   *---------------\		already opened in ->atomic_open()?
 |  |   |               #
 |  |   *---\           |		freshly created file?
 |  |   |   #           |
 |  \---+   |           |		finish_lookup:
 |      #   |           |
 |      *--------------------->		is it a symlink?
 \------+   |           |		finish_open:
        #   |           |
        +--/------------/		finish_open_created:
        #
34/55	do_last(): don't bother with keeping got_write in FMODE_OPENED case
	Another source of unpleasantness is an attempt to be clever and
keep track of write access status; the thing is, it doesn't really buy
us anything - we could as well drop it right after the lookup_open() and
only regain it for truncation, should such be needed.  Makes for much
simpler cleanups on failures and sets the things up for unification of
"already opened" and "new file" branches with the main path...
35/55	do_last(): rejoing the common path earlier in FMODE_{OPENED,CREATED} case
	... which we do here.
36/55	do_last(): simplify the liveness analysis past finish_open_created
	It also makes possible to shrink the liveness intervals for local
variables.
37/55	do_last(): rejoin the common path even earlier in FMODE_{OPENED,CREATED} case
	Further unification of parallel branches.

At that point we get
        #
 /------*				ends with . or ..?
 #      |
 |      #
 |  /---*				found in dcache, no O_CREAT?
 |  |   |
 |  |   #				call lookup_open() here.
 |  |   |
 |  |   *---\				opened by ->atomic_open() or freshly creatd?	
 |  \---+   |				finish_lookup:
 |      #   |
 |      *--------------------->		is it a symlink?
 \------+   |				finish_open:
        |   |
        +--/				finish_open_created:
        #
with very little work done between finish_open: and finish_open_created:,
as well as on any of the side branches.  Moreover, we have a pretty clear
separation: most of the work on _opening_ is after finish_open_created
(some of it - conditional), while the work on lookups and creation is all
before that point.  Even better, most of the local variables are used
either only before or only after that cutoff point.

38/55	split the lookup-related parts of do_last() into a separate helper
	... which allows to separate the lookup-related parts from
open-related ones.

	I'm not saying I'm entirely happy with the resulting state of
do_last() clusterfuck, but it got a lot easier to follow and reason
about.  There are more cleanups possible (and needed) in there, though -
there will be followups.

part 6: ".." handling

	The main problem with .. traversal is that it has open-coded mount
crossing in the end.  That mount crossing used to be identical to that done
after the normal name components, but it got overlooked in several series
(most recently - openat2, prior to that - mount traps and automounts) and
now we have an out-of-sync variant (two of them, actually - RCU and non-RCU
cases) festering there and breeding hard-to-spot bugs.  The most recent
example was when openat2 got extra checks added to the normal mount crossing;
it added the same to RCU case of .. handling, but missed the non-RCU one.
Nobody noticed during the many rounds of review - me, Christoph and Linus
included.

	Another issue is that we are heavier on the locking than we need to
during the rootwards mount traversal there; traversing mounts in other
direction (from mountpoint to mounted) gets by without grabbing mount_lock
exclusive (or dirtying its cacheline in any way, for that matter), even
in non-RCU case.  The same should be the case for mounted-to-mountpoint
transitions - except for the case of very rare races with mount --move
and friends, we should be fine with just the seqcount checks there.

	The following is how .. handling behaves on just about any post-v7
Unix:

	while true
		if the caller is chrooted into directory      // rare
(A)			parent = directory
			break
		if directory is absolute root		      // rare
(B)			parent = directory
			break
		if directory is mounted on top of mountpoint // 2nd most common
(C)			directory = mountpoint
		else					     // the most common
(D)			parent = the parent of directory (within its fs)
			break
	while something is mounted on parent		     // unusual setup
(E)		parent = whatever overmounts it
	return parent

There are 3 paths that execution commonly takes, and they cover almost everything
that occurs in practice:
	1) [A]		We are in / and we stay there
	2) [C,D]	We are in root of mounted filesystem,
			we step into the underlying mountpoint,
			then into the parent of mountpoint.
	3) [D]		The place we are in is not a root of mounted filesystem,
			we step into its parent.
These cases are obvious.  However, other execution paths are possible;
in fact, the only constraint is that if we leave the first loop via (A) or (B),
the body of the second loop (i.e. going from mountpoint to mounted) will be
executed at least as many times as (C) (going from mounted to mountpoint)
had been.

A closer look at the predicates in the above shows that "is absolute root"
is actually "is root of a mount and that mount is not attached to anything"
while "is mounted on top of mountpoint" is "is a root of a mount and the
mount is attached to mountpoint".  Which suggest the following transformation:

choose_mountpoint(mount, &ancestor)
	while mount is attached to something
		d = mountpoint(mount)
		mount = parent(mount)
		if the caller is chrooted into <mount, d>
			break
		if d is a root of mount
			ancestor = <mount, d>
			return true
	return false

handle_dotdot(directory)
	if unlikely(the caller is chrooted into directory)
		goto in_root
	if unlikely(directory is a root of some mount)
		if !choose_mountpoint(mount, &ancestor)
			goto in_root
		directory = ancestor
	parent = the parent of directory (within its fs)
	while unlikely(something is mounted on parent)
		parent = whatever overmounts it
	return parent
in_root:
	parent = directory
	while unlikely(something is mounted on parent)
		parent = whatever overmounts it
	return parent

In this form we have mounted-to-mountpoint mount traversals clearly
separated.  Moreover, required updates of pathwalk context (nameidata)
can be packed into a call of the same primitive (step_into()) we use
for moves into normal components, including the forward mount
traversals.

NO_XDEV and BENEATH checks (added by openat2 series) fit into that
just fine - NO_XDEV at "directory = ancestor" part, BENEATH - at
in_root.  Since the forward mount traversal is done by step_into(),
the regular NO_XDEV checks in there take care of the rest.

The following series massages follow_dotdot/follow_dotdot_rcu() to
that form and does choose_mountpoint() implementation with saner
locking than what we do in mainline now - for RCU case we only need
to check mount_lock seqcount once (in the caller), for non-RCU we
can use a loop similar to what lookup_mnt() does for forward traversals.

39/55	path_connected(): pass mount and dentry separately
40/55	path_parent_directory(): leave changing path->dentry to callers
41/55	follow_dotdot(): expand the call of path_parent_directory()
	currently switching to parent is done inside path_parent_directory(),
called from the loop in follow_dotdot().  These 3 commits lift that into
the loop in follow_dotdot() itself...
42/55	follow_dotdot{,_rcu}(): lift switching nd->path to parent out of loop
43/55	follow_dotdot{,_rcu}(): lift LOOKUP_BENEATH checks out of loop
	... and out of the loop(s) (both on the RCU and non-RCU sides)

44/55	move put_link() into handle_dots()
45/55	handle_dots(): return ERR_PTR/NULL instead of -E.../0
46/55	move follow_dotdot() and follow_dotdot_rcu() towards handle_dots()
47/55	follow_dotdot{,_rcu}(): preparation to switch to step_into()
	gather the pieces for step_into() calls we are about to construct.

48/55	follow_dotdot{,_rcu}(): switch to use of step_into()
	... and switch both to it.  Now the RCU and non-RCU variants of
the loop that used to do forward mount traversal on .. are replaced with
step_into() calls...
49/55	lift all calls of step_into() out of follow_dotdot/follow_dotdot_rcu
	... which can be consolidated.  We are done with the forward traversal
parts.

50/55	follow_dotdot{,_rcu}(): massage loops
51/55	follow_dotdot_rcu(): be lazy about changing nd->path
52/55	follow_dotdot(): be lazy about changing nd->path
	get the rootwards traversal into shape described above

53/55	helper for mount rootwards traversal
54/55	non-RCU analogue of the previous commit
	... and introduce choose_mountpoint{,_rcu}(), switching both RCU and
non-RCU variants to it.

55/55	fs/namei.c: kill follow_mount()
	detritus removal - the only remaining caller (path_pts()) ought to
use follow_down() anyway.

^ permalink raw reply	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers
  2020-03-01 21:51 ` [RFC][PATCHSET] sanitized pathwalk machinery (v3) Al Viro
@ 2020-03-01 21:51   ` Al Viro
  2020-03-01 21:51     ` [RFC][PATCH v3 02/55] fix automount/automount race properly Al Viro
                       ` (53 more replies)
  2020-03-01 22:34   ` [RFC][PATCHSET] sanitized pathwalk machinery (v3) Linus Torvalds
  2020-03-13 23:53   ` [RFC][PATCHSET] sanitized pathwalk machinery (v4) Al Viro
  2 siblings, 54 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:51 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

preparation to finish_automount() fix (next commit)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 47 ++++++++++++++++++++++++-----------------------
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 85b5f7bea82e..dcd015fafe01 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2697,45 +2697,32 @@ static int do_move_mount_old(struct path *path, const char *old_name)
 /*
  * add a mount into a namespace's mount tree
  */
-static int do_add_mount(struct mount *newmnt, struct path *path, int mnt_flags)
+static int do_add_mount(struct mount *newmnt, struct mountpoint *mp,
+			struct path *path, int mnt_flags)
 {
-	struct mountpoint *mp;
-	struct mount *parent;
-	int err;
+	struct mount *parent = real_mount(path->mnt);
 
 	mnt_flags &= ~MNT_INTERNAL_FLAGS;
 
-	mp = lock_mount(path);
-	if (IS_ERR(mp))
-		return PTR_ERR(mp);
-
-	parent = real_mount(path->mnt);
-	err = -EINVAL;
 	if (unlikely(!check_mnt(parent))) {
 		/* that's acceptable only for automounts done in private ns */
 		if (!(mnt_flags & MNT_SHRINKABLE))
-			goto unlock;
+			return -EINVAL;
 		/* ... and for those we'd better have mountpoint still alive */
 		if (!parent->mnt_ns)
-			goto unlock;
+			return -EINVAL;
 	}
 
 	/* Refuse the same filesystem on the same mount point */
-	err = -EBUSY;
 	if (path->mnt->mnt_sb == newmnt->mnt.mnt_sb &&
 	    path->mnt->mnt_root == path->dentry)
-		goto unlock;
+		return -EBUSY;
 
-	err = -EINVAL;
 	if (d_is_symlink(newmnt->mnt.mnt_root))
-		goto unlock;
+		return -EINVAL;
 
 	newmnt->mnt.mnt_flags = mnt_flags;
-	err = graft_tree(newmnt, parent, mp);
-
-unlock:
-	unlock_mount(mp);
-	return err;
+	return graft_tree(newmnt, parent, mp);
 }
 
 static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags);
@@ -2748,6 +2735,7 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 			   unsigned int mnt_flags)
 {
 	struct vfsmount *mnt;
+	struct mountpoint *mp;
 	struct super_block *sb = fc->root->d_sb;
 	int error;
 
@@ -2768,7 +2756,13 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 
 	mnt_warn_timestamp_expiry(mountpoint, mnt);
 
-	error = do_add_mount(real_mount(mnt), mountpoint, mnt_flags);
+	mp = lock_mount(mountpoint);
+	if (IS_ERR(mp)) {
+		mntput(mnt);
+		return PTR_ERR(mp);
+	}
+	error = do_add_mount(real_mount(mnt), mp, mountpoint, mnt_flags);
+	unlock_mount(mp);
 	if (error < 0)
 		mntput(mnt);
 	return error;
@@ -2830,6 +2824,7 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
 int finish_automount(struct vfsmount *m, struct path *path)
 {
 	struct mount *mnt = real_mount(m);
+	struct mountpoint *mp;
 	int err;
 	/* The new mount record should have at least 2 refs to prevent it being
 	 * expired before we get a chance to add it
@@ -2842,7 +2837,13 @@ int finish_automount(struct vfsmount *m, struct path *path)
 		goto fail;
 	}
 
-	err = do_add_mount(mnt, path, path->mnt->mnt_flags | MNT_SHRINKABLE);
+	mp = lock_mount(path);
+	if (IS_ERR(mp)) {
+		err = PTR_ERR(mp);
+		goto fail;
+	}
+	err = do_add_mount(mnt, mp, path, path->mnt->mnt_flags | MNT_SHRINKABLE);
+	unlock_mount(mp);
 	if (!err)
 		return 0;
 fail:
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 02/55] fix automount/automount race properly
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
@ 2020-03-01 21:51     ` Al Viro
  2020-03-01 21:51     ` [RFC][PATCH v3 03/55] follow_automount(): get rid of dead^Wstillborn code Al Viro
                       ` (52 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:51 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Protection against automount/automount races (two threads hitting the same
referral point at the same time) is based upon do_add_mount() prevention of
identical overmounts - trying to overmount the root of mounted tree with
the same tree fails with -EBUSY.  It's unreliable (the other thread might've
mounted something on top of the automount it has triggered) *and* causes
no end of headache for follow_automount() and its caller, since
finish_automount() behaves like do_new_mount() - if the mountpoint to be is
overmounted, it mounts on top what's overmounting it.  It's not only wrong
(we want to go into what's overmounting the automount point and quietly
discard what we planned to mount there), it introduces the possibility of
original parent mount getting dropped.  That's what 8aef18845266 (VFS: Fix
vfsmount overput on simultaneous automount) deals with, but it can't do
anything about the reliability of conflict detection - if something had
been overmounted the other thread's automount (e.g. that other thread
having stepped into automount in mount(2)), we don't get that -EBUSY and
the result is
	 referral point under automounted NFS under explicit overmount
under another copy of automounted NFS

What we need is finish_automount() *NOT* digging into overmounts - if it
finds one, it should just quietly discard the thing it was asked to mount.
And don't bother with actually crossing into the results of finish_automount() -
the same loop that calls follow_automount() will do that just fine on the
next iteration.

IOW, instead of calling lock_mount() have finish_automount() do it manually,
_without_ the "move into overmount and retry" part.  And leave crossing into
the results to the caller of follow_automount(), which simplifies it a lot.

Moral: if you end up with a lot of glue working around the calling conventions
of something, perhaps these calling conventions are simply wrong...

Fixes: 8aef18845266 (VFS: Fix vfsmount overput on simultaneous automount)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c     | 29 ++++-------------------------
 fs/namespace.c | 40 +++++++++++++++++++++++++++++++++-------
 2 files changed, 37 insertions(+), 32 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index db6565c99825..626eddb33508 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1208,11 +1208,9 @@ EXPORT_SYMBOL(follow_up);
  * - return -EISDIR to tell follow_managed() to stop and return the path we
  *   were called with.
  */
-static int follow_automount(struct path *path, struct nameidata *nd,
-			    bool *need_mntput)
+static int follow_automount(struct path *path, struct nameidata *nd)
 {
 	struct vfsmount *mnt;
-	int err;
 
 	if (!path->dentry->d_op || !path->dentry->d_op->d_automount)
 		return -EREMOTE;
@@ -1253,29 +1251,10 @@ static int follow_automount(struct path *path, struct nameidata *nd,
 		return PTR_ERR(mnt);
 	}
 
-	if (!mnt) /* mount collision */
-		return 0;
-
-	if (!*need_mntput) {
-		/* lock_mount() may release path->mnt on error */
-		mntget(path->mnt);
-		*need_mntput = true;
-	}
-	err = finish_automount(mnt, path);
-
-	switch (err) {
-	case -EBUSY:
-		/* Someone else made a mount here whilst we were busy */
+	if (!mnt)
 		return 0;
-	case 0:
-		path_put(path);
-		path->mnt = mnt;
-		path->dentry = dget(mnt->mnt_root);
-		return 0;
-	default:
-		return err;
-	}
 
+	return finish_automount(mnt, path);
 }
 
 /*
@@ -1333,7 +1312,7 @@ static int follow_managed(struct path *path, struct nameidata *nd)
 
 		/* Handle an automount point */
 		if (flags & DCACHE_NEED_AUTOMOUNT) {
-			ret = follow_automount(path, nd, &need_mntput);
+			ret = follow_automount(path, nd);
 			if (ret < 0)
 				break;
 			continue;
diff --git a/fs/namespace.c b/fs/namespace.c
index dcd015fafe01..777c3116e62e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2823,6 +2823,7 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
 
 int finish_automount(struct vfsmount *m, struct path *path)
 {
+	struct dentry *dentry = path->dentry;
 	struct mount *mnt = real_mount(m);
 	struct mountpoint *mp;
 	int err;
@@ -2832,21 +2833,46 @@ int finish_automount(struct vfsmount *m, struct path *path)
 	BUG_ON(mnt_get_count(mnt) < 2);
 
 	if (m->mnt_sb == path->mnt->mnt_sb &&
-	    m->mnt_root == path->dentry) {
+	    m->mnt_root == dentry) {
 		err = -ELOOP;
-		goto fail;
+		goto discard;
 	}
 
-	mp = lock_mount(path);
+	/*
+	 * we don't want to use lock_mount() - in this case finding something
+	 * that overmounts our mountpoint to be means "quitely drop what we've
+	 * got", not "try to mount it on top".
+	 */
+	inode_lock(dentry->d_inode);
+	namespace_lock();
+	if (unlikely(cant_mount(dentry))) {
+		err = -ENOENT;
+		goto discard_locked;
+	}
+	rcu_read_lock();
+	if (unlikely(__lookup_mnt(path->mnt, dentry))) {
+		rcu_read_unlock();
+		err = 0;
+		goto discard_locked;
+	}
+	rcu_read_unlock();
+	mp = get_mountpoint(dentry);
 	if (IS_ERR(mp)) {
 		err = PTR_ERR(mp);
-		goto fail;
+		goto discard_locked;
 	}
+
 	err = do_add_mount(mnt, mp, path, path->mnt->mnt_flags | MNT_SHRINKABLE);
 	unlock_mount(mp);
-	if (!err)
-		return 0;
-fail:
+	if (unlikely(err))
+		goto discard;
+	mntput(m);
+	return 0;
+
+discard_locked:
+	namespace_unlock();
+	inode_unlock(dentry->d_inode);
+discard:
 	/* remove m from any expiration list it may be on */
 	if (!list_empty(&mnt->mnt_expire)) {
 		namespace_lock();
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 03/55] follow_automount(): get rid of dead^Wstillborn code
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
  2020-03-01 21:51     ` [RFC][PATCH v3 02/55] fix automount/automount race properly Al Viro
@ 2020-03-01 21:51     ` Al Viro
  2020-03-01 21:51     ` [RFC][PATCH v3 04/55] follow_automount() doesn't need the entire nameidata Al Viro
                       ` (51 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:51 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

1) no instances of ->d_automount() have ever made use of the "return
ERR_PTR(-EISDIR) if you don't feel like mounting anything" - that's
a rudiment of plans that got superseded before the thing went into
the tree.  Despite the comment in follow_automount(), autofs has
never done that.

2) if there's no ->d_automount() in dentry_operations, filesystems
should not set DCACHE_NEED_AUTOMOUNT in the first place.  None have
ever done so...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c     | 28 +++-------------------------
 fs/namespace.c |  9 ++++++++-
 2 files changed, 11 insertions(+), 26 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 626eddb33508..39dd56f5171f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1210,10 +1210,7 @@ EXPORT_SYMBOL(follow_up);
  */
 static int follow_automount(struct path *path, struct nameidata *nd)
 {
-	struct vfsmount *mnt;
-
-	if (!path->dentry->d_op || !path->dentry->d_op->d_automount)
-		return -EREMOTE;
+	struct dentry *dentry = path->dentry;
 
 	/* We don't want to mount if someone's just doing a stat -
 	 * unless they're stat'ing a directory and appended a '/' to
@@ -1228,33 +1225,14 @@ static int follow_automount(struct path *path, struct nameidata *nd)
 	 */
 	if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
 			   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
-	    path->dentry->d_inode)
+	    dentry->d_inode)
 		return -EISDIR;
 
 	nd->total_link_count++;
 	if (nd->total_link_count >= 40)
 		return -ELOOP;
 
-	mnt = path->dentry->d_op->d_automount(path);
-	if (IS_ERR(mnt)) {
-		/*
-		 * The filesystem is allowed to return -EISDIR here to indicate
-		 * it doesn't want to automount.  For instance, autofs would do
-		 * this so that its userspace daemon can mount on this dentry.
-		 *
-		 * However, we can only permit this if it's a terminal point in
-		 * the path being looked up; if it wasn't then the remainder of
-		 * the path is inaccessible and we should say so.
-		 */
-		if (PTR_ERR(mnt) == -EISDIR && (nd->flags & LOOKUP_PARENT))
-			return -EREMOTE;
-		return PTR_ERR(mnt);
-	}
-
-	if (!mnt)
-		return 0;
-
-	return finish_automount(mnt, path);
+	return finish_automount(dentry->d_op->d_automount(path), path);
 }
 
 /*
diff --git a/fs/namespace.c b/fs/namespace.c
index 777c3116e62e..743980380a8f 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2824,9 +2824,16 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
 int finish_automount(struct vfsmount *m, struct path *path)
 {
 	struct dentry *dentry = path->dentry;
-	struct mount *mnt = real_mount(m);
 	struct mountpoint *mp;
+	struct mount *mnt;
 	int err;
+
+	if (!m)
+		return 0;
+	if (IS_ERR(m))
+		return PTR_ERR(m);
+
+	mnt = real_mount(m);
 	/* The new mount record should have at least 2 refs to prevent it being
 	 * expired before we get a chance to add it
 	 */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 04/55] follow_automount() doesn't need the entire nameidata
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
  2020-03-01 21:51     ` [RFC][PATCH v3 02/55] fix automount/automount race properly Al Viro
  2020-03-01 21:51     ` [RFC][PATCH v3 03/55] follow_automount(): get rid of dead^Wstillborn code Al Viro
@ 2020-03-01 21:51     ` Al Viro
  2020-03-01 21:51     ` [RFC][PATCH v3 05/55] make build_open_flags() treat O_CREAT | O_EXCL as implying O_NOFOLLOW Al Viro
                       ` (50 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:51 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Only the address of ->total_link_count and the flags.
And fix an off-by-one is ELOOP detection - make it
consistent with symlink following, where we check if
the pre-increment value has reached 40, rather than
check the post-increment one.

[kudos to Christian Brauner for spotted braino]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 39dd56f5171f..6721c5f7e9d5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1208,7 +1208,7 @@ EXPORT_SYMBOL(follow_up);
  * - return -EISDIR to tell follow_managed() to stop and return the path we
  *   were called with.
  */
-static int follow_automount(struct path *path, struct nameidata *nd)
+static int follow_automount(struct path *path, int *count, unsigned lookup_flags)
 {
 	struct dentry *dentry = path->dentry;
 
@@ -1223,13 +1223,12 @@ static int follow_automount(struct path *path, struct nameidata *nd)
 	 * as being automount points.  These will need the attentions
 	 * of the daemon to instantiate them before they can be used.
 	 */
-	if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
+	if (!(lookup_flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
 			   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
 	    dentry->d_inode)
 		return -EISDIR;
 
-	nd->total_link_count++;
-	if (nd->total_link_count >= 40)
+	if (count && (*count)++ >= MAXSYMLINKS)
 		return -ELOOP;
 
 	return finish_automount(dentry->d_op->d_automount(path), path);
@@ -1290,7 +1289,8 @@ static int follow_managed(struct path *path, struct nameidata *nd)
 
 		/* Handle an automount point */
 		if (flags & DCACHE_NEED_AUTOMOUNT) {
-			ret = follow_automount(path, nd);
+			ret = follow_automount(path, &nd->total_link_count,
+						nd->flags);
 			if (ret < 0)
 				break;
 			continue;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 05/55] make build_open_flags() treat O_CREAT | O_EXCL as implying O_NOFOLLOW
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (2 preceding siblings ...)
  2020-03-01 21:51     ` [RFC][PATCH v3 04/55] follow_automount() doesn't need the entire nameidata Al Viro
@ 2020-03-01 21:51     ` Al Viro
  2020-03-01 21:51     ` [RFC][PATCH v3 06/55] handle_mounts(): start building a sane wrapper for follow_managed() Al Viro
                       ` (49 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:51 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

O_CREAT | O_EXCL means "-EEXIST if we run into a trailing symlink".
As it is, we might or might not have LOOKUP_FOLLOW in op->intent
in that case - that depends upon having O_NOFOLLOW in open flags.
It doesn't matter, since we won't be checking it in that case -
do_last() bails out earlier.

However, making sure it's not set (i.e. acting as if we had an explicit
O_NOFOLLOW) makes the behaviour more explicit and allows to reorder the
check for O_CREAT | O_EXCL in do_last() with the call of step_into()
immediately following it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 15 +++++----------
 fs/open.c  |  4 +++-
 2 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 6721c5f7e9d5..6938d20aa73a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3396,22 +3396,17 @@ static int do_last(struct nameidata *nd,
 	if (unlikely(error < 0))
 		return error;
 
-	/*
-	 * create/update audit record if it already exists.
-	 */
-	audit_inode(nd->name, path.dentry, 0);
-
-	if (unlikely((open_flag & (O_EXCL | O_CREAT)) == (O_EXCL | O_CREAT))) {
-		path_to_nameidata(&path, nd);
-		return -EEXIST;
-	}
-
 	seq = 0;	/* out of RCU mode, so the value doesn't matter */
 	inode = d_backing_inode(path.dentry);
 finish_lookup:
 	error = step_into(nd, &path, 0, inode, seq);
 	if (unlikely(error))
 		return error;
+
+	if (unlikely((open_flag & (O_EXCL | O_CREAT)) == (O_EXCL | O_CREAT))) {
+		audit_inode(nd->name, nd->path.dentry, 0);
+		return -EEXIST;
+	}
 finish_open:
 	/* Why this, you ask?  _Now_ we might have grown LOOKUP_JUMPED... */
 	error = complete_walk(nd);
diff --git a/fs/open.c b/fs/open.c
index 0788b3715731..e5227cd533f4 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1049,8 +1049,10 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op)
 
 	if (flags & O_CREAT) {
 		op->intent |= LOOKUP_CREATE;
-		if (flags & O_EXCL)
+		if (flags & O_EXCL) {
 			op->intent |= LOOKUP_EXCL;
+			flags |= O_NOFOLLOW;
+		}
 	}
 
 	if (flags & O_DIRECTORY)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 06/55] handle_mounts(): start building a sane wrapper for follow_managed()
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (3 preceding siblings ...)
  2020-03-01 21:51     ` [RFC][PATCH v3 05/55] make build_open_flags() treat O_CREAT | O_EXCL as implying O_NOFOLLOW Al Viro
@ 2020-03-01 21:51     ` Al Viro
  2020-03-01 21:51     ` [RFC][PATCH v3 07/55] atomic_open(): saner calling conventions (return dentry on success) Al Viro
                       ` (48 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:51 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

All callers of follow_managed() follow it on success with the same steps -
d_backing_inode(path->dentry) is calculated and stored into some struct inode *
variable and, in all but one case, an unsigned variable (nd->seq to be) is
zeroed.  The single exception is lookup_fast() and there zeroing is correct
thing to do - not doing it is a pointless microoptimization.

	Add a wrapper for follow_managed() that would do that combination.
It's mostly a vehicle for code massage - it will be changing quite a bit,
and the current calling conventions are by no means final.  Right now it
takes path, nameidata and (as out params) inode and seq, similar to
__follow_mount_rcu().  Which will soon get folded into it...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 6938d20aa73a..c104ec75faef 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1385,6 +1385,18 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 		!(path->dentry->d_flags & DCACHE_NEED_AUTOMOUNT);
 }
 
+static inline int handle_mounts(struct path *path, struct nameidata *nd,
+			  struct inode **inode, unsigned int *seqp)
+{
+	int ret = follow_managed(path, nd);
+
+	if (likely(ret >= 0)) {
+		*inode = d_backing_inode(path->dentry);
+		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
+	}
+	return ret;
+}
+
 static int follow_dotdot_rcu(struct nameidata *nd)
 {
 	struct inode *inode = nd->inode;
@@ -1607,7 +1619,6 @@ static int lookup_fast(struct nameidata *nd,
 	struct vfsmount *mnt = nd->path.mnt;
 	struct dentry *dentry, *parent = nd->path.dentry;
 	int status = 1;
-	int err;
 
 	/*
 	 * Rename seqlock is not required here because in the off chance
@@ -1677,10 +1688,7 @@ static int lookup_fast(struct nameidata *nd,
 
 	path->mnt = mnt;
 	path->dentry = dentry;
-	err = follow_managed(path, nd);
-	if (likely(err > 0))
-		*inode = d_backing_inode(path->dentry);
-	return err;
+	return handle_mounts(path, nd, inode, seqp);
 }
 
 /* Fast lookup failed, do it the slow way */
@@ -1875,12 +1883,9 @@ static int walk_component(struct nameidata *nd, int flags)
 			return PTR_ERR(path.dentry);
 
 		path.mnt = nd->path.mnt;
-		err = follow_managed(&path, nd);
+		err = handle_mounts(&path, nd, &inode, &seq);
 		if (unlikely(err < 0))
 			return err;
-
-		seq = 0;	/* we are already out of RCU mode */
-		inode = d_backing_inode(path.dentry);
 	}
 
 	return step_into(nd, &path, flags, inode, seq);
@@ -2365,11 +2370,9 @@ static int handle_lookup_down(struct nameidata *nd)
 			return -ECHILD;
 	} else {
 		dget(path.dentry);
-		err = follow_managed(&path, nd);
+		err = handle_mounts(&path, nd, &inode, &seq);
 		if (unlikely(err < 0))
 			return err;
-		inode = d_backing_inode(path.dentry);
-		seq = 0;
 	}
 	path_to_nameidata(&path, nd);
 	nd->inode = inode;
@@ -3392,12 +3395,9 @@ static int do_last(struct nameidata *nd,
 		got_write = false;
 	}
 
-	error = follow_managed(&path, nd);
+	error = handle_mounts(&path, nd, &inode, &seq);
 	if (unlikely(error < 0))
 		return error;
-
-	seq = 0;	/* out of RCU mode, so the value doesn't matter */
-	inode = d_backing_inode(path.dentry);
 finish_lookup:
 	error = step_into(nd, &path, 0, inode, seq);
 	if (unlikely(error))
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 07/55] atomic_open(): saner calling conventions (return dentry on success)
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (4 preceding siblings ...)
  2020-03-01 21:51     ` [RFC][PATCH v3 06/55] handle_mounts(): start building a sane wrapper for follow_managed() Al Viro
@ 2020-03-01 21:51     ` Al Viro
  2020-03-01 21:51     ` [RFC][PATCH v3 08/55] lookup_open(): " Al Viro
                       ` (47 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:51 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Currently it either returns -E... or puts (nd->path.mnt,dentry)
into *path and returns 0.  Make it return ERR_PTR(-E...) or
dentry; adjust the caller.  Fewer arguments and it's easier
to keep track of *path contents that way.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 41 ++++++++++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c104ec75faef..5f8b791a6d6e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3087,10 +3087,10 @@ static int may_o_create(const struct path *dir, struct dentry *dentry, umode_t m
  *
  * Returns an error code otherwise.
  */
-static int atomic_open(struct nameidata *nd, struct dentry *dentry,
-			struct path *path, struct file *file,
-			const struct open_flags *op,
-			int open_flag, umode_t mode)
+static struct dentry *atomic_open(struct nameidata *nd, struct dentry *dentry,
+				  struct file *file,
+				  const struct open_flags *op,
+				  int open_flag, umode_t mode)
 {
 	struct dentry *const DENTRY_NOT_SET = (void *) -1UL;
 	struct inode *dir =  nd->path.dentry->d_inode;
@@ -3131,17 +3131,15 @@ static int atomic_open(struct nameidata *nd, struct dentry *dentry,
 			}
 			if (file->f_mode & FMODE_CREATED)
 				fsnotify_create(dir, dentry);
-			if (unlikely(d_is_negative(dentry))) {
+			if (unlikely(d_is_negative(dentry)))
 				error = -ENOENT;
-			} else {
-				path->dentry = dentry;
-				path->mnt = nd->path.mnt;
-				return 0;
-			}
 		}
 	}
-	dput(dentry);
-	return error;
+	if (error) {
+		dput(dentry);
+		dentry = ERR_PTR(error);
+	}
+	return dentry;
 }
 
 /*
@@ -3236,11 +3234,20 @@ static int lookup_open(struct nameidata *nd, struct path *path,
 	}
 
 	if (dir_inode->i_op->atomic_open) {
-		error = atomic_open(nd, dentry, path, file, op, open_flag,
-				    mode);
-		if (unlikely(error == -ENOENT) && create_error)
-			error = create_error;
-		return error;
+		dentry = atomic_open(nd, dentry, file, op, open_flag, mode);
+		if (IS_ERR(dentry)) {
+			error = PTR_ERR(dentry);
+			if (unlikely(error == -ENOENT) && create_error)
+				error = create_error;
+			return error;
+		}
+		if (file->f_mode & FMODE_OPENED) {
+			dput(dentry);
+			return 0;
+		}
+		path->mnt = nd->path.mnt;
+		path->dentry = dentry;
+		return 0;
 	}
 
 no_open:
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 08/55] lookup_open(): saner calling conventions (return dentry on success)
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (5 preceding siblings ...)
  2020-03-01 21:51     ` [RFC][PATCH v3 07/55] atomic_open(): saner calling conventions (return dentry on success) Al Viro
@ 2020-03-01 21:51     ` Al Viro
  2020-03-01 21:51     ` [RFC][PATCH v3 09/55] do_last(): collapse the call of path_to_nameidata() Al Viro
                       ` (46 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:51 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

same story as for atomic_open() in the previous commit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 44 +++++++++++++++++++-------------------------
 1 file changed, 19 insertions(+), 25 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 5f8b791a6d6e..4946d006ba20 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3157,10 +3157,9 @@ static struct dentry *atomic_open(struct nameidata *nd, struct dentry *dentry,
  *
  * An error code is returned on failure.
  */
-static int lookup_open(struct nameidata *nd, struct path *path,
-			struct file *file,
-			const struct open_flags *op,
-			bool got_write)
+static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
+				  const struct open_flags *op,
+				  bool got_write)
 {
 	struct dentry *dir = nd->path.dentry;
 	struct inode *dir_inode = dir->d_inode;
@@ -3171,7 +3170,7 @@ static int lookup_open(struct nameidata *nd, struct path *path,
 	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 
 	if (unlikely(IS_DEADDIR(dir_inode)))
-		return -ENOENT;
+		return ERR_PTR(-ENOENT);
 
 	file->f_mode &= ~FMODE_CREATED;
 	dentry = d_lookup(dir, &nd->last);
@@ -3179,7 +3178,7 @@ static int lookup_open(struct nameidata *nd, struct path *path,
 		if (!dentry) {
 			dentry = d_alloc_parallel(dir, &nd->last, &wq);
 			if (IS_ERR(dentry))
-				return PTR_ERR(dentry);
+				return dentry;
 		}
 		if (d_in_lookup(dentry))
 			break;
@@ -3195,7 +3194,7 @@ static int lookup_open(struct nameidata *nd, struct path *path,
 	}
 	if (dentry->d_inode) {
 		/* Cached positive dentry: will open in f_op->open */
-		goto out_no_open;
+		return dentry;
 	}
 
 	/*
@@ -3236,18 +3235,10 @@ static int lookup_open(struct nameidata *nd, struct path *path,
 	if (dir_inode->i_op->atomic_open) {
 		dentry = atomic_open(nd, dentry, file, op, open_flag, mode);
 		if (IS_ERR(dentry)) {
-			error = PTR_ERR(dentry);
-			if (unlikely(error == -ENOENT) && create_error)
-				error = create_error;
-			return error;
+			if (dentry == ERR_PTR(-ENOENT) && create_error)
+				dentry = ERR_PTR(create_error);
 		}
-		if (file->f_mode & FMODE_OPENED) {
-			dput(dentry);
-			return 0;
-		}
-		path->mnt = nd->path.mnt;
-		path->dentry = dentry;
-		return 0;
+		return dentry;
 	}
 
 no_open:
@@ -3283,14 +3274,11 @@ static int lookup_open(struct nameidata *nd, struct path *path,
 		error = create_error;
 		goto out_dput;
 	}
-out_no_open:
-	path->dentry = dentry;
-	path->mnt = nd->path.mnt;
-	return 0;
+	return dentry;
 
 out_dput:
 	dput(dentry);
-	return error;
+	return ERR_PTR(error);
 }
 
 /*
@@ -3309,6 +3297,7 @@ static int do_last(struct nameidata *nd,
 	unsigned seq;
 	struct inode *inode;
 	struct path path;
+	struct dentry *dentry;
 	int error;
 
 	nd->flags &= ~LOOKUP_PARENT;
@@ -3365,14 +3354,18 @@ static int do_last(struct nameidata *nd,
 		inode_lock(dir->d_inode);
 	else
 		inode_lock_shared(dir->d_inode);
-	error = lookup_open(nd, &path, file, op, got_write);
+	dentry = lookup_open(nd, file, op, got_write);
 	if (open_flag & O_CREAT)
 		inode_unlock(dir->d_inode);
 	else
 		inode_unlock_shared(dir->d_inode);
 
-	if (error)
+	if (IS_ERR(dentry)) {
+		error = PTR_ERR(dentry);
 		goto out;
+	}
+	path.mnt = nd->path.mnt;
+	path.dentry = dentry;
 
 	if (file->f_mode & FMODE_OPENED) {
 		if ((file->f_mode & FMODE_CREATED) ||
@@ -3380,6 +3373,7 @@ static int do_last(struct nameidata *nd,
 			will_truncate = false;
 
 		audit_inode(nd->name, file->f_path.dentry, 0);
+		dput(dentry);
 		goto opened;
 	}
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 09/55] do_last(): collapse the call of path_to_nameidata()
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (6 preceding siblings ...)
  2020-03-01 21:51     ` [RFC][PATCH v3 08/55] lookup_open(): " Al Viro
@ 2020-03-01 21:51     ` Al Viro
  2020-03-01 21:51     ` [RFC][PATCH v3 10/55] handle_mounts(): pass dentry in, turn path into a pure out argument Al Viro
                       ` (45 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:51 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

... and shift filling struct path to just before the call of
handle_mounts().  All callers of handle_mounts() are
immediately preceded by path->mnt = nd->path.mnt now.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 4946d006ba20..f26af0559abf 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3364,8 +3364,6 @@ static int do_last(struct nameidata *nd,
 		error = PTR_ERR(dentry);
 		goto out;
 	}
-	path.mnt = nd->path.mnt;
-	path.dentry = dentry;
 
 	if (file->f_mode & FMODE_OPENED) {
 		if ((file->f_mode & FMODE_CREATED) ||
@@ -3382,7 +3380,8 @@ static int do_last(struct nameidata *nd,
 		open_flag &= ~O_TRUNC;
 		will_truncate = false;
 		acc_mode = 0;
-		path_to_nameidata(&path, nd);
+		dput(nd->path.dentry);
+		nd->path.dentry = dentry;
 		goto finish_open_created;
 	}
 
@@ -3396,6 +3395,8 @@ static int do_last(struct nameidata *nd,
 		got_write = false;
 	}
 
+	path.mnt = nd->path.mnt;
+	path.dentry = dentry;
 	error = handle_mounts(&path, nd, &inode, &seq);
 	if (unlikely(error < 0))
 		return error;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 10/55] handle_mounts(): pass dentry in, turn path into a pure out argument
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (7 preceding siblings ...)
  2020-03-01 21:51     ` [RFC][PATCH v3 09/55] do_last(): collapse the call of path_to_nameidata() Al Viro
@ 2020-03-01 21:51     ` Al Viro
  2020-03-01 21:51     ` [RFC][PATCH v3 11/55] lookup_fast(): consolidate the RCU success case Al Viro
                       ` (44 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:51 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

All callers are equivalent to
	path->dentry = dentry;
	path->mnt = nd->path.mnt;
	err = handle_mounts(path, ...)
Pass dentry as an explicit argument, fill *path in handle_mounts()
itself.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 37 ++++++++++++++++++-------------------
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index f26af0559abf..033f91a72bb5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1385,11 +1385,15 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 		!(path->dentry->d_flags & DCACHE_NEED_AUTOMOUNT);
 }
 
-static inline int handle_mounts(struct path *path, struct nameidata *nd,
-			  struct inode **inode, unsigned int *seqp)
+static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
+			  struct path *path, struct inode **inode,
+			  unsigned int *seqp)
 {
-	int ret = follow_managed(path, nd);
+	int ret;
 
+	path->mnt = nd->path.mnt;
+	path->dentry = dentry;
+	ret = follow_managed(path, nd);
 	if (likely(ret >= 0)) {
 		*inode = d_backing_inode(path->dentry);
 		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
@@ -1685,10 +1689,7 @@ static int lookup_fast(struct nameidata *nd,
 		dput(dentry);
 		return status;
 	}
-
-	path->mnt = mnt;
-	path->dentry = dentry;
-	return handle_mounts(path, nd, inode, seqp);
+	return handle_mounts(nd, dentry, path, inode, seqp);
 }
 
 /* Fast lookup failed, do it the slow way */
@@ -1859,6 +1860,7 @@ static inline int step_into(struct nameidata *nd, struct path *path,
 static int walk_component(struct nameidata *nd, int flags)
 {
 	struct path path;
+	struct dentry *dentry;
 	struct inode *inode;
 	unsigned seq;
 	int err;
@@ -1877,13 +1879,11 @@ static int walk_component(struct nameidata *nd, int flags)
 	if (unlikely(err <= 0)) {
 		if (err < 0)
 			return err;
-		path.dentry = lookup_slow(&nd->last, nd->path.dentry,
-					  nd->flags);
-		if (IS_ERR(path.dentry))
-			return PTR_ERR(path.dentry);
+		dentry = lookup_slow(&nd->last, nd->path.dentry, nd->flags);
+		if (IS_ERR(dentry))
+			return PTR_ERR(dentry);
 
-		path.mnt = nd->path.mnt;
-		err = handle_mounts(&path, nd, &inode, &seq);
+		err = handle_mounts(nd, dentry, &path, &inode, &seq);
 		if (unlikely(err < 0))
 			return err;
 	}
@@ -2355,7 +2355,7 @@ static inline int lookup_last(struct nameidata *nd)
 
 static int handle_lookup_down(struct nameidata *nd)
 {
-	struct path path = nd->path;
+	struct path path;
 	struct inode *inode = nd->inode;
 	unsigned seq = nd->seq;
 	int err;
@@ -2366,11 +2366,12 @@ static int handle_lookup_down(struct nameidata *nd)
 		 * at the very beginning of walk, so we lose nothing
 		 * if we simply redo everything in non-RCU mode
 		 */
+		path = nd->path;
 		if (unlikely(!__follow_mount_rcu(nd, &path, &inode, &seq)))
 			return -ECHILD;
 	} else {
-		dget(path.dentry);
-		err = handle_mounts(&path, nd, &inode, &seq);
+		dget(nd->path.dentry);
+		err = handle_mounts(nd, nd->path.dentry, &path, &inode, &seq);
 		if (unlikely(err < 0))
 			return err;
 	}
@@ -3395,9 +3396,7 @@ static int do_last(struct nameidata *nd,
 		got_write = false;
 	}
 
-	path.mnt = nd->path.mnt;
-	path.dentry = dentry;
-	error = handle_mounts(&path, nd, &inode, &seq);
+	error = handle_mounts(nd, dentry, &path, &inode, &seq);
 	if (unlikely(error < 0))
 		return error;
 finish_lookup:
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 11/55] lookup_fast(): consolidate the RCU success case
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (8 preceding siblings ...)
  2020-03-01 21:51     ` [RFC][PATCH v3 10/55] handle_mounts(): pass dentry in, turn path into a pure out argument Al Viro
@ 2020-03-01 21:51     ` Al Viro
  2020-03-01 21:51     ` [RFC][PATCH v3 12/55] teach handle_mounts() to handle RCU mode Al Viro
                       ` (43 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:51 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

1) in case of __follow_mount_rcu() failure, lookup_fast() proceeds
to call unlazy_child() and, should it succeed, handle_mounts().
Note that we have status > 0 (or we wouldn't be calling
__follow_mount_rcu() at all), so all stuff conditional upon
non-positive status won't be even touched.

Consolidate just that sequence after the call of __follow_mount_rcu().

2) calling d_is_negative() and keeping its result is pointless -
we either don't get past checking ->d_seq (and don't use the results of
d_is_negative() at all), or we are guaranteed that ->d_inode and
type bits of ->d_flags had been consistent at the time of d_is_negative()
call.  IOW, we could only get to the use of its result if it's
equal to !inode.  The same ->d_seq check guarantees that after that point
this CPU won't observe ->d_flags values older than ->d_inode update.
So 'negative' variable is completely pointless these days.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 033f91a72bb5..53e859b80b4c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1631,7 +1631,6 @@ static int lookup_fast(struct nameidata *nd,
 	 */
 	if (nd->flags & LOOKUP_RCU) {
 		unsigned seq;
-		bool negative;
 		dentry = __d_lookup_rcu(parent, &nd->last, &seq);
 		if (unlikely(!dentry)) {
 			if (unlazy_walk(nd))
@@ -1644,7 +1643,6 @@ static int lookup_fast(struct nameidata *nd,
 		 * the dentry name information from lookup.
 		 */
 		*inode = d_backing_inode(dentry);
-		negative = d_is_negative(dentry);
 		if (unlikely(read_seqcount_retry(&dentry->d_seq, seq)))
 			return -ECHILD;
 
@@ -1665,12 +1663,15 @@ static int lookup_fast(struct nameidata *nd,
 			 * Note: do negative dentry check after revalidation in
 			 * case that drops it.
 			 */
-			if (unlikely(negative))
+			if (unlikely(!inode))
 				return -ENOENT;
 			path->mnt = mnt;
 			path->dentry = dentry;
 			if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
 				return 1;
+			if (unlazy_child(nd, dentry, seq))
+				return -ECHILD;
+			return handle_mounts(nd, dentry, path, inode, seqp);
 		}
 		if (unlazy_child(nd, dentry, seq))
 			return -ECHILD;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 12/55] teach handle_mounts() to handle RCU mode
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (9 preceding siblings ...)
  2020-03-01 21:51     ` [RFC][PATCH v3 11/55] lookup_fast(): consolidate the RCU success case Al Viro
@ 2020-03-01 21:51     ` Al Viro
  2020-03-01 21:51     ` [RFC][PATCH v3 13/55] lookup_fast(): take mount traversal into callers Al Viro
                       ` (42 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:51 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

... and make the callers of __follow_mount_rcu() use handle_mounts().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 46 +++++++++++++++++-----------------------------
 1 file changed, 17 insertions(+), 29 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 53e859b80b4c..3215b0da6e91 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1393,6 +1393,18 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 
 	path->mnt = nd->path.mnt;
 	path->dentry = dentry;
+	if (nd->flags & LOOKUP_RCU) {
+		unsigned int seq = *seqp;
+		if (unlikely(!*inode))
+			return -ENOENT;
+		if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
+			return 1;
+		if (unlazy_child(nd, dentry, seq))
+			return -ECHILD;
+		// *path might've been clobbered by __follow_mount_rcu()
+		path->mnt = nd->path.mnt;
+		path->dentry = dentry;
+	}
 	ret = follow_managed(path, nd);
 	if (likely(ret >= 0)) {
 		*inode = d_backing_inode(path->dentry);
@@ -1620,7 +1632,6 @@ static int lookup_fast(struct nameidata *nd,
 		       struct path *path, struct inode **inode,
 		       unsigned *seqp)
 {
-	struct vfsmount *mnt = nd->path.mnt;
 	struct dentry *dentry, *parent = nd->path.dentry;
 	int status = 1;
 
@@ -1658,21 +1669,8 @@ static int lookup_fast(struct nameidata *nd,
 
 		*seqp = seq;
 		status = d_revalidate(dentry, nd->flags);
-		if (likely(status > 0)) {
-			/*
-			 * Note: do negative dentry check after revalidation in
-			 * case that drops it.
-			 */
-			if (unlikely(!inode))
-				return -ENOENT;
-			path->mnt = mnt;
-			path->dentry = dentry;
-			if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
-				return 1;
-			if (unlazy_child(nd, dentry, seq))
-				return -ECHILD;
+		if (likely(status > 0))
 			return handle_mounts(nd, dentry, path, inode, seqp);
-		}
 		if (unlazy_child(nd, dentry, seq))
 			return -ECHILD;
 		if (unlikely(status == -ECHILD))
@@ -2361,21 +2359,11 @@ static int handle_lookup_down(struct nameidata *nd)
 	unsigned seq = nd->seq;
 	int err;
 
-	if (nd->flags & LOOKUP_RCU) {
-		/*
-		 * don't bother with unlazy_walk on failure - we are
-		 * at the very beginning of walk, so we lose nothing
-		 * if we simply redo everything in non-RCU mode
-		 */
-		path = nd->path;
-		if (unlikely(!__follow_mount_rcu(nd, &path, &inode, &seq)))
-			return -ECHILD;
-	} else {
+	if (!(nd->flags & LOOKUP_RCU))
 		dget(nd->path.dentry);
-		err = handle_mounts(nd, nd->path.dentry, &path, &inode, &seq);
-		if (unlikely(err < 0))
-			return err;
-	}
+	err = handle_mounts(nd, nd->path.dentry, &path, &inode, &seq);
+	if (unlikely(err < 0))
+		return err;
 	path_to_nameidata(&path, nd);
 	nd->inode = inode;
 	nd->seq = seq;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 13/55] lookup_fast(): take mount traversal into callers
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (10 preceding siblings ...)
  2020-03-01 21:51     ` [RFC][PATCH v3 12/55] teach handle_mounts() to handle RCU mode Al Viro
@ 2020-03-01 21:51     ` Al Viro
  2020-03-01 21:51     ` [RFC][PATCH v3 14/55] new step_into() flag: WALK_NOFOLLOW Al Viro
                       ` (41 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:51 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Current calling conventions: -E... on error, 0 on cache miss,
result of handle_mounts(nd, dentry, path, inode, seqp) on
success.  Turn that into returning ERR_PTR(-E...), NULL and dentry
resp.; deal with handle_mounts() in the callers.  The thing
is, they already do that in cache miss handling case, so we
just need to supply dentry to them and unify the mount traversal
in those cases.  Fewer arguments that way, and we get closer
to merging handle_mounts() and step_into().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 50 ++++++++++++++++++++++++--------------------------
 1 file changed, 24 insertions(+), 26 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3215b0da6e91..4c9b633e1981 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1628,9 +1628,9 @@ static struct dentry *__lookup_hash(const struct qstr *name,
 	return dentry;
 }
 
-static int lookup_fast(struct nameidata *nd,
-		       struct path *path, struct inode **inode,
-		       unsigned *seqp)
+static struct dentry *lookup_fast(struct nameidata *nd,
+				  struct inode **inode,
+			          unsigned *seqp)
 {
 	struct dentry *dentry, *parent = nd->path.dentry;
 	int status = 1;
@@ -1645,8 +1645,8 @@ static int lookup_fast(struct nameidata *nd,
 		dentry = __d_lookup_rcu(parent, &nd->last, &seq);
 		if (unlikely(!dentry)) {
 			if (unlazy_walk(nd))
-				return -ECHILD;
-			return 0;
+				return ERR_PTR(-ECHILD);
+			return NULL;
 		}
 
 		/*
@@ -1655,7 +1655,7 @@ static int lookup_fast(struct nameidata *nd,
 		 */
 		*inode = d_backing_inode(dentry);
 		if (unlikely(read_seqcount_retry(&dentry->d_seq, seq)))
-			return -ECHILD;
+			return ERR_PTR(-ECHILD);
 
 		/*
 		 * This sequence count validates that the parent had no
@@ -1665,30 +1665,30 @@ static int lookup_fast(struct nameidata *nd,
 		 *  enough, we can use __read_seqcount_retry here.
 		 */
 		if (unlikely(__read_seqcount_retry(&parent->d_seq, nd->seq)))
-			return -ECHILD;
+			return ERR_PTR(-ECHILD);
 
 		*seqp = seq;
 		status = d_revalidate(dentry, nd->flags);
 		if (likely(status > 0))
-			return handle_mounts(nd, dentry, path, inode, seqp);
+			return dentry;
 		if (unlazy_child(nd, dentry, seq))
-			return -ECHILD;
+			return ERR_PTR(-ECHILD);
 		if (unlikely(status == -ECHILD))
 			/* we'd been told to redo it in non-rcu mode */
 			status = d_revalidate(dentry, nd->flags);
 	} else {
 		dentry = __d_lookup(parent, &nd->last);
 		if (unlikely(!dentry))
-			return 0;
+			return NULL;
 		status = d_revalidate(dentry, nd->flags);
 	}
 	if (unlikely(status <= 0)) {
 		if (!status)
 			d_invalidate(dentry);
 		dput(dentry);
-		return status;
+		return ERR_PTR(status);
 	}
-	return handle_mounts(nd, dentry, path, inode, seqp);
+	return dentry;
 }
 
 /* Fast lookup failed, do it the slow way */
@@ -1874,19 +1874,18 @@ static int walk_component(struct nameidata *nd, int flags)
 			put_link(nd);
 		return err;
 	}
-	err = lookup_fast(nd, &path, &inode, &seq);
-	if (unlikely(err <= 0)) {
-		if (err < 0)
-			return err;
+	dentry = lookup_fast(nd, &inode, &seq);
+	if (IS_ERR(dentry))
+		return PTR_ERR(dentry);
+	if (unlikely(!dentry)) {
 		dentry = lookup_slow(&nd->last, nd->path.dentry, nd->flags);
 		if (IS_ERR(dentry))
 			return PTR_ERR(dentry);
-
-		err = handle_mounts(nd, dentry, &path, &inode, &seq);
-		if (unlikely(err < 0))
-			return err;
 	}
 
+	err = handle_mounts(nd, dentry, &path, &inode, &seq);
+	if (unlikely(err < 0))
+			return err;
 	return step_into(nd, &path, flags, inode, seq);
 }
 
@@ -3304,13 +3303,12 @@ static int do_last(struct nameidata *nd,
 		if (nd->last.name[nd->last.len])
 			nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 		/* we _can_ be in RCU mode here */
-		error = lookup_fast(nd, &path, &inode, &seq);
-		if (likely(error > 0))
+		dentry = lookup_fast(nd, &inode, &seq);
+		if (IS_ERR(dentry))
+			return PTR_ERR(dentry);
+		if (likely(dentry))
 			goto finish_lookup;
 
-		if (error < 0)
-			return error;
-
 		BUG_ON(nd->inode != dir->d_inode);
 		BUG_ON(nd->flags & LOOKUP_RCU);
 	} else {
@@ -3385,10 +3383,10 @@ static int do_last(struct nameidata *nd,
 		got_write = false;
 	}
 
+finish_lookup:
 	error = handle_mounts(nd, dentry, &path, &inode, &seq);
 	if (unlikely(error < 0))
 		return error;
-finish_lookup:
 	error = step_into(nd, &path, 0, inode, seq);
 	if (unlikely(error))
 		return error;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 14/55] new step_into() flag: WALK_NOFOLLOW
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (11 preceding siblings ...)
  2020-03-01 21:51     ` [RFC][PATCH v3 13/55] lookup_fast(): take mount traversal into callers Al Viro
@ 2020-03-01 21:51     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 15/55] fold handle_mounts() into step_into() Al Viro
                       ` (40 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:51 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Tells step_into() not to follow symlinks, regardless of LOOKUP_FOLLOW.
Allows to switch handle_lookup_down() to of step_into(), getting
all follow_managed() and step_into() calls paired.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 4c9b633e1981..fe48a8d00ae5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1827,7 +1827,7 @@ static int pick_link(struct nameidata *nd, struct path *link,
 	return 1;
 }
 
-enum {WALK_FOLLOW = 1, WALK_MORE = 2};
+enum {WALK_FOLLOW = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
 
 /*
  * Do we need to follow links? We _really_ want to be able
@@ -1841,7 +1841,8 @@ static inline int step_into(struct nameidata *nd, struct path *path,
 	if (!(flags & WALK_MORE) && nd->depth)
 		put_link(nd);
 	if (likely(!d_is_symlink(path->dentry)) ||
-	   !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW)) {
+	   !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW) ||
+	   flags & WALK_NOFOLLOW) {
 		/* not a symlink or should not follow */
 		path_to_nameidata(path, nd);
 		nd->inode = inode;
@@ -2363,10 +2364,7 @@ static int handle_lookup_down(struct nameidata *nd)
 	err = handle_mounts(nd, nd->path.dentry, &path, &inode, &seq);
 	if (unlikely(err < 0))
 		return err;
-	path_to_nameidata(&path, nd);
-	nd->inode = inode;
-	nd->seq = seq;
-	return 0;
+	return step_into(nd, &path, WALK_NOFOLLOW, inode, seq);
 }
 
 /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 15/55] fold handle_mounts() into step_into()
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (12 preceding siblings ...)
  2020-03-01 21:51     ` [RFC][PATCH v3 14/55] new step_into() flag: WALK_NOFOLLOW Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 16/55] LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat() Al Viro
                       ` (39 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

The following is true:
	* calls of handle_mounts() and step_into() are always
paired in sequences like
	err = handle_mounts(nd, dentry, &path, &inode, &seq);
	if (unlikely(err < 0))
		return err;
	err = step_into(nd, &path, flags, inode, seq);
	* in all such sequences path is uninitialized before and
unused after this pair of calls
	* in all such sequences inode and seq are unused afterwards.

So the call of handle_mounts() can be shifted inside step_into(),
turning 'path' into a local variable in the combined function.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 41 +++++++++++++++--------------------------
 1 file changed, 15 insertions(+), 26 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index fe48a8d00ae5..28835ee7168a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1835,31 +1835,35 @@ enum {WALK_FOLLOW = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
  * so we keep a cache of "no, this doesn't need follow_link"
  * for the common case.
  */
-static inline int step_into(struct nameidata *nd, struct path *path,
-			    int flags, struct inode *inode, unsigned seq)
+static int step_into(struct nameidata *nd, int flags,
+		     struct dentry *dentry, struct inode *inode, unsigned seq)
 {
+	struct path path;
+	int err = handle_mounts(nd, dentry, &path, &inode, &seq);
+
+	if (err < 0)
+		return err;
 	if (!(flags & WALK_MORE) && nd->depth)
 		put_link(nd);
-	if (likely(!d_is_symlink(path->dentry)) ||
+	if (likely(!d_is_symlink(path.dentry)) ||
 	   !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW) ||
 	   flags & WALK_NOFOLLOW) {
 		/* not a symlink or should not follow */
-		path_to_nameidata(path, nd);
+		path_to_nameidata(&path, nd);
 		nd->inode = inode;
 		nd->seq = seq;
 		return 0;
 	}
 	/* make sure that d_is_symlink above matches inode */
 	if (nd->flags & LOOKUP_RCU) {
-		if (read_seqcount_retry(&path->dentry->d_seq, seq))
+		if (read_seqcount_retry(&path.dentry->d_seq, seq))
 			return -ECHILD;
 	}
-	return pick_link(nd, path, inode, seq);
+	return pick_link(nd, &path, inode, seq);
 }
 
 static int walk_component(struct nameidata *nd, int flags)
 {
-	struct path path;
 	struct dentry *dentry;
 	struct inode *inode;
 	unsigned seq;
@@ -1883,11 +1887,7 @@ static int walk_component(struct nameidata *nd, int flags)
 		if (IS_ERR(dentry))
 			return PTR_ERR(dentry);
 	}
-
-	err = handle_mounts(nd, dentry, &path, &inode, &seq);
-	if (unlikely(err < 0))
-			return err;
-	return step_into(nd, &path, flags, inode, seq);
+	return step_into(nd, flags, dentry, inode, seq);
 }
 
 /*
@@ -2354,17 +2354,10 @@ static inline int lookup_last(struct nameidata *nd)
 
 static int handle_lookup_down(struct nameidata *nd)
 {
-	struct path path;
-	struct inode *inode = nd->inode;
-	unsigned seq = nd->seq;
-	int err;
-
 	if (!(nd->flags & LOOKUP_RCU))
 		dget(nd->path.dentry);
-	err = handle_mounts(nd, nd->path.dentry, &path, &inode, &seq);
-	if (unlikely(err < 0))
-		return err;
-	return step_into(nd, &path, WALK_NOFOLLOW, inode, seq);
+	return step_into(nd, WALK_NOFOLLOW,
+			nd->path.dentry, nd->inode, nd->seq);
 }
 
 /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
@@ -3283,7 +3276,6 @@ static int do_last(struct nameidata *nd,
 	int acc_mode = op->acc_mode;
 	unsigned seq;
 	struct inode *inode;
-	struct path path;
 	struct dentry *dentry;
 	int error;
 
@@ -3382,10 +3374,7 @@ static int do_last(struct nameidata *nd,
 	}
 
 finish_lookup:
-	error = handle_mounts(nd, dentry, &path, &inode, &seq);
-	if (unlikely(error < 0))
-		return error;
-	error = step_into(nd, &path, 0, inode, seq);
+	error = step_into(nd, 0, dentry, inode, seq);
 	if (unlikely(error))
 		return error;
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 16/55] LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat()
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (13 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 15/55] fold handle_mounts() into step_into() Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 17/55] expand the only remaining call of path_lookup_conditional() Al Viro
                       ` (38 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

New LOOKUP flag, telling path_lookupat() to act as path_mountpointat().
IOW, traverse mounts at the final point and skip revalidation of the
location where it ends up.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/autofs/dev-ioctl.c |  6 ++--
 fs/internal.h         |  1 -
 fs/namei.c            | 89 ++++-----------------------------------------------
 fs/namespace.c        |  4 +--
 include/linux/namei.h |  2 +-
 5 files changed, 12 insertions(+), 90 deletions(-)

diff --git a/fs/autofs/dev-ioctl.c b/fs/autofs/dev-ioctl.c
index a3cdb0036c5d..f3a0f412b43b 100644
--- a/fs/autofs/dev-ioctl.c
+++ b/fs/autofs/dev-ioctl.c
@@ -186,7 +186,7 @@ static int find_autofs_mount(const char *pathname,
 	struct path path;
 	int err;
 
-	err = kern_path_mountpoint(AT_FDCWD, pathname, &path, 0);
+	err = kern_path(pathname, LOOKUP_MOUNTPOINT, &path);
 	if (err)
 		return err;
 	err = -ENOENT;
@@ -519,8 +519,8 @@ static int autofs_dev_ioctl_ismountpoint(struct file *fp,
 
 	if (!fp || param->ioctlfd == -1) {
 		if (autofs_type_any(type))
-			err = kern_path_mountpoint(AT_FDCWD,
-						   name, &path, LOOKUP_FOLLOW);
+			err = kern_path(name, LOOKUP_FOLLOW | LOOKUP_MOUNTPOINT,
+					&path);
 		else
 			err = find_autofs_mount(name, &path,
 						test_by_type, &type);
diff --git a/fs/internal.h b/fs/internal.h
index f3f280b952a3..b108a8eb75ca 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -61,7 +61,6 @@ extern int finish_clean_context(struct fs_context *fc);
  */
 extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
 			   struct path *path, struct path *root);
-extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
 extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
 			   const char *, unsigned int, struct path *);
 long do_mknodat(int dfd, const char __user *filename, umode_t mode,
diff --git a/fs/namei.c b/fs/namei.c
index 28835ee7168a..6f1f46b931a6 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2382,6 +2382,10 @@ static int path_lookupat(struct nameidata *nd, unsigned flags, struct path *path
 	if (!err && nd->flags & LOOKUP_DIRECTORY)
 		if (!d_can_lookup(nd->path.dentry))
 			err = -ENOTDIR;
+	if (!err && unlikely(nd->flags & LOOKUP_MOUNTPOINT)) {
+		err = handle_lookup_down(nd);
+		nd->flags &= ~LOOKUP_JUMPED; // no d_weak_revalidate(), please...
+	}
 	if (!err) {
 		*path = nd->path;
 		nd->path.mnt = NULL;
@@ -2410,7 +2414,8 @@ int filename_lookup(int dfd, struct filename *name, unsigned flags,
 		retval = path_lookupat(&nd, flags | LOOKUP_REVAL, path);
 
 	if (likely(!retval))
-		audit_inode(name, path->dentry, 0);
+		audit_inode(name, path->dentry,
+			    flags & LOOKUP_MOUNTPOINT ? AUDIT_INODE_NOEVAL : 0);
 	restore_nameidata();
 	putname(name);
 	return retval;
@@ -2688,88 +2693,6 @@ int user_path_at_empty(int dfd, const char __user *name, unsigned flags,
 }
 EXPORT_SYMBOL(user_path_at_empty);
 
-/**
- * path_mountpoint - look up a path to be umounted
- * @nd:		lookup context
- * @flags:	lookup flags
- * @path:	pointer to container for result
- *
- * Look up the given name, but don't attempt to revalidate the last component.
- * Returns 0 and "path" will be valid on success; Returns error otherwise.
- */
-static int
-path_mountpoint(struct nameidata *nd, unsigned flags, struct path *path)
-{
-	const char *s = path_init(nd, flags);
-	int err;
-
-	while (!(err = link_path_walk(s, nd)) &&
-		(err = lookup_last(nd)) > 0) {
-		s = trailing_symlink(nd);
-	}
-	if (!err && (nd->flags & LOOKUP_RCU))
-		err = unlazy_walk(nd);
-	if (!err)
-		err = handle_lookup_down(nd);
-	if (!err) {
-		*path = nd->path;
-		nd->path.mnt = NULL;
-		nd->path.dentry = NULL;
-	}
-	terminate_walk(nd);
-	return err;
-}
-
-static int
-filename_mountpoint(int dfd, struct filename *name, struct path *path,
-			unsigned int flags)
-{
-	struct nameidata nd;
-	int error;
-	if (IS_ERR(name))
-		return PTR_ERR(name);
-	set_nameidata(&nd, dfd, name);
-	error = path_mountpoint(&nd, flags | LOOKUP_RCU, path);
-	if (unlikely(error == -ECHILD))
-		error = path_mountpoint(&nd, flags, path);
-	if (unlikely(error == -ESTALE))
-		error = path_mountpoint(&nd, flags | LOOKUP_REVAL, path);
-	if (likely(!error))
-		audit_inode(name, path->dentry, AUDIT_INODE_NOEVAL);
-	restore_nameidata();
-	putname(name);
-	return error;
-}
-
-/**
- * user_path_mountpoint_at - lookup a path from userland in order to umount it
- * @dfd:	directory file descriptor
- * @name:	pathname from userland
- * @flags:	lookup flags
- * @path:	pointer to container to hold result
- *
- * A umount is a special case for path walking. We're not actually interested
- * in the inode in this situation, and ESTALE errors can be a problem. We
- * simply want track down the dentry and vfsmount attached at the mountpoint
- * and avoid revalidating the last component.
- *
- * Returns 0 and populates "path" on success.
- */
-int
-user_path_mountpoint_at(int dfd, const char __user *name, unsigned int flags,
-			struct path *path)
-{
-	return filename_mountpoint(dfd, getname(name), path, flags);
-}
-
-int
-kern_path_mountpoint(int dfd, const char *name, struct path *path,
-			unsigned int flags)
-{
-	return filename_mountpoint(dfd, getname_kernel(name), path, flags);
-}
-EXPORT_SYMBOL(kern_path_mountpoint);
-
 int __check_sticky(struct inode *dir, struct inode *inode)
 {
 	kuid_t fsuid = current_fsuid();
diff --git a/fs/namespace.c b/fs/namespace.c
index 743980380a8f..a28e4db075ed 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1669,7 +1669,7 @@ int ksys_umount(char __user *name, int flags)
 	struct path path;
 	struct mount *mnt;
 	int retval;
-	int lookup_flags = 0;
+	int lookup_flags = LOOKUP_MOUNTPOINT;
 
 	if (flags & ~(MNT_FORCE | MNT_DETACH | MNT_EXPIRE | UMOUNT_NOFOLLOW))
 		return -EINVAL;
@@ -1680,7 +1680,7 @@ int ksys_umount(char __user *name, int flags)
 	if (!(flags & UMOUNT_NOFOLLOW))
 		lookup_flags |= LOOKUP_FOLLOW;
 
-	retval = user_path_mountpoint_at(AT_FDCWD, name, lookup_flags, &path);
+	retval = user_path_at(AT_FDCWD, name, lookup_flags, &path);
 	if (retval)
 		goto out;
 	mnt = real_mount(path.mnt);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 0dd980d7318f..d9576a051808 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -23,6 +23,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
 #define LOOKUP_AUTOMOUNT	0x0004  /* force terminal automount */
 #define LOOKUP_EMPTY		0x4000	/* accept empty path [user_... only] */
 #define LOOKUP_DOWN		0x8000	/* follow mounts in the starting point */
+#define LOOKUP_MOUNTPOINT	0x0080	/* follow mounts in the end */
 
 #define LOOKUP_REVAL		0x0020	/* tell ->d_revalidate() to trust no cache */
 #define LOOKUP_RCU		0x0040	/* RCU pathwalk mode; semi-internal */
@@ -64,7 +65,6 @@ extern struct dentry *kern_path_create(int, const char *, struct path *, unsigne
 extern struct dentry *user_path_create(int, const char __user *, struct path *, unsigned int);
 extern void done_path_create(struct path *, struct dentry *);
 extern struct dentry *kern_path_locked(const char *, struct path *);
-extern int kern_path_mountpoint(int, const char *, struct path *, unsigned int);
 
 extern struct dentry *try_lookup_one_len(const char *, struct dentry *, int);
 extern struct dentry *lookup_one_len(const char *, struct dentry *, int);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 17/55] expand the only remaining call of path_lookup_conditional()
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (14 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 16/55] LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat() Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 18/55] merging pick_link() with get_link(), part 1 Al Viro
                       ` (37 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 6f1f46b931a6..3eed5784942a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -858,13 +858,6 @@ static int set_root(struct nameidata *nd)
 	return 0;
 }
 
-static void path_put_conditional(struct path *path, struct nameidata *nd)
-{
-	dput(path->dentry);
-	if (path->mnt != nd->path.mnt)
-		mntput(path->mnt);
-}
-
 static inline void path_to_nameidata(const struct path *path,
 					struct nameidata *nd)
 {
@@ -1312,8 +1305,11 @@ static int follow_managed(struct path *path, struct nameidata *nd)
 		ret = 1;
 	if (ret > 0 && unlikely(d_flags_negative(flags)))
 		ret = -ENOENT;
-	if (unlikely(ret < 0))
-		path_put_conditional(path, nd);
+	if (unlikely(ret < 0)) {
+		dput(path->dentry);
+		if (path->mnt != nd->path.mnt)
+			mntput(path->mnt);
+	}
 	return ret;
 }
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 18/55] merging pick_link() with get_link(), part 1
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (15 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 17/55] expand the only remaining call of path_lookup_conditional() Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 19/55] merging pick_link() with get_link(), part 2 Al Viro
                       ` (36 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Move restoring LOOKUP_PARENT and zeroing nd->stack.name[0] past
the call of get_link() (nothing _currently_ uses them in there).
That allows to moved the call of may_follow_link() into get_link()
as well, since now the presence of LOOKUP_PARENT distinguishes
the callers from each other (link_path_walk() has it, trailing_symlink()
doesn't).

Preparations for folding trailing_symlink() into callers (lookup_last()
and do_last()) and changing the calling conventions of those.  Next
stage after that will have get_link() call migrate into walk_component(),
then - into step_into().  It's tricky enough to warrant doing that
in stages, unfortunately...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3eed5784942a..3594f6a4998b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1115,6 +1115,12 @@ const char *get_link(struct nameidata *nd)
 	int error;
 	const char *res;
 
+	if (!(nd->flags & LOOKUP_PARENT)) {
+		error = may_follow_link(nd);
+		if (unlikely(error))
+			return ERR_PTR(error);
+	}
+
 	if (unlikely(nd->flags & LOOKUP_NO_SYMLINKS))
 		return ERR_PTR(-ELOOP);
 
@@ -2329,13 +2335,9 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 
 static const char *trailing_symlink(struct nameidata *nd)
 {
-	const char *s;
-	int error = may_follow_link(nd);
-	if (unlikely(error))
-		return ERR_PTR(error);
+	const char *s = get_link(nd);
 	nd->flags |= LOOKUP_PARENT;
 	nd->stack[0].name = NULL;
-	s = get_link(nd);
 	return s ? s : "";
 }
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 19/55] merging pick_link() with get_link(), part 2
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (16 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 18/55] merging pick_link() with get_link(), part 1 Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 20/55] merging pick_link() with get_link(), part 3 Al Viro
                       ` (35 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Fold trailing_symlink() into lookup_last() and do_last(), change
the calling conventions of those two.  Rules change:
	success, we are done => NULL instead of 0
	error	=> ERR_PTR(-E...) instead of -E...
	got a symlink to follow => return the path to be followed instead of 1

The loops calling those (in path_lookupat() and path_openat()) adjusted.

A subtle change of control flow here: originally a pure-jump trailing
symlink ("/" or procfs one) would've passed through the upper level
loop once more, with "" for path to traverse.  That would've brought
us back to the lookup_last/do_last entry and we would've hit LAST_BIND
case (LAST_BIND left from get_link() called by trailing_symlink())
and pretty much skip to the point right after where we'd left the
sucker back when we picked that trailing symlink.

Now we don't bother with that extra pass through the upper level
loop - if get_link() says "I've just done a pure jump, nothing
else to do", we just treat that as non-symlink case.

Boilerplate added on that step will go away shortly - it'll migrate
into walk_component() and then to step_into(), collapsing into the
change of calling conventions for those.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 68 ++++++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 40 insertions(+), 28 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3594f6a4998b..888b1e5b994e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2333,21 +2333,26 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 	return s;
 }
 
-static const char *trailing_symlink(struct nameidata *nd)
-{
-	const char *s = get_link(nd);
-	nd->flags |= LOOKUP_PARENT;
-	nd->stack[0].name = NULL;
-	return s ? s : "";
-}
-
-static inline int lookup_last(struct nameidata *nd)
+static inline const char *lookup_last(struct nameidata *nd)
 {
+	int err;
 	if (nd->last_type == LAST_NORM && nd->last.name[nd->last.len])
 		nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 
 	nd->flags &= ~LOOKUP_PARENT;
-	return walk_component(nd, 0);
+	err = walk_component(nd, 0);
+	if (unlikely(err)) {
+		const char *s;
+		if (err < 0)
+			return PTR_ERR(err);
+		s = get_link(nd);
+		if (s) {
+			nd->flags |= LOOKUP_PARENT;
+			nd->stack[0].name = NULL;
+			return s;
+		}
+	}
+	return NULL;
 }
 
 static int handle_lookup_down(struct nameidata *nd)
@@ -2370,10 +2375,9 @@ static int path_lookupat(struct nameidata *nd, unsigned flags, struct path *path
 			s = ERR_PTR(err);
 	}
 
-	while (!(err = link_path_walk(s, nd))
-		&& ((err = lookup_last(nd)) > 0)) {
-		s = trailing_symlink(nd);
-	}
+	while (!(err = link_path_walk(s, nd)) &&
+	       (s = lookup_last(nd)) != NULL)
+		;
 	if (!err)
 		err = complete_walk(nd);
 
@@ -3185,7 +3189,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 /*
  * Handle the last step of open()
  */
-static int do_last(struct nameidata *nd,
+static const char *do_last(struct nameidata *nd,
 		   struct file *file, const struct open_flags *op)
 {
 	struct dentry *dir = nd->path.dentry;
@@ -3206,7 +3210,7 @@ static int do_last(struct nameidata *nd,
 	if (nd->last_type != LAST_NORM) {
 		error = handle_dots(nd, nd->last_type);
 		if (unlikely(error))
-			return error;
+			return ERR_PTR(error);
 		goto finish_open;
 	}
 
@@ -3216,7 +3220,7 @@ static int do_last(struct nameidata *nd,
 		/* we _can_ be in RCU mode here */
 		dentry = lookup_fast(nd, &inode, &seq);
 		if (IS_ERR(dentry))
-			return PTR_ERR(dentry);
+			return ERR_CAST(dentry);
 		if (likely(dentry))
 			goto finish_lookup;
 
@@ -3231,12 +3235,12 @@ static int do_last(struct nameidata *nd,
 		 */
 		error = complete_walk(nd);
 		if (error)
-			return error;
+			return ERR_PTR(error);
 
 		audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
 		/* trailing slashes? */
 		if (unlikely(nd->last.name[nd->last.len]))
-			return -EISDIR;
+			return ERR_PTR(-EISDIR);
 	}
 
 	if (open_flag & (O_CREAT | O_TRUNC | O_WRONLY | O_RDWR)) {
@@ -3296,18 +3300,28 @@ static int do_last(struct nameidata *nd,
 
 finish_lookup:
 	error = step_into(nd, 0, dentry, inode, seq);
-	if (unlikely(error))
-		return error;
+	if (unlikely(error)) {
+		const char *s;
+		if (error < 0)
+			return ERR_PTR(error);
+		s = get_link(nd);
+		if (s) {
+			nd->flags |= LOOKUP_PARENT;
+			nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
+			nd->stack[0].name = NULL;
+			return s;
+		}
+	}
 
 	if (unlikely((open_flag & (O_EXCL | O_CREAT)) == (O_EXCL | O_CREAT))) {
 		audit_inode(nd->name, nd->path.dentry, 0);
-		return -EEXIST;
+		return ERR_PTR(-EEXIST);
 	}
 finish_open:
 	/* Why this, you ask?  _Now_ we might have grown LOOKUP_JUMPED... */
 	error = complete_walk(nd);
 	if (error)
-		return error;
+		return ERR_PTR(error);
 	audit_inode(nd->name, nd->path.dentry, 0);
 	if (open_flag & O_CREAT) {
 		error = -EISDIR;
@@ -3349,7 +3363,7 @@ static int do_last(struct nameidata *nd,
 	}
 	if (got_write)
 		mnt_drop_write(nd->path.mnt);
-	return error;
+	return ERR_PTR(error);
 }
 
 struct dentry *vfs_tmpfile(struct dentry *dentry, umode_t mode, int open_flag)
@@ -3452,10 +3466,8 @@ static struct file *path_openat(struct nameidata *nd,
 	} else {
 		const char *s = path_init(nd, flags);
 		while (!(error = link_path_walk(s, nd)) &&
-			(error = do_last(nd, file, op)) > 0) {
-			nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
-			s = trailing_symlink(nd);
-		}
+		       (s = do_last(nd, file, op)) != NULL)
+			;
 		terminate_walk(nd);
 	}
 	if (likely(!error)) {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 20/55] merging pick_link() with get_link(), part 3
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (17 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 19/55] merging pick_link() with get_link(), part 2 Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 21/55] merging pick_link() with get_link(), part 4 Al Viro
                       ` (34 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

After a pure jump ("/" or procfs-style symlink) we don't need to
hold the link anymore.  link_path_walk() dropped it if such case
had been detected, lookup_last/do_last() (i.e. old trailing_symlink())
left it on the stack - it ended up calling terminate_walk() shortly
anyway, which would've purged the entire stack.

Do it in get_link() itself instead.  Simpler logics that way...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 888b1e5b994e..46cd3e5cb052 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1154,7 +1154,9 @@ const char *get_link(struct nameidata *nd)
 		} else {
 			res = get(dentry, inode, &last->done);
 		}
-		if (IS_ERR_OR_NULL(res))
+		if (!res)
+			goto all_done;
+		if (IS_ERR(res))
 			return res;
 	}
 	if (*res == '/') {
@@ -1164,9 +1166,11 @@ const char *get_link(struct nameidata *nd)
 		while (unlikely(*++res == '/'))
 			;
 	}
-	if (!*res)
-		res = NULL;
-	return res;
+	if (*res)
+		return res;
+all_done: // pure jump
+	put_link(nd);
+	return NULL;
 }
 
 /*
@@ -2211,11 +2215,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 
 			if (IS_ERR(s))
 				return PTR_ERR(s);
-			err = 0;
-			if (unlikely(!s)) {
-				/* jumped */
-				put_link(nd);
-			} else {
+			if (likely(s)) {
 				nd->stack[nd->depth - 1].name = name;
 				name = s;
 				continue;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 21/55] merging pick_link() with get_link(), part 4
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (18 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 20/55] merging pick_link() with get_link(), part 3 Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 22/55] merging pick_link() with get_link(), part 5 Al Viro
                       ` (33 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Move the call of get_link() into walk_component().  Change the
calling conventions for walk_component() to returning the link
body to follow (if any).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 59 ++++++++++++++++++++++++++---------------------------------
 1 file changed, 26 insertions(+), 33 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 46cd3e5cb052..21bd4cebd71b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1868,7 +1868,7 @@ static int step_into(struct nameidata *nd, int flags,
 	return pick_link(nd, &path, inode, seq);
 }
 
-static int walk_component(struct nameidata *nd, int flags)
+static const char *walk_component(struct nameidata *nd, int flags)
 {
 	struct dentry *dentry;
 	struct inode *inode;
@@ -1883,17 +1883,22 @@ static int walk_component(struct nameidata *nd, int flags)
 		err = handle_dots(nd, nd->last_type);
 		if (!(flags & WALK_MORE) && nd->depth)
 			put_link(nd);
-		return err;
+		return ERR_PTR(err);
 	}
 	dentry = lookup_fast(nd, &inode, &seq);
 	if (IS_ERR(dentry))
-		return PTR_ERR(dentry);
+		return ERR_CAST(dentry);
 	if (unlikely(!dentry)) {
 		dentry = lookup_slow(&nd->last, nd->path.dentry, nd->flags);
 		if (IS_ERR(dentry))
-			return PTR_ERR(dentry);
+			return ERR_CAST(dentry);
 	}
-	return step_into(nd, flags, dentry, inode, seq);
+	err = step_into(nd, flags, dentry, inode, seq);
+	if (!err)
+		return NULL;
+	if (err > 0)
+		return get_link(nd);
+	return ERR_PTR(err);
 }
 
 /*
@@ -2145,6 +2150,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 
 	/* At this point we know we have a real path component. */
 	for(;;) {
+		const char *link;
 		u64 hash_len;
 		int type;
 
@@ -2202,24 +2208,18 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 			if (!name)
 				return 0;
 			/* last component of nested symlink */
-			err = walk_component(nd, WALK_FOLLOW);
+			link = walk_component(nd, WALK_FOLLOW);
 		} else {
 			/* not the last component */
-			err = walk_component(nd, WALK_FOLLOW | WALK_MORE);
+			link = walk_component(nd, WALK_FOLLOW | WALK_MORE);
 		}
-		if (err < 0)
-			return err;
-
-		if (err) {
-			const char *s = get_link(nd);
-
-			if (IS_ERR(s))
-				return PTR_ERR(s);
-			if (likely(s)) {
-				nd->stack[nd->depth - 1].name = name;
-				name = s;
-				continue;
-			}
+		if (unlikely(link)) {
+			if (IS_ERR(link))
+				return PTR_ERR(link);
+			/* a symlink to follow */
+			nd->stack[nd->depth - 1].name = name;
+			name = link;
+			continue;
 		}
 		if (unlikely(!d_can_lookup(nd->path.dentry))) {
 			if (nd->flags & LOOKUP_RCU) {
@@ -2335,24 +2335,17 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 
 static inline const char *lookup_last(struct nameidata *nd)
 {
-	int err;
+	const char *link;
 	if (nd->last_type == LAST_NORM && nd->last.name[nd->last.len])
 		nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 
 	nd->flags &= ~LOOKUP_PARENT;
-	err = walk_component(nd, 0);
-	if (unlikely(err)) {
-		const char *s;
-		if (err < 0)
-			return PTR_ERR(err);
-		s = get_link(nd);
-		if (s) {
-			nd->flags |= LOOKUP_PARENT;
-			nd->stack[0].name = NULL;
-			return s;
-		}
+	link = walk_component(nd, 0);
+	if (link) {
+		nd->flags |= LOOKUP_PARENT;
+		nd->stack[0].name = NULL;
 	}
-	return NULL;
+	return link;
 }
 
 static int handle_lookup_down(struct nameidata *nd)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 22/55] merging pick_link() with get_link(), part 5
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (19 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 21/55] merging pick_link() with get_link(), part 4 Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 23/55] merging pick_link() with get_link(), part 6 Al Viro
                       ` (32 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

move get_link() call into step_into().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 43 ++++++++++++++++++-------------------------
 1 file changed, 18 insertions(+), 25 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 21bd4cebd71b..3388958bb32f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1841,14 +1841,14 @@ enum {WALK_FOLLOW = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
  * so we keep a cache of "no, this doesn't need follow_link"
  * for the common case.
  */
-static int step_into(struct nameidata *nd, int flags,
+static const char *step_into(struct nameidata *nd, int flags,
 		     struct dentry *dentry, struct inode *inode, unsigned seq)
 {
 	struct path path;
 	int err = handle_mounts(nd, dentry, &path, &inode, &seq);
 
 	if (err < 0)
-		return err;
+		return ERR_PTR(err);
 	if (!(flags & WALK_MORE) && nd->depth)
 		put_link(nd);
 	if (likely(!d_is_symlink(path.dentry)) ||
@@ -1858,14 +1858,17 @@ static int step_into(struct nameidata *nd, int flags,
 		path_to_nameidata(&path, nd);
 		nd->inode = inode;
 		nd->seq = seq;
-		return 0;
+		return NULL;
 	}
 	/* make sure that d_is_symlink above matches inode */
 	if (nd->flags & LOOKUP_RCU) {
 		if (read_seqcount_retry(&path.dentry->d_seq, seq))
-			return -ECHILD;
+			return ERR_PTR(-ECHILD);
 	}
-	return pick_link(nd, &path, inode, seq);
+	err = pick_link(nd, &path, inode, seq);
+	if (err > 0)
+		return get_link(nd);
+	return ERR_PTR(err);
 }
 
 static const char *walk_component(struct nameidata *nd, int flags)
@@ -1893,12 +1896,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 		if (IS_ERR(dentry))
 			return ERR_CAST(dentry);
 	}
-	err = step_into(nd, flags, dentry, inode, seq);
-	if (!err)
-		return NULL;
-	if (err > 0)
-		return get_link(nd);
-	return ERR_PTR(err);
+	return step_into(nd, flags, dentry, inode, seq);
 }
 
 /*
@@ -2352,8 +2350,8 @@ static int handle_lookup_down(struct nameidata *nd)
 {
 	if (!(nd->flags & LOOKUP_RCU))
 		dget(nd->path.dentry);
-	return step_into(nd, WALK_NOFOLLOW,
-			nd->path.dentry, nd->inode, nd->seq);
+	return PTR_ERR(step_into(nd, WALK_NOFOLLOW,
+			nd->path.dentry, nd->inode, nd->seq));
 }
 
 /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
@@ -3195,6 +3193,7 @@ static const char *do_last(struct nameidata *nd,
 	unsigned seq;
 	struct inode *inode;
 	struct dentry *dentry;
+	const char *link;
 	int error;
 
 	nd->flags &= ~LOOKUP_PARENT;
@@ -3292,18 +3291,12 @@ static const char *do_last(struct nameidata *nd,
 	}
 
 finish_lookup:
-	error = step_into(nd, 0, dentry, inode, seq);
-	if (unlikely(error)) {
-		const char *s;
-		if (error < 0)
-			return ERR_PTR(error);
-		s = get_link(nd);
-		if (s) {
-			nd->flags |= LOOKUP_PARENT;
-			nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
-			nd->stack[0].name = NULL;
-			return s;
-		}
+	link = step_into(nd, 0, dentry, inode, seq);
+	if (unlikely(link)) {
+		nd->flags |= LOOKUP_PARENT;
+		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
+		nd->stack[0].name = NULL;
+		return link;
 	}
 
 	if (unlikely((open_flag & (O_EXCL | O_CREAT)) == (O_EXCL | O_CREAT))) {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 23/55] merging pick_link() with get_link(), part 6
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (20 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 22/55] merging pick_link() with get_link(), part 5 Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 24/55] finally fold get_link() into pick_link() Al Viro
                       ` (31 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

move the only remaining call of get_link() into pick_link()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3388958bb32f..fef2c447219d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1793,14 +1793,14 @@ static inline int handle_dots(struct nameidata *nd, int type)
 	return 0;
 }
 
-static int pick_link(struct nameidata *nd, struct path *link,
+static const char *pick_link(struct nameidata *nd, struct path *link,
 		     struct inode *inode, unsigned seq)
 {
 	int error;
 	struct saved *last;
 	if (unlikely(nd->total_link_count++ >= MAXSYMLINKS)) {
 		path_to_nameidata(link, nd);
-		return -ELOOP;
+		return ERR_PTR(-ELOOP);
 	}
 	if (!(nd->flags & LOOKUP_RCU)) {
 		if (link->mnt == nd->path.mnt)
@@ -1821,7 +1821,7 @@ static int pick_link(struct nameidata *nd, struct path *link,
 		}
 		if (error) {
 			path_put(link);
-			return error;
+			return ERR_PTR(error);
 		}
 	}
 
@@ -1830,7 +1830,7 @@ static int pick_link(struct nameidata *nd, struct path *link,
 	clear_delayed_call(&last->done);
 	nd->link_inode = inode;
 	last->seq = seq;
-	return 1;
+	return get_link(nd);
 }
 
 enum {WALK_FOLLOW = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
@@ -1865,10 +1865,7 @@ static const char *step_into(struct nameidata *nd, int flags,
 		if (read_seqcount_retry(&path.dentry->d_seq, seq))
 			return ERR_PTR(-ECHILD);
 	}
-	err = pick_link(nd, &path, inode, seq);
-	if (err > 0)
-		return get_link(nd);
-	return ERR_PTR(err);
+	return pick_link(nd, &path, inode, seq);
 }
 
 static const char *walk_component(struct nameidata *nd, int flags)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 24/55] finally fold get_link() into pick_link()
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (21 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 23/55] merging pick_link() with get_link(), part 6 Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 25/55] sanitize handling of nd->last_type, kill LAST_BIND Al Viro
                       ` (30 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

kill nd->link_inode, while we are at it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 135 ++++++++++++++++++++++++++++---------------------------------
 1 file changed, 61 insertions(+), 74 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index fef2c447219d..9b6c3e3edd75 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -503,7 +503,6 @@ struct nameidata {
 	} *stack, internal[EMBEDDED_LEVELS];
 	struct filename	*name;
 	struct nameidata *saved;
-	struct inode	*link_inode;
 	unsigned	root_seq;
 	int		dfd;
 } __randomize_layout;
@@ -962,9 +961,8 @@ int sysctl_protected_regular __read_mostly;
  *
  * Returns 0 if following the symlink is allowed, -ve on error.
  */
-static inline int may_follow_link(struct nameidata *nd)
+static inline int may_follow_link(struct nameidata *nd, const struct inode *inode)
 {
-	const struct inode *inode;
 	const struct inode *parent;
 	kuid_t puid;
 
@@ -972,7 +970,6 @@ static inline int may_follow_link(struct nameidata *nd)
 		return 0;
 
 	/* Allowed if owner and follower match. */
-	inode = nd->link_inode;
 	if (uid_eq(current_cred()->fsuid, inode->i_uid))
 		return 0;
 
@@ -1106,73 +1103,6 @@ static int may_create_in_sticky(umode_t dir_mode, kuid_t dir_uid,
 	return 0;
 }
 
-static __always_inline
-const char *get_link(struct nameidata *nd)
-{
-	struct saved *last = nd->stack + nd->depth - 1;
-	struct dentry *dentry = last->link.dentry;
-	struct inode *inode = nd->link_inode;
-	int error;
-	const char *res;
-
-	if (!(nd->flags & LOOKUP_PARENT)) {
-		error = may_follow_link(nd);
-		if (unlikely(error))
-			return ERR_PTR(error);
-	}
-
-	if (unlikely(nd->flags & LOOKUP_NO_SYMLINKS))
-		return ERR_PTR(-ELOOP);
-
-	if (!(nd->flags & LOOKUP_RCU)) {
-		touch_atime(&last->link);
-		cond_resched();
-	} else if (atime_needs_update(&last->link, inode)) {
-		if (unlikely(unlazy_walk(nd)))
-			return ERR_PTR(-ECHILD);
-		touch_atime(&last->link);
-	}
-
-	error = security_inode_follow_link(dentry, inode,
-					   nd->flags & LOOKUP_RCU);
-	if (unlikely(error))
-		return ERR_PTR(error);
-
-	nd->last_type = LAST_BIND;
-	res = READ_ONCE(inode->i_link);
-	if (!res) {
-		const char * (*get)(struct dentry *, struct inode *,
-				struct delayed_call *);
-		get = inode->i_op->get_link;
-		if (nd->flags & LOOKUP_RCU) {
-			res = get(NULL, inode, &last->done);
-			if (res == ERR_PTR(-ECHILD)) {
-				if (unlikely(unlazy_walk(nd)))
-					return ERR_PTR(-ECHILD);
-				res = get(dentry, inode, &last->done);
-			}
-		} else {
-			res = get(dentry, inode, &last->done);
-		}
-		if (!res)
-			goto all_done;
-		if (IS_ERR(res))
-			return res;
-	}
-	if (*res == '/') {
-		error = nd_jump_root(nd);
-		if (unlikely(error))
-			return ERR_PTR(error);
-		while (unlikely(*++res == '/'))
-			;
-	}
-	if (*res)
-		return res;
-all_done: // pure jump
-	put_link(nd);
-	return NULL;
-}
-
 /*
  * follow_up - Find the mountpoint of path's vfsmount
  *
@@ -1796,8 +1726,10 @@ static inline int handle_dots(struct nameidata *nd, int type)
 static const char *pick_link(struct nameidata *nd, struct path *link,
 		     struct inode *inode, unsigned seq)
 {
-	int error;
 	struct saved *last;
+	const char *res;
+	int error;
+
 	if (unlikely(nd->total_link_count++ >= MAXSYMLINKS)) {
 		path_to_nameidata(link, nd);
 		return ERR_PTR(-ELOOP);
@@ -1828,9 +1760,64 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 	last = nd->stack + nd->depth++;
 	last->link = *link;
 	clear_delayed_call(&last->done);
-	nd->link_inode = inode;
 	last->seq = seq;
-	return get_link(nd);
+
+	if (!(nd->flags & LOOKUP_PARENT)) {
+		error = may_follow_link(nd, inode);
+		if (unlikely(error))
+			return ERR_PTR(error);
+	}
+
+	if (unlikely(nd->flags & LOOKUP_NO_SYMLINKS))
+		return ERR_PTR(-ELOOP);
+
+	if (!(nd->flags & LOOKUP_RCU)) {
+		touch_atime(&last->link);
+		cond_resched();
+	} else if (atime_needs_update(&last->link, inode)) {
+		if (unlikely(unlazy_walk(nd)))
+			return ERR_PTR(-ECHILD);
+		touch_atime(&last->link);
+	}
+
+	error = security_inode_follow_link(link->dentry, inode,
+					   nd->flags & LOOKUP_RCU);
+	if (unlikely(error))
+		return ERR_PTR(error);
+
+	nd->last_type = LAST_BIND;
+	res = READ_ONCE(inode->i_link);
+	if (!res) {
+		const char * (*get)(struct dentry *, struct inode *,
+				struct delayed_call *);
+		get = inode->i_op->get_link;
+		if (nd->flags & LOOKUP_RCU) {
+			res = get(NULL, inode, &last->done);
+			if (res == ERR_PTR(-ECHILD)) {
+				if (unlikely(unlazy_walk(nd)))
+					return ERR_PTR(-ECHILD);
+				res = get(link->dentry, inode, &last->done);
+			}
+		} else {
+			res = get(link->dentry, inode, &last->done);
+		}
+		if (!res)
+			goto all_done;
+		if (IS_ERR(res))
+			return res;
+	}
+	if (*res == '/') {
+		error = nd_jump_root(nd);
+		if (unlikely(error))
+			return ERR_PTR(error);
+		while (unlikely(*++res == '/'))
+			;
+	}
+	if (*res)
+		return res;
+all_done: // pure jump
+	put_link(nd);
+	return NULL;
 }
 
 enum {WALK_FOLLOW = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 25/55] sanitize handling of nd->last_type, kill LAST_BIND
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (22 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 24/55] finally fold get_link() into pick_link() Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 26/55] namei: invert the meaning of WALK_FOLLOW Al Viro
                       ` (29 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

->last_type values are set in 3 places: path_init() (sets to LAST_ROOT),
link_path_walk (LAST_NORM/DOT/DOTDOT) and pick_link (LAST_BIND).

The are checked in walk_component(), lookup_last() and do_last().
They also get copied to the caller by filename_parentat().  In the last
3 cases the value is what we had at the return from link_path_walk().
In case of walk_component() it's either directly downstream from
assignment in link_path_walk() or, when called by lookup_last(), the
value we have at the return from link_path_walk().

The value at the entry into link_path_walk() can survive to return only
if the pathname contains nothing but slashes.  Note that pick_link()
never returns such - pure jumps are handled directly.  So for the calls
of link_path_walk() for trailing symlinks it does not matter what value
had been there at the entry; the value at the return won't depend upon it.

There are 3 call chains that might have pick_link() storing LAST_BIND:

1) pick_link() from step_into() from walk_component() from
link_path_walk().  In that case we will either be parsing the next
component immediately after return into link_path_walk(), which will
overwrite the ->last_type before anyone has a chance to look at it,
or we'll fail, in which case nobody will be looking at ->last_type at all.

2) pick_link() from step_into() from walk_component() from lookup_last().
The value is never looked at due to the above; it won't affect the value
seen at return from any link_path_walk().

3) pick_link() from step_into() from do_last().  Ditto.

In other words, assignemnt in pick_link() is pointless, and so is
LAST_BIND itself; nothing ever looks at that value.  Kill it off.
And make link_path_walk() _always_ assign ->last_type - in the only
case when the value at the entry might survive to the return that value
is always LAST_ROOT, inherited from path_init().  Move that assignment
from path_init() into the beginning of link_path_walk(), to consolidate
the things.

Historical note: LAST_BIND used to be used for the kludge with trailing
pure jump symlinks (extra iteration through the top-level loop).
No point keeping it anymore...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 Documentation/filesystems/path-lookup.rst | 7 ++-----
 fs/namei.c                                | 3 +--
 include/linux/namei.h                     | 2 +-
 3 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
index a3216979298b..f46b05e9b96c 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -404,11 +404,8 @@ that is the "next" component in the pathname.
 ``int last_type``
 ~~~~~~~~~~~~~~~~~
 
-This is one of ``LAST_NORM``, ``LAST_ROOT``, ``LAST_DOT``, ``LAST_DOTDOT``, or
-``LAST_BIND``.  The ``last`` field is only valid if the type is
-``LAST_NORM``.  ``LAST_BIND`` is used when following a symlink and no
-components of the symlink have been processed yet.  Others should be
-fairly self-explanatory.
+This is one of ``LAST_NORM``, ``LAST_ROOT``, ``LAST_DOT`` or ``LAST_DOTDOT``.
+The ``last`` field is only valid if the type is ``LAST_NORM``.
 
 ``struct path root``
 ~~~~~~~~~~~~~~~~~~~~
diff --git a/fs/namei.c b/fs/namei.c
index 9b6c3e3edd75..56d4adbf4da1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1785,7 +1785,6 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 	if (unlikely(error))
 		return ERR_PTR(error);
 
-	nd->last_type = LAST_BIND;
 	res = READ_ONCE(inode->i_link);
 	if (!res) {
 		const char * (*get)(struct dentry *, struct inode *,
@@ -2123,6 +2122,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 {
 	int err;
 
+	nd->last_type = LAST_ROOT;
 	if (IS_ERR(name))
 		return PTR_ERR(name);
 	while (*name=='/')
@@ -2224,7 +2224,6 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 	if (flags & LOOKUP_RCU)
 		rcu_read_lock();
 
-	nd->last_type = LAST_ROOT; /* if there are only slashes... */
 	nd->flags = flags | LOOKUP_JUMPED | LOOKUP_PARENT;
 	nd->depth = 0;
 
diff --git a/include/linux/namei.h b/include/linux/namei.h
index d9576a051808..a4bb992623c4 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -15,7 +15,7 @@ enum { MAX_NESTED_LINKS = 8 };
 /*
  * Type of the last component on LOOKUP_PARENT
  */
-enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
+enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT};
 
 /* pathwalk mode */
 #define LOOKUP_FOLLOW		0x0001	/* follow links at the end */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 26/55] namei: invert the meaning of WALK_FOLLOW
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (23 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 25/55] sanitize handling of nd->last_type, kill LAST_BIND Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 27/55] pick_link(): check for WALK_TRAILING, not LOOKUP_PARENT Al Viro
                       ` (28 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

old flags & WALK_FOLLOW <=> new !(flags & WALK_TRAILING)
That's what that flag had really been used for.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 56d4adbf4da1..f994ed10ef39 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1819,7 +1819,7 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 	return NULL;
 }
 
-enum {WALK_FOLLOW = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
+enum {WALK_TRAILING = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
 
 /*
  * Do we need to follow links? We _really_ want to be able
@@ -1838,8 +1838,8 @@ static const char *step_into(struct nameidata *nd, int flags,
 	if (!(flags & WALK_MORE) && nd->depth)
 		put_link(nd);
 	if (likely(!d_is_symlink(path.dentry)) ||
-	   !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW) ||
-	   flags & WALK_NOFOLLOW) {
+	   ((flags & WALK_TRAILING) && !(nd->flags & LOOKUP_FOLLOW)) ||
+	   (flags & WALK_NOFOLLOW)) {
 		/* not a symlink or should not follow */
 		path_to_nameidata(&path, nd);
 		nd->inode = inode;
@@ -2190,10 +2190,10 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 			if (!name)
 				return 0;
 			/* last component of nested symlink */
-			link = walk_component(nd, WALK_FOLLOW);
+			link = walk_component(nd, 0);
 		} else {
 			/* not the last component */
-			link = walk_component(nd, WALK_FOLLOW | WALK_MORE);
+			link = walk_component(nd, WALK_MORE);
 		}
 		if (unlikely(link)) {
 			if (IS_ERR(link))
@@ -2321,7 +2321,7 @@ static inline const char *lookup_last(struct nameidata *nd)
 		nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 
 	nd->flags &= ~LOOKUP_PARENT;
-	link = walk_component(nd, 0);
+	link = walk_component(nd, WALK_TRAILING);
 	if (link) {
 		nd->flags |= LOOKUP_PARENT;
 		nd->stack[0].name = NULL;
@@ -3274,7 +3274,7 @@ static const char *do_last(struct nameidata *nd,
 	}
 
 finish_lookup:
-	link = step_into(nd, 0, dentry, inode, seq);
+	link = step_into(nd, WALK_TRAILING, dentry, inode, seq);
 	if (unlikely(link)) {
 		nd->flags |= LOOKUP_PARENT;
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 27/55] pick_link(): check for WALK_TRAILING, not LOOKUP_PARENT
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (24 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 26/55] namei: invert the meaning of WALK_FOLLOW Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 28/55] link_path_walk(): simplify stack handling Al Viro
                       ` (27 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index f994ed10ef39..56bbbb9c0c0d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1723,8 +1723,10 @@ static inline int handle_dots(struct nameidata *nd, int type)
 	return 0;
 }
 
+enum {WALK_TRAILING = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
+
 static const char *pick_link(struct nameidata *nd, struct path *link,
-		     struct inode *inode, unsigned seq)
+		     struct inode *inode, unsigned seq, int flags)
 {
 	struct saved *last;
 	const char *res;
@@ -1762,7 +1764,7 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 	clear_delayed_call(&last->done);
 	last->seq = seq;
 
-	if (!(nd->flags & LOOKUP_PARENT)) {
+	if (flags & WALK_TRAILING) {
 		error = may_follow_link(nd, inode);
 		if (unlikely(error))
 			return ERR_PTR(error);
@@ -1819,8 +1821,6 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 	return NULL;
 }
 
-enum {WALK_TRAILING = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
-
 /*
  * Do we need to follow links? We _really_ want to be able
  * to do this check without having to look at inode->i_op,
@@ -1851,7 +1851,7 @@ static const char *step_into(struct nameidata *nd, int flags,
 		if (read_seqcount_retry(&path.dentry->d_seq, seq))
 			return ERR_PTR(-ECHILD);
 	}
-	return pick_link(nd, &path, inode, seq);
+	return pick_link(nd, &path, inode, seq, flags);
 }
 
 static const char *walk_component(struct nameidata *nd, int flags)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 28/55] link_path_walk(): simplify stack handling
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (25 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 27/55] pick_link(): check for WALK_TRAILING, not LOOKUP_PARENT Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 29/55] massage __follow_mount_rcu() a bit Al Viro
                       ` (26 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

We use nd->stack to store two things: pinning down the symlinks
we are resolving and resuming the name traversal when a nested
symlink is finished.

Currently, nd->depth is used to keep track of both.  It's 0 when
we call link_path_walk() for the first time (for the pathname
itself) and 1 on all subsequent calls (for trailing symlinks,
if any).  That's fine, as far as pinning symlinks goes - when
handling a trailing symlink, the string we are interpreting
is the body of symlink pinned down in nd->stack[0].  It's
rather inconvenient with respect to handling nested symlinks,
though - when we run out of a string we are currently interpreting,
we need to decide whether it's a nested symlink (in which case
we need to pick the string saved back when we started to interpret
that nested symlink and resume its traversal) or not (in which
case we are done with link_path_walk()).

Current solution is a bit of a kludge - in handling of trailing symlink
(in lookup_last() and open_last_lookups() we clear nd->stack[0].name.
That allows link_path_walk() to use the following rules when
running out of a string to interpret:
	* if nd->depth is zero, we are at the end of pathname itself.
	* if nd->depth is positive, check the saved string; for
nested symlink it will be non-NULL, for trailing symlink - NULL.

It works, but it's rather non-obvious.  Note that we have two sets:
the set of symlinks currently being traversed and the set of postponed
pathname tails.  The former is stored in nd->stack[0..nd->depth-1].link
and it's valid throught the pathname resolution; the latter is valid only
during an individual call of link_path_walk() and it occupies
nd->stack[0..nd->depth-1].name for the first call of link_path_walk() and
nd->stack[1..nd->depth-1].name for subsequent ones.  The kludge is basically
a way to recognize the second set becoming empty.

The things get simpler if we keep track of the second set's size
explicitly and always store it in nd->stack[0..depth-1].name.
We access the second set only inside link_path_walk(), so its
size can live in a local variable; that way the check becomes
trivial without the need of that kludge.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 56bbbb9c0c0d..5fe499399fcb 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2120,6 +2120,7 @@ static inline u64 hash_name(const void *salt, const char *name)
  */
 static int link_path_walk(const char *name, struct nameidata *nd)
 {
+	int depth = 0; // depth <= nd->depth
 	int err;
 
 	nd->last_type = LAST_ROOT;
@@ -2182,14 +2183,11 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 		} while (unlikely(*name == '/'));
 		if (unlikely(!*name)) {
 OK:
-			/* pathname body, done */
-			if (!nd->depth)
-				return 0;
-			name = nd->stack[nd->depth - 1].name;
-			/* trailing symlink, done */
-			if (!name)
+			/* pathname or trailing symlin, done */
+			if (!depth)
 				return 0;
 			/* last component of nested symlink */
+			name = nd->stack[--depth].name;
 			link = walk_component(nd, 0);
 		} else {
 			/* not the last component */
@@ -2199,7 +2197,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 			if (IS_ERR(link))
 				return PTR_ERR(link);
 			/* a symlink to follow */
-			nd->stack[nd->depth - 1].name = name;
+			nd->stack[depth++].name = name;
 			name = link;
 			continue;
 		}
@@ -2324,7 +2322,6 @@ static inline const char *lookup_last(struct nameidata *nd)
 	link = walk_component(nd, WALK_TRAILING);
 	if (link) {
 		nd->flags |= LOOKUP_PARENT;
-		nd->stack[0].name = NULL;
 	}
 	return link;
 }
@@ -3278,7 +3275,6 @@ static const char *do_last(struct nameidata *nd,
 	if (unlikely(link)) {
 		nd->flags |= LOOKUP_PARENT;
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
-		nd->stack[0].name = NULL;
 		return link;
 	}
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 29/55] massage __follow_mount_rcu() a bit
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (26 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 28/55] link_path_walk(): simplify stack handling Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 30/55] new helper: traverse_mounts() Al Viro
                       ` (25 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

make the loop more similar to that in follow_managed(), with
explicit tracking of flags, etc.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 70 +++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 35 insertions(+), 35 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 5fe499399fcb..02cdd1208062 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1269,12 +1269,6 @@ int follow_down_one(struct path *path)
 }
 EXPORT_SYMBOL(follow_down_one);
 
-static inline int managed_dentry_rcu(const struct path *path)
-{
-	return (path->dentry->d_flags & DCACHE_MANAGE_TRANSIT) ?
-		path->dentry->d_op->d_manage(path, true) : 0;
-}
-
 /*
  * Try to skip to top of mountpoint pile in rcuwalk mode.  Fail if
  * we meet a managed dentry that would need blocking.
@@ -1282,43 +1276,49 @@ static inline int managed_dentry_rcu(const struct path *path)
 static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 			       struct inode **inode, unsigned *seqp)
 {
+	struct dentry *dentry = path->dentry;
+	unsigned int flags = dentry->d_flags;
+
+	if (likely(!(flags & DCACHE_MANAGED_DENTRY)))
+		return true;
+
+	if (unlikely(nd->flags & LOOKUP_NO_XDEV))
+		return false;
+
 	for (;;) {
-		struct mount *mounted;
 		/*
 		 * Don't forget we might have a non-mountpoint managed dentry
 		 * that wants to block transit.
 		 */
-		switch (managed_dentry_rcu(path)) {
-		case -ECHILD:
-		default:
-			return false;
-		case -EISDIR:
-			return true;
-		case 0:
-			break;
+		if (unlikely(flags & DCACHE_MANAGE_TRANSIT)) {
+			int res = dentry->d_op->d_manage(path, true);
+			if (res)
+				return res == -EISDIR;
+			flags = dentry->d_flags;
 		}
 
-		if (!d_mountpoint(path->dentry))
-			return !(path->dentry->d_flags & DCACHE_NEED_AUTOMOUNT);
-
-		mounted = __lookup_mnt(path->mnt, path->dentry);
-		if (!mounted)
-			break;
-		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-			return false;
-		path->mnt = &mounted->mnt;
-		path->dentry = mounted->mnt.mnt_root;
-		nd->flags |= LOOKUP_JUMPED;
-		*seqp = read_seqcount_begin(&path->dentry->d_seq);
-		/*
-		 * Update the inode too. We don't need to re-check the
-		 * dentry sequence number here after this d_inode read,
-		 * because a mount-point is always pinned.
-		 */
-		*inode = path->dentry->d_inode;
+		if (flags & DCACHE_MOUNTED) {
+			struct mount *mounted = __lookup_mnt(path->mnt, dentry);
+			if (mounted) {
+				path->mnt = &mounted->mnt;
+				dentry = path->dentry = mounted->mnt.mnt_root;
+				nd->flags |= LOOKUP_JUMPED;
+				*seqp = read_seqcount_begin(&dentry->d_seq);
+				*inode = dentry->d_inode;
+				/*
+				 * We don't need to re-check ->d_seq after this
+				 * ->d_inode read - there will be an RCU delay
+				 * between mount hash removal and ->mnt_root
+				 * becoming unpinned.
+				 */
+				flags = dentry->d_flags;
+				continue;
+			}
+			if (read_seqretry(&mount_lock, nd->m_seq))
+				return false;
+		}
+		return !(flags & DCACHE_NEED_AUTOMOUNT);
 	}
-	return !read_seqretry(&mount_lock, nd->m_seq) &&
-		!(path->dentry->d_flags & DCACHE_NEED_AUTOMOUNT);
 }
 
 static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 30/55] new helper: traverse_mounts()
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (27 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 29/55] massage __follow_mount_rcu() a bit Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 31/55] atomic_open(): return the right dentry in FMODE_OPENED case Al Viro
                       ` (24 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

common guts of follow_down() and follow_managed() taken to a new
helper - traverse_mounts().  The remnants of follow_managed()
are folded into its sole remaining caller (handle_mounts()).
Calling conventions of handle_mounts() slightly sanitized -
instead of the weird "1 for success, -E... for failure" that used
to be imposed by the calling conventions of walk_component() et.al.
we can use the normal "0 for success, -E... for failure".

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 177 +++++++++++++++++++++++++------------------------------------
 1 file changed, 72 insertions(+), 105 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 02cdd1208062..56d39a8cfc3e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1168,91 +1168,79 @@ static int follow_automount(struct path *path, int *count, unsigned lookup_flags
 }
 
 /*
- * Handle a dentry that is managed in some way.
- * - Flagged for transit management (autofs)
- * - Flagged as mountpoint
- * - Flagged as automount point
- *
- * This may only be called in refwalk mode.
- * On success path->dentry is known positive.
- *
- * Serialization is taken care of in namespace.c
+ * mount traversal - out-of-line part.  One note on ->d_flags accesses -
+ * dentries are pinned but not locked here, so negative dentry can go
+ * positive right under us.  Use of smp_load_acquire() provides a barrier
+ * sufficient for ->d_inode and ->d_flags consistency.
  */
-static int follow_managed(struct path *path, struct nameidata *nd)
+static int __traverse_mounts(struct path *path, unsigned flags, bool *jumped,
+			     int *count, unsigned lookup_flags)
 {
-	struct vfsmount *mnt = path->mnt; /* held by caller, must be left alone */
-	unsigned flags;
+	struct vfsmount *mnt = path->mnt;
 	bool need_mntput = false;
 	int ret = 0;
 
-	/* Given that we're not holding a lock here, we retain the value in a
-	 * local variable for each dentry as we look at it so that we don't see
-	 * the components of that value change under us */
-	while (flags = smp_load_acquire(&path->dentry->d_flags),
-	       unlikely(flags & DCACHE_MANAGED_DENTRY)) {
+	while (flags & DCACHE_MANAGED_DENTRY) {
 		/* Allow the filesystem to manage the transit without i_mutex
 		 * being held. */
 		if (flags & DCACHE_MANAGE_TRANSIT) {
-			BUG_ON(!path->dentry->d_op);
-			BUG_ON(!path->dentry->d_op->d_manage);
 			ret = path->dentry->d_op->d_manage(path, false);
 			flags = smp_load_acquire(&path->dentry->d_flags);
 			if (ret < 0)
 				break;
 		}
 
-		/* Transit to a mounted filesystem. */
-		if (flags & DCACHE_MOUNTED) {
+		if (flags & DCACHE_MOUNTED) {	// something's mounted on it..
 			struct vfsmount *mounted = lookup_mnt(path);
-			if (mounted) {
+			if (mounted) {		// ... in our namespace
 				dput(path->dentry);
 				if (need_mntput)
 					mntput(path->mnt);
 				path->mnt = mounted;
 				path->dentry = dget(mounted->mnt_root);
+				// here we know it's positive
+				flags = path->dentry->d_flags;
 				need_mntput = true;
 				continue;
 			}
-
-			/* Something is mounted on this dentry in another
-			 * namespace and/or whatever was mounted there in this
-			 * namespace got unmounted before lookup_mnt() could
-			 * get it */
 		}
 
-		/* Handle an automount point */
-		if (flags & DCACHE_NEED_AUTOMOUNT) {
-			ret = follow_automount(path, &nd->total_link_count,
-						nd->flags);
-			if (ret < 0)
-				break;
-			continue;
-		}
+		if (!(flags & DCACHE_NEED_AUTOMOUNT))
+			break;
 
-		/* We didn't change the current path point */
-		break;
+		// uncovered automount point
+		ret = follow_automount(path, count, lookup_flags);
+		flags = smp_load_acquire(&path->dentry->d_flags);
+		if (ret < 0)
+			break;
 	}
 
-	if (need_mntput) {
-		if (path->mnt == mnt)
-			mntput(path->mnt);
-		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-			ret = -EXDEV;
-		else
-			nd->flags |= LOOKUP_JUMPED;
-	}
-	if (ret == -EISDIR || !ret)
-		ret = 1;
-	if (ret > 0 && unlikely(d_flags_negative(flags)))
+	if (ret == -EISDIR)
+		ret = 0;
+	// possible if you race with several mount --move
+	if (need_mntput && path->mnt == mnt)
+		mntput(path->mnt);
+	if (!ret && unlikely(d_flags_negative(flags)))
 		ret = -ENOENT;
-	if (unlikely(ret < 0)) {
-		dput(path->dentry);
-		if (path->mnt != nd->path.mnt)
-			mntput(path->mnt);
-	}
+	*jumped = need_mntput;
 	return ret;
 }
 
+static inline int traverse_mounts(struct path *path, bool *jumped,
+				  int *count, unsigned lookup_flags)
+{
+	unsigned flags = smp_load_acquire(&path->dentry->d_flags);
+
+	/* fastpath */
+	if (likely(!(flags & DCACHE_MANAGED_DENTRY))) {
+		*jumped = false;
+		if (unlikely(d_flags_negative(flags)))
+			return -ENOENT;
+		return 0;
+	}
+	return __traverse_mounts(path, flags, jumped, count, lookup_flags);
+}
+
 int follow_down_one(struct path *path)
 {
 	struct vfsmount *mounted;
@@ -1270,6 +1258,23 @@ int follow_down_one(struct path *path)
 EXPORT_SYMBOL(follow_down_one);
 
 /*
+ * Follow down to the covering mount currently visible to userspace.  At each
+ * point, the filesystem owning that dentry may be queried as to whether the
+ * caller is permitted to proceed or not.
+ */
+int follow_down(struct path *path)
+{
+	struct vfsmount *mnt = path->mnt;
+	bool jumped;
+	int ret = traverse_mounts(path, &jumped, NULL, 0);
+
+	if (path->mnt != mnt)
+		mntput(mnt);
+	return ret;
+}
+EXPORT_SYMBOL(follow_down);
+
+/*
  * Try to skip to top of mountpoint pile in rcuwalk mode.  Fail if
  * we meet a managed dentry that would need blocking.
  */
@@ -1325,6 +1330,7 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 			  struct path *path, struct inode **inode,
 			  unsigned int *seqp)
 {
+	bool jumped;
 	int ret;
 
 	path->mnt = nd->path.mnt;
@@ -1334,15 +1340,25 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 		if (unlikely(!*inode))
 			return -ENOENT;
 		if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
-			return 1;
+			return 0;
 		if (unlazy_child(nd, dentry, seq))
 			return -ECHILD;
 		// *path might've been clobbered by __follow_mount_rcu()
 		path->mnt = nd->path.mnt;
 		path->dentry = dentry;
 	}
-	ret = follow_managed(path, nd);
-	if (likely(ret >= 0)) {
+	ret = traverse_mounts(path, &jumped, &nd->total_link_count, nd->flags);
+	if (jumped) {
+		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
+			ret = -EXDEV;
+		else
+			nd->flags |= LOOKUP_JUMPED;
+	}
+	if (unlikely(ret)) {
+		dput(path->dentry);
+		if (path->mnt != nd->path.mnt)
+			mntput(path->mnt);
+	} else {
 		*inode = d_backing_inode(path->dentry);
 		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
 	}
@@ -1411,55 +1427,6 @@ static int follow_dotdot_rcu(struct nameidata *nd)
 }
 
 /*
- * Follow down to the covering mount currently visible to userspace.  At each
- * point, the filesystem owning that dentry may be queried as to whether the
- * caller is permitted to proceed or not.
- */
-int follow_down(struct path *path)
-{
-	unsigned managed;
-	int ret;
-
-	while (managed = READ_ONCE(path->dentry->d_flags),
-	       unlikely(managed & DCACHE_MANAGED_DENTRY)) {
-		/* Allow the filesystem to manage the transit without i_mutex
-		 * being held.
-		 *
-		 * We indicate to the filesystem if someone is trying to mount
-		 * something here.  This gives autofs the chance to deny anyone
-		 * other than its daemon the right to mount on its
-		 * superstructure.
-		 *
-		 * The filesystem may sleep at this point.
-		 */
-		if (managed & DCACHE_MANAGE_TRANSIT) {
-			BUG_ON(!path->dentry->d_op);
-			BUG_ON(!path->dentry->d_op->d_manage);
-			ret = path->dentry->d_op->d_manage(path, false);
-			if (ret < 0)
-				return ret == -EISDIR ? 0 : ret;
-		}
-
-		/* Transit to a mounted filesystem. */
-		if (managed & DCACHE_MOUNTED) {
-			struct vfsmount *mounted = lookup_mnt(path);
-			if (!mounted)
-				break;
-			dput(path->dentry);
-			mntput(path->mnt);
-			path->mnt = mounted;
-			path->dentry = dget(mounted->mnt_root);
-			continue;
-		}
-
-		/* Don't handle automount points here */
-		break;
-	}
-	return 0;
-}
-EXPORT_SYMBOL(follow_down);
-
-/*
  * Skip to top of mountpoint pile in refwalk mode for follow_dotdot()
  */
 static void follow_mount(struct path *path)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 31/55] atomic_open(): return the right dentry in FMODE_OPENED case
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (28 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 30/55] new helper: traverse_mounts() Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 32/55] atomic_open(): lift the call of may_open() into do_last() Al Viro
                       ` (23 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

->atomic_open() might have used a different alias than the one we'd
passed to it; in "not opened" case we take care of that, in "opened"
one we don't.  Currently we don't care downstream of "opened" case
which alias to return; however, that will change shortly when we
get to unifying may_open() calls.

It's not hard to get right in all cases, anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index 56d39a8cfc3e..bfd62fa3856c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2952,11 +2952,15 @@ static struct dentry *atomic_open(struct nameidata *nd, struct dentry *dentry,
 	d_lookup_done(dentry);
 	if (!error) {
 		if (file->f_mode & FMODE_OPENED) {
+			int acc_mode = op->acc_mode;
+			if (unlikely(dentry != file->f_path.dentry)) {
+				dput(dentry);
+				dentry = dget(file->f_path.dentry);
+			}
 			/*
 			 * We didn't have the inode before the open, so check open
 			 * permission here.
 			 */
-			int acc_mode = op->acc_mode;
 			if (file->f_mode & FMODE_CREATED) {
 				WARN_ON(!(open_flag & O_CREAT));
 				fsnotify_create(dir, dentry);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 32/55] atomic_open(): lift the call of may_open() into do_last()
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (29 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 31/55] atomic_open(): return the right dentry in FMODE_OPENED case Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 33/55] do_last(): merge the may_open() calls Al Viro
                       ` (22 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

there we'll be able to merge it with its counterparts in other
cases, and there's no reason to do it before the parent has
been unlocked

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index bfd62fa3856c..9fcb8f635e66 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2952,23 +2952,12 @@ static struct dentry *atomic_open(struct nameidata *nd, struct dentry *dentry,
 	d_lookup_done(dentry);
 	if (!error) {
 		if (file->f_mode & FMODE_OPENED) {
-			int acc_mode = op->acc_mode;
 			if (unlikely(dentry != file->f_path.dentry)) {
 				dput(dentry);
 				dentry = dget(file->f_path.dentry);
 			}
-			/*
-			 * We didn't have the inode before the open, so check open
-			 * permission here.
-			 */
-			if (file->f_mode & FMODE_CREATED) {
-				WARN_ON(!(open_flag & O_CREAT));
+			if (file->f_mode & FMODE_CREATED)
 				fsnotify_create(dir, dentry);
-				acc_mode = 0;
-			}
-			error = may_open(&file->f_path, acc_mode, open_flag);
-			if (WARN_ON(error > 0))
-				error = -EINVAL;
 		} else if (WARN_ON(file->f_path.dentry == DENTRY_NOT_SET)) {
 			error = -EIO;
 		} else {
@@ -3212,12 +3201,19 @@ static const char *do_last(struct nameidata *nd,
 	}
 
 	if (file->f_mode & FMODE_OPENED) {
-		if ((file->f_mode & FMODE_CREATED) ||
-		    !S_ISREG(file_inode(file)->i_mode))
+		if (file->f_mode & FMODE_CREATED) {
+			open_flag &= ~O_TRUNC;
+			will_truncate = false;
+			acc_mode = 0;
+		} else if (!S_ISREG(file_inode(file)->i_mode))
 			will_truncate = false;
 
 		audit_inode(nd->name, file->f_path.dentry, 0);
-		dput(dentry);
+		dput(nd->path.dentry);
+		nd->path.dentry = dentry;
+		error = may_open(&nd->path, acc_mode, open_flag);
+		if (error)
+			goto out;
 		goto opened;
 	}
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 33/55] do_last(): merge the may_open() calls
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (30 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 32/55] atomic_open(): lift the call of may_open() into do_last() Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 34/55] do_last(): don't bother with keeping got_write in FMODE_OPENED case Al Viro
                       ` (21 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

have FMODE_OPENED case rejoin the main path at earlier point

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 9fcb8f635e66..91346e5d8bd8 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3211,10 +3211,7 @@ static const char *do_last(struct nameidata *nd,
 		audit_inode(nd->name, file->f_path.dentry, 0);
 		dput(nd->path.dentry);
 		nd->path.dentry = dentry;
-		error = may_open(&nd->path, acc_mode, open_flag);
-		if (error)
-			goto out;
-		goto opened;
+		goto finish_open_created;
 	}
 
 	if (file->f_mode & FMODE_CREATED) {
@@ -3280,11 +3277,10 @@ static const char *do_last(struct nameidata *nd,
 	error = may_open(&nd->path, acc_mode, open_flag);
 	if (error)
 		goto out;
-	BUG_ON(file->f_mode & FMODE_OPENED); /* once it's opened, it's opened */
-	error = vfs_open(&nd->path, file);
+	if (!(file->f_mode & FMODE_OPENED))
+		error = vfs_open(&nd->path, file);
 	if (error)
 		goto out;
-opened:
 	error = ima_file_check(file, op->acc_mode);
 	if (!error && will_truncate)
 		error = handle_truncate(file);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 34/55] do_last(): don't bother with keeping got_write in FMODE_OPENED case
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (31 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 33/55] do_last(): merge the may_open() calls Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 35/55] do_last(): rejoing the common path earlier in FMODE_{OPENED,CREATED} case Al Viro
                       ` (20 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

it's easier to drop it right after lookup_open() and regain if
needed (i.e. if we will need to truncate).  On the non-FMODE_OPENED
path we do that anyway.  In case of FMODE_CREATED we won't be
needing it.  And it's easier to prove correctness that way,
especially since the initial failure to get write access is not
always fatal; proving that we'll never end up truncating in that
case is rather convoluted.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 31 +++++++++++--------------------
 1 file changed, 11 insertions(+), 20 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 91346e5d8bd8..b4de0994ccee 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3195,11 +3195,14 @@ static const char *do_last(struct nameidata *nd,
 	else
 		inode_unlock_shared(dir->d_inode);
 
-	if (IS_ERR(dentry)) {
-		error = PTR_ERR(dentry);
-		goto out;
+	if (got_write) {
+		mnt_drop_write(nd->path.mnt);
+		got_write = false;
 	}
 
+	if (IS_ERR(dentry))
+		return ERR_CAST(dentry);
+
 	if (file->f_mode & FMODE_OPENED) {
 		if (file->f_mode & FMODE_CREATED) {
 			open_flag &= ~O_TRUNC;
@@ -3224,16 +3227,6 @@ static const char *do_last(struct nameidata *nd,
 		goto finish_open_created;
 	}
 
-	/*
-	 * If atomic_open() acquired write access it is dropped now due to
-	 * possible mount and symlink following (this might be optimized away if
-	 * necessary...)
-	 */
-	if (got_write) {
-		mnt_drop_write(nd->path.mnt);
-		got_write = false;
-	}
-
 finish_lookup:
 	link = step_into(nd, WALK_TRAILING, dentry, inode, seq);
 	if (unlikely(link)) {
@@ -3253,27 +3246,25 @@ static const char *do_last(struct nameidata *nd,
 		return ERR_PTR(error);
 	audit_inode(nd->name, nd->path.dentry, 0);
 	if (open_flag & O_CREAT) {
-		error = -EISDIR;
 		if (d_is_dir(nd->path.dentry))
-			goto out;
+			return ERR_PTR(-EISDIR);
 		error = may_create_in_sticky(dir_mode, dir_uid,
 					     d_backing_inode(nd->path.dentry));
 		if (unlikely(error))
-			goto out;
+			return ERR_PTR(error);
 	}
-	error = -ENOTDIR;
 	if ((nd->flags & LOOKUP_DIRECTORY) && !d_can_lookup(nd->path.dentry))
-		goto out;
+		return ERR_PTR(-ENOTDIR);
 	if (!d_is_reg(nd->path.dentry))
 		will_truncate = false;
 
+finish_open_created:
 	if (will_truncate) {
 		error = mnt_want_write(nd->path.mnt);
 		if (error)
-			goto out;
+			return ERR_PTR(error);
 		got_write = true;
 	}
-finish_open_created:
 	error = may_open(&nd->path, acc_mode, open_flag);
 	if (error)
 		goto out;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 35/55] do_last(): rejoing the common path earlier in FMODE_{OPENED,CREATED} case
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (32 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 34/55] do_last(): don't bother with keeping got_write in FMODE_OPENED case Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 36/55] do_last(): simplify the liveness analysis past finish_open_created Al Viro
                       ` (19 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index b4de0994ccee..66c3e2df46b2 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3204,13 +3204,6 @@ static const char *do_last(struct nameidata *nd,
 		return ERR_CAST(dentry);
 
 	if (file->f_mode & FMODE_OPENED) {
-		if (file->f_mode & FMODE_CREATED) {
-			open_flag &= ~O_TRUNC;
-			will_truncate = false;
-			acc_mode = 0;
-		} else if (!S_ISREG(file_inode(file)->i_mode))
-			will_truncate = false;
-
 		audit_inode(nd->name, file->f_path.dentry, 0);
 		dput(nd->path.dentry);
 		nd->path.dentry = dentry;
@@ -3218,10 +3211,6 @@ static const char *do_last(struct nameidata *nd,
 	}
 
 	if (file->f_mode & FMODE_CREATED) {
-		/* Don't check for write permission, don't truncate */
-		open_flag &= ~O_TRUNC;
-		will_truncate = false;
-		acc_mode = 0;
 		dput(nd->path.dentry);
 		nd->path.dentry = dentry;
 		goto finish_open_created;
@@ -3255,10 +3244,16 @@ static const char *do_last(struct nameidata *nd,
 	}
 	if ((nd->flags & LOOKUP_DIRECTORY) && !d_can_lookup(nd->path.dentry))
 		return ERR_PTR(-ENOTDIR);
-	if (!d_is_reg(nd->path.dentry))
-		will_truncate = false;
 
 finish_open_created:
+	if (file->f_mode & FMODE_CREATED) {
+		/* Don't check for write permission, don't truncate */
+		open_flag &= ~O_TRUNC;
+		will_truncate = false;
+		acc_mode = 0;
+	} else if (!d_is_reg(nd->path.dentry)) {
+		will_truncate = false;
+	}
 	if (will_truncate) {
 		error = mnt_want_write(nd->path.mnt);
 		if (error)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 36/55] do_last(): simplify the liveness analysis past finish_open_created
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (33 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 35/55] do_last(): rejoing the common path earlier in FMODE_{OPENED,CREATED} case Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 37/55] do_last(): rejoin the common path even earlier in FMODE_{OPENED,CREATED} case Al Viro
                       ` (18 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Don't mess with got_write there - it is guaranteed to be false on
entry and it will be set true if and only if we decide to go for
truncation and manage to get write access for that.

Don't carry acc_mode through the entire thing - it's only used
in that part.  And don't bother with gotos in there - compiler is
quite capable of optimizing that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 28 +++++++++++-----------------
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 66c3e2df46b2..af2eb52c4a8d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3127,9 +3127,9 @@ static const char *do_last(struct nameidata *nd,
 	kuid_t dir_uid = nd->inode->i_uid;
 	umode_t dir_mode = nd->inode->i_mode;
 	int open_flag = op->open_flag;
-	bool will_truncate = (open_flag & O_TRUNC) != 0;
+	bool do_truncate;
 	bool got_write = false;
-	int acc_mode = op->acc_mode;
+	int acc_mode;
 	unsigned seq;
 	struct inode *inode;
 	struct dentry *dentry;
@@ -3246,36 +3246,30 @@ static const char *do_last(struct nameidata *nd,
 		return ERR_PTR(-ENOTDIR);
 
 finish_open_created:
+	do_truncate = false;
+	acc_mode = op->acc_mode;
 	if (file->f_mode & FMODE_CREATED) {
 		/* Don't check for write permission, don't truncate */
 		open_flag &= ~O_TRUNC;
-		will_truncate = false;
 		acc_mode = 0;
-	} else if (!d_is_reg(nd->path.dentry)) {
-		will_truncate = false;
-	}
-	if (will_truncate) {
+	} else if (d_is_reg(nd->path.dentry) && open_flag & O_TRUNC) {
 		error = mnt_want_write(nd->path.mnt);
 		if (error)
 			return ERR_PTR(error);
-		got_write = true;
+		do_truncate = true;
 	}
 	error = may_open(&nd->path, acc_mode, open_flag);
-	if (error)
-		goto out;
-	if (!(file->f_mode & FMODE_OPENED))
+	if (!error && !(file->f_mode & FMODE_OPENED))
 		error = vfs_open(&nd->path, file);
-	if (error)
-		goto out;
-	error = ima_file_check(file, op->acc_mode);
-	if (!error && will_truncate)
+	if (!error)
+		error = ima_file_check(file, op->acc_mode);
+	if (!error && do_truncate)
 		error = handle_truncate(file);
-out:
 	if (unlikely(error > 0)) {
 		WARN_ON(1);
 		error = -EINVAL;
 	}
-	if (got_write)
+	if (do_truncate)
 		mnt_drop_write(nd->path.mnt);
 	return ERR_PTR(error);
 }
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 37/55] do_last(): rejoin the common path even earlier in FMODE_{OPENED,CREATED} case
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (34 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 36/55] do_last(): simplify the liveness analysis past finish_open_created Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 38/55] split the lookup-related parts of do_last() into a separate helper Al Viro
                       ` (17 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

... getting may_create_in_sticky() checks in FMODE_OPENED case as well.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index af2eb52c4a8d..66f236d5aee7 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3203,14 +3203,7 @@ static const char *do_last(struct nameidata *nd,
 	if (IS_ERR(dentry))
 		return ERR_CAST(dentry);
 
-	if (file->f_mode & FMODE_OPENED) {
-		audit_inode(nd->name, file->f_path.dentry, 0);
-		dput(nd->path.dentry);
-		nd->path.dentry = dentry;
-		goto finish_open_created;
-	}
-
-	if (file->f_mode & FMODE_CREATED) {
+	if (file->f_mode & (FMODE_OPENED | FMODE_CREATED)) {
 		dput(nd->path.dentry);
 		nd->path.dentry = dentry;
 		goto finish_open_created;
@@ -3233,7 +3226,9 @@ static const char *do_last(struct nameidata *nd,
 	error = complete_walk(nd);
 	if (error)
 		return ERR_PTR(error);
-	audit_inode(nd->name, nd->path.dentry, 0);
+finish_open_created:
+	if (!(file->f_mode & FMODE_CREATED))
+		audit_inode(nd->name, nd->path.dentry, 0);
 	if (open_flag & O_CREAT) {
 		if (d_is_dir(nd->path.dentry))
 			return ERR_PTR(-EISDIR);
@@ -3245,7 +3240,6 @@ static const char *do_last(struct nameidata *nd,
 	if ((nd->flags & LOOKUP_DIRECTORY) && !d_can_lookup(nd->path.dentry))
 		return ERR_PTR(-ENOTDIR);
 
-finish_open_created:
 	do_truncate = false;
 	acc_mode = op->acc_mode;
 	if (file->f_mode & FMODE_CREATED) {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 38/55] split the lookup-related parts of do_last() into a separate helper
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (35 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 37/55] do_last(): rejoin the common path even earlier in FMODE_{OPENED,CREATED} case Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 39/55] path_connected(): pass mount and dentry separately Al Viro
                       ` (16 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 51 +++++++++++++++++++++++++++++----------------------
 1 file changed, 29 insertions(+), 22 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 66f236d5aee7..b73ff6dcca24 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3117,19 +3117,12 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 	return ERR_PTR(error);
 }
 
-/*
- * Handle the last step of open()
- */
-static const char *do_last(struct nameidata *nd,
+static const char *open_last_lookups(struct nameidata *nd,
 		   struct file *file, const struct open_flags *op)
 {
 	struct dentry *dir = nd->path.dentry;
-	kuid_t dir_uid = nd->inode->i_uid;
-	umode_t dir_mode = nd->inode->i_mode;
 	int open_flag = op->open_flag;
-	bool do_truncate;
 	bool got_write = false;
-	int acc_mode;
 	unsigned seq;
 	struct inode *inode;
 	struct dentry *dentry;
@@ -3141,9 +3134,9 @@ static const char *do_last(struct nameidata *nd,
 
 	if (nd->last_type != LAST_NORM) {
 		error = handle_dots(nd, nd->last_type);
-		if (unlikely(error))
-			return ERR_PTR(error);
-		goto finish_open;
+		if (likely(!error))
+			error = complete_walk(nd);
+		return ERR_PTR(error);
 	}
 
 	if (!(open_flag & O_CREAT)) {
@@ -3156,7 +3149,6 @@ static const char *do_last(struct nameidata *nd,
 		if (likely(dentry))
 			goto finish_lookup;
 
-		BUG_ON(nd->inode != dir->d_inode);
 		BUG_ON(nd->flags & LOOKUP_RCU);
 	} else {
 		/* create side of things */
@@ -3166,7 +3158,7 @@ static const char *do_last(struct nameidata *nd,
 		 * about to look up
 		 */
 		error = complete_walk(nd);
-		if (error)
+		if (unlikely(error))
 			return ERR_PTR(error);
 
 		audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
@@ -3195,10 +3187,8 @@ static const char *do_last(struct nameidata *nd,
 	else
 		inode_unlock_shared(dir->d_inode);
 
-	if (got_write) {
+	if (got_write)
 		mnt_drop_write(nd->path.mnt);
-		got_write = false;
-	}
 
 	if (IS_ERR(dentry))
 		return ERR_CAST(dentry);
@@ -3206,7 +3196,7 @@ static const char *do_last(struct nameidata *nd,
 	if (file->f_mode & (FMODE_OPENED | FMODE_CREATED)) {
 		dput(nd->path.dentry);
 		nd->path.dentry = dentry;
-		goto finish_open_created;
+		return NULL;
 	}
 
 finish_lookup:
@@ -3221,12 +3211,29 @@ static const char *do_last(struct nameidata *nd,
 		audit_inode(nd->name, nd->path.dentry, 0);
 		return ERR_PTR(-EEXIST);
 	}
-finish_open:
+
 	/* Why this, you ask?  _Now_ we might have grown LOOKUP_JUMPED... */
-	error = complete_walk(nd);
-	if (error)
-		return ERR_PTR(error);
-finish_open_created:
+	return ERR_PTR(complete_walk(nd));
+}
+
+/*
+ * Handle the last step of open()
+ */
+static const char *do_last(struct nameidata *nd,
+		   struct file *file, const struct open_flags *op)
+{
+	kuid_t dir_uid = nd->inode->i_uid;
+	umode_t dir_mode = nd->inode->i_mode;
+	int open_flag = op->open_flag;
+	bool do_truncate;
+	int acc_mode;
+	const char *link;
+	int error;
+
+	link = open_last_lookups(nd, file, op);
+	if (unlikely(link))
+		return link;
+
 	if (!(file->f_mode & FMODE_CREATED))
 		audit_inode(nd->name, nd->path.dentry, 0);
 	if (open_flag & O_CREAT) {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 39/55] path_connected(): pass mount and dentry separately
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (36 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 38/55] split the lookup-related parts of do_last() into a separate helper Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 40/55] path_parent_directory(): leave changing path->dentry to callers Al Viro
                       ` (15 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

eventually we'll want to do that check *before* mangling
nd->path.dentry...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index b73ff6dcca24..971d0ee11286 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -550,22 +550,20 @@ static int __nd_alloc_stack(struct nameidata *nd)
 }
 
 /**
- * path_connected - Verify that a path->dentry is below path->mnt.mnt_root
- * @path: nameidate to verify
+ * path_connected - Verify that a dentry is below mnt.mnt_root
  *
  * Rename can sometimes move a file or directory outside of a bind
  * mount, path_connected allows those cases to be detected.
  */
-static bool path_connected(const struct path *path)
+static bool path_connected(struct vfsmount *mnt, struct dentry *dentry)
 {
-	struct vfsmount *mnt = path->mnt;
 	struct super_block *sb = mnt->mnt_sb;
 
 	/* Bind mounts and multi-root filesystems can have disconnected paths */
 	if (!(sb->s_iflags & SB_I_MULTIROOT) && (mnt->mnt_root == sb->s_root))
 		return true;
 
-	return is_subdir(path->dentry, mnt->mnt_root);
+	return is_subdir(dentry, mnt->mnt_root);
 }
 
 static inline int nd_alloc_stack(struct nameidata *nd)
@@ -1386,7 +1384,7 @@ static int follow_dotdot_rcu(struct nameidata *nd)
 				return -ECHILD;
 			nd->path.dentry = parent;
 			nd->seq = seq;
-			if (unlikely(!path_connected(&nd->path)))
+			if (unlikely(!path_connected(nd->path.mnt, parent)))
 				return -ECHILD;
 			break;
 		} else {
@@ -1448,7 +1446,7 @@ static int path_parent_directory(struct path *path)
 	/* rare case of legitimate dget_parent()... */
 	path->dentry = dget_parent(path->dentry);
 	dput(old);
-	if (unlikely(!path_connected(path)))
+	if (unlikely(!path_connected(path->mnt, path->dentry)))
 		return -ENOENT;
 	return 0;
 }
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 40/55] path_parent_directory(): leave changing path->dentry to callers
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (37 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 39/55] path_connected(): pass mount and dentry separately Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 41/55] follow_dotdot(): expand the call of path_parent_directory() Al Viro
                       ` (14 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Instead of returning 0, return new dentry; instead of returning
-ENOENT, return NULL.  Adjust the callers accordingly.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 971d0ee11286..8e588632e03a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1440,19 +1440,21 @@ static void follow_mount(struct path *path)
 	}
 }
 
-static int path_parent_directory(struct path *path)
+static struct dentry *path_parent_directory(struct path *path)
 {
-	struct dentry *old = path->dentry;
 	/* rare case of legitimate dget_parent()... */
-	path->dentry = dget_parent(path->dentry);
-	dput(old);
-	if (unlikely(!path_connected(path->mnt, path->dentry)))
-		return -ENOENT;
-	return 0;
+	struct dentry *parent = dget_parent(path->dentry);
+
+	if (unlikely(!path_connected(path->mnt, parent))) {
+		dput(parent);
+		parent = NULL;
+	}
+	return parent;
 }
 
 static int follow_dotdot(struct nameidata *nd)
 {
+	struct dentry *parent;
 	while (1) {
 		if (path_equal(&nd->path, &nd->root)) {
 			if (unlikely(nd->flags & LOOKUP_BENEATH))
@@ -1460,9 +1462,11 @@ static int follow_dotdot(struct nameidata *nd)
 			break;
 		}
 		if (nd->path.dentry != nd->path.mnt->mnt_root) {
-			int ret = path_parent_directory(&nd->path);
-			if (ret)
-				return ret;
+			parent = path_parent_directory(&nd->path);
+			if (!parent)
+				return -ENOENT;
+			dput(nd->path.dentry);
+			nd->path.dentry = parent;
 			break;
 		}
 		if (!follow_up(&nd->path))
@@ -2603,13 +2607,13 @@ int path_pts(struct path *path)
 	 */
 	struct dentry *child, *parent;
 	struct qstr this;
-	int ret;
 
-	ret = path_parent_directory(path);
-	if (ret)
-		return ret;
+	parent = path_parent_directory(path);
+	if (!parent)
+		return -ENOENT;
 
-	parent = path->dentry;
+	dput(path->dentry);
+	path->dentry = parent;
 	this.name = "pts";
 	this.len = 3;
 	child = d_hash_and_lookup(parent, &this);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 41/55] follow_dotdot(): expand the call of path_parent_directory()
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (38 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 40/55] path_parent_directory(): leave changing path->dentry to callers Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 42/55] follow_dotdot{,_rcu}(): lift switching nd->path to parent out of loop Al Viro
                       ` (13 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

... and move the definition of path_parent_directory() towards
the last remaining user.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 42 ++++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 8e588632e03a..9b66cd00edcb 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1440,21 +1440,8 @@ static void follow_mount(struct path *path)
 	}
 }
 
-static struct dentry *path_parent_directory(struct path *path)
-{
-	/* rare case of legitimate dget_parent()... */
-	struct dentry *parent = dget_parent(path->dentry);
-
-	if (unlikely(!path_connected(path->mnt, parent))) {
-		dput(parent);
-		parent = NULL;
-	}
-	return parent;
-}
-
 static int follow_dotdot(struct nameidata *nd)
 {
-	struct dentry *parent;
 	while (1) {
 		if (path_equal(&nd->path, &nd->root)) {
 			if (unlikely(nd->flags & LOOKUP_BENEATH))
@@ -1462,9 +1449,13 @@ static int follow_dotdot(struct nameidata *nd)
 			break;
 		}
 		if (nd->path.dentry != nd->path.mnt->mnt_root) {
-			parent = path_parent_directory(&nd->path);
-			if (!parent)
+			/* rare case of legitimate dget_parent()... */
+			struct dentry *parent = dget_parent(nd->path.dentry);
+
+			if (unlikely(!path_connected(nd->path.mnt, parent))) {
+				dput(parent);
 				return -ENOENT;
+			}
 			dput(nd->path.dentry);
 			nd->path.dentry = parent;
 			break;
@@ -2600,6 +2591,17 @@ struct dentry *lookup_positive_unlocked(const char *name,
 EXPORT_SYMBOL(lookup_positive_unlocked);
 
 #ifdef CONFIG_UNIX98_PTYS
+static int path_parent_directory(struct path *path)
+{
+	struct dentry *old = path->dentry;
+	/* rare case of legitimate dget_parent()... */
+	path->dentry = dget_parent(path->dentry);
+	dput(old);
+	if (unlikely(!path_connected(path->mnt, path->dentry)))
+		return -ENOENT;
+	return 0;
+}
+
 int path_pts(struct path *path)
 {
 	/* Find something mounted on "pts" in the same directory as
@@ -2607,13 +2609,13 @@ int path_pts(struct path *path)
 	 */
 	struct dentry *child, *parent;
 	struct qstr this;
+	int ret;
 
-	parent = path_parent_directory(path);
-	if (!parent)
-		return -ENOENT;
+	ret = path_parent_directory(path);
+	if (ret)
+		return ret;
 
-	dput(path->dentry);
-	path->dentry = parent;
+	parent = path->dentry;
 	this.name = "pts";
 	this.len = 3;
 	child = d_hash_and_lookup(parent, &this);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 42/55] follow_dotdot{,_rcu}(): lift switching nd->path to parent out of loop
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (39 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 41/55] follow_dotdot(): expand the call of path_parent_directory() Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 43/55] follow_dotdot{,_rcu}(): lift LOOKUP_BENEATH checks " Al Viro
                       ` (12 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 9b66cd00edcb..c307bf7cbaa1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1365,7 +1365,9 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 
 static int follow_dotdot_rcu(struct nameidata *nd)
 {
+	struct dentry *parent = NULL;
 	struct inode *inode = nd->inode;
+	unsigned seq;
 
 	while (1) {
 		if (path_equal(&nd->path, &nd->root)) {
@@ -1375,15 +1377,12 @@ static int follow_dotdot_rcu(struct nameidata *nd)
 		}
 		if (nd->path.dentry != nd->path.mnt->mnt_root) {
 			struct dentry *old = nd->path.dentry;
-			struct dentry *parent = old->d_parent;
-			unsigned seq;
 
+			parent = old->d_parent;
 			inode = parent->d_inode;
 			seq = read_seqcount_begin(&parent->d_seq);
 			if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
 				return -ECHILD;
-			nd->path.dentry = parent;
-			nd->seq = seq;
 			if (unlikely(!path_connected(nd->path.mnt, parent)))
 				return -ECHILD;
 			break;
@@ -1406,6 +1405,10 @@ static int follow_dotdot_rcu(struct nameidata *nd)
 			nd->seq = seq;
 		}
 	}
+	if (likely(parent)) {
+		nd->path.dentry = parent;
+		nd->seq = seq;
+	}
 	while (unlikely(d_mountpoint(nd->path.dentry))) {
 		struct mount *mounted;
 		mounted = __lookup_mnt(nd->path.mnt, nd->path.dentry);
@@ -1442,6 +1445,7 @@ static void follow_mount(struct path *path)
 
 static int follow_dotdot(struct nameidata *nd)
 {
+	struct dentry *parent = NULL;
 	while (1) {
 		if (path_equal(&nd->path, &nd->root)) {
 			if (unlikely(nd->flags & LOOKUP_BENEATH))
@@ -1450,14 +1454,11 @@ static int follow_dotdot(struct nameidata *nd)
 		}
 		if (nd->path.dentry != nd->path.mnt->mnt_root) {
 			/* rare case of legitimate dget_parent()... */
-			struct dentry *parent = dget_parent(nd->path.dentry);
-
+			parent = dget_parent(nd->path.dentry);
 			if (unlikely(!path_connected(nd->path.mnt, parent))) {
 				dput(parent);
 				return -ENOENT;
 			}
-			dput(nd->path.dentry);
-			nd->path.dentry = parent;
 			break;
 		}
 		if (!follow_up(&nd->path))
@@ -1465,6 +1466,10 @@ static int follow_dotdot(struct nameidata *nd)
 		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
 			return -EXDEV;
 	}
+	if (likely(parent)) {
+		dput(nd->path.dentry);
+		nd->path.dentry = parent;
+	}
 	follow_mount(&nd->path);
 	nd->inode = nd->path.dentry->d_inode;
 	return 0;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 43/55] follow_dotdot{,_rcu}(): lift LOOKUP_BENEATH checks out of loop
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (40 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 42/55] follow_dotdot{,_rcu}(): lift switching nd->path to parent out of loop Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 44/55] move put_link() into handle_dots() Al Viro
                       ` (11 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Behaviour change: LOOKUP_BENEATH lookup of .. in absolute root
yields an error even if it's not the process' root.  That's
possible only if you'd managed to escape chroot jail by way of
procfs symlinks, but IMO the resulting behaviour is not worse -
more consistent and easier to describe:
	".." in root is "stay where you are", uness LOOKUP_BENEATH
	has been given, in which case it's "fail with EXDEV".

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c307bf7cbaa1..be756aa32240 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1370,11 +1370,8 @@ static int follow_dotdot_rcu(struct nameidata *nd)
 	unsigned seq;
 
 	while (1) {
-		if (path_equal(&nd->path, &nd->root)) {
-			if (unlikely(nd->flags & LOOKUP_BENEATH))
-				return -ECHILD;
+		if (path_equal(&nd->path, &nd->root))
 			break;
-		}
 		if (nd->path.dentry != nd->path.mnt->mnt_root) {
 			struct dentry *old = nd->path.dentry;
 
@@ -1405,7 +1402,10 @@ static int follow_dotdot_rcu(struct nameidata *nd)
 			nd->seq = seq;
 		}
 	}
-	if (likely(parent)) {
+	if (unlikely(!parent)) {
+		if (unlikely(nd->flags & LOOKUP_BENEATH))
+			return -ECHILD;
+	} else {
 		nd->path.dentry = parent;
 		nd->seq = seq;
 	}
@@ -1447,11 +1447,8 @@ static int follow_dotdot(struct nameidata *nd)
 {
 	struct dentry *parent = NULL;
 	while (1) {
-		if (path_equal(&nd->path, &nd->root)) {
-			if (unlikely(nd->flags & LOOKUP_BENEATH))
-				return -EXDEV;
+		if (path_equal(&nd->path, &nd->root))
 			break;
-		}
 		if (nd->path.dentry != nd->path.mnt->mnt_root) {
 			/* rare case of legitimate dget_parent()... */
 			parent = dget_parent(nd->path.dentry);
@@ -1466,7 +1463,10 @@ static int follow_dotdot(struct nameidata *nd)
 		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
 			return -EXDEV;
 	}
-	if (likely(parent)) {
+	if (unlikely(!parent)) {
+		if (unlikely(nd->flags & LOOKUP_BENEATH))
+			return -EXDEV;
+	} else {
 		dput(nd->path.dentry);
 		nd->path.dentry = parent;
 	}
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 44/55] move put_link() into handle_dots()
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (41 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 43/55] follow_dotdot{,_rcu}(): lift LOOKUP_BENEATH checks " Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 45/55] handle_dots(): return ERR_PTR/NULL instead of -E.../0 Al Viro
                       ` (10 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

That closes a harmless irregularity in do_last() - if trailing
symlink ends with . or .. we forget to drop it.  Not a problem,
since nameidata is about to be done with anyway, but let's keep
it more regular.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 76 +++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 38 insertions(+), 38 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index be756aa32240..3f15b18af991 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1654,40 +1654,6 @@ static inline int may_lookup(struct nameidata *nd)
 	return inode_permission(nd->inode, MAY_EXEC);
 }
 
-static inline int handle_dots(struct nameidata *nd, int type)
-{
-	if (type == LAST_DOTDOT) {
-		int error = 0;
-
-		if (!nd->root.mnt) {
-			error = set_root(nd);
-			if (error)
-				return error;
-		}
-		if (nd->flags & LOOKUP_RCU)
-			error = follow_dotdot_rcu(nd);
-		else
-			error = follow_dotdot(nd);
-		if (error)
-			return error;
-
-		if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) {
-			/*
-			 * If there was a racing rename or mount along our
-			 * path, then we can't be sure that ".." hasn't jumped
-			 * above nd->root (and so userspace should retry or use
-			 * some fallback).
-			 */
-			smp_rmb();
-			if (unlikely(__read_seqcount_retry(&mount_lock.seqcount, nd->m_seq)))
-				return -EAGAIN;
-			if (unlikely(__read_seqcount_retry(&rename_lock.seqcount, nd->r_seq)))
-				return -EAGAIN;
-		}
-	}
-	return 0;
-}
-
 enum {WALK_TRAILING = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
 
 static const char *pick_link(struct nameidata *nd, struct path *link,
@@ -1819,6 +1785,42 @@ static const char *step_into(struct nameidata *nd, int flags,
 	return pick_link(nd, &path, inode, seq, flags);
 }
 
+static int handle_dots(struct nameidata *nd, int type, int flags)
+{
+	if (type == LAST_DOTDOT) {
+		int error = 0;
+
+		if (!nd->root.mnt) {
+			error = set_root(nd);
+			if (error)
+				return error;
+		}
+		if (nd->flags & LOOKUP_RCU)
+			error = follow_dotdot_rcu(nd);
+		else
+			error = follow_dotdot(nd);
+		if (error)
+			return error;
+
+		if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) {
+			/*
+			 * If there was a racing rename or mount along our
+			 * path, then we can't be sure that ".." hasn't jumped
+			 * above nd->root (and so userspace should retry or use
+			 * some fallback).
+			 */
+			smp_rmb();
+			if (unlikely(__read_seqcount_retry(&mount_lock.seqcount, nd->m_seq)))
+				return -EAGAIN;
+			if (unlikely(__read_seqcount_retry(&rename_lock.seqcount, nd->r_seq)))
+				return -EAGAIN;
+		}
+	}
+	if (!(flags & WALK_MORE) && nd->depth)
+		put_link(nd);
+	return 0;
+}
+
 static const char *walk_component(struct nameidata *nd, int flags)
 {
 	struct dentry *dentry;
@@ -1831,9 +1833,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 	 * parent relationships.
 	 */
 	if (unlikely(nd->last_type != LAST_NORM)) {
-		err = handle_dots(nd, nd->last_type);
-		if (!(flags & WALK_MORE) && nd->depth)
-			put_link(nd);
+		err = handle_dots(nd, nd->last_type, flags);
 		return ERR_PTR(err);
 	}
 	dentry = lookup_fast(nd, &inode, &seq);
@@ -3142,7 +3142,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 	nd->flags |= op->intent;
 
 	if (nd->last_type != LAST_NORM) {
-		error = handle_dots(nd, nd->last_type);
+		error = handle_dots(nd, nd->last_type, WALK_TRAILING);
 		if (likely(!error))
 			error = complete_walk(nd);
 		return ERR_PTR(error);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 45/55] handle_dots(): return ERR_PTR/NULL instead of -E.../0
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (42 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 44/55] move put_link() into handle_dots() Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 46/55] move follow_dotdot() and follow_dotdot_rcu() towards handle_dots() Al Viro
                       ` (9 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

all callers prefer such calling conventions

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 35 ++++++++++++++++-------------------
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3f15b18af991..20a2fc27dd87 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1785,7 +1785,7 @@ static const char *step_into(struct nameidata *nd, int flags,
 	return pick_link(nd, &path, inode, seq, flags);
 }
 
-static int handle_dots(struct nameidata *nd, int type, int flags)
+static const char *handle_dots(struct nameidata *nd, int type, int flags)
 {
 	if (type == LAST_DOTDOT) {
 		int error = 0;
@@ -1793,14 +1793,14 @@ static int handle_dots(struct nameidata *nd, int type, int flags)
 		if (!nd->root.mnt) {
 			error = set_root(nd);
 			if (error)
-				return error;
+				return ERR_PTR(error);
 		}
 		if (nd->flags & LOOKUP_RCU)
 			error = follow_dotdot_rcu(nd);
 		else
 			error = follow_dotdot(nd);
 		if (error)
-			return error;
+			return ERR_PTR(error);
 
 		if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) {
 			/*
@@ -1811,14 +1811,14 @@ static int handle_dots(struct nameidata *nd, int type, int flags)
 			 */
 			smp_rmb();
 			if (unlikely(__read_seqcount_retry(&mount_lock.seqcount, nd->m_seq)))
-				return -EAGAIN;
+				return ERR_PTR(-EAGAIN);
 			if (unlikely(__read_seqcount_retry(&rename_lock.seqcount, nd->r_seq)))
-				return -EAGAIN;
+				return ERR_PTR(-EAGAIN);
 		}
 	}
 	if (!(flags & WALK_MORE) && nd->depth)
 		put_link(nd);
-	return 0;
+	return NULL;
 }
 
 static const char *walk_component(struct nameidata *nd, int flags)
@@ -1826,16 +1826,13 @@ static const char *walk_component(struct nameidata *nd, int flags)
 	struct dentry *dentry;
 	struct inode *inode;
 	unsigned seq;
-	int err;
 	/*
 	 * "." and ".." are special - ".." especially so because it has
 	 * to be able to know about the current root directory and
 	 * parent relationships.
 	 */
-	if (unlikely(nd->last_type != LAST_NORM)) {
-		err = handle_dots(nd, nd->last_type, flags);
-		return ERR_PTR(err);
-	}
+	if (unlikely(nd->last_type != LAST_NORM))
+		return handle_dots(nd, nd->last_type, flags);
 	dentry = lookup_fast(nd, &inode, &seq);
 	if (IS_ERR(dentry))
 		return ERR_CAST(dentry);
@@ -3135,17 +3132,17 @@ static const char *open_last_lookups(struct nameidata *nd,
 	unsigned seq;
 	struct inode *inode;
 	struct dentry *dentry;
-	const char *link;
+	const char *res;
 	int error;
 
 	nd->flags &= ~LOOKUP_PARENT;
 	nd->flags |= op->intent;
 
 	if (nd->last_type != LAST_NORM) {
-		error = handle_dots(nd, nd->last_type, WALK_TRAILING);
-		if (likely(!error))
-			error = complete_walk(nd);
-		return ERR_PTR(error);
+		res = handle_dots(nd, nd->last_type, WALK_TRAILING);
+		if (likely(!res))
+			res = ERR_PTR(complete_walk(nd));
+		return res;
 	}
 
 	if (!(open_flag & O_CREAT)) {
@@ -3209,11 +3206,11 @@ static const char *open_last_lookups(struct nameidata *nd,
 	}
 
 finish_lookup:
-	link = step_into(nd, WALK_TRAILING, dentry, inode, seq);
-	if (unlikely(link)) {
+	res = step_into(nd, WALK_TRAILING, dentry, inode, seq);
+	if (unlikely(res)) {
 		nd->flags |= LOOKUP_PARENT;
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
-		return link;
+		return res;
 	}
 
 	if (unlikely((open_flag & (O_EXCL | O_CREAT)) == (O_EXCL | O_CREAT))) {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 46/55] move follow_dotdot() and follow_dotdot_rcu() towards handle_dots()
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (43 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 45/55] handle_dots(): return ERR_PTR/NULL instead of -E.../0 Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 47/55] follow_dotdot{,_rcu}(): preparation to switch to step_into() Al Viro
                       ` (8 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

pure move

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 192 ++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 96 insertions(+), 96 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 20a2fc27dd87..1bfefb99cbca 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1363,70 +1363,6 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 	return ret;
 }
 
-static int follow_dotdot_rcu(struct nameidata *nd)
-{
-	struct dentry *parent = NULL;
-	struct inode *inode = nd->inode;
-	unsigned seq;
-
-	while (1) {
-		if (path_equal(&nd->path, &nd->root))
-			break;
-		if (nd->path.dentry != nd->path.mnt->mnt_root) {
-			struct dentry *old = nd->path.dentry;
-
-			parent = old->d_parent;
-			inode = parent->d_inode;
-			seq = read_seqcount_begin(&parent->d_seq);
-			if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
-				return -ECHILD;
-			if (unlikely(!path_connected(nd->path.mnt, parent)))
-				return -ECHILD;
-			break;
-		} else {
-			struct mount *mnt = real_mount(nd->path.mnt);
-			struct mount *mparent = mnt->mnt_parent;
-			struct dentry *mountpoint = mnt->mnt_mountpoint;
-			struct inode *inode2 = mountpoint->d_inode;
-			unsigned seq = read_seqcount_begin(&mountpoint->d_seq);
-			if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
-				return -ECHILD;
-			if (&mparent->mnt == nd->path.mnt)
-				break;
-			if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-				return -ECHILD;
-			/* we know that mountpoint was pinned */
-			nd->path.dentry = mountpoint;
-			nd->path.mnt = &mparent->mnt;
-			inode = inode2;
-			nd->seq = seq;
-		}
-	}
-	if (unlikely(!parent)) {
-		if (unlikely(nd->flags & LOOKUP_BENEATH))
-			return -ECHILD;
-	} else {
-		nd->path.dentry = parent;
-		nd->seq = seq;
-	}
-	while (unlikely(d_mountpoint(nd->path.dentry))) {
-		struct mount *mounted;
-		mounted = __lookup_mnt(nd->path.mnt, nd->path.dentry);
-		if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
-			return -ECHILD;
-		if (!mounted)
-			break;
-		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-			return -ECHILD;
-		nd->path.mnt = &mounted->mnt;
-		nd->path.dentry = mounted->mnt.mnt_root;
-		inode = nd->path.dentry->d_inode;
-		nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
-	}
-	nd->inode = inode;
-	return 0;
-}
-
 /*
  * Skip to top of mountpoint pile in refwalk mode for follow_dotdot()
  */
@@ -1443,38 +1379,6 @@ static void follow_mount(struct path *path)
 	}
 }
 
-static int follow_dotdot(struct nameidata *nd)
-{
-	struct dentry *parent = NULL;
-	while (1) {
-		if (path_equal(&nd->path, &nd->root))
-			break;
-		if (nd->path.dentry != nd->path.mnt->mnt_root) {
-			/* rare case of legitimate dget_parent()... */
-			parent = dget_parent(nd->path.dentry);
-			if (unlikely(!path_connected(nd->path.mnt, parent))) {
-				dput(parent);
-				return -ENOENT;
-			}
-			break;
-		}
-		if (!follow_up(&nd->path))
-			break;
-		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-			return -EXDEV;
-	}
-	if (unlikely(!parent)) {
-		if (unlikely(nd->flags & LOOKUP_BENEATH))
-			return -EXDEV;
-	} else {
-		dput(nd->path.dentry);
-		nd->path.dentry = parent;
-	}
-	follow_mount(&nd->path);
-	nd->inode = nd->path.dentry->d_inode;
-	return 0;
-}
-
 /*
  * This looks up the name in dcache and possibly revalidates the found dentry.
  * NULL is returned if the dentry does not exist in the cache.
@@ -1785,6 +1689,102 @@ static const char *step_into(struct nameidata *nd, int flags,
 	return pick_link(nd, &path, inode, seq, flags);
 }
 
+static int follow_dotdot_rcu(struct nameidata *nd)
+{
+	struct dentry *parent = NULL;
+	struct inode *inode = nd->inode;
+	unsigned seq;
+
+	while (1) {
+		if (path_equal(&nd->path, &nd->root))
+			break;
+		if (nd->path.dentry != nd->path.mnt->mnt_root) {
+			struct dentry *old = nd->path.dentry;
+
+			parent = old->d_parent;
+			inode = parent->d_inode;
+			seq = read_seqcount_begin(&parent->d_seq);
+			if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
+				return -ECHILD;
+			if (unlikely(!path_connected(nd->path.mnt, parent)))
+				return -ECHILD;
+			break;
+		} else {
+			struct mount *mnt = real_mount(nd->path.mnt);
+			struct mount *mparent = mnt->mnt_parent;
+			struct dentry *mountpoint = mnt->mnt_mountpoint;
+			struct inode *inode2 = mountpoint->d_inode;
+			unsigned seq = read_seqcount_begin(&mountpoint->d_seq);
+			if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
+				return -ECHILD;
+			if (&mparent->mnt == nd->path.mnt)
+				break;
+			if (unlikely(nd->flags & LOOKUP_NO_XDEV))
+				return -ECHILD;
+			/* we know that mountpoint was pinned */
+			nd->path.dentry = mountpoint;
+			nd->path.mnt = &mparent->mnt;
+			inode = inode2;
+			nd->seq = seq;
+		}
+	}
+	if (unlikely(!parent)) {
+		if (unlikely(nd->flags & LOOKUP_BENEATH))
+			return -ECHILD;
+	} else {
+		nd->path.dentry = parent;
+		nd->seq = seq;
+	}
+	while (unlikely(d_mountpoint(nd->path.dentry))) {
+		struct mount *mounted;
+		mounted = __lookup_mnt(nd->path.mnt, nd->path.dentry);
+		if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
+			return -ECHILD;
+		if (!mounted)
+			break;
+		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
+			return -ECHILD;
+		nd->path.mnt = &mounted->mnt;
+		nd->path.dentry = mounted->mnt.mnt_root;
+		inode = nd->path.dentry->d_inode;
+		nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
+	}
+	nd->inode = inode;
+	return 0;
+}
+
+static int follow_dotdot(struct nameidata *nd)
+{
+	struct dentry *parent = NULL;
+	while (1) {
+		if (path_equal(&nd->path, &nd->root))
+			break;
+		if (nd->path.dentry != nd->path.mnt->mnt_root) {
+			/* rare case of legitimate dget_parent()... */
+			parent = dget_parent(nd->path.dentry);
+			if (unlikely(!path_connected(nd->path.mnt, parent))) {
+				dput(parent);
+				return -ENOENT;
+			}
+			break;
+		}
+		if (!follow_up(&nd->path))
+			break;
+		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
+			return -EXDEV;
+	}
+	if (unlikely(!parent)) {
+		if (unlikely(nd->flags & LOOKUP_BENEATH))
+			return -EXDEV;
+	} else {
+		dput(nd->path.dentry);
+		nd->path.dentry = parent;
+	}
+	follow_mount(&nd->path);
+	nd->inode = nd->path.dentry->d_inode;
+	return 0;
+}
+
 static const char *handle_dots(struct nameidata *nd, int type, int flags)
 {
 	if (type == LAST_DOTDOT) {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 47/55] follow_dotdot{,_rcu}(): preparation to switch to step_into()
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (44 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 46/55] move follow_dotdot() and follow_dotdot_rcu() towards handle_dots() Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 48/55] follow_dotdot{,_rcu}(): switch to use of step_into() Al Viro
                       ` (7 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Right now the tail ends of follow_dotdot{,_rcu}() are pretty
much the open-coded analogues of step_into().  The differences:
	* the lack of proper LOOKUP_NO_XDEV handling in non-RCU case
(arguably a bug)
	* the lack of ->d_manage() handling (again, arguably a bug)
	* the lack of (conditional) put_link() - that sits in
handle_dots() right now.

Bring the put_link() in, adjust the calling conventions so that
on the next step with could just switch those functions to
returning step_into().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 49 +++++++++++++++++++++++++++----------------------
 1 file changed, 27 insertions(+), 22 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 1bfefb99cbca..3e3bf11ee3d7 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1689,7 +1689,7 @@ static const char *step_into(struct nameidata *nd, int flags,
 	return pick_link(nd, &path, inode, seq, flags);
 }
 
-static int follow_dotdot_rcu(struct nameidata *nd)
+static const char *follow_dotdot_rcu(struct nameidata *nd, int flags)
 {
 	struct dentry *parent = NULL;
 	struct inode *inode = nd->inode;
@@ -1705,9 +1705,9 @@ static int follow_dotdot_rcu(struct nameidata *nd)
 			inode = parent->d_inode;
 			seq = read_seqcount_begin(&parent->d_seq);
 			if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
-				return -ECHILD;
+				return ERR_PTR(-ECHILD);
 			if (unlikely(!path_connected(nd->path.mnt, parent)))
-				return -ECHILD;
+				return ERR_PTR(-ECHILD);
 			break;
 		} else {
 			struct mount *mnt = real_mount(nd->path.mnt);
@@ -1716,11 +1716,11 @@ static int follow_dotdot_rcu(struct nameidata *nd)
 			struct inode *inode2 = mountpoint->d_inode;
 			unsigned seq = read_seqcount_begin(&mountpoint->d_seq);
 			if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
-				return -ECHILD;
+				return ERR_PTR(-ECHILD);
 			if (&mparent->mnt == nd->path.mnt)
 				break;
 			if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-				return -ECHILD;
+				return ERR_PTR(-ECHILD);
 			/* we know that mountpoint was pinned */
 			nd->path.dentry = mountpoint;
 			nd->path.mnt = &mparent->mnt;
@@ -1730,7 +1730,7 @@ static int follow_dotdot_rcu(struct nameidata *nd)
 	}
 	if (unlikely(!parent)) {
 		if (unlikely(nd->flags & LOOKUP_BENEATH))
-			return -ECHILD;
+			return ERR_PTR(-ECHILD);
 	} else {
 		nd->path.dentry = parent;
 		nd->seq = seq;
@@ -1739,21 +1739,23 @@ static int follow_dotdot_rcu(struct nameidata *nd)
 		struct mount *mounted;
 		mounted = __lookup_mnt(nd->path.mnt, nd->path.dentry);
 		if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
-			return -ECHILD;
+			return ERR_PTR(-ECHILD);
 		if (!mounted)
 			break;
 		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-			return -ECHILD;
+			return ERR_PTR(-ECHILD);
 		nd->path.mnt = &mounted->mnt;
 		nd->path.dentry = mounted->mnt.mnt_root;
 		inode = nd->path.dentry->d_inode;
 		nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
 	}
 	nd->inode = inode;
-	return 0;
+	if (!(flags & WALK_MORE) && nd->depth)
+		put_link(nd);
+	return NULL;
 }
 
-static int follow_dotdot(struct nameidata *nd)
+static const char *follow_dotdot(struct nameidata *nd, int flags)
 {
 	struct dentry *parent = NULL;
 	while (1) {
@@ -1764,43 +1766,45 @@ static int follow_dotdot(struct nameidata *nd)
 			parent = dget_parent(nd->path.dentry);
 			if (unlikely(!path_connected(nd->path.mnt, parent))) {
 				dput(parent);
-				return -ENOENT;
+				return ERR_PTR(-ENOENT);
 			}
 			break;
 		}
 		if (!follow_up(&nd->path))
 			break;
 		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-			return -EXDEV;
+			return ERR_PTR(-EXDEV);
 	}
 	if (unlikely(!parent)) {
 		if (unlikely(nd->flags & LOOKUP_BENEATH))
-			return -EXDEV;
+			return ERR_PTR(-EXDEV);
 	} else {
 		dput(nd->path.dentry);
 		nd->path.dentry = parent;
 	}
 	follow_mount(&nd->path);
 	nd->inode = nd->path.dentry->d_inode;
-	return 0;
+	if (!(flags & WALK_MORE) && nd->depth)
+		put_link(nd);
+	return NULL;
 }
 
 static const char *handle_dots(struct nameidata *nd, int type, int flags)
 {
 	if (type == LAST_DOTDOT) {
-		int error = 0;
+		const char *error = NULL;
 
 		if (!nd->root.mnt) {
-			error = set_root(nd);
+			error = ERR_PTR(set_root(nd));
 			if (error)
-				return ERR_PTR(error);
+				return error;
 		}
 		if (nd->flags & LOOKUP_RCU)
-			error = follow_dotdot_rcu(nd);
+			error = follow_dotdot_rcu(nd, flags);
 		else
-			error = follow_dotdot(nd);
+			error = follow_dotdot(nd, flags);
 		if (error)
-			return ERR_PTR(error);
+			return error;
 
 		if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) {
 			/*
@@ -1815,9 +1819,10 @@ static const char *handle_dots(struct nameidata *nd, int type, int flags)
 			if (unlikely(__read_seqcount_retry(&rename_lock.seqcount, nd->r_seq)))
 				return ERR_PTR(-EAGAIN);
 		}
+	} else { // LAST_DOT
+		if (!(flags & WALK_MORE) && nd->depth)
+			put_link(nd);
 	}
-	if (!(flags & WALK_MORE) && nd->depth)
-		put_link(nd);
 	return NULL;
 }
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 48/55] follow_dotdot{,_rcu}(): switch to use of step_into()
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (45 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 47/55] follow_dotdot{,_rcu}(): preparation to switch to step_into() Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 49/55] lift all calls of step_into() out of follow_dotdot/follow_dotdot_rcu Al Viro
                       ` (6 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

gets the regular mount crossing on result of ..

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 37 +++++++++----------------------------
 1 file changed, 9 insertions(+), 28 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3e3bf11ee3d7..cc01bca2266c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1724,35 +1724,19 @@ static const char *follow_dotdot_rcu(struct nameidata *nd, int flags)
 			/* we know that mountpoint was pinned */
 			nd->path.dentry = mountpoint;
 			nd->path.mnt = &mparent->mnt;
-			inode = inode2;
+			inode = nd->inode = inode2;
 			nd->seq = seq;
 		}
 	}
 	if (unlikely(!parent)) {
 		if (unlikely(nd->flags & LOOKUP_BENEATH))
 			return ERR_PTR(-ECHILD);
+		return step_into(nd, flags | WALK_NOFOLLOW,
+				 nd->path.dentry, nd->inode, nd->seq);
 	} else {
-		nd->path.dentry = parent;
-		nd->seq = seq;
-	}
-	while (unlikely(d_mountpoint(nd->path.dentry))) {
-		struct mount *mounted;
-		mounted = __lookup_mnt(nd->path.mnt, nd->path.dentry);
-		if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
-			return ERR_PTR(-ECHILD);
-		if (!mounted)
-			break;
-		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-			return ERR_PTR(-ECHILD);
-		nd->path.mnt = &mounted->mnt;
-		nd->path.dentry = mounted->mnt.mnt_root;
-		inode = nd->path.dentry->d_inode;
-		nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
+		return step_into(nd, flags | WALK_NOFOLLOW,
+				 parent, inode, seq);
 	}
-	nd->inode = inode;
-	if (!(flags & WALK_MORE) && nd->depth)
-		put_link(nd);
-	return NULL;
 }
 
 static const char *follow_dotdot(struct nameidata *nd, int flags)
@@ -1778,15 +1762,12 @@ static const char *follow_dotdot(struct nameidata *nd, int flags)
 	if (unlikely(!parent)) {
 		if (unlikely(nd->flags & LOOKUP_BENEATH))
 			return ERR_PTR(-EXDEV);
+		return step_into(nd, flags | WALK_NOFOLLOW,
+				 dget(nd->path.dentry), nd->inode, nd->seq);
 	} else {
-		dput(nd->path.dentry);
-		nd->path.dentry = parent;
+		return step_into(nd, flags | WALK_NOFOLLOW,
+				 parent, parent->d_inode, 0);
 	}
-	follow_mount(&nd->path);
-	nd->inode = nd->path.dentry->d_inode;
-	if (!(flags & WALK_MORE) && nd->depth)
-		put_link(nd);
-	return NULL;
 }
 
 static const char *handle_dots(struct nameidata *nd, int type, int flags)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 49/55] lift all calls of step_into() out of follow_dotdot/follow_dotdot_rcu
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (46 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 48/55] follow_dotdot{,_rcu}(): switch to use of step_into() Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 50/55] follow_dotdot{,_rcu}(): massage loops Al Viro
                       ` (5 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

lift step_into() into handle_dots() (where they merge with each other);
have follow_... return dentry and pass inode/seq to the caller.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 74 +++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 37 insertions(+), 37 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index cc01bca2266c..cf7b7a6c08fc 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1689,31 +1689,29 @@ static const char *step_into(struct nameidata *nd, int flags,
 	return pick_link(nd, &path, inode, seq, flags);
 }
 
-static const char *follow_dotdot_rcu(struct nameidata *nd, int flags)
+static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
+					struct inode **inodep,
+					unsigned *seqp)
 {
-	struct dentry *parent = NULL;
-	struct inode *inode = nd->inode;
-	unsigned seq;
-
 	while (1) {
 		if (path_equal(&nd->path, &nd->root))
 			break;
 		if (nd->path.dentry != nd->path.mnt->mnt_root) {
 			struct dentry *old = nd->path.dentry;
+			struct dentry *parent = old->d_parent;
 
-			parent = old->d_parent;
-			inode = parent->d_inode;
-			seq = read_seqcount_begin(&parent->d_seq);
+			*inodep = parent->d_inode;
+			*seqp = read_seqcount_begin(&parent->d_seq);
 			if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
 				return ERR_PTR(-ECHILD);
 			if (unlikely(!path_connected(nd->path.mnt, parent)))
 				return ERR_PTR(-ECHILD);
-			break;
+			return parent;
 		} else {
 			struct mount *mnt = real_mount(nd->path.mnt);
 			struct mount *mparent = mnt->mnt_parent;
 			struct dentry *mountpoint = mnt->mnt_mountpoint;
-			struct inode *inode2 = mountpoint->d_inode;
+			struct inode *inode = mountpoint->d_inode;
 			unsigned seq = read_seqcount_begin(&mountpoint->d_seq);
 			if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
 				return ERR_PTR(-ECHILD);
@@ -1724,56 +1722,51 @@ static const char *follow_dotdot_rcu(struct nameidata *nd, int flags)
 			/* we know that mountpoint was pinned */
 			nd->path.dentry = mountpoint;
 			nd->path.mnt = &mparent->mnt;
-			inode = nd->inode = inode2;
+			nd->inode = inode;
 			nd->seq = seq;
 		}
 	}
-	if (unlikely(!parent)) {
-		if (unlikely(nd->flags & LOOKUP_BENEATH))
-			return ERR_PTR(-ECHILD);
-		return step_into(nd, flags | WALK_NOFOLLOW,
-				 nd->path.dentry, nd->inode, nd->seq);
-	} else {
-		return step_into(nd, flags | WALK_NOFOLLOW,
-				 parent, inode, seq);
-	}
+	if (unlikely(nd->flags & LOOKUP_BENEATH))
+		return ERR_PTR(-ECHILD);
+	return NULL;
 }
 
-static const char *follow_dotdot(struct nameidata *nd, int flags)
+static struct dentry *follow_dotdot(struct nameidata *nd,
+				 struct inode **inodep,
+				 unsigned *seqp)
 {
-	struct dentry *parent = NULL;
 	while (1) {
 		if (path_equal(&nd->path, &nd->root))
 			break;
 		if (nd->path.dentry != nd->path.mnt->mnt_root) {
 			/* rare case of legitimate dget_parent()... */
-			parent = dget_parent(nd->path.dentry);
+			struct dentry *parent = dget_parent(nd->path.dentry);
 			if (unlikely(!path_connected(nd->path.mnt, parent))) {
 				dput(parent);
 				return ERR_PTR(-ENOENT);
 			}
-			break;
+			*seqp = 0;
+			*inodep = parent->d_inode;
+			return parent;
 		}
 		if (!follow_up(&nd->path))
 			break;
 		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
 			return ERR_PTR(-EXDEV);
 	}
-	if (unlikely(!parent)) {
-		if (unlikely(nd->flags & LOOKUP_BENEATH))
-			return ERR_PTR(-EXDEV);
-		return step_into(nd, flags | WALK_NOFOLLOW,
-				 dget(nd->path.dentry), nd->inode, nd->seq);
-	} else {
-		return step_into(nd, flags | WALK_NOFOLLOW,
-				 parent, parent->d_inode, 0);
-	}
+	if (unlikely(nd->flags & LOOKUP_BENEATH))
+		return ERR_PTR(-EXDEV);
+	dget(nd->path.dentry);
+	return NULL;
 }
 
 static const char *handle_dots(struct nameidata *nd, int type, int flags)
 {
 	if (type == LAST_DOTDOT) {
 		const char *error = NULL;
+		struct dentry *parent;
+		struct inode *inode;
+		unsigned seq;
 
 		if (!nd->root.mnt) {
 			error = ERR_PTR(set_root(nd));
@@ -1781,11 +1774,18 @@ static const char *handle_dots(struct nameidata *nd, int type, int flags)
 				return error;
 		}
 		if (nd->flags & LOOKUP_RCU)
-			error = follow_dotdot_rcu(nd, flags);
+			parent = follow_dotdot_rcu(nd, &inode, &seq);
 		else
-			error = follow_dotdot(nd, flags);
-		if (error)
-			return error;
+			parent = follow_dotdot(nd, &inode, &seq);
+		if (IS_ERR(parent))
+			return ERR_CAST(parent);
+		if (unlikely(!parent)) {
+			error = step_into(nd, flags | WALK_NOFOLLOW,
+					 nd->path.dentry, nd->inode, nd->seq);
+		} else {
+			error = step_into(nd, flags | WALK_NOFOLLOW,
+					 parent, inode, seq);
+		}
 
 		if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) {
 			/*
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 50/55] follow_dotdot{,_rcu}(): massage loops
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (47 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 49/55] lift all calls of step_into() out of follow_dotdot/follow_dotdot_rcu Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 51/55] follow_dotdot_rcu(): be lazy about changing nd->path Al Viro
                       ` (4 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

The logics in both of them is the same:
	while true
		if in process' root	// uncommon
			break
		if *not* in mount root	// normal case
			find the parent
			return
		if at absolute root	// very uncommon
			break
		move to underlying mountpoint
	report that we are in root

Pull the common path out of the loop:
	if in process' root		// uncommon
		goto in_root
	if unlikely(in mount root)
		while true
			if at absolute root
				goto in_root
			move to underlying mountpoint
			if in process' root
				goto in_root
			if in mount root
				break;
	find the parent	// we are not in mount root
	return
in_root:
	report that we are in root

The reason for that transformation is that we get to keep the
common path straight *and* get a separate block for "move
through underlying mountpoints", which will allow to sanitize
NO_XDEV handling there.  What's more, the pared-down loops
will be easier to deal with - in particular, non-RCU case
has no need to grab mount_lock and rewriting it to the
form that wouldn't do that is a non-trivial change.  Better
do that with less stuff getting in the way...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 77 ++++++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 45 insertions(+), 32 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index cf7b7a6c08fc..aa03a4b2ebe9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1693,21 +1693,12 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 					struct inode **inodep,
 					unsigned *seqp)
 {
-	while (1) {
-		if (path_equal(&nd->path, &nd->root))
-			break;
-		if (nd->path.dentry != nd->path.mnt->mnt_root) {
-			struct dentry *old = nd->path.dentry;
-			struct dentry *parent = old->d_parent;
+	struct dentry *parent, *old;
 
-			*inodep = parent->d_inode;
-			*seqp = read_seqcount_begin(&parent->d_seq);
-			if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
-				return ERR_PTR(-ECHILD);
-			if (unlikely(!path_connected(nd->path.mnt, parent)))
-				return ERR_PTR(-ECHILD);
-			return parent;
-		} else {
+	if (path_equal(&nd->path, &nd->root))
+		goto in_root;
+	if (unlikely(nd->path.dentry == nd->path.mnt->mnt_root)) {
+		while (1) {
 			struct mount *mnt = real_mount(nd->path.mnt);
 			struct mount *mparent = mnt->mnt_parent;
 			struct dentry *mountpoint = mnt->mnt_mountpoint;
@@ -1716,7 +1707,7 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 			if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
 				return ERR_PTR(-ECHILD);
 			if (&mparent->mnt == nd->path.mnt)
-				break;
+				goto in_root;
 			if (unlikely(nd->flags & LOOKUP_NO_XDEV))
 				return ERR_PTR(-ECHILD);
 			/* we know that mountpoint was pinned */
@@ -1724,8 +1715,22 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 			nd->path.mnt = &mparent->mnt;
 			nd->inode = inode;
 			nd->seq = seq;
+			if (path_equal(&nd->path, &nd->root))
+				goto in_root;
+			if (nd->path.dentry != nd->path.mnt->mnt_root)
+				break;
 		}
 	}
+	old = nd->path.dentry;
+	parent = old->d_parent;
+	*inodep = parent->d_inode;
+	*seqp = read_seqcount_begin(&parent->d_seq);
+	if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
+		return ERR_PTR(-ECHILD);
+	if (unlikely(!path_connected(nd->path.mnt, parent)))
+		return ERR_PTR(-ECHILD);
+	return parent;
+in_root:
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-ECHILD);
 	return NULL;
@@ -1735,25 +1740,33 @@ static struct dentry *follow_dotdot(struct nameidata *nd,
 				 struct inode **inodep,
 				 unsigned *seqp)
 {
-	while (1) {
-		if (path_equal(&nd->path, &nd->root))
-			break;
-		if (nd->path.dentry != nd->path.mnt->mnt_root) {
-			/* rare case of legitimate dget_parent()... */
-			struct dentry *parent = dget_parent(nd->path.dentry);
-			if (unlikely(!path_connected(nd->path.mnt, parent))) {
-				dput(parent);
-				return ERR_PTR(-ENOENT);
-			}
-			*seqp = 0;
-			*inodep = parent->d_inode;
-			return parent;
+	struct dentry *parent;
+
+	if (path_equal(&nd->path, &nd->root))
+		goto in_root;
+	if (unlikely(nd->path.dentry == nd->path.mnt->mnt_root)) {
+		while (1) {
+			if (!follow_up(&nd->path))
+				goto in_root;
+			if (unlikely(nd->flags & LOOKUP_NO_XDEV))
+				return ERR_PTR(-EXDEV);
+			if (path_equal(&nd->path, &nd->root))
+				goto in_root;
+			if (nd->path.dentry != nd->path.mnt->mnt_root)
+				break;
 		}
-		if (!follow_up(&nd->path))
-			break;
-		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-			return ERR_PTR(-EXDEV);
 	}
+	/* rare case of legitimate dget_parent()... */
+	parent = dget_parent(nd->path.dentry);
+	if (unlikely(!path_connected(nd->path.mnt, parent))) {
+		dput(parent);
+		return ERR_PTR(-ENOENT);
+	}
+	*seqp = 0;
+	*inodep = parent->d_inode;
+	return parent;
+
+in_root:
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-EXDEV);
 	dget(nd->path.dentry);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 51/55] follow_dotdot_rcu(): be lazy about changing nd->path
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (48 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 50/55] follow_dotdot{,_rcu}(): massage loops Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 52/55] follow_dotdot(): " Al Viro
                       ` (3 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Change nd->path only after the loop is done and only in case we hadn't
ended up finding ourselves in root.  Same for NO_XDEV check.  Don't
recheck mount_lock on each step either.

That separates the "check how far back do we need to go through the
mount stack" logics from the rest of .. traversal.

Note that the sequence for d_seq/d_inode here is
	* sample mount_lock seqcount
...
	* sample d_seq
	* fetch d_inode
	* verify mount_lock seqcount
The last step makes sure that d_inode value we'd got matches d_seq -
it dentry is guaranteed to have been a mountpoint through the
entire thing, so its d_inode must have been stable.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 35 ++++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index aa03a4b2ebe9..fe49db1df8ae 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1698,28 +1698,31 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 	if (path_equal(&nd->path, &nd->root))
 		goto in_root;
 	if (unlikely(nd->path.dentry == nd->path.mnt->mnt_root)) {
+		struct path path = nd->path;
+		unsigned seq;
+
 		while (1) {
-			struct mount *mnt = real_mount(nd->path.mnt);
+			struct mount *mnt = real_mount(path.mnt);
 			struct mount *mparent = mnt->mnt_parent;
 			struct dentry *mountpoint = mnt->mnt_mountpoint;
-			struct inode *inode = mountpoint->d_inode;
-			unsigned seq = read_seqcount_begin(&mountpoint->d_seq);
-			if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
-				return ERR_PTR(-ECHILD);
-			if (&mparent->mnt == nd->path.mnt)
+			seq = read_seqcount_begin(&mountpoint->d_seq);
+			if (&mparent->mnt == path.mnt)
 				goto in_root;
-			if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-				return ERR_PTR(-ECHILD);
-			/* we know that mountpoint was pinned */
-			nd->path.dentry = mountpoint;
-			nd->path.mnt = &mparent->mnt;
-			nd->inode = inode;
-			nd->seq = seq;
-			if (path_equal(&nd->path, &nd->root))
+			path.dentry = mountpoint;
+			path.mnt = &mparent->mnt;
+			if (path_equal(&path, &nd->root))
 				goto in_root;
-			if (nd->path.dentry != nd->path.mnt->mnt_root)
+			if (path.dentry != path.mnt->mnt_root)
 				break;
 		}
+		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
+			return ERR_PTR(-ECHILD);
+		nd->path = path;
+		nd->inode = path.dentry->d_inode;
+		nd->seq = seq;
+		if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
+			return ERR_PTR(-ECHILD);
+		/* we know that mountpoint was pinned */
 	}
 	old = nd->path.dentry;
 	parent = old->d_parent;
@@ -1731,6 +1734,8 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 		return ERR_PTR(-ECHILD);
 	return parent;
 in_root:
+	if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
+		return ERR_PTR(-ECHILD);
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-ECHILD);
 	return NULL;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 52/55] follow_dotdot(): be lazy about changing nd->path
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (49 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 51/55] follow_dotdot_rcu(): be lazy about changing nd->path Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 53/55] helper for mount rootwards traversal Al Viro
                       ` (2 subsequent siblings)
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Change nd->path only after the loop is done and only in case we hadn't
ended up finding ourselves in root.  Same for NO_XDEV check.

That separates the "check how far back do we need to go through the
mount stack" logics from the rest of .. traversal.

NOTE: path_get/path_put introduced here are temporary.  They will
go away later in the series.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index fe49db1df8ae..ab17f7ea8c9f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1750,16 +1750,24 @@ static struct dentry *follow_dotdot(struct nameidata *nd,
 	if (path_equal(&nd->path, &nd->root))
 		goto in_root;
 	if (unlikely(nd->path.dentry == nd->path.mnt->mnt_root)) {
+		struct path path = nd->path;
+		path_get(&path);
 		while (1) {
-			if (!follow_up(&nd->path))
+			if (!follow_up(&path)) {
+				path_put(&path);
 				goto in_root;
-			if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-				return ERR_PTR(-EXDEV);
-			if (path_equal(&nd->path, &nd->root))
+			}
+			if (path_equal(&path, &nd->root)) {
+				path_put(&path);
 				goto in_root;
-			if (nd->path.dentry != nd->path.mnt->mnt_root)
+			}
+			if (path.dentry != nd->path.mnt->mnt_root)
 				break;
 		}
+		path_put(&nd->path);
+		nd->path = path;
+		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
+			return ERR_PTR(-EXDEV);
 	}
 	/* rare case of legitimate dget_parent()... */
 	parent = dget_parent(nd->path.dentry);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 53/55] helper for mount rootwards traversal
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (50 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 52/55] follow_dotdot(): " Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 54/55] non-RCU analogue of the previous commit Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 55/55] fs/namei.c: kill follow_mount() Al Viro
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

The loops in follow_dotdot{_rcu()} are doing the same thing:
we have a mount and we want to find out how far up the chain
of mounts do we need to go.

We follow the chain of mount until we find one that is not
directly overmounting the root of another mount.  If such
a mount is found, we want the location it's mounted upon.
If we run out of chain (i.e. get to a mount that is not
mounted on anything else) or run into process' root, we
report failure.

On success, we want (in RCU case) d_seq of resulting location
sampled or (in non-RCU case) references to that location
acquired.

This commit introduces such primitive for RCU case and
switches follow_dotdot_rcu() to it; non-RCU case will be
go in the next commit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 40 ++++++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index ab17f7ea8c9f..8c09523c6f56 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1363,6 +1363,26 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 	return ret;
 }
 
+static bool choose_mountpoint_rcu(struct mount *m, const struct path *root,
+				  struct path *path, unsigned *seqp)
+{
+	while (mnt_has_parent(m)) {
+		struct dentry *mountpoint = m->mnt_mountpoint;
+
+		m = m->mnt_parent;
+		if (unlikely(root->dentry == mountpoint &&
+			     root->mnt == &m->mnt))
+			break;
+		if (mountpoint != m->mnt.mnt_root) {
+			path->mnt = &m->mnt;
+			path->dentry = mountpoint;
+			*seqp = read_seqcount_begin(&mountpoint->d_seq);
+			return true;
+		}
+	}
+	return false;
+}
+
 /*
  * Skip to top of mountpoint pile in refwalk mode for follow_dotdot()
  */
@@ -1698,23 +1718,11 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 	if (path_equal(&nd->path, &nd->root))
 		goto in_root;
 	if (unlikely(nd->path.dentry == nd->path.mnt->mnt_root)) {
-		struct path path = nd->path;
+		struct path path;
 		unsigned seq;
-
-		while (1) {
-			struct mount *mnt = real_mount(path.mnt);
-			struct mount *mparent = mnt->mnt_parent;
-			struct dentry *mountpoint = mnt->mnt_mountpoint;
-			seq = read_seqcount_begin(&mountpoint->d_seq);
-			if (&mparent->mnt == path.mnt)
-				goto in_root;
-			path.dentry = mountpoint;
-			path.mnt = &mparent->mnt;
-			if (path_equal(&path, &nd->root))
-				goto in_root;
-			if (path.dentry != path.mnt->mnt_root)
-				break;
-		}
+		if (!choose_mountpoint_rcu(real_mount(nd->path.mnt),
+					   &nd->root, &path, &seq))
+			goto in_root;
 		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
 			return ERR_PTR(-ECHILD);
 		nd->path = path;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 54/55] non-RCU analogue of the previous commit
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (51 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 53/55] helper for mount rootwards traversal Al Viro
@ 2020-03-01 21:52     ` Al Viro
  2020-03-01 21:52     ` [RFC][PATCH v3 55/55] fs/namei.c: kill follow_mount() Al Viro
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

new helper: choose_mountpoint().  Wrapper around choose_mountpoint_rcu(),
similar to lookup_mnt() vs. __lookup_mnt().  follow_dotdot() switched to
it.  Now we don't grab mount_lock exclusive anymore; note that the
primitive used non-RCU mount traversals in other direction (lookup_mnt())
doesn't bother with that either - it uses mount_lock seqcount instead.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 56 +++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 39 insertions(+), 17 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 8c09523c6f56..f247eca119f1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -605,10 +605,9 @@ static void terminate_walk(struct nameidata *nd)
 }
 
 /* path_put is needed afterwards regardless of success or failure */
-static bool legitimize_path(struct nameidata *nd,
-			    struct path *path, unsigned seq)
+static bool __legitimize_path(struct path *path, unsigned seq, unsigned mseq)
 {
-	int res = __legitimize_mnt(path->mnt, nd->m_seq);
+	int res = __legitimize_mnt(path->mnt, mseq);
 	if (unlikely(res)) {
 		if (res > 0)
 			path->mnt = NULL;
@@ -622,6 +621,12 @@ static bool legitimize_path(struct nameidata *nd,
 	return !read_seqcount_retry(&path->dentry->d_seq, seq);
 }
 
+static inline bool legitimize_path(struct nameidata *nd,
+			    struct path *path, unsigned seq)
+{
+	return __legitimize_path(path, nd->m_seq, seq);
+}
+
 static bool legitimize_links(struct nameidata *nd)
 {
 	int i;
@@ -1383,6 +1388,31 @@ static bool choose_mountpoint_rcu(struct mount *m, const struct path *root,
 	return false;
 }
 
+static bool choose_mountpoint(struct mount *m, const struct path *root,
+			      struct path *path)
+{
+	bool found;
+
+	rcu_read_lock();
+	while (1) {
+		unsigned seq, mseq = read_seqbegin(&mount_lock);
+
+		found = choose_mountpoint_rcu(m, root, path, &seq);
+		if (unlikely(!found)) {
+			if (!read_seqretry(&mount_lock, mseq))
+				break;
+		} else {
+			if (likely(__legitimize_path(path, seq, mseq)))
+				break;
+			rcu_read_unlock();
+			path_put(path);
+			rcu_read_lock();
+		}
+	}
+	rcu_read_unlock();
+	return found;
+}
+
 /*
  * Skip to top of mountpoint pile in refwalk mode for follow_dotdot()
  */
@@ -1758,22 +1788,14 @@ static struct dentry *follow_dotdot(struct nameidata *nd,
 	if (path_equal(&nd->path, &nd->root))
 		goto in_root;
 	if (unlikely(nd->path.dentry == nd->path.mnt->mnt_root)) {
-		struct path path = nd->path;
-		path_get(&path);
-		while (1) {
-			if (!follow_up(&path)) {
-				path_put(&path);
-				goto in_root;
-			}
-			if (path_equal(&path, &nd->root)) {
-				path_put(&path);
-				goto in_root;
-			}
-			if (path.dentry != nd->path.mnt->mnt_root)
-				break;
-		}
+		struct path path;
+
+		if (!choose_mountpoint(real_mount(nd->path.mnt),
+				       &nd->root, &path))
+			goto in_root;
 		path_put(&nd->path);
 		nd->path = path;
+		nd->inode = path.dentry->d_inode;
 		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
 			return ERR_PTR(-EXDEV);
 	}
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v3 55/55] fs/namei.c: kill follow_mount()
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                       ` (52 preceding siblings ...)
  2020-03-01 21:52     ` [RFC][PATCH v3 54/55] non-RCU analogue of the previous commit Al Viro
@ 2020-03-01 21:52     ` Al Viro
  53 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-01 21:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

the only remaining caller should be using follow_down() anyway

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 18 +-----------------
 1 file changed, 1 insertion(+), 17 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index f247eca119f1..d9fd5fd43ca1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1414,22 +1414,6 @@ static bool choose_mountpoint(struct mount *m, const struct path *root,
 }
 
 /*
- * Skip to top of mountpoint pile in refwalk mode for follow_dotdot()
- */
-static void follow_mount(struct path *path)
-{
-	while (d_mountpoint(path->dentry)) {
-		struct vfsmount *mounted = lookup_mnt(path);
-		if (!mounted)
-			break;
-		dput(path->dentry);
-		mntput(path->mnt);
-		path->mnt = mounted;
-		path->dentry = dget(mounted->mnt_root);
-	}
-}
-
-/*
  * This looks up the name in dcache and possibly revalidates the found dentry.
  * NULL is returned if the dentry does not exist in the cache.
  */
@@ -2668,7 +2652,7 @@ int path_pts(struct path *path)
 
 	path->dentry = child;
 	dput(parent);
-	follow_mount(path);
+	follow_down(path);
 	return 0;
 }
 #endif
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCHSET] sanitized pathwalk machinery (v3)
  2020-03-01 21:51 ` [RFC][PATCHSET] sanitized pathwalk machinery (v3) Al Viro
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
@ 2020-03-01 22:34   ` Linus Torvalds
  2020-03-02  0:39     ` Al Viro
  2020-03-13 23:53   ` [RFC][PATCHSET] sanitized pathwalk machinery (v4) Al Viro
  2 siblings, 1 reply; 198+ messages in thread
From: Linus Torvalds @ 2020-03-01 22:34 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linux Kernel Mailing List

On Sun, Mar 1, 2020 at 3:51 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
>         Extended since the last repost.  The branch is in #work.dotdot;
> #work.do_last is its beginning (about 2/3 of the total), slightly
> reworked since the last time.

I'm traveling, so only a quick read-through.

One request: can you add the total diffstat to the cover letter (along
with what you used as a base)? I did apply it to a branch just to look
at it more closely, so I can see the final diffstat that way:

 Documentation/filesystems/path-lookup.rst |    7 +-
 fs/autofs/dev-ioctl.c                     |    6 +-
 fs/internal.h                             |    1 -
 fs/namei.c                                | 1333 +++++++++------------
 fs/namespace.c                            |   96 +-
 fs/open.c                                 |    4 +-
 include/linux/namei.h                     |    4 +-
 7 files changed, 642 insertions(+), 809 deletions(-)

but it would have been nice to see in your explanation too.

Anyway, from a quick read-through, I don't see anything that raises my
hackles - you've fixed the goto label naming, and I didn't notice
anything else odd.

Maybe that was because I wasn't careful enough. But the final line
count certainly speaks for the series..

             Linus

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCHSET] sanitized pathwalk machinery (v3)
  2020-03-01 22:34   ` [RFC][PATCHSET] sanitized pathwalk machinery (v3) Linus Torvalds
@ 2020-03-02  0:39     ` Al Viro
  2020-03-03 23:48       ` Eric W. Biederman
  0 siblings, 1 reply; 198+ messages in thread
From: Al Viro @ 2020-03-02  0:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, Linux Kernel Mailing List

On Sun, Mar 01, 2020 at 04:34:06PM -0600, Linus Torvalds wrote:
> On Sun, Mar 1, 2020 at 3:51 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> >         Extended since the last repost.  The branch is in #work.dotdot;
> > #work.do_last is its beginning (about 2/3 of the total), slightly
> > reworked since the last time.
> 
> I'm traveling, so only a quick read-through.
> 
> One request: can you add the total diffstat to the cover letter (along
> with what you used as a base)?

Sure, no problem (and the base is still -rc1)

> I did apply it to a branch just to look
> at it more closely, so I can see the final diffstat that way:
> 
>  Documentation/filesystems/path-lookup.rst |    7 +-
>  fs/autofs/dev-ioctl.c                     |    6 +-
>  fs/internal.h                             |    1 -
>  fs/namei.c                                | 1333 +++++++++------------
>  fs/namespace.c                            |   96 +-
>  fs/open.c                                 |    4 +-
>  include/linux/namei.h                     |    4 +-
>  7 files changed, 642 insertions(+), 809 deletions(-)
> 
> but it would have been nice to see in your explanation too.
> 
> Anyway, from a quick read-through, I don't see anything that raises my
> hackles - you've fixed the goto label naming, and I didn't notice
> anything else odd.
> 
> Maybe that was because I wasn't careful enough. But the final line
> count certainly speaks for the series..

Heh...  Part of my metrics is actually "how large a sheet of paper does
one need to fit the call graph on" ;-)

I hope it gets serious beating, though - it touches pretty much every
codepath in pathname resolution.  Is there any way to sic the bots on
a branch, short of "push it into -next and wait for screams"?

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCHSET] sanitized pathwalk machinery (v3)
  2020-03-02  0:39     ` Al Viro
@ 2020-03-03 23:48       ` Eric W. Biederman
  2020-03-04  0:24         ` Al Viro
  0 siblings, 1 reply; 198+ messages in thread
From: Eric W. Biederman @ 2020-03-03 23:48 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-fsdevel, Linux Kernel Mailing List

Al Viro <viro@zeniv.linux.org.uk> writes:

> On Sun, Mar 01, 2020 at 04:34:06PM -0600, Linus Torvalds wrote:
>> On Sun, Mar 1, 2020 at 3:51 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>> >
>> >         Extended since the last repost.  The branch is in #work.dotdot;
>> > #work.do_last is its beginning (about 2/3 of the total), slightly
>> > reworked since the last time.
>> 
>> I'm traveling, so only a quick read-through.
>> 
>> One request: can you add the total diffstat to the cover letter (along
>> with what you used as a base)?
>
> Sure, no problem (and the base is still -rc1)
>
>> I did apply it to a branch just to look
>> at it more closely, so I can see the final diffstat that way:
>> 
>>  Documentation/filesystems/path-lookup.rst |    7 +-
>>  fs/autofs/dev-ioctl.c                     |    6 +-
>>  fs/internal.h                             |    1 -
>>  fs/namei.c                                | 1333 +++++++++------------
>>  fs/namespace.c                            |   96 +-
>>  fs/open.c                                 |    4 +-
>>  include/linux/namei.h                     |    4 +-
>>  7 files changed, 642 insertions(+), 809 deletions(-)
>> 
>> but it would have been nice to see in your explanation too.
>> 
>> Anyway, from a quick read-through, I don't see anything that raises my
>> hackles - you've fixed the goto label naming, and I didn't notice
>> anything else odd.
>> 
>> Maybe that was because I wasn't careful enough. But the final line
>> count certainly speaks for the series..
>
> Heh...  Part of my metrics is actually "how large a sheet of paper does
> one need to fit the call graph on" ;-)
>
> I hope it gets serious beating, though - it touches pretty much every
> codepath in pathname resolution.  Is there any way to sic the bots on
> a branch, short of "push it into -next and wait for screams"?

Last I looked pushing a branch to kernel.org was enough for the
kbuild bots.  Sending patches to LKML is also enough for those bots.

I don't know if that kind of bot is what you need testing your code.

Eric


^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCHSET] sanitized pathwalk machinery (v3)
  2020-03-03 23:48       ` Eric W. Biederman
@ 2020-03-04  0:24         ` Al Viro
  2020-03-04  5:23           ` Eric W. Biederman
  0 siblings, 1 reply; 198+ messages in thread
From: Al Viro @ 2020-03-04  0:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, linux-fsdevel, Linux Kernel Mailing List

On Tue, Mar 03, 2020 at 05:48:31PM -0600, Eric W. Biederman wrote:

> > I hope it gets serious beating, though - it touches pretty much every
> > codepath in pathname resolution.  Is there any way to sic the bots on
> > a branch, short of "push it into -next and wait for screams"?
> 
> Last I looked pushing a branch to kernel.org was enough for the
> kbuild bots.  Sending patches to LKML is also enough for those bots.
> 
> I don't know if that kind of bot is what you need testing your code.

Build bots are generally nice, but in this case... pretty much all of
the changes are in fs/namei.c, which is not all that sensitive to
config/architecture/whatnot.  Sure, something like "is audit enabled?"
may affect the build problems, but not much beyond that.

What was that Intel-run(?) bot that posts "such-and-such metrics has
42% regression on such-and-such commit" from time to time?
<checks>
Subject: [locking/qspinlock] 7b6da71157: unixbench.score 8.4% improvement
seems to be the latest of that sort,
From: kernel test robot <rong.a.chen@intel.com>

Not sure how much of pathwalk-heavy loads is covered by profiling
bots of that sort, unfortunately... ;-/

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCHSET] sanitized pathwalk machinery (v3)
  2020-03-04  0:24         ` Al Viro
@ 2020-03-04  5:23           ` Eric W. Biederman
  2020-03-04  6:55             ` Al Viro
  0 siblings, 1 reply; 198+ messages in thread
From: Eric W. Biederman @ 2020-03-04  5:23 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-fsdevel, Linux Kernel Mailing List

Al Viro <viro@zeniv.linux.org.uk> writes:

> On Tue, Mar 03, 2020 at 05:48:31PM -0600, Eric W. Biederman wrote:
>
>> > I hope it gets serious beating, though - it touches pretty much every
>> > codepath in pathname resolution.  Is there any way to sic the bots on
>> > a branch, short of "push it into -next and wait for screams"?
>> 
>> Last I looked pushing a branch to kernel.org was enough for the
>> kbuild bots.  Sending patches to LKML is also enough for those bots.
>> 
>> I don't know if that kind of bot is what you need testing your code.
>
> Build bots are generally nice, but in this case... pretty much all of
> the changes are in fs/namei.c, which is not all that sensitive to
> config/architecture/whatnot.  Sure, something like "is audit enabled?"
> may affect the build problems, but not much beyond that.
>
> What was that Intel-run(?) bot that posts "such-and-such metrics has
> 42% regression on such-and-such commit" from time to time?
> <checks>
> Subject: [locking/qspinlock] 7b6da71157: unixbench.score 8.4% improvement
> seems to be the latest of that sort,
> From: kernel test robot <rong.a.chen@intel.com>
>
> Not sure how much of pathwalk-heavy loads is covered by profiling
> bots of that sort, unfortunately... ;-/

Do the xfs-tests cover that sort of thing?
The emphasis is stress testing the filesystem not the VFS but there is a
lot of overlap between the two.

Eric


^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCHSET] sanitized pathwalk machinery (v3)
  2020-03-04  5:23           ` Eric W. Biederman
@ 2020-03-04  6:55             ` Al Viro
  2020-03-04 13:28               ` Matthew Wilcox
       [not found]               ` <20200304105946.4xseo3jokcnpptrj@yavin>
  0 siblings, 2 replies; 198+ messages in thread
From: Al Viro @ 2020-03-04  6:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, linux-fsdevel, Linux Kernel Mailing List

On Tue, Mar 03, 2020 at 11:23:39PM -0600, Eric W. Biederman wrote:
> Al Viro <viro@zeniv.linux.org.uk> writes:
> 
> > On Tue, Mar 03, 2020 at 05:48:31PM -0600, Eric W. Biederman wrote:
> >
> >> > I hope it gets serious beating, though - it touches pretty much every
> >> > codepath in pathname resolution.  Is there any way to sic the bots on
> >> > a branch, short of "push it into -next and wait for screams"?
> >> 
> >> Last I looked pushing a branch to kernel.org was enough for the
> >> kbuild bots.  Sending patches to LKML is also enough for those bots.
> >> 
> >> I don't know if that kind of bot is what you need testing your code.
> >
> > Build bots are generally nice, but in this case... pretty much all of
> > the changes are in fs/namei.c, which is not all that sensitive to
> > config/architecture/whatnot.  Sure, something like "is audit enabled?"
> > may affect the build problems, but not much beyond that.
> >
> > What was that Intel-run(?) bot that posts "such-and-such metrics has
> > 42% regression on such-and-such commit" from time to time?
> > <checks>
> > Subject: [locking/qspinlock] 7b6da71157: unixbench.score 8.4% improvement
> > seems to be the latest of that sort,
> > From: kernel test robot <rong.a.chen@intel.com>
> >
> > Not sure how much of pathwalk-heavy loads is covered by profiling
> > bots of that sort, unfortunately... ;-/
> 
> Do the xfs-tests cover that sort of thing?
> The emphasis is stress testing the filesystem not the VFS but there is a
> lot of overlap between the two.

I do run xfstests.  But "runs in KVM without visible slowdowns" != "won't
cause them on 48-core bare metal".  And this area (especially when it
comes to RCU mode) can be, er, interesting in that respect.

FWIW, I'm putting together some litmus tests for pathwalk semantics -
one of the things I'd like to discuss at LSF; quite a few codepaths
are simply not touched by anything in xfstests.

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCHSET] sanitized pathwalk machinery (v3)
  2020-03-04  6:55             ` Al Viro
@ 2020-03-04 13:28               ` Matthew Wilcox
  2020-03-04 16:20                 ` Al Viro
       [not found]               ` <20200304105946.4xseo3jokcnpptrj@yavin>
  1 sibling, 1 reply; 198+ messages in thread
From: Matthew Wilcox @ 2020-03-04 13:28 UTC (permalink / raw)
  To: Al Viro
  Cc: Eric W. Biederman, Linus Torvalds, linux-fsdevel,
	Linux Kernel Mailing List

On Wed, Mar 04, 2020 at 06:55:47AM +0000, Al Viro wrote:
> On Tue, Mar 03, 2020 at 11:23:39PM -0600, Eric W. Biederman wrote:
> > Do the xfs-tests cover that sort of thing?
> > The emphasis is stress testing the filesystem not the VFS but there is a
> > lot of overlap between the two.
> 
> I do run xfstests.  But "runs in KVM without visible slowdowns" != "won't
> cause them on 48-core bare metal".  And this area (especially when it
> comes to RCU mode) can be, er, interesting in that respect.
> 
> FWIW, I'm putting together some litmus tests for pathwalk semantics -
> one of the things I'd like to discuss at LSF; quite a few codepaths
> are simply not touched by anything in xfstests.

Might be more appropriate for LTP than xfstests?  will-it-scale might be
the right place for performance benchmarks.


^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCHSET] sanitized pathwalk machinery (v3)
  2020-03-04 13:28               ` Matthew Wilcox
@ 2020-03-04 16:20                 ` Al Viro
  2020-03-04 20:54                   ` Al Viro
  0 siblings, 1 reply; 198+ messages in thread
From: Al Viro @ 2020-03-04 16:20 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Eric W. Biederman, Linus Torvalds, linux-fsdevel,
	Linux Kernel Mailing List

On Wed, Mar 04, 2020 at 05:28:12AM -0800, Matthew Wilcox wrote:
> On Wed, Mar 04, 2020 at 06:55:47AM +0000, Al Viro wrote:
> > On Tue, Mar 03, 2020 at 11:23:39PM -0600, Eric W. Biederman wrote:
> > > Do the xfs-tests cover that sort of thing?
> > > The emphasis is stress testing the filesystem not the VFS but there is a
> > > lot of overlap between the two.
> > 
> > I do run xfstests.  But "runs in KVM without visible slowdowns" != "won't
> > cause them on 48-core bare metal".  And this area (especially when it
> > comes to RCU mode) can be, er, interesting in that respect.
> > 
> > FWIW, I'm putting together some litmus tests for pathwalk semantics -
> > one of the things I'd like to discuss at LSF; quite a few codepaths
> > are simply not touched by anything in xfstests.
> 
> Might be more appropriate for LTP than xfstests?  will-it-scale might be
> the right place for performance benchmarks.

Might be...  I do run LTP as well, but it's still a 4-way KVM on a 6-way
amd64 host (phenom II X6 1100T) - not well-suited for catching scalability
issues.

Litmus tests mentioned above are more about verifying the semantics;
I hadn't moved past the "bunch of home-grown scripts creating setups
that would exercise the codepaths in question + trivial pieces
in C, pretty much limited to syscall()" stage with that; moving those
to LTP framework is something I'll need to look into.  Might very well
make sense; for now I just want a way to get test coverage of that code
with minimal headache.

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCHSET] sanitized pathwalk machinery (v3)
  2020-03-04 16:20                 ` Al Viro
@ 2020-03-04 20:54                   ` Al Viro
  0 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-04 20:54 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Eric W. Biederman, Linus Torvalds, linux-fsdevel,
	Linux Kernel Mailing List

On Wed, Mar 04, 2020 at 04:20:51PM +0000, Al Viro wrote:
> On Wed, Mar 04, 2020 at 05:28:12AM -0800, Matthew Wilcox wrote:
> > On Wed, Mar 04, 2020 at 06:55:47AM +0000, Al Viro wrote:
> > > On Tue, Mar 03, 2020 at 11:23:39PM -0600, Eric W. Biederman wrote:
> > > > Do the xfs-tests cover that sort of thing?
> > > > The emphasis is stress testing the filesystem not the VFS but there is a
> > > > lot of overlap between the two.
> > > 
> > > I do run xfstests.  But "runs in KVM without visible slowdowns" != "won't
> > > cause them on 48-core bare metal".  And this area (especially when it
> > > comes to RCU mode) can be, er, interesting in that respect.
> > > 
> > > FWIW, I'm putting together some litmus tests for pathwalk semantics -
> > > one of the things I'd like to discuss at LSF; quite a few codepaths
> > > are simply not touched by anything in xfstests.
> > 
> > Might be more appropriate for LTP than xfstests?  will-it-scale might be
> > the right place for performance benchmarks.
> 
> Might be...  I do run LTP as well, but it's still a 4-way KVM on a 6-way
> amd64 host (phenom II X6 1100T) - not well-suited for catching scalability
> issues.
> 
> Litmus tests mentioned above are more about verifying the semantics;
> I hadn't moved past the "bunch of home-grown scripts creating setups
> that would exercise the codepaths in question + trivial pieces
> in C, pretty much limited to syscall()" stage with that; moving those
> to LTP framework is something I'll need to look into.  Might very well
> make sense; for now I just want a way to get test coverage of that code
> with minimal headache.

FWIW, I've just took mainline and added to follow_dotdot{,_rcu}() (.. handling,
RCU and non-RCU cases) gathering the number of full iterations through the
first loop, the way that loop had been left and the number of full iterations
through the second loop.  Anything other than 3 common cases (think /..,
/proc/.., /proc/scsi/..) get reported.  Built, booted and started xfstests
on it.  So far all it has caught was .. into overmounted root early in the
boot (0 full passes through the first loop, left it via the first break
when it ran into current->fs->root, 1 full pass through the second loop).
Nothing from xfstests run (yet); LTP will be the next one to go...

Note, BTW, that this unusual call in early boot has hit RCU case and would
almost certainly do so on all boxen.  So in all of that there had not been
a single sodding call that would've hit the traversal into overmount of
parent in non-RCU case.

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCHSET] sanitized pathwalk machinery (v3)
       [not found]               ` <20200304105946.4xseo3jokcnpptrj@yavin>
@ 2020-03-04 21:00                 ` Al Viro
  2020-03-05  1:11                   ` Matthew Wilcox
  0 siblings, 1 reply; 198+ messages in thread
From: Al Viro @ 2020-03-04 21:00 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Eric W. Biederman, Linus Torvalds, linux-fsdevel,
	Linux Kernel Mailing List

On Wed, Mar 04, 2020 at 09:59:46PM +1100, Aleksa Sarai wrote:

> > FWIW, I'm putting together some litmus tests for pathwalk semantics -
> > one of the things I'd like to discuss at LSF; quite a few codepaths
> > are simply not touched by anything in xfstests.
> 
> I won't be at LSF unfortunately, but this is something I would be very
> interested in helping with -- one of the things I've noticed is the lack
> of a test-suite for some of the more generic VFS bits (such as namei).

BTW, has anyone tried to run tests with oprofile and see how much of the
core kernel gets exercised?  That looks like an obvious thing to try -
at least the places outside of spin_lock_irq() ought to get lit after
a while...

Have any CI folks tried doing that, or am I missing some obvious reason
why that is not feasible?

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCHSET] sanitized pathwalk machinery (v3)
  2020-03-04 21:00                 ` Al Viro
@ 2020-03-05  1:11                   ` Matthew Wilcox
  0 siblings, 0 replies; 198+ messages in thread
From: Matthew Wilcox @ 2020-03-05  1:11 UTC (permalink / raw)
  To: Al Viro
  Cc: Aleksa Sarai, Eric W. Biederman, Linus Torvalds, linux-fsdevel,
	Linux Kernel Mailing List

On Wed, Mar 04, 2020 at 09:00:31PM +0000, Al Viro wrote:
> On Wed, Mar 04, 2020 at 09:59:46PM +1100, Aleksa Sarai wrote:
> 
> > > FWIW, I'm putting together some litmus tests for pathwalk semantics -
> > > one of the things I'd like to discuss at LSF; quite a few codepaths
> > > are simply not touched by anything in xfstests.
> > 
> > I won't be at LSF unfortunately, but this is something I would be very
> > interested in helping with -- one of the things I've noticed is the lack
> > of a test-suite for some of the more generic VFS bits (such as namei).
> 
> BTW, has anyone tried to run tests with oprofile and see how much of the
> core kernel gets exercised?  That looks like an obvious thing to try -
> at least the places outside of spin_lock_irq() ought to get lit after
> a while...
> 
> Have any CI folks tried doing that, or am I missing some obvious reason
> why that is not feasible?

I don't know about oprofile, but LTP got their gcov patches merged
into 2.6.31:

http://ltp.sourceforge.net/coverage/gcov.php

^ permalink raw reply	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers
  2020-03-13 23:53   ` [RFC][PATCHSET] sanitized pathwalk machinery (v4) Al Viro
@ 2020-03-13 23:52     ` Al Viro
  2020-03-13 23:52       ` [RFC][PATCH v4 02/69] fix automount/automount race properly Al Viro
                         ` (71 more replies)
  2020-03-14  0:50     ` [RFC][PATCHSET] sanitized pathwalk machinery (v4) Linus Torvalds
  1 sibling, 72 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

preparation to finish_automount() fix (next commit)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 47 ++++++++++++++++++++++++-----------------------
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 85b5f7bea82e..dcd015fafe01 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2697,45 +2697,32 @@ static int do_move_mount_old(struct path *path, const char *old_name)
 /*
  * add a mount into a namespace's mount tree
  */
-static int do_add_mount(struct mount *newmnt, struct path *path, int mnt_flags)
+static int do_add_mount(struct mount *newmnt, struct mountpoint *mp,
+			struct path *path, int mnt_flags)
 {
-	struct mountpoint *mp;
-	struct mount *parent;
-	int err;
+	struct mount *parent = real_mount(path->mnt);
 
 	mnt_flags &= ~MNT_INTERNAL_FLAGS;
 
-	mp = lock_mount(path);
-	if (IS_ERR(mp))
-		return PTR_ERR(mp);
-
-	parent = real_mount(path->mnt);
-	err = -EINVAL;
 	if (unlikely(!check_mnt(parent))) {
 		/* that's acceptable only for automounts done in private ns */
 		if (!(mnt_flags & MNT_SHRINKABLE))
-			goto unlock;
+			return -EINVAL;
 		/* ... and for those we'd better have mountpoint still alive */
 		if (!parent->mnt_ns)
-			goto unlock;
+			return -EINVAL;
 	}
 
 	/* Refuse the same filesystem on the same mount point */
-	err = -EBUSY;
 	if (path->mnt->mnt_sb == newmnt->mnt.mnt_sb &&
 	    path->mnt->mnt_root == path->dentry)
-		goto unlock;
+		return -EBUSY;
 
-	err = -EINVAL;
 	if (d_is_symlink(newmnt->mnt.mnt_root))
-		goto unlock;
+		return -EINVAL;
 
 	newmnt->mnt.mnt_flags = mnt_flags;
-	err = graft_tree(newmnt, parent, mp);
-
-unlock:
-	unlock_mount(mp);
-	return err;
+	return graft_tree(newmnt, parent, mp);
 }
 
 static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags);
@@ -2748,6 +2735,7 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 			   unsigned int mnt_flags)
 {
 	struct vfsmount *mnt;
+	struct mountpoint *mp;
 	struct super_block *sb = fc->root->d_sb;
 	int error;
 
@@ -2768,7 +2756,13 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 
 	mnt_warn_timestamp_expiry(mountpoint, mnt);
 
-	error = do_add_mount(real_mount(mnt), mountpoint, mnt_flags);
+	mp = lock_mount(mountpoint);
+	if (IS_ERR(mp)) {
+		mntput(mnt);
+		return PTR_ERR(mp);
+	}
+	error = do_add_mount(real_mount(mnt), mp, mountpoint, mnt_flags);
+	unlock_mount(mp);
 	if (error < 0)
 		mntput(mnt);
 	return error;
@@ -2830,6 +2824,7 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
 int finish_automount(struct vfsmount *m, struct path *path)
 {
 	struct mount *mnt = real_mount(m);
+	struct mountpoint *mp;
 	int err;
 	/* The new mount record should have at least 2 refs to prevent it being
 	 * expired before we get a chance to add it
@@ -2842,7 +2837,13 @@ int finish_automount(struct vfsmount *m, struct path *path)
 		goto fail;
 	}
 
-	err = do_add_mount(mnt, path, path->mnt->mnt_flags | MNT_SHRINKABLE);
+	mp = lock_mount(path);
+	if (IS_ERR(mp)) {
+		err = PTR_ERR(mp);
+		goto fail;
+	}
+	err = do_add_mount(mnt, mp, path, path->mnt->mnt_flags | MNT_SHRINKABLE);
+	unlock_mount(mp);
 	if (!err)
 		return 0;
 fail:
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 02/69] fix automount/automount race properly
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
@ 2020-03-13 23:52       ` Al Viro
  2020-03-13 23:52       ` [RFC][PATCH v4 03/69] follow_automount(): get rid of dead^Wstillborn code Al Viro
                         ` (70 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Protection against automount/automount races (two threads hitting the same
referral point at the same time) is based upon do_add_mount() prevention of
identical overmounts - trying to overmount the root of mounted tree with
the same tree fails with -EBUSY.  It's unreliable (the other thread might've
mounted something on top of the automount it has triggered) *and* causes
no end of headache for follow_automount() and its caller, since
finish_automount() behaves like do_new_mount() - if the mountpoint to be is
overmounted, it mounts on top what's overmounting it.  It's not only wrong
(we want to go into what's overmounting the automount point and quietly
discard what we planned to mount there), it introduces the possibility of
original parent mount getting dropped.  That's what 8aef18845266 (VFS: Fix
vfsmount overput on simultaneous automount) deals with, but it can't do
anything about the reliability of conflict detection - if something had
been overmounted the other thread's automount (e.g. that other thread
having stepped into automount in mount(2)), we don't get that -EBUSY and
the result is
	 referral point under automounted NFS under explicit overmount
under another copy of automounted NFS

What we need is finish_automount() *NOT* digging into overmounts - if it
finds one, it should just quietly discard the thing it was asked to mount.
And don't bother with actually crossing into the results of finish_automount() -
the same loop that calls follow_automount() will do that just fine on the
next iteration.

IOW, instead of calling lock_mount() have finish_automount() do it manually,
_without_ the "move into overmount and retry" part.  And leave crossing into
the results to the caller of follow_automount(), which simplifies it a lot.

Moral: if you end up with a lot of glue working around the calling conventions
of something, perhaps these calling conventions are simply wrong...

Fixes: 8aef18845266 (VFS: Fix vfsmount overput on simultaneous automount)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c     | 29 ++++-------------------------
 fs/namespace.c | 40 +++++++++++++++++++++++++++++++++-------
 2 files changed, 37 insertions(+), 32 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index db6565c99825..626eddb33508 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1208,11 +1208,9 @@ EXPORT_SYMBOL(follow_up);
  * - return -EISDIR to tell follow_managed() to stop and return the path we
  *   were called with.
  */
-static int follow_automount(struct path *path, struct nameidata *nd,
-			    bool *need_mntput)
+static int follow_automount(struct path *path, struct nameidata *nd)
 {
 	struct vfsmount *mnt;
-	int err;
 
 	if (!path->dentry->d_op || !path->dentry->d_op->d_automount)
 		return -EREMOTE;
@@ -1253,29 +1251,10 @@ static int follow_automount(struct path *path, struct nameidata *nd,
 		return PTR_ERR(mnt);
 	}
 
-	if (!mnt) /* mount collision */
-		return 0;
-
-	if (!*need_mntput) {
-		/* lock_mount() may release path->mnt on error */
-		mntget(path->mnt);
-		*need_mntput = true;
-	}
-	err = finish_automount(mnt, path);
-
-	switch (err) {
-	case -EBUSY:
-		/* Someone else made a mount here whilst we were busy */
+	if (!mnt)
 		return 0;
-	case 0:
-		path_put(path);
-		path->mnt = mnt;
-		path->dentry = dget(mnt->mnt_root);
-		return 0;
-	default:
-		return err;
-	}
 
+	return finish_automount(mnt, path);
 }
 
 /*
@@ -1333,7 +1312,7 @@ static int follow_managed(struct path *path, struct nameidata *nd)
 
 		/* Handle an automount point */
 		if (flags & DCACHE_NEED_AUTOMOUNT) {
-			ret = follow_automount(path, nd, &need_mntput);
+			ret = follow_automount(path, nd);
 			if (ret < 0)
 				break;
 			continue;
diff --git a/fs/namespace.c b/fs/namespace.c
index dcd015fafe01..777c3116e62e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2823,6 +2823,7 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
 
 int finish_automount(struct vfsmount *m, struct path *path)
 {
+	struct dentry *dentry = path->dentry;
 	struct mount *mnt = real_mount(m);
 	struct mountpoint *mp;
 	int err;
@@ -2832,21 +2833,46 @@ int finish_automount(struct vfsmount *m, struct path *path)
 	BUG_ON(mnt_get_count(mnt) < 2);
 
 	if (m->mnt_sb == path->mnt->mnt_sb &&
-	    m->mnt_root == path->dentry) {
+	    m->mnt_root == dentry) {
 		err = -ELOOP;
-		goto fail;
+		goto discard;
 	}
 
-	mp = lock_mount(path);
+	/*
+	 * we don't want to use lock_mount() - in this case finding something
+	 * that overmounts our mountpoint to be means "quitely drop what we've
+	 * got", not "try to mount it on top".
+	 */
+	inode_lock(dentry->d_inode);
+	namespace_lock();
+	if (unlikely(cant_mount(dentry))) {
+		err = -ENOENT;
+		goto discard_locked;
+	}
+	rcu_read_lock();
+	if (unlikely(__lookup_mnt(path->mnt, dentry))) {
+		rcu_read_unlock();
+		err = 0;
+		goto discard_locked;
+	}
+	rcu_read_unlock();
+	mp = get_mountpoint(dentry);
 	if (IS_ERR(mp)) {
 		err = PTR_ERR(mp);
-		goto fail;
+		goto discard_locked;
 	}
+
 	err = do_add_mount(mnt, mp, path, path->mnt->mnt_flags | MNT_SHRINKABLE);
 	unlock_mount(mp);
-	if (!err)
-		return 0;
-fail:
+	if (unlikely(err))
+		goto discard;
+	mntput(m);
+	return 0;
+
+discard_locked:
+	namespace_unlock();
+	inode_unlock(dentry->d_inode);
+discard:
 	/* remove m from any expiration list it may be on */
 	if (!list_empty(&mnt->mnt_expire)) {
 		namespace_lock();
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 03/69] follow_automount(): get rid of dead^Wstillborn code
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
  2020-03-13 23:52       ` [RFC][PATCH v4 02/69] fix automount/automount race properly Al Viro
@ 2020-03-13 23:52       ` Al Viro
  2020-03-13 23:52       ` [RFC][PATCH v4 04/69] follow_automount() doesn't need the entire nameidata Al Viro
                         ` (69 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

1) no instances of ->d_automount() have ever made use of the "return
ERR_PTR(-EISDIR) if you don't feel like mounting anything" - that's
a rudiment of plans that got superseded before the thing went into
the tree.  Despite the comment in follow_automount(), autofs has
never done that.

2) if there's no ->d_automount() in dentry_operations, filesystems
should not set DCACHE_NEED_AUTOMOUNT in the first place.  None have
ever done so...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c     | 28 +++-------------------------
 fs/namespace.c |  9 ++++++++-
 2 files changed, 11 insertions(+), 26 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 626eddb33508..39dd56f5171f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1210,10 +1210,7 @@ EXPORT_SYMBOL(follow_up);
  */
 static int follow_automount(struct path *path, struct nameidata *nd)
 {
-	struct vfsmount *mnt;
-
-	if (!path->dentry->d_op || !path->dentry->d_op->d_automount)
-		return -EREMOTE;
+	struct dentry *dentry = path->dentry;
 
 	/* We don't want to mount if someone's just doing a stat -
 	 * unless they're stat'ing a directory and appended a '/' to
@@ -1228,33 +1225,14 @@ static int follow_automount(struct path *path, struct nameidata *nd)
 	 */
 	if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
 			   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
-	    path->dentry->d_inode)
+	    dentry->d_inode)
 		return -EISDIR;
 
 	nd->total_link_count++;
 	if (nd->total_link_count >= 40)
 		return -ELOOP;
 
-	mnt = path->dentry->d_op->d_automount(path);
-	if (IS_ERR(mnt)) {
-		/*
-		 * The filesystem is allowed to return -EISDIR here to indicate
-		 * it doesn't want to automount.  For instance, autofs would do
-		 * this so that its userspace daemon can mount on this dentry.
-		 *
-		 * However, we can only permit this if it's a terminal point in
-		 * the path being looked up; if it wasn't then the remainder of
-		 * the path is inaccessible and we should say so.
-		 */
-		if (PTR_ERR(mnt) == -EISDIR && (nd->flags & LOOKUP_PARENT))
-			return -EREMOTE;
-		return PTR_ERR(mnt);
-	}
-
-	if (!mnt)
-		return 0;
-
-	return finish_automount(mnt, path);
+	return finish_automount(dentry->d_op->d_automount(path), path);
 }
 
 /*
diff --git a/fs/namespace.c b/fs/namespace.c
index 777c3116e62e..743980380a8f 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2824,9 +2824,16 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
 int finish_automount(struct vfsmount *m, struct path *path)
 {
 	struct dentry *dentry = path->dentry;
-	struct mount *mnt = real_mount(m);
 	struct mountpoint *mp;
+	struct mount *mnt;
 	int err;
+
+	if (!m)
+		return 0;
+	if (IS_ERR(m))
+		return PTR_ERR(m);
+
+	mnt = real_mount(m);
 	/* The new mount record should have at least 2 refs to prevent it being
 	 * expired before we get a chance to add it
 	 */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 04/69] follow_automount() doesn't need the entire nameidata
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
  2020-03-13 23:52       ` [RFC][PATCH v4 02/69] fix automount/automount race properly Al Viro
  2020-03-13 23:52       ` [RFC][PATCH v4 03/69] follow_automount(): get rid of dead^Wstillborn code Al Viro
@ 2020-03-13 23:52       ` Al Viro
  2020-03-13 23:52       ` [RFC][PATCH v4 05/69] make build_open_flags() treat O_CREAT | O_EXCL as implying O_NOFOLLOW Al Viro
                         ` (68 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Only the address of ->total_link_count and the flags.
And fix an off-by-one is ELOOP detection - make it
consistent with symlink following, where we check if
the pre-increment value has reached 40, rather than
check the post-increment one.

[kudos to Christian Brauner for spotted braino]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 39dd56f5171f..6721c5f7e9d5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1208,7 +1208,7 @@ EXPORT_SYMBOL(follow_up);
  * - return -EISDIR to tell follow_managed() to stop and return the path we
  *   were called with.
  */
-static int follow_automount(struct path *path, struct nameidata *nd)
+static int follow_automount(struct path *path, int *count, unsigned lookup_flags)
 {
 	struct dentry *dentry = path->dentry;
 
@@ -1223,13 +1223,12 @@ static int follow_automount(struct path *path, struct nameidata *nd)
 	 * as being automount points.  These will need the attentions
 	 * of the daemon to instantiate them before they can be used.
 	 */
-	if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
+	if (!(lookup_flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
 			   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
 	    dentry->d_inode)
 		return -EISDIR;
 
-	nd->total_link_count++;
-	if (nd->total_link_count >= 40)
+	if (count && (*count)++ >= MAXSYMLINKS)
 		return -ELOOP;
 
 	return finish_automount(dentry->d_op->d_automount(path), path);
@@ -1290,7 +1289,8 @@ static int follow_managed(struct path *path, struct nameidata *nd)
 
 		/* Handle an automount point */
 		if (flags & DCACHE_NEED_AUTOMOUNT) {
-			ret = follow_automount(path, nd);
+			ret = follow_automount(path, &nd->total_link_count,
+						nd->flags);
 			if (ret < 0)
 				break;
 			continue;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 05/69] make build_open_flags() treat O_CREAT | O_EXCL as implying O_NOFOLLOW
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (2 preceding siblings ...)
  2020-03-13 23:52       ` [RFC][PATCH v4 04/69] follow_automount() doesn't need the entire nameidata Al Viro
@ 2020-03-13 23:52       ` Al Viro
  2020-03-13 23:52       ` [RFC][PATCH v4 06/69] handle_mounts(): start building a sane wrapper for follow_managed() Al Viro
                         ` (67 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

O_CREAT | O_EXCL means "-EEXIST if we run into a trailing symlink".
As it is, we might or might not have LOOKUP_FOLLOW in op->intent
in that case - that depends upon having O_NOFOLLOW in open flags.
It doesn't matter, since we won't be checking it in that case -
do_last() bails out earlier.

However, making sure it's not set (i.e. acting as if we had an explicit
O_NOFOLLOW) makes the behaviour more explicit and allows to reorder the
check for O_CREAT | O_EXCL in do_last() with the call of step_into()
immediately following it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 15 +++++----------
 fs/open.c  |  4 +++-
 2 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 6721c5f7e9d5..6938d20aa73a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3396,22 +3396,17 @@ static int do_last(struct nameidata *nd,
 	if (unlikely(error < 0))
 		return error;
 
-	/*
-	 * create/update audit record if it already exists.
-	 */
-	audit_inode(nd->name, path.dentry, 0);
-
-	if (unlikely((open_flag & (O_EXCL | O_CREAT)) == (O_EXCL | O_CREAT))) {
-		path_to_nameidata(&path, nd);
-		return -EEXIST;
-	}
-
 	seq = 0;	/* out of RCU mode, so the value doesn't matter */
 	inode = d_backing_inode(path.dentry);
 finish_lookup:
 	error = step_into(nd, &path, 0, inode, seq);
 	if (unlikely(error))
 		return error;
+
+	if (unlikely((open_flag & (O_EXCL | O_CREAT)) == (O_EXCL | O_CREAT))) {
+		audit_inode(nd->name, nd->path.dentry, 0);
+		return -EEXIST;
+	}
 finish_open:
 	/* Why this, you ask?  _Now_ we might have grown LOOKUP_JUMPED... */
 	error = complete_walk(nd);
diff --git a/fs/open.c b/fs/open.c
index 0788b3715731..e5227cd533f4 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1049,8 +1049,10 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op)
 
 	if (flags & O_CREAT) {
 		op->intent |= LOOKUP_CREATE;
-		if (flags & O_EXCL)
+		if (flags & O_EXCL) {
 			op->intent |= LOOKUP_EXCL;
+			flags |= O_NOFOLLOW;
+		}
 	}
 
 	if (flags & O_DIRECTORY)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 06/69] handle_mounts(): start building a sane wrapper for follow_managed()
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (3 preceding siblings ...)
  2020-03-13 23:52       ` [RFC][PATCH v4 05/69] make build_open_flags() treat O_CREAT | O_EXCL as implying O_NOFOLLOW Al Viro
@ 2020-03-13 23:52       ` Al Viro
  2020-03-13 23:52       ` [RFC][PATCH v4 07/69] atomic_open(): saner calling conventions (return dentry on success) Al Viro
                         ` (66 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

All callers of follow_managed() follow it on success with the same steps -
d_backing_inode(path->dentry) is calculated and stored into some struct inode *
variable and, in all but one case, an unsigned variable (nd->seq to be) is
zeroed.  The single exception is lookup_fast() and there zeroing is correct
thing to do - not doing it is a pointless microoptimization.

	Add a wrapper for follow_managed() that would do that combination.
It's mostly a vehicle for code massage - it will be changing quite a bit,
and the current calling conventions are by no means final.  Right now it
takes path, nameidata and (as out params) inode and seq, similar to
__follow_mount_rcu().  Which will soon get folded into it...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 6938d20aa73a..c104ec75faef 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1385,6 +1385,18 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 		!(path->dentry->d_flags & DCACHE_NEED_AUTOMOUNT);
 }
 
+static inline int handle_mounts(struct path *path, struct nameidata *nd,
+			  struct inode **inode, unsigned int *seqp)
+{
+	int ret = follow_managed(path, nd);
+
+	if (likely(ret >= 0)) {
+		*inode = d_backing_inode(path->dentry);
+		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
+	}
+	return ret;
+}
+
 static int follow_dotdot_rcu(struct nameidata *nd)
 {
 	struct inode *inode = nd->inode;
@@ -1607,7 +1619,6 @@ static int lookup_fast(struct nameidata *nd,
 	struct vfsmount *mnt = nd->path.mnt;
 	struct dentry *dentry, *parent = nd->path.dentry;
 	int status = 1;
-	int err;
 
 	/*
 	 * Rename seqlock is not required here because in the off chance
@@ -1677,10 +1688,7 @@ static int lookup_fast(struct nameidata *nd,
 
 	path->mnt = mnt;
 	path->dentry = dentry;
-	err = follow_managed(path, nd);
-	if (likely(err > 0))
-		*inode = d_backing_inode(path->dentry);
-	return err;
+	return handle_mounts(path, nd, inode, seqp);
 }
 
 /* Fast lookup failed, do it the slow way */
@@ -1875,12 +1883,9 @@ static int walk_component(struct nameidata *nd, int flags)
 			return PTR_ERR(path.dentry);
 
 		path.mnt = nd->path.mnt;
-		err = follow_managed(&path, nd);
+		err = handle_mounts(&path, nd, &inode, &seq);
 		if (unlikely(err < 0))
 			return err;
-
-		seq = 0;	/* we are already out of RCU mode */
-		inode = d_backing_inode(path.dentry);
 	}
 
 	return step_into(nd, &path, flags, inode, seq);
@@ -2365,11 +2370,9 @@ static int handle_lookup_down(struct nameidata *nd)
 			return -ECHILD;
 	} else {
 		dget(path.dentry);
-		err = follow_managed(&path, nd);
+		err = handle_mounts(&path, nd, &inode, &seq);
 		if (unlikely(err < 0))
 			return err;
-		inode = d_backing_inode(path.dentry);
-		seq = 0;
 	}
 	path_to_nameidata(&path, nd);
 	nd->inode = inode;
@@ -3392,12 +3395,9 @@ static int do_last(struct nameidata *nd,
 		got_write = false;
 	}
 
-	error = follow_managed(&path, nd);
+	error = handle_mounts(&path, nd, &inode, &seq);
 	if (unlikely(error < 0))
 		return error;
-
-	seq = 0;	/* out of RCU mode, so the value doesn't matter */
-	inode = d_backing_inode(path.dentry);
 finish_lookup:
 	error = step_into(nd, &path, 0, inode, seq);
 	if (unlikely(error))
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 07/69] atomic_open(): saner calling conventions (return dentry on success)
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (4 preceding siblings ...)
  2020-03-13 23:52       ` [RFC][PATCH v4 06/69] handle_mounts(): start building a sane wrapper for follow_managed() Al Viro
@ 2020-03-13 23:52       ` Al Viro
  2020-03-13 23:52       ` [RFC][PATCH v4 08/69] lookup_open(): " Al Viro
                         ` (65 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Currently it either returns -E... or puts (nd->path.mnt,dentry)
into *path and returns 0.  Make it return ERR_PTR(-E...) or
dentry; adjust the caller.  Fewer arguments and it's easier
to keep track of *path contents that way.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 41 ++++++++++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c104ec75faef..5f8b791a6d6e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3087,10 +3087,10 @@ static int may_o_create(const struct path *dir, struct dentry *dentry, umode_t m
  *
  * Returns an error code otherwise.
  */
-static int atomic_open(struct nameidata *nd, struct dentry *dentry,
-			struct path *path, struct file *file,
-			const struct open_flags *op,
-			int open_flag, umode_t mode)
+static struct dentry *atomic_open(struct nameidata *nd, struct dentry *dentry,
+				  struct file *file,
+				  const struct open_flags *op,
+				  int open_flag, umode_t mode)
 {
 	struct dentry *const DENTRY_NOT_SET = (void *) -1UL;
 	struct inode *dir =  nd->path.dentry->d_inode;
@@ -3131,17 +3131,15 @@ static int atomic_open(struct nameidata *nd, struct dentry *dentry,
 			}
 			if (file->f_mode & FMODE_CREATED)
 				fsnotify_create(dir, dentry);
-			if (unlikely(d_is_negative(dentry))) {
+			if (unlikely(d_is_negative(dentry)))
 				error = -ENOENT;
-			} else {
-				path->dentry = dentry;
-				path->mnt = nd->path.mnt;
-				return 0;
-			}
 		}
 	}
-	dput(dentry);
-	return error;
+	if (error) {
+		dput(dentry);
+		dentry = ERR_PTR(error);
+	}
+	return dentry;
 }
 
 /*
@@ -3236,11 +3234,20 @@ static int lookup_open(struct nameidata *nd, struct path *path,
 	}
 
 	if (dir_inode->i_op->atomic_open) {
-		error = atomic_open(nd, dentry, path, file, op, open_flag,
-				    mode);
-		if (unlikely(error == -ENOENT) && create_error)
-			error = create_error;
-		return error;
+		dentry = atomic_open(nd, dentry, file, op, open_flag, mode);
+		if (IS_ERR(dentry)) {
+			error = PTR_ERR(dentry);
+			if (unlikely(error == -ENOENT) && create_error)
+				error = create_error;
+			return error;
+		}
+		if (file->f_mode & FMODE_OPENED) {
+			dput(dentry);
+			return 0;
+		}
+		path->mnt = nd->path.mnt;
+		path->dentry = dentry;
+		return 0;
 	}
 
 no_open:
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 08/69] lookup_open(): saner calling conventions (return dentry on success)
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (5 preceding siblings ...)
  2020-03-13 23:52       ` [RFC][PATCH v4 07/69] atomic_open(): saner calling conventions (return dentry on success) Al Viro
@ 2020-03-13 23:52       ` Al Viro
  2020-03-13 23:52       ` [RFC][PATCH v4 09/69] do_last(): collapse the call of path_to_nameidata() Al Viro
                         ` (64 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

same story as for atomic_open() in the previous commit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 46 +++++++++++++++++++---------------------------
 1 file changed, 19 insertions(+), 27 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 5f8b791a6d6e..b350e1e2b46f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3157,10 +3157,9 @@ static struct dentry *atomic_open(struct nameidata *nd, struct dentry *dentry,
  *
  * An error code is returned on failure.
  */
-static int lookup_open(struct nameidata *nd, struct path *path,
-			struct file *file,
-			const struct open_flags *op,
-			bool got_write)
+static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
+				  const struct open_flags *op,
+				  bool got_write)
 {
 	struct dentry *dir = nd->path.dentry;
 	struct inode *dir_inode = dir->d_inode;
@@ -3171,7 +3170,7 @@ static int lookup_open(struct nameidata *nd, struct path *path,
 	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 
 	if (unlikely(IS_DEADDIR(dir_inode)))
-		return -ENOENT;
+		return ERR_PTR(-ENOENT);
 
 	file->f_mode &= ~FMODE_CREATED;
 	dentry = d_lookup(dir, &nd->last);
@@ -3179,7 +3178,7 @@ static int lookup_open(struct nameidata *nd, struct path *path,
 		if (!dentry) {
 			dentry = d_alloc_parallel(dir, &nd->last, &wq);
 			if (IS_ERR(dentry))
-				return PTR_ERR(dentry);
+				return dentry;
 		}
 		if (d_in_lookup(dentry))
 			break;
@@ -3195,7 +3194,7 @@ static int lookup_open(struct nameidata *nd, struct path *path,
 	}
 	if (dentry->d_inode) {
 		/* Cached positive dentry: will open in f_op->open */
-		goto out_no_open;
+		return dentry;
 	}
 
 	/*
@@ -3235,19 +3234,9 @@ static int lookup_open(struct nameidata *nd, struct path *path,
 
 	if (dir_inode->i_op->atomic_open) {
 		dentry = atomic_open(nd, dentry, file, op, open_flag, mode);
-		if (IS_ERR(dentry)) {
-			error = PTR_ERR(dentry);
-			if (unlikely(error == -ENOENT) && create_error)
-				error = create_error;
-			return error;
-		}
-		if (file->f_mode & FMODE_OPENED) {
-			dput(dentry);
-			return 0;
-		}
-		path->mnt = nd->path.mnt;
-		path->dentry = dentry;
-		return 0;
+		if (unlikely(create_error) && dentry == ERR_PTR(-ENOENT))
+			dentry = ERR_PTR(create_error);
+		return dentry;
 	}
 
 no_open:
@@ -3283,14 +3272,11 @@ static int lookup_open(struct nameidata *nd, struct path *path,
 		error = create_error;
 		goto out_dput;
 	}
-out_no_open:
-	path->dentry = dentry;
-	path->mnt = nd->path.mnt;
-	return 0;
+	return dentry;
 
 out_dput:
 	dput(dentry);
-	return error;
+	return ERR_PTR(error);
 }
 
 /*
@@ -3309,6 +3295,7 @@ static int do_last(struct nameidata *nd,
 	unsigned seq;
 	struct inode *inode;
 	struct path path;
+	struct dentry *dentry;
 	int error;
 
 	nd->flags &= ~LOOKUP_PARENT;
@@ -3365,14 +3352,18 @@ static int do_last(struct nameidata *nd,
 		inode_lock(dir->d_inode);
 	else
 		inode_lock_shared(dir->d_inode);
-	error = lookup_open(nd, &path, file, op, got_write);
+	dentry = lookup_open(nd, file, op, got_write);
 	if (open_flag & O_CREAT)
 		inode_unlock(dir->d_inode);
 	else
 		inode_unlock_shared(dir->d_inode);
 
-	if (error)
+	if (IS_ERR(dentry)) {
+		error = PTR_ERR(dentry);
 		goto out;
+	}
+	path.mnt = nd->path.mnt;
+	path.dentry = dentry;
 
 	if (file->f_mode & FMODE_OPENED) {
 		if ((file->f_mode & FMODE_CREATED) ||
@@ -3380,6 +3371,7 @@ static int do_last(struct nameidata *nd,
 			will_truncate = false;
 
 		audit_inode(nd->name, file->f_path.dentry, 0);
+		dput(dentry);
 		goto opened;
 	}
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 09/69] do_last(): collapse the call of path_to_nameidata()
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (6 preceding siblings ...)
  2020-03-13 23:52       ` [RFC][PATCH v4 08/69] lookup_open(): " Al Viro
@ 2020-03-13 23:52       ` Al Viro
  2020-03-13 23:52       ` [RFC][PATCH v4 10/69] handle_mounts(): pass dentry in, turn path into a pure out argument Al Viro
                         ` (63 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

... and shift filling struct path to just before the call of
handle_mounts().  All callers of handle_mounts() are
immediately preceded by path->mnt = nd->path.mnt now.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index b350e1e2b46f..79029a1048be 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3362,8 +3362,6 @@ static int do_last(struct nameidata *nd,
 		error = PTR_ERR(dentry);
 		goto out;
 	}
-	path.mnt = nd->path.mnt;
-	path.dentry = dentry;
 
 	if (file->f_mode & FMODE_OPENED) {
 		if ((file->f_mode & FMODE_CREATED) ||
@@ -3380,7 +3378,8 @@ static int do_last(struct nameidata *nd,
 		open_flag &= ~O_TRUNC;
 		will_truncate = false;
 		acc_mode = 0;
-		path_to_nameidata(&path, nd);
+		dput(nd->path.dentry);
+		nd->path.dentry = dentry;
 		goto finish_open_created;
 	}
 
@@ -3394,6 +3393,8 @@ static int do_last(struct nameidata *nd,
 		got_write = false;
 	}
 
+	path.mnt = nd->path.mnt;
+	path.dentry = dentry;
 	error = handle_mounts(&path, nd, &inode, &seq);
 	if (unlikely(error < 0))
 		return error;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 10/69] handle_mounts(): pass dentry in, turn path into a pure out argument
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (7 preceding siblings ...)
  2020-03-13 23:52       ` [RFC][PATCH v4 09/69] do_last(): collapse the call of path_to_nameidata() Al Viro
@ 2020-03-13 23:52       ` Al Viro
  2020-03-13 23:52       ` [RFC][PATCH v4 11/69] lookup_fast(): consolidate the RCU success case Al Viro
                         ` (62 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

All callers are equivalent to
	path->dentry = dentry;
	path->mnt = nd->path.mnt;
	err = handle_mounts(path, ...)
Pass dentry as an explicit argument, fill *path in handle_mounts()
itself.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 37 ++++++++++++++++++-------------------
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 79029a1048be..a613fa52a174 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1385,11 +1385,15 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 		!(path->dentry->d_flags & DCACHE_NEED_AUTOMOUNT);
 }
 
-static inline int handle_mounts(struct path *path, struct nameidata *nd,
-			  struct inode **inode, unsigned int *seqp)
+static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
+			  struct path *path, struct inode **inode,
+			  unsigned int *seqp)
 {
-	int ret = follow_managed(path, nd);
+	int ret;
 
+	path->mnt = nd->path.mnt;
+	path->dentry = dentry;
+	ret = follow_managed(path, nd);
 	if (likely(ret >= 0)) {
 		*inode = d_backing_inode(path->dentry);
 		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
@@ -1685,10 +1689,7 @@ static int lookup_fast(struct nameidata *nd,
 		dput(dentry);
 		return status;
 	}
-
-	path->mnt = mnt;
-	path->dentry = dentry;
-	return handle_mounts(path, nd, inode, seqp);
+	return handle_mounts(nd, dentry, path, inode, seqp);
 }
 
 /* Fast lookup failed, do it the slow way */
@@ -1859,6 +1860,7 @@ static inline int step_into(struct nameidata *nd, struct path *path,
 static int walk_component(struct nameidata *nd, int flags)
 {
 	struct path path;
+	struct dentry *dentry;
 	struct inode *inode;
 	unsigned seq;
 	int err;
@@ -1877,13 +1879,11 @@ static int walk_component(struct nameidata *nd, int flags)
 	if (unlikely(err <= 0)) {
 		if (err < 0)
 			return err;
-		path.dentry = lookup_slow(&nd->last, nd->path.dentry,
-					  nd->flags);
-		if (IS_ERR(path.dentry))
-			return PTR_ERR(path.dentry);
+		dentry = lookup_slow(&nd->last, nd->path.dentry, nd->flags);
+		if (IS_ERR(dentry))
+			return PTR_ERR(dentry);
 
-		path.mnt = nd->path.mnt;
-		err = handle_mounts(&path, nd, &inode, &seq);
+		err = handle_mounts(nd, dentry, &path, &inode, &seq);
 		if (unlikely(err < 0))
 			return err;
 	}
@@ -2355,7 +2355,7 @@ static inline int lookup_last(struct nameidata *nd)
 
 static int handle_lookup_down(struct nameidata *nd)
 {
-	struct path path = nd->path;
+	struct path path;
 	struct inode *inode = nd->inode;
 	unsigned seq = nd->seq;
 	int err;
@@ -2366,11 +2366,12 @@ static int handle_lookup_down(struct nameidata *nd)
 		 * at the very beginning of walk, so we lose nothing
 		 * if we simply redo everything in non-RCU mode
 		 */
+		path = nd->path;
 		if (unlikely(!__follow_mount_rcu(nd, &path, &inode, &seq)))
 			return -ECHILD;
 	} else {
-		dget(path.dentry);
-		err = handle_mounts(&path, nd, &inode, &seq);
+		dget(nd->path.dentry);
+		err = handle_mounts(nd, nd->path.dentry, &path, &inode, &seq);
 		if (unlikely(err < 0))
 			return err;
 	}
@@ -3393,9 +3394,7 @@ static int do_last(struct nameidata *nd,
 		got_write = false;
 	}
 
-	path.mnt = nd->path.mnt;
-	path.dentry = dentry;
-	error = handle_mounts(&path, nd, &inode, &seq);
+	error = handle_mounts(nd, dentry, &path, &inode, &seq);
 	if (unlikely(error < 0))
 		return error;
 finish_lookup:
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 11/69] lookup_fast(): consolidate the RCU success case
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (8 preceding siblings ...)
  2020-03-13 23:52       ` [RFC][PATCH v4 10/69] handle_mounts(): pass dentry in, turn path into a pure out argument Al Viro
@ 2020-03-13 23:52       ` Al Viro
  2020-03-14  0:25         ` Linus Torvalds
  2020-03-13 23:53       ` [RFC][PATCH v4 12/69] teach handle_mounts() to handle RCU mode Al Viro
                         ` (61 subsequent siblings)
  71 siblings, 1 reply; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

1) in case of __follow_mount_rcu() failure, lookup_fast() proceeds
to call unlazy_child() and, should it succeed, handle_mounts().
Note that we have status > 0 (or we wouldn't be calling
__follow_mount_rcu() at all), so all stuff conditional upon
non-positive status won't be even touched.

Consolidate just that sequence after the call of __follow_mount_rcu().

2) calling d_is_negative() and keeping its result is pointless -
we either don't get past checking ->d_seq (and don't use the results of
d_is_negative() at all), or we are guaranteed that ->d_inode and
type bits of ->d_flags had been consistent at the time of d_is_negative()
call.  IOW, we could only get to the use of its result if it's
equal to !inode.  The same ->d_seq check guarantees that after that point
this CPU won't observe ->d_flags values older than ->d_inode update.
So 'negative' variable is completely pointless these days.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index a613fa52a174..494f4959e790 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1631,7 +1631,6 @@ static int lookup_fast(struct nameidata *nd,
 	 */
 	if (nd->flags & LOOKUP_RCU) {
 		unsigned seq;
-		bool negative;
 		dentry = __d_lookup_rcu(parent, &nd->last, &seq);
 		if (unlikely(!dentry)) {
 			if (unlazy_walk(nd))
@@ -1644,7 +1643,6 @@ static int lookup_fast(struct nameidata *nd,
 		 * the dentry name information from lookup.
 		 */
 		*inode = d_backing_inode(dentry);
-		negative = d_is_negative(dentry);
 		if (unlikely(read_seqcount_retry(&dentry->d_seq, seq)))
 			return -ECHILD;
 
@@ -1665,12 +1663,15 @@ static int lookup_fast(struct nameidata *nd,
 			 * Note: do negative dentry check after revalidation in
 			 * case that drops it.
 			 */
-			if (unlikely(negative))
+			if (unlikely(!inode))
 				return -ENOENT;
 			path->mnt = mnt;
 			path->dentry = dentry;
 			if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
 				return 1;
+			if (unlazy_child(nd, dentry, seq))
+				return -ECHILD;
+			return handle_mounts(nd, dentry, path, inode, seqp);
 		}
 		if (unlazy_child(nd, dentry, seq))
 			return -ECHILD;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 12/69] teach handle_mounts() to handle RCU mode
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (9 preceding siblings ...)
  2020-03-13 23:52       ` [RFC][PATCH v4 11/69] lookup_fast(): consolidate the RCU success case Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-14  0:28         ` Linus Torvalds
  2020-03-13 23:53       ` [RFC][PATCH v4 13/69] lookup_fast(): take mount traversal into callers Al Viro
                         ` (60 subsequent siblings)
  71 siblings, 1 reply; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

... and make the callers of __follow_mount_rcu() use handle_mounts().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 46 +++++++++++++++++-----------------------------
 1 file changed, 17 insertions(+), 29 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 494f4959e790..37b2f82e6f2c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1393,6 +1393,18 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 
 	path->mnt = nd->path.mnt;
 	path->dentry = dentry;
+	if (nd->flags & LOOKUP_RCU) {
+		unsigned int seq = *seqp;
+		if (unlikely(!*inode))
+			return -ENOENT;
+		if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
+			return 1;
+		if (unlazy_child(nd, dentry, seq))
+			return -ECHILD;
+		// *path might've been clobbered by __follow_mount_rcu()
+		path->mnt = nd->path.mnt;
+		path->dentry = dentry;
+	}
 	ret = follow_managed(path, nd);
 	if (likely(ret >= 0)) {
 		*inode = d_backing_inode(path->dentry);
@@ -1620,7 +1632,6 @@ static int lookup_fast(struct nameidata *nd,
 		       struct path *path, struct inode **inode,
 		       unsigned *seqp)
 {
-	struct vfsmount *mnt = nd->path.mnt;
 	struct dentry *dentry, *parent = nd->path.dentry;
 	int status = 1;
 
@@ -1658,21 +1669,8 @@ static int lookup_fast(struct nameidata *nd,
 
 		*seqp = seq;
 		status = d_revalidate(dentry, nd->flags);
-		if (likely(status > 0)) {
-			/*
-			 * Note: do negative dentry check after revalidation in
-			 * case that drops it.
-			 */
-			if (unlikely(!inode))
-				return -ENOENT;
-			path->mnt = mnt;
-			path->dentry = dentry;
-			if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
-				return 1;
-			if (unlazy_child(nd, dentry, seq))
-				return -ECHILD;
+		if (likely(status > 0))
 			return handle_mounts(nd, dentry, path, inode, seqp);
-		}
 		if (unlazy_child(nd, dentry, seq))
 			return -ECHILD;
 		if (unlikely(status == -ECHILD))
@@ -2361,21 +2359,11 @@ static int handle_lookup_down(struct nameidata *nd)
 	unsigned seq = nd->seq;
 	int err;
 
-	if (nd->flags & LOOKUP_RCU) {
-		/*
-		 * don't bother with unlazy_walk on failure - we are
-		 * at the very beginning of walk, so we lose nothing
-		 * if we simply redo everything in non-RCU mode
-		 */
-		path = nd->path;
-		if (unlikely(!__follow_mount_rcu(nd, &path, &inode, &seq)))
-			return -ECHILD;
-	} else {
+	if (!(nd->flags & LOOKUP_RCU))
 		dget(nd->path.dentry);
-		err = handle_mounts(nd, nd->path.dentry, &path, &inode, &seq);
-		if (unlikely(err < 0))
-			return err;
-	}
+	err = handle_mounts(nd, nd->path.dentry, &path, &inode, &seq);
+	if (unlikely(err < 0))
+		return err;
 	path_to_nameidata(&path, nd);
 	nd->inode = inode;
 	nd->seq = seq;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 13/69] lookup_fast(): take mount traversal into callers
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (10 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 12/69] teach handle_mounts() to handle RCU mode Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 14/69] step_into() callers: dismiss the symlink earlier Al Viro
                         ` (59 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Current calling conventions: -E... on error, 0 on cache miss,
result of handle_mounts(nd, dentry, path, inode, seqp) on
success.  Turn that into returning ERR_PTR(-E...), NULL and dentry
resp.; deal with handle_mounts() in the callers.  The thing
is, they already do that in cache miss handling case, so we
just need to supply dentry to them and unify the mount traversal
in those cases.  Fewer arguments that way, and we get closer
to merging handle_mounts() and step_into().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 50 ++++++++++++++++++++++++--------------------------
 1 file changed, 24 insertions(+), 26 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 37b2f82e6f2c..84ce7ccd944e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1628,9 +1628,9 @@ static struct dentry *__lookup_hash(const struct qstr *name,
 	return dentry;
 }
 
-static int lookup_fast(struct nameidata *nd,
-		       struct path *path, struct inode **inode,
-		       unsigned *seqp)
+static struct dentry *lookup_fast(struct nameidata *nd,
+				  struct inode **inode,
+			          unsigned *seqp)
 {
 	struct dentry *dentry, *parent = nd->path.dentry;
 	int status = 1;
@@ -1645,8 +1645,8 @@ static int lookup_fast(struct nameidata *nd,
 		dentry = __d_lookup_rcu(parent, &nd->last, &seq);
 		if (unlikely(!dentry)) {
 			if (unlazy_walk(nd))
-				return -ECHILD;
-			return 0;
+				return ERR_PTR(-ECHILD);
+			return NULL;
 		}
 
 		/*
@@ -1655,7 +1655,7 @@ static int lookup_fast(struct nameidata *nd,
 		 */
 		*inode = d_backing_inode(dentry);
 		if (unlikely(read_seqcount_retry(&dentry->d_seq, seq)))
-			return -ECHILD;
+			return ERR_PTR(-ECHILD);
 
 		/*
 		 * This sequence count validates that the parent had no
@@ -1665,30 +1665,30 @@ static int lookup_fast(struct nameidata *nd,
 		 *  enough, we can use __read_seqcount_retry here.
 		 */
 		if (unlikely(__read_seqcount_retry(&parent->d_seq, nd->seq)))
-			return -ECHILD;
+			return ERR_PTR(-ECHILD);
 
 		*seqp = seq;
 		status = d_revalidate(dentry, nd->flags);
 		if (likely(status > 0))
-			return handle_mounts(nd, dentry, path, inode, seqp);
+			return dentry;
 		if (unlazy_child(nd, dentry, seq))
-			return -ECHILD;
+			return ERR_PTR(-ECHILD);
 		if (unlikely(status == -ECHILD))
 			/* we'd been told to redo it in non-rcu mode */
 			status = d_revalidate(dentry, nd->flags);
 	} else {
 		dentry = __d_lookup(parent, &nd->last);
 		if (unlikely(!dentry))
-			return 0;
+			return NULL;
 		status = d_revalidate(dentry, nd->flags);
 	}
 	if (unlikely(status <= 0)) {
 		if (!status)
 			d_invalidate(dentry);
 		dput(dentry);
-		return status;
+		return ERR_PTR(status);
 	}
-	return handle_mounts(nd, dentry, path, inode, seqp);
+	return dentry;
 }
 
 /* Fast lookup failed, do it the slow way */
@@ -1874,19 +1874,18 @@ static int walk_component(struct nameidata *nd, int flags)
 			put_link(nd);
 		return err;
 	}
-	err = lookup_fast(nd, &path, &inode, &seq);
-	if (unlikely(err <= 0)) {
-		if (err < 0)
-			return err;
+	dentry = lookup_fast(nd, &inode, &seq);
+	if (IS_ERR(dentry))
+		return PTR_ERR(dentry);
+	if (unlikely(!dentry)) {
 		dentry = lookup_slow(&nd->last, nd->path.dentry, nd->flags);
 		if (IS_ERR(dentry))
 			return PTR_ERR(dentry);
-
-		err = handle_mounts(nd, dentry, &path, &inode, &seq);
-		if (unlikely(err < 0))
-			return err;
 	}
 
+	err = handle_mounts(nd, dentry, &path, &inode, &seq);
+	if (unlikely(err < 0))
+			return err;
 	return step_into(nd, &path, flags, inode, seq);
 }
 
@@ -3302,13 +3301,12 @@ static int do_last(struct nameidata *nd,
 		if (nd->last.name[nd->last.len])
 			nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 		/* we _can_ be in RCU mode here */
-		error = lookup_fast(nd, &path, &inode, &seq);
-		if (likely(error > 0))
+		dentry = lookup_fast(nd, &inode, &seq);
+		if (IS_ERR(dentry))
+			return PTR_ERR(dentry);
+		if (likely(dentry))
 			goto finish_lookup;
 
-		if (error < 0)
-			return error;
-
 		BUG_ON(nd->inode != dir->d_inode);
 		BUG_ON(nd->flags & LOOKUP_RCU);
 	} else {
@@ -3383,10 +3381,10 @@ static int do_last(struct nameidata *nd,
 		got_write = false;
 	}
 
+finish_lookup:
 	error = handle_mounts(nd, dentry, &path, &inode, &seq);
 	if (unlikely(error < 0))
 		return error;
-finish_lookup:
 	error = step_into(nd, &path, 0, inode, seq);
 	if (unlikely(error))
 		return error;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 14/69] step_into() callers: dismiss the symlink earlier
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (11 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 13/69] lookup_fast(): take mount traversal into callers Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 15/69] new step_into() flag: WALK_NOFOLLOW Al Viro
                         ` (58 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

We need to dismiss a symlink when we are done traversing it;
currently that's done when we call step_into() for its last
component.  For the cases when we do not call step_into()
for that component (i.e. when it's . or ..) we do the same
symlink dismissal after the call of handle_dots().

What we need to guarantee is that the symlink won't be dismissed
while we are still using nd->last.name - it's pointing into the
body of said symlink.  step_into() is sufficiently late - by
the time it's called we'd already obtained the dentry, so the
name we'd been looking up is no longer needed.  However, it
turns out to be cleaner to have that ("we are done with that
component now, can dismiss the link") done explicitly - in the
callers of step_into().

In handle_dots() case we won't be using the component string
at all, so for . and .. the corresponding point is actually
_before_ the call of handle_dots(), not after it.

Fix a minor irregularity in do_last(), while we are at it -
if trailing symlink ended with . or .. we forgot to dismiss
it.  Not a problem, since nameidata is about to be done with
(neither . nor .. can be a trailing symlink, so this is the
last iteration through the loop) and terminate_walk() will
clean the stack anyway, but let's keep it more regular.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 84ce7ccd944e..3097edcb4a1a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1838,8 +1838,6 @@ enum {WALK_FOLLOW = 1, WALK_MORE = 2};
 static inline int step_into(struct nameidata *nd, struct path *path,
 			    int flags, struct inode *inode, unsigned seq)
 {
-	if (!(flags & WALK_MORE) && nd->depth)
-		put_link(nd);
 	if (likely(!d_is_symlink(path->dentry)) ||
 	   !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW)) {
 		/* not a symlink or should not follow */
@@ -1869,9 +1867,9 @@ static int walk_component(struct nameidata *nd, int flags)
 	 * parent relationships.
 	 */
 	if (unlikely(nd->last_type != LAST_NORM)) {
-		err = handle_dots(nd, nd->last_type);
 		if (!(flags & WALK_MORE) && nd->depth)
 			put_link(nd);
+		err = handle_dots(nd, nd->last_type);
 		return err;
 	}
 	dentry = lookup_fast(nd, &inode, &seq);
@@ -1882,6 +1880,8 @@ static int walk_component(struct nameidata *nd, int flags)
 		if (IS_ERR(dentry))
 			return PTR_ERR(dentry);
 	}
+	if (!(flags & WALK_MORE) && nd->depth)
+		put_link(nd);
 
 	err = handle_mounts(nd, dentry, &path, &inode, &seq);
 	if (unlikely(err < 0))
@@ -3291,6 +3291,8 @@ static int do_last(struct nameidata *nd,
 	nd->flags |= op->intent;
 
 	if (nd->last_type != LAST_NORM) {
+		if (nd->depth)
+			put_link(nd);
 		error = handle_dots(nd, nd->last_type);
 		if (unlikely(error))
 			return error;
@@ -3382,6 +3384,8 @@ static int do_last(struct nameidata *nd,
 	}
 
 finish_lookup:
+	if (nd->depth)
+		put_link(nd);
 	error = handle_mounts(nd, dentry, &path, &inode, &seq);
 	if (unlikely(error < 0))
 		return error;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCHSET] sanitized pathwalk machinery (v4)
  2020-03-01 21:51 ` [RFC][PATCHSET] sanitized pathwalk machinery (v3) Al Viro
  2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
  2020-03-01 22:34   ` [RFC][PATCHSET] sanitized pathwalk machinery (v3) Linus Torvalds
@ 2020-03-13 23:53   ` Al Viro
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
  2020-03-14  0:50     ` [RFC][PATCHSET] sanitized pathwalk machinery (v4) Linus Torvalds
  2 siblings, 2 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

	Extended since the last repost.  The branch is in #work.dotdot,
still based at 5.6-rc1.  Diffstat is
 Documentation/filesystems/path-lookup.rst |    7 +-
 fs/autofs/dev-ioctl.c                     |    6 +-
 fs/internal.h                             |    1 -
 fs/namei.c                                | 1465 ++++++++++++-----------------
 fs/namespace.c                            |   96 +-
 fs/open.c                                 |    4 +-
 include/linux/namei.h                     |    4 +-
 7 files changed, 675 insertions(+), 908 deletions(-)

	Individual patches are in the followups.  Branch survives
the local testing (including ltp and xfstests).

	Review and testing would be _very_ welcome; it does a lot
of massage, so there had been a plenty of opportunities to fuck up
and fail to spot that.  The same goes for profiling - it doesn't
seem to slow the things down, but that needs to be verified.  If
nobody screams, into -next it goes in a few days...

	Changes since v3: ".." series (part 6) got cleaner up,
a bunch of pick_link() cleanups added (part 7) and more path_openat()/
do_last()/open_last_lookups() refactoring is added in the end (part 8).

part 1: follow_automount() cleanups and fixes.

	Quite a bit of that function had been about working around the
wrong calling conventions of finish_automount().  The problem is that
finish_automount() misuses the primitive intended for mount(2) and
friends, where we want to mount on top of the pile, even if something
has managed to add to that while we'd been trying to lock the namespace.
For automount that's not the right thing to do - there we want to discard
whatever it was going to attach and just cross into what got mounted
there in the meanwhile (most likely - the results of the same automount
triggered by somebody else).  Current mainline kinda-sorta manages to do
that, but it's unreliable and very convoluted.  Much simpler approach
is to stop using lock_mount() in finish_automount() and have it bail
out if something turns out to have been mounted on top where we wanted
to attach.  That allows to get rid of a lot of PITA in the caller.
Another simplification comes from not trying to cross into the results
of automount - simply ride through the next iteration of the loop and
let it move into overmount.

	Another thing in the same series is divorcing follow_automount()
from nameidata; that'll play later when we get to unifying follow_down()
with the guts of follow_managed().

	4 commits, the second one fixes a hard-to-hit race.  The first
is a prereq for it.

1/69	do_add_mount(): lift lock_mount/unlock_mount into callers
2/69	fix automount/automount race properly
3/69	follow_automount(): get rid of dead^Wstillborn code
4/69	follow_automount() doesn't need the entire nameidata

part 2: unifying mount traversals in pathwalk.

	Handling of mount traversal (follow_managed()) is currently called
in a bunch of places.  Each of them is shortly followed by a call of
step_into() or an open-coded equivalent thereof.  However, the locations
of those step_into() calls are far from preceding follow_managed();
moreover, that preceding call might happen on different paths that
converge to given step_into() call.  It's harder to analyse that it should
be (especially when it comes to liveness analysis) and it forces rather
ugly calling conventions on lookup_fast()/atomic_open()/lookup_open().
The series below massages the code to the point when the calls of
follow_managed() (and __follow_mount_rcu()) move into the beginning of
step_into().

5/69	make build_open_flags() treat O_CREAT | O_EXCL as implying O_NOFOLLOW
	gets EEXIST handling in do_last() past the step_into() call there.
6/69	handle_mounts(): start building a sane wrapper for follow_managed()
	rather than mangling follow_managed() itself (and creating conflicts
	with openat2 series), add a wrapper that will absorb the required
	interface changes.
7/69	atomic_open(): saner calling conventions (return dentry on success)
	struct path passed to it is pure out parameter; only dentry part
	ever varies, though - mnt is always nd->path.mnt.  Just return
	the dentry on success, and ERR_PTR(-E...) on failure.
8/69	lookup_open(): saner calling conventions (return dentry on success)
	propagate the same change one level up the call chain.
9/69	do_last(): collapse the call of path_to_nameidata()
	struct path filled in lookup_open() call is eventually given to
	handle_mounts(); the only use it has before that is path_to_nameidata()
	call in "->atomic_open() has actually opened it" case, and there
	path_to_nameidata() is an overkill - we are guaranteed to replace
	only nd->path.dentry.  So have the struct path filled only immediately
	prior to handle_mounts().
10/69	handle_mounts(): pass dentry in, turn path into a pure out argument
	now all callers of handle_mount() are directly preceded by filling
	struct path it gets.  path->mnt is nd->path.mnt in all cases, so we can
	pass just the dentry instead and fill path in handle_mount() itself.
	Some boilerplate gone, path is pure out argument of handle_mount()
	now.
11/69	lookup_fast(): consolidate the RCU success case
	massage to gather what will become an RCU case equivalent of
	handle_mounts(); basically, that's what we do if revalidate succeeds
	in RCU case of lookup_fast(), including unlazy and fallback to
	handle_mounts() if __follow_mount_rcu() says "it's too tricky".
12/69	teach handle_mounts() to handle RCU mode
	... and take that into handle_mount() itself.  The other caller of
	__follow_mount_rcu() is fine with the same fallback (it just didn't
	bother since it's in the very beginning of pathwalk), switched to
	handle_mount() as well.
13/69	lookup_fast(): take mount traversal into callers
	Now we are getting somewhere - both RCU and non-RCU success cases of
	lookup_fast() are ended with the same return handle_mounts(...);
	move that to the callers - there it will merge with the identical calls
	that had been on the paths where we had to do slow lookups.
	lookup_fast() returns dentry now.
14/69   step_into() callers: dismiss the symlink earlier
	dismiss the symlink being traversed as soon as we know we are done
	looking at its body; do that directly from step_into() callers, don't
	leave it for step_into() to do.
15/69	new step_into() flag: WALK_NOFOLLOW
	use step_into() instead of open-coding it in handle_lookup_down().
	Add a flag for "don't follow symlinks regardless of LOOKUP_FOLLOW" for
	that (and eventually, I hope, for .. handling).
	Now *all* calls of handle_mounts() and step_into() are right next to
	each other.
16/69	fold handle_mounts() into step_into()
	... and we can move the call of handle_mounts() into step_into(),
	getting a slightly saner calling conventions out of that.
17/69	LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat()
	another payoff from 14/17 - we can teach path_lookupat() to do
	what path_mountpointat() used to.  And kill the latter, along with
	its wrappers.
18/69	expand the only remaining call of path_lookup_conditional()
	minor cleanup - RIP path_lookup_conditional().  Only one caller left.

Changes so far:
	* mount traversal is taken into step_into().
	* lookup_fast(), atomic_open() and lookup_open() calling conventions
are slightly changed.  All of them return dentry now, instead of returning
an int and filling struct path on success.  For lookup_fast() the old
"0 for cache miss, 1 for cache hit" is replaced with "NULL stands for cache
miss, dentry - for hit".
	* step_into() can be called in RCU mode as well.  Takes nameidata,
WALK_... flags, dentry and, in RCU case, corresponding inode and seq value.
Handles mount traversals, decides whether it's a symlink to be followed.
Error => returns -E...; symlink to follow => returns 1, puts symlink on stack;
non-symlink or symlink not to follow => returns 0, moves nd->path to new location.
	* LOOKUP_MOUNTPOINT introduced; user_path_mountpoint_at() and friends
became calls of user_path_at() et.al. with LOOKUP_MOUNTPOINT in flags.

part 3: untangling the symlink handling.

	Right now when we decide to follow a symlink it happens this way:
	* step_into() decides that it has been given a symlink that needs to
be followed.
	* it calls pick_link(), which pushes the symlink on stack and
returns 1 on success / -E... on error.  Symlink's mount/dentry/seq is
stored on stack and the inode is stashed in nd->link_inode.
	* step_into() passes that 1 to its callers, which proceed to pass it
up the call chain for several layers.  In all cases we get to get_link()
call shortly afterwards.
	* get_link() is called, picks the inode stashed in nd->link_inode
by the pick_link(), does some checks, touches the atime, etc.
	* get_link() either picks the link body out of inode or calls
->get_link().  If it's an absolute symlink, we move to the root and return
the relative portion of the body; if it's a relative one - just return the
body.  If it's a procfs-style one, the call of nd_jump_link() has been
made and we'd moved to whatever location is desired.  And return NULL,
same as we do for symlink to "/".
	* the caller proceeds to deal with the string returned to it.

	The sequence is the same in all cases (nested symlink, trailing
symlink on lookup, trailing symlink on open), but its pieces are not close
to each other and the bit between the call of pick_link() and (inevitable)
call of get_link() afterwards is not easy to follow.  Moreover, a bunch
of functions (walk_component/lookup_last/do_last) ends up with the same
conventions for return values as step_into().  And those conventions
(see above) are not pretty - 0/1/-E... is asking for mistakes, especially
when returned 1 is used only to direct control flow on a rather twisted
way to matching get_link() call.  And that path can be seriously twisted.
E.g. when we are trying to open /dev/stdin, we get the following sequence:
	* path_init() has put us into root and returned "/dev/stdin"
	* link_path_walk() has eventually reached /dev and left
<LAST_NORM, "stdin"> in nd->last_type/nd->last
	* we call do_last(), which sees that we have LAST_NORM and calls
lookup_fast().  Let's assume that everything is in dcache; we get the
dentry of /dev/stdin and proceed to finish_lookup:, where we call step_into()
	* it's a symlink, we have LOOKUP_FOLLOW, so we decide to pick the
damn thing.  Into the stack it goes and we return 1.
	* do_last() sees 1 and returns it.
	* trailing_symlink() is called (in the top-level loop) and it
calls get_link().  OK, we get "/proc/self/fd/0" for body, move to
root again and return "proc/self/fd/0".
	* link_path_walk() is given that string, eventually leading us into
/proc/self/fd, with <LAST_NORM, "0"> left as the component to handle.
	* do_last() is called, and similar to the previous case we
eventually reach the call of step_into() with dentry of /proc/self/fd/0.
	* _now_ we can discard /dev/stdin from the stack (we'd been
using its body until now).  It's dropped (from step_into()) and we get
to look at what we'd been given.  A symlink to follow, so on the stack
it goes and we return 1.
	* again, do_last() passes 1 to caller
	* trailing_symlink() is called and calls get_link().
	* this time it's a procfs symlink and its ->get_link() method
moves us to the mount/dentry of our stdin.  And returns NULL.  But the
fun doesn't stop yet.
	* trailing_symlink() returns "" to the caller
	* link_path_walk() is called on that and does nothing
whatsoever.
	* do_last() is called and sees LAST_BIND left by the get_link().
It calls handle_dots()
	* handle_dots() drops the symlink from stack and returns
	* do_last() *FINALLY* proceeds to the point after its call of
step_into() (finish_open:) and gets around to opening the damn thing.

	Making sense of the control flow through all of that is not fun,
to put it mildly; debugging anything in that area can be a massive PITA,
and this example has touched only one of 3 cases.  Arguably, the worst
one, but...  Anyway, it turns out that this code can be massaged to
considerably saner shape - both in terms of control flow and wrt calling
conventions.

19/69	merging pick_link() with get_link(), part 1
	prep work: move the "hardening" crap from trailing_symlink() into
get_link() (conditional on the absense of LOOKUP_PARENT in nd->flags).
We'll be moving the calls of get_link() around quite a bit through that
series, and the next step will be to eliminate trailing_symlink().
20/69	merging pick_link() with get_link(), part 2
	fold trailing_symlink() into lookup_last() and do_last().
Now these are returning strings; it's not the final calling conventions,
but it's almost there.  NULL => old 0, we are done.  ERR_PTR(-E...) =>
old -E..., we'd failed.  string => old 1, and the string is the symlink
body to follow.  Just as for trailing_symlink(), "/" and procfs ones
(where get_link() returns NULL) yield "", so the ugly song and dance
with no-op trip through link_path_walk()/handle_dots() still remains.
21/69	merging pick_link() with get_link(), part 3
	elimination of that round-trip.  In *all* cases having
get_link() return NULL on such symlinks means that we'll proceed to
drop the symlink from stack and get back to the point near that
get_link() call - basically, where we would be if it hadn't been
a symlink at all.  The path by which we are getting there depends
upon the call site; the end result is the same in all cases - such
symlinks (procfs ones and symlink to "/") are fully processed by
the time get_link() returns, so we could as well drop them from the
stack right in get_link().  Makes life simpler in terms of control
flow analysis...
	And now the calling conventions for do_last() and lookup_last()
have reached the final shape - ERR_PTR(-E...) for error, NULL for
"we are done", string for "traverse this".
22/69	merging pick_link() with get_link(), part 4
	now all calls of walk_component() are followed by the same
boilerplate - "if it has returned 1, call get_link() and if that
has returned NULL treat that as if walk_component() has returned 0".
Eliminate by folding that into walk_component() itself.  Now
walk_component() return value conventions have joined those of
do_last()/lookup_last().
23/69	merging pick_link() with get_link(), part 5
	same as for the previous, only this time the boilerplate
migrates one level down, into step_into().  Only one caller of
get_link() left, step_into() has joined the same return value
conventions.
24/69	merging pick_link() with get_link(), part 6
	move that thing into pick_link().  Now all traces of
"return 1 if we are following a symlink" are gone.
25/69	finally fold get_link() into pick_link()
	ta-da - expand get_link() into the only caller.  As a side
benefit, we get rid of stashing the inode in nd->link_inode - it
was done only to carry that piece of information from pick_link()
to eventual get_link().  That's not the main benefit, though - the
control flow became considerably easier to reason about.

For what it's worth, the example above (/dev/stdin) becomes
	* path_init() has put us into root and returned "/dev/stdin"
	* link_path_walk() has eventually reached /dev and left
<LAST_NORM, "stdin"> in nd->last_type/nd->last
	* we call do_last(), which sees that we have LAST_NORM and calls
lookup_fast().  Let's assume that everything is in dcache; we get the
dentry of /dev/stdin and proceed to finish_lookup:, where we call step_into()
	* it's a symlink, we have LOOKUP_FOLLOW, so we decide to pick the
damn thing.  On the stack it goes and we get its body.  Which is
"/proc/self/fd/0", so we move to root and return "proc/self/fd/0".
	* do_last() sees non-NULL and returns it - whether it's an error
or a pathname to traverse, we hadn't reached something we'll be opening.
	* link_path_walk() is given that string, eventually leading us into
/proc/self/fd, with <LAST_NORM, "0"> left as the component to handle.
	* do_last() is called, and similar to the previous case we
eventually reach the call of step_into() with dentry of /proc/self/fd/0.
	* _now_ we can discard /dev/stdin from the stack (we'd been
using its body until now).  It's dropped (from step_into()) and we get
to look at what we'd been given.  A symlink to follow, so on the stack
it goes.   This time it's a procfs symlink and its ->get_link() method
moves us to the mount/dentry of our stdin.  And returns NULL.  So we
drop symlink from stack and return that NULL to caller.
	* that NULL is returned by step_into(), same as if we had just
moved to a non-symlink.
	* do_last() proceeds to open the damn thing.

Some low-hanging fruits become available: LAST_BIND can be removed and
the predicate controlling may_follow_link() we'd moved into pick_link()
in #18 can be made more straightforward:
26/69	LAST_BIND removal
	The only reason to keep it had just been eliminated - it was
needed to route the control flow through that weird last iteration
through the loop.  With that iteration gone...
27/69	invert the meaning of WALK_FOLLOW
28/69	pick_link(): check for WALK_TRAILING, not LOOKUP_PARENT
	In #18 the checks specific to trailing symlinks got moved into
pick_link(), where they were made conditional upon LOOKUP_PARENT in
nd->flags.  That works, but it's more subtle than I would like it to
be - it depends upon the dynamic state (nd->flags) which gets changed
through the pathwalk and it's sensitive to exact locations where we
flip LOOKUP_PARENT.  Now we have a more robust way to do that - the
call chains that end up in pick_link() with LOOKUP_PARENT in nd->flags
are those that had WALK_TRAILING passed to the immediate caller of
pick_link() (step_into()).  So we can pass WALK_... down to pick_link()
and turn the check into explicit "if we are passed WALK_TRAILING,
it's a trailing symlink and we need to apply the checks in may_follow()".
	We could, in principle, reorder these two commits into the
very beginning of symlink series; that would make #18 slightly simpler
at the cost of (marginally) more boilerplate to carry through the
get_link() call moves.  Not sure if it's worth doing, though...

29/69	link_path_walk(): simplify stack handling
	Another cleanup that becomes possible is handling of the stack(s).
We use nd->stack to store two things: pinning down the symlinks we are
resolving and resuming the name traversal when a nested symlink is finished.
    
Currently, nd->depth is used to keep track of both.  It's 0 when we call
link_path_walk() for the first time (for the pathname itself) and 1 on
all subsequent calls (for trailing symlinks, if any).  That's fine,
as far as pinning symlinks goes - when handling a trailing symlink,
the string we are interpreting is the body of symlink pinned down in
nd->stack[0].  It's rather inconvenient with respect to handling nested
symlinks, though - when we run out of a string we are currently interpreting,
we need to decide whether it's a nested symlink (in which case we need
to pick the string saved back when we started to interpret that nested
symlink and resume its traversal) or not (in which case we are done with
link_path_walk()).
    
Current solution is a bit of a kludge - in handling of trailing symlink
(in lookup_last() and open_last_lookups() we clear nd->stack[0].name.
That allows link_path_walk() to use the following rules when running out of
a string to interpret:
	* if nd->depth is zero, we are at the end of pathname itself.
	* if nd->depth is positive, check the saved string; for
nested symlink it will be non-NULL, for trailing symlink - NULL.
    
It works, but it's rather non-obvious.  Note that we have two sets:
the set of symlinks currently being traversed and the set of postponed
pathname tails.  The former is stored in nd->stack[0..nd->depth-1].link
and it's valid throught the pathname resolution; the latter is valid only
during an individual call of link_path_walk() and it occupies
nd->stack[0..nd->depth-1].name for the first call of link_path_walk() and
nd->stack[1..nd->depth-1].name for subsequent ones.  The kludge is basically
a way to recognize the second set becoming empty.
    
The things get simpler if we keep track of the second set's size explicitly
and always store it in nd->stack[0..depth-1].name.  We access the second set
only inside link_path_walk(), so its size can live in a local variable;
that way the check becomes trivial without the need of that kludge.

30/69	namei: have link_path_walk() maintain LOOKUP_PARENT
	just set it on the entry into link_path_walk() and clear when
we get to the last component.  Removes boilerplate from lookup_last()
and do_last().

part 4.  some mount traversal cleanups.

31/69	massage __follow_mount_rcu() a bit
	make it more similar to non-RCU counterpart
32/69	new helper: traverse_mounts()
	the guts of follow_managed() are very similar to
follow_down().  The calling conventions are different (follow_managed()
works with nameidata, follow_down() - with standalone struct path),
but the core loop is pretty much the same in both.  Turned that loop
into a common helper (traverse_mounts()) and since follow_managed()
becomes a very thin wrapper around it, expand follow_managed() at its
only call site (in handle_mounts()),

part 5.  do_last() untangling.

Control flow in do_last() is an atrocity, and liveness analysis in there
is rather painful.  What follows is a massage of that thing into (hopefully)
more straightforward shape; by the end of the series it's still unpleasant,
but at least easier to follow.

A major source of headache is treatment of "we'd already managed to
open it in ->atomic_open()" and "we'd just created that sucker" cases -
that's what gives complicated control flow graph.  As it is, we
have the following horror:
        #
 /------*				ends with . or ..?
 #      |
 |      #
 |  /---*				found in dcache, no O_CREAT?
 |  |   |
 |  |   #				call lookup_open() here.
 |  |   |
 |  |   *---------------\		already opened in ->atomic_open()?
 |  |   |               #
 |  |   *---\           |		freshly created file?
 |  |   |   #           |
 |  \---+   |           |		finish_lookup:
 |      #   |           |
 |      *--------------------->		is it a symlink?
 \------+   |           |		finish_open:
        #   |           |
        +--/            |		finish_open_created:
        #               |
        +---------------/		opened:
        #
To make it even more unpleasant, there is quite a bit of similar,
but not entirely identical logics on parallel branches, some of it
buried in lookup_open() *and* atomic_open() called by it.  Keeping
track of that has been hard and that had lead to more than one bug.

33/69	atomic_open(): return the right dentry in FMODE_OPENED case
	As it is, several invariants do not hold in "we'd already opened
it in ->atomic_open()" case.  In particular, nd->path.dentry might be
pointing to the wrong place by the time we return to do_last() - on
that codepath we don't care anymore.  That both makes it harder to reason
about and serves as an obstacle to transformations that would untangle
that mess.  Fortunately, it's not hard to regularize.
34/69	atomic_open(): lift the call of may_open() into do_last()
	may_open() is called before vfs_open() in "hadn't opened in
->atomic_open()" case.  Rightfully so, since vfs_open() for e.g.
devices can have side effects.  In "opened in ->atomic_open()" case
we have to do it after the actual opening - the whole point is to
combine open with lookup and we only get the information needed for
may_open() after the combined lookup/open has happened.  That's
OK - no side effects are possible in that case.  However, we don't
have to keep that call of may_open() inside fs/namei.c:atomic_open();
as the matter of fact, lifting it into do_last() allows to simplify
life there...
35/69	do_last(): merge the may_open() calls
	... since now we have the "it's already opened" case in
do_last() rejoin the main path at earlier point.

At that point the horror graph from above has become
        #
 /------*				ends with . or ..?
 #      |
 |      #
 |  /---*				found in dcache, no O_CREAT?
 |  |   |
 |  |   #				call lookup_open() here.
 |  |   |
 |  |   *---------------\		already opened in ->atomic_open()?
 |  |   |               #
 |  |   *---\           |		freshly created file?
 |  |   |   #           |
 |  \---+   |           |		finish_lookup:
 |      #   |           |
 |      *--------------------->		is it a symlink?
 \------+   |           |		finish_open:
        #   |           |
        +--/------------/		finish_open_created:
        #
36/69	do_last(): don't bother with keeping got_write in FMODE_OPENED case
	Another source of unpleasantness is an attempt to be clever and
keep track of write access status; the thing is, it doesn't really buy
us anything - we could as well drop it right after the lookup_open() and
only regain it for truncation, should such be needed.  Makes for much
simpler cleanups on failures and sets the things up for unification of
"already opened" and "new file" branches with the main path...
37/69	do_last(): rejoing the common path earlier in FMODE_{OPENED,CREATED} case
	... which we do here.
38/69	do_last(): simplify the liveness analysis past finish_open_created
	It also makes possible to shrink the liveness intervals for local
variables.
39/69	do_last(): rejoin the common path even earlier in FMODE_{OPENED,CREATED} case
	Further unification of parallel branches.

At that point we get
        #
 /------*				ends with . or ..?
 #      |
 |      #
 |  /---*				found in dcache, no O_CREAT?
 |  |   |
 |  |   #				call lookup_open() here.
 |  |   |
 |  |   *---\				opened by ->atomic_open() or freshly creatd?	
 |  \---+   |				finish_lookup:
 |      #   |
 |      *--------------------->		is it a symlink?
 \------+   |				finish_open:
        |   |
        +--/				finish_open_created:
        #
with very little work done between finish_open: and finish_open_created:,
as well as on any of the side branches.  Moreover, we have a pretty clear
separation: most of the work on _opening_ is after finish_open_created
(some of it - conditional), while the work on lookups and creation is all
before that point.  Even better, most of the local variables are used
either only before or only after that cutoff point.

40/69	split the lookup-related parts of do_last() into a separate helper
	... which allows to separate the lookup-related parts from
open-related ones.

	I'm not saying I'm entirely happy with the resulting state of
do_last() clusterfuck, but it got a lot easier to follow and reason
about.  There are more cleanups possible (and needed) in there, though -
there will be followups.

part 6: ".." handling

	The main problem with .. traversal is that it has open-coded mount
crossing in the end.  That mount crossing used to be identical to that done
after the normal name components, but it got overlooked in several series
(most recently - openat2, prior to that - mount traps and automounts) and
now we have an out-of-sync variant (two of them, actually - RCU and non-RCU
cases) festering there and breeding hard-to-spot bugs.  The most recent
example was when openat2 got extra checks added to the normal mount crossing;
it added the same to RCU case of .. handling, but missed the non-RCU one.
Nobody noticed during the many rounds of review - me, Christoph and Linus
included.

	Another issue is that we are heavier on the locking than we need to
during the rootwards mount traversal there; traversing mounts in other
direction (from mountpoint to mounted) gets by without grabbing mount_lock
exclusive (or dirtying its cacheline in any way, for that matter), even
in non-RCU case.  The same should be the case for mounted-to-mountpoint
transitions - except for the case of very rare races with mount --move
and friends, we should be fine with just the seqcount checks there.

	The following is how .. handling behaves on just about any post-v7
Unix:

	while true
		if the caller is chrooted into directory      // rare
(A)			parent = directory
			break
		if directory is absolute root		      // rare
(B)			parent = directory
			break
		if directory is mounted on top of mountpoint // 2nd most common
(C)			directory = mountpoint
		else					     // the most common
(D)			parent = the parent of directory (within its fs)
			break
	while something is mounted on parent		     // unusual setup
(E)		parent = whatever overmounts it
	return parent

There are 3 paths that execution commonly takes, and they cover almost everything
that occurs in practice:
	1) [A]		We are in / and we stay there
	2) [C,D]	We are in root of mounted filesystem,
			we step into the underlying mountpoint,
			then into the parent of mountpoint.
	3) [D]		The place we are in is not a root of mounted filesystem,
			we step into its parent.
These cases are obvious.  However, other execution paths are possible;
in fact, the only constraint is that if we leave the first loop via (A) or (B),
the body of the second loop (i.e. going from mountpoint to mounted) will be
executed at least as many times as (C) (going from mounted to mountpoint)
had been.

A closer look at the predicates in the above shows that "is absolute root"
is actually "is root of a mount and that mount is not attached to anything"
while "is mounted on top of mountpoint" is "is a root of a mount and the
mount is attached to mountpoint".  Which suggest the following transformation:

choose_mountpoint(mount, &ancestor)
	while mount is attached to something
		d = mountpoint(mount)
		mount = parent(mount)
		if the caller is chrooted into <mount, d>
			break
		if d is a root of mount
			ancestor = <mount, d>
			return true
	return false

handle_dotdot(directory)
	if unlikely(the caller is chrooted into directory)
		goto in_root
	if unlikely(directory is a root of some mount)
		if !choose_mountpoint(mount, &ancestor)
			goto in_root
		directory = ancestor
	parent = the parent of directory (within its fs)
	while unlikely(something is mounted on parent)
		parent = whatever overmounts it
	return parent
in_root:
	parent = directory
	while unlikely(something is mounted on parent)
		parent = whatever overmounts it
	return parent

In this form we have mounted-to-mountpoint mount traversals clearly
separated.  Moreover, required updates of pathwalk context (nameidata)
can be packed into a call of the same primitive (step_into()) we use
for moves into normal components, including the forward mount
traversals.

NO_XDEV and BENEATH checks (added by openat2 series) fit into that
just fine - NO_XDEV at "directory = ancestor" part, BENEATH - at
in_root.  Since the forward mount traversal is done by step_into(),
the regular NO_XDEV checks in there take care of the rest.

The following series massages follow_dotdot/follow_dotdot_rcu() to
that form and does choose_mountpoint() implementation with saner
locking than what we do in mainline now - for RCU case we only need
to check mount_lock seqcount once (in the caller), for non-RCU we
can use a loop similar to what lookup_mnt() does for forward traversals.

41/69	path_connected(): pass mount and dentry separately
42/69	path_parent_directory(): leave changing path->dentry to callers
43/69	follow_dotdot(): expand the call of path_parent_directory()
	currently switching to parent is done inside path_parent_directory(),
called from the loop in follow_dotdot().  These 3 commits lift that into
the loop in follow_dotdot() itself...
44/69	follow_dotdot{,_rcu}(): lift switching nd->path to parent out of loop
45/69	follow_dotdot{,_rcu}(): lift LOOKUP_BENEATH checks out of loop
	... and out of the loop(s) (both on the RCU and non-RCU sides)

Next part is to replace the second halves (crossing into parent and whatever
might be overmounting it) to step_into().

46/69	move handle_dots(), follow_dotdot() and follow_dotdot_rcu() past step_into()
	pure move - get them into the right place
47/69	handle_dots(), follow_dotdot{,_rcu}(): preparation to switch to step_into()
	convert to returning ERR_PTR()/NULL instead of -E.../0 - that's what
	step_into() returns and the callers are actually happier that way.
48/69	follow_dotdot{,_rcu}(): switch to use of step_into()
	... and switch both to it.  Now the RCU and non-RCU variants of
the loop that used to do forward mount traversal on .. are replaced with
step_into() calls...
49/69	lift all calls of step_into() out of follow_dotdot/follow_dotdot_rcu
	... which can be consolidated.  We are done with the forward traversal
parts.

50/69	follow_dotdot{,_rcu}(): massage loops
51/69	follow_dotdot_rcu(): be lazy about changing nd->path
52/69	follow_dotdot(): be lazy about changing nd->path
	get the rootwards traversal into shape described above

53/69	helper for mount rootwards traversal
54/69	non-RCU analogue of the previous commit
	... and introduce choose_mountpoint{,_rcu}(), switching both RCU and
non-RCU variants to it.

55/69	fs/namei.c: kill follow_mount()
	detritus removal - the only remaining caller (path_pts()) ought to
use follow_down() anyway.

	That's about where the previous version of patchset used to end.

part 7: pick_link() and friends.

56/69	pick_link(): more straightforward handling of allocation failures
	There's a rather annoying wart in pick_link() handling of stack
allocation failures.  In RCU mode we try to do GFP_ATOMIC allocation; if
that fails, we need to unlazy and retry with GFP_NORMAL.  The problem is
that we need to unlazy both the stuff in nameidata *and* the link we are
about to push onto stack.  We need to do the link first (after successful
unlazy we'll have rcu_read_lock() already dropped, so it would be too late).
The question is what to do if trying to legitimize link fails.  We might
need to drop references, so that can't happen until we drop rcu_read_lock().
OTOH, we have no place to stash it until the time it's normally done on
error.  Result was microoptimized and confusing as hell - explaining the
reasons why we had to do anything special, let alone why this and not
something else was highly non-obvious.  Turns out that there's a fairly
simple (and easily explained) solution, avoiding that mess.

57/69	pick_link(): pass it struct path already with normal refcounting rules
	Some of the struct path instances share the mount reference with
nd->path.  That's done to avoid grabbing/dropping mount references all the
time; unfortunately, it had been the source of quite a few bugs, with
those beasts confused for the normal ones (and vice versa) in some failure
exit.  A lot of pathwalk-related code used to be exposed to those, mostly
due to unfortunate calling conventions.  After the patches earlier in the
series that exposure is much more limited - step_into() and handle_mounts()
(where they are really needed - they'd been introduced for a good reason)
is pretty much all that is left.  pick_link() is slightly exposed - it
gets link in such form, but immediately converts to the regular refcounting
rules.  Doing that conversion in the caller (step_link()) makes for simpler
logics.
58/69	fold path_to_nameidata() into its only remaining caller
	Another bit of exposure gone - path_to_nameidata() used to be
a primitive for working with those beasts and now it is called only in
step_into().  Expanding it there makes the things easier to read, actually.

59/69	pick_link(): take reserving space on stack into a new helper
60/69	reserve_stack(): switch to __nd_alloc_stack()
61/69	__nd_alloc_stack(): make it return bool
	Take the stack allocation out of pick_link() and clean it up.

That's not the end of it for pick_link() - there are other pieces that would be
better off in separate helpers; that's for the latter, though.

part 8: more untangling of do_last()

	While we'd already separated the lookup-related parts of do_last() into
a separate function (close analogue of lookup_last()), the top-level loop is
still calling do_last(), which starts with calling open_last_lookups().  If that
returns non-NULL, we leave do_last() immediately and go through the next iteration
of the top-level loop.  Otherwise we proceed to do the work on actual opening
and whether it succeeds or not, we are done with pathwalk at that point - we know
it's not a trailing symlink for us to follow.  More natural way to express that
is to have open_last_lookups() called by the loop, with the rest of do_last()
done after that loop has ended.  Takes a bit or prep work, though:
62/69	link_path_walk(): sample parent's i_uid and i_mode for the last component
	I tried to get that done by may_create_in_sticky(); however, we really need
the parent we'd walked through, _NOT_ whatever the file is in at the time of check.
The whole thing is about attacker guessing a name of something like a temp file
uncautious victim is about to open in e.g. /tmp with bare O_CREAT (no O_EXCL).
Attacker creates a file of his own there, lets the victim to open it and to start
writing to attacker-owned "temporary" file.  Then attacker modifies the content
and gets the victim screwed.  The check is predicated upon the directory being
a sticky one; fair enough, but the same property that makes the attack possible
allows the attacker to mkdir a non-sticky subdirectory there and move the
file to it.  If they get it after the victim has looked the damn thing up (still
in the old place), but before it does may_create_in_sticky(), the parent of
file at the moment when may_create_in_sticky() gets called is *NOT* a sticky
directory at all.  In other words, we really must sample mode/uid before
doing the lookup.
63/69	take post-lookup part of do_last() out of loop
	That turns the loop much more similar to path_lookupat() one.  And
unlike the full do_last(), open_last_lookups() is fairly similar to
lookup_last()/walk_component().

	Next come some cleanups of open_last_lookups():
64/69	open_last_lookups(): consolidate fsnotify_create() calls
	straightforward; the only thing to keep in mind is that we need it
done before unlocking the parent.
65/69	open_last_lookups(): don't abuse complete_walk() when all we want is unlazy
	... especially with big comment to the effect that this is all
the call is going to do.
66/69	open_last_lookups(): lift O_EXCL|O_CREAT handling into do_open()
	we move O_EXCL|O_CREAT check from the end of open_last_lookups()
(where we'd already decided we are not looking at symlink to follow) out
into do_open() (== what's left of do_last()).  That's one of the cases when
analysis is much longer than the patch itself; see commit message for
details.
67/69	open_last_lookups(): move complete_walk() into do_open()
	similar move for complete_walk()
68/69	atomic_open(): no need to pass struct open_flags anymore
	unused argument I'd missed back in 2016; should've removed it back
then.
69/69	lookup_open(): don't bother with fallbacks to lookup+create
	That's one of the payoffs we get from #66/69 - the reasons why
"we might not have permissions on parent/write access to the entire fs"
logics had been so complicated (sometimes we just trim O_CREAT and call
->atomic_open(), sometimes we forcibly fall back to lookup+create path)
no longer apply.  O_TRUNC side effects wouldn't have been a problem
for quite a while now (we call handle_truncate() in the end anyway, so
trimming O_TRUNC for ->atomic_open() would be fine).  O_CREAT|O_EXCL,
OTOH, _did_ require the fallback - trimming O_CREAT would not suffice,
since that would've disabled the checks in ->atomic_open() with nothing
downstream to catch it and fail with EEXIST.  Well, now we do have
something downstream - the combination of FMODE_CREATED not set,
while O_CREAT|O_EXCL is present will trigger just that.

	In addition to the simpler logics in lookup_open(), the last
commit opens the way to something more interesting, but that will have
to wait for the next cycle - I do have some stuff in that direction,
but it changes the ->atomic_open() calling conventions and it's too
late in this cycle for that.  I'll post a separate RFC later this
week.  Basically, switching ->atomic_open() to returning dentry with
the same interpretation of result as for ->lookup() would make it possible
to get rid both of the weird DENTRY_NOT_SET thing and of the irregularity
in refcounting we have when struct file is used as a vehicle for passing
the lookup result to the caller.  No extra arguments are needed,
and return d_in_lookup(dentry) ? foo_lookup(dir, dentry, flags) : NULL;
becomes a legitimate instance, open-coded in lookup_open() side by
side with ->atomic_open() call, with codepaths converging immediately
after that.  Anyway, that's a separate story and I'm still not entirely
sure about some of the details for calling conventions.  Definitely
not in this series...

^ permalink raw reply	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 15/69] new step_into() flag: WALK_NOFOLLOW
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (12 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 14/69] step_into() callers: dismiss the symlink earlier Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-14  0:32         ` Linus Torvalds
  2020-03-13 23:53       ` [RFC][PATCH v4 16/69] fold handle_mounts() into step_into() Al Viro
                         ` (57 subsequent siblings)
  71 siblings, 1 reply; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Tells step_into() not to follow symlinks, regardless of LOOKUP_FOLLOW.
Allows to switch handle_lookup_down() to of step_into(), getting
all follow_managed() and step_into() calls paired.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3097edcb4a1a..e47b376cf442 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1827,7 +1827,7 @@ static int pick_link(struct nameidata *nd, struct path *link,
 	return 1;
 }
 
-enum {WALK_FOLLOW = 1, WALK_MORE = 2};
+enum {WALK_FOLLOW = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
 
 /*
  * Do we need to follow links? We _really_ want to be able
@@ -1839,7 +1839,8 @@ static inline int step_into(struct nameidata *nd, struct path *path,
 			    int flags, struct inode *inode, unsigned seq)
 {
 	if (likely(!d_is_symlink(path->dentry)) ||
-	   !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW)) {
+	   !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW) ||
+	   flags & WALK_NOFOLLOW) {
 		/* not a symlink or should not follow */
 		path_to_nameidata(path, nd);
 		nd->inode = inode;
@@ -2363,10 +2364,7 @@ static int handle_lookup_down(struct nameidata *nd)
 	err = handle_mounts(nd, nd->path.dentry, &path, &inode, &seq);
 	if (unlikely(err < 0))
 		return err;
-	path_to_nameidata(&path, nd);
-	nd->inode = inode;
-	nd->seq = seq;
-	return 0;
+	return step_into(nd, &path, WALK_NOFOLLOW, inode, seq);
 }
 
 /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 16/69] fold handle_mounts() into step_into()
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (13 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 15/69] new step_into() flag: WALK_NOFOLLOW Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 17/69] LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat() Al Viro
                         ` (56 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

The following is true:
	* calls of handle_mounts() and step_into() are always
paired in sequences like
	err = handle_mounts(nd, dentry, &path, &inode, &seq);
	if (unlikely(err < 0))
		return err;
	err = step_into(nd, &path, flags, inode, seq);
	* in all such sequences path is uninitialized before and
unused after this pair of calls
	* in all such sequences inode and seq are unused afterwards.

So the call of handle_mounts() can be shifted inside step_into(),
turning 'path' into a local variable in the combined function.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 41 +++++++++++++++--------------------------
 1 file changed, 15 insertions(+), 26 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index e47b376cf442..8f0dea52e9dc 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1835,29 +1835,33 @@ enum {WALK_FOLLOW = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
  * so we keep a cache of "no, this doesn't need follow_link"
  * for the common case.
  */
-static inline int step_into(struct nameidata *nd, struct path *path,
-			    int flags, struct inode *inode, unsigned seq)
+static int step_into(struct nameidata *nd, int flags,
+		     struct dentry *dentry, struct inode *inode, unsigned seq)
 {
-	if (likely(!d_is_symlink(path->dentry)) ||
+	struct path path;
+	int err = handle_mounts(nd, dentry, &path, &inode, &seq);
+
+	if (err < 0)
+		return err;
+	if (likely(!d_is_symlink(path.dentry)) ||
 	   !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW) ||
 	   flags & WALK_NOFOLLOW) {
 		/* not a symlink or should not follow */
-		path_to_nameidata(path, nd);
+		path_to_nameidata(&path, nd);
 		nd->inode = inode;
 		nd->seq = seq;
 		return 0;
 	}
 	/* make sure that d_is_symlink above matches inode */
 	if (nd->flags & LOOKUP_RCU) {
-		if (read_seqcount_retry(&path->dentry->d_seq, seq))
+		if (read_seqcount_retry(&path.dentry->d_seq, seq))
 			return -ECHILD;
 	}
-	return pick_link(nd, path, inode, seq);
+	return pick_link(nd, &path, inode, seq);
 }
 
 static int walk_component(struct nameidata *nd, int flags)
 {
-	struct path path;
 	struct dentry *dentry;
 	struct inode *inode;
 	unsigned seq;
@@ -1883,11 +1887,7 @@ static int walk_component(struct nameidata *nd, int flags)
 	}
 	if (!(flags & WALK_MORE) && nd->depth)
 		put_link(nd);
-
-	err = handle_mounts(nd, dentry, &path, &inode, &seq);
-	if (unlikely(err < 0))
-			return err;
-	return step_into(nd, &path, flags, inode, seq);
+	return step_into(nd, flags, dentry, inode, seq);
 }
 
 /*
@@ -2354,17 +2354,10 @@ static inline int lookup_last(struct nameidata *nd)
 
 static int handle_lookup_down(struct nameidata *nd)
 {
-	struct path path;
-	struct inode *inode = nd->inode;
-	unsigned seq = nd->seq;
-	int err;
-
 	if (!(nd->flags & LOOKUP_RCU))
 		dget(nd->path.dentry);
-	err = handle_mounts(nd, nd->path.dentry, &path, &inode, &seq);
-	if (unlikely(err < 0))
-		return err;
-	return step_into(nd, &path, WALK_NOFOLLOW, inode, seq);
+	return step_into(nd, WALK_NOFOLLOW,
+			nd->path.dentry, nd->inode, nd->seq);
 }
 
 /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
@@ -3281,7 +3274,6 @@ static int do_last(struct nameidata *nd,
 	int acc_mode = op->acc_mode;
 	unsigned seq;
 	struct inode *inode;
-	struct path path;
 	struct dentry *dentry;
 	int error;
 
@@ -3384,10 +3376,7 @@ static int do_last(struct nameidata *nd,
 finish_lookup:
 	if (nd->depth)
 		put_link(nd);
-	error = handle_mounts(nd, dentry, &path, &inode, &seq);
-	if (unlikely(error < 0))
-		return error;
-	error = step_into(nd, &path, 0, inode, seq);
+	error = step_into(nd, 0, dentry, inode, seq);
 	if (unlikely(error))
 		return error;
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 17/69] LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat()
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (14 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 16/69] fold handle_mounts() into step_into() Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 18/69] expand the only remaining call of path_lookup_conditional() Al Viro
                         ` (55 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

New LOOKUP flag, telling path_lookupat() to act as path_mountpointat().
IOW, traverse mounts at the final point and skip revalidation of the
location where it ends up.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/autofs/dev-ioctl.c |  6 ++--
 fs/internal.h         |  1 -
 fs/namei.c            | 89 ++++-----------------------------------------------
 fs/namespace.c        |  4 +--
 include/linux/namei.h |  2 +-
 5 files changed, 12 insertions(+), 90 deletions(-)

diff --git a/fs/autofs/dev-ioctl.c b/fs/autofs/dev-ioctl.c
index a3cdb0036c5d..f3a0f412b43b 100644
--- a/fs/autofs/dev-ioctl.c
+++ b/fs/autofs/dev-ioctl.c
@@ -186,7 +186,7 @@ static int find_autofs_mount(const char *pathname,
 	struct path path;
 	int err;
 
-	err = kern_path_mountpoint(AT_FDCWD, pathname, &path, 0);
+	err = kern_path(pathname, LOOKUP_MOUNTPOINT, &path);
 	if (err)
 		return err;
 	err = -ENOENT;
@@ -519,8 +519,8 @@ static int autofs_dev_ioctl_ismountpoint(struct file *fp,
 
 	if (!fp || param->ioctlfd == -1) {
 		if (autofs_type_any(type))
-			err = kern_path_mountpoint(AT_FDCWD,
-						   name, &path, LOOKUP_FOLLOW);
+			err = kern_path(name, LOOKUP_FOLLOW | LOOKUP_MOUNTPOINT,
+					&path);
 		else
 			err = find_autofs_mount(name, &path,
 						test_by_type, &type);
diff --git a/fs/internal.h b/fs/internal.h
index f3f280b952a3..b108a8eb75ca 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -61,7 +61,6 @@ extern int finish_clean_context(struct fs_context *fc);
  */
 extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
 			   struct path *path, struct path *root);
-extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
 extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
 			   const char *, unsigned int, struct path *);
 long do_mknodat(int dfd, const char __user *filename, umode_t mode,
diff --git a/fs/namei.c b/fs/namei.c
index 8f0dea52e9dc..193e076275a3 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2382,6 +2382,10 @@ static int path_lookupat(struct nameidata *nd, unsigned flags, struct path *path
 	if (!err && nd->flags & LOOKUP_DIRECTORY)
 		if (!d_can_lookup(nd->path.dentry))
 			err = -ENOTDIR;
+	if (!err && unlikely(nd->flags & LOOKUP_MOUNTPOINT)) {
+		err = handle_lookup_down(nd);
+		nd->flags &= ~LOOKUP_JUMPED; // no d_weak_revalidate(), please...
+	}
 	if (!err) {
 		*path = nd->path;
 		nd->path.mnt = NULL;
@@ -2410,7 +2414,8 @@ int filename_lookup(int dfd, struct filename *name, unsigned flags,
 		retval = path_lookupat(&nd, flags | LOOKUP_REVAL, path);
 
 	if (likely(!retval))
-		audit_inode(name, path->dentry, 0);
+		audit_inode(name, path->dentry,
+			    flags & LOOKUP_MOUNTPOINT ? AUDIT_INODE_NOEVAL : 0);
 	restore_nameidata();
 	putname(name);
 	return retval;
@@ -2688,88 +2693,6 @@ int user_path_at_empty(int dfd, const char __user *name, unsigned flags,
 }
 EXPORT_SYMBOL(user_path_at_empty);
 
-/**
- * path_mountpoint - look up a path to be umounted
- * @nd:		lookup context
- * @flags:	lookup flags
- * @path:	pointer to container for result
- *
- * Look up the given name, but don't attempt to revalidate the last component.
- * Returns 0 and "path" will be valid on success; Returns error otherwise.
- */
-static int
-path_mountpoint(struct nameidata *nd, unsigned flags, struct path *path)
-{
-	const char *s = path_init(nd, flags);
-	int err;
-
-	while (!(err = link_path_walk(s, nd)) &&
-		(err = lookup_last(nd)) > 0) {
-		s = trailing_symlink(nd);
-	}
-	if (!err && (nd->flags & LOOKUP_RCU))
-		err = unlazy_walk(nd);
-	if (!err)
-		err = handle_lookup_down(nd);
-	if (!err) {
-		*path = nd->path;
-		nd->path.mnt = NULL;
-		nd->path.dentry = NULL;
-	}
-	terminate_walk(nd);
-	return err;
-}
-
-static int
-filename_mountpoint(int dfd, struct filename *name, struct path *path,
-			unsigned int flags)
-{
-	struct nameidata nd;
-	int error;
-	if (IS_ERR(name))
-		return PTR_ERR(name);
-	set_nameidata(&nd, dfd, name);
-	error = path_mountpoint(&nd, flags | LOOKUP_RCU, path);
-	if (unlikely(error == -ECHILD))
-		error = path_mountpoint(&nd, flags, path);
-	if (unlikely(error == -ESTALE))
-		error = path_mountpoint(&nd, flags | LOOKUP_REVAL, path);
-	if (likely(!error))
-		audit_inode(name, path->dentry, AUDIT_INODE_NOEVAL);
-	restore_nameidata();
-	putname(name);
-	return error;
-}
-
-/**
- * user_path_mountpoint_at - lookup a path from userland in order to umount it
- * @dfd:	directory file descriptor
- * @name:	pathname from userland
- * @flags:	lookup flags
- * @path:	pointer to container to hold result
- *
- * A umount is a special case for path walking. We're not actually interested
- * in the inode in this situation, and ESTALE errors can be a problem. We
- * simply want track down the dentry and vfsmount attached at the mountpoint
- * and avoid revalidating the last component.
- *
- * Returns 0 and populates "path" on success.
- */
-int
-user_path_mountpoint_at(int dfd, const char __user *name, unsigned int flags,
-			struct path *path)
-{
-	return filename_mountpoint(dfd, getname(name), path, flags);
-}
-
-int
-kern_path_mountpoint(int dfd, const char *name, struct path *path,
-			unsigned int flags)
-{
-	return filename_mountpoint(dfd, getname_kernel(name), path, flags);
-}
-EXPORT_SYMBOL(kern_path_mountpoint);
-
 int __check_sticky(struct inode *dir, struct inode *inode)
 {
 	kuid_t fsuid = current_fsuid();
diff --git a/fs/namespace.c b/fs/namespace.c
index 743980380a8f..a28e4db075ed 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1669,7 +1669,7 @@ int ksys_umount(char __user *name, int flags)
 	struct path path;
 	struct mount *mnt;
 	int retval;
-	int lookup_flags = 0;
+	int lookup_flags = LOOKUP_MOUNTPOINT;
 
 	if (flags & ~(MNT_FORCE | MNT_DETACH | MNT_EXPIRE | UMOUNT_NOFOLLOW))
 		return -EINVAL;
@@ -1680,7 +1680,7 @@ int ksys_umount(char __user *name, int flags)
 	if (!(flags & UMOUNT_NOFOLLOW))
 		lookup_flags |= LOOKUP_FOLLOW;
 
-	retval = user_path_mountpoint_at(AT_FDCWD, name, lookup_flags, &path);
+	retval = user_path_at(AT_FDCWD, name, lookup_flags, &path);
 	if (retval)
 		goto out;
 	mnt = real_mount(path.mnt);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 0dd980d7318f..d9576a051808 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -23,6 +23,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
 #define LOOKUP_AUTOMOUNT	0x0004  /* force terminal automount */
 #define LOOKUP_EMPTY		0x4000	/* accept empty path [user_... only] */
 #define LOOKUP_DOWN		0x8000	/* follow mounts in the starting point */
+#define LOOKUP_MOUNTPOINT	0x0080	/* follow mounts in the end */
 
 #define LOOKUP_REVAL		0x0020	/* tell ->d_revalidate() to trust no cache */
 #define LOOKUP_RCU		0x0040	/* RCU pathwalk mode; semi-internal */
@@ -64,7 +65,6 @@ extern struct dentry *kern_path_create(int, const char *, struct path *, unsigne
 extern struct dentry *user_path_create(int, const char __user *, struct path *, unsigned int);
 extern void done_path_create(struct path *, struct dentry *);
 extern struct dentry *kern_path_locked(const char *, struct path *);
-extern int kern_path_mountpoint(int, const char *, struct path *, unsigned int);
 
 extern struct dentry *try_lookup_one_len(const char *, struct dentry *, int);
 extern struct dentry *lookup_one_len(const char *, struct dentry *, int);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 18/69] expand the only remaining call of path_lookup_conditional()
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (15 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 17/69] LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat() Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 19/69] merging pick_link() with get_link(), part 1 Al Viro
                         ` (54 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 193e076275a3..db55de1b580b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -858,13 +858,6 @@ static int set_root(struct nameidata *nd)
 	return 0;
 }
 
-static void path_put_conditional(struct path *path, struct nameidata *nd)
-{
-	dput(path->dentry);
-	if (path->mnt != nd->path.mnt)
-		mntput(path->mnt);
-}
-
 static inline void path_to_nameidata(const struct path *path,
 					struct nameidata *nd)
 {
@@ -1312,8 +1305,11 @@ static int follow_managed(struct path *path, struct nameidata *nd)
 		ret = 1;
 	if (ret > 0 && unlikely(d_flags_negative(flags)))
 		ret = -ENOENT;
-	if (unlikely(ret < 0))
-		path_put_conditional(path, nd);
+	if (unlikely(ret < 0)) {
+		dput(path->dentry);
+		if (path->mnt != nd->path.mnt)
+			mntput(path->mnt);
+	}
 	return ret;
 }
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 19/69] merging pick_link() with get_link(), part 1
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (16 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 18/69] expand the only remaining call of path_lookup_conditional() Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 20/69] merging pick_link() with get_link(), part 2 Al Viro
                         ` (53 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Move restoring LOOKUP_PARENT and zeroing nd->stack.name[0] past
the call of get_link() (nothing _currently_ uses them in there).
That allows to moved the call of may_follow_link() into get_link()
as well, since now the presence of LOOKUP_PARENT distinguishes
the callers from each other (link_path_walk() has it, trailing_symlink()
doesn't).

Preparations for folding trailing_symlink() into callers (lookup_last()
and do_last()) and changing the calling conventions of those.  Next
stage after that will have get_link() call migrate into walk_component(),
then - into step_into().  It's tricky enough to warrant doing that
in stages, unfortunately...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index db55de1b580b..cd91fc970830 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1115,6 +1115,12 @@ const char *get_link(struct nameidata *nd)
 	int error;
 	const char *res;
 
+	if (!(nd->flags & LOOKUP_PARENT)) {
+		error = may_follow_link(nd);
+		if (unlikely(error))
+			return ERR_PTR(error);
+	}
+
 	if (unlikely(nd->flags & LOOKUP_NO_SYMLINKS))
 		return ERR_PTR(-ELOOP);
 
@@ -2329,13 +2335,9 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 
 static const char *trailing_symlink(struct nameidata *nd)
 {
-	const char *s;
-	int error = may_follow_link(nd);
-	if (unlikely(error))
-		return ERR_PTR(error);
+	const char *s = get_link(nd);
 	nd->flags |= LOOKUP_PARENT;
 	nd->stack[0].name = NULL;
-	s = get_link(nd);
 	return s ? s : "";
 }
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 20/69] merging pick_link() with get_link(), part 2
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (17 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 19/69] merging pick_link() with get_link(), part 1 Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-14  0:40         ` Linus Torvalds
  2020-03-13 23:53       ` [RFC][PATCH v4 21/69] merging pick_link() with get_link(), part 3 Al Viro
                         ` (52 subsequent siblings)
  71 siblings, 1 reply; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Fold trailing_symlink() into lookup_last() and do_last(), change
the calling conventions of those two.  Rules change:
	success, we are done => NULL instead of 0
	error	=> ERR_PTR(-E...) instead of -E...
	got a symlink to follow => return the path to be followed instead of 1

The loops calling those (in path_lookupat() and path_openat()) adjusted.

A subtle change of control flow here: originally a pure-jump trailing
symlink ("/" or procfs one) would've passed through the upper level
loop once more, with "" for path to traverse.  That would've brought
us back to the lookup_last/do_last entry and we would've hit LAST_BIND
case (LAST_BIND left from get_link() called by trailing_symlink())
and pretty much skip to the point right after where we'd left the
sucker back when we picked that trailing symlink.

Now we don't bother with that extra pass through the upper level
loop - if get_link() says "I've just done a pure jump, nothing
else to do", we just treat that as non-symlink case.

Boilerplate added on that step will go away shortly - it'll migrate
into walk_component() and then to step_into(), collapsing into the
change of calling conventions for those.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 68 ++++++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 40 insertions(+), 28 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index cd91fc970830..c4678f81113a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2333,21 +2333,26 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 	return s;
 }
 
-static const char *trailing_symlink(struct nameidata *nd)
-{
-	const char *s = get_link(nd);
-	nd->flags |= LOOKUP_PARENT;
-	nd->stack[0].name = NULL;
-	return s ? s : "";
-}
-
-static inline int lookup_last(struct nameidata *nd)
+static inline const char *lookup_last(struct nameidata *nd)
 {
+	int err;
 	if (nd->last_type == LAST_NORM && nd->last.name[nd->last.len])
 		nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 
 	nd->flags &= ~LOOKUP_PARENT;
-	return walk_component(nd, 0);
+	err = walk_component(nd, 0);
+	if (unlikely(err)) {
+		const char *s;
+		if (err < 0)
+			return PTR_ERR(err);
+		s = get_link(nd);
+		if (s) {
+			nd->flags |= LOOKUP_PARENT;
+			nd->stack[0].name = NULL;
+			return s;
+		}
+	}
+	return NULL;
 }
 
 static int handle_lookup_down(struct nameidata *nd)
@@ -2370,10 +2375,9 @@ static int path_lookupat(struct nameidata *nd, unsigned flags, struct path *path
 			s = ERR_PTR(err);
 	}
 
-	while (!(err = link_path_walk(s, nd))
-		&& ((err = lookup_last(nd)) > 0)) {
-		s = trailing_symlink(nd);
-	}
+	while (!(err = link_path_walk(s, nd)) &&
+	       (s = lookup_last(nd)) != NULL)
+		;
 	if (!err)
 		err = complete_walk(nd);
 
@@ -3183,7 +3187,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 /*
  * Handle the last step of open()
  */
-static int do_last(struct nameidata *nd,
+static const char *do_last(struct nameidata *nd,
 		   struct file *file, const struct open_flags *op)
 {
 	struct dentry *dir = nd->path.dentry;
@@ -3206,7 +3210,7 @@ static int do_last(struct nameidata *nd,
 			put_link(nd);
 		error = handle_dots(nd, nd->last_type);
 		if (unlikely(error))
-			return error;
+			return ERR_PTR(error);
 		goto finish_open;
 	}
 
@@ -3216,7 +3220,7 @@ static int do_last(struct nameidata *nd,
 		/* we _can_ be in RCU mode here */
 		dentry = lookup_fast(nd, &inode, &seq);
 		if (IS_ERR(dentry))
-			return PTR_ERR(dentry);
+			return ERR_CAST(dentry);
 		if (likely(dentry))
 			goto finish_lookup;
 
@@ -3231,12 +3235,12 @@ static int do_last(struct nameidata *nd,
 		 */
 		error = complete_walk(nd);
 		if (error)
-			return error;
+			return ERR_PTR(error);
 
 		audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
 		/* trailing slashes? */
 		if (unlikely(nd->last.name[nd->last.len]))
-			return -EISDIR;
+			return ERR_PTR(-EISDIR);
 	}
 
 	if (open_flag & (O_CREAT | O_TRUNC | O_WRONLY | O_RDWR)) {
@@ -3298,18 +3302,28 @@ static int do_last(struct nameidata *nd,
 	if (nd->depth)
 		put_link(nd);
 	error = step_into(nd, 0, dentry, inode, seq);
-	if (unlikely(error))
-		return error;
+	if (unlikely(error)) {
+		const char *s;
+		if (error < 0)
+			return ERR_PTR(error);
+		s = get_link(nd);
+		if (s) {
+			nd->flags |= LOOKUP_PARENT;
+			nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
+			nd->stack[0].name = NULL;
+			return s;
+		}
+	}
 
 	if (unlikely((open_flag & (O_EXCL | O_CREAT)) == (O_EXCL | O_CREAT))) {
 		audit_inode(nd->name, nd->path.dentry, 0);
-		return -EEXIST;
+		return ERR_PTR(-EEXIST);
 	}
 finish_open:
 	/* Why this, you ask?  _Now_ we might have grown LOOKUP_JUMPED... */
 	error = complete_walk(nd);
 	if (error)
-		return error;
+		return ERR_PTR(error);
 	audit_inode(nd->name, nd->path.dentry, 0);
 	if (open_flag & O_CREAT) {
 		error = -EISDIR;
@@ -3351,7 +3365,7 @@ static int do_last(struct nameidata *nd,
 	}
 	if (got_write)
 		mnt_drop_write(nd->path.mnt);
-	return error;
+	return ERR_PTR(error);
 }
 
 struct dentry *vfs_tmpfile(struct dentry *dentry, umode_t mode, int open_flag)
@@ -3454,10 +3468,8 @@ static struct file *path_openat(struct nameidata *nd,
 	} else {
 		const char *s = path_init(nd, flags);
 		while (!(error = link_path_walk(s, nd)) &&
-			(error = do_last(nd, file, op)) > 0) {
-			nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
-			s = trailing_symlink(nd);
-		}
+		       (s = do_last(nd, file, op)) != NULL)
+			;
 		terminate_walk(nd);
 	}
 	if (likely(!error)) {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 21/69] merging pick_link() with get_link(), part 3
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (18 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 20/69] merging pick_link() with get_link(), part 2 Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 22/69] merging pick_link() with get_link(), part 4 Al Viro
                         ` (51 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

After a pure jump ("/" or procfs-style symlink) we don't need to
hold the link anymore.  link_path_walk() dropped it if such case
had been detected, lookup_last/do_last() (i.e. old trailing_symlink())
left it on the stack - it ended up calling terminate_walk() shortly
anyway, which would've purged the entire stack.

Do it in get_link() itself instead.  Simpler logics that way...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c4678f81113a..edb37272f2cc 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1154,7 +1154,9 @@ const char *get_link(struct nameidata *nd)
 		} else {
 			res = get(dentry, inode, &last->done);
 		}
-		if (IS_ERR_OR_NULL(res))
+		if (!res)
+			goto all_done;
+		if (IS_ERR(res))
 			return res;
 	}
 	if (*res == '/') {
@@ -1164,9 +1166,11 @@ const char *get_link(struct nameidata *nd)
 		while (unlikely(*++res == '/'))
 			;
 	}
-	if (!*res)
-		res = NULL;
-	return res;
+	if (*res)
+		return res;
+all_done: // pure jump
+	put_link(nd);
+	return NULL;
 }
 
 /*
@@ -2211,11 +2215,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 
 			if (IS_ERR(s))
 				return PTR_ERR(s);
-			err = 0;
-			if (unlikely(!s)) {
-				/* jumped */
-				put_link(nd);
-			} else {
+			if (likely(s)) {
 				nd->stack[nd->depth - 1].name = name;
 				name = s;
 				continue;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 22/69] merging pick_link() with get_link(), part 4
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (19 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 21/69] merging pick_link() with get_link(), part 3 Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 23/69] merging pick_link() with get_link(), part 5 Al Viro
                         ` (50 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Move the call of get_link() into walk_component().  Change the
calling conventions for walk_component() to returning the link
body to follow (if any).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 59 ++++++++++++++++++++++++++---------------------------------
 1 file changed, 26 insertions(+), 33 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index edb37272f2cc..a05de97e6893 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1866,7 +1866,7 @@ static int step_into(struct nameidata *nd, int flags,
 	return pick_link(nd, &path, inode, seq);
 }
 
-static int walk_component(struct nameidata *nd, int flags)
+static const char *walk_component(struct nameidata *nd, int flags)
 {
 	struct dentry *dentry;
 	struct inode *inode;
@@ -1881,19 +1881,24 @@ static int walk_component(struct nameidata *nd, int flags)
 		if (!(flags & WALK_MORE) && nd->depth)
 			put_link(nd);
 		err = handle_dots(nd, nd->last_type);
-		return err;
+		return ERR_PTR(err);
 	}
 	dentry = lookup_fast(nd, &inode, &seq);
 	if (IS_ERR(dentry))
-		return PTR_ERR(dentry);
+		return ERR_CAST(dentry);
 	if (unlikely(!dentry)) {
 		dentry = lookup_slow(&nd->last, nd->path.dentry, nd->flags);
 		if (IS_ERR(dentry))
-			return PTR_ERR(dentry);
+			return ERR_CAST(dentry);
 	}
 	if (!(flags & WALK_MORE) && nd->depth)
 		put_link(nd);
-	return step_into(nd, flags, dentry, inode, seq);
+	err = step_into(nd, flags, dentry, inode, seq);
+	if (!err)
+		return NULL;
+	if (err > 0)
+		return get_link(nd);
+	return ERR_PTR(err);
 }
 
 /*
@@ -2145,6 +2150,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 
 	/* At this point we know we have a real path component. */
 	for(;;) {
+		const char *link;
 		u64 hash_len;
 		int type;
 
@@ -2202,24 +2208,18 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 			if (!name)
 				return 0;
 			/* last component of nested symlink */
-			err = walk_component(nd, WALK_FOLLOW);
+			link = walk_component(nd, WALK_FOLLOW);
 		} else {
 			/* not the last component */
-			err = walk_component(nd, WALK_FOLLOW | WALK_MORE);
+			link = walk_component(nd, WALK_FOLLOW | WALK_MORE);
 		}
-		if (err < 0)
-			return err;
-
-		if (err) {
-			const char *s = get_link(nd);
-
-			if (IS_ERR(s))
-				return PTR_ERR(s);
-			if (likely(s)) {
-				nd->stack[nd->depth - 1].name = name;
-				name = s;
-				continue;
-			}
+		if (unlikely(link)) {
+			if (IS_ERR(link))
+				return PTR_ERR(link);
+			/* a symlink to follow */
+			nd->stack[nd->depth - 1].name = name;
+			name = link;
+			continue;
 		}
 		if (unlikely(!d_can_lookup(nd->path.dentry))) {
 			if (nd->flags & LOOKUP_RCU) {
@@ -2335,24 +2335,17 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 
 static inline const char *lookup_last(struct nameidata *nd)
 {
-	int err;
+	const char *link;
 	if (nd->last_type == LAST_NORM && nd->last.name[nd->last.len])
 		nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 
 	nd->flags &= ~LOOKUP_PARENT;
-	err = walk_component(nd, 0);
-	if (unlikely(err)) {
-		const char *s;
-		if (err < 0)
-			return PTR_ERR(err);
-		s = get_link(nd);
-		if (s) {
-			nd->flags |= LOOKUP_PARENT;
-			nd->stack[0].name = NULL;
-			return s;
-		}
+	link = walk_component(nd, 0);
+	if (link) {
+		nd->flags |= LOOKUP_PARENT;
+		nd->stack[0].name = NULL;
 	}
-	return NULL;
+	return link;
 }
 
 static int handle_lookup_down(struct nameidata *nd)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 23/69] merging pick_link() with get_link(), part 5
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (20 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 22/69] merging pick_link() with get_link(), part 4 Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 24/69] merging pick_link() with get_link(), part 6 Al Viro
                         ` (49 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

move get_link() call into step_into().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 43 ++++++++++++++++++-------------------------
 1 file changed, 18 insertions(+), 25 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index a05de97e6893..04a76f6dd3fe 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1841,14 +1841,14 @@ enum {WALK_FOLLOW = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
  * so we keep a cache of "no, this doesn't need follow_link"
  * for the common case.
  */
-static int step_into(struct nameidata *nd, int flags,
+static const char *step_into(struct nameidata *nd, int flags,
 		     struct dentry *dentry, struct inode *inode, unsigned seq)
 {
 	struct path path;
 	int err = handle_mounts(nd, dentry, &path, &inode, &seq);
 
 	if (err < 0)
-		return err;
+		return ERR_PTR(err);
 	if (likely(!d_is_symlink(path.dentry)) ||
 	   !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW) ||
 	   flags & WALK_NOFOLLOW) {
@@ -1856,14 +1856,17 @@ static int step_into(struct nameidata *nd, int flags,
 		path_to_nameidata(&path, nd);
 		nd->inode = inode;
 		nd->seq = seq;
-		return 0;
+		return NULL;
 	}
 	/* make sure that d_is_symlink above matches inode */
 	if (nd->flags & LOOKUP_RCU) {
 		if (read_seqcount_retry(&path.dentry->d_seq, seq))
-			return -ECHILD;
+			return ERR_PTR(-ECHILD);
 	}
-	return pick_link(nd, &path, inode, seq);
+	err = pick_link(nd, &path, inode, seq);
+	if (err > 0)
+		return get_link(nd);
+	return ERR_PTR(err);
 }
 
 static const char *walk_component(struct nameidata *nd, int flags)
@@ -1893,12 +1896,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 	}
 	if (!(flags & WALK_MORE) && nd->depth)
 		put_link(nd);
-	err = step_into(nd, flags, dentry, inode, seq);
-	if (!err)
-		return NULL;
-	if (err > 0)
-		return get_link(nd);
-	return ERR_PTR(err);
+	return step_into(nd, flags, dentry, inode, seq);
 }
 
 /*
@@ -2352,8 +2350,8 @@ static int handle_lookup_down(struct nameidata *nd)
 {
 	if (!(nd->flags & LOOKUP_RCU))
 		dget(nd->path.dentry);
-	return step_into(nd, WALK_NOFOLLOW,
-			nd->path.dentry, nd->inode, nd->seq);
+	return PTR_ERR(step_into(nd, WALK_NOFOLLOW,
+			nd->path.dentry, nd->inode, nd->seq));
 }
 
 /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
@@ -3193,6 +3191,7 @@ static const char *do_last(struct nameidata *nd,
 	unsigned seq;
 	struct inode *inode;
 	struct dentry *dentry;
+	const char *res;
 	int error;
 
 	nd->flags &= ~LOOKUP_PARENT;
@@ -3294,18 +3293,12 @@ static const char *do_last(struct nameidata *nd,
 finish_lookup:
 	if (nd->depth)
 		put_link(nd);
-	error = step_into(nd, 0, dentry, inode, seq);
-	if (unlikely(error)) {
-		const char *s;
-		if (error < 0)
-			return ERR_PTR(error);
-		s = get_link(nd);
-		if (s) {
-			nd->flags |= LOOKUP_PARENT;
-			nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
-			nd->stack[0].name = NULL;
-			return s;
-		}
+	res = step_into(nd, 0, dentry, inode, seq);
+	if (unlikely(res)) {
+		nd->flags |= LOOKUP_PARENT;
+		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
+		nd->stack[0].name = NULL;
+		return res;
 	}
 
 	if (unlikely((open_flag & (O_EXCL | O_CREAT)) == (O_EXCL | O_CREAT))) {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 24/69] merging pick_link() with get_link(), part 6
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (21 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 23/69] merging pick_link() with get_link(), part 5 Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 25/69] finally fold get_link() into pick_link() Al Viro
                         ` (48 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

move the only remaining call of get_link() into pick_link()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 04a76f6dd3fe..969b99a88631 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1793,14 +1793,14 @@ static inline int handle_dots(struct nameidata *nd, int type)
 	return 0;
 }
 
-static int pick_link(struct nameidata *nd, struct path *link,
+static const char *pick_link(struct nameidata *nd, struct path *link,
 		     struct inode *inode, unsigned seq)
 {
 	int error;
 	struct saved *last;
 	if (unlikely(nd->total_link_count++ >= MAXSYMLINKS)) {
 		path_to_nameidata(link, nd);
-		return -ELOOP;
+		return ERR_PTR(-ELOOP);
 	}
 	if (!(nd->flags & LOOKUP_RCU)) {
 		if (link->mnt == nd->path.mnt)
@@ -1821,7 +1821,7 @@ static int pick_link(struct nameidata *nd, struct path *link,
 		}
 		if (error) {
 			path_put(link);
-			return error;
+			return ERR_PTR(error);
 		}
 	}
 
@@ -1830,7 +1830,7 @@ static int pick_link(struct nameidata *nd, struct path *link,
 	clear_delayed_call(&last->done);
 	nd->link_inode = inode;
 	last->seq = seq;
-	return 1;
+	return get_link(nd);
 }
 
 enum {WALK_FOLLOW = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
@@ -1863,10 +1863,7 @@ static const char *step_into(struct nameidata *nd, int flags,
 		if (read_seqcount_retry(&path.dentry->d_seq, seq))
 			return ERR_PTR(-ECHILD);
 	}
-	err = pick_link(nd, &path, inode, seq);
-	if (err > 0)
-		return get_link(nd);
-	return ERR_PTR(err);
+	return pick_link(nd, &path, inode, seq);
 }
 
 static const char *walk_component(struct nameidata *nd, int flags)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 25/69] finally fold get_link() into pick_link()
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (22 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 24/69] merging pick_link() with get_link(), part 6 Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 26/69] sanitize handling of nd->last_type, kill LAST_BIND Al Viro
                         ` (47 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

kill nd->link_inode, while we are at it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 135 ++++++++++++++++++++++++++++---------------------------------
 1 file changed, 61 insertions(+), 74 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 969b99a88631..9b08c64397c8 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -503,7 +503,6 @@ struct nameidata {
 	} *stack, internal[EMBEDDED_LEVELS];
 	struct filename	*name;
 	struct nameidata *saved;
-	struct inode	*link_inode;
 	unsigned	root_seq;
 	int		dfd;
 } __randomize_layout;
@@ -962,9 +961,8 @@ int sysctl_protected_regular __read_mostly;
  *
  * Returns 0 if following the symlink is allowed, -ve on error.
  */
-static inline int may_follow_link(struct nameidata *nd)
+static inline int may_follow_link(struct nameidata *nd, const struct inode *inode)
 {
-	const struct inode *inode;
 	const struct inode *parent;
 	kuid_t puid;
 
@@ -972,7 +970,6 @@ static inline int may_follow_link(struct nameidata *nd)
 		return 0;
 
 	/* Allowed if owner and follower match. */
-	inode = nd->link_inode;
 	if (uid_eq(current_cred()->fsuid, inode->i_uid))
 		return 0;
 
@@ -1106,73 +1103,6 @@ static int may_create_in_sticky(umode_t dir_mode, kuid_t dir_uid,
 	return 0;
 }
 
-static __always_inline
-const char *get_link(struct nameidata *nd)
-{
-	struct saved *last = nd->stack + nd->depth - 1;
-	struct dentry *dentry = last->link.dentry;
-	struct inode *inode = nd->link_inode;
-	int error;
-	const char *res;
-
-	if (!(nd->flags & LOOKUP_PARENT)) {
-		error = may_follow_link(nd);
-		if (unlikely(error))
-			return ERR_PTR(error);
-	}
-
-	if (unlikely(nd->flags & LOOKUP_NO_SYMLINKS))
-		return ERR_PTR(-ELOOP);
-
-	if (!(nd->flags & LOOKUP_RCU)) {
-		touch_atime(&last->link);
-		cond_resched();
-	} else if (atime_needs_update(&last->link, inode)) {
-		if (unlikely(unlazy_walk(nd)))
-			return ERR_PTR(-ECHILD);
-		touch_atime(&last->link);
-	}
-
-	error = security_inode_follow_link(dentry, inode,
-					   nd->flags & LOOKUP_RCU);
-	if (unlikely(error))
-		return ERR_PTR(error);
-
-	nd->last_type = LAST_BIND;
-	res = READ_ONCE(inode->i_link);
-	if (!res) {
-		const char * (*get)(struct dentry *, struct inode *,
-				struct delayed_call *);
-		get = inode->i_op->get_link;
-		if (nd->flags & LOOKUP_RCU) {
-			res = get(NULL, inode, &last->done);
-			if (res == ERR_PTR(-ECHILD)) {
-				if (unlikely(unlazy_walk(nd)))
-					return ERR_PTR(-ECHILD);
-				res = get(dentry, inode, &last->done);
-			}
-		} else {
-			res = get(dentry, inode, &last->done);
-		}
-		if (!res)
-			goto all_done;
-		if (IS_ERR(res))
-			return res;
-	}
-	if (*res == '/') {
-		error = nd_jump_root(nd);
-		if (unlikely(error))
-			return ERR_PTR(error);
-		while (unlikely(*++res == '/'))
-			;
-	}
-	if (*res)
-		return res;
-all_done: // pure jump
-	put_link(nd);
-	return NULL;
-}
-
 /*
  * follow_up - Find the mountpoint of path's vfsmount
  *
@@ -1796,8 +1726,10 @@ static inline int handle_dots(struct nameidata *nd, int type)
 static const char *pick_link(struct nameidata *nd, struct path *link,
 		     struct inode *inode, unsigned seq)
 {
-	int error;
 	struct saved *last;
+	const char *res;
+	int error;
+
 	if (unlikely(nd->total_link_count++ >= MAXSYMLINKS)) {
 		path_to_nameidata(link, nd);
 		return ERR_PTR(-ELOOP);
@@ -1828,9 +1760,64 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 	last = nd->stack + nd->depth++;
 	last->link = *link;
 	clear_delayed_call(&last->done);
-	nd->link_inode = inode;
 	last->seq = seq;
-	return get_link(nd);
+
+	if (!(nd->flags & LOOKUP_PARENT)) {
+		error = may_follow_link(nd, inode);
+		if (unlikely(error))
+			return ERR_PTR(error);
+	}
+
+	if (unlikely(nd->flags & LOOKUP_NO_SYMLINKS))
+		return ERR_PTR(-ELOOP);
+
+	if (!(nd->flags & LOOKUP_RCU)) {
+		touch_atime(&last->link);
+		cond_resched();
+	} else if (atime_needs_update(&last->link, inode)) {
+		if (unlikely(unlazy_walk(nd)))
+			return ERR_PTR(-ECHILD);
+		touch_atime(&last->link);
+	}
+
+	error = security_inode_follow_link(link->dentry, inode,
+					   nd->flags & LOOKUP_RCU);
+	if (unlikely(error))
+		return ERR_PTR(error);
+
+	nd->last_type = LAST_BIND;
+	res = READ_ONCE(inode->i_link);
+	if (!res) {
+		const char * (*get)(struct dentry *, struct inode *,
+				struct delayed_call *);
+		get = inode->i_op->get_link;
+		if (nd->flags & LOOKUP_RCU) {
+			res = get(NULL, inode, &last->done);
+			if (res == ERR_PTR(-ECHILD)) {
+				if (unlikely(unlazy_walk(nd)))
+					return ERR_PTR(-ECHILD);
+				res = get(link->dentry, inode, &last->done);
+			}
+		} else {
+			res = get(link->dentry, inode, &last->done);
+		}
+		if (!res)
+			goto all_done;
+		if (IS_ERR(res))
+			return res;
+	}
+	if (*res == '/') {
+		error = nd_jump_root(nd);
+		if (unlikely(error))
+			return ERR_PTR(error);
+		while (unlikely(*++res == '/'))
+			;
+	}
+	if (*res)
+		return res;
+all_done: // pure jump
+	put_link(nd);
+	return NULL;
 }
 
 enum {WALK_FOLLOW = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 26/69] sanitize handling of nd->last_type, kill LAST_BIND
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (23 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 25/69] finally fold get_link() into pick_link() Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 27/69] namei: invert the meaning of WALK_FOLLOW Al Viro
                         ` (46 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

->last_type values are set in 3 places: path_init() (sets to LAST_ROOT),
link_path_walk (LAST_NORM/DOT/DOTDOT) and pick_link (LAST_BIND).

The are checked in walk_component(), lookup_last() and do_last().
They also get copied to the caller by filename_parentat().  In the last
3 cases the value is what we had at the return from link_path_walk().
In case of walk_component() it's either directly downstream from
assignment in link_path_walk() or, when called by lookup_last(), the
value we have at the return from link_path_walk().

The value at the entry into link_path_walk() can survive to return only
if the pathname contains nothing but slashes.  Note that pick_link()
never returns such - pure jumps are handled directly.  So for the calls
of link_path_walk() for trailing symlinks it does not matter what value
had been there at the entry; the value at the return won't depend upon it.

There are 3 call chains that might have pick_link() storing LAST_BIND:

1) pick_link() from step_into() from walk_component() from
link_path_walk().  In that case we will either be parsing the next
component immediately after return into link_path_walk(), which will
overwrite the ->last_type before anyone has a chance to look at it,
or we'll fail, in which case nobody will be looking at ->last_type at all.

2) pick_link() from step_into() from walk_component() from lookup_last().
The value is never looked at due to the above; it won't affect the value
seen at return from any link_path_walk().

3) pick_link() from step_into() from do_last().  Ditto.

In other words, assignemnt in pick_link() is pointless, and so is
LAST_BIND itself; nothing ever looks at that value.  Kill it off.
And make link_path_walk() _always_ assign ->last_type - in the only
case when the value at the entry might survive to the return that value
is always LAST_ROOT, inherited from path_init().  Move that assignment
from path_init() into the beginning of link_path_walk(), to consolidate
the things.

Historical note: LAST_BIND used to be used for the kludge with trailing
pure jump symlinks (extra iteration through the top-level loop).
No point keeping it anymore...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 Documentation/filesystems/path-lookup.rst | 7 ++-----
 fs/namei.c                                | 3 +--
 include/linux/namei.h                     | 2 +-
 3 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
index a3216979298b..f46b05e9b96c 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -404,11 +404,8 @@ that is the "next" component in the pathname.
 ``int last_type``
 ~~~~~~~~~~~~~~~~~
 
-This is one of ``LAST_NORM``, ``LAST_ROOT``, ``LAST_DOT``, ``LAST_DOTDOT``, or
-``LAST_BIND``.  The ``last`` field is only valid if the type is
-``LAST_NORM``.  ``LAST_BIND`` is used when following a symlink and no
-components of the symlink have been processed yet.  Others should be
-fairly self-explanatory.
+This is one of ``LAST_NORM``, ``LAST_ROOT``, ``LAST_DOT`` or ``LAST_DOTDOT``.
+The ``last`` field is only valid if the type is ``LAST_NORM``.
 
 ``struct path root``
 ~~~~~~~~~~~~~~~~~~~~
diff --git a/fs/namei.c b/fs/namei.c
index 9b08c64397c8..438717b462fb 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1785,7 +1785,6 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 	if (unlikely(error))
 		return ERR_PTR(error);
 
-	nd->last_type = LAST_BIND;
 	res = READ_ONCE(inode->i_link);
 	if (!res) {
 		const char * (*get)(struct dentry *, struct inode *,
@@ -2123,6 +2122,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 {
 	int err;
 
+	nd->last_type = LAST_ROOT;
 	if (IS_ERR(name))
 		return PTR_ERR(name);
 	while (*name=='/')
@@ -2224,7 +2224,6 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 	if (flags & LOOKUP_RCU)
 		rcu_read_lock();
 
-	nd->last_type = LAST_ROOT; /* if there are only slashes... */
 	nd->flags = flags | LOOKUP_JUMPED | LOOKUP_PARENT;
 	nd->depth = 0;
 
diff --git a/include/linux/namei.h b/include/linux/namei.h
index d9576a051808..a4bb992623c4 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -15,7 +15,7 @@ enum { MAX_NESTED_LINKS = 8 };
 /*
  * Type of the last component on LOOKUP_PARENT
  */
-enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
+enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT};
 
 /* pathwalk mode */
 #define LOOKUP_FOLLOW		0x0001	/* follow links at the end */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 27/69] namei: invert the meaning of WALK_FOLLOW
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (24 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 26/69] sanitize handling of nd->last_type, kill LAST_BIND Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-14  0:42         ` Linus Torvalds
  2020-03-13 23:53       ` [RFC][PATCH v4 28/69] pick_link(): check for WALK_TRAILING, not LOOKUP_PARENT Al Viro
                         ` (45 subsequent siblings)
  71 siblings, 1 reply; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

old flags & WALK_FOLLOW <=> new !(flags & WALK_TRAILING)
That's what that flag had really been used for.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 438717b462fb..5d25a3874a5b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1819,7 +1819,7 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 	return NULL;
 }
 
-enum {WALK_FOLLOW = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
+enum {WALK_TRAILING = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
 
 /*
  * Do we need to follow links? We _really_ want to be able
@@ -1836,8 +1836,8 @@ static const char *step_into(struct nameidata *nd, int flags,
 	if (err < 0)
 		return ERR_PTR(err);
 	if (likely(!d_is_symlink(path.dentry)) ||
-	   !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW) ||
-	   flags & WALK_NOFOLLOW) {
+	   ((flags & WALK_TRAILING) && !(nd->flags & LOOKUP_FOLLOW)) ||
+	   (flags & WALK_NOFOLLOW)) {
 		/* not a symlink or should not follow */
 		path_to_nameidata(&path, nd);
 		nd->inode = inode;
@@ -2190,10 +2190,10 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 			if (!name)
 				return 0;
 			/* last component of nested symlink */
-			link = walk_component(nd, WALK_FOLLOW);
+			link = walk_component(nd, 0);
 		} else {
 			/* not the last component */
-			link = walk_component(nd, WALK_FOLLOW | WALK_MORE);
+			link = walk_component(nd, WALK_MORE);
 		}
 		if (unlikely(link)) {
 			if (IS_ERR(link))
@@ -2321,7 +2321,7 @@ static inline const char *lookup_last(struct nameidata *nd)
 		nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 
 	nd->flags &= ~LOOKUP_PARENT;
-	link = walk_component(nd, 0);
+	link = walk_component(nd, WALK_TRAILING);
 	if (link) {
 		nd->flags |= LOOKUP_PARENT;
 		nd->stack[0].name = NULL;
@@ -3276,7 +3276,7 @@ static const char *do_last(struct nameidata *nd,
 finish_lookup:
 	if (nd->depth)
 		put_link(nd);
-	res = step_into(nd, 0, dentry, inode, seq);
+	res = step_into(nd, WALK_TRAILING, dentry, inode, seq);
 	if (unlikely(res)) {
 		nd->flags |= LOOKUP_PARENT;
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 28/69] pick_link(): check for WALK_TRAILING, not LOOKUP_PARENT
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (25 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 27/69] namei: invert the meaning of WALK_FOLLOW Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 29/69] link_path_walk(): simplify stack handling Al Viro
                         ` (44 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 5d25a3874a5b..ff028f12cb95 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1723,8 +1723,10 @@ static inline int handle_dots(struct nameidata *nd, int type)
 	return 0;
 }
 
+enum {WALK_TRAILING = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
+
 static const char *pick_link(struct nameidata *nd, struct path *link,
-		     struct inode *inode, unsigned seq)
+		     struct inode *inode, unsigned seq, int flags)
 {
 	struct saved *last;
 	const char *res;
@@ -1762,7 +1764,7 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 	clear_delayed_call(&last->done);
 	last->seq = seq;
 
-	if (!(nd->flags & LOOKUP_PARENT)) {
+	if (flags & WALK_TRAILING) {
 		error = may_follow_link(nd, inode);
 		if (unlikely(error))
 			return ERR_PTR(error);
@@ -1819,8 +1821,6 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 	return NULL;
 }
 
-enum {WALK_TRAILING = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
-
 /*
  * Do we need to follow links? We _really_ want to be able
  * to do this check without having to look at inode->i_op,
@@ -1849,7 +1849,7 @@ static const char *step_into(struct nameidata *nd, int flags,
 		if (read_seqcount_retry(&path.dentry->d_seq, seq))
 			return ERR_PTR(-ECHILD);
 	}
-	return pick_link(nd, &path, inode, seq);
+	return pick_link(nd, &path, inode, seq, flags);
 }
 
 static const char *walk_component(struct nameidata *nd, int flags)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 29/69] link_path_walk(): simplify stack handling
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (26 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 28/69] pick_link(): check for WALK_TRAILING, not LOOKUP_PARENT Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 30/69] namei: have link_path_walk() maintain LOOKUP_PARENT Al Viro
                         ` (43 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

We use nd->stack to store two things: pinning down the symlinks
we are resolving and resuming the name traversal when a nested
symlink is finished.

Currently, nd->depth is used to keep track of both.  It's 0 when
we call link_path_walk() for the first time (for the pathname
itself) and 1 on all subsequent calls (for trailing symlinks,
if any).  That's fine, as far as pinning symlinks goes - when
handling a trailing symlink, the string we are interpreting
is the body of symlink pinned down in nd->stack[0].  It's
rather inconvenient with respect to handling nested symlinks,
though - when we run out of a string we are currently interpreting,
we need to decide whether it's a nested symlink (in which case
we need to pick the string saved back when we started to interpret
that nested symlink and resume its traversal) or not (in which
case we are done with link_path_walk()).

Current solution is a bit of a kludge - in handling of trailing symlink
(in lookup_last() and open_last_lookups() we clear nd->stack[0].name.
That allows link_path_walk() to use the following rules when
running out of a string to interpret:
	* if nd->depth is zero, we are at the end of pathname itself.
	* if nd->depth is positive, check the saved string; for
nested symlink it will be non-NULL, for trailing symlink - NULL.

It works, but it's rather non-obvious.  Note that we have two sets:
the set of symlinks currently being traversed and the set of postponed
pathname tails.  The former is stored in nd->stack[0..nd->depth-1].link
and it's valid throught the pathname resolution; the latter is valid only
during an individual call of link_path_walk() and it occupies
nd->stack[0..nd->depth-1].name for the first call of link_path_walk() and
nd->stack[1..nd->depth-1].name for subsequent ones.  The kludge is basically
a way to recognize the second set becoming empty.

The things get simpler if we keep track of the second set's size
explicitly and always store it in nd->stack[0..depth-1].name.
We access the second set only inside link_path_walk(), so its
size can live in a local variable; that way the check becomes
trivial without the need of that kludge.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index ff028f12cb95..04c1d798013f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2120,6 +2120,7 @@ static inline u64 hash_name(const void *salt, const char *name)
  */
 static int link_path_walk(const char *name, struct nameidata *nd)
 {
+	int depth = 0; // depth <= nd->depth
 	int err;
 
 	nd->last_type = LAST_ROOT;
@@ -2182,14 +2183,11 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 		} while (unlikely(*name == '/'));
 		if (unlikely(!*name)) {
 OK:
-			/* pathname body, done */
-			if (!nd->depth)
-				return 0;
-			name = nd->stack[nd->depth - 1].name;
-			/* trailing symlink, done */
-			if (!name)
+			/* pathname or trailing symlink, done */
+			if (!depth)
 				return 0;
 			/* last component of nested symlink */
+			name = nd->stack[--depth].name;
 			link = walk_component(nd, 0);
 		} else {
 			/* not the last component */
@@ -2199,7 +2197,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 			if (IS_ERR(link))
 				return PTR_ERR(link);
 			/* a symlink to follow */
-			nd->stack[nd->depth - 1].name = name;
+			nd->stack[depth++].name = name;
 			name = link;
 			continue;
 		}
@@ -2324,7 +2322,6 @@ static inline const char *lookup_last(struct nameidata *nd)
 	link = walk_component(nd, WALK_TRAILING);
 	if (link) {
 		nd->flags |= LOOKUP_PARENT;
-		nd->stack[0].name = NULL;
 	}
 	return link;
 }
@@ -3280,7 +3277,6 @@ static const char *do_last(struct nameidata *nd,
 	if (unlikely(res)) {
 		nd->flags |= LOOKUP_PARENT;
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
-		nd->stack[0].name = NULL;
 		return res;
 	}
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 30/69] namei: have link_path_walk() maintain LOOKUP_PARENT
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (27 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 29/69] link_path_walk(): simplify stack handling Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 31/69] massage __follow_mount_rcu() a bit Al Viro
                         ` (42 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

set on entry, clear when we get to the last component.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 04c1d798013f..b45ec86dc7b3 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2124,6 +2124,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 	int err;
 
 	nd->last_type = LAST_ROOT;
+	nd->flags |= LOOKUP_PARENT;
 	if (IS_ERR(name))
 		return PTR_ERR(name);
 	while (*name=='/')
@@ -2184,8 +2185,10 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 		if (unlikely(!*name)) {
 OK:
 			/* pathname or trailing symlink, done */
-			if (!depth)
+			if (!depth) {
+				nd->flags &= ~LOOKUP_PARENT;
 				return 0;
+			}
 			/* last component of nested symlink */
 			name = nd->stack[--depth].name;
 			link = walk_component(nd, 0);
@@ -2222,7 +2225,7 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 	if (flags & LOOKUP_RCU)
 		rcu_read_lock();
 
-	nd->flags = flags | LOOKUP_JUMPED | LOOKUP_PARENT;
+	nd->flags = flags | LOOKUP_JUMPED;
 	nd->depth = 0;
 
 	nd->m_seq = __read_seqcount_begin(&mount_lock.seqcount);
@@ -2314,16 +2317,10 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 
 static inline const char *lookup_last(struct nameidata *nd)
 {
-	const char *link;
 	if (nd->last_type == LAST_NORM && nd->last.name[nd->last.len])
 		nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 
-	nd->flags &= ~LOOKUP_PARENT;
-	link = walk_component(nd, WALK_TRAILING);
-	if (link) {
-		nd->flags |= LOOKUP_PARENT;
-	}
-	return link;
+	return walk_component(nd, WALK_TRAILING);
 }
 
 static int handle_lookup_down(struct nameidata *nd)
@@ -3174,7 +3171,6 @@ static const char *do_last(struct nameidata *nd,
 	const char *res;
 	int error;
 
-	nd->flags &= ~LOOKUP_PARENT;
 	nd->flags |= op->intent;
 
 	if (nd->last_type != LAST_NORM) {
@@ -3275,7 +3271,6 @@ static const char *do_last(struct nameidata *nd,
 		put_link(nd);
 	res = step_into(nd, WALK_TRAILING, dentry, inode, seq);
 	if (unlikely(res)) {
-		nd->flags |= LOOKUP_PARENT;
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
 		return res;
 	}
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 31/69] massage __follow_mount_rcu() a bit
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (28 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 30/69] namei: have link_path_walk() maintain LOOKUP_PARENT Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 32/69] new helper: traverse_mounts() Al Viro
                         ` (41 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

make the loop more similar to that in follow_managed(), with
explicit tracking of flags, etc.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 70 +++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 35 insertions(+), 35 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index b45ec86dc7b3..18c46b8db244 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1269,12 +1269,6 @@ int follow_down_one(struct path *path)
 }
 EXPORT_SYMBOL(follow_down_one);
 
-static inline int managed_dentry_rcu(const struct path *path)
-{
-	return (path->dentry->d_flags & DCACHE_MANAGE_TRANSIT) ?
-		path->dentry->d_op->d_manage(path, true) : 0;
-}
-
 /*
  * Try to skip to top of mountpoint pile in rcuwalk mode.  Fail if
  * we meet a managed dentry that would need blocking.
@@ -1282,43 +1276,49 @@ static inline int managed_dentry_rcu(const struct path *path)
 static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
 			       struct inode **inode, unsigned *seqp)
 {
+	struct dentry *dentry = path->dentry;
+	unsigned int flags = dentry->d_flags;
+
+	if (likely(!(flags & DCACHE_MANAGED_DENTRY)))
+		return true;
+
+	if (unlikely(nd->flags & LOOKUP_NO_XDEV))
+		return false;
+
 	for (;;) {
-		struct mount *mounted;
 		/*
 		 * Don't forget we might have a non-mountpoint managed dentry
 		 * that wants to block transit.
 		 */
-		switch (managed_dentry_rcu(path)) {
-		case -ECHILD:
-		default:
-			return false;
-		case -EISDIR:
-			return true;
-		case 0:
-			break;
+		if (unlikely(flags & DCACHE_MANAGE_TRANSIT)) {
+			int res = dentry->d_op->d_manage(path, true);
+			if (res)
+				return res == -EISDIR;
+			flags = dentry->d_flags;
 		}
 
-		if (!d_mountpoint(path->dentry))
-			return !(path->dentry->d_flags & DCACHE_NEED_AUTOMOUNT);
-
-		mounted = __lookup_mnt(path->mnt, path->dentry);
-		if (!mounted)
-			break;
-		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-			return false;
-		path->mnt = &mounted->mnt;
-		path->dentry = mounted->mnt.mnt_root;
-		nd->flags |= LOOKUP_JUMPED;
-		*seqp = read_seqcount_begin(&path->dentry->d_seq);
-		/*
-		 * Update the inode too. We don't need to re-check the
-		 * dentry sequence number here after this d_inode read,
-		 * because a mount-point is always pinned.
-		 */
-		*inode = path->dentry->d_inode;
+		if (flags & DCACHE_MOUNTED) {
+			struct mount *mounted = __lookup_mnt(path->mnt, dentry);
+			if (mounted) {
+				path->mnt = &mounted->mnt;
+				dentry = path->dentry = mounted->mnt.mnt_root;
+				nd->flags |= LOOKUP_JUMPED;
+				*seqp = read_seqcount_begin(&dentry->d_seq);
+				*inode = dentry->d_inode;
+				/*
+				 * We don't need to re-check ->d_seq after this
+				 * ->d_inode read - there will be an RCU delay
+				 * between mount hash removal and ->mnt_root
+				 * becoming unpinned.
+				 */
+				flags = dentry->d_flags;
+				continue;
+			}
+			if (read_seqretry(&mount_lock, nd->m_seq))
+				return false;
+		}
+		return !(flags & DCACHE_NEED_AUTOMOUNT);
 	}
-	return !read_seqretry(&mount_lock, nd->m_seq) &&
-		!(path->dentry->d_flags & DCACHE_NEED_AUTOMOUNT);
 }
 
 static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 32/69] new helper: traverse_mounts()
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (29 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 31/69] massage __follow_mount_rcu() a bit Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 33/69] atomic_open(): return the right dentry in FMODE_OPENED case Al Viro
                         ` (40 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

common guts of follow_down() and follow_managed() taken to a new
helper - traverse_mounts().  The remnants of follow_managed()
are folded into its sole remaining caller (handle_mounts()).
Calling conventions of handle_mounts() slightly sanitized -
instead of the weird "1 for success, -E... for failure" that used
to be imposed by the calling conventions of walk_component() et.al.
we can use the normal "0 for success, -E... for failure".

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 177 +++++++++++++++++++++++++------------------------------------
 1 file changed, 72 insertions(+), 105 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 18c46b8db244..40d5f7abfa54 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1168,91 +1168,79 @@ static int follow_automount(struct path *path, int *count, unsigned lookup_flags
 }
 
 /*
- * Handle a dentry that is managed in some way.
- * - Flagged for transit management (autofs)
- * - Flagged as mountpoint
- * - Flagged as automount point
- *
- * This may only be called in refwalk mode.
- * On success path->dentry is known positive.
- *
- * Serialization is taken care of in namespace.c
+ * mount traversal - out-of-line part.  One note on ->d_flags accesses -
+ * dentries are pinned but not locked here, so negative dentry can go
+ * positive right under us.  Use of smp_load_acquire() provides a barrier
+ * sufficient for ->d_inode and ->d_flags consistency.
  */
-static int follow_managed(struct path *path, struct nameidata *nd)
+static int __traverse_mounts(struct path *path, unsigned flags, bool *jumped,
+			     int *count, unsigned lookup_flags)
 {
-	struct vfsmount *mnt = path->mnt; /* held by caller, must be left alone */
-	unsigned flags;
+	struct vfsmount *mnt = path->mnt;
 	bool need_mntput = false;
 	int ret = 0;
 
-	/* Given that we're not holding a lock here, we retain the value in a
-	 * local variable for each dentry as we look at it so that we don't see
-	 * the components of that value change under us */
-	while (flags = smp_load_acquire(&path->dentry->d_flags),
-	       unlikely(flags & DCACHE_MANAGED_DENTRY)) {
+	while (flags & DCACHE_MANAGED_DENTRY) {
 		/* Allow the filesystem to manage the transit without i_mutex
 		 * being held. */
 		if (flags & DCACHE_MANAGE_TRANSIT) {
-			BUG_ON(!path->dentry->d_op);
-			BUG_ON(!path->dentry->d_op->d_manage);
 			ret = path->dentry->d_op->d_manage(path, false);
 			flags = smp_load_acquire(&path->dentry->d_flags);
 			if (ret < 0)
 				break;
 		}
 
-		/* Transit to a mounted filesystem. */
-		if (flags & DCACHE_MOUNTED) {
+		if (flags & DCACHE_MOUNTED) {	// something's mounted on it..
 			struct vfsmount *mounted = lookup_mnt(path);
-			if (mounted) {
+			if (mounted) {		// ... in our namespace
 				dput(path->dentry);
 				if (need_mntput)
 					mntput(path->mnt);
 				path->mnt = mounted;
 				path->dentry = dget(mounted->mnt_root);
+				// here we know it's positive
+				flags = path->dentry->d_flags;
 				need_mntput = true;
 				continue;
 			}
-
-			/* Something is mounted on this dentry in another
-			 * namespace and/or whatever was mounted there in this
-			 * namespace got unmounted before lookup_mnt() could
-			 * get it */
 		}
 
-		/* Handle an automount point */
-		if (flags & DCACHE_NEED_AUTOMOUNT) {
-			ret = follow_automount(path, &nd->total_link_count,
-						nd->flags);
-			if (ret < 0)
-				break;
-			continue;
-		}
+		if (!(flags & DCACHE_NEED_AUTOMOUNT))
+			break;
 
-		/* We didn't change the current path point */
-		break;
+		// uncovered automount point
+		ret = follow_automount(path, count, lookup_flags);
+		flags = smp_load_acquire(&path->dentry->d_flags);
+		if (ret < 0)
+			break;
 	}
 
-	if (need_mntput) {
-		if (path->mnt == mnt)
-			mntput(path->mnt);
-		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-			ret = -EXDEV;
-		else
-			nd->flags |= LOOKUP_JUMPED;
-	}
-	if (ret == -EISDIR || !ret)
-		ret = 1;
-	if (ret > 0 && unlikely(d_flags_negative(flags)))
+	if (ret == -EISDIR)
+		ret = 0;
+	// possible if you race with several mount --move
+	if (need_mntput && path->mnt == mnt)
+		mntput(path->mnt);
+	if (!ret && unlikely(d_flags_negative(flags)))
 		ret = -ENOENT;
-	if (unlikely(ret < 0)) {
-		dput(path->dentry);
-		if (path->mnt != nd->path.mnt)
-			mntput(path->mnt);
-	}
+	*jumped = need_mntput;
 	return ret;
 }
 
+static inline int traverse_mounts(struct path *path, bool *jumped,
+				  int *count, unsigned lookup_flags)
+{
+	unsigned flags = smp_load_acquire(&path->dentry->d_flags);
+
+	/* fastpath */
+	if (likely(!(flags & DCACHE_MANAGED_DENTRY))) {
+		*jumped = false;
+		if (unlikely(d_flags_negative(flags)))
+			return -ENOENT;
+		return 0;
+	}
+	return __traverse_mounts(path, flags, jumped, count, lookup_flags);
+}
+
 int follow_down_one(struct path *path)
 {
 	struct vfsmount *mounted;
@@ -1270,6 +1258,23 @@ int follow_down_one(struct path *path)
 EXPORT_SYMBOL(follow_down_one);
 
 /*
+ * Follow down to the covering mount currently visible to userspace.  At each
+ * point, the filesystem owning that dentry may be queried as to whether the
+ * caller is permitted to proceed or not.
+ */
+int follow_down(struct path *path)
+{
+	struct vfsmount *mnt = path->mnt;
+	bool jumped;
+	int ret = traverse_mounts(path, &jumped, NULL, 0);
+
+	if (path->mnt != mnt)
+		mntput(mnt);
+	return ret;
+}
+EXPORT_SYMBOL(follow_down);
+
+/*
  * Try to skip to top of mountpoint pile in rcuwalk mode.  Fail if
  * we meet a managed dentry that would need blocking.
  */
@@ -1325,6 +1330,7 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 			  struct path *path, struct inode **inode,
 			  unsigned int *seqp)
 {
+	bool jumped;
 	int ret;
 
 	path->mnt = nd->path.mnt;
@@ -1334,15 +1340,25 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 		if (unlikely(!*inode))
 			return -ENOENT;
 		if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
-			return 1;
+			return 0;
 		if (unlazy_child(nd, dentry, seq))
 			return -ECHILD;
 		// *path might've been clobbered by __follow_mount_rcu()
 		path->mnt = nd->path.mnt;
 		path->dentry = dentry;
 	}
-	ret = follow_managed(path, nd);
-	if (likely(ret >= 0)) {
+	ret = traverse_mounts(path, &jumped, &nd->total_link_count, nd->flags);
+	if (jumped) {
+		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
+			ret = -EXDEV;
+		else
+			nd->flags |= LOOKUP_JUMPED;
+	}
+	if (unlikely(ret)) {
+		dput(path->dentry);
+		if (path->mnt != nd->path.mnt)
+			mntput(path->mnt);
+	} else {
 		*inode = d_backing_inode(path->dentry);
 		*seqp = 0; /* out of RCU mode, so the value doesn't matter */
 	}
@@ -1411,55 +1427,6 @@ static int follow_dotdot_rcu(struct nameidata *nd)
 }
 
 /*
- * Follow down to the covering mount currently visible to userspace.  At each
- * point, the filesystem owning that dentry may be queried as to whether the
- * caller is permitted to proceed or not.
- */
-int follow_down(struct path *path)
-{
-	unsigned managed;
-	int ret;
-
-	while (managed = READ_ONCE(path->dentry->d_flags),
-	       unlikely(managed & DCACHE_MANAGED_DENTRY)) {
-		/* Allow the filesystem to manage the transit without i_mutex
-		 * being held.
-		 *
-		 * We indicate to the filesystem if someone is trying to mount
-		 * something here.  This gives autofs the chance to deny anyone
-		 * other than its daemon the right to mount on its
-		 * superstructure.
-		 *
-		 * The filesystem may sleep at this point.
-		 */
-		if (managed & DCACHE_MANAGE_TRANSIT) {
-			BUG_ON(!path->dentry->d_op);
-			BUG_ON(!path->dentry->d_op->d_manage);
-			ret = path->dentry->d_op->d_manage(path, false);
-			if (ret < 0)
-				return ret == -EISDIR ? 0 : ret;
-		}
-
-		/* Transit to a mounted filesystem. */
-		if (managed & DCACHE_MOUNTED) {
-			struct vfsmount *mounted = lookup_mnt(path);
-			if (!mounted)
-				break;
-			dput(path->dentry);
-			mntput(path->mnt);
-			path->mnt = mounted;
-			path->dentry = dget(mounted->mnt_root);
-			continue;
-		}
-
-		/* Don't handle automount points here */
-		break;
-	}
-	return 0;
-}
-EXPORT_SYMBOL(follow_down);
-
-/*
  * Skip to top of mountpoint pile in refwalk mode for follow_dotdot()
  */
 static void follow_mount(struct path *path)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 33/69] atomic_open(): return the right dentry in FMODE_OPENED case
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (30 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 32/69] new helper: traverse_mounts() Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 34/69] atomic_open(): lift the call of may_open() into do_last() Al Viro
                         ` (39 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

->atomic_open() might have used a different alias than the one we'd
passed to it; in "not opened" case we take care of that, in "opened"
one we don't.  Currently we don't care downstream of "opened" case
which alias to return; however, that will change shortly when we
get to unifying may_open() calls.

It's not hard to get right in all cases, anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index 40d5f7abfa54..88f985aff4f8 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2949,11 +2949,15 @@ static struct dentry *atomic_open(struct nameidata *nd, struct dentry *dentry,
 	d_lookup_done(dentry);
 	if (!error) {
 		if (file->f_mode & FMODE_OPENED) {
+			int acc_mode = op->acc_mode;
+			if (unlikely(dentry != file->f_path.dentry)) {
+				dput(dentry);
+				dentry = dget(file->f_path.dentry);
+			}
 			/*
 			 * We didn't have the inode before the open, so check open
 			 * permission here.
 			 */
-			int acc_mode = op->acc_mode;
 			if (file->f_mode & FMODE_CREATED) {
 				WARN_ON(!(open_flag & O_CREAT));
 				fsnotify_create(dir, dentry);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 34/69] atomic_open(): lift the call of may_open() into do_last()
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (31 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 33/69] atomic_open(): return the right dentry in FMODE_OPENED case Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 35/69] do_last(): merge the may_open() calls Al Viro
                         ` (38 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

there we'll be able to merge it with its counterparts in other
cases, and there's no reason to do it before the parent has
been unlocked

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 88f985aff4f8..7e932d9a71a9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2949,23 +2949,12 @@ static struct dentry *atomic_open(struct nameidata *nd, struct dentry *dentry,
 	d_lookup_done(dentry);
 	if (!error) {
 		if (file->f_mode & FMODE_OPENED) {
-			int acc_mode = op->acc_mode;
 			if (unlikely(dentry != file->f_path.dentry)) {
 				dput(dentry);
 				dentry = dget(file->f_path.dentry);
 			}
-			/*
-			 * We didn't have the inode before the open, so check open
-			 * permission here.
-			 */
-			if (file->f_mode & FMODE_CREATED) {
-				WARN_ON(!(open_flag & O_CREAT));
+			if (file->f_mode & FMODE_CREATED)
 				fsnotify_create(dir, dentry);
-				acc_mode = 0;
-			}
-			error = may_open(&file->f_path, acc_mode, open_flag);
-			if (WARN_ON(error > 0))
-				error = -EINVAL;
 		} else if (WARN_ON(file->f_path.dentry == DENTRY_NOT_SET)) {
 			error = -EIO;
 		} else {
@@ -3208,12 +3197,19 @@ static const char *do_last(struct nameidata *nd,
 	}
 
 	if (file->f_mode & FMODE_OPENED) {
-		if ((file->f_mode & FMODE_CREATED) ||
-		    !S_ISREG(file_inode(file)->i_mode))
+		if (file->f_mode & FMODE_CREATED) {
+			open_flag &= ~O_TRUNC;
+			will_truncate = false;
+			acc_mode = 0;
+		} else if (!S_ISREG(file_inode(file)->i_mode))
 			will_truncate = false;
 
 		audit_inode(nd->name, file->f_path.dentry, 0);
-		dput(dentry);
+		dput(nd->path.dentry);
+		nd->path.dentry = dentry;
+		error = may_open(&nd->path, acc_mode, open_flag);
+		if (error)
+			goto out;
 		goto opened;
 	}
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 35/69] do_last(): merge the may_open() calls
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (32 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 34/69] atomic_open(): lift the call of may_open() into do_last() Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 36/69] do_last(): don't bother with keeping got_write in FMODE_OPENED case Al Viro
                         ` (37 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

have FMODE_OPENED case rejoin the main path at earlier point

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 7e932d9a71a9..2f8a5d3be784 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3207,10 +3207,7 @@ static const char *do_last(struct nameidata *nd,
 		audit_inode(nd->name, file->f_path.dentry, 0);
 		dput(nd->path.dentry);
 		nd->path.dentry = dentry;
-		error = may_open(&nd->path, acc_mode, open_flag);
-		if (error)
-			goto out;
-		goto opened;
+		goto finish_open_created;
 	}
 
 	if (file->f_mode & FMODE_CREATED) {
@@ -3277,11 +3274,10 @@ static const char *do_last(struct nameidata *nd,
 	error = may_open(&nd->path, acc_mode, open_flag);
 	if (error)
 		goto out;
-	BUG_ON(file->f_mode & FMODE_OPENED); /* once it's opened, it's opened */
-	error = vfs_open(&nd->path, file);
+	if (!(file->f_mode & FMODE_OPENED))
+		error = vfs_open(&nd->path, file);
 	if (error)
 		goto out;
-opened:
 	error = ima_file_check(file, op->acc_mode);
 	if (!error && will_truncate)
 		error = handle_truncate(file);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 36/69] do_last(): don't bother with keeping got_write in FMODE_OPENED case
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (33 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 35/69] do_last(): merge the may_open() calls Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 37/69] do_last(): rejoing the common path earlier in FMODE_{OPENED,CREATED} case Al Viro
                         ` (36 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

it's easier to drop it right after lookup_open() and regain if
needed (i.e. if we will need to truncate).  On the non-FMODE_OPENED
path we do that anyway.  In case of FMODE_CREATED we won't be
needing it.  And it's easier to prove correctness that way,
especially since the initial failure to get write access is not
always fatal; proving that we'll never end up truncating in that
case is rather convoluted.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 31 +++++++++++--------------------
 1 file changed, 11 insertions(+), 20 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 2f8a5d3be784..2eef41f505db 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3191,11 +3191,14 @@ static const char *do_last(struct nameidata *nd,
 	else
 		inode_unlock_shared(dir->d_inode);
 
-	if (IS_ERR(dentry)) {
-		error = PTR_ERR(dentry);
-		goto out;
+	if (got_write) {
+		mnt_drop_write(nd->path.mnt);
+		got_write = false;
 	}
 
+	if (IS_ERR(dentry))
+		return ERR_CAST(dentry);
+
 	if (file->f_mode & FMODE_OPENED) {
 		if (file->f_mode & FMODE_CREATED) {
 			open_flag &= ~O_TRUNC;
@@ -3220,16 +3223,6 @@ static const char *do_last(struct nameidata *nd,
 		goto finish_open_created;
 	}
 
-	/*
-	 * If atomic_open() acquired write access it is dropped now due to
-	 * possible mount and symlink following (this might be optimized away if
-	 * necessary...)
-	 */
-	if (got_write) {
-		mnt_drop_write(nd->path.mnt);
-		got_write = false;
-	}
-
 finish_lookup:
 	if (nd->depth)
 		put_link(nd);
@@ -3250,27 +3243,25 @@ static const char *do_last(struct nameidata *nd,
 		return ERR_PTR(error);
 	audit_inode(nd->name, nd->path.dentry, 0);
 	if (open_flag & O_CREAT) {
-		error = -EISDIR;
 		if (d_is_dir(nd->path.dentry))
-			goto out;
+			return ERR_PTR(-EISDIR);
 		error = may_create_in_sticky(dir_mode, dir_uid,
 					     d_backing_inode(nd->path.dentry));
 		if (unlikely(error))
-			goto out;
+			return ERR_PTR(error);
 	}
-	error = -ENOTDIR;
 	if ((nd->flags & LOOKUP_DIRECTORY) && !d_can_lookup(nd->path.dentry))
-		goto out;
+		return ERR_PTR(-ENOTDIR);
 	if (!d_is_reg(nd->path.dentry))
 		will_truncate = false;
 
+finish_open_created:
 	if (will_truncate) {
 		error = mnt_want_write(nd->path.mnt);
 		if (error)
-			goto out;
+			return ERR_PTR(error);
 		got_write = true;
 	}
-finish_open_created:
 	error = may_open(&nd->path, acc_mode, open_flag);
 	if (error)
 		goto out;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 37/69] do_last(): rejoing the common path earlier in FMODE_{OPENED,CREATED} case
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (34 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 36/69] do_last(): don't bother with keeping got_write in FMODE_OPENED case Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 38/69] do_last(): simplify the liveness analysis past finish_open_created Al Viro
                         ` (35 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 2eef41f505db..c85fdfd6b33d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3200,13 +3200,6 @@ static const char *do_last(struct nameidata *nd,
 		return ERR_CAST(dentry);
 
 	if (file->f_mode & FMODE_OPENED) {
-		if (file->f_mode & FMODE_CREATED) {
-			open_flag &= ~O_TRUNC;
-			will_truncate = false;
-			acc_mode = 0;
-		} else if (!S_ISREG(file_inode(file)->i_mode))
-			will_truncate = false;
-
 		audit_inode(nd->name, file->f_path.dentry, 0);
 		dput(nd->path.dentry);
 		nd->path.dentry = dentry;
@@ -3214,10 +3207,6 @@ static const char *do_last(struct nameidata *nd,
 	}
 
 	if (file->f_mode & FMODE_CREATED) {
-		/* Don't check for write permission, don't truncate */
-		open_flag &= ~O_TRUNC;
-		will_truncate = false;
-		acc_mode = 0;
 		dput(nd->path.dentry);
 		nd->path.dentry = dentry;
 		goto finish_open_created;
@@ -3252,10 +3241,16 @@ static const char *do_last(struct nameidata *nd,
 	}
 	if ((nd->flags & LOOKUP_DIRECTORY) && !d_can_lookup(nd->path.dentry))
 		return ERR_PTR(-ENOTDIR);
-	if (!d_is_reg(nd->path.dentry))
-		will_truncate = false;
 
 finish_open_created:
+	if (file->f_mode & FMODE_CREATED) {
+		/* Don't check for write permission, don't truncate */
+		open_flag &= ~O_TRUNC;
+		will_truncate = false;
+		acc_mode = 0;
+	} else if (!d_is_reg(nd->path.dentry)) {
+		will_truncate = false;
+	}
 	if (will_truncate) {
 		error = mnt_want_write(nd->path.mnt);
 		if (error)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 38/69] do_last(): simplify the liveness analysis past finish_open_created
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (35 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 37/69] do_last(): rejoing the common path earlier in FMODE_{OPENED,CREATED} case Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 39/69] do_last(): rejoin the common path even earlier in FMODE_{OPENED,CREATED} case Al Viro
                         ` (34 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Don't mess with got_write there - it is guaranteed to be false on
entry and it will be set true if and only if we decide to go for
truncation and manage to get write access for that.

Don't carry acc_mode through the entire thing - it's only used
in that part.  And don't bother with gotos in there - compiler is
quite capable of optimizing that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 28 +++++++++++-----------------
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c85fdfd6b33d..8cdf8ef41194 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3122,9 +3122,9 @@ static const char *do_last(struct nameidata *nd,
 	kuid_t dir_uid = nd->inode->i_uid;
 	umode_t dir_mode = nd->inode->i_mode;
 	int open_flag = op->open_flag;
-	bool will_truncate = (open_flag & O_TRUNC) != 0;
+	bool do_truncate;
 	bool got_write = false;
-	int acc_mode = op->acc_mode;
+	int acc_mode;
 	unsigned seq;
 	struct inode *inode;
 	struct dentry *dentry;
@@ -3243,36 +3243,30 @@ static const char *do_last(struct nameidata *nd,
 		return ERR_PTR(-ENOTDIR);
 
 finish_open_created:
+	do_truncate = false;
+	acc_mode = op->acc_mode;
 	if (file->f_mode & FMODE_CREATED) {
 		/* Don't check for write permission, don't truncate */
 		open_flag &= ~O_TRUNC;
-		will_truncate = false;
 		acc_mode = 0;
-	} else if (!d_is_reg(nd->path.dentry)) {
-		will_truncate = false;
-	}
-	if (will_truncate) {
+	} else if (d_is_reg(nd->path.dentry) && open_flag & O_TRUNC) {
 		error = mnt_want_write(nd->path.mnt);
 		if (error)
 			return ERR_PTR(error);
-		got_write = true;
+		do_truncate = true;
 	}
 	error = may_open(&nd->path, acc_mode, open_flag);
-	if (error)
-		goto out;
-	if (!(file->f_mode & FMODE_OPENED))
+	if (!error && !(file->f_mode & FMODE_OPENED))
 		error = vfs_open(&nd->path, file);
-	if (error)
-		goto out;
-	error = ima_file_check(file, op->acc_mode);
-	if (!error && will_truncate)
+	if (!error)
+		error = ima_file_check(file, op->acc_mode);
+	if (!error && do_truncate)
 		error = handle_truncate(file);
-out:
 	if (unlikely(error > 0)) {
 		WARN_ON(1);
 		error = -EINVAL;
 	}
-	if (got_write)
+	if (do_truncate)
 		mnt_drop_write(nd->path.mnt);
 	return ERR_PTR(error);
 }
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 39/69] do_last(): rejoin the common path even earlier in FMODE_{OPENED,CREATED} case
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (36 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 38/69] do_last(): simplify the liveness analysis past finish_open_created Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 40/69] split the lookup-related parts of do_last() into a separate helper Al Viro
                         ` (33 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

... getting may_create_in_sticky() checks in FMODE_OPENED case as well.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 8cdf8ef41194..798672577367 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3199,14 +3199,7 @@ static const char *do_last(struct nameidata *nd,
 	if (IS_ERR(dentry))
 		return ERR_CAST(dentry);
 
-	if (file->f_mode & FMODE_OPENED) {
-		audit_inode(nd->name, file->f_path.dentry, 0);
-		dput(nd->path.dentry);
-		nd->path.dentry = dentry;
-		goto finish_open_created;
-	}
-
-	if (file->f_mode & FMODE_CREATED) {
+	if (file->f_mode & (FMODE_OPENED | FMODE_CREATED)) {
 		dput(nd->path.dentry);
 		nd->path.dentry = dentry;
 		goto finish_open_created;
@@ -3230,7 +3223,9 @@ static const char *do_last(struct nameidata *nd,
 	error = complete_walk(nd);
 	if (error)
 		return ERR_PTR(error);
-	audit_inode(nd->name, nd->path.dentry, 0);
+finish_open_created:
+	if (!(file->f_mode & FMODE_CREATED))
+		audit_inode(nd->name, nd->path.dentry, 0);
 	if (open_flag & O_CREAT) {
 		if (d_is_dir(nd->path.dentry))
 			return ERR_PTR(-EISDIR);
@@ -3242,7 +3237,6 @@ static const char *do_last(struct nameidata *nd,
 	if ((nd->flags & LOOKUP_DIRECTORY) && !d_can_lookup(nd->path.dentry))
 		return ERR_PTR(-ENOTDIR);
 
-finish_open_created:
 	do_truncate = false;
 	acc_mode = op->acc_mode;
 	if (file->f_mode & FMODE_CREATED) {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 40/69] split the lookup-related parts of do_last() into a separate helper
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (37 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 39/69] do_last(): rejoin the common path even earlier in FMODE_{OPENED,CREATED} case Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 41/69] path_connected(): pass mount and dentry separately Al Viro
                         ` (32 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 51 +++++++++++++++++++++++++++++----------------------
 1 file changed, 29 insertions(+), 22 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 798672577367..6530e3cbd486 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3112,19 +3112,12 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 	return ERR_PTR(error);
 }
 
-/*
- * Handle the last step of open()
- */
-static const char *do_last(struct nameidata *nd,
+static const char *open_last_lookups(struct nameidata *nd,
 		   struct file *file, const struct open_flags *op)
 {
 	struct dentry *dir = nd->path.dentry;
-	kuid_t dir_uid = nd->inode->i_uid;
-	umode_t dir_mode = nd->inode->i_mode;
 	int open_flag = op->open_flag;
-	bool do_truncate;
 	bool got_write = false;
-	int acc_mode;
 	unsigned seq;
 	struct inode *inode;
 	struct dentry *dentry;
@@ -3137,9 +3130,9 @@ static const char *do_last(struct nameidata *nd,
 		if (nd->depth)
 			put_link(nd);
 		error = handle_dots(nd, nd->last_type);
-		if (unlikely(error))
-			return ERR_PTR(error);
-		goto finish_open;
+		if (likely(!error))
+			error = complete_walk(nd);
+		return ERR_PTR(error);
 	}
 
 	if (!(open_flag & O_CREAT)) {
@@ -3152,7 +3145,6 @@ static const char *do_last(struct nameidata *nd,
 		if (likely(dentry))
 			goto finish_lookup;
 
-		BUG_ON(nd->inode != dir->d_inode);
 		BUG_ON(nd->flags & LOOKUP_RCU);
 	} else {
 		/* create side of things */
@@ -3162,7 +3154,7 @@ static const char *do_last(struct nameidata *nd,
 		 * about to look up
 		 */
 		error = complete_walk(nd);
-		if (error)
+		if (unlikely(error))
 			return ERR_PTR(error);
 
 		audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
@@ -3191,10 +3183,8 @@ static const char *do_last(struct nameidata *nd,
 	else
 		inode_unlock_shared(dir->d_inode);
 
-	if (got_write) {
+	if (got_write)
 		mnt_drop_write(nd->path.mnt);
-		got_write = false;
-	}
 
 	if (IS_ERR(dentry))
 		return ERR_CAST(dentry);
@@ -3202,7 +3192,7 @@ static const char *do_last(struct nameidata *nd,
 	if (file->f_mode & (FMODE_OPENED | FMODE_CREATED)) {
 		dput(nd->path.dentry);
 		nd->path.dentry = dentry;
-		goto finish_open_created;
+		return NULL;
 	}
 
 finish_lookup:
@@ -3218,12 +3208,29 @@ static const char *do_last(struct nameidata *nd,
 		audit_inode(nd->name, nd->path.dentry, 0);
 		return ERR_PTR(-EEXIST);
 	}
-finish_open:
+
 	/* Why this, you ask?  _Now_ we might have grown LOOKUP_JUMPED... */
-	error = complete_walk(nd);
-	if (error)
-		return ERR_PTR(error);
-finish_open_created:
+	return ERR_PTR(complete_walk(nd));
+}
+
+/*
+ * Handle the last step of open()
+ */
+static const char *do_last(struct nameidata *nd,
+		   struct file *file, const struct open_flags *op)
+{
+	kuid_t dir_uid = nd->inode->i_uid;
+	umode_t dir_mode = nd->inode->i_mode;
+	int open_flag = op->open_flag;
+	bool do_truncate;
+	int acc_mode;
+	const char *link;
+	int error;
+
+	link = open_last_lookups(nd, file, op);
+	if (unlikely(link))
+		return link;
+
 	if (!(file->f_mode & FMODE_CREATED))
 		audit_inode(nd->name, nd->path.dentry, 0);
 	if (open_flag & O_CREAT) {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 41/69] path_connected(): pass mount and dentry separately
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (38 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 40/69] split the lookup-related parts of do_last() into a separate helper Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 42/69] path_parent_directory(): leave changing path->dentry to callers Al Viro
                         ` (31 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

eventually we'll want to do that check *before* mangling
nd->path.dentry...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 6530e3cbd486..2bf9f605c46f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -550,22 +550,20 @@ static int __nd_alloc_stack(struct nameidata *nd)
 }
 
 /**
- * path_connected - Verify that a path->dentry is below path->mnt.mnt_root
- * @path: nameidate to verify
+ * path_connected - Verify that a dentry is below mnt.mnt_root
  *
  * Rename can sometimes move a file or directory outside of a bind
  * mount, path_connected allows those cases to be detected.
  */
-static bool path_connected(const struct path *path)
+static bool path_connected(struct vfsmount *mnt, struct dentry *dentry)
 {
-	struct vfsmount *mnt = path->mnt;
 	struct super_block *sb = mnt->mnt_sb;
 
 	/* Bind mounts and multi-root filesystems can have disconnected paths */
 	if (!(sb->s_iflags & SB_I_MULTIROOT) && (mnt->mnt_root == sb->s_root))
 		return true;
 
-	return is_subdir(path->dentry, mnt->mnt_root);
+	return is_subdir(dentry, mnt->mnt_root);
 }
 
 static inline int nd_alloc_stack(struct nameidata *nd)
@@ -1386,7 +1384,7 @@ static int follow_dotdot_rcu(struct nameidata *nd)
 				return -ECHILD;
 			nd->path.dentry = parent;
 			nd->seq = seq;
-			if (unlikely(!path_connected(&nd->path)))
+			if (unlikely(!path_connected(nd->path.mnt, parent)))
 				return -ECHILD;
 			break;
 		} else {
@@ -1448,7 +1446,7 @@ static int path_parent_directory(struct path *path)
 	/* rare case of legitimate dget_parent()... */
 	path->dentry = dget_parent(path->dentry);
 	dput(old);
-	if (unlikely(!path_connected(path)))
+	if (unlikely(!path_connected(path->mnt, path->dentry)))
 		return -ENOENT;
 	return 0;
 }
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 42/69] path_parent_directory(): leave changing path->dentry to callers
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (39 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 41/69] path_connected(): pass mount and dentry separately Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 43/69] expand path_parent_directory() in its callers Al Viro
                         ` (30 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Instead of returning 0, return new dentry; instead of returning
-ENOENT, return NULL.  Adjust the callers accordingly.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 2bf9f605c46f..49b2a08105c7 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1440,19 +1440,21 @@ static void follow_mount(struct path *path)
 	}
 }
 
-static int path_parent_directory(struct path *path)
+static struct dentry *path_parent_directory(struct path *path)
 {
-	struct dentry *old = path->dentry;
 	/* rare case of legitimate dget_parent()... */
-	path->dentry = dget_parent(path->dentry);
-	dput(old);
-	if (unlikely(!path_connected(path->mnt, path->dentry)))
-		return -ENOENT;
-	return 0;
+	struct dentry *parent = dget_parent(path->dentry);
+
+	if (unlikely(!path_connected(path->mnt, parent))) {
+		dput(parent);
+		parent = NULL;
+	}
+	return parent;
 }
 
 static int follow_dotdot(struct nameidata *nd)
 {
+	struct dentry *parent;
 	while (1) {
 		if (path_equal(&nd->path, &nd->root)) {
 			if (unlikely(nd->flags & LOOKUP_BENEATH))
@@ -1460,9 +1462,11 @@ static int follow_dotdot(struct nameidata *nd)
 			break;
 		}
 		if (nd->path.dentry != nd->path.mnt->mnt_root) {
-			int ret = path_parent_directory(&nd->path);
-			if (ret)
-				return ret;
+			parent = path_parent_directory(&nd->path);
+			if (!parent)
+				return -ENOENT;
+			dput(nd->path.dentry);
+			nd->path.dentry = parent;
 			break;
 		}
 		if (!follow_up(&nd->path))
@@ -2600,13 +2604,13 @@ int path_pts(struct path *path)
 	 */
 	struct dentry *child, *parent;
 	struct qstr this;
-	int ret;
 
-	ret = path_parent_directory(path);
-	if (ret)
-		return ret;
+	parent = path_parent_directory(path);
+	if (!parent)
+		return -ENOENT;
 
-	parent = path->dentry;
+	dput(path->dentry);
+	path->dentry = parent;
 	this.name = "pts";
 	this.len = 3;
 	child = d_hash_and_lookup(parent, &this);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 43/69] expand path_parent_directory() in its callers
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (40 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 42/69] path_parent_directory(): leave changing path->dentry to callers Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 44/69] follow_dotdot{,_rcu}(): lift switching nd->path to parent out of loop Al Viro
                         ` (29 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 29 +++++++++++------------------
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 49b2a08105c7..88e5d7920540 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1440,18 +1440,6 @@ static void follow_mount(struct path *path)
 	}
 }
 
-static struct dentry *path_parent_directory(struct path *path)
-{
-	/* rare case of legitimate dget_parent()... */
-	struct dentry *parent = dget_parent(path->dentry);
-
-	if (unlikely(!path_connected(path->mnt, parent))) {
-		dput(parent);
-		parent = NULL;
-	}
-	return parent;
-}
-
 static int follow_dotdot(struct nameidata *nd)
 {
 	struct dentry *parent;
@@ -1462,9 +1450,13 @@ static int follow_dotdot(struct nameidata *nd)
 			break;
 		}
 		if (nd->path.dentry != nd->path.mnt->mnt_root) {
-			parent = path_parent_directory(&nd->path);
-			if (!parent)
+			/* rare case of legitimate dget_parent()... */
+			parent = dget_parent(nd->path.dentry);
+
+			if (unlikely(!path_connected(nd->path.mnt, parent))) {
+				dput(parent);
 				return -ENOENT;
+			}
 			dput(nd->path.dentry);
 			nd->path.dentry = parent;
 			break;
@@ -2602,13 +2594,14 @@ int path_pts(struct path *path)
 	/* Find something mounted on "pts" in the same directory as
 	 * the input path.
 	 */
-	struct dentry *child, *parent;
+	struct dentry *parent = dget_parent(path->dentry);
+	struct dentry *child;
 	struct qstr this;
 
-	parent = path_parent_directory(path);
-	if (!parent)
+	if (unlikely(!path_connected(path->mnt, parent))) {
+		dput(parent);
 		return -ENOENT;
-
+	}
 	dput(path->dentry);
 	path->dentry = parent;
 	this.name = "pts";
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 44/69] follow_dotdot{,_rcu}(): lift switching nd->path to parent out of loop
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (41 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 43/69] expand path_parent_directory() in its callers Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 45/69] follow_dotdot{,_rcu}(): lift LOOKUP_BENEATH checks " Al Viro
                         ` (28 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 88e5d7920540..772d82daf3b4 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1365,7 +1365,9 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 
 static int follow_dotdot_rcu(struct nameidata *nd)
 {
+	struct dentry *parent = NULL;
 	struct inode *inode = nd->inode;
+	unsigned seq;
 
 	while (1) {
 		if (path_equal(&nd->path, &nd->root)) {
@@ -1375,15 +1377,12 @@ static int follow_dotdot_rcu(struct nameidata *nd)
 		}
 		if (nd->path.dentry != nd->path.mnt->mnt_root) {
 			struct dentry *old = nd->path.dentry;
-			struct dentry *parent = old->d_parent;
-			unsigned seq;
 
+			parent = old->d_parent;
 			inode = parent->d_inode;
 			seq = read_seqcount_begin(&parent->d_seq);
 			if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
 				return -ECHILD;
-			nd->path.dentry = parent;
-			nd->seq = seq;
 			if (unlikely(!path_connected(nd->path.mnt, parent)))
 				return -ECHILD;
 			break;
@@ -1406,6 +1405,10 @@ static int follow_dotdot_rcu(struct nameidata *nd)
 			nd->seq = seq;
 		}
 	}
+	if (likely(parent)) {
+		nd->path.dentry = parent;
+		nd->seq = seq;
+	}
 	while (unlikely(d_mountpoint(nd->path.dentry))) {
 		struct mount *mounted;
 		mounted = __lookup_mnt(nd->path.mnt, nd->path.dentry);
@@ -1442,7 +1445,7 @@ static void follow_mount(struct path *path)
 
 static int follow_dotdot(struct nameidata *nd)
 {
-	struct dentry *parent;
+	struct dentry *parent = NULL;
 	while (1) {
 		if (path_equal(&nd->path, &nd->root)) {
 			if (unlikely(nd->flags & LOOKUP_BENEATH))
@@ -1452,13 +1455,10 @@ static int follow_dotdot(struct nameidata *nd)
 		if (nd->path.dentry != nd->path.mnt->mnt_root) {
 			/* rare case of legitimate dget_parent()... */
 			parent = dget_parent(nd->path.dentry);
-
 			if (unlikely(!path_connected(nd->path.mnt, parent))) {
 				dput(parent);
 				return -ENOENT;
 			}
-			dput(nd->path.dentry);
-			nd->path.dentry = parent;
 			break;
 		}
 		if (!follow_up(&nd->path))
@@ -1466,6 +1466,10 @@ static int follow_dotdot(struct nameidata *nd)
 		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
 			return -EXDEV;
 	}
+	if (likely(parent)) {
+		dput(nd->path.dentry);
+		nd->path.dentry = parent;
+	}
 	follow_mount(&nd->path);
 	nd->inode = nd->path.dentry->d_inode;
 	return 0;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 45/69] follow_dotdot{,_rcu}(): lift LOOKUP_BENEATH checks out of loop
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (42 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 44/69] follow_dotdot{,_rcu}(): lift switching nd->path to parent out of loop Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 46/69] move handle_dots(), follow_dotdot() and follow_dotdot_rcu() past step_into() Al Viro
                         ` (27 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Behaviour change: LOOKUP_BENEATH lookup of .. in absolute root
yields an error even if it's not the process' root.  That's
possible only if you'd managed to escape chroot jail by way of
procfs symlinks, but IMO the resulting behaviour is not worse -
more consistent and easier to describe:
	".." in root is "stay where you are", uness LOOKUP_BENEATH
	has been given, in which case it's "fail with EXDEV".

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 772d82daf3b4..577dc541a4d4 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1370,11 +1370,8 @@ static int follow_dotdot_rcu(struct nameidata *nd)
 	unsigned seq;
 
 	while (1) {
-		if (path_equal(&nd->path, &nd->root)) {
-			if (unlikely(nd->flags & LOOKUP_BENEATH))
-				return -ECHILD;
+		if (path_equal(&nd->path, &nd->root))
 			break;
-		}
 		if (nd->path.dentry != nd->path.mnt->mnt_root) {
 			struct dentry *old = nd->path.dentry;
 
@@ -1405,7 +1402,10 @@ static int follow_dotdot_rcu(struct nameidata *nd)
 			nd->seq = seq;
 		}
 	}
-	if (likely(parent)) {
+	if (unlikely(!parent)) {
+		if (unlikely(nd->flags & LOOKUP_BENEATH))
+			return -ECHILD;
+	} else {
 		nd->path.dentry = parent;
 		nd->seq = seq;
 	}
@@ -1447,11 +1447,8 @@ static int follow_dotdot(struct nameidata *nd)
 {
 	struct dentry *parent = NULL;
 	while (1) {
-		if (path_equal(&nd->path, &nd->root)) {
-			if (unlikely(nd->flags & LOOKUP_BENEATH))
-				return -EXDEV;
+		if (path_equal(&nd->path, &nd->root))
 			break;
-		}
 		if (nd->path.dentry != nd->path.mnt->mnt_root) {
 			/* rare case of legitimate dget_parent()... */
 			parent = dget_parent(nd->path.dentry);
@@ -1466,7 +1463,10 @@ static int follow_dotdot(struct nameidata *nd)
 		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
 			return -EXDEV;
 	}
-	if (likely(parent)) {
+	if (unlikely(!parent)) {
+		if (unlikely(nd->flags & LOOKUP_BENEATH))
+			return -EXDEV;
+	} else {
 		dput(nd->path.dentry);
 		nd->path.dentry = parent;
 	}
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 46/69] move handle_dots(), follow_dotdot() and follow_dotdot_rcu() past step_into()
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (43 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 45/69] follow_dotdot{,_rcu}(): lift LOOKUP_BENEATH checks " Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 47/69] handle_dots(), follow_dotdot{,_rcu}(): preparation to switch to step_into() Al Viro
                         ` (26 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

pure move; we are going to have step_into() called by that bunch.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 260 ++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 130 insertions(+), 130 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 577dc541a4d4..3ee3b8719505 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1363,70 +1363,6 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 	return ret;
 }
 
-static int follow_dotdot_rcu(struct nameidata *nd)
-{
-	struct dentry *parent = NULL;
-	struct inode *inode = nd->inode;
-	unsigned seq;
-
-	while (1) {
-		if (path_equal(&nd->path, &nd->root))
-			break;
-		if (nd->path.dentry != nd->path.mnt->mnt_root) {
-			struct dentry *old = nd->path.dentry;
-
-			parent = old->d_parent;
-			inode = parent->d_inode;
-			seq = read_seqcount_begin(&parent->d_seq);
-			if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
-				return -ECHILD;
-			if (unlikely(!path_connected(nd->path.mnt, parent)))
-				return -ECHILD;
-			break;
-		} else {
-			struct mount *mnt = real_mount(nd->path.mnt);
-			struct mount *mparent = mnt->mnt_parent;
-			struct dentry *mountpoint = mnt->mnt_mountpoint;
-			struct inode *inode2 = mountpoint->d_inode;
-			unsigned seq = read_seqcount_begin(&mountpoint->d_seq);
-			if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
-				return -ECHILD;
-			if (&mparent->mnt == nd->path.mnt)
-				break;
-			if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-				return -ECHILD;
-			/* we know that mountpoint was pinned */
-			nd->path.dentry = mountpoint;
-			nd->path.mnt = &mparent->mnt;
-			inode = inode2;
-			nd->seq = seq;
-		}
-	}
-	if (unlikely(!parent)) {
-		if (unlikely(nd->flags & LOOKUP_BENEATH))
-			return -ECHILD;
-	} else {
-		nd->path.dentry = parent;
-		nd->seq = seq;
-	}
-	while (unlikely(d_mountpoint(nd->path.dentry))) {
-		struct mount *mounted;
-		mounted = __lookup_mnt(nd->path.mnt, nd->path.dentry);
-		if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
-			return -ECHILD;
-		if (!mounted)
-			break;
-		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-			return -ECHILD;
-		nd->path.mnt = &mounted->mnt;
-		nd->path.dentry = mounted->mnt.mnt_root;
-		inode = nd->path.dentry->d_inode;
-		nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
-	}
-	nd->inode = inode;
-	return 0;
-}
-
 /*
  * Skip to top of mountpoint pile in refwalk mode for follow_dotdot()
  */
@@ -1443,38 +1379,6 @@ static void follow_mount(struct path *path)
 	}
 }
 
-static int follow_dotdot(struct nameidata *nd)
-{
-	struct dentry *parent = NULL;
-	while (1) {
-		if (path_equal(&nd->path, &nd->root))
-			break;
-		if (nd->path.dentry != nd->path.mnt->mnt_root) {
-			/* rare case of legitimate dget_parent()... */
-			parent = dget_parent(nd->path.dentry);
-			if (unlikely(!path_connected(nd->path.mnt, parent))) {
-				dput(parent);
-				return -ENOENT;
-			}
-			break;
-		}
-		if (!follow_up(&nd->path))
-			break;
-		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-			return -EXDEV;
-	}
-	if (unlikely(!parent)) {
-		if (unlikely(nd->flags & LOOKUP_BENEATH))
-			return -EXDEV;
-	} else {
-		dput(nd->path.dentry);
-		nd->path.dentry = parent;
-	}
-	follow_mount(&nd->path);
-	nd->inode = nd->path.dentry->d_inode;
-	return 0;
-}
-
 /*
  * This looks up the name in dcache and possibly revalidates the found dentry.
  * NULL is returned if the dentry does not exist in the cache.
@@ -1654,40 +1558,6 @@ static inline int may_lookup(struct nameidata *nd)
 	return inode_permission(nd->inode, MAY_EXEC);
 }
 
-static inline int handle_dots(struct nameidata *nd, int type)
-{
-	if (type == LAST_DOTDOT) {
-		int error = 0;
-
-		if (!nd->root.mnt) {
-			error = set_root(nd);
-			if (error)
-				return error;
-		}
-		if (nd->flags & LOOKUP_RCU)
-			error = follow_dotdot_rcu(nd);
-		else
-			error = follow_dotdot(nd);
-		if (error)
-			return error;
-
-		if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) {
-			/*
-			 * If there was a racing rename or mount along our
-			 * path, then we can't be sure that ".." hasn't jumped
-			 * above nd->root (and so userspace should retry or use
-			 * some fallback).
-			 */
-			smp_rmb();
-			if (unlikely(__read_seqcount_retry(&mount_lock.seqcount, nd->m_seq)))
-				return -EAGAIN;
-			if (unlikely(__read_seqcount_retry(&rename_lock.seqcount, nd->r_seq)))
-				return -EAGAIN;
-		}
-	}
-	return 0;
-}
-
 enum {WALK_TRAILING = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
 
 static const char *pick_link(struct nameidata *nd, struct path *link,
@@ -1817,6 +1687,136 @@ static const char *step_into(struct nameidata *nd, int flags,
 	return pick_link(nd, &path, inode, seq, flags);
 }
 
+static int follow_dotdot_rcu(struct nameidata *nd)
+{
+	struct dentry *parent = NULL;
+	struct inode *inode = nd->inode;
+	unsigned seq;
+
+	while (1) {
+		if (path_equal(&nd->path, &nd->root))
+			break;
+		if (nd->path.dentry != nd->path.mnt->mnt_root) {
+			struct dentry *old = nd->path.dentry;
+
+			parent = old->d_parent;
+			inode = parent->d_inode;
+			seq = read_seqcount_begin(&parent->d_seq);
+			if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
+				return -ECHILD;
+			if (unlikely(!path_connected(nd->path.mnt, parent)))
+				return -ECHILD;
+			break;
+		} else {
+			struct mount *mnt = real_mount(nd->path.mnt);
+			struct mount *mparent = mnt->mnt_parent;
+			struct dentry *mountpoint = mnt->mnt_mountpoint;
+			struct inode *inode2 = mountpoint->d_inode;
+			unsigned seq = read_seqcount_begin(&mountpoint->d_seq);
+			if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
+				return -ECHILD;
+			if (&mparent->mnt == nd->path.mnt)
+				break;
+			if (unlikely(nd->flags & LOOKUP_NO_XDEV))
+				return -ECHILD;
+			/* we know that mountpoint was pinned */
+			nd->path.dentry = mountpoint;
+			nd->path.mnt = &mparent->mnt;
+			inode = inode2;
+			nd->seq = seq;
+		}
+	}
+	if (unlikely(!parent)) {
+		if (unlikely(nd->flags & LOOKUP_BENEATH))
+			return -ECHILD;
+	} else {
+		nd->path.dentry = parent;
+		nd->seq = seq;
+	}
+	while (unlikely(d_mountpoint(nd->path.dentry))) {
+		struct mount *mounted;
+		mounted = __lookup_mnt(nd->path.mnt, nd->path.dentry);
+		if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
+			return -ECHILD;
+		if (!mounted)
+			break;
+		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
+			return -ECHILD;
+		nd->path.mnt = &mounted->mnt;
+		nd->path.dentry = mounted->mnt.mnt_root;
+		inode = nd->path.dentry->d_inode;
+		nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
+	}
+	nd->inode = inode;
+	return 0;
+}
+
+static int follow_dotdot(struct nameidata *nd)
+{
+	struct dentry *parent = NULL;
+	while (1) {
+		if (path_equal(&nd->path, &nd->root))
+			break;
+		if (nd->path.dentry != nd->path.mnt->mnt_root) {
+			/* rare case of legitimate dget_parent()... */
+			parent = dget_parent(nd->path.dentry);
+			if (unlikely(!path_connected(nd->path.mnt, parent))) {
+				dput(parent);
+				return -ENOENT;
+			}
+			break;
+		}
+		if (!follow_up(&nd->path))
+			break;
+		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
+			return -EXDEV;
+	}
+	if (unlikely(!parent)) {
+		if (unlikely(nd->flags & LOOKUP_BENEATH))
+			return -EXDEV;
+	} else {
+		dput(nd->path.dentry);
+		nd->path.dentry = parent;
+	}
+	follow_mount(&nd->path);
+	nd->inode = nd->path.dentry->d_inode;
+	return 0;
+}
+
+static inline int handle_dots(struct nameidata *nd, int type)
+{
+	if (type == LAST_DOTDOT) {
+		int error = 0;
+
+		if (!nd->root.mnt) {
+			error = set_root(nd);
+			if (error)
+				return error;
+		}
+		if (nd->flags & LOOKUP_RCU)
+			error = follow_dotdot_rcu(nd);
+		else
+			error = follow_dotdot(nd);
+		if (error)
+			return error;
+
+		if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) {
+			/*
+			 * If there was a racing rename or mount along our
+			 * path, then we can't be sure that ".." hasn't jumped
+			 * above nd->root (and so userspace should retry or use
+			 * some fallback).
+			 */
+			smp_rmb();
+			if (unlikely(__read_seqcount_retry(&mount_lock.seqcount, nd->m_seq)))
+				return -EAGAIN;
+			if (unlikely(__read_seqcount_retry(&rename_lock.seqcount, nd->r_seq)))
+				return -EAGAIN;
+		}
+	}
+	return 0;
+}
+
 static const char *walk_component(struct nameidata *nd, int flags)
 {
 	struct dentry *dentry;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 47/69] handle_dots(), follow_dotdot{,_rcu}(): preparation to switch to step_into()
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (44 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 46/69] move handle_dots(), follow_dotdot() and follow_dotdot_rcu() past step_into() Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 48/69] follow_dotdot{,_rcu}(): switch to use of step_into() Al Viro
                         ` (25 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Right now the tail ends of follow_dotdot{,_rcu}() are pretty
much the open-coded analogues of step_into().  The differences:
	* the lack of proper LOOKUP_NO_XDEV handling in non-RCU case
(arguably a bug)
	* the lack of ->d_manage() handling (again, arguably a bug)

Adjust the calling conventions so that on the next step with could
just switch those functions to returning step_into().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 52 +++++++++++++++++++++++++---------------------------
 1 file changed, 25 insertions(+), 27 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3ee3b8719505..1749e435edc7 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1687,7 +1687,7 @@ static const char *step_into(struct nameidata *nd, int flags,
 	return pick_link(nd, &path, inode, seq, flags);
 }
 
-static int follow_dotdot_rcu(struct nameidata *nd)
+static const char *follow_dotdot_rcu(struct nameidata *nd)
 {
 	struct dentry *parent = NULL;
 	struct inode *inode = nd->inode;
@@ -1703,9 +1703,9 @@ static int follow_dotdot_rcu(struct nameidata *nd)
 			inode = parent->d_inode;
 			seq = read_seqcount_begin(&parent->d_seq);
 			if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
-				return -ECHILD;
+				return ERR_PTR(-ECHILD);
 			if (unlikely(!path_connected(nd->path.mnt, parent)))
-				return -ECHILD;
+				return ERR_PTR(-ECHILD);
 			break;
 		} else {
 			struct mount *mnt = real_mount(nd->path.mnt);
@@ -1714,11 +1714,11 @@ static int follow_dotdot_rcu(struct nameidata *nd)
 			struct inode *inode2 = mountpoint->d_inode;
 			unsigned seq = read_seqcount_begin(&mountpoint->d_seq);
 			if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
-				return -ECHILD;
+				return ERR_PTR(-ECHILD);
 			if (&mparent->mnt == nd->path.mnt)
 				break;
 			if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-				return -ECHILD;
+				return ERR_PTR(-ECHILD);
 			/* we know that mountpoint was pinned */
 			nd->path.dentry = mountpoint;
 			nd->path.mnt = &mparent->mnt;
@@ -1728,7 +1728,7 @@ static int follow_dotdot_rcu(struct nameidata *nd)
 	}
 	if (unlikely(!parent)) {
 		if (unlikely(nd->flags & LOOKUP_BENEATH))
-			return -ECHILD;
+			return ERR_PTR(-ECHILD);
 	} else {
 		nd->path.dentry = parent;
 		nd->seq = seq;
@@ -1737,21 +1737,21 @@ static int follow_dotdot_rcu(struct nameidata *nd)
 		struct mount *mounted;
 		mounted = __lookup_mnt(nd->path.mnt, nd->path.dentry);
 		if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
-			return -ECHILD;
+			return ERR_PTR(-ECHILD);
 		if (!mounted)
 			break;
 		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-			return -ECHILD;
+			return ERR_PTR(-ECHILD);
 		nd->path.mnt = &mounted->mnt;
 		nd->path.dentry = mounted->mnt.mnt_root;
 		inode = nd->path.dentry->d_inode;
 		nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
 	}
 	nd->inode = inode;
-	return 0;
+	return NULL;
 }
 
-static int follow_dotdot(struct nameidata *nd)
+static const char *follow_dotdot(struct nameidata *nd)
 {
 	struct dentry *parent = NULL;
 	while (1) {
@@ -1762,34 +1762,34 @@ static int follow_dotdot(struct nameidata *nd)
 			parent = dget_parent(nd->path.dentry);
 			if (unlikely(!path_connected(nd->path.mnt, parent))) {
 				dput(parent);
-				return -ENOENT;
+				return ERR_PTR(-ENOENT);
 			}
 			break;
 		}
 		if (!follow_up(&nd->path))
 			break;
 		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-			return -EXDEV;
+			return ERR_PTR(-EXDEV);
 	}
 	if (unlikely(!parent)) {
 		if (unlikely(nd->flags & LOOKUP_BENEATH))
-			return -EXDEV;
+			return ERR_PTR(-EXDEV);
 	} else {
 		dput(nd->path.dentry);
 		nd->path.dentry = parent;
 	}
 	follow_mount(&nd->path);
 	nd->inode = nd->path.dentry->d_inode;
-	return 0;
+	return NULL;
 }
 
-static inline int handle_dots(struct nameidata *nd, int type)
+static const char *handle_dots(struct nameidata *nd, int type)
 {
 	if (type == LAST_DOTDOT) {
-		int error = 0;
+		const char *error = NULL;
 
 		if (!nd->root.mnt) {
-			error = set_root(nd);
+			error = ERR_PTR(set_root(nd));
 			if (error)
 				return error;
 		}
@@ -1809,12 +1809,12 @@ static inline int handle_dots(struct nameidata *nd, int type)
 			 */
 			smp_rmb();
 			if (unlikely(__read_seqcount_retry(&mount_lock.seqcount, nd->m_seq)))
-				return -EAGAIN;
+				return ERR_PTR(-EAGAIN);
 			if (unlikely(__read_seqcount_retry(&rename_lock.seqcount, nd->r_seq)))
-				return -EAGAIN;
+				return ERR_PTR(-EAGAIN);
 		}
 	}
-	return 0;
+	return NULL;
 }
 
 static const char *walk_component(struct nameidata *nd, int flags)
@@ -1822,7 +1822,6 @@ static const char *walk_component(struct nameidata *nd, int flags)
 	struct dentry *dentry;
 	struct inode *inode;
 	unsigned seq;
-	int err;
 	/*
 	 * "." and ".." are special - ".." especially so because it has
 	 * to be able to know about the current root directory and
@@ -1831,8 +1830,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 	if (unlikely(nd->last_type != LAST_NORM)) {
 		if (!(flags & WALK_MORE) && nd->depth)
 			put_link(nd);
-		err = handle_dots(nd, nd->last_type);
-		return ERR_PTR(err);
+		return handle_dots(nd, nd->last_type);
 	}
 	dentry = lookup_fast(nd, &inode, &seq);
 	if (IS_ERR(dentry))
@@ -3128,10 +3126,10 @@ static const char *open_last_lookups(struct nameidata *nd,
 	if (nd->last_type != LAST_NORM) {
 		if (nd->depth)
 			put_link(nd);
-		error = handle_dots(nd, nd->last_type);
-		if (likely(!error))
-			error = complete_walk(nd);
-		return ERR_PTR(error);
+		res = handle_dots(nd, nd->last_type);
+		if (likely(!res))
+			res = ERR_PTR(complete_walk(nd));
+		return res;
 	}
 
 	if (!(open_flag & O_CREAT)) {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 48/69] follow_dotdot{,_rcu}(): switch to use of step_into()
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (45 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 47/69] handle_dots(), follow_dotdot{,_rcu}(): preparation to switch to step_into() Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 49/69] lift all calls of step_into() out of follow_dotdot/follow_dotdot_rcu Al Viro
                         ` (24 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

gets the regular mount crossing on result of ..

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 31 +++++++------------------------
 1 file changed, 7 insertions(+), 24 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 1749e435edc7..9c775013368a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1722,33 +1722,18 @@ static const char *follow_dotdot_rcu(struct nameidata *nd)
 			/* we know that mountpoint was pinned */
 			nd->path.dentry = mountpoint;
 			nd->path.mnt = &mparent->mnt;
-			inode = inode2;
+			inode = nd->inode = inode2;
 			nd->seq = seq;
 		}
 	}
 	if (unlikely(!parent)) {
 		if (unlikely(nd->flags & LOOKUP_BENEATH))
 			return ERR_PTR(-ECHILD);
+		return step_into(nd, WALK_NOFOLLOW,
+				 nd->path.dentry, nd->inode, nd->seq);
 	} else {
-		nd->path.dentry = parent;
-		nd->seq = seq;
+		return step_into(nd, WALK_NOFOLLOW, parent, inode, seq);
 	}
-	while (unlikely(d_mountpoint(nd->path.dentry))) {
-		struct mount *mounted;
-		mounted = __lookup_mnt(nd->path.mnt, nd->path.dentry);
-		if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
-			return ERR_PTR(-ECHILD);
-		if (!mounted)
-			break;
-		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-			return ERR_PTR(-ECHILD);
-		nd->path.mnt = &mounted->mnt;
-		nd->path.dentry = mounted->mnt.mnt_root;
-		inode = nd->path.dentry->d_inode;
-		nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
-	}
-	nd->inode = inode;
-	return NULL;
 }
 
 static const char *follow_dotdot(struct nameidata *nd)
@@ -1774,13 +1759,11 @@ static const char *follow_dotdot(struct nameidata *nd)
 	if (unlikely(!parent)) {
 		if (unlikely(nd->flags & LOOKUP_BENEATH))
 			return ERR_PTR(-EXDEV);
+		return step_into(nd, WALK_NOFOLLOW,
+				 dget(nd->path.dentry), nd->inode, nd->seq);
 	} else {
-		dput(nd->path.dentry);
-		nd->path.dentry = parent;
+		return step_into(nd, WALK_NOFOLLOW, parent, parent->d_inode, 0);
 	}
-	follow_mount(&nd->path);
-	nd->inode = nd->path.dentry->d_inode;
-	return NULL;
 }
 
 static const char *handle_dots(struct nameidata *nd, int type)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 49/69] lift all calls of step_into() out of follow_dotdot/follow_dotdot_rcu
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (46 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 48/69] follow_dotdot{,_rcu}(): switch to use of step_into() Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 50/69] follow_dotdot{,_rcu}(): massage loops Al Viro
                         ` (23 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

lift step_into() into handle_dots() (where they merge with each other);
have follow_... return dentry and pass inode/seq to the caller.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 72 ++++++++++++++++++++++++++++++++------------------------------
 1 file changed, 37 insertions(+), 35 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 9c775013368a..2f3d04c6f0b2 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1687,31 +1687,29 @@ static const char *step_into(struct nameidata *nd, int flags,
 	return pick_link(nd, &path, inode, seq, flags);
 }
 
-static const char *follow_dotdot_rcu(struct nameidata *nd)
+static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
+					struct inode **inodep,
+					unsigned *seqp)
 {
-	struct dentry *parent = NULL;
-	struct inode *inode = nd->inode;
-	unsigned seq;
-
 	while (1) {
 		if (path_equal(&nd->path, &nd->root))
 			break;
 		if (nd->path.dentry != nd->path.mnt->mnt_root) {
 			struct dentry *old = nd->path.dentry;
+			struct dentry *parent = old->d_parent;
 
-			parent = old->d_parent;
-			inode = parent->d_inode;
-			seq = read_seqcount_begin(&parent->d_seq);
+			*inodep = parent->d_inode;
+			*seqp = read_seqcount_begin(&parent->d_seq);
 			if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
 				return ERR_PTR(-ECHILD);
 			if (unlikely(!path_connected(nd->path.mnt, parent)))
 				return ERR_PTR(-ECHILD);
-			break;
+			return parent;
 		} else {
 			struct mount *mnt = real_mount(nd->path.mnt);
 			struct mount *mparent = mnt->mnt_parent;
 			struct dentry *mountpoint = mnt->mnt_mountpoint;
-			struct inode *inode2 = mountpoint->d_inode;
+			struct inode *inode = mountpoint->d_inode;
 			unsigned seq = read_seqcount_begin(&mountpoint->d_seq);
 			if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
 				return ERR_PTR(-ECHILD);
@@ -1722,54 +1720,51 @@ static const char *follow_dotdot_rcu(struct nameidata *nd)
 			/* we know that mountpoint was pinned */
 			nd->path.dentry = mountpoint;
 			nd->path.mnt = &mparent->mnt;
-			inode = nd->inode = inode2;
+			nd->inode = inode;
 			nd->seq = seq;
 		}
 	}
-	if (unlikely(!parent)) {
-		if (unlikely(nd->flags & LOOKUP_BENEATH))
-			return ERR_PTR(-ECHILD);
-		return step_into(nd, WALK_NOFOLLOW,
-				 nd->path.dentry, nd->inode, nd->seq);
-	} else {
-		return step_into(nd, WALK_NOFOLLOW, parent, inode, seq);
-	}
+	if (unlikely(nd->flags & LOOKUP_BENEATH))
+		return ERR_PTR(-ECHILD);
+	return NULL;
 }
 
-static const char *follow_dotdot(struct nameidata *nd)
+static struct dentry *follow_dotdot(struct nameidata *nd,
+				 struct inode **inodep,
+				 unsigned *seqp)
 {
-	struct dentry *parent = NULL;
 	while (1) {
 		if (path_equal(&nd->path, &nd->root))
 			break;
 		if (nd->path.dentry != nd->path.mnt->mnt_root) {
 			/* rare case of legitimate dget_parent()... */
-			parent = dget_parent(nd->path.dentry);
+			struct dentry *parent = dget_parent(nd->path.dentry);
 			if (unlikely(!path_connected(nd->path.mnt, parent))) {
 				dput(parent);
 				return ERR_PTR(-ENOENT);
 			}
-			break;
+			*seqp = 0;
+			*inodep = parent->d_inode;
+			return parent;
 		}
 		if (!follow_up(&nd->path))
 			break;
 		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
 			return ERR_PTR(-EXDEV);
 	}
-	if (unlikely(!parent)) {
-		if (unlikely(nd->flags & LOOKUP_BENEATH))
-			return ERR_PTR(-EXDEV);
-		return step_into(nd, WALK_NOFOLLOW,
-				 dget(nd->path.dentry), nd->inode, nd->seq);
-	} else {
-		return step_into(nd, WALK_NOFOLLOW, parent, parent->d_inode, 0);
-	}
+	if (unlikely(nd->flags & LOOKUP_BENEATH))
+		return ERR_PTR(-EXDEV);
+	dget(nd->path.dentry);
+	return NULL;
 }
 
 static const char *handle_dots(struct nameidata *nd, int type)
 {
 	if (type == LAST_DOTDOT) {
 		const char *error = NULL;
+		struct dentry *parent;
+		struct inode *inode;
+		unsigned seq;
 
 		if (!nd->root.mnt) {
 			error = ERR_PTR(set_root(nd));
@@ -1777,11 +1772,18 @@ static const char *handle_dots(struct nameidata *nd, int type)
 				return error;
 		}
 		if (nd->flags & LOOKUP_RCU)
-			error = follow_dotdot_rcu(nd);
+			parent = follow_dotdot_rcu(nd, &inode, &seq);
 		else
-			error = follow_dotdot(nd);
-		if (error)
-			return error;
+			parent = follow_dotdot(nd, &inode, &seq);
+		if (IS_ERR(parent))
+			return ERR_CAST(parent);
+		if (unlikely(!parent)) {
+			error = step_into(nd, WALK_NOFOLLOW,
+					 nd->path.dentry, nd->inode, nd->seq);
+		} else {
+			error = step_into(nd, WALK_NOFOLLOW,
+					 parent, inode, seq);
+		}
 
 		if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) {
 			/*
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 50/69] follow_dotdot{,_rcu}(): massage loops
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (47 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 49/69] lift all calls of step_into() out of follow_dotdot/follow_dotdot_rcu Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 51/69] follow_dotdot_rcu(): be lazy about changing nd->path Al Viro
                         ` (22 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

The logics in both of them is the same:
	while true
		if in process' root	// uncommon
			break
		if *not* in mount root	// normal case
			find the parent
			return
		if at absolute root	// very uncommon
			break
		move to underlying mountpoint
	report that we are in root

Pull the common path out of the loop:
	if in process' root		// uncommon
		goto in_root
	if unlikely(in mount root)
		while true
			if at absolute root
				goto in_root
			move to underlying mountpoint
			if in process' root
				goto in_root
			if in mount root
				break;
	find the parent	// we are not in mount root
	return
in_root:
	report that we are in root

The reason for that transformation is that we get to keep the
common path straight *and* get a separate block for "move
through underlying mountpoints", which will allow to sanitize
NO_XDEV handling there.  What's more, the pared-down loops
will be easier to deal with - in particular, non-RCU case
has no need to grab mount_lock and rewriting it to the
form that wouldn't do that is a non-trivial change.  Better
do that with less stuff getting in the way...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 77 ++++++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 45 insertions(+), 32 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 2f3d04c6f0b2..3d37434cadcc 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1691,21 +1691,12 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 					struct inode **inodep,
 					unsigned *seqp)
 {
-	while (1) {
-		if (path_equal(&nd->path, &nd->root))
-			break;
-		if (nd->path.dentry != nd->path.mnt->mnt_root) {
-			struct dentry *old = nd->path.dentry;
-			struct dentry *parent = old->d_parent;
+	struct dentry *parent, *old;
 
-			*inodep = parent->d_inode;
-			*seqp = read_seqcount_begin(&parent->d_seq);
-			if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
-				return ERR_PTR(-ECHILD);
-			if (unlikely(!path_connected(nd->path.mnt, parent)))
-				return ERR_PTR(-ECHILD);
-			return parent;
-		} else {
+	if (path_equal(&nd->path, &nd->root))
+		goto in_root;
+	if (unlikely(nd->path.dentry == nd->path.mnt->mnt_root)) {
+		while (1) {
 			struct mount *mnt = real_mount(nd->path.mnt);
 			struct mount *mparent = mnt->mnt_parent;
 			struct dentry *mountpoint = mnt->mnt_mountpoint;
@@ -1714,7 +1705,7 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 			if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
 				return ERR_PTR(-ECHILD);
 			if (&mparent->mnt == nd->path.mnt)
-				break;
+				goto in_root;
 			if (unlikely(nd->flags & LOOKUP_NO_XDEV))
 				return ERR_PTR(-ECHILD);
 			/* we know that mountpoint was pinned */
@@ -1722,8 +1713,22 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 			nd->path.mnt = &mparent->mnt;
 			nd->inode = inode;
 			nd->seq = seq;
+			if (path_equal(&nd->path, &nd->root))
+				goto in_root;
+			if (nd->path.dentry != nd->path.mnt->mnt_root)
+				break;
 		}
 	}
+	old = nd->path.dentry;
+	parent = old->d_parent;
+	*inodep = parent->d_inode;
+	*seqp = read_seqcount_begin(&parent->d_seq);
+	if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
+		return ERR_PTR(-ECHILD);
+	if (unlikely(!path_connected(nd->path.mnt, parent)))
+		return ERR_PTR(-ECHILD);
+	return parent;
+in_root:
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-ECHILD);
 	return NULL;
@@ -1733,25 +1738,33 @@ static struct dentry *follow_dotdot(struct nameidata *nd,
 				 struct inode **inodep,
 				 unsigned *seqp)
 {
-	while (1) {
-		if (path_equal(&nd->path, &nd->root))
-			break;
-		if (nd->path.dentry != nd->path.mnt->mnt_root) {
-			/* rare case of legitimate dget_parent()... */
-			struct dentry *parent = dget_parent(nd->path.dentry);
-			if (unlikely(!path_connected(nd->path.mnt, parent))) {
-				dput(parent);
-				return ERR_PTR(-ENOENT);
-			}
-			*seqp = 0;
-			*inodep = parent->d_inode;
-			return parent;
+	struct dentry *parent;
+
+	if (path_equal(&nd->path, &nd->root))
+		goto in_root;
+	if (unlikely(nd->path.dentry == nd->path.mnt->mnt_root)) {
+		while (1) {
+			if (!follow_up(&nd->path))
+				goto in_root;
+			if (unlikely(nd->flags & LOOKUP_NO_XDEV))
+				return ERR_PTR(-EXDEV);
+			if (path_equal(&nd->path, &nd->root))
+				goto in_root;
+			if (nd->path.dentry != nd->path.mnt->mnt_root)
+				break;
 		}
-		if (!follow_up(&nd->path))
-			break;
-		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-			return ERR_PTR(-EXDEV);
 	}
+	/* rare case of legitimate dget_parent()... */
+	parent = dget_parent(nd->path.dentry);
+	if (unlikely(!path_connected(nd->path.mnt, parent))) {
+		dput(parent);
+		return ERR_PTR(-ENOENT);
+	}
+	*seqp = 0;
+	*inodep = parent->d_inode;
+	return parent;
+
+in_root:
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-EXDEV);
 	dget(nd->path.dentry);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 51/69] follow_dotdot_rcu(): be lazy about changing nd->path
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (48 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 50/69] follow_dotdot{,_rcu}(): massage loops Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 52/69] follow_dotdot(): " Al Viro
                         ` (21 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Change nd->path only after the loop is done and only in case we hadn't
ended up finding ourselves in root.  Same for NO_XDEV check.  Don't
recheck mount_lock on each step either.

That separates the "check how far back do we need to go through the
mount stack" logics from the rest of .. traversal.

Note that the sequence for d_seq/d_inode here is
	* sample mount_lock seqcount
...
	* sample d_seq
	* fetch d_inode
	* verify mount_lock seqcount
The last step makes sure that d_inode value we'd got matches d_seq -
it dentry is guaranteed to have been a mountpoint through the
entire thing, so its d_inode must have been stable.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 35 ++++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3d37434cadcc..e569ccdd3513 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1696,28 +1696,31 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 	if (path_equal(&nd->path, &nd->root))
 		goto in_root;
 	if (unlikely(nd->path.dentry == nd->path.mnt->mnt_root)) {
+		struct path path = nd->path;
+		unsigned seq;
+
 		while (1) {
-			struct mount *mnt = real_mount(nd->path.mnt);
+			struct mount *mnt = real_mount(path.mnt);
 			struct mount *mparent = mnt->mnt_parent;
 			struct dentry *mountpoint = mnt->mnt_mountpoint;
-			struct inode *inode = mountpoint->d_inode;
-			unsigned seq = read_seqcount_begin(&mountpoint->d_seq);
-			if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
-				return ERR_PTR(-ECHILD);
-			if (&mparent->mnt == nd->path.mnt)
+			seq = read_seqcount_begin(&mountpoint->d_seq);
+			if (&mparent->mnt == path.mnt)
 				goto in_root;
-			if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-				return ERR_PTR(-ECHILD);
-			/* we know that mountpoint was pinned */
-			nd->path.dentry = mountpoint;
-			nd->path.mnt = &mparent->mnt;
-			nd->inode = inode;
-			nd->seq = seq;
-			if (path_equal(&nd->path, &nd->root))
+			path.dentry = mountpoint;
+			path.mnt = &mparent->mnt;
+			if (path_equal(&path, &nd->root))
 				goto in_root;
-			if (nd->path.dentry != nd->path.mnt->mnt_root)
+			if (path.dentry != path.mnt->mnt_root)
 				break;
 		}
+		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
+			return ERR_PTR(-ECHILD);
+		nd->path = path;
+		nd->inode = path.dentry->d_inode;
+		nd->seq = seq;
+		if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
+			return ERR_PTR(-ECHILD);
+		/* we know that mountpoint was pinned */
 	}
 	old = nd->path.dentry;
 	parent = old->d_parent;
@@ -1729,6 +1732,8 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 		return ERR_PTR(-ECHILD);
 	return parent;
 in_root:
+	if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
+		return ERR_PTR(-ECHILD);
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
 		return ERR_PTR(-ECHILD);
 	return NULL;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 52/69] follow_dotdot(): be lazy about changing nd->path
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (49 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 51/69] follow_dotdot_rcu(): be lazy about changing nd->path Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 53/69] helper for mount rootwards traversal Al Viro
                         ` (20 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Change nd->path only after the loop is done and only in case we hadn't
ended up finding ourselves in root.  Same for NO_XDEV check.

That separates the "check how far back do we need to go through the
mount stack" logics from the rest of .. traversal.

NOTE: path_get/path_put introduced here are temporary.  They will
go away later in the series.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index e569ccdd3513..bda303090467 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1748,16 +1748,24 @@ static struct dentry *follow_dotdot(struct nameidata *nd,
 	if (path_equal(&nd->path, &nd->root))
 		goto in_root;
 	if (unlikely(nd->path.dentry == nd->path.mnt->mnt_root)) {
+		struct path path = nd->path;
+		path_get(&path);
 		while (1) {
-			if (!follow_up(&nd->path))
+			if (!follow_up(&path)) {
+				path_put(&path);
 				goto in_root;
-			if (unlikely(nd->flags & LOOKUP_NO_XDEV))
-				return ERR_PTR(-EXDEV);
-			if (path_equal(&nd->path, &nd->root))
+			}
+			if (path_equal(&path, &nd->root)) {
+				path_put(&path);
 				goto in_root;
-			if (nd->path.dentry != nd->path.mnt->mnt_root)
+			}
+			if (path.dentry != nd->path.mnt->mnt_root)
 				break;
 		}
+		path_put(&nd->path);
+		nd->path = path;
+		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
+			return ERR_PTR(-EXDEV);
 	}
 	/* rare case of legitimate dget_parent()... */
 	parent = dget_parent(nd->path.dentry);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 53/69] helper for mount rootwards traversal
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (50 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 52/69] follow_dotdot(): " Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 54/69] non-RCU analogue of the previous commit Al Viro
                         ` (19 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

The loops in follow_dotdot{_rcu()} are doing the same thing:
we have a mount and we want to find out how far up the chain
of mounts do we need to go.

We follow the chain of mount until we find one that is not
directly overmounting the root of another mount.  If such
a mount is found, we want the location it's mounted upon.
If we run out of chain (i.e. get to a mount that is not
mounted on anything else) or run into process' root, we
report failure.

On success, we want (in RCU case) d_seq of resulting location
sampled or (in non-RCU case) references to that location
acquired.

This commit introduces such primitive for RCU case and
switches follow_dotdot_rcu() to it; non-RCU case will be
go in the next commit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 40 ++++++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index bda303090467..caa109941788 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1363,6 +1363,26 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
 	return ret;
 }
 
+static bool choose_mountpoint_rcu(struct mount *m, const struct path *root,
+				  struct path *path, unsigned *seqp)
+{
+	while (mnt_has_parent(m)) {
+		struct dentry *mountpoint = m->mnt_mountpoint;
+
+		m = m->mnt_parent;
+		if (unlikely(root->dentry == mountpoint &&
+			     root->mnt == &m->mnt))
+			break;
+		if (mountpoint != m->mnt.mnt_root) {
+			path->mnt = &m->mnt;
+			path->dentry = mountpoint;
+			*seqp = read_seqcount_begin(&mountpoint->d_seq);
+			return true;
+		}
+	}
+	return false;
+}
+
 /*
  * Skip to top of mountpoint pile in refwalk mode for follow_dotdot()
  */
@@ -1696,23 +1716,11 @@ static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
 	if (path_equal(&nd->path, &nd->root))
 		goto in_root;
 	if (unlikely(nd->path.dentry == nd->path.mnt->mnt_root)) {
-		struct path path = nd->path;
+		struct path path;
 		unsigned seq;
-
-		while (1) {
-			struct mount *mnt = real_mount(path.mnt);
-			struct mount *mparent = mnt->mnt_parent;
-			struct dentry *mountpoint = mnt->mnt_mountpoint;
-			seq = read_seqcount_begin(&mountpoint->d_seq);
-			if (&mparent->mnt == path.mnt)
-				goto in_root;
-			path.dentry = mountpoint;
-			path.mnt = &mparent->mnt;
-			if (path_equal(&path, &nd->root))
-				goto in_root;
-			if (path.dentry != path.mnt->mnt_root)
-				break;
-		}
+		if (!choose_mountpoint_rcu(real_mount(nd->path.mnt),
+					   &nd->root, &path, &seq))
+			goto in_root;
 		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
 			return ERR_PTR(-ECHILD);
 		nd->path = path;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 54/69] non-RCU analogue of the previous commit
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (51 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 53/69] helper for mount rootwards traversal Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 55/69] fs/namei.c: kill follow_mount() Al Viro
                         ` (18 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

new helper: choose_mountpoint().  Wrapper around choose_mountpoint_rcu(),
similar to lookup_mnt() vs. __lookup_mnt().  follow_dotdot() switched to
it.  Now we don't grab mount_lock exclusive anymore; note that the
primitive used non-RCU mount traversals in other direction (lookup_mnt())
doesn't bother with that either - it uses mount_lock seqcount instead.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 56 +++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 39 insertions(+), 17 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index caa109941788..a9622e629e76 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -605,10 +605,9 @@ static void terminate_walk(struct nameidata *nd)
 }
 
 /* path_put is needed afterwards regardless of success or failure */
-static bool legitimize_path(struct nameidata *nd,
-			    struct path *path, unsigned seq)
+static bool __legitimize_path(struct path *path, unsigned seq, unsigned mseq)
 {
-	int res = __legitimize_mnt(path->mnt, nd->m_seq);
+	int res = __legitimize_mnt(path->mnt, mseq);
 	if (unlikely(res)) {
 		if (res > 0)
 			path->mnt = NULL;
@@ -622,6 +621,12 @@ static bool legitimize_path(struct nameidata *nd,
 	return !read_seqcount_retry(&path->dentry->d_seq, seq);
 }
 
+static inline bool legitimize_path(struct nameidata *nd,
+			    struct path *path, unsigned seq)
+{
+	return __legitimize_path(path, nd->m_seq, seq);
+}
+
 static bool legitimize_links(struct nameidata *nd)
 {
 	int i;
@@ -1383,6 +1388,31 @@ static bool choose_mountpoint_rcu(struct mount *m, const struct path *root,
 	return false;
 }
 
+static bool choose_mountpoint(struct mount *m, const struct path *root,
+			      struct path *path)
+{
+	bool found;
+
+	rcu_read_lock();
+	while (1) {
+		unsigned seq, mseq = read_seqbegin(&mount_lock);
+
+		found = choose_mountpoint_rcu(m, root, path, &seq);
+		if (unlikely(!found)) {
+			if (!read_seqretry(&mount_lock, mseq))
+				break;
+		} else {
+			if (likely(__legitimize_path(path, seq, mseq)))
+				break;
+			rcu_read_unlock();
+			path_put(path);
+			rcu_read_lock();
+		}
+	}
+	rcu_read_unlock();
+	return found;
+}
+
 /*
  * Skip to top of mountpoint pile in refwalk mode for follow_dotdot()
  */
@@ -1756,22 +1786,14 @@ static struct dentry *follow_dotdot(struct nameidata *nd,
 	if (path_equal(&nd->path, &nd->root))
 		goto in_root;
 	if (unlikely(nd->path.dentry == nd->path.mnt->mnt_root)) {
-		struct path path = nd->path;
-		path_get(&path);
-		while (1) {
-			if (!follow_up(&path)) {
-				path_put(&path);
-				goto in_root;
-			}
-			if (path_equal(&path, &nd->root)) {
-				path_put(&path);
-				goto in_root;
-			}
-			if (path.dentry != nd->path.mnt->mnt_root)
-				break;
-		}
+		struct path path;
+
+		if (!choose_mountpoint(real_mount(nd->path.mnt),
+				       &nd->root, &path))
+			goto in_root;
 		path_put(&nd->path);
 		nd->path = path;
+		nd->inode = path.dentry->d_inode;
 		if (unlikely(nd->flags & LOOKUP_NO_XDEV))
 			return ERR_PTR(-EXDEV);
 	}
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 55/69] fs/namei.c: kill follow_mount()
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (52 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 54/69] non-RCU analogue of the previous commit Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 56/69] pick_link(): more straightforward handling of allocation failures Al Viro
                         ` (17 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

The only remaining caller (path_pts()) should be using follow_down()
anyway.  And clean path_pts() a bit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 22 ++--------------------
 1 file changed, 2 insertions(+), 20 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index a9622e629e76..fc946d6f02d6 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1414,22 +1414,6 @@ static bool choose_mountpoint(struct mount *m, const struct path *root,
 }
 
 /*
- * Skip to top of mountpoint pile in refwalk mode for follow_dotdot()
- */
-static void follow_mount(struct path *path)
-{
-	while (d_mountpoint(path->dentry)) {
-		struct vfsmount *mounted = lookup_mnt(path);
-		if (!mounted)
-			break;
-		dput(path->dentry);
-		mntput(path->mnt);
-		path->mnt = mounted;
-		path->dentry = dget(mounted->mnt_root);
-	}
-}
-
-/*
  * This looks up the name in dcache and possibly revalidates the found dentry.
  * NULL is returned if the dentry does not exist in the cache.
  */
@@ -2639,7 +2623,7 @@ int path_pts(struct path *path)
 	 */
 	struct dentry *parent = dget_parent(path->dentry);
 	struct dentry *child;
-	struct qstr this;
+	struct qstr this = QSTR_INIT("pts", 3);
 
 	if (unlikely(!path_connected(path->mnt, parent))) {
 		dput(parent);
@@ -2647,15 +2631,13 @@ int path_pts(struct path *path)
 	}
 	dput(path->dentry);
 	path->dentry = parent;
-	this.name = "pts";
-	this.len = 3;
 	child = d_hash_and_lookup(parent, &this);
 	if (!child)
 		return -ENOENT;
 
 	path->dentry = child;
 	dput(parent);
-	follow_mount(path);
+	follow_down(path);
 	return 0;
 }
 #endif
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 56/69] pick_link(): more straightforward handling of allocation failures
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (53 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 55/69] fs/namei.c: kill follow_mount() Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 57/69] pick_link(): pass it struct path already with normal refcounting rules Al Viro
                         ` (16 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

pick_link() needs to push onto stack; we start with using two-element
array embedded into struct nameidata and the first time we need
more than that we switch to separately allocated array.

Allocation can fail, of course, and handling of that would be simple
enough - we need to drop 'link' and bugger off.  However, the things
get more complicated in RCU mode.  There we must do GFP_ATOMIC
allocation.  If that fails, we try to switch to non-RCU mode and
repeat the allocation.

To switch to non-RCU mode we need to grab references to 'link' and
to everything in nameidata.  The latter done by unlazy_walk();
the former - legitimize_path().  'link' must go first - after
unlazy_walk() we are out of RCU-critical period and it's too
late to call legitimize_path() since the references in link->mnt
and link->dentry might be pointing to freed and reused memory.

So we do legitimize_path(), then unlazy_walk().  And that's where
it gets too subtle: what to do if the former fails?  We MUST
do path_put(link) to avoid leaks.  And we can't do that under
rcu_read_lock().  Solution in mainline was to empty then nameidata
manually, drop out of RCU mode and then do put_path().

In effect, we open-code the things eventual terminate_walk()
would've done on error in RCU mode.  That looks badly out of place
and confusing.  We could add a comment along the lines of the
explanation above, but... there's a simpler solution.  Call
unlazy_walk() even if legitimaze_path() fails.  It will take
us out of RCU mode, so we'll be able to do path_put(link).

Yes, it will do unnecessary work - attempt to grab references
on the stuff in nameidata, only to have them dropped as soon
as we return the error to upper layer and get terminate_walk()
called there.  So what?  We are thoroughly off the fast path
by that point - we had GFP_ATOMIC allocation fail, we had
->d_seq or mount_lock mismatch and we are about to try walking
the same path from scratch in non-RCU mode.  Which will need
to do the same allocation, this time with GFP_KERNEL, so it will
be able to apply memory pressure for blocking stuff.

Compared to that the cost of several lockref_get_not_dead()
is noise.  And the logics become much easier to understand
that way.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index fc946d6f02d6..a2b5dbe432d6 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1612,14 +1612,13 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 	error = nd_alloc_stack(nd);
 	if (unlikely(error)) {
 		if (error == -ECHILD) {
-			if (unlikely(!legitimize_path(nd, link, seq))) {
-				drop_links(nd);
-				nd->depth = 0;
-				nd->flags &= ~LOOKUP_RCU;
-				nd->path.mnt = NULL;
-				nd->path.dentry = NULL;
-				rcu_read_unlock();
-			} else if (likely(unlazy_walk(nd)) == 0)
+			// we must grab link first
+			bool grabbed_link = legitimize_path(nd, link, seq);
+			// ... and we must unlazy to be able to clean up
+			error = unlazy_walk(nd);
+			if (unlikely(!grabbed_link))
+				error = -ECHILD;
+			if (!error)
 				error = nd_alloc_stack(nd);
 		}
 		if (error) {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 57/69] pick_link(): pass it struct path already with normal refcounting rules
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (54 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 56/69] pick_link(): more straightforward handling of allocation failures Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 58/69] fold path_to_nameidata() into its only remaining caller Al Viro
                         ` (15 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

step_into() tries to avoid grabbing and dropping mount references
on the steps that do not involve crossing mountpoints (which is
obviously the majority of cases).  So it uses a local struct path
with unusual refcounting rules - path.mnt is pinned if and only if
it's not equal to nd->path.mnt.

We used to have similar beasts all over the place and we had quite
a few bugs crop up in their handling - it's easy to get confused
when changing e.g. cleanup on failure exits (or adding a new check,
etc.)

Now that's mostly gone - the step_into() instance (which is what
we need them for) is the only one left.  It is exposed to mount
traversal and it's (shortly) seen by pick_link().  Since pick_link()
needs to store it in link stack, where the normal rules apply,
it has to make sure that mount is pinned regardless of nd->path.mnt
value.  That's done on all calls of pick_link() and very early
in those.  Let's do that in the caller (step_into()) instead -
that way the fewer places need to be aware of such struct path
instances.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index a2b5dbe432d6..a94ff3d58b51 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1602,13 +1602,10 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 	int error;
 
 	if (unlikely(nd->total_link_count++ >= MAXSYMLINKS)) {
-		path_to_nameidata(link, nd);
+		if (!(nd->flags & LOOKUP_RCU))
+			path_put(link);
 		return ERR_PTR(-ELOOP);
 	}
-	if (!(nd->flags & LOOKUP_RCU)) {
-		if (link->mnt == nd->path.mnt)
-			mntget(link->mnt);
-	}
 	error = nd_alloc_stack(nd);
 	if (unlikely(error)) {
 		if (error == -ECHILD) {
@@ -1712,10 +1709,13 @@ static const char *step_into(struct nameidata *nd, int flags,
 		nd->seq = seq;
 		return NULL;
 	}
-	/* make sure that d_is_symlink above matches inode */
 	if (nd->flags & LOOKUP_RCU) {
+		/* make sure that d_is_symlink above matches inode */
 		if (read_seqcount_retry(&path.dentry->d_seq, seq))
 			return ERR_PTR(-ECHILD);
+	} else {
+		if (path.mnt == nd->path.mnt)
+			mntget(path.mnt);
 	}
 	return pick_link(nd, &path, inode, seq, flags);
 }
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 58/69] fold path_to_nameidata() into its only remaining caller
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (55 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 57/69] pick_link(): pass it struct path already with normal refcounting rules Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 59/69] pick_link(): take reserving space on stack into a new helper Al Viro
                         ` (14 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index a94ff3d58b51..d58e447db2f1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -860,18 +860,6 @@ static int set_root(struct nameidata *nd)
 	return 0;
 }
 
-static inline void path_to_nameidata(const struct path *path,
-					struct nameidata *nd)
-{
-	if (!(nd->flags & LOOKUP_RCU)) {
-		dput(nd->path.dentry);
-		if (nd->path.mnt != path->mnt)
-			mntput(nd->path.mnt);
-	}
-	nd->path.mnt = path->mnt;
-	nd->path.dentry = path->dentry;
-}
-
 static int nd_jump_root(struct nameidata *nd)
 {
 	if (unlikely(nd->flags & LOOKUP_BENEATH))
@@ -1704,7 +1692,12 @@ static const char *step_into(struct nameidata *nd, int flags,
 	   ((flags & WALK_TRAILING) && !(nd->flags & LOOKUP_FOLLOW)) ||
 	   (flags & WALK_NOFOLLOW)) {
 		/* not a symlink or should not follow */
-		path_to_nameidata(&path, nd);
+		if (!(nd->flags & LOOKUP_RCU)) {
+			dput(nd->path.dentry);
+			if (nd->path.mnt != path.mnt)
+				mntput(nd->path.mnt);
+		}
+		nd->path = path;
 		nd->inode = inode;
 		nd->seq = seq;
 		return NULL;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 59/69] pick_link(): take reserving space on stack into a new helper
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (56 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 58/69] fold path_to_nameidata() into its only remaining caller Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 60/69] reserve_stack(): switch to __nd_alloc_stack() Al Viro
                         ` (13 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 46 +++++++++++++++++++++++++---------------------
 1 file changed, 25 insertions(+), 21 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index d58e447db2f1..c27f657c6b71 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1580,6 +1580,28 @@ static inline int may_lookup(struct nameidata *nd)
 	return inode_permission(nd->inode, MAY_EXEC);
 }
 
+static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
+{
+	int error;
+
+	if (unlikely(nd->total_link_count++ >= MAXSYMLINKS))
+		return -ELOOP;
+	error = nd_alloc_stack(nd);
+	if (likely(!error))
+		return 0;
+	if (error == -ECHILD) {
+		// we must grab link first
+		bool grabbed_link = legitimize_path(nd, link, seq);
+		// ... and we must unlazy to be able to clean up
+		error = unlazy_walk(nd);
+		if (unlikely(!grabbed_link))
+			error = -ECHILD;
+		if (!error)
+			error = nd_alloc_stack(nd);
+	}
+	return error;
+}
+
 enum {WALK_TRAILING = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
 
 static const char *pick_link(struct nameidata *nd, struct path *link,
@@ -1587,31 +1609,13 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 {
 	struct saved *last;
 	const char *res;
-	int error;
+	int error = reserve_stack(nd, link, seq);
 
-	if (unlikely(nd->total_link_count++ >= MAXSYMLINKS)) {
-		if (!(nd->flags & LOOKUP_RCU))
-			path_put(link);
-		return ERR_PTR(-ELOOP);
-	}
-	error = nd_alloc_stack(nd);
 	if (unlikely(error)) {
-		if (error == -ECHILD) {
-			// we must grab link first
-			bool grabbed_link = legitimize_path(nd, link, seq);
-			// ... and we must unlazy to be able to clean up
-			error = unlazy_walk(nd);
-			if (unlikely(!grabbed_link))
-				error = -ECHILD;
-			if (!error)
-				error = nd_alloc_stack(nd);
-		}
-		if (error) {
+		if (!(nd->flags & LOOKUP_RCU))
 			path_put(link);
-			return ERR_PTR(error);
-		}
+		return ERR_PTR(error);
 	}
-
 	last = nd->stack + nd->depth++;
 	last->link = *link;
 	clear_delayed_call(&last->done);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 60/69] reserve_stack(): switch to __nd_alloc_stack()
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (57 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 59/69] pick_link(): take reserving space on stack into a new helper Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 61/69] __nd_alloc_stack(): make it return bool Al Viro
                         ` (12 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

expand the call of nd_alloc_stack() into it (and don't
recheck the depth on the second call)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c27f657c6b71..01ffbc58e761 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -566,15 +566,6 @@ static bool path_connected(struct vfsmount *mnt, struct dentry *dentry)
 	return is_subdir(dentry, mnt->mnt_root);
 }
 
-static inline int nd_alloc_stack(struct nameidata *nd)
-{
-	if (likely(nd->depth != EMBEDDED_LEVELS))
-		return 0;
-	if (likely(nd->stack != nd->internal))
-		return 0;
-	return __nd_alloc_stack(nd);
-}
-
 static void drop_links(struct nameidata *nd)
 {
 	int i = nd->depth;
@@ -1586,7 +1577,13 @@ static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
 
 	if (unlikely(nd->total_link_count++ >= MAXSYMLINKS))
 		return -ELOOP;
-	error = nd_alloc_stack(nd);
+
+	if (likely(nd->depth != EMBEDDED_LEVELS))
+		return 0;
+	if (likely(nd->stack != nd->internal))
+		return 0;
+
+	error = __nd_alloc_stack(nd);
 	if (likely(!error))
 		return 0;
 	if (error == -ECHILD) {
@@ -1597,7 +1594,7 @@ static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
 		if (unlikely(!grabbed_link))
 			error = -ECHILD;
 		if (!error)
-			error = nd_alloc_stack(nd);
+			error = __nd_alloc_stack(nd);
 	}
 	return error;
 }
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 61/69] __nd_alloc_stack(): make it return bool
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (58 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 60/69] reserve_stack(): switch to __nd_alloc_stack() Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 62/69] link_path_walk(): sample parent's i_uid and i_mode for the last component Al Viro
                         ` (11 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

... and adjust the caller (reserve_stack()).  Rename to nd_alloc_stack(),
while we are at it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 45 ++++++++++++++++++---------------------------
 1 file changed, 18 insertions(+), 27 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 01ffbc58e761..c4b6e3c969b7 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -529,24 +529,17 @@ static void restore_nameidata(void)
 		kfree(now->stack);
 }
 
-static int __nd_alloc_stack(struct nameidata *nd)
+static bool nd_alloc_stack(struct nameidata *nd)
 {
 	struct saved *p;
 
-	if (nd->flags & LOOKUP_RCU) {
-		p= kmalloc_array(MAXSYMLINKS, sizeof(struct saved),
-				  GFP_ATOMIC);
-		if (unlikely(!p))
-			return -ECHILD;
-	} else {
-		p= kmalloc_array(MAXSYMLINKS, sizeof(struct saved),
-				  GFP_KERNEL);
-		if (unlikely(!p))
-			return -ENOMEM;
-	}
+	p= kmalloc_array(MAXSYMLINKS, sizeof(struct saved),
+			 nd->flags & LOOKUP_RCU ? GFP_ATOMIC : GFP_KERNEL);
+	if (unlikely(!p))
+		return false;
 	memcpy(p, nd->internal, sizeof(nd->internal));
 	nd->stack = p;
-	return 0;
+	return true;
 }
 
 /**
@@ -1573,8 +1566,6 @@ static inline int may_lookup(struct nameidata *nd)
 
 static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
 {
-	int error;
-
 	if (unlikely(nd->total_link_count++ >= MAXSYMLINKS))
 		return -ELOOP;
 
@@ -1582,21 +1573,21 @@ static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
 		return 0;
 	if (likely(nd->stack != nd->internal))
 		return 0;
-
-	error = __nd_alloc_stack(nd);
-	if (likely(!error))
+	if (likely(nd_alloc_stack(nd)))
 		return 0;
-	if (error == -ECHILD) {
-		// we must grab link first
+
+	if (nd->flags & LOOKUP_RCU) {
+		// we need to grab link before we do unlazy.  And we can't skip
+		// unlazy even if we fail to grab the link - cleanup needs it
 		bool grabbed_link = legitimize_path(nd, link, seq);
-		// ... and we must unlazy to be able to clean up
-		error = unlazy_walk(nd);
-		if (unlikely(!grabbed_link))
-			error = -ECHILD;
-		if (!error)
-			error = __nd_alloc_stack(nd);
+
+		if (unlazy_walk(nd) != 0 || !grabbed_link)
+			return -ECHILD;
+
+		if (nd_alloc_stack(nd))
+			return 0;
 	}
-	return error;
+	return -ENOMEM;
 }
 
 enum {WALK_TRAILING = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 62/69] link_path_walk(): sample parent's i_uid and i_mode for the last component
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (59 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 61/69] __nd_alloc_stack(): make it return bool Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 63/69] take post-lookup part of do_last() out of loop Al Viro
                         ` (10 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c4b6e3c969b7..aa1a74c5f52d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -505,6 +505,8 @@ struct nameidata {
 	struct nameidata *saved;
 	unsigned	root_seq;
 	int		dfd;
+	kuid_t		dir_uid;
+	umode_t		dir_mode;
 } __randomize_layout;
 
 static void set_nameidata(struct nameidata *p, int dfd, struct filename *name)
@@ -938,9 +940,6 @@ int sysctl_protected_regular __read_mostly;
  */
 static inline int may_follow_link(struct nameidata *nd, const struct inode *inode)
 {
-	const struct inode *parent;
-	kuid_t puid;
-
 	if (!sysctl_protected_symlinks)
 		return 0;
 
@@ -949,13 +948,11 @@ static inline int may_follow_link(struct nameidata *nd, const struct inode *inod
 		return 0;
 
 	/* Allowed if parent directory not sticky and world-writable. */
-	parent = nd->inode;
-	if ((parent->i_mode & (S_ISVTX|S_IWOTH)) != (S_ISVTX|S_IWOTH))
+	if ((nd->dir_mode & (S_ISVTX|S_IWOTH)) != (S_ISVTX|S_IWOTH))
 		return 0;
 
 	/* Allowed if parent directory and link owner match. */
-	puid = parent->i_uid;
-	if (uid_valid(puid) && uid_eq(puid, inode->i_uid))
+	if (uid_valid(nd->dir_uid) && uid_eq(nd->dir_uid, inode->i_uid))
 		return 0;
 
 	if (nd->flags & LOOKUP_RCU)
@@ -2158,6 +2155,8 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 OK:
 			/* pathname or trailing symlink, done */
 			if (!depth) {
+				nd->dir_uid = nd->inode->i_uid;
+				nd->dir_mode = nd->inode->i_mode;
 				nd->flags &= ~LOOKUP_PARENT;
 				return 0;
 			}
@@ -3223,8 +3222,6 @@ static const char *open_last_lookups(struct nameidata *nd,
 static const char *do_last(struct nameidata *nd,
 		   struct file *file, const struct open_flags *op)
 {
-	kuid_t dir_uid = nd->inode->i_uid;
-	umode_t dir_mode = nd->inode->i_mode;
 	int open_flag = op->open_flag;
 	bool do_truncate;
 	int acc_mode;
@@ -3240,7 +3237,7 @@ static const char *do_last(struct nameidata *nd,
 	if (open_flag & O_CREAT) {
 		if (d_is_dir(nd->path.dentry))
 			return ERR_PTR(-EISDIR);
-		error = may_create_in_sticky(dir_mode, dir_uid,
+		error = may_create_in_sticky(nd->dir_mode, nd->dir_uid,
 					     d_backing_inode(nd->path.dentry));
 		if (unlikely(error))
 			return ERR_PTR(error);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 63/69] take post-lookup part of do_last() out of loop
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (60 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 62/69] link_path_walk(): sample parent's i_uid and i_mode for the last component Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 64/69] open_last_lookups(): consolidate fsnotify_create() calls Al Viro
                         ` (9 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

now we can have open_last_lookups() directly from the loop in
path_openat() - the rest of do_last() never returns a symlink
to follow, so we can bloody well leave the loop first.

Rename the rest of that thing from do_last() to do_open() and
make it return an int.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index aa1a74c5f52d..da64fa0b2f6d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3219,31 +3219,26 @@ static const char *open_last_lookups(struct nameidata *nd,
 /*
  * Handle the last step of open()
  */
-static const char *do_last(struct nameidata *nd,
+static int do_open(struct nameidata *nd,
 		   struct file *file, const struct open_flags *op)
 {
 	int open_flag = op->open_flag;
 	bool do_truncate;
 	int acc_mode;
-	const char *link;
 	int error;
 
-	link = open_last_lookups(nd, file, op);
-	if (unlikely(link))
-		return link;
-
 	if (!(file->f_mode & FMODE_CREATED))
 		audit_inode(nd->name, nd->path.dentry, 0);
 	if (open_flag & O_CREAT) {
 		if (d_is_dir(nd->path.dentry))
-			return ERR_PTR(-EISDIR);
+			return -EISDIR;
 		error = may_create_in_sticky(nd->dir_mode, nd->dir_uid,
 					     d_backing_inode(nd->path.dentry));
 		if (unlikely(error))
-			return ERR_PTR(error);
+			return error;
 	}
 	if ((nd->flags & LOOKUP_DIRECTORY) && !d_can_lookup(nd->path.dentry))
-		return ERR_PTR(-ENOTDIR);
+		return -ENOTDIR;
 
 	do_truncate = false;
 	acc_mode = op->acc_mode;
@@ -3254,7 +3249,7 @@ static const char *do_last(struct nameidata *nd,
 	} else if (d_is_reg(nd->path.dentry) && open_flag & O_TRUNC) {
 		error = mnt_want_write(nd->path.mnt);
 		if (error)
-			return ERR_PTR(error);
+			return error;
 		do_truncate = true;
 	}
 	error = may_open(&nd->path, acc_mode, open_flag);
@@ -3270,7 +3265,7 @@ static const char *do_last(struct nameidata *nd,
 	}
 	if (do_truncate)
 		mnt_drop_write(nd->path.mnt);
-	return ERR_PTR(error);
+	return error;
 }
 
 struct dentry *vfs_tmpfile(struct dentry *dentry, umode_t mode, int open_flag)
@@ -3373,8 +3368,10 @@ static struct file *path_openat(struct nameidata *nd,
 	} else {
 		const char *s = path_init(nd, flags);
 		while (!(error = link_path_walk(s, nd)) &&
-		       (s = do_last(nd, file, op)) != NULL)
+		       (s = open_last_lookups(nd, file, op)) != NULL)
 			;
+		if (!error)
+			error = do_open(nd, file, op);
 		terminate_walk(nd);
 	}
 	if (likely(!error)) {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 64/69] open_last_lookups(): consolidate fsnotify_create() calls
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (61 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 63/69] take post-lookup part of do_last() out of loop Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 65/69] open_last_lookups(): don't abuse complete_walk() when all we want is unlazy Al Viro
                         ` (8 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index da64fa0b2f6d..a86ee06e637d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2956,8 +2956,6 @@ static struct dentry *atomic_open(struct nameidata *nd, struct dentry *dentry,
 				dput(dentry);
 				dentry = dget(file->f_path.dentry);
 			}
-			if (file->f_mode & FMODE_CREATED)
-				fsnotify_create(dir, dentry);
 		} else if (WARN_ON(file->f_path.dentry == DENTRY_NOT_SET)) {
 			error = -EIO;
 		} else {
@@ -2965,8 +2963,6 @@ static struct dentry *atomic_open(struct nameidata *nd, struct dentry *dentry,
 				dput(dentry);
 				dentry = file->f_path.dentry;
 			}
-			if (file->f_mode & FMODE_CREATED)
-				fsnotify_create(dir, dentry);
 			if (unlikely(d_is_negative(dentry)))
 				error = -ENOENT;
 		}
@@ -3102,7 +3098,6 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 						open_flag & O_EXCL);
 		if (error)
 			goto out_dput;
-		fsnotify_create(dir_inode, dentry);
 	}
 	if (unlikely(create_error) && !dentry->d_inode) {
 		error = create_error;
@@ -3181,6 +3176,8 @@ static const char *open_last_lookups(struct nameidata *nd,
 	else
 		inode_lock_shared(dir->d_inode);
 	dentry = lookup_open(nd, file, op, got_write);
+	if (!IS_ERR(dentry) && (file->f_mode & FMODE_CREATED))
+		fsnotify_create(dir->d_inode, dentry);
 	if (open_flag & O_CREAT)
 		inode_unlock(dir->d_inode);
 	else
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 65/69] open_last_lookups(): don't abuse complete_walk() when all we want is unlazy
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (62 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 64/69] open_last_lookups(): consolidate fsnotify_create() calls Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 66/69] open_last_lookups(): lift O_EXCL|O_CREAT handling into do_open() Al Viro
                         ` (7 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index a86ee06e637d..7269da642391 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3146,15 +3146,11 @@ static const char *open_last_lookups(struct nameidata *nd,
 		BUG_ON(nd->flags & LOOKUP_RCU);
 	} else {
 		/* create side of things */
-		/*
-		 * This will *only* deal with leaving RCU mode - LOOKUP_JUMPED
-		 * has been cleared when we got to the last component we are
-		 * about to look up
-		 */
-		error = complete_walk(nd);
-		if (unlikely(error))
-			return ERR_PTR(error);
-
+		if (nd->flags & LOOKUP_RCU) {
+			error = unlazy_walk(nd);
+			if (unlikely(error))
+				return ERR_PTR(error);
+		}
 		audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
 		/* trailing slashes? */
 		if (unlikely(nd->last.name[nd->last.len]))
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 66/69] open_last_lookups(): lift O_EXCL|O_CREAT handling into do_open()
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (63 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 65/69] open_last_lookups(): don't abuse complete_walk() when all we want is unlazy Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 67/69] open_last_lookups(): move complete_walk() " Al Viro
                         ` (6 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Currently path_openat() has "EEXIST on O_EXCL|O_CREAT" checks done on one
of the ways out of open_last_lookups().  There are 4 cases:
	1) the last component is . or ..; check is not done.
	2) we had FMODE_OPENED or FMODE_CREATED set while in lookup_open();
check is not done.
	3) symlink to be traversed is found; check is not done (nor
should it be)
	4) everything else: check done (before complete_walk(), even).

In case (1) O_EXCL|O_CREAT ends up failing with -EISDIR - that's
	open("/tmp/.", O_CREAT|O_EXCL, 0600)
Note that in the same conditions
	open("/tmp", O_CREAT|O_EXCL, 0600)
would have yielded EEXIST.  Either error is allowed, switching to -EEXIST
in these cases would've been more consistent.

Case (2) is more subtle; first of all, if we have FMODE_CREATED set, the
object hadn't existed prior to the call.  The check should not be done in
such a case.  The rest is problematic, though - we have
	FMODE_OPENED set (i.e. it went through ->atomic_open() and got
successfully opened there)
	FMODE_CREATED is *NOT* set
	O_CREAT and O_EXCL are both set.
Any such case is a bug - either we failed to set FMODE_CREATED when we
had, in fact, created an object (no such instances in the tree) or
we have opened a pre-existing file despite having had both O_CREAT and
O_EXCL passed.  One of those was, in fact caught (and fixed) while
sorting out this mess (gfs2 on cold dcache).  And in such situations
we should fail with EEXIST.

Note that for (1) and (4) FMODE_CREATED is not set - for (1) there's nothing
in handle_dots() to set it, for (4) we'd explicitly checked that.

And (1), (2) and (4) are exactly the cases when we leave the loop in
the caller, with do_open() called immediately after that loop.  IOW, we
can move the check over there, and make it

	If we have O_CREAT|O_EXCL and after successful pathname resolution
FMODE_CREATED is *not* set, we must have run into a preexisting file and
should fail with EEXIST.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 7269da642391..3e1573373773 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3200,11 +3200,6 @@ static const char *open_last_lookups(struct nameidata *nd,
 		return res;
 	}
 
-	if (unlikely((open_flag & (O_EXCL | O_CREAT)) == (O_EXCL | O_CREAT))) {
-		audit_inode(nd->name, nd->path.dentry, 0);
-		return ERR_PTR(-EEXIST);
-	}
-
 	/* Why this, you ask?  _Now_ we might have grown LOOKUP_JUMPED... */
 	return ERR_PTR(complete_walk(nd));
 }
@@ -3223,6 +3218,8 @@ static int do_open(struct nameidata *nd,
 	if (!(file->f_mode & FMODE_CREATED))
 		audit_inode(nd->name, nd->path.dentry, 0);
 	if (open_flag & O_CREAT) {
+		if ((open_flag & O_EXCL) && !(file->f_mode & FMODE_CREATED))
+			return -EEXIST;
 		if (d_is_dir(nd->path.dentry))
 			return -EISDIR;
 		error = may_create_in_sticky(nd->dir_mode, nd->dir_uid,
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 67/69] open_last_lookups(): move complete_walk() into do_open()
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (64 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 66/69] open_last_lookups(): lift O_EXCL|O_CREAT handling into do_open() Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 68/69] atomic_open(): no need to pass struct open_flags anymore Al Viro
                         ` (5 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3e1573373773..88ff59dcfd47 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3127,10 +3127,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 	if (nd->last_type != LAST_NORM) {
 		if (nd->depth)
 			put_link(nd);
-		res = handle_dots(nd, nd->last_type);
-		if (likely(!res))
-			res = ERR_PTR(complete_walk(nd));
-		return res;
+		return handle_dots(nd, nd->last_type);
 	}
 
 	if (!(open_flag & O_CREAT)) {
@@ -3195,13 +3192,9 @@ static const char *open_last_lookups(struct nameidata *nd,
 	if (nd->depth)
 		put_link(nd);
 	res = step_into(nd, WALK_TRAILING, dentry, inode, seq);
-	if (unlikely(res)) {
+	if (unlikely(res))
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
-		return res;
-	}
-
-	/* Why this, you ask?  _Now_ we might have grown LOOKUP_JUMPED... */
-	return ERR_PTR(complete_walk(nd));
+	return res;
 }
 
 /*
@@ -3215,6 +3208,11 @@ static int do_open(struct nameidata *nd,
 	int acc_mode;
 	int error;
 
+	if (!(file->f_mode & (FMODE_OPENED | FMODE_CREATED))) {
+		error = complete_walk(nd);
+		if (error)
+			return error;
+	}
 	if (!(file->f_mode & FMODE_CREATED))
 		audit_inode(nd->name, nd->path.dentry, 0);
 	if (open_flag & O_CREAT) {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 68/69] atomic_open(): no need to pass struct open_flags anymore
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (65 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 67/69] open_last_lookups(): move complete_walk() " Al Viro
@ 2020-03-13 23:53       ` Al Viro
  2020-03-13 23:53       ` [RFC][PATCH v4 69/69] lookup_open(): don't bother with fallbacks to lookup+create Al Viro
                         ` (4 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

argument had been unused since 1643b43fbd052 (lookup_open(): lift the
"fallback to !O_CREAT" logics from atomic_open()) back in 2016

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 88ff59dcfd47..36b15f5b09bd 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2932,7 +2932,6 @@ static int may_o_create(const struct path *dir, struct dentry *dentry, umode_t m
  */
 static struct dentry *atomic_open(struct nameidata *nd, struct dentry *dentry,
 				  struct file *file,
-				  const struct open_flags *op,
 				  int open_flag, umode_t mode)
 {
 	struct dentry *const DENTRY_NOT_SET = (void *) -1UL;
@@ -3065,7 +3064,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 	}
 
 	if (dir_inode->i_op->atomic_open) {
-		dentry = atomic_open(nd, dentry, file, op, open_flag, mode);
+		dentry = atomic_open(nd, dentry, file, open_flag, mode);
 		if (unlikely(create_error) && dentry == ERR_PTR(-ENOENT))
 			dentry = ERR_PTR(create_error);
 		return dentry;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* [RFC][PATCH v4 69/69] lookup_open(): don't bother with fallbacks to lookup+create
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
                         ` (66 preceding siblings ...)
  2020-03-13 23:53       ` [RFC][PATCH v4 68/69] atomic_open(): no need to pass struct open_flags anymore Al Viro
@ 2020-03-13 23:53       ` Al Viro
       [not found]       ` <20200315094148.12872-1-hdanton@sina.com>
                         ` (3 subsequent siblings)
  71 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-13 23:53 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Linus Torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

We fall back to lookup+create (instead of atomic_open) in several cases:
	1) we don't have write access to filesystem and O_TRUNC is
present in the flags.  It's not something we want ->atomic_open() to
see - it just might go ahead and truncate the file.  However, we can
pass it the flags sans O_TRUNC - eventually do_open() will call
handle_truncate() anyway.
	2) we have O_CREAT | O_EXCL and we can't write to parent.
That's going to be an error, of course, but we want to know _which_
error should that be - might be EEXIST (if file exists), might be
EACCES or EROFS.  Simply stripping O_CREAT (and checking if we see
ENOENT) would suffice, if not for O_EXCL.  However, we used to have
->atomic_open() fully responsible for rejecting O_CREAT | O_EXCL
on existing file and just stripping O_CREAT would've disarmed
those checks.  With nothing downstream to catch the problem -
FMODE_OPENED used to be "don't bother with EEXIST checks,
->atomic_open() has done those".  Now EEXIST checks downstream
are skipped only if FMODE_CREATED is set - FMODE_OPENED alone
is not enough.  That has eliminated the need to fall back onto
lookup+create path in this case.
	3) O_WRONLY or O_RDWR when we have no write access to
filesystem, with nothing else objectionable.  Fallback is
(and had always been) pointless.

IOW, we don't really need that fallback; all we need in such
cases is to trim O_TRUNC and O_CREAT properly.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 34 +++++++++-------------------------
 1 file changed, 9 insertions(+), 25 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 36b15f5b09bd..f4ac81b60a95 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2938,9 +2938,6 @@ static struct dentry *atomic_open(struct nameidata *nd, struct dentry *dentry,
 	struct inode *dir =  nd->path.dentry->d_inode;
 	int error;
 
-	if (!(~open_flag & (O_EXCL | O_CREAT)))	/* both O_EXCL and O_CREAT */
-		open_flag &= ~O_TRUNC;
-
 	if (nd->flags & LOOKUP_DIRECTORY)
 		open_flag |= O_DIRECTORY;
 
@@ -3037,32 +3034,20 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 	 * Another problem is returing the "right" error value (e.g. for an
 	 * O_EXCL open we want to return EEXIST not EROFS).
 	 */
+	if (unlikely(!got_write))
+		open_flag &= ~O_TRUNC;
 	if (open_flag & O_CREAT) {
+		if (open_flag & O_EXCL)
+			open_flag &= ~O_TRUNC;
 		if (!IS_POSIXACL(dir->d_inode))
 			mode &= ~current_umask();
-		if (unlikely(!got_write)) {
-			create_error = -EROFS;
-			open_flag &= ~O_CREAT;
-			if (open_flag & (O_EXCL | O_TRUNC))
-				goto no_open;
-			/* No side effects, safe to clear O_CREAT */
-		} else {
+		if (likely(got_write))
 			create_error = may_o_create(&nd->path, dentry, mode);
-			if (create_error) {
-				open_flag &= ~O_CREAT;
-				if (open_flag & O_EXCL)
-					goto no_open;
-			}
-		}
-	} else if ((open_flag & (O_TRUNC|O_WRONLY|O_RDWR)) &&
-		   unlikely(!got_write)) {
-		/*
-		 * No O_CREATE -> atomicity not a requirement -> fall
-		 * back to lookup + open
-		 */
-		goto no_open;
+		else
+			create_error = -EROFS;
 	}
-
+	if (create_error)
+		open_flag &= ~O_CREAT;
 	if (dir_inode->i_op->atomic_open) {
 		dentry = atomic_open(nd, dentry, file, open_flag, mode);
 		if (unlikely(create_error) && dentry == ERR_PTR(-ENOENT))
@@ -3070,7 +3055,6 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 		return dentry;
 	}
 
-no_open:
 	if (d_in_lookup(dentry)) {
 		struct dentry *res = dir_inode->i_op->lookup(dir_inode, dentry,
 							     nd->flags);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCH v4 11/69] lookup_fast(): consolidate the RCU success case
  2020-03-13 23:52       ` [RFC][PATCH v4 11/69] lookup_fast(): consolidate the RCU success case Al Viro
@ 2020-03-14  0:25         ` Linus Torvalds
  0 siblings, 0 replies; 198+ messages in thread
From: Linus Torvalds @ 2020-03-14  0:25 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linux Kernel Mailing List

On Fri, Mar 13, 2020 at 4:55 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> -                       if (unlikely(negative))
> +                       if (unlikely(!inode))
>                                 return -ENOENT;

Isn't that buggy?

Despite the name, 'inode' isn't an inode pointer. It's a pointer to
the return location.

I think the test should be

        if (unlikely(!*inode))
                return -ENOENT;

and I also suspect that the argument name should be fixed (maybe
"inodepp", maybe something better).

Because the "inode" pointer itself always exists. The callers will
have something like

        struct inode *inode;

and then pass in "&inode" to the function.

And it's possible that I'm talking complete garbage.

               Linus

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCH v4 12/69] teach handle_mounts() to handle RCU mode
  2020-03-13 23:53       ` [RFC][PATCH v4 12/69] teach handle_mounts() to handle RCU mode Al Viro
@ 2020-03-14  0:28         ` Linus Torvalds
  2020-03-14  1:00           ` Al Viro
  0 siblings, 1 reply; 198+ messages in thread
From: Linus Torvalds @ 2020-03-14  0:28 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linux Kernel Mailing List

Oh, and here you accidentally fix the problem I pointed out about
patch 11, as you move the code:

On Fri, Mar 13, 2020 at 4:54 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> +               if (unlikely(!*inode))
> +                       return -ENOENT;

Correct test added.

> -                       if (unlikely(!inode))
> -                               return -ENOENT;

Incorrect test removed.

And again, maybe I'm misreading the patch. But it does look like it's
wrong in the middle of the series, which would make bisection if
there's some related bug "interesting".

                Linus

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCH v4 15/69] new step_into() flag: WALK_NOFOLLOW
  2020-03-13 23:53       ` [RFC][PATCH v4 15/69] new step_into() flag: WALK_NOFOLLOW Al Viro
@ 2020-03-14  0:32         ` Linus Torvalds
  2020-03-14  1:06           ` Al Viro
  0 siblings, 1 reply; 198+ messages in thread
From: Linus Torvalds @ 2020-03-14  0:32 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linux Kernel Mailing List

I mentioned this last time (perhaps for a different sequence):

On Fri, Mar 13, 2020 at 4:54 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
>         if (likely(!d_is_symlink(path->dentry)) ||
> -          !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW)) {
> +          !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW) ||
> +          flags & WALK_NOFOLLOW) {

Yes, I know that bitwise operations have higher precedence than the
logical ones. And I know & (and &&) have higher precedence than | (and
||).

But I have to _think_ about it every time I see code like this.

I'd really prefer to see

   if ((a & BIT) || (b & ANOTHER_BIT))

over the "equivalent" and shorter

   if (a & BIT || b & ANOTHER_BIT)

Please make it explicit. It wasn't before either, but it _could_ be.

              Linus

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCH v4 20/69] merging pick_link() with get_link(), part 2
  2020-03-13 23:53       ` [RFC][PATCH v4 20/69] merging pick_link() with get_link(), part 2 Al Viro
@ 2020-03-14  0:40         ` Linus Torvalds
  0 siblings, 0 replies; 198+ messages in thread
From: Linus Torvalds @ 2020-03-14  0:40 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linux Kernel Mailing List

Hmm..

On Fri, Mar 13, 2020 at 4:55 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>@@ -2370,10 +2375,9 @@ static int path_lookupat(struct nameidata *nd, unsigned flags, struct path *path
> +       while (!(err = link_path_walk(s, nd)) &&
> +              (s = lookup_last(nd)) != NULL)
> +               ;

There's two copies of that loop (the other being in path_openat()). Is
there a reason why it's written that odd way?

Why is the loop body empty, when the *natural* way to write that would
seem to be

        while (!(err = link_path_walk(s, nd))) {
                s = lookup_last(nd));
                if (!s)
                        break;
        }

which may be a few lines longer, but a lot more legible.

I don't think you should use assignments in tests, unless strictly
required. Yes, that "err = ..." part almost has to be written that
way, but the "s = ..." part doesn't seem to have any reason for being
in the conditional.

And I'm only reading the patches, so once again: maybe I'm messing up
by mis-reading something. And maybe you have some reason for that
pattern.

                     Linus

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCH v4 27/69] namei: invert the meaning of WALK_FOLLOW
  2020-03-13 23:53       ` [RFC][PATCH v4 27/69] namei: invert the meaning of WALK_FOLLOW Al Viro
@ 2020-03-14  0:42         ` Linus Torvalds
  0 siblings, 0 replies; 198+ messages in thread
From: Linus Torvalds @ 2020-03-14  0:42 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linux Kernel Mailing List

On Fri, Mar 13, 2020 at 4:56 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> -          !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW) ||
> -          flags & WALK_NOFOLLOW) {
> +          ((flags & WALK_TRAILING) && !(nd->flags & LOOKUP_FOLLOW)) ||
> +          (flags & WALK_NOFOLLOW)) {

Oh, and here you fix the implicit precedence thing I was complaining
about earlier.

Maybe my original complaint had happened at this point, and I'd missed
the original place.

Likely.

            Linus

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCHSET] sanitized pathwalk machinery (v4)
  2020-03-13 23:53   ` [RFC][PATCHSET] sanitized pathwalk machinery (v4) Al Viro
  2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
@ 2020-03-14  0:50     ` Linus Torvalds
  2020-03-14  2:27       ` Al Viro
  1 sibling, 1 reply; 198+ messages in thread
From: Linus Torvalds @ 2020-03-14  0:50 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linux Kernel Mailing List

On Fri, Mar 13, 2020 at 4:53 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
>         Review and testing would be _very_ welcome;

I didn't notice anythign else than the few things I sent out to
individual patches.

But I have to say, 69 patches is too long of a series to review. I
think you could send them out as multiple series (you describe them
that way anyway - parts 1-7) with a day in between.

Because my eyes were starting to glaze over about halfway in the series.

But don't do it for this version. If you do a #5. But it would be good
to be in -next regardless of whether you do a #5 or not.

                 Linus

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCH v4 12/69] teach handle_mounts() to handle RCU mode
  2020-03-14  0:28         ` Linus Torvalds
@ 2020-03-14  1:00           ` Al Viro
  0 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-14  1:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, Linux Kernel Mailing List

On Fri, Mar 13, 2020 at 05:28:12PM -0700, Linus Torvalds wrote:
> Oh, and here you accidentally fix the problem I pointed out about
> patch 11, as you move the code:
> 
> On Fri, Mar 13, 2020 at 4:54 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > +               if (unlikely(!*inode))
> > +                       return -ENOENT;
> 
> Correct test added.
> 
> > -                       if (unlikely(!inode))
> > -                               return -ENOENT;
> 
> Incorrect test removed.
> 
> And again, maybe I'm misreading the patch. But it does look like it's
> wrong in the middle of the series, which would make bisection if
> there's some related bug "interesting".

Bisect hazard on botched reordering, actually.  Fixed (IOW, that should've
been if (!*inode) already in #11).

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCH v4 15/69] new step_into() flag: WALK_NOFOLLOW
  2020-03-14  0:32         ` Linus Torvalds
@ 2020-03-14  1:06           ` Al Viro
  0 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-14  1:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, Linux Kernel Mailing List

On Fri, Mar 13, 2020 at 05:32:32PM -0700, Linus Torvalds wrote:
> I mentioned this last time (perhaps for a different sequence):
> 
> On Fri, Mar 13, 2020 at 4:54 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> >         if (likely(!d_is_symlink(path->dentry)) ||
> > -          !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW)) {
> > +          !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW) ||
> > +          flags & WALK_NOFOLLOW) {
> 
> Yes, I know that bitwise operations have higher precedence than the
> logical ones. And I know & (and &&) have higher precedence than | (and
> ||).
> 
> But I have to _think_ about it every time I see code like this.
> 
> I'd really prefer to see
> 
>    if ((a & BIT) || (b & ANOTHER_BIT))
> 
> over the "equivalent" and shorter
> 
>    if (a & BIT || b & ANOTHER_BIT)
> 
> Please make it explicit. It wasn't before either, but it _could_ be.

Not a problem (actually, I'd done that several commits later when I was
rewriting the expression anyway).  Folded the following into it now:

diff --git a/fs/namei.c b/fs/namei.c
index e47b376cf442..79f06be7f5d4 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1839,8 +1839,8 @@ static inline int step_into(struct nameidata *nd, struct path *path,
 			    int flags, struct inode *inode, unsigned seq)
 {
 	if (likely(!d_is_symlink(path->dentry)) ||
-	   !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW) ||
-	   flags & WALK_NOFOLLOW) {
+	   !((flags & WALK_FOLLOW) || (nd->flags & LOOKUP_FOLLOW)) ||
+	   (flags & WALK_NOFOLLOW)) {
 		/* not a symlink or should not follow */
 		path_to_nameidata(path, nd);
 		nd->inode = inode;

^ permalink raw reply related	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCHSET] sanitized pathwalk machinery (v4)
  2020-03-14  0:50     ` [RFC][PATCHSET] sanitized pathwalk machinery (v4) Linus Torvalds
@ 2020-03-14  2:27       ` Al Viro
  0 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-14  2:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, Linux Kernel Mailing List

On Fri, Mar 13, 2020 at 05:50:51PM -0700, Linus Torvalds wrote:
> On Fri, Mar 13, 2020 at 4:53 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> >         Review and testing would be _very_ welcome;
> 
> I didn't notice anythign else than the few things I sent out to
> individual patches.
> 
> But I have to say, 69 patches is too long of a series to review. I
> think you could send them out as multiple series (you describe them
> that way anyway - parts 1-7) with a day in between.

A week of fun?  OK...

> Because my eyes were starting to glaze over about halfway in the series.
> 
> But don't do it for this version. If you do a #5. But it would be good
> to be in -next regardless of whether you do a #5 or not.

FWIW, I've dealt with bisect hazards and I'll probably reorder #56 after
#57/#58, to get the stuff that deals with stack allocation (#56, #59..61)
together, without "reduce the exposure to weird struct path instances"
(57 and 58) mixed in the middle of that.

As for the rest...  I'm not sure that choose_mountpoint{,_rcu}()
is inserted into the right place in text - might be better next to
follow_up().  There's also a couple of pick_link() pieces worth separate
helpers, but I'd rather leave that for the next cycle - the series is
bloody long as it is.

I'm not going to throw the immediate prereqs for ->atomic_open() calling
conventions change into that pile - they don't harm anything, but they
are unmotivated without the next step (method signature change) and it's
really too late in the cycle for that.  That's going to be a separate
series, probably for the next cycle.  Changes to instances are not
huge; ceph is the worst by far and that's only +27/-22 lines.  So I don't
think there will be a lot of conflicts to cope with in the next cycle,
especially since ceph side of things looks like we want to do some
refactoring first, with much smaller changeover on top of that.  That
refactoring itself won't have prereqs at all, so that can be dealt with
sanely and that'll soak most of the potential conflicts in.

There are other potential refactorings/cleanups, but that's definitely
not for this cycle.  So... short of regressions found in that series
that's probably close to what I'll have for the coming window in
this branch.  If I see something else in there that can be usefully
cleaned up, I'll keep it for after -rc1...

Next: context switch to uaccess series and getting that patchbomb ready.
Oh, well...

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCH v4 22/69] merging pick_link() with get_link(), part 4
       [not found]       ` <20200315094148.12872-1-hdanton@sina.com>
@ 2020-03-15 14:23         ` Al Viro
  0 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-15 14:23 UTC (permalink / raw)
  To: Hillf Danton; +Cc: linux-fsdevel, linux-kernel, Linus Torvalds

On Sun, Mar 15, 2020 at 05:41:48PM +0800, Hillf Danton wrote:
> > +	link = walk_component(nd, 0);
> > +	if (link) {
> 
> Need to check that link is not err.

What for?  It will do the right thing anyway.

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCH v4 23/69] merging pick_link() with get_link(), part 5
       [not found]       ` <20200315095323.220-1-hdanton@sina.com>
@ 2020-03-15 14:24         ` Al Viro
  0 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-15 14:24 UTC (permalink / raw)
  To: Hillf Danton; +Cc: linux-fsdevel, linux-kernel, Linus Torvalds

On Sun, Mar 15, 2020 at 05:53:23PM +0800, Hillf Danton wrote:

> > +	res = step_into(nd, 0, dentry, inode, seq);
> > +	if (unlikely(res)) {
> 
> Need to check that res is not err.
> 
> > +		nd->flags |= LOOKUP_PARENT;
> > +		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
> > +		nd->stack[0].name = NULL;
> > +		return res;
> >  	}

Not really...

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCH v4 38/69] do_last(): simplify the liveness analysis past finish_open_created
       [not found]       ` <20200315124007.1504-1-hdanton@sina.com>
@ 2020-03-15 14:34         ` Al Viro
  0 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-15 14:34 UTC (permalink / raw)
  To: Hillf Danton; +Cc: linux-fsdevel, linux-kernel, Linus Torvalds

On Sun, Mar 15, 2020 at 08:40:07PM +0800, Hillf Danton wrote:
> 
> On Fri, 13 Mar 2020 23:53:26 +0000
> > From: Al Viro <viro@zeniv.linux.org.uk>
> > 
> > Don't mess with got_write there - it is guaranteed to be false on
> > entry and it will be set true if and only if we decide to go for
> > truncation and manage to get write access for that.
> > 
> > Don't carry acc_mode through the entire thing - it's only used
> > in that part.  And don't bother with gotos in there - compiler is
> > quite capable of optimizing that.
> > 
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > ---
> >  fs/namei.c | 28 +++++++++++-----------------
> >  1 file changed, 11 insertions(+), 17 deletions(-)
> > 
> > diff --git a/fs/namei.c b/fs/namei.c
> > index c85fdfd6b33d..8cdf8ef41194 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -3122,9 +3122,9 @@ static const char *do_last(struct nameidata *nd,
> >  	kuid_t dir_uid = nd->inode->i_uid;
> >  	umode_t dir_mode = nd->inode->i_mode;
> >  	int open_flag = op->open_flag;
> > -	bool will_truncate = (open_flag & O_TRUNC) != 0;
> > +	bool do_truncate;
> >  	bool got_write = false;
> 
> Without got_write cut,
> 
> > -	int acc_mode = op->acc_mode;
> > +	int acc_mode;
> >  	unsigned seq;
> >  	struct inode *inode;
> >  	struct dentry *dentry;
> > @@ -3243,36 +3243,30 @@ static const char *do_last(struct nameidata *nd,
> >  		return ERR_PTR(-ENOTDIR);
> >  
> >  finish_open_created:
> > +	do_truncate = false;
> > +	acc_mode = op->acc_mode;
> >  	if (file->f_mode & FMODE_CREATED) {
> >  		/* Don't check for write permission, don't truncate */
> >  		open_flag &= ~O_TRUNC;
> > -		will_truncate = false;
> >  		acc_mode = 0;
> > -	} else if (!d_is_reg(nd->path.dentry)) {
> > -		will_truncate = false;
> > -	}
> > -	if (will_truncate) {
> > +	} else if (d_is_reg(nd->path.dentry) && open_flag & O_TRUNC) {
> >  		error = mnt_want_write(nd->path.mnt);
> >  		if (error)
> >  			return ERR_PTR(error);
> > -		got_write = true;
> > +		do_truncate = true;
> 
> replacing its role with do_truncate probably makes it easier to maintain
> do_last() leaving readers baffling and digging in logs after going up
> and down the code but failing to grip what is inteded in this patch.
> 
> And that kink can be fixed perhaps simply by not axing this got_write.

I'm sorry, I'd been unable to parse that ;-/  What do you mean?

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: [RFC][PATCH v4 26/69] sanitize handling of nd->last_type, kill LAST_BIND
       [not found]       ` <20200315102905.12468-1-hdanton@sina.com>
@ 2020-03-15 14:37         ` Al Viro
  0 siblings, 0 replies; 198+ messages in thread
From: Al Viro @ 2020-03-15 14:37 UTC (permalink / raw)
  To: Hillf Danton; +Cc: linux-fsdevel, linux-kernel, Linus Torvalds

On Sun, Mar 15, 2020 at 06:29:05PM +0800, Hillf Danton wrote:

> > And make link_path_walk() _always_ assign ->last_type - in the only
> > case when the value at the entry might survive to the return that value
> > is always LAST_ROOT, inherited from path_init().  Move that assignment
> > from path_init() into the beginning of link_path_walk(), to consolidate
> > the things.
> > 
> If you agree that the move is not a huge change, drop it for the sake of
> init in order to close the time window of a random last_type.

What time window of a random last_type and what does "drop for the sake of
init" mean?  Confused...

^ permalink raw reply	[flat|nested] 198+ messages in thread

end of thread, other threads:[~2020-03-15 14:37 UTC | newest]

Thread overview: 198+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-23  1:11 [RFC][PATCHSET] sanitized pathwalk machinery (v2) Al Viro
2020-02-23  1:15 ` [RFC][PATCH v2 01/34] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
2020-02-23  1:15   ` [RFC][PATCH v2 02/34] fix automount/automount race properly Al Viro
2020-02-23  2:07     ` Linus Torvalds
2020-02-27 19:43       ` Al Viro
2020-02-27 20:00         ` Linus Torvalds
2020-02-23  1:15   ` [RFC][PATCH v2 03/34] follow_automount(): get rid of dead^Wstillborn code Al Viro
2020-02-23  1:15   ` [RFC][PATCH v2 04/34] follow_automount() doesn't need the entire nameidata Al Viro
2020-02-23  1:15   ` [RFC][PATCH v2 05/34] make build_open_flags() treat O_CREAT | O_EXCL as implying O_NOFOLLOW Al Viro
2020-02-23  1:15   ` [RFC][PATCH v2 06/34] handle_mounts(): start building a sane wrapper for follow_managed() Al Viro
2020-02-23  1:15   ` [RFC][PATCH v2 07/34] atomic_open(): saner calling conventions (return dentry on success) Al Viro
2020-02-23  1:16   ` [RFC][PATCH v2 08/34] lookup_open(): " Al Viro
2020-02-23  1:16   ` [RFC][PATCH v2 09/34] do_last(): collapse the call of path_to_nameidata() Al Viro
2020-02-23  1:16   ` [RFC][PATCH v2 10/34] handle_mounts(): pass dentry in, turn path into a pure out argument Al Viro
2020-02-23  1:16   ` [RFC][PATCH v2 11/34] lookup_fast(): consolidate the RCU success case Al Viro
2020-02-23  1:16   ` [RFC][PATCH v2 12/34] teach handle_mounts() to handle RCU mode Al Viro
2020-02-23  1:16   ` [RFC][PATCH v2 13/34] lookup_fast(): take mount traversal into callers Al Viro
2020-02-23  1:16   ` [RFC][PATCH v2 14/34] new step_into() flag: WALK_NOFOLLOW Al Viro
2020-02-23  2:14     ` Linus Torvalds
2020-02-23 22:07       ` Al Viro
2020-02-23  1:16   ` [RFC][PATCH v2 15/34] fold handle_mounts() into step_into() Al Viro
2020-02-23  1:16   ` [RFC][PATCH v2 16/34] LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat() Al Viro
2020-02-23  1:16   ` [RFC][PATCH v2 17/34] expand the only remaining call of path_lookup_conditional() Al Viro
2020-02-23  1:16   ` [RFC][PATCH v2 18/34] merging pick_link() with get_link(), part 1 Al Viro
2020-02-23  1:16   ` [RFC][PATCH v2 19/34] merging pick_link() with get_link(), part 2 Al Viro
2020-02-23  1:16   ` [RFC][PATCH v2 20/34] merging pick_link() with get_link(), part 3 Al Viro
2020-02-23  1:16   ` [RFC][PATCH v2 21/34] merging pick_link() with get_link(), part 4 Al Viro
2020-02-23  2:19     ` Linus Torvalds
2020-02-23  1:16   ` [RFC][PATCH v2 22/34] merging pick_link() with get_link(), part 5 Al Viro
2020-02-23  2:22     ` Linus Torvalds
2020-02-23  1:16   ` [RFC][PATCH v2 23/34] merging pick_link() with get_link(), part 6 Al Viro
2020-02-23  2:24     ` Linus Torvalds
2020-02-23  1:16   ` [RFC][PATCH v2 24/34] finally fold get_link() into pick_link() Al Viro
2020-02-23  2:31     ` Linus Torvalds
2020-02-23  1:16   ` [RFC][PATCH v2 25/34] massage __follow_mount_rcu() a bit Al Viro
2020-02-23  1:16   ` [RFC][PATCH v2 26/34] new helper: traverse_mounts() Al Viro
2020-02-23  1:16   ` [RFC][PATCH v2 27/34] atomic_open(): return the right dentry in FMODE_OPENED case Al Viro
2020-02-23  1:16   ` [RFC][PATCH v2 28/34] atomic_open(): lift the call of may_open() into do_last() Al Viro
2020-02-23  1:16   ` [RFC][PATCH v2 29/34] do_last(): merge the may_open() calls Al Viro
2020-02-23  1:16   ` [RFC][PATCH v2 30/34] do_last(): don't bother with keeping got_write in FMODE_OPENED case Al Viro
2020-02-23  1:16   ` [RFC][PATCH v2 31/34] do_last(): rejoing the common path earlier in FMODE_{OPENED,CREATED} case Al Viro
2020-02-23  1:16   ` [RFC][PATCH v2 32/34] do_last(): simplify the liveness analysis past finish_open_created Al Viro
2020-02-23  1:16   ` [RFC][PATCH v2 33/34] do_last(): rejoin the common path even earlier in FMODE_{OPENED,CREATED} case Al Viro
2020-02-23  1:16   ` [RFC][PATCH v2 34/34] split the lookup-related parts of do_last() into a separate helper Al Viro
2020-02-25  1:24 ` [RFC][PATCHSET] sanitized pathwalk machinery (v2) Al Viro
2020-02-25  2:03   ` Al Viro
2020-02-28  1:24   ` Aleksa Sarai
2020-02-28  3:02     ` Al Viro
2020-03-01 21:51 ` [RFC][PATCHSET] sanitized pathwalk machinery (v3) Al Viro
2020-03-01 21:51   ` [RFC][PATCH v3 01/55] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
2020-03-01 21:51     ` [RFC][PATCH v3 02/55] fix automount/automount race properly Al Viro
2020-03-01 21:51     ` [RFC][PATCH v3 03/55] follow_automount(): get rid of dead^Wstillborn code Al Viro
2020-03-01 21:51     ` [RFC][PATCH v3 04/55] follow_automount() doesn't need the entire nameidata Al Viro
2020-03-01 21:51     ` [RFC][PATCH v3 05/55] make build_open_flags() treat O_CREAT | O_EXCL as implying O_NOFOLLOW Al Viro
2020-03-01 21:51     ` [RFC][PATCH v3 06/55] handle_mounts(): start building a sane wrapper for follow_managed() Al Viro
2020-03-01 21:51     ` [RFC][PATCH v3 07/55] atomic_open(): saner calling conventions (return dentry on success) Al Viro
2020-03-01 21:51     ` [RFC][PATCH v3 08/55] lookup_open(): " Al Viro
2020-03-01 21:51     ` [RFC][PATCH v3 09/55] do_last(): collapse the call of path_to_nameidata() Al Viro
2020-03-01 21:51     ` [RFC][PATCH v3 10/55] handle_mounts(): pass dentry in, turn path into a pure out argument Al Viro
2020-03-01 21:51     ` [RFC][PATCH v3 11/55] lookup_fast(): consolidate the RCU success case Al Viro
2020-03-01 21:51     ` [RFC][PATCH v3 12/55] teach handle_mounts() to handle RCU mode Al Viro
2020-03-01 21:51     ` [RFC][PATCH v3 13/55] lookup_fast(): take mount traversal into callers Al Viro
2020-03-01 21:51     ` [RFC][PATCH v3 14/55] new step_into() flag: WALK_NOFOLLOW Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 15/55] fold handle_mounts() into step_into() Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 16/55] LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat() Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 17/55] expand the only remaining call of path_lookup_conditional() Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 18/55] merging pick_link() with get_link(), part 1 Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 19/55] merging pick_link() with get_link(), part 2 Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 20/55] merging pick_link() with get_link(), part 3 Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 21/55] merging pick_link() with get_link(), part 4 Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 22/55] merging pick_link() with get_link(), part 5 Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 23/55] merging pick_link() with get_link(), part 6 Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 24/55] finally fold get_link() into pick_link() Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 25/55] sanitize handling of nd->last_type, kill LAST_BIND Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 26/55] namei: invert the meaning of WALK_FOLLOW Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 27/55] pick_link(): check for WALK_TRAILING, not LOOKUP_PARENT Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 28/55] link_path_walk(): simplify stack handling Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 29/55] massage __follow_mount_rcu() a bit Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 30/55] new helper: traverse_mounts() Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 31/55] atomic_open(): return the right dentry in FMODE_OPENED case Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 32/55] atomic_open(): lift the call of may_open() into do_last() Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 33/55] do_last(): merge the may_open() calls Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 34/55] do_last(): don't bother with keeping got_write in FMODE_OPENED case Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 35/55] do_last(): rejoing the common path earlier in FMODE_{OPENED,CREATED} case Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 36/55] do_last(): simplify the liveness analysis past finish_open_created Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 37/55] do_last(): rejoin the common path even earlier in FMODE_{OPENED,CREATED} case Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 38/55] split the lookup-related parts of do_last() into a separate helper Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 39/55] path_connected(): pass mount and dentry separately Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 40/55] path_parent_directory(): leave changing path->dentry to callers Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 41/55] follow_dotdot(): expand the call of path_parent_directory() Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 42/55] follow_dotdot{,_rcu}(): lift switching nd->path to parent out of loop Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 43/55] follow_dotdot{,_rcu}(): lift LOOKUP_BENEATH checks " Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 44/55] move put_link() into handle_dots() Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 45/55] handle_dots(): return ERR_PTR/NULL instead of -E.../0 Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 46/55] move follow_dotdot() and follow_dotdot_rcu() towards handle_dots() Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 47/55] follow_dotdot{,_rcu}(): preparation to switch to step_into() Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 48/55] follow_dotdot{,_rcu}(): switch to use of step_into() Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 49/55] lift all calls of step_into() out of follow_dotdot/follow_dotdot_rcu Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 50/55] follow_dotdot{,_rcu}(): massage loops Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 51/55] follow_dotdot_rcu(): be lazy about changing nd->path Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 52/55] follow_dotdot(): " Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 53/55] helper for mount rootwards traversal Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 54/55] non-RCU analogue of the previous commit Al Viro
2020-03-01 21:52     ` [RFC][PATCH v3 55/55] fs/namei.c: kill follow_mount() Al Viro
2020-03-01 22:34   ` [RFC][PATCHSET] sanitized pathwalk machinery (v3) Linus Torvalds
2020-03-02  0:39     ` Al Viro
2020-03-03 23:48       ` Eric W. Biederman
2020-03-04  0:24         ` Al Viro
2020-03-04  5:23           ` Eric W. Biederman
2020-03-04  6:55             ` Al Viro
2020-03-04 13:28               ` Matthew Wilcox
2020-03-04 16:20                 ` Al Viro
2020-03-04 20:54                   ` Al Viro
     [not found]               ` <20200304105946.4xseo3jokcnpptrj@yavin>
2020-03-04 21:00                 ` Al Viro
2020-03-05  1:11                   ` Matthew Wilcox
2020-03-13 23:53   ` [RFC][PATCHSET] sanitized pathwalk machinery (v4) Al Viro
2020-03-13 23:52     ` [RFC][PATCH v4 01/69] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
2020-03-13 23:52       ` [RFC][PATCH v4 02/69] fix automount/automount race properly Al Viro
2020-03-13 23:52       ` [RFC][PATCH v4 03/69] follow_automount(): get rid of dead^Wstillborn code Al Viro
2020-03-13 23:52       ` [RFC][PATCH v4 04/69] follow_automount() doesn't need the entire nameidata Al Viro
2020-03-13 23:52       ` [RFC][PATCH v4 05/69] make build_open_flags() treat O_CREAT | O_EXCL as implying O_NOFOLLOW Al Viro
2020-03-13 23:52       ` [RFC][PATCH v4 06/69] handle_mounts(): start building a sane wrapper for follow_managed() Al Viro
2020-03-13 23:52       ` [RFC][PATCH v4 07/69] atomic_open(): saner calling conventions (return dentry on success) Al Viro
2020-03-13 23:52       ` [RFC][PATCH v4 08/69] lookup_open(): " Al Viro
2020-03-13 23:52       ` [RFC][PATCH v4 09/69] do_last(): collapse the call of path_to_nameidata() Al Viro
2020-03-13 23:52       ` [RFC][PATCH v4 10/69] handle_mounts(): pass dentry in, turn path into a pure out argument Al Viro
2020-03-13 23:52       ` [RFC][PATCH v4 11/69] lookup_fast(): consolidate the RCU success case Al Viro
2020-03-14  0:25         ` Linus Torvalds
2020-03-13 23:53       ` [RFC][PATCH v4 12/69] teach handle_mounts() to handle RCU mode Al Viro
2020-03-14  0:28         ` Linus Torvalds
2020-03-14  1:00           ` Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 13/69] lookup_fast(): take mount traversal into callers Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 14/69] step_into() callers: dismiss the symlink earlier Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 15/69] new step_into() flag: WALK_NOFOLLOW Al Viro
2020-03-14  0:32         ` Linus Torvalds
2020-03-14  1:06           ` Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 16/69] fold handle_mounts() into step_into() Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 17/69] LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat() Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 18/69] expand the only remaining call of path_lookup_conditional() Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 19/69] merging pick_link() with get_link(), part 1 Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 20/69] merging pick_link() with get_link(), part 2 Al Viro
2020-03-14  0:40         ` Linus Torvalds
2020-03-13 23:53       ` [RFC][PATCH v4 21/69] merging pick_link() with get_link(), part 3 Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 22/69] merging pick_link() with get_link(), part 4 Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 23/69] merging pick_link() with get_link(), part 5 Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 24/69] merging pick_link() with get_link(), part 6 Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 25/69] finally fold get_link() into pick_link() Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 26/69] sanitize handling of nd->last_type, kill LAST_BIND Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 27/69] namei: invert the meaning of WALK_FOLLOW Al Viro
2020-03-14  0:42         ` Linus Torvalds
2020-03-13 23:53       ` [RFC][PATCH v4 28/69] pick_link(): check for WALK_TRAILING, not LOOKUP_PARENT Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 29/69] link_path_walk(): simplify stack handling Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 30/69] namei: have link_path_walk() maintain LOOKUP_PARENT Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 31/69] massage __follow_mount_rcu() a bit Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 32/69] new helper: traverse_mounts() Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 33/69] atomic_open(): return the right dentry in FMODE_OPENED case Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 34/69] atomic_open(): lift the call of may_open() into do_last() Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 35/69] do_last(): merge the may_open() calls Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 36/69] do_last(): don't bother with keeping got_write in FMODE_OPENED case Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 37/69] do_last(): rejoing the common path earlier in FMODE_{OPENED,CREATED} case Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 38/69] do_last(): simplify the liveness analysis past finish_open_created Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 39/69] do_last(): rejoin the common path even earlier in FMODE_{OPENED,CREATED} case Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 40/69] split the lookup-related parts of do_last() into a separate helper Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 41/69] path_connected(): pass mount and dentry separately Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 42/69] path_parent_directory(): leave changing path->dentry to callers Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 43/69] expand path_parent_directory() in its callers Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 44/69] follow_dotdot{,_rcu}(): lift switching nd->path to parent out of loop Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 45/69] follow_dotdot{,_rcu}(): lift LOOKUP_BENEATH checks " Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 46/69] move handle_dots(), follow_dotdot() and follow_dotdot_rcu() past step_into() Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 47/69] handle_dots(), follow_dotdot{,_rcu}(): preparation to switch to step_into() Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 48/69] follow_dotdot{,_rcu}(): switch to use of step_into() Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 49/69] lift all calls of step_into() out of follow_dotdot/follow_dotdot_rcu Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 50/69] follow_dotdot{,_rcu}(): massage loops Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 51/69] follow_dotdot_rcu(): be lazy about changing nd->path Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 52/69] follow_dotdot(): " Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 53/69] helper for mount rootwards traversal Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 54/69] non-RCU analogue of the previous commit Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 55/69] fs/namei.c: kill follow_mount() Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 56/69] pick_link(): more straightforward handling of allocation failures Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 57/69] pick_link(): pass it struct path already with normal refcounting rules Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 58/69] fold path_to_nameidata() into its only remaining caller Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 59/69] pick_link(): take reserving space on stack into a new helper Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 60/69] reserve_stack(): switch to __nd_alloc_stack() Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 61/69] __nd_alloc_stack(): make it return bool Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 62/69] link_path_walk(): sample parent's i_uid and i_mode for the last component Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 63/69] take post-lookup part of do_last() out of loop Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 64/69] open_last_lookups(): consolidate fsnotify_create() calls Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 65/69] open_last_lookups(): don't abuse complete_walk() when all we want is unlazy Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 66/69] open_last_lookups(): lift O_EXCL|O_CREAT handling into do_open() Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 67/69] open_last_lookups(): move complete_walk() " Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 68/69] atomic_open(): no need to pass struct open_flags anymore Al Viro
2020-03-13 23:53       ` [RFC][PATCH v4 69/69] lookup_open(): don't bother with fallbacks to lookup+create Al Viro
     [not found]       ` <20200315094148.12872-1-hdanton@sina.com>
2020-03-15 14:23         ` [RFC][PATCH v4 22/69] merging pick_link() with get_link(), part 4 Al Viro
     [not found]       ` <20200315095323.220-1-hdanton@sina.com>
2020-03-15 14:24         ` [RFC][PATCH v4 23/69] merging pick_link() with get_link(), part 5 Al Viro
     [not found]       ` <20200315124007.1504-1-hdanton@sina.com>
2020-03-15 14:34         ` [RFC][PATCH v4 38/69] do_last(): simplify the liveness analysis past finish_open_created Al Viro
     [not found]       ` <20200315102905.12468-1-hdanton@sina.com>
2020-03-15 14:37         ` [RFC][PATCH v4 26/69] sanitize handling of nd->last_type, kill LAST_BIND Al Viro
2020-03-14  0:50     ` [RFC][PATCHSET] sanitized pathwalk machinery (v4) Linus Torvalds
2020-03-14  2:27       ` Al Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).