linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>,
	David Howells <dhowells@redhat.com>,
	Eric Biederman <ebiederm@xmission.com>,
	stable <stable@vger.kernel.org>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Serge Hallyn <serge@hallyn.com>,
	dev@opencontainers.org,
	Linux Containers <containers@lists.linux-foundation.org>,
	Linux API <linux-api@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Ian Kent <raven@themaw.net>
Subject: Re: [PATCH RFC 0/1] mount: universally disallow mounting over symlinks
Date: Fri, 10 Jan 2020 04:15:23 +0000	[thread overview]
Message-ID: <20200110041523.GK8904@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CAHk-=wiq11+thoe60qhsSHk_nbRF2TRL1Wnf6eHcYObjhJmsww@mail.gmail.com>

On Thu, Jan 09, 2020 at 04:08:16PM -0800, Linus Torvalds wrote:
> On Wed, Jan 8, 2020 at 1:34 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > The point is, we'd never followed mounts on /proc/self/cwd et.al.
> > I hadn't checked 2.0, but 2.1.100 ('97, before any changes from me)
> > is that way.
> 
> Hmm. If that's the case, maybe they should be marked implicitly as
> O_PATH when opened?

I thought you wanted O_PATH as starting point to have mounts traversed?
Confused...

> > Actually, scratch that - 2.0 behaves the same way
> > (mountpoint crossing is done in iget() there; is that Minix influence
> > or straight from the Lions' book?)
> 
> I don't think I ever had access to Lions' - I've _seen_ a printout of
> it later, and obviously maybe others did,
> 
> More likely it's from Maurice Bach: the Design of the Unix Operating
> System. I'm pretty sure that's where a lot of the FS layer stuff came
> from.  Certainly the bad old buffer head interfaces, and quite likely
> the iget() stuff too.
>
> > 0.10: forward traversal in iget(), back traversal in fs/namei.c:find_entry()
> 
> Whee, you _really_ went back in time.
> 
> So I did too.
> 
> And looking at that code in iget(), I doubt it came from anywhere.
> Christ. It's just looping over a fixed-size array, both when finding
> the inode, and finding the superblock.
> 
> Cute, but unbelievably stupid. It was a more innocent time.
> 
> In other words, I think you can chalk it up to just me, because
> blaming anybody else for that garbage would be very very unfair indeed
> ;)

See https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/sys/sys/iget.c
Exactly the same algorithm, complete with linear searches over those
fixed-sized array.

<grabs Bach> Right, he simply transcribes v7 iget().

So I suspect that you are right - your variant of iget was pretty much
one-to-one implementation of Bach's description of v7 iget.

Your namei wasn't - Bach has 'if the entry points to root and you are
in the root and name is "..", find mount table entry (by device number),
drop your directory inode, grab the inode of mountpount and restart
the search for ".." in there', which gives back traversals to arbitrary
depth.  And v7 namei() (as Bach mentions) uses iget() for starting point
as well as for each component.  You kept pointers instead, which is where
the other difference has come from (no mount traversal at the starting
point)...

Actually, I've misread your code in 0.10 - it does unlimited forward
traversals; it's back traversals that go only one level.  The forward
ones got limited to one level in 0.95, but then mount-over-root had
been banned all along.  I'd read the pre-dcache variant of iget(),
seen it go pretty much all the way back to beginning and hadn't
sorted out the 0.12 -> 0.95 transition...

> > How would your proposal deal with access("/proc/self/fd/42/foo", MAY_READ)
> > vs. faccessat(42, "foo", MAY_READ)?
> 
> I think that in a perfect world, the O_PATH'ness of '42' would be the
> deciding factor. Wouldn't those be the best and most consistent
> semantics?
> 
> And then 'cwd'/'root' always have the O_PATH behavior.

See above - unless I'm misparsing you, you wanted mount traversals in the
starting point if it's ...at() with O_PATH fd.  With O_PATH open() not
doing them.

For cwd and root the situation is opposite - we do NOT traverse mounts
for those.  And that's really too late to change.

> > The latter would trigger automount,
> > the former would not...  Or would you extend that to "traverse mounts
> > upon following procfs links, if the file in question had been opened with
> > O_PATH"?
> 
> Exactly.
> 
> But you know what? I do not believe this is all that important, and I
> doubt it will matter to anybody.

FWIW, digging through the automount-related parts of that stuff has
caught several fun issues.  One (and I'm rather embarrassed by it)
should've been caught back in commit 8aef18845266 (VFS: Fix vfsmount
overput on simultaneous automount).  To quote the commit message:
    The problem is that lock_mount() drops the caller's reference to the
    mountpoint's vfsmount in the case where it finds something already mounted on
    the mountpoint as it transits to the mounted filesystem and replaces path->mnt
    with the new mountpoint vfsmount.
    
    During a pathwalk, however, we don't take a reference on the vfsmount if it is
    the same as the one in the nameidata struct, but do_add_mount() doesn't know
    this.
At which point I should've gone "what the fuck?" - lock_mount() does, indeed,
drop path->mnt in this situation and replaces it with the whatever's come to
cover it.  For mount(2) that's the right thing to do - we _want_ to mount
on top of whatever we have at the mountpoint.  For automounts we very much
don't want that - it's either "mount right on top of the automount trigger"
or discard whatever we'd been about to mount and walk into whatever's got
mounted there (presumably the same thing triggered by another process).
We kinda-sorta get that effect, but in a very convoluted way: do_add_mount()
will refuse to mount something on top of itself -
        /* Refuse the same filesystem on the same mount point */
        err = -EBUSY;
        if (path->mnt->mnt_sb == newmnt->mnt.mnt_sb &&
            path->mnt->mnt_root == path->dentry)
                goto unlock;
which will end up with -EBUSY returned (and recognized by follow_automount()).

First of all, that's unreliable.  If somebody not only has triggered that
automount, but managed to _mount_ something else on top (for example,
has triggered it by lookup of mountpoint-to-be in mount(2)), we'll end
up not triggering that check.  In which case we'll get something like
nfs referral point under nfs automounted there under tmpfs from explicit
overmount under same nfs mount we'd automounted there - identical to what's
been buried under tmpfs.  It's hard to hit, but not impossibly so.

What's more, the whole solution is a kludge - the root of problem is
that lock_mount() is the wrong thing to do in case of finish_automount().
We don't want to go into whatever's overmounting us there, both for
the reasons above *and* because it's a PITA for the caller.  So the
right solution is
	* lift lock_mount() call from do_add_mount() into its callers
(all 2 of them); while we are at it, lift unlock_mount() as well
(makes for simpler failure exits in do_add_mount()).
	* replace the call of lock_mount() in finish_automount()
with variant that doesn't do "unlock, walk deeper and retry locking",
returning ERR_PTR(-EBUSY) in such case.
	* get rid of the kludge introduced in that commit.  Better
yet, don't bother with traversing into the covering mount in case
of success - let the caller of follow_automount() do that.  Which
eliminates the need to pass need_mntput to the sucker and suggests
an even better solution - have this analogue of lock_mount()
return NULL instead of ERR_PTR(-EBUSY) and treat it in finish_automount()
as "OK, discard what we wanted to mount and return 0".  That gets
rid of the entire
        err = finish_automount(mnt, path);
        switch (err) {
        case -EBUSY:
                /* Someone else made a mount here whilst we were busy */
                return 0;
        case 0:
                path_put(path);
                path->mnt = mnt;
                path->dentry = dget(mnt->mnt_root);
                return 0;
        default:
                return err;
        }
chunk in follow_automount() - it would just be
	return finish_automount(mnt, path);

Another thing (in the same area) is not a bug per se, but...
after the call of ->d_automount() we have this:
        if (IS_ERR(mnt)) {
                /*
                 * The filesystem is allowed to return -EISDIR here to indicate
                 * it doesn't want to automount.  For instance, autofs would do
                 * this so that its userspace daemon can mount on this dentry.
                 *
                 * However, we can only permit this if it's a terminal point in
                 * the path being looked up; if it wasn't then the remainder of
                 * the path is inaccessible and we should say so.
                 */
                if (PTR_ERR(mnt) == -EISDIR && (nd->flags & LOOKUP_PARENT))
                        return -EREMOTE;
                return PTR_ERR(mnt);
	}
Except that not a single instance of ->d_automount() has ever returned
-EISDIR.  Certainly not autofs one, despite the what the comment says.
That chunk has come from dhowells, back when the whole mount trap series
had been merged.  After talking that thing over (fun: trying to figure
out what had been intended nearly 9 years ago, when people involved are
in UK, US east coast and AU west coast respectively.  The only way it
could suck more would've been if I were on the west coast - then all
timezone deltas would be 8-hour ones)...  looks like it's a rudiment
of plans that got superseded during the series development, nobody
quite remembers exact details.  Conclusion: it's not even dead, it's
stillborn; bury it.

Unfortunately, there are other interesting questions related to
autofs-specific bits (->d_manage()) and the timezone-related fun
is, of course, still there.  I hope to sort that out today or
tomorrow, at least enough to do a reasonable set of backportable
fixes to put in front of follow_managed()/step_into() queue.
Oh, well...

  reply	other threads:[~2020-01-10  4:15 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-30  5:20 [PATCH RFC 0/1] mount: universally disallow mounting over symlinks Aleksa Sarai
2019-12-30  5:20 ` [PATCH RFC 1/1] " Aleksa Sarai
2019-12-30  7:34   ` Linus Torvalds
2019-12-30  8:28     ` Aleksa Sarai
2020-01-08  4:39       ` Andy Lutomirski
2019-12-30  5:44 ` [PATCH RFC 0/1] " Al Viro
2019-12-30  5:49   ` Aleksa Sarai
     [not found]     ` <20191230072959.62kcojxpthhdwmfa@yavin.dot.cyphar.com>
2019-12-30  7:53       ` Linus Torvalds
2019-12-30  8:32         ` Aleksa Sarai
2020-01-02  8:58           ` David Laight
2020-01-02  9:09             ` Aleksa Sarai
2020-01-01  0:43       ` Al Viro
2020-01-01  0:54         ` Al Viro
2020-01-01  3:08           ` Al Viro
2020-01-01 14:44             ` Aleksa Sarai
2020-01-01 23:40               ` Al Viro
2020-01-02  3:59                 ` Aleksa Sarai
2020-01-03  1:49                   ` Al Viro
2020-01-04  4:46                     ` Ian Kent
2020-01-08  3:13                     ` Al Viro
2020-01-08  3:54                       ` Linus Torvalds
2020-01-08 21:34                         ` Al Viro
2020-01-10  0:08                           ` Linus Torvalds
2020-01-10  4:15                             ` Al Viro [this message]
2020-01-10  5:03                               ` Linus Torvalds
2020-01-10  6:20                               ` Ian Kent
2020-01-12 21:33                                 ` Al Viro
2020-01-13  2:59                                   ` Ian Kent
2020-01-14  0:25                                     ` Ian Kent
2020-01-14  4:39                                       ` Al Viro
2020-01-14  5:01                                         ` Ian Kent
2020-01-14  5:59                                           ` Ian Kent
2020-01-10 21:07                         ` Aleksa Sarai
2020-01-14  4:57                           ` Al Viro
2020-01-14  5:12                             ` Al Viro
2020-01-14 20:01                             ` Aleksa Sarai
2020-01-15 14:25                               ` Al Viro
2020-01-15 14:29                                 ` Aleksa Sarai
2020-01-15 14:34                                   ` Aleksa Sarai
2020-01-15 14:48                                     ` Al Viro
2020-01-18 12:07                                       ` [PATCH v3 0/2] openat2: minor uapi cleanups Aleksa Sarai
2020-01-18 12:07                                         ` [PATCH v3 1/2] open: introduce openat2(2) syscall Aleksa Sarai
2020-01-18 12:08                                         ` [PATCH v3 2/2] selftests: add openat2(2) selftests Aleksa Sarai
2020-01-18 15:28                                         ` [PATCH v3 0/2] openat2: minor uapi cleanups Al Viro
2020-01-18 18:09                                           ` Al Viro
2020-01-18 23:03                                             ` Aleksa Sarai
2020-01-19  1:12                                               ` Al Viro
2020-01-15 13:57                             ` [PATCH RFC 0/1] mount: universally disallow mounting over symlinks Aleksa Sarai
2020-01-19  3:14                               ` [RFC][PATCHSET][CFT] pathwalk cleanups and fixes Al Viro
2020-01-19  3:17                                 ` [PATCH 01/17] do_add_mount(): lift lock_mount/unlock_mount into callers Al Viro
2020-01-19  3:17                                   ` [PATCH 02/17] fix automount/automount race properly Al Viro
2020-01-30 14:34                                     ` Christian Brauner
2020-01-19  3:17                                   ` [PATCH 03/17] follow_automount(): get rid of dead^Wstillborn code Al Viro
2020-01-30 14:38                                     ` Christian Brauner
2020-01-19  3:17                                   ` [PATCH 04/17] follow_automount() doesn't need the entire nameidata Al Viro
2020-01-30 14:45                                     ` Christian Brauner
2020-01-30 15:38                                       ` Al Viro
2020-01-30 15:55                                         ` Al Viro
2020-01-19  3:17                                   ` [PATCH 05/17] make build_open_flags() treat O_CREAT | O_EXCL as implying O_NOFOLLOW Al Viro
2020-01-19  3:17                                   ` [PATCH 06/17] handle_mounts(): start building a sane wrapper for follow_managed() Al Viro
2020-01-19  3:17                                   ` [PATCH 07/17] atomic_open(): saner calling conventions (return dentry on success) Al Viro
2020-01-19  3:17                                   ` [PATCH 08/17] lookup_open(): " Al Viro
2020-01-19  3:17                                   ` [PATCH 09/17] do_last(): collapse the call of path_to_nameidata() Al Viro
2020-01-19  3:17                                   ` [PATCH 10/17] handle_mounts(): pass dentry in, turn path into a pure out argument Al Viro
2020-01-19  3:17                                   ` [PATCH 11/17] lookup_fast(): consolidate the RCU success case Al Viro
2020-01-19  3:17                                   ` [PATCH 12/17] teach handle_mounts() to handle RCU mode Al Viro
2020-01-19  3:17                                   ` [PATCH 13/17] lookup_fast(): take mount traversal into callers Al Viro
2020-01-19  3:17                                   ` [PATCH 14/17] new step_into() flag: WALK_NOFOLLOW Al Viro
2020-01-19  3:17                                   ` [PATCH 15/17] fold handle_mounts() into step_into() Al Viro
2020-01-19  3:17                                   ` [PATCH 16/17] LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat() Al Viro
2020-01-19  3:17                                   ` [PATCH 17/17] expand the only remaining call of path_lookup_conditional() Al Viro
2020-01-19  3:17                                   ` [PATCH 1/9] merging pick_link() with get_link(), part 1 Al Viro
2020-01-19  3:17                                   ` [PATCH 2/9] merging pick_link() with get_link(), part 2 Al Viro
2020-01-19  3:17                                   ` [PATCH 3/9] merging pick_link() with get_link(), part 3 Al Viro
2020-01-19  3:17                                   ` [PATCH 4/9] merging pick_link() with get_link(), part 4 Al Viro
2020-01-19  3:17                                   ` [PATCH 5/9] merging pick_link() with get_link(), part 5 Al Viro
2020-01-19  3:17                                   ` [PATCH 6/9] merging pick_link() with get_link(), part 6 Al Viro
2020-01-19  3:17                                   ` [PATCH 7/9] finally fold get_link() into pick_link() Al Viro
2020-01-19  3:17                                   ` [PATCH 8/9] massage __follow_mount_rcu() a bit Al Viro
2020-01-19  3:17                                   ` [PATCH 9/9] new helper: traverse_mounts() Al Viro
2020-01-30 14:13                                   ` [PATCH 01/17] do_add_mount(): lift lock_mount/unlock_mount into callers Christian Brauner
2020-01-19 14:33                                 ` [RFC][PATCHSET][CFT] pathwalk cleanups and fixes Ian Kent
2020-01-10 23:19                     ` [PATCH RFC 0/1] mount: universally disallow mounting over symlinks Al Viro
2020-01-13  1:48                       ` Ian Kent
2020-01-13  3:54                         ` Al Viro
2020-01-13  6:00                           ` Ian Kent
2020-01-13  6:03                             ` Ian Kent
2020-01-13 13:30                               ` Al Viro
2020-01-14  7:25                                 ` Ian Kent
2020-01-14 12:17                                   ` Ian Kent
2020-01-04  5:52               ` Andy Lutomirski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200110041523.GK8904@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=christian.brauner@ubuntu.com \
    --cc=containers@lists.linux-foundation.org \
    --cc=cyphar@cyphar.com \
    --cc=dev@opencontainers.org \
    --cc=dhowells@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=raven@themaw.net \
    --cc=serge@hallyn.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).