All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hao Lee <haolee.swjtu@gmail.com>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] fs/namespace: eliminate unnecessary mount counting
Date: Mon, 14 Feb 2022 16:29:48 +0800	[thread overview]
Message-ID: <CA+PpKPkRMW_=D-=7dOYA95h-oqTGYB=rUyBpitZ9bfgb8OawGA@mail.gmail.com> (raw)
In-Reply-To: <YgnGuy0GJzlqCSRj@zeniv-ca.linux.org.uk>

On Mon, Feb 14, 2022 at 11:04 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sun, Jan 23, 2022 at 10:04:48AM +0000, Hao Lee wrote:
> > propagate_one() counts the number of propagated mounts in each
> > propagation. We can count them in advance and use the number in
> > subsequent propagation.
>
> You are relying upon highly non-obvious assumptions.  Namely, that
> copies will have the same amount of mounts as source_mnt.  AFAICS,
> it's not true in case of mount --move - there source_mnt might very
> well contain the things that would be skipped in subsequent copies.
> E.g. anything marked unbindable.  Or mntns binds - anything that would
> be skipped by copy_tree() without special flags.
>
> Sure, we could make count_mounts() return just the number of those
> that will go into subsequent copies (with mount --move we don't add
> the original subtree - it's been in the namespace and thus is already
> counted), but
>         1) it creates an extra dependency in already convoluted code
> (copy_tree() and count_mounts() need to be kept in sync, in case we ever
> add new classes of mounts to be skipped)
>         2) I'm *NOT* certain that we won't ever run into the non-move
> cases where the original tree contains something that would be skipped
> from subsequent ones, and there we want to count the original.  Matter of
> fact, we do run into that.  Look:
>
> # arrange a private tree at /tmp/a
> mkdir /tmp/a
> mount --bind /tmp/a /tmp/a
> mount --make-rprivate /tmp/a
> # mountpoint at /tmp/a/x
> mkdir /tmp/a/x
> mount --bind /tmp/a/x /tmp/a/x
> # this will be a peer of /tmp/a/x
> mkdir /tmp/a/y
> # ... and this - a mountpoint in it
> mkdir /tmp/a/x/v
> # ... rbind fodder:
> mkdir /tmp/a/z
> touch /tmp/a/z/f
> # start a new mntns, so we won't run afoul of loop checks
> unshare -m &
> # ... and bind it on /tmp/a/z/f
> mount --bind /proc/$!/ns/mnt /tmp/a/z/f
> # now we can do the rest - it won't spread into child namespace
> # make /tmp/a/x a peer of /tmp/b/x
> mount --make-shared /tmp/a/x
> mount --bind /tmp/a/x /tmp/a/y
> # ... and rbind /tmp/a/z at /tmp/a/x/v
> # which will propagate a copy to /tmp/b/x/v
> # except that mntns bound on /tmp/a/x/v/f will *not* propagate.
> mount --rbind /tmp/a/z /tmp/a/x/v
> # verify that
> stat /tmp/a/x/v
> stat /tmp/a/y/v
> stat /tmp/a/x/v/f
> stat /tmp/a/y/v/f
>
> Result:
>   File: /tmp/a/x/v/
>   Size: 4096            Blocks: 8          IO Block: 4096   directory
> Device: 808h/2056d      Inode: 270607      Links: 2
> Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2022-02-13 21:43:45.058485130 -0500
> Modify: 2022-02-13 21:42:37.142457622 -0500
> Change: 2022-02-13 21:42:37.142457622 -0500
>  Birth: 2022-02-13 21:42:37.142457622 -0500
>   File: /tmp/a/y/v/
>   Size: 4096            Blocks: 8          IO Block: 4096   directory
> Device: 808h/2056d      Inode: 270607      Links: 2
> Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2022-02-13 21:43:45.058485130 -0500
> Modify: 2022-02-13 21:42:37.142457622 -0500
> Change: 2022-02-13 21:42:37.142457622 -0500
>  Birth: 2022-02-13 21:42:37.142457622 -0500
>   File: /tmp/a/x/v/f
>   Size: 0               Blocks: 0          IO Block: 4096   regular empty file
> Device: 4h/4d   Inode: 4026532237  Links: 1
> Access: (0444/-r--r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2022-02-13 21:42:37.146457624 -0500
> Modify: 2022-02-13 21:42:37.146457624 -0500
> Change: 2022-02-13 21:42:37.146457624 -0500
>  Birth: -
>   File: /tmp/a/y/v/f
>   Size: 0               Blocks: 0          IO Block: 4096   regular empty file
> Device: 808h/2056d      Inode: 270608      Links: 1
> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2022-02-13 21:42:37.142457622 -0500
> Modify: 2022-02-13 21:42:37.142457622 -0500
> Change: 2022-02-13 21:42:37.142457622 -0500
>  Birth: 2022-02-13 21:42:37.142457622 -0500
>
>         Note that /tmp/a/x/v and /tmp/a/y/v resolve to the same directory
> (otherwise seen at /tmp/a/z), but /tmp/a/x/v/f and /tmp/a/y/v/f do *not*
> resolve to the same thing - the latter is a regular file on /dev/sda8
> (nothing got propagated there), while the former is *not* - it's an
> mntns descriptor we'd bound on /tmp/a/z/f
>
>         IOW, the first copy has two mount nodes, the second - only one.
> Initial copy at rbind does get mntns binds copied into it - look at
> CL_COPY_MNT_NS_FILE in arguments of copy_tree() call in __do_loopback().
> However, we do *not* propagate that subsequent copies (propagate_one()
> never passes CL_COPY_MNT_NS_FILE).  So that's at least one case where we
> want different contributions from the first copy and every subsequent one.
>
>         So we'd need to run *two* counts, the one to be used from
> attach_recursive_mnt() and another for propagate_one().  With even more
> places where the things could go wrong...

This is really a classic counterexample.
Thanks for your detailed explanation!

>
>         I don't believe it's worth the trouble.  Sure, you run that loop
> only once, instead of once per copy.  And if that's more than noise,
> compared to allocating the same mounts we'd been counting, connecting
> them into tree, hashing, etc., I would be *very* surprised.

Got it. Thanks!

>
> NAKed-by: Al Viro <viro@zeniv.linux.org.uk>

      parent reply	other threads:[~2022-02-14  8:30 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-23 10:04 [PATCH] fs/namespace: eliminate unnecessary mount counting Hao Lee
2022-02-14  3:04 ` Al Viro
2022-02-14  3:42   ` [PATCH] clean overflow checks in count_mounts() a bit Al Viro
2022-02-14  8:29   ` Hao Lee [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+PpKPkRMW_=D-=7dOYA95h-oqTGYB=rUyBpitZ9bfgb8OawGA@mail.gmail.com' \
    --to=haolee.swjtu@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.