linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrei Vagin <avagin@virtuozzo.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andrei Vagin <avagin@openvz.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-api@vger.kernel.org, Ram Pai <linuxram@us.ibm.com>
Subject: Re: [PATCH] mnt: allow to add a mount into an existing group
Date: Tue, 28 Feb 2017 19:20:26 -0800	[thread overview]
Message-ID: <20170301032025.GA7027@outlook.office365.com> (raw)
In-Reply-To: <87wpdlz2uc.fsf@xmission.com>

On Tue, Jan 24, 2017 at 02:03:23PM +1300, Eric W. Biederman wrote:
> Andrei Vagin <avagin@openvz.org> writes:
> 
> > Now a shared group can be only inherited from a source mount.
> > This patch adds an ability to add a mount into an existing shared
> > group.
> 
> This sounds like a lot of the discussion on bind mounts accross
> namespaces.  I am going to stay out of this for a bit until
> we resolve my latest patch.

Hello Eric,

I have seen your patches in the Linus' tree. Could we return to this
patch?

This patch should not add any security issues, because it allows to
create shared mounts between namespaces only if the current user has
CAP_SYS_ADMIN in both these mount namespaces.

For us (CRIU) this patch allows to separate restore of mount trees and
restore of shared groups.

Thanks,
Andrei

> 
> Eric
> 
> 
> > mount(source, target, NULL, MS_SET_GROUP, NULL)
> >
> > mount() with the MS_SET_GROUP flag adds the "target" mount into a group
> > of the "source" mount. The calling process has to have the CAP_SYS_ADMIN
> > capability in namespaces of these mounts. The source and the target
> > mounts have to have the same super block.
> >
> > This new functionality together with "mnt: Tuck mounts under others
> > instead of creating shadow/side mounts." allows CRIU to dump and restore
> > any set of mount namespaces.
> >
> > Currently we have a lot of issues about dumping and restoring mount
> > namespaces. The bigest problem is that we can't construct mount trees
> > directly due to several reasons:
> > * groups can't be set, they can be only inherited
> > * file systems has to be mounted from the specified user namespaces
> > * the mount() syscall doesn't just create one mount -- the mount is
> >   also propagated to all members of a parent group
> > * umount() doesn't detach mounts from all members of a group
> >   (mounts with children are not umounted)
> > * mounts are propagated underneath of existing mounts
> > * mount() doesn't allow to make bind-mounts between two namespaces
> > * processes can have opened file descriptors to overmounted files
> >
> > All these operations are non-trivial, making the task of restoring
> > a mount namespace practically unsolvable for reasonable time. The
> > proposed change allows to restore a mount namespace in a direct
> > manner, without any super complex logic.
> >
> > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> > Signed-off-by: Andrei Vagin <avagin@openvz.org>
> > ---
> >  fs/namespace.c          | 53 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  include/uapi/linux/fs.h |  1 +
> >  2 files changed, 54 insertions(+)
> >
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index b5b1259..df52fd4 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -2301,6 +2301,57 @@ static inline int tree_contains_unbindable(struct mount *mnt)
> >  	return 0;
> >  }
> >  
> > +static int do_set_group(struct path *path, const char *sibling_name)
> > +{
> > +	struct mount *sibling, *mnt;
> > +	struct path sibling_path;
> > +	int err;
> > +
> > +	if (!sibling_name || !*sibling_name)
> > +		return -EINVAL;
> > +
> > +	err = kern_path(sibling_name, LOOKUP_FOLLOW, &sibling_path);
> > +	if (err)
> > +		return err;
> > +
> > +	sibling = real_mount(sibling_path.mnt);
> > +	mnt = real_mount(path->mnt);
> > +
> > +	namespace_lock();
> > +
> > +	err = -EPERM;
> > +	if (!sibling->mnt_ns ||
> > +	    !ns_capable(sibling->mnt_ns->user_ns, CAP_SYS_ADMIN))
> > +		goto out_unlock;
> > +
> > +	err = -EINVAL;
> > +	if (sibling->mnt.mnt_sb != mnt->mnt.mnt_sb)
> > +		goto out_unlock;
> > +
> > +	if (IS_MNT_SHARED(mnt) || IS_MNT_SLAVE(mnt))
> > +		goto out_unlock;
> > +
> > +	if (IS_MNT_SLAVE(sibling)) {
> > +		struct mount *m = sibling->mnt_master;
> > +
> > +		list_add(&mnt->mnt_slave, &m->mnt_slave_list);
> > +		mnt->mnt_master = m;
> > +	}
> > +
> > +	if (IS_MNT_SHARED(sibling)) {
> > +		mnt->mnt_group_id = sibling->mnt_group_id;
> > +		list_add(&mnt->mnt_share, &sibling->mnt_share);
> > +		set_mnt_shared(mnt);
> > +	}
> > +
> > +	err = 0;
> > +out_unlock:
> > +	namespace_unlock();
> > +
> > +	path_put(&sibling_path);
> > +	return err;
> > +}
> > +
> >  static int do_move_mount(struct path *path, const char *old_name)
> >  {
> >  	struct path old_path, parent_path;
> > @@ -2779,6 +2830,8 @@ long do_mount(const char *dev_name, const char __user *dir_name,
> >  		retval = do_change_type(&path, flags);
> >  	else if (flags & MS_MOVE)
> >  		retval = do_move_mount(&path, dev_name);
> > +	else if (flags & MS_SET_GROUP)
> > +		retval = do_set_group(&path, dev_name);
> >  	else
> >  		retval = do_new_mount(&path, type_page, flags, mnt_flags,
> >  				      dev_name, data_page);
> > diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> > index 36da93f..6e6e37d 100644
> > --- a/include/uapi/linux/fs.h
> > +++ b/include/uapi/linux/fs.h
> > @@ -130,6 +130,7 @@ struct inodes_stat_t {
> >  #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
> >  #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
> >  #define MS_LAZYTIME	(1<<25) /* Update the on-disk [acm]times lazily */
> > +#define MS_SET_GROUP	(1<<26) /* Add a mount into a shared group */
> >  
> >  /* These sb flags are internal to the kernel */
> >  #define MS_NOREMOTELOCK	(1<<27)

  reply	other threads:[~2017-03-01  3:20 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-23 23:37 [PATCH] mnt: allow to add a mount into an existing group Andrei Vagin
2017-01-24  1:03 ` Eric W. Biederman
2017-03-01  3:20   ` Andrei Vagin [this message]
2017-04-28  5:18 Andrei Vagin
     [not found] ` <20170428051831.20084-1-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2017-05-09 17:36   ` Andrey Vagin
     [not found]     ` <CANaxB-y9W6E_6W70BPWduTcZ+A3u=w9ZLw2dvdrfe-gYcDvKhQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-05-10  0:42       ` Eric W. Biederman
2017-05-10 23:58         ` Andrei Vagin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170301032025.GA7027@outlook.office365.com \
    --to=avagin@virtuozzo.com \
    --cc=avagin@openvz.org \
    --cc=ebiederm@xmission.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxram@us.ibm.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).