From: Christian Brauner <brauner@kernel.org>
To: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Stefan Berger <stefanb@linux.ibm.com>,
	linux-integrity@vger.kernel.org, zohar@linux.ibm.com,
	christian.brauner@ubuntu.com, containers@lists.linux.dev,
	dmitry.kasatkin@gmail.com, ebiederm@xmission.com,
	krzysztof.struczynski@huawei.com, roberto.sassu@huawei.com,
	mpeters@redhat.com, lhinds@redhat.com, lsturman@redhat.com,
	puiterwi@redhat.com, jejb@linux.ibm.com, jamjoom@us.ibm.com,
	linux-kernel@vger.kernel.org, paul@paul-moore.com,
	rgb@redhat.com, linux-security-module@vger.kernel.org,
	jmorris@namei.org, jpenumak@redhat.com,
	James Bottomley <James.Bottomley@HansenPartnership.com>
Subject: Re: [PATCH v12 02/26] securityfs: Extend securityfs with namespacing support
Date: Sat, 21 May 2022 11:38:39 +0200
Message-ID: <20220521093839.3srwejkeqthgk2fq@wittgenstein>
In-Reply-To: <20220521022302.GA8575@mail.hallyn.com>

On Fri, May 20, 2022 at 09:23:02PM -0500, Serge Hallyn wrote:
> On Wed, Apr 20, 2022 at 10:06:09AM -0400, Stefan Berger wrote:
> > Enable multiple instances of securityfs by keying each instance with a
> > pointer to the user namespace it belongs to.
> > 
> > Since we do not need the pinning of the filesystem for the virtualization
> > case, limit the usage of simple_pin_fs() and simple_release_fs() to the
> > case when the init_user_ns is active. This simplifies the cleanup for the
> > virtualization case where usage of securityfs_remove() to free dentries
> > is therefore not needed anymore.
> > 
> > For the initial securityfs, i.e. the one mounted in the host userns mount,
> > nothing changes. The rules for securityfs_remove() are as before and it is
> > still paired with securityfs_create(). Specifically, a file created via
> > securityfs_create_dentry() in the initial securityfs mount still needs to
> > be removed by a call to securityfs_remove(). Creating a new dentry in the
> > initial securityfs mount still pins the filesystem like it always did.
> > Consequently, the initial securityfs mount is not destroyed on
> > umount/shutdown as long as at least one user of it still has dentries that
> > it hasn't removed with a call to securityfs_remove().
> > 
> > Prevent mounting of an instance of securityfs in another user namespace
> > than it belongs to. Also, prevent accesses to files and directories by
> > a user namespace that is neither the user namespace it belongs to
> > nor an ancestor of the user namespace that the instance of securityfs
> > belongs to. Do not prevent access if securityfs was bind-mounted and
> > therefore the init_user_ns is the owning user namespace.
> > 
> > Suggested-by: Christian Brauner <brauner@kernel.org>
> > Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
> > Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
> > 
> > ---
> > v11:
> >  - Formatted comment's first line to be '/*'
> > ---
> >  security/inode.c | 73 ++++++++++++++++++++++++++++++++++++++++--------
> >  1 file changed, 62 insertions(+), 11 deletions(-)
> > 
> > diff --git a/security/inode.c b/security/inode.c
> > index 13e6780c4444..84c9396792a9 100644
> > --- a/security/inode.c
> > +++ b/security/inode.c
> > @@ -21,9 +21,38 @@
> >  #include <linux/security.h>
> >  #include <linux/lsm_hooks.h>
> >  #include <linux/magic.h>
> > +#include <linux/user_namespace.h>
> >  
> > -static struct vfsmount *mount;
> > -static int mount_count;
> > +static struct vfsmount *init_securityfs_mount;
> > +static int init_securityfs_mount_count;
> > +
> > +static int securityfs_permission(struct user_namespace *mnt_userns,
> > +				 struct inode *inode, int mask)
> > +{
> > +	int err;
> > +
> > +	err = generic_permission(&init_user_ns, inode, mask);
> > +	if (!err) {
> > +		/*
> > +		 * Unless bind-mounted, deny access if current_user_ns() is not
> > +		 * ancestor.
> 
> This comment has confused me the last few times I looked at this.  I see
> now you're using "bind-mounted" as a shortcut for saying "bind mounted from
> the init_user_ns into a child_user_ns container".  I do think that needs
> to be made clearer in this comment.
> 
> Should the init_user_ns really be special here?  What if I'm running a
> first level container with uptodate userspace that mounts its own
> securityfs, but in that i want to run a nested older userspace that
> bind mounts the parent securityfs?  Is there a good reason to deny that?
> 
> It would seem to me the better check would be
> 
> 	if (!is_original_mounter_of(current_user_ns(), inode->i_sb->s_user_ns) &&
> 	     !in_userns(current_user_ns(), inode->i_sb->s_user_ns))
> 		err = -EACCES;
> 
> the is_original_mounter_of() would require the user_ns to cache first
> its parent securityfs userns, and, when a task in the user_ns mounts
> securityfs, then cache its own userns.  (without a reference).
> If current_user_ns() has mounted a securityfs for a user_ns other than
> inode->i_sb->s_user_ns (or init_user_ns), then reject the mount.
> Otherwise check current_user_ns()->parent, etc, until init_user_ns.
> If you reach init_user_ns, or an ns which mounted inode->i_sb->s_user_ns,
> then allow, else deny.
> 
> It's the kind of special casing we've worked hard to avoid in other
> namespaces.
> 
> > +		 */
> > +		if (inode->i_sb->s_user_ns != &init_user_ns &&
> > +		    !in_userns(current_user_ns(), inode->i_sb->s_user_ns))
> > +			err = -EACCES;
> > +	}
> > +
> > +	return err;
> > +}
> > +
> > +static const struct inode_operations securityfs_dir_inode_operations = {
> > +	.permission	= securityfs_permission,
> > +	.lookup		= simple_lookup,
> > +};
> > +
> > +static const struct inode_operations securityfs_file_inode_operations = {
> > +	.permission	= securityfs_permission,
> > +};
> >  
> >  static void securityfs_free_inode(struct inode *inode)
> >  {
> > @@ -40,20 +69,25 @@ static const struct super_operations securityfs_super_operations = {
> >  static int securityfs_fill_super(struct super_block *sb, struct fs_context *fc)
> >  {
> >  	static const struct tree_descr files[] = {{""}};
> > +	struct user_namespace *ns = fc->user_ns;
> >  	int error;
> >  
> > +	if (WARN_ON(ns != current_user_ns()))
> > +		return -EINVAL;
> > +
> >  	error = simple_fill_super(sb, SECURITYFS_MAGIC, files);
> >  	if (error)
> >  		return error;
> >  
> >  	sb->s_op = &securityfs_super_operations;
> > +	sb->s_root->d_inode->i_op = &securityfs_dir_inode_operations;
> >  
> >  	return 0;
> >  }
> >  
> >  static int securityfs_get_tree(struct fs_context *fc)
> >  {
> > -	return get_tree_single(fc, securityfs_fill_super);
> > +	return get_tree_keyed(fc, securityfs_fill_super, fc->user_ns);
> >  }
> >  
> >  static const struct fs_context_operations securityfs_context_ops = {
> > @@ -71,6 +105,7 @@ static struct file_system_type fs_type = {
> >  	.name =		"securityfs",
> >  	.init_fs_context = securityfs_init_fs_context,
> >  	.kill_sb =	kill_litter_super,
> > +	.fs_flags =	FS_USERNS_MOUNT,
> >  };
> >  
> >  /**
> > @@ -109,6 +144,7 @@ static struct dentry *securityfs_create_dentry(const char *name, umode_t mode,
> >  					const struct file_operations *fops,
> >  					const struct inode_operations *iops)
> >  {
> > +	struct user_namespace *ns = current_user_ns();
> >  	struct dentry *dentry;
> >  	struct inode *dir, *inode;
> >  	int error;
> > @@ -118,12 +154,19 @@ static struct dentry *securityfs_create_dentry(const char *name, umode_t mode,
> >  
> >  	pr_debug("securityfs: creating file '%s'\n",name);
> >  
> > -	error = simple_pin_fs(&fs_type, &mount, &mount_count);
> > -	if (error)
> > -		return ERR_PTR(error);
> > +	if (ns == &init_user_ns) {
> > +		error = simple_pin_fs(&fs_type, &init_securityfs_mount,
> > +				      &init_securityfs_mount_count);
> 
> So ...  it's less work for the kernel to skip the simple_pin_fs()
> here, but it's more code, and more confusing code, to skip it.
> 
> So I just want to ask, to make sure:  is it worth it?  Or should
> it just be done for all namespaces here (and below and for release),
> for shorter, simpler, easier to read and grok code?

I think you might've skipped a few versions of the thread.
It would be more code and a lot more confusing to try and keep the
simple_pin_fs(). You will need a per-userns vfsmount pointer and you
still need to change securityfs_create_dentry and securityfs_remove. For
more context see [1] and [2].

[1]: https://lore.kernel.org/lkml/20211206172600.1495968-12-stefanb@linux.ibm.com
[2]: https://lore.kernel.org/lkml/20211206172600.1495968-13-stefanb@linux.ibm.com

The fs pinning logic is most suited for single-superblock, almost
system-lifetime bound pseudo filesystems such as debugfs or configfs.
It becomes a rather huge burden when a pseudo fs is supposed to
support multiple superblocks (in this case keyed by userns).
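
To make the bookkeeping difference concrete, here is a minimal userspace
sketch in C. It is a toy model and not kernel code: every toy_* name and
struct field below is made up for illustration, and toy_pin_fs() and
toy_release_fs() only imitate the counting behaviour of simple_pin_fs()
and simple_release_fs(). Variant A pins for every userns and therefore
has to carry a mount pointer and a count in each namespace. Variant B
follows the approach in the quoted patch, where only the initial userns
pins and a single static pair is enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct vfsmount; not a kernel type. */
struct toy_mount {
	int id;
};

/* Toy stand-in for a user namespace; all fields are hypothetical. */
struct toy_userns {
	bool is_init;                   /* models ns == &init_user_ns */
	struct toy_mount backing;       /* the mount this ns would pin */
	/* State needed only if every userns pinned its own instance. */
	struct toy_mount *pinned_mount;
	int pin_count;
};

/* Imitates simple_pin_fs(): take a pin, record the mount on first use. */
static void toy_pin_fs(struct toy_mount **mount, int *count,
		       struct toy_mount *backing)
{
	if (*count == 0)
		*mount = backing;
	(*count)++;
}

/* Imitates simple_release_fs(): drop a pin, forget the mount on the last. */
static void toy_release_fs(struct toy_mount **mount, int *count)
{
	assert(*count > 0);
	if (--(*count) == 0)
		*mount = NULL;
}

/* Variant A: pin for every userns, so the state lives in each namespace. */
static void toy_create_pin_all(struct toy_userns *ns)
{
	toy_pin_fs(&ns->pinned_mount, &ns->pin_count, &ns->backing);
}

static void toy_remove_pin_all(struct toy_userns *ns)
{
	toy_release_fs(&ns->pinned_mount, &ns->pin_count);
}

/* Variant B: only the initial userns pins; one static pair suffices. */
static struct toy_mount *init_mount;
static int init_mount_count;

static void toy_create_pin_init_only(struct toy_userns *ns)
{
	if (ns->is_init)
		toy_pin_fs(&init_mount, &init_mount_count, &ns->backing);
}

static void toy_remove_pin_init_only(struct toy_userns *ns)
{
	if (ns->is_init)
		toy_release_fs(&init_mount, &init_mount_count);
}
```

In variant A every create must be matched by a remove in every
namespace, otherwise that namespace's instance stays pinned. In variant
B a non-initial namespace never takes a pin, so nothing has to be
released for it when its superblock goes away.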
