From: Christian Brauner <brauner@kernel.org>
To: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Stefan Berger <stefanb@linux.ibm.com>,
linux-integrity@vger.kernel.org, zohar@linux.ibm.com,
christian.brauner@ubuntu.com, containers@lists.linux.dev,
dmitry.kasatkin@gmail.com, ebiederm@xmission.com,
krzysztof.struczynski@huawei.com, roberto.sassu@huawei.com,
mpeters@redhat.com, lhinds@redhat.com, lsturman@redhat.com,
puiterwi@redhat.com, jejb@linux.ibm.com, jamjoom@us.ibm.com,
linux-kernel@vger.kernel.org, paul@paul-moore.com,
rgb@redhat.com, linux-security-module@vger.kernel.org,
jmorris@namei.org, jpenumak@redhat.com,
James Bottomley <James.Bottomley@HansenPartnership.com>
Subject: Re: [PATCH v12 02/26] securityfs: Extend securityfs with namespacing support
Date: Sat, 21 May 2022 11:38:39 +0200
Message-ID: <20220521093839.3srwejkeqthgk2fq@wittgenstein>
In-Reply-To: <20220521022302.GA8575@mail.hallyn.com>

On Fri, May 20, 2022 at 09:23:02PM -0500, Serge Hallyn wrote:
> On Wed, Apr 20, 2022 at 10:06:09AM -0400, Stefan Berger wrote:
> > Enable multiple instances of securityfs by keying each instance with a
> > pointer to the user namespace it belongs to.
> >
> > Since we do not need the pinning of the filesystem for the virtualization
> > case, limit the usage of simple_pin_fs() and simple_release_fs() to the
> > case when the init_user_ns is active. This simplifies the cleanup for the
> > virtualization case where usage of securityfs_remove() to free dentries
> > is therefore not needed anymore.
> >
> > For the initial securityfs, i.e. the one mounted in the host userns mount,
> > nothing changes. The rules for securityfs_remove() are as before and it is
> > still paired with securityfs_create(). Specifically, a file created via
> > securityfs_create_dentry() in the initial securityfs mount still needs to
> > be removed by a call to securityfs_remove(). Creating a new dentry in the
> > initial securityfs mount still pins the filesystem like it always did.
> > Consequently, the initial securityfs mount is not destroyed on
> > umount/shutdown as long as at least one user of it still has dentries that
> > it hasn't removed with a call to securityfs_remove().
> >
> > Prevent mounting of an instance of securityfs in another user namespace
> > than it belongs to. Also, prevent accesses to files and directories by
> > a user namespace that is neither the user namespace it belongs to
> > nor an ancestor of the user namespace that the instance of securityfs
> > belongs to. Do not prevent access if securityfs was bind-mounted and
> > therefore the init_user_ns is the owning user namespace.
> >
> > Suggested-by: Christian Brauner <brauner@kernel.org>
> > Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
> > Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
> >
> > ---
> > v11:
> > - Formatted comment's first line to be '/*'
> > ---
> > security/inode.c | 73 ++++++++++++++++++++++++++++++++++++++++--------
> > 1 file changed, 62 insertions(+), 11 deletions(-)
> >
> > diff --git a/security/inode.c b/security/inode.c
> > index 13e6780c4444..84c9396792a9 100644
> > --- a/security/inode.c
> > +++ b/security/inode.c
> > @@ -21,9 +21,38 @@
> > #include <linux/security.h>
> > #include <linux/lsm_hooks.h>
> > #include <linux/magic.h>
> > +#include <linux/user_namespace.h>
> >
> > -static struct vfsmount *mount;
> > -static int mount_count;
> > +static struct vfsmount *init_securityfs_mount;
> > +static int init_securityfs_mount_count;
> > +
> > +static int securityfs_permission(struct user_namespace *mnt_userns,
> > + struct inode *inode, int mask)
> > +{
> > + int err;
> > +
> > + err = generic_permission(&init_user_ns, inode, mask);
> > + if (!err) {
> > + /*
> > + * Unless bind-mounted, deny access if current_user_ns() is not
> > + * ancestor.
>
> This comment has confused me the last few times I looked at this. I see
> now you're using "bind-mounted" as a shortcut for saying "bind mounted from
> the init_user_ns into a child_user_ns container". I do think that needs
> to be made clearer in this comment.
>
> Should the init_user_ns really be special here? What if I'm running a
> first level container with up-to-date userspace that mounts its own
> securityfs, but in that I want to run a nested older userspace that
> bind mounts the parent securityfs? Is there a good reason to deny that?
>
> It would seem to me the better check would be
>
> 	if (!is_original_mounter_of(current_user_ns(), inode->i_sb->s_user_ns) &&
> 	    !in_userns(current_user_ns(), inode->i_sb->s_user_ns))
> 		err = -EACCES;
>
> the is_original_mounter_of() would require the user_ns to cache first
> its parent securityfs userns, and, when a task in the user_ns mounts
> securityfs, then cache its own userns. (without a reference).
> If current_user_ns() has mounted a securityfs for a user_ns other than
> inode->i_sb->s_user_ns (or init_user_ns), then reject the mount.
> Otherwise check current_user_ns()->parent, etc, until init_user_ns.
> If you reach init_user_ns, or an ns which mounted inode->i_sb->s_user_ns,
> then allow, else deny.
>
> It's the kind of special casing we've worked hard to avoid in other
> namespaces.
>
> > + */
> > + if (inode->i_sb->s_user_ns != &init_user_ns &&
> > + !in_userns(current_user_ns(), inode->i_sb->s_user_ns))
> > + err = -EACCES;
> > + }
> > +
> > + return err;
> > +}
> > +
> > +static const struct inode_operations securityfs_dir_inode_operations = {
> > + .permission = securityfs_permission,
> > + .lookup = simple_lookup,
> > +};
> > +
> > +static const struct inode_operations securityfs_file_inode_operations = {
> > + .permission = securityfs_permission,
> > +};
> >
> > static void securityfs_free_inode(struct inode *inode)
> > {
> > @@ -40,20 +69,25 @@ static const struct super_operations securityfs_super_operations = {
> > static int securityfs_fill_super(struct super_block *sb, struct fs_context *fc)
> > {
> > static const struct tree_descr files[] = {{""}};
> > + struct user_namespace *ns = fc->user_ns;
> > int error;
> >
> > + if (WARN_ON(ns != current_user_ns()))
> > + return -EINVAL;
> > +
> > error = simple_fill_super(sb, SECURITYFS_MAGIC, files);
> > if (error)
> > return error;
> >
> > sb->s_op = &securityfs_super_operations;
> > + sb->s_root->d_inode->i_op = &securityfs_dir_inode_operations;
> >
> > return 0;
> > }
> >
> > static int securityfs_get_tree(struct fs_context *fc)
> > {
> > - return get_tree_single(fc, securityfs_fill_super);
> > + return get_tree_keyed(fc, securityfs_fill_super, fc->user_ns);
> > }
> >
> > static const struct fs_context_operations securityfs_context_ops = {
> > @@ -71,6 +105,7 @@ static struct file_system_type fs_type = {
> > .name = "securityfs",
> > .init_fs_context = securityfs_init_fs_context,
> > .kill_sb = kill_litter_super,
> > + .fs_flags = FS_USERNS_MOUNT,
> > };
> >
> > /**
> > @@ -109,6 +144,7 @@ static struct dentry *securityfs_create_dentry(const char *name, umode_t mode,
> > const struct file_operations *fops,
> > const struct inode_operations *iops)
> > {
> > + struct user_namespace *ns = current_user_ns();
> > struct dentry *dentry;
> > struct inode *dir, *inode;
> > int error;
> > @@ -118,12 +154,19 @@ static struct dentry *securityfs_create_dentry(const char *name, umode_t mode,
> >
> > pr_debug("securityfs: creating file '%s'\n",name);
> >
> > - error = simple_pin_fs(&fs_type, &mount, &mount_count);
> > - if (error)
> > - return ERR_PTR(error);
> > + if (ns == &init_user_ns) {
> > + error = simple_pin_fs(&fs_type, &init_securityfs_mount,
> > + &init_securityfs_mount_count);
>
> So ... it's less work for the kernel to skip the simple_pin_fs()
> here, but it's more code, and more confusing code, to skip it.
>
> So I just want to ask, to make sure: is it worth it? Or should
> it just be done for all namespaces here (and below and for release),
> for shorter, simpler, easier to read and grok code?

I think you might've skipped a few versions of the thread.

It would be more code and a lot more confusing to try and keep the
simple_pin_fs(). You will need a per-userns vfsmount pointer and you
still need to change securityfs_create_dentry and securityfs_remove. For
more context see [1] and [2].

[1]: https://lore.kernel.org/lkml/20211206172600.1495968-12-stefanb@linux.ibm.com
[2]: https://lore.kernel.org/lkml/20211206172600.1495968-13-stefanb@linux.ibm.com

The fs pinning logic is most suited for single-superblock, almost
system-lifetime bound pseudo filesystems such as debugfs or configfs.
It becomes a rather huge burden when a pseudo fs is supposed to
support multiple superblocks (in this case keyed by userns).
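
To illustrate the bookkeeping difference, here is a rough userspace sketch and not kernel code. Every name in it (toy_pin_single(), toy_pin_keyed(), struct toy_keyed_pin, TOY_MAX_KEYS) is made up for illustration, and the fixed-size table is only an assumption to keep the example short. It models why one global pointer/count pair is enough for a single superblock, while a keyed filesystem would need the same state once per key plus a lookup on every create and remove.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model; none of these names are kernel APIs. */

/* Single-superblock case: one global pin count covers the whole fs. */
static int single_count;

static void toy_pin_single(void)
{
	single_count++;		/* the first pin would create the mount */
}

static void toy_release_single(void)
{
	assert(single_count > 0);
	single_count--;		/* the last release would drop the mount */
}

/* Keyed case: the same bookkeeping is needed once per key. */
#define TOY_MAX_KEYS 4

struct toy_keyed_pin {
	const void *key;	/* stands in for the owning userns pointer */
	int count;		/* 0 means the slot is unused */
};

static struct toy_keyed_pin toy_pins[TOY_MAX_KEYS];

/* Find the live slot for key; optionally claim a free slot if none exists. */
static struct toy_keyed_pin *toy_find(const void *key, int create)
{
	struct toy_keyed_pin *free_slot = NULL;
	size_t i;

	for (i = 0; i < TOY_MAX_KEYS; i++) {
		if (toy_pins[i].count > 0 && toy_pins[i].key == key)
			return &toy_pins[i];
		if (toy_pins[i].count == 0 && !free_slot)
			free_slot = &toy_pins[i];
	}
	if (!create || !free_slot)
		return NULL;
	free_slot->key = key;
	return free_slot;
}

static int toy_pin_keyed(const void *key)
{
	struct toy_keyed_pin *p = toy_find(key, 1);

	if (!p)
		return -1;	/* table full */
	p->count++;
	return 0;
}

static int toy_release_keyed(const void *key)
{
	struct toy_keyed_pin *p = toy_find(key, 0);

	if (!p)
		return -1;	/* this key was never pinned */
	p->count--;
	return 0;
}

static int toy_keyed_count(const void *key)
{
	struct toy_keyed_pin *p = toy_find(key, 0);

	return p ? p->count : 0;
}
```

In this toy model every keyed pin and release has to carry the key and do a lookup, which is roughly the extra per-userns state mentioned above. Skipping the pin for the non-init case avoids all of it.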