* [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces @ 2014-09-02 15:44 Seth Forshee 2014-09-02 15:44 ` [PATCH v2 1/3] vfs: Check for invalid i_uid in may_follow_link() Seth Forshee ` (5 more replies) 0 siblings, 6 replies; 39+ messages in thread From: Seth Forshee @ 2014-09-02 15:44 UTC (permalink / raw) To: Miklos Szeredi Cc: Alexander Viro, Eric W. Biederman, Serge Hallyn, fuse-devel, linux-kernel, linux-fsdevel, Seth Forshee Here's an updated set of patches for allowing fuse mounts from pid and user namespaces. I discussed some of the issues we debated with the last patch set (and a few others) with Eric at LinuxCon, and the updates here mainly reflect the outcome of those discussions. The stickiest issue in the v1 patches was the question of where to get the user and pid namespaces from that are used for translating ids for communication with userspace. Eric told me that for user namespaces at least we need to grab a namespace at open or mount time and use only that namespace to prevent certain types of attacks. That rules out the suggestion of using the user ns of current in the read/write paths, and I think it makes sense to handle pid and user namespaces similarly. So in these patches I'm still grabbing the namespaces of current during mount, but I've added an additional check to fail the mount if the f_cred's userns for the fd to userspace doesn't match. Another issue mentioned by Eric was what to use for i_[ug]id if the ids from userspace don't map into the user namespace, which is going to be a problem for any other filesystems which become mountable from user namespaces as well. We discussed a few options for addressing this, the most promising of which seems to be either using INVALID_[UG]ID for these inodes or creating vfs-wide "nobody" ids for this purpose. After thinking about it for a while I'm favoring using the invalid ids, but I'm hoping to solicit some more feedback. For now these patches are using invalid ids if the user doesn't map into the namespace. I went through the vfs code and found one place where this could be handled better (addressed in patch 1 of the series). The only other issue I found was that currently no one, not even root, can change onwership of such inodes, but I suspect we can find a way around this. The only other change since v1 is that I now fail changing file ownership if the new uid or gid does not map into the namespace used for userspace communication. Thanks, Seth Seth Forshee (3): vfs: Check for invalid i_uid in may_follow_link() fuse: Translate pids passed to userspace into pid namespaces fuse: Add support for mounts from user namespaces fs/fuse/dev.c | 13 +++++++------ fs/fuse/dir.c | 46 +++++++++++++++++++++++++++++++++------------- fs/fuse/fuse_i.h | 8 ++++++++ fs/fuse/inode.c | 31 +++++++++++++++++++++++-------- fs/namei.c | 2 +- 5 files changed, 72 insertions(+), 28 deletions(-) ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v2 1/3] vfs: Check for invalid i_uid in may_follow_link() 2014-09-02 15:44 [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces Seth Forshee @ 2014-09-02 15:44 ` Seth Forshee 2014-09-05 17:05 ` Serge Hallyn 2014-09-02 15:44 ` [PATCH v2 2/3] fuse: Translate pids passed to userspace into pid namespaces Seth Forshee ` (4 subsequent siblings) 5 siblings, 1 reply; 39+ messages in thread From: Seth Forshee @ 2014-09-02 15:44 UTC (permalink / raw) To: Miklos Szeredi Cc: Alexander Viro, Eric W. Biederman, Serge Hallyn, fuse-devel, linux-kernel, linux-fsdevel, Seth Forshee Filesystem uids which don't map into a user namespace may result in inode->i_uid being INVALID_UID. A symlink and its parent could have different owners in the filesystem can both get mapped to INVALID_UID, which may result in following a symlink when this would not have otherwise been permitted. Prevent this by adding a check that the uid is valid before the comparison. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> --- fs/namei.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/namei.c b/fs/namei.c index a996bb48dfab..193da09e903e 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -741,7 +741,7 @@ static inline int may_follow_link(struct path *link, struct nameidata *nd) return 0; /* Allowed if parent directory and link owner match. */ - if (uid_eq(parent->i_uid, inode->i_uid)) + if (uid_valid(inode->i_uid) && uid_eq(parent->i_uid, inode->i_uid)) return 0; audit_log_link_denied("follow_link", link); -- 1.9.1 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/3] vfs: Check for invalid i_uid in may_follow_link() 2014-09-02 15:44 ` [PATCH v2 1/3] vfs: Check for invalid i_uid in may_follow_link() Seth Forshee @ 2014-09-05 17:05 ` Serge Hallyn 2014-09-05 19:00 ` Seth Forshee 0 siblings, 1 reply; 39+ messages in thread From: Serge Hallyn @ 2014-09-05 17:05 UTC (permalink / raw) To: Seth Forshee Cc: Miklos Szeredi, Alexander Viro, Eric W. Biederman, fuse-devel, linux-kernel, linux-fsdevel Quoting Seth Forshee (seth.forshee@canonical.com): > Filesystem uids which don't map into a user namespace may result > in inode->i_uid being INVALID_UID. A symlink and its parent > could have different owners in the filesystem can both get > mapped to INVALID_UID, which may result in following a symlink > when this would not have otherwise been permitted. Prevent this > by adding a check that the uid is valid before the comparison. > > Signed-off-by: Seth Forshee <seth.forshee@canonical.com> I'm a bit uncomfortable about this, but I can't put my finger on why. Wonder if it could mess up root looking into a malicious user's task by looking under /proc/self/root. I suppose not, as this should only be the case (with root in init_user_ns) for fuse? Anyway it seems needed for keeping root from falling into a trap. Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> > --- > fs/namei.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/namei.c b/fs/namei.c > index a996bb48dfab..193da09e903e 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -741,7 +741,7 @@ static inline int may_follow_link(struct path *link, struct nameidata *nd) > return 0; > > /* Allowed if parent directory and link owner match. */ > - if (uid_eq(parent->i_uid, inode->i_uid)) > + if (uid_valid(inode->i_uid) && uid_eq(parent->i_uid, inode->i_uid)) > return 0; > > audit_log_link_denied("follow_link", link); > -- > 1.9.1 > ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/3] vfs: Check for invalid i_uid in may_follow_link() 2014-09-05 17:05 ` Serge Hallyn @ 2014-09-05 19:00 ` Seth Forshee 2014-09-05 19:23 ` Serge Hallyn 0 siblings, 1 reply; 39+ messages in thread From: Seth Forshee @ 2014-09-05 19:00 UTC (permalink / raw) To: Serge Hallyn Cc: Miklos Szeredi, Alexander Viro, Eric W. Biederman, fuse-devel, linux-kernel, linux-fsdevel, seth.forshee On Fri, Sep 05, 2014 at 05:05:09PM +0000, Serge Hallyn wrote: > Quoting Seth Forshee (seth.forshee@canonical.com): > > Filesystem uids which don't map into a user namespace may result > > in inode->i_uid being INVALID_UID. A symlink and its parent > > could have different owners in the filesystem can both get > > mapped to INVALID_UID, which may result in following a symlink > > when this would not have otherwise been permitted. Prevent this > > by adding a check that the uid is valid before the comparison. > > > > Signed-off-by: Seth Forshee <seth.forshee@canonical.com> > > I'm a bit uncomfortable about this, but I can't put my finger > on why. Wonder if it could mess up root looking into > a malicious user's task by looking under /proc/self/root. > I suppose not, as this should only be the case (with root in > init_user_ns) for fuse? > > Anyway it seems needed for keeping root from falling into a trap. This shouldn't mess up looking into /proc/self/root or anything else where the symlink isn't under control of the malicious user. Plus this restriction only applies to world-writable directories with the sticky bit set. > Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> > > > --- > > fs/namei.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/fs/namei.c b/fs/namei.c > > index a996bb48dfab..193da09e903e 100644 > > --- a/fs/namei.c > > +++ b/fs/namei.c > > @@ -741,7 +741,7 @@ static inline int may_follow_link(struct path *link, struct nameidata *nd) > > return 0; > > > > /* Allowed if parent directory and link owner match. */ > > - if (uid_eq(parent->i_uid, inode->i_uid)) > > + if (uid_valid(inode->i_uid) && uid_eq(parent->i_uid, inode->i_uid)) > > return 0; > > > > audit_log_link_denied("follow_link", link); > > -- > > 1.9.1 > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/3] vfs: Check for invalid i_uid in may_follow_link() 2014-09-05 19:00 ` Seth Forshee @ 2014-09-05 19:23 ` Serge Hallyn 0 siblings, 0 replies; 39+ messages in thread From: Serge Hallyn @ 2014-09-05 19:23 UTC (permalink / raw) To: Miklos Szeredi, Alexander Viro, Eric W. Biederman, fuse-devel, linux-kernel, linux-fsdevel Quoting Seth Forshee (seth.forshee@canonical.com): > On Fri, Sep 05, 2014 at 05:05:09PM +0000, Serge Hallyn wrote: > > Quoting Seth Forshee (seth.forshee@canonical.com): > > > Filesystem uids which don't map into a user namespace may result > > > in inode->i_uid being INVALID_UID. A symlink and its parent > > > could have different owners in the filesystem can both get > > > mapped to INVALID_UID, which may result in following a symlink > > > when this would not have otherwise been permitted. Prevent this > > > by adding a check that the uid is valid before the comparison. > > > > > > Signed-off-by: Seth Forshee <seth.forshee@canonical.com> > > > > I'm a bit uncomfortable about this, but I can't put my finger > > on why. Wonder if it could mess up root looking into > > a malicious user's task by looking under /proc/self/root. > > I suppose not, as this should only be the case (with root in > > init_user_ns) for fuse? > > > > Anyway it seems needed for keeping root from falling into a trap. > > This shouldn't mess up looking into /proc/self/root or anything else > where the symlink isn't under control of the malicious user. Plus this > restriction only applies to world-writable directories with the sticky > bit set. Oh - yeah, i saw that and glossed over it for some reason. Thanks. > > Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> > > > > > --- > > > fs/namei.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/fs/namei.c b/fs/namei.c > > > index a996bb48dfab..193da09e903e 100644 > > > --- a/fs/namei.c > > > +++ b/fs/namei.c > > > @@ -741,7 +741,7 @@ static inline int may_follow_link(struct path *link, struct nameidata *nd) > > > return 0; > > > > > > /* Allowed if parent directory and link owner match. */ > > > - if (uid_eq(parent->i_uid, inode->i_uid)) > > > + if (uid_valid(inode->i_uid) && uid_eq(parent->i_uid, inode->i_uid)) > > > return 0; > > > > > > audit_log_link_denied("follow_link", link); > > > -- > > > 1.9.1 > > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v2 2/3] fuse: Translate pids passed to userspace into pid namespaces 2014-09-02 15:44 [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces Seth Forshee 2014-09-02 15:44 ` [PATCH v2 1/3] vfs: Check for invalid i_uid in may_follow_link() Seth Forshee @ 2014-09-02 15:44 ` Seth Forshee 2014-09-05 17:10 ` Serge Hallyn 2014-09-02 15:44 ` [PATCH v2 3/3] fuse: Add support for mounts from user namespaces Seth Forshee ` (3 subsequent siblings) 5 siblings, 1 reply; 39+ messages in thread From: Seth Forshee @ 2014-09-02 15:44 UTC (permalink / raw) To: Miklos Szeredi Cc: Alexander Viro, Eric W. Biederman, Serge Hallyn, fuse-devel, linux-kernel, linux-fsdevel, Seth Forshee If the process reading on the fuse fd is executing in a pid namespace then giving it the global pid of the process making a request doesn't make sense. Instead, capture the pid namespace when the filesystem is first mounted and translate pids into this namespace before passing them to userspace. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> --- fs/fuse/dev.c | 9 +++++---- fs/fuse/fuse_i.h | 4 ++++ fs/fuse/inode.c | 3 +++ 3 files changed, 12 insertions(+), 4 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index ca887314aba9..839caebd34f1 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -20,6 +20,7 @@ #include <linux/swap.h> #include <linux/splice.h> #include <linux/aio.h> +#include <linux/sched.h> MODULE_ALIAS_MISCDEV(FUSE_MINOR); MODULE_ALIAS("devname:fuse"); @@ -124,11 +125,11 @@ static void __fuse_put_request(struct fuse_req *req) atomic_dec(&req->count); } -static void fuse_req_init_context(struct fuse_req *req) +static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req) { req->in.h.uid = from_kuid_munged(&init_user_ns, current_fsuid()); req->in.h.gid = from_kgid_munged(&init_user_ns, current_fsgid()); - req->in.h.pid = current->pid; + req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns); } static bool fuse_block_alloc(struct fuse_conn *fc, bool for_background) @@ -168,7 +169,7 @@ static struct fuse_req *__fuse_get_req(struct fuse_conn *fc, unsigned npages, goto out; } - fuse_req_init_context(req); + fuse_req_init_context(fc, req); req->waiting = 1; req->background = for_background; return req; @@ -257,7 +258,7 @@ struct fuse_req *fuse_get_req_nofail_nopages(struct fuse_conn *fc, if (!req) req = get_reserved_req(fc, file); - fuse_req_init_context(req); + fuse_req_init_context(fc, req); req->waiting = 1; req->background = 0; return req; diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index e8e47a6ab518..a3ded071e2c6 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -22,6 +22,7 @@ #include <linux/rbtree.h> #include <linux/poll.h> #include <linux/workqueue.h> +#include <linux/pid_namespace.h> /** Max number of pages that can be used in a single read request */ #define FUSE_MAX_PAGES_PER_REQ 32 @@ -386,6 +387,9 @@ struct fuse_conn { /** The group id for this mount */ kgid_t group_id; + /** The pid namespace for this mount */ + struct pid_namespace *pid_ns; + /** The fuse mount flags for this mount */ unsigned flags; diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 03246cd9d47a..c6d8473b1a80 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -20,6 +20,7 @@ #include <linux/random.h> #include <linux/sched.h> #include <linux/exportfs.h> +#include <linux/pid_namespace.h> MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>"); MODULE_DESCRIPTION("Filesystem in Userspace"); @@ -616,6 +617,7 @@ void fuse_conn_init(struct fuse_conn *fc) fc->initialized = 0; fc->attr_version = 1; get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key)); + fc->pid_ns = get_pid_ns(task_active_pid_ns(current)); } EXPORT_SYMBOL_GPL(fuse_conn_init); @@ -953,6 +955,7 @@ static void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req) static void fuse_free_conn(struct fuse_conn *fc) { + put_pid_ns(fc->pid_ns); kfree_rcu(fc, rcu); } -- 1.9.1 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH v2 2/3] fuse: Translate pids passed to userspace into pid namespaces 2014-09-02 15:44 ` [PATCH v2 2/3] fuse: Translate pids passed to userspace into pid namespaces Seth Forshee @ 2014-09-05 17:10 ` Serge Hallyn 0 siblings, 0 replies; 39+ messages in thread From: Serge Hallyn @ 2014-09-05 17:10 UTC (permalink / raw) To: Seth Forshee Cc: Miklos Szeredi, Alexander Viro, Eric W. Biederman, fuse-devel, linux-kernel, linux-fsdevel Quoting Seth Forshee (seth.forshee@canonical.com): > If the process reading on the fuse fd is executing in a pid > namespace then giving it the global pid of the process making > a request doesn't make sense. Instead, capture the pid namespace > when the filesystem is first mounted and translate pids into this > namespace before passing them to userspace. > > Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> > --- > fs/fuse/dev.c | 9 +++++---- > fs/fuse/fuse_i.h | 4 ++++ > fs/fuse/inode.c | 3 +++ > 3 files changed, 12 insertions(+), 4 deletions(-) > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c > index ca887314aba9..839caebd34f1 100644 > --- a/fs/fuse/dev.c > +++ b/fs/fuse/dev.c > @@ -20,6 +20,7 @@ > #include <linux/swap.h> > #include <linux/splice.h> > #include <linux/aio.h> > +#include <linux/sched.h> > > MODULE_ALIAS_MISCDEV(FUSE_MINOR); > MODULE_ALIAS("devname:fuse"); > @@ -124,11 +125,11 @@ static void __fuse_put_request(struct fuse_req *req) > atomic_dec(&req->count); > } > > -static void fuse_req_init_context(struct fuse_req *req) > +static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req) > { > req->in.h.uid = from_kuid_munged(&init_user_ns, current_fsuid()); > req->in.h.gid = from_kgid_munged(&init_user_ns, current_fsgid()); > - req->in.h.pid = current->pid; > + req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns); > } > > static bool fuse_block_alloc(struct fuse_conn *fc, bool for_background) > @@ -168,7 +169,7 @@ static struct fuse_req *__fuse_get_req(struct fuse_conn *fc, unsigned npages, > goto out; > } > > - fuse_req_init_context(req); > + fuse_req_init_context(fc, req); > req->waiting = 1; > req->background = for_background; > return req; > @@ -257,7 +258,7 @@ struct fuse_req *fuse_get_req_nofail_nopages(struct fuse_conn *fc, > if (!req) > req = get_reserved_req(fc, file); > > - fuse_req_init_context(req); > + fuse_req_init_context(fc, req); > req->waiting = 1; > req->background = 0; > return req; > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h > index e8e47a6ab518..a3ded071e2c6 100644 > --- a/fs/fuse/fuse_i.h > +++ b/fs/fuse/fuse_i.h > @@ -22,6 +22,7 @@ > #include <linux/rbtree.h> > #include <linux/poll.h> > #include <linux/workqueue.h> > +#include <linux/pid_namespace.h> > > /** Max number of pages that can be used in a single read request */ > #define FUSE_MAX_PAGES_PER_REQ 32 > @@ -386,6 +387,9 @@ struct fuse_conn { > /** The group id for this mount */ > kgid_t group_id; > > + /** The pid namespace for this mount */ > + struct pid_namespace *pid_ns; > + > /** The fuse mount flags for this mount */ > unsigned flags; > > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c > index 03246cd9d47a..c6d8473b1a80 100644 > --- a/fs/fuse/inode.c > +++ b/fs/fuse/inode.c > @@ -20,6 +20,7 @@ > #include <linux/random.h> > #include <linux/sched.h> > #include <linux/exportfs.h> > +#include <linux/pid_namespace.h> > > MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>"); > MODULE_DESCRIPTION("Filesystem in Userspace"); > @@ -616,6 +617,7 @@ void fuse_conn_init(struct fuse_conn *fc) > fc->initialized = 0; > fc->attr_version = 1; > get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key)); > + fc->pid_ns = get_pid_ns(task_active_pid_ns(current)); > } > EXPORT_SYMBOL_GPL(fuse_conn_init); > > @@ -953,6 +955,7 @@ static void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req) > > static void fuse_free_conn(struct fuse_conn *fc) > { > + put_pid_ns(fc->pid_ns); > kfree_rcu(fc, rcu); > } > > -- > 1.9.1 > ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v2 3/3] fuse: Add support for mounts from user namespaces 2014-09-02 15:44 [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces Seth Forshee 2014-09-02 15:44 ` [PATCH v2 1/3] vfs: Check for invalid i_uid in may_follow_link() Seth Forshee 2014-09-02 15:44 ` [PATCH v2 2/3] fuse: Translate pids passed to userspace into pid namespaces Seth Forshee @ 2014-09-02 15:44 ` Seth Forshee 2014-09-05 16:48 ` Serge Hallyn 2014-09-05 20:40 ` [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces Seth Forshee ` (2 subsequent siblings) 5 siblings, 1 reply; 39+ messages in thread From: Seth Forshee @ 2014-09-02 15:44 UTC (permalink / raw) To: Miklos Szeredi Cc: Alexander Viro, Eric W. Biederman, Serge Hallyn, fuse-devel, linux-kernel, linux-fsdevel, Seth Forshee Update fuse to support mounts from within user namespaces. This is mostly a matter of translating uids and gids into the namespace of the process reading requests before handing the requests off to userspace. Due to security concerns the namespace used should be fixed, otherwise a user might be able to pass the fuse fd to init_user_ns and inject suid files owned by a user outside the namespace in order to gain elevated privileges. For fuse we stash current_user_ns() when a filesystem is first mounted and abort the mount if this namespace is different than the one used to open the fd passed in the mount options. The allow_others options could also be a problem, as a userns mount could bypass system policy for this option and thus open the possiblity of DoS attacks. This is prevented by restricting the scope of allow_other to apply only to that superblock's userns and its children, giving the expected behavior within the userns while preventing DoS attacks on more privileged contexts. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> --- fs/fuse/dev.c | 4 ++-- fs/fuse/dir.c | 46 +++++++++++++++++++++++++++++++++------------- fs/fuse/fuse_i.h | 4 ++++ fs/fuse/inode.c | 28 ++++++++++++++++++++-------- 4 files changed, 59 insertions(+), 23 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 839caebd34f1..03c8785ed731 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -127,8 +127,8 @@ static void __fuse_put_request(struct fuse_req *req) static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req) { - req->in.h.uid = from_kuid_munged(&init_user_ns, current_fsuid()); - req->in.h.gid = from_kgid_munged(&init_user_ns, current_fsgid()); + req->in.h.uid = from_kuid_munged(fc->user_ns, current_fsuid()); + req->in.h.gid = from_kgid_munged(fc->user_ns, current_fsgid()); req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns); } diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index de1d84af9f7c..c0b9968db6a1 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -905,8 +905,8 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr, stat->ino = attr->ino; stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777); stat->nlink = attr->nlink; - stat->uid = make_kuid(&init_user_ns, attr->uid); - stat->gid = make_kgid(&init_user_ns, attr->gid); + stat->uid = make_kuid(fc->user_ns, attr->uid); + stat->gid = make_kgid(fc->user_ns, attr->gid); stat->rdev = inode->i_rdev; stat->atime.tv_sec = attr->atime; stat->atime.tv_nsec = attr->atimensec; @@ -1085,12 +1085,20 @@ int fuse_reverse_inval_entry(struct super_block *sb, u64 parent_nodeid, */ int fuse_allow_current_process(struct fuse_conn *fc) { - const struct cred *cred; + const struct cred *cred = current_cred(); - if (fc->flags & FUSE_ALLOW_OTHER) - return 1; + if (fc->flags & FUSE_ALLOW_OTHER) { + if (kuid_has_mapping(fc->user_ns, cred->euid) && + kuid_has_mapping(fc->user_ns, cred->suid) && + kuid_has_mapping(fc->user_ns, cred->uid) && + kgid_has_mapping(fc->user_ns, cred->egid) && + kgid_has_mapping(fc->user_ns, cred->sgid) && + kgid_has_mapping(fc->user_ns, cred->gid)) + return 1; + + return 0; + } - cred = current_cred(); if (uid_eq(cred->euid, fc->user_id) && uid_eq(cred->suid, fc->user_id) && uid_eq(cred->uid, fc->user_id) && @@ -1556,17 +1564,25 @@ static bool update_mtime(unsigned ivalid, bool trust_local_mtime) return true; } -static void iattr_to_fattr(struct iattr *iattr, struct fuse_setattr_in *arg, - bool trust_local_cmtime) +static int iattr_to_fattr(struct fuse_conn *fc, struct iattr *iattr, + struct fuse_setattr_in *arg, bool trust_local_cmtime) { unsigned ivalid = iattr->ia_valid; if (ivalid & ATTR_MODE) arg->valid |= FATTR_MODE, arg->mode = iattr->ia_mode; - if (ivalid & ATTR_UID) - arg->valid |= FATTR_UID, arg->uid = from_kuid(&init_user_ns, iattr->ia_uid); - if (ivalid & ATTR_GID) - arg->valid |= FATTR_GID, arg->gid = from_kgid(&init_user_ns, iattr->ia_gid); + if (ivalid & ATTR_UID) { + arg->uid = from_kuid(fc->user_ns, iattr->ia_uid); + if (arg->uid == (uid_t)-1) + return -EINVAL; + arg->valid |= FATTR_UID; + } + if (ivalid & ATTR_GID) { + arg->gid = from_kgid(fc->user_ns, iattr->ia_gid); + if (arg->gid == (gid_t)-1) + return -EINVAL; + arg->valid |= FATTR_GID; + } if (ivalid & ATTR_SIZE) arg->valid |= FATTR_SIZE, arg->size = iattr->ia_size; if (ivalid & ATTR_ATIME) { @@ -1588,6 +1604,8 @@ static void iattr_to_fattr(struct iattr *iattr, struct fuse_setattr_in *arg, arg->ctime = iattr->ia_ctime.tv_sec; arg->ctimensec = iattr->ia_ctime.tv_nsec; } + + return 0; } /* @@ -1741,7 +1759,9 @@ int fuse_do_setattr(struct inode *inode, struct iattr *attr, memset(&inarg, 0, sizeof(inarg)); memset(&outarg, 0, sizeof(outarg)); - iattr_to_fattr(attr, &inarg, trust_local_cmtime); + err = iattr_to_fattr(fc, attr, &inarg, trust_local_cmtime); + if (err) + goto error; if (file) { struct fuse_file *ff = file->private_data; inarg.valid |= FATTR_FH; diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index a3ded071e2c6..2cfd0ca3407a 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -22,6 +22,7 @@ #include <linux/rbtree.h> #include <linux/poll.h> #include <linux/workqueue.h> +#include <linux/user_namespace.h> #include <linux/pid_namespace.h> /** Max number of pages that can be used in a single read request */ @@ -387,6 +388,9 @@ struct fuse_conn { /** The group id for this mount */ kgid_t group_id; + /** The user namespace for this mount */ + struct user_namespace *user_ns; + /** The pid namespace for this mount */ struct pid_namespace *pid_ns; diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index c6d8473b1a80..f3a3ded82f85 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -167,8 +167,8 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr, inode->i_ino = fuse_squash_ino(attr->ino); inode->i_mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777); set_nlink(inode, attr->nlink); - inode->i_uid = make_kuid(&init_user_ns, attr->uid); - inode->i_gid = make_kgid(&init_user_ns, attr->gid); + inode->i_uid = make_kuid(fc->user_ns, attr->uid); + inode->i_gid = make_kgid(fc->user_ns, attr->gid); inode->i_blocks = attr->blocks; inode->i_atime.tv_sec = attr->atime; inode->i_atime.tv_nsec = attr->atimensec; @@ -496,6 +496,8 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev) memset(d, 0, sizeof(struct fuse_mount_data)); d->max_read = ~0; d->blksize = FUSE_DEFAULT_BLKSIZE; + d->user_id = make_kuid(current_user_ns(), 0); + d->group_id = make_kgid(current_user_ns(), 0); while ((p = strsep(&opt, ",")) != NULL) { int token; @@ -578,8 +580,10 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root) struct super_block *sb = root->d_sb; struct fuse_conn *fc = get_fuse_conn_super(sb); - seq_printf(m, ",user_id=%u", from_kuid_munged(&init_user_ns, fc->user_id)); - seq_printf(m, ",group_id=%u", from_kgid_munged(&init_user_ns, fc->group_id)); + seq_printf(m, ",user_id=%u", + from_kuid_munged(fc->user_ns, fc->user_id)); + seq_printf(m, ",group_id=%u", + from_kgid_munged(fc->user_ns, fc->group_id)); if (fc->flags & FUSE_DEFAULT_PERMISSIONS) seq_puts(m, ",default_permissions"); if (fc->flags & FUSE_ALLOW_OTHER) @@ -618,6 +622,7 @@ void fuse_conn_init(struct fuse_conn *fc) fc->attr_version = 1; get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key)); fc->pid_ns = get_pid_ns(task_active_pid_ns(current)); + fc->user_ns = get_user_ns(current_user_ns()); } EXPORT_SYMBOL_GPL(fuse_conn_init); @@ -956,6 +961,7 @@ static void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req) static void fuse_free_conn(struct fuse_conn *fc) { put_pid_ns(fc->pid_ns); + put_user_ns(fc->user_ns); kfree_rcu(fc, rcu); } @@ -1042,8 +1048,14 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) if (!file) goto err; - if ((file->f_op != &fuse_dev_operations) || - (file->f_cred->user_ns != &init_user_ns)) + /* + * Require mount to happen from the same user namespace which + * opened /dev/fuse, otherwise users might be able to + * elevate privileges by opening in init_user_ns then + * mounting from a different namespace without MNT_NOSUID. + */ + if (file->f_op != &fuse_dev_operations || + file->f_cred->user_ns != current_user_ns()) goto err_fput; fc = kmalloc(sizeof(*fc), GFP_KERNEL); @@ -1157,7 +1169,7 @@ static void fuse_kill_sb_anon(struct super_block *sb) static struct file_system_type fuse_fs_type = { .owner = THIS_MODULE, .name = "fuse", - .fs_flags = FS_HAS_SUBTYPE, + .fs_flags = FS_HAS_SUBTYPE | FS_USERNS_MOUNT, .mount = fuse_mount, .kill_sb = fuse_kill_sb_anon, }; @@ -1189,7 +1201,7 @@ static struct file_system_type fuseblk_fs_type = { .name = "fuseblk", .mount = fuse_mount_blk, .kill_sb = fuse_kill_sb_blk, - .fs_flags = FS_REQUIRES_DEV | FS_HAS_SUBTYPE, + .fs_flags = FS_REQUIRES_DEV | FS_HAS_SUBTYPE | FS_USERNS_MOUNT, }; MODULE_ALIAS_FS("fuseblk"); -- 1.9.1 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH v2 3/3] fuse: Add support for mounts from user namespaces 2014-09-02 15:44 ` [PATCH v2 3/3] fuse: Add support for mounts from user namespaces Seth Forshee @ 2014-09-05 16:48 ` Serge Hallyn 2014-09-05 17:36 ` Seth Forshee 0 siblings, 1 reply; 39+ messages in thread From: Serge Hallyn @ 2014-09-05 16:48 UTC (permalink / raw) To: Seth Forshee Cc: Miklos Szeredi, Alexander Viro, Eric W. Biederman, fuse-devel, linux-kernel, linux-fsdevel Quoting Seth Forshee (seth.forshee@canonical.com): > Update fuse to support mounts from within user namespaces. This > is mostly a matter of translating uids and gids into the > namespace of the process reading requests before handing the > requests off to userspace. > > Due to security concerns the namespace used should be fixed, > otherwise a user might be able to pass the fuse fd to > init_user_ns and inject suid files owned by a user outside the > namespace in order to gain elevated privileges. For fuse we > stash current_user_ns() when a filesystem is first mounted and > abort the mount if this namespace is different than the one used > to open the fd passed in the mount options. > > The allow_others options could also be a problem, as a userns > mount could bypass system policy for this option and thus open > the possiblity of DoS attacks. This is prevented by restricting > the scope of allow_other to apply only to that superblock's > userns and its children, giving the expected behavior within the > userns while preventing DoS attacks on more privileged contexts. > > Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Thanks, Seth, just two little questions below. > --- > fs/fuse/dev.c | 4 ++-- > fs/fuse/dir.c | 46 +++++++++++++++++++++++++++++++++------------- > fs/fuse/fuse_i.h | 4 ++++ > fs/fuse/inode.c | 28 ++++++++++++++++++++-------- > 4 files changed, 59 insertions(+), 23 deletions(-) > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c > index 839caebd34f1..03c8785ed731 100644 > --- a/fs/fuse/dev.c > +++ b/fs/fuse/dev.c > @@ -127,8 +127,8 @@ static void __fuse_put_request(struct fuse_req *req) > > static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req) > { > - req->in.h.uid = from_kuid_munged(&init_user_ns, current_fsuid()); > - req->in.h.gid = from_kgid_munged(&init_user_ns, current_fsgid()); > + req->in.h.uid = from_kuid_munged(fc->user_ns, current_fsuid()); > + req->in.h.gid = from_kgid_munged(fc->user_ns, current_fsgid()); > req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns); > } > > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c > index de1d84af9f7c..c0b9968db6a1 100644 > --- a/fs/fuse/dir.c > +++ b/fs/fuse/dir.c > @@ -905,8 +905,8 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr, > stat->ino = attr->ino; > stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777); > stat->nlink = attr->nlink; > - stat->uid = make_kuid(&init_user_ns, attr->uid); > - stat->gid = make_kgid(&init_user_ns, attr->gid); > + stat->uid = make_kuid(fc->user_ns, attr->uid); > + stat->gid = make_kgid(fc->user_ns, attr->gid); > stat->rdev = inode->i_rdev; > stat->atime.tv_sec = attr->atime; > stat->atime.tv_nsec = attr->atimensec; > @@ -1085,12 +1085,20 @@ int fuse_reverse_inval_entry(struct super_block *sb, u64 parent_nodeid, > */ > int fuse_allow_current_process(struct fuse_conn *fc) > { > - const struct cred *cred; > + const struct cred *cred = current_cred(); > > - if (fc->flags & FUSE_ALLOW_OTHER) > - return 1; > + if (fc->flags & FUSE_ALLOW_OTHER) { > + if (kuid_has_mapping(fc->user_ns, cred->euid) && > + kuid_has_mapping(fc->user_ns, cred->suid) && > + kuid_has_mapping(fc->user_ns, cred->uid) && > + kgid_has_mapping(fc->user_ns, cred->egid) && > + kgid_has_mapping(fc->user_ns, cred->sgid) && > + kgid_has_mapping(fc->user_ns, cred->gid)) Should fsuid be checked here? > + return 1; > + > + return 0; > + } > > - cred = current_cred(); > if (uid_eq(cred->euid, fc->user_id) && > uid_eq(cred->suid, fc->user_id) && > uid_eq(cred->uid, fc->user_id) && > @@ -1556,17 +1564,25 @@ static bool update_mtime(unsigned ivalid, bool trust_local_mtime) > return true; > } > > -static void iattr_to_fattr(struct iattr *iattr, struct fuse_setattr_in *arg, > - bool trust_local_cmtime) > +static int iattr_to_fattr(struct fuse_conn *fc, struct iattr *iattr, > + struct fuse_setattr_in *arg, bool trust_local_cmtime) > { > unsigned ivalid = iattr->ia_valid; > > if (ivalid & ATTR_MODE) > arg->valid |= FATTR_MODE, arg->mode = iattr->ia_mode; > - if (ivalid & ATTR_UID) > - arg->valid |= FATTR_UID, arg->uid = from_kuid(&init_user_ns, iattr->ia_uid); > - if (ivalid & ATTR_GID) > - arg->valid |= FATTR_GID, arg->gid = from_kgid(&init_user_ns, iattr->ia_gid); > + if (ivalid & ATTR_UID) { > + arg->uid = from_kuid(fc->user_ns, iattr->ia_uid); > + if (arg->uid == (uid_t)-1) Any reason not to use uid_valid() here (and gid_valid() below)? > + return -EINVAL; > + arg->valid |= FATTR_UID; > + } > + if (ivalid & ATTR_GID) { > + arg->gid = from_kgid(fc->user_ns, iattr->ia_gid); > + if (arg->gid == (gid_t)-1) > + return -EINVAL; > + arg->valid |= FATTR_GID; > + } > if (ivalid & ATTR_SIZE) > arg->valid |= FATTR_SIZE, arg->size = iattr->ia_size; > if (ivalid & ATTR_ATIME) { > @@ -1588,6 +1604,8 @@ static void iattr_to_fattr(struct iattr *iattr, struct fuse_setattr_in *arg, > arg->ctime = iattr->ia_ctime.tv_sec; > arg->ctimensec = iattr->ia_ctime.tv_nsec; > } > + > + return 0; > } > > /* > @@ -1741,7 +1759,9 @@ int fuse_do_setattr(struct inode *inode, struct iattr *attr, > > memset(&inarg, 0, sizeof(inarg)); > memset(&outarg, 0, sizeof(outarg)); > - iattr_to_fattr(attr, &inarg, trust_local_cmtime); > + err = iattr_to_fattr(fc, attr, &inarg, trust_local_cmtime); > + if (err) > + goto error; > if (file) { > struct fuse_file *ff = file->private_data; > inarg.valid |= FATTR_FH; > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h > index a3ded071e2c6..2cfd0ca3407a 100644 > --- a/fs/fuse/fuse_i.h > +++ b/fs/fuse/fuse_i.h > @@ -22,6 +22,7 @@ > #include <linux/rbtree.h> > #include <linux/poll.h> > #include <linux/workqueue.h> > +#include <linux/user_namespace.h> > #include <linux/pid_namespace.h> > > /** Max number of pages that can be used in a single read request */ > @@ -387,6 +388,9 @@ struct fuse_conn { > /** The group id for this mount */ > kgid_t group_id; > > + /** The user namespace for this mount */ > + struct user_namespace *user_ns; > + > /** The pid namespace for this mount */ > struct pid_namespace *pid_ns; > > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c > index c6d8473b1a80..f3a3ded82f85 100644 > --- a/fs/fuse/inode.c > +++ b/fs/fuse/inode.c > @@ -167,8 +167,8 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr, > inode->i_ino = fuse_squash_ino(attr->ino); > inode->i_mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777); > set_nlink(inode, attr->nlink); > - inode->i_uid = make_kuid(&init_user_ns, attr->uid); > - inode->i_gid = make_kgid(&init_user_ns, attr->gid); > + inode->i_uid = make_kuid(fc->user_ns, attr->uid); > + inode->i_gid = make_kgid(fc->user_ns, attr->gid); > inode->i_blocks = attr->blocks; > inode->i_atime.tv_sec = attr->atime; > inode->i_atime.tv_nsec = attr->atimensec; > @@ -496,6 +496,8 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev) > memset(d, 0, sizeof(struct fuse_mount_data)); > d->max_read = ~0; > d->blksize = FUSE_DEFAULT_BLKSIZE; > + d->user_id = make_kuid(current_user_ns(), 0); > + d->group_id = make_kgid(current_user_ns(), 0); > > while ((p = strsep(&opt, ",")) != NULL) { > int token; > @@ -578,8 +580,10 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root) > struct super_block *sb = root->d_sb; > struct fuse_conn *fc = get_fuse_conn_super(sb); > > - seq_printf(m, ",user_id=%u", from_kuid_munged(&init_user_ns, fc->user_id)); > - seq_printf(m, ",group_id=%u", from_kgid_munged(&init_user_ns, fc->group_id)); > + seq_printf(m, ",user_id=%u", > + from_kuid_munged(fc->user_ns, fc->user_id)); > + seq_printf(m, ",group_id=%u", > + from_kgid_munged(fc->user_ns, fc->group_id)); > if (fc->flags & FUSE_DEFAULT_PERMISSIONS) > seq_puts(m, ",default_permissions"); > if (fc->flags & FUSE_ALLOW_OTHER) > @@ -618,6 +622,7 @@ void fuse_conn_init(struct fuse_conn *fc) > fc->attr_version = 1; > get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key)); > fc->pid_ns = get_pid_ns(task_active_pid_ns(current)); > + fc->user_ns = get_user_ns(current_user_ns()); > } > EXPORT_SYMBOL_GPL(fuse_conn_init); > > @@ -956,6 +961,7 @@ static void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req) > static void fuse_free_conn(struct fuse_conn *fc) > { > put_pid_ns(fc->pid_ns); > + put_user_ns(fc->user_ns); > kfree_rcu(fc, rcu); > } > > @@ -1042,8 +1048,14 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) > if (!file) > goto err; > > - if ((file->f_op != &fuse_dev_operations) || > - (file->f_cred->user_ns != &init_user_ns)) > + /* > + * Require mount to happen from the same user namespace which > + * opened /dev/fuse, otherwise users might be able to > + * elevate privileges by opening in init_user_ns then > + * mounting from a different namespace without MNT_NOSUID. > + */ > + if (file->f_op != &fuse_dev_operations || > + file->f_cred->user_ns != current_user_ns()) > goto err_fput; > > fc = kmalloc(sizeof(*fc), GFP_KERNEL); > @@ -1157,7 +1169,7 @@ static void fuse_kill_sb_anon(struct super_block *sb) > static struct file_system_type fuse_fs_type = { > .owner = THIS_MODULE, > .name = "fuse", > - .fs_flags = FS_HAS_SUBTYPE, > + .fs_flags = FS_HAS_SUBTYPE | FS_USERNS_MOUNT, > .mount = fuse_mount, > .kill_sb = fuse_kill_sb_anon, > }; > @@ -1189,7 +1201,7 @@ static struct file_system_type fuseblk_fs_type = { > .name = "fuseblk", > .mount = fuse_mount_blk, > .kill_sb = fuse_kill_sb_blk, > - .fs_flags = FS_REQUIRES_DEV | FS_HAS_SUBTYPE, > + .fs_flags = FS_REQUIRES_DEV | FS_HAS_SUBTYPE | FS_USERNS_MOUNT, > }; > MODULE_ALIAS_FS("fuseblk"); > > -- > 1.9.1 > ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 3/3] fuse: Add support for mounts from user namespaces 2014-09-05 16:48 ` Serge Hallyn @ 2014-09-05 17:36 ` Seth Forshee 2014-09-05 19:25 ` Serge Hallyn 0 siblings, 1 reply; 39+ messages in thread From: Seth Forshee @ 2014-09-05 17:36 UTC (permalink / raw) To: Serge Hallyn Cc: Miklos Szeredi, Alexander Viro, Eric W. Biederman, fuse-devel, linux-kernel, linux-fsdevel, seth.forshee On Fri, Sep 05, 2014 at 04:48:11PM +0000, Serge Hallyn wrote: > Quoting Seth Forshee (seth.forshee@canonical.com): > > Update fuse to support mounts from within user namespaces. This > > is mostly a matter of translating uids and gids into the > > namespace of the process reading requests before handing the > > requests off to userspace. > > > > Due to security concerns the namespace used should be fixed, > > otherwise a user might be able to pass the fuse fd to > > init_user_ns and inject suid files owned by a user outside the > > namespace in order to gain elevated privileges. For fuse we > > stash current_user_ns() when a filesystem is first mounted and > > abort the mount if this namespace is different than the one used > > to open the fd passed in the mount options. > > > > The allow_others options could also be a problem, as a userns > > mount could bypass system policy for this option and thus open > > the possiblity of DoS attacks. This is prevented by restricting > > the scope of allow_other to apply only to that superblock's > > userns and its children, giving the expected behavior within the > > userns while preventing DoS attacks on more privileged contexts. > > > > Signed-off-by: Seth Forshee <seth.forshee@canonical.com> > > Thanks, Seth, just two little questions below. > > > --- > > fs/fuse/dev.c | 4 ++-- > > fs/fuse/dir.c | 46 +++++++++++++++++++++++++++++++++------------- > > fs/fuse/fuse_i.h | 4 ++++ > > fs/fuse/inode.c | 28 ++++++++++++++++++++-------- > > 4 files changed, 59 insertions(+), 23 deletions(-) > > > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c > > index 839caebd34f1..03c8785ed731 100644 > > --- a/fs/fuse/dev.c > > +++ b/fs/fuse/dev.c > > @@ -127,8 +127,8 @@ static void __fuse_put_request(struct fuse_req *req) > > > > static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req) > > { > > - req->in.h.uid = from_kuid_munged(&init_user_ns, current_fsuid()); > > - req->in.h.gid = from_kgid_munged(&init_user_ns, current_fsgid()); > > + req->in.h.uid = from_kuid_munged(fc->user_ns, current_fsuid()); > > + req->in.h.gid = from_kgid_munged(fc->user_ns, current_fsgid()); > > req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns); > > } > > > > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c > > index de1d84af9f7c..c0b9968db6a1 100644 > > --- a/fs/fuse/dir.c > > +++ b/fs/fuse/dir.c > > @@ -905,8 +905,8 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr, > > stat->ino = attr->ino; > > stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777); > > stat->nlink = attr->nlink; > > - stat->uid = make_kuid(&init_user_ns, attr->uid); > > - stat->gid = make_kgid(&init_user_ns, attr->gid); > > + stat->uid = make_kuid(fc->user_ns, attr->uid); > > + stat->gid = make_kgid(fc->user_ns, attr->gid); > > stat->rdev = inode->i_rdev; > > stat->atime.tv_sec = attr->atime; > > stat->atime.tv_nsec = attr->atimensec; > > @@ -1085,12 +1085,20 @@ int fuse_reverse_inval_entry(struct super_block *sb, u64 parent_nodeid, > > */ > > int fuse_allow_current_process(struct fuse_conn *fc) > > { > > - const struct cred *cred; > > + const struct cred *cred = current_cred(); > > > > - if (fc->flags & FUSE_ALLOW_OTHER) > > - return 1; > > + if (fc->flags & FUSE_ALLOW_OTHER) { > > + if (kuid_has_mapping(fc->user_ns, cred->euid) && > > + kuid_has_mapping(fc->user_ns, cred->suid) && > > + kuid_has_mapping(fc->user_ns, cred->uid) && > > + kgid_has_mapping(fc->user_ns, cred->egid) && > > + kgid_has_mapping(fc->user_ns, cred->sgid) && > > + kgid_has_mapping(fc->user_ns, cred->gid)) > > Should fsuid be checked here? The point of restricting access here is to prevent a DoS type of attack on a more privileged context by making a filesystem operation block indefinitely. Coming from that perspective I was thinking that these checks ought to be sufficient, but I could be wrong. > > > + return 1; > > + > > + return 0; > > + } > > > > - cred = current_cred(); > > if (uid_eq(cred->euid, fc->user_id) && > > uid_eq(cred->suid, fc->user_id) && > > uid_eq(cred->uid, fc->user_id) && > > @@ -1556,17 +1564,25 @@ static bool update_mtime(unsigned ivalid, bool trust_local_mtime) > > return true; > > } > > > > -static void iattr_to_fattr(struct iattr *iattr, struct fuse_setattr_in *arg, > > - bool trust_local_cmtime) > > +static int iattr_to_fattr(struct fuse_conn *fc, struct iattr *iattr, > > + struct fuse_setattr_in *arg, bool trust_local_cmtime) > > { > > unsigned ivalid = iattr->ia_valid; > > > > if (ivalid & ATTR_MODE) > > arg->valid |= FATTR_MODE, arg->mode = iattr->ia_mode; > > - if (ivalid & ATTR_UID) > > - arg->valid |= FATTR_UID, arg->uid = from_kuid(&init_user_ns, iattr->ia_uid); > > - if (ivalid & ATTR_GID) > > - arg->valid |= FATTR_GID, arg->gid = from_kgid(&init_user_ns, iattr->ia_gid); > > + if (ivalid & ATTR_UID) { > > + arg->uid = from_kuid(fc->user_ns, iattr->ia_uid); > > + if (arg->uid == (uid_t)-1) > > Any reason not to use uid_valid() here (and gid_valid() below)? Yes. arg->uid is a uid_t and not a kuid_t, so it wouldn't be valid to pass that to uid_valid(). And from_kuid() can return -1 for values other than INVALID_UID. Thanks, Seth ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 3/3] fuse: Add support for mounts from user namespaces 2014-09-05 17:36 ` Seth Forshee @ 2014-09-05 19:25 ` Serge Hallyn 0 siblings, 0 replies; 39+ messages in thread From: Serge Hallyn @ 2014-09-05 19:25 UTC (permalink / raw) To: Miklos Szeredi, Alexander Viro, Eric W. Biederman, fuse-devel, linux-kernel, linux-fsdevel Quoting Seth Forshee (seth.forshee@canonical.com): > On Fri, Sep 05, 2014 at 04:48:11PM +0000, Serge Hallyn wrote: > > Quoting Seth Forshee (seth.forshee@canonical.com): > > > Update fuse to support mounts from within user namespaces. This > > > is mostly a matter of translating uids and gids into the > > > namespace of the process reading requests before handing the > > > requests off to userspace. > > > > > > Due to security concerns the namespace used should be fixed, > > > otherwise a user might be able to pass the fuse fd to > > > init_user_ns and inject suid files owned by a user outside the > > > namespace in order to gain elevated privileges. For fuse we > > > stash current_user_ns() when a filesystem is first mounted and > > > abort the mount if this namespace is different than the one used > > > to open the fd passed in the mount options. > > > > > > The allow_others options could also be a problem, as a userns > > > mount could bypass system policy for this option and thus open > > > the possiblity of DoS attacks. This is prevented by restricting > > > the scope of allow_other to apply only to that superblock's > > > userns and its children, giving the expected behavior within the > > > userns while preventing DoS attacks on more privileged contexts. > > > > > > Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> > > > > Thanks, Seth, just two little questions below. > > > > > --- > > > fs/fuse/dev.c | 4 ++-- > > > fs/fuse/dir.c | 46 +++++++++++++++++++++++++++++++++------------- > > > fs/fuse/fuse_i.h | 4 ++++ > > > fs/fuse/inode.c | 28 ++++++++++++++++++++-------- > > > 4 files changed, 59 insertions(+), 23 deletions(-) > > > > > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c > > > index 839caebd34f1..03c8785ed731 100644 > > > --- a/fs/fuse/dev.c > > > +++ b/fs/fuse/dev.c > > > @@ -127,8 +127,8 @@ static void __fuse_put_request(struct fuse_req *req) > > > > > > static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req) > > > { > > > - req->in.h.uid = from_kuid_munged(&init_user_ns, current_fsuid()); > > > - req->in.h.gid = from_kgid_munged(&init_user_ns, current_fsgid()); > > > + req->in.h.uid = from_kuid_munged(fc->user_ns, current_fsuid()); > > > + req->in.h.gid = from_kgid_munged(fc->user_ns, current_fsgid()); > > > req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns); > > > } > > > > > > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c > > > index de1d84af9f7c..c0b9968db6a1 100644 > > > --- a/fs/fuse/dir.c > > > +++ b/fs/fuse/dir.c > > > @@ -905,8 +905,8 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr, > > > stat->ino = attr->ino; > > > stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777); > > > stat->nlink = attr->nlink; > > > - stat->uid = make_kuid(&init_user_ns, attr->uid); > > > - stat->gid = make_kgid(&init_user_ns, attr->gid); > > > + stat->uid = make_kuid(fc->user_ns, attr->uid); > > > + stat->gid = make_kgid(fc->user_ns, attr->gid); > > > stat->rdev = inode->i_rdev; > > > stat->atime.tv_sec = attr->atime; > > > stat->atime.tv_nsec = attr->atimensec; > > > @@ -1085,12 +1085,20 @@ int fuse_reverse_inval_entry(struct super_block *sb, u64 parent_nodeid, > > > */ > > > int fuse_allow_current_process(struct fuse_conn *fc) > > > { > > > - const struct cred *cred; > > > + const struct cred *cred = current_cred(); > > > > > > - if (fc->flags & FUSE_ALLOW_OTHER) > > > - return 1; > > > + if (fc->flags & FUSE_ALLOW_OTHER) { > > > + if (kuid_has_mapping(fc->user_ns, cred->euid) && > > > + kuid_has_mapping(fc->user_ns, cred->suid) && > > > + kuid_has_mapping(fc->user_ns, cred->uid) && > > > + kgid_has_mapping(fc->user_ns, cred->egid) && > > > + kgid_has_mapping(fc->user_ns, cred->sgid) && > > > + kgid_has_mapping(fc->user_ns, cred->gid)) > > > > Should fsuid be checked here? > > The point of restricting access here is to prevent a DoS type of attack > on a more privileged context by making a filesystem operation block > indefinitely. Coming from that perspective I was thinking that these > checks ought to be sufficient, but I could be wrong. I supose it would, as euid 0 would already not be mapped. > > > + return 1; > > > + > > > + return 0; > > > + } > > > > > > - cred = current_cred(); > > > if (uid_eq(cred->euid, fc->user_id) && > > > uid_eq(cred->suid, fc->user_id) && > > > uid_eq(cred->uid, fc->user_id) && > > > @@ -1556,17 +1564,25 @@ static bool update_mtime(unsigned ivalid, bool trust_local_mtime) > > > return true; > > > } > > > > > > -static void iattr_to_fattr(struct iattr *iattr, struct fuse_setattr_in *arg, > > > - bool trust_local_cmtime) > > > +static int iattr_to_fattr(struct fuse_conn *fc, struct iattr *iattr, > > > + struct fuse_setattr_in *arg, bool trust_local_cmtime) > > > { > > > unsigned ivalid = iattr->ia_valid; > > > > > > if (ivalid & ATTR_MODE) > > > arg->valid |= FATTR_MODE, arg->mode = iattr->ia_mode; > > > - if (ivalid & ATTR_UID) > > > - arg->valid |= FATTR_UID, arg->uid = from_kuid(&init_user_ns, iattr->ia_uid); > > > - if (ivalid & ATTR_GID) > > > - arg->valid |= FATTR_GID, arg->gid = from_kgid(&init_user_ns, iattr->ia_gid); > > > + if (ivalid & ATTR_UID) { > > > + arg->uid = from_kuid(fc->user_ns, iattr->ia_uid); > > > + if (arg->uid == (uid_t)-1) > > > > Any reason not to use uid_valid() here (and gid_valid() below)? > > Yes. arg->uid is a uid_t and not a kuid_t, so it wouldn't be valid to > pass that to uid_valid(). And from_kuid() can return -1 for values other > than INVALID_UID. D'oh. Right. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-02 15:44 [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces Seth Forshee ` (2 preceding siblings ...) 2014-09-02 15:44 ` [PATCH v2 3/3] fuse: Add support for mounts from user namespaces Seth Forshee @ 2014-09-05 20:40 ` Seth Forshee 2014-09-10 12:35 ` Seth Forshee 2014-09-23 16:07 ` Miklos Szeredi 5 siblings, 0 replies; 39+ messages in thread From: Seth Forshee @ 2014-09-05 20:40 UTC (permalink / raw) To: Miklos Szeredi Cc: Alexander Viro, Eric W. Biederman, Serge Hallyn, fuse-devel, linux-kernel, linux-fsdevel, seth.forshee On Tue, Sep 02, 2014 at 10:44:53AM -0500, Seth Forshee wrote: > Here's an updated set of patches for allowing fuse mounts from pid and > user namespaces. I discussed some of the issues we debated with the last > patch set (and a few others) with Eric at LinuxCon, and the updates here > mainly reflect the outcome of those discussions. > > The stickiest issue in the v1 patches was the question of where to get > the user and pid namespaces from that are used for translating ids for > communication with userspace. Eric told me that for user namespaces at > least we need to grab a namespace at open or mount time and use only > that namespace to prevent certain types of attacks. That rules out the > suggestion of using the user ns of current in the read/write paths, and > I think it makes sense to handle pid and user namespaces similarly. So > in these patches I'm still grabbing the namespaces of current during > mount, but I've added an additional check to fail the mount if the > f_cred's userns for the fd to userspace doesn't match. > > Another issue mentioned by Eric was what to use for i_[ug]id if the ids > from userspace don't map into the user namespace, which is going to be a > problem for any other filesystems which become mountable from user > namespaces as well. We discussed a few options for addressing this, the > most promising of which seems to be either using INVALID_[UG]ID for > these inodes or creating vfs-wide "nobody" ids for this purpose. After > thinking about it for a while I'm favoring using the invalid ids, but > I'm hoping to solicit some more feedback. > > For now these patches are using invalid ids if the user doesn't map into > the namespace. I went through the vfs code and found one place where > this could be handled better (addressed in patch 1 of the series). The > only other issue I found was that currently no one, not even root, can > change onwership of such inodes, but I suspect we can find a way around > this. > > The only other change since v1 is that I now fail changing file > ownership if the new uid or gid does not map into the namespace used for > userspace communication. I forgot that I did change one other thing. In v1 I didn't allow fuseblk mounts from user namespaces since I hadn't gotten around to testing or looking at the differences between it and normal fuse mounts yet. I've found time to do so since then and everything seems to be in good order, so I've enabled mounting fuseblk in user namespaces as well in the v2 patches. Thanks, Seth ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-02 15:44 [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces Seth Forshee ` (3 preceding siblings ...) 2014-09-05 20:40 ` [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces Seth Forshee @ 2014-09-10 12:35 ` Seth Forshee 2014-09-10 16:21 ` Serge E. Hallyn 2014-09-23 16:07 ` Miklos Szeredi 5 siblings, 1 reply; 39+ messages in thread From: Seth Forshee @ 2014-09-10 12:35 UTC (permalink / raw) To: Miklos Szeredi Cc: Alexander Viro, Eric W. Biederman, Serge Hallyn, fuse-devel, linux-kernel, linux-fsdevel On Tue, Sep 02, 2014 at 10:44:53AM -0500, Seth Forshee wrote: > Another issue mentioned by Eric was what to use for i_[ug]id if the ids > from userspace don't map into the user namespace, which is going to be a > problem for any other filesystems which become mountable from user > namespaces as well. We discussed a few options for addressing this, the > most promising of which seems to be either using INVALID_[UG]ID for > these inodes or creating vfs-wide "nobody" ids for this purpose. After > thinking about it for a while I'm favoring using the invalid ids, but > I'm hoping to solicit some more feedback. > > For now these patches are using invalid ids if the user doesn't map into > the namespace. I went through the vfs code and found one place where > this could be handled better (addressed in patch 1 of the series). The > only other issue I found was that currently no one, not even root, can > change onwership of such inodes, but I suspect we can find a way around > this. I started playing around with using -2 as a global nobody id. The changes below (made on top of this series) are working fine in light testing. I'm still not sure about the security implications of doing this though. Possibly the nobody id should be removed from init_user_ns to prevent any use of the id to gain unauthroized access to such files. Thoughts? diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index c0b9968db6a1..893fc0d6ff96 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -905,8 +905,8 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr, stat->ino = attr->ino; stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777); stat->nlink = attr->nlink; - stat->uid = make_kuid(fc->user_ns, attr->uid); - stat->gid = make_kgid(fc->user_ns, attr->gid); + stat->uid = make_kuid_munged(fc->user_ns, attr->uid); + stat->gid = make_kgid_munged(fc->user_ns, attr->gid); stat->rdev = inode->i_rdev; stat->atime.tv_sec = attr->atime; stat->atime.tv_nsec = attr->atimensec; diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index f3a3ded82f85..330ac3d502a6 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -167,8 +167,8 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr, inode->i_ino = fuse_squash_ino(attr->ino); inode->i_mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777); set_nlink(inode, attr->nlink); - inode->i_uid = make_kuid(fc->user_ns, attr->uid); - inode->i_gid = make_kgid(fc->user_ns, attr->gid); + inode->i_uid = make_kuid_munged(fc->user_ns, attr->uid); + inode->i_gid = make_kgid_munged(fc->user_ns, attr->gid); inode->i_blocks = attr->blocks; inode->i_atime.tv_sec = attr->atime; inode->i_atime.tv_nsec = attr->atimensec; diff --git a/include/linux/uidgid.h b/include/linux/uidgid.h index 6c302267f7cc..9054273af163 100644 --- a/include/linux/uidgid.h +++ b/include/linux/uidgid.h @@ -45,6 +45,9 @@ static inline gid_t __kgid_val(kgid_t gid) #define INVALID_UID KUIDT_INIT(-1) #define INVALID_GID KGIDT_INIT(-1) +#define NOBODY_UID KUIDT_INIT(-2) +#define NOBODY_GID KGIDT_INIT(-2) + static inline bool uid_eq(kuid_t left, kuid_t right) { return __kuid_val(left) == __kuid_val(right); @@ -175,4 +178,44 @@ static inline bool kgid_has_mapping(struct user_namespace *ns, kgid_t gid) #endif /* CONFIG_USER_NS */ +/** + * make_kuid_munged - Map a user-namespace uid pair into a kuid + * @from: User namespace that the uid is in + * @uid: User identifier + * + * Maps a user-namespace uid pair into a kernel internal kuid, + * and returns that kuid. + * + * Unlike make_kuid, make_kuid_munged never fails and always + * returns a valid uid. If @uid has no mapping in @from + * NOBODY_UID is returned. + */ +static inline kuid_t make_kuid_munged(struct user_namespace *from, uid_t uid) +{ + kuid_t kuid = make_kuid(from, uid); + if (!uid_valid(kuid)) + kuid = NOBODY_UID; + return kuid; +} + +/** + * make_kgid_munged - Map a user-namespace gid pair into a kgid + * @from: User namespace that the gid is in + * @gid: User identifier + * + * Maps a user-namespace gid pair into a kernel internal kgid, + * and returns that kgid. + * + * Unlike make_kgid, make_kgid_munged never fails and always + * returns a valid gid. If @gid has no mapping in @from + * NOBODY_GID is returned. + */ +static inline kgid_t make_kgid_munged(struct user_namespace *from, gid_t gid) +{ + kgid_t kgid = make_kgid(from, gid); + if (!gid_valid(kgid)) + kgid = NOBODY_GID; + return kgid; +} + #endif /* _LINUX_UIDGID_H */ ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-10 12:35 ` Seth Forshee @ 2014-09-10 16:21 ` Serge E. Hallyn 2014-09-10 16:42 ` Seth Forshee 0 siblings, 1 reply; 39+ messages in thread From: Serge E. Hallyn @ 2014-09-10 16:21 UTC (permalink / raw) To: Miklos Szeredi, Alexander Viro, Eric W. Biederman, Serge Hallyn, fuse-devel, linux-kernel, linux-fsdevel Quoting Seth Forshee (seth.forshee@canonical.com): > On Tue, Sep 02, 2014 at 10:44:53AM -0500, Seth Forshee wrote: > > Another issue mentioned by Eric was what to use for i_[ug]id if the ids > > from userspace don't map into the user namespace, which is going to be a > > problem for any other filesystems which become mountable from user > > namespaces as well. We discussed a few options for addressing this, the > > most promising of which seems to be either using INVALID_[UG]ID for > > these inodes or creating vfs-wide "nobody" ids for this purpose. After > > thinking about it for a while I'm favoring using the invalid ids, but > > I'm hoping to solicit some more feedback. > > > > For now these patches are using invalid ids if the user doesn't map into > > the namespace. I went through the vfs code and found one place where > > this could be handled better (addressed in patch 1 of the series). The > > only other issue I found was that currently no one, not even root, can > > change onwership of such inodes, but I suspect we can find a way around > > this. > > I started playing around with using -2 as a global nobody id. The > changes below (made on top of this series) are working fine in light > testing. I'm still not sure about the security implications of doing > this though. Possibly the nobody id should be removed from init_user_ns > to prevent any use of the id to gain unauthroized access to such files. > Thoughts? Can you explain the downsides of just using -1? What are we able to do (as a fuse-in-container user) when we use -2 that we can't do when it uses -1? > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c > index c0b9968db6a1..893fc0d6ff96 100644 > --- a/fs/fuse/dir.c > +++ b/fs/fuse/dir.c > @@ -905,8 +905,8 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr, > stat->ino = attr->ino; > stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777); > stat->nlink = attr->nlink; > - stat->uid = make_kuid(fc->user_ns, attr->uid); > - stat->gid = make_kgid(fc->user_ns, attr->gid); > + stat->uid = make_kuid_munged(fc->user_ns, attr->uid); > + stat->gid = make_kgid_munged(fc->user_ns, attr->gid); > stat->rdev = inode->i_rdev; > stat->atime.tv_sec = attr->atime; > stat->atime.tv_nsec = attr->atimensec; > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c > index f3a3ded82f85..330ac3d502a6 100644 > --- a/fs/fuse/inode.c > +++ b/fs/fuse/inode.c > @@ -167,8 +167,8 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr, > inode->i_ino = fuse_squash_ino(attr->ino); > inode->i_mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777); > set_nlink(inode, attr->nlink); > - inode->i_uid = make_kuid(fc->user_ns, attr->uid); > - inode->i_gid = make_kgid(fc->user_ns, attr->gid); > + inode->i_uid = make_kuid_munged(fc->user_ns, attr->uid); > + inode->i_gid = make_kgid_munged(fc->user_ns, attr->gid); > inode->i_blocks = attr->blocks; > inode->i_atime.tv_sec = attr->atime; > inode->i_atime.tv_nsec = attr->atimensec; > diff --git a/include/linux/uidgid.h b/include/linux/uidgid.h > index 6c302267f7cc..9054273af163 100644 > --- a/include/linux/uidgid.h > +++ b/include/linux/uidgid.h > @@ -45,6 +45,9 @@ static inline gid_t __kgid_val(kgid_t gid) > #define INVALID_UID KUIDT_INIT(-1) > #define INVALID_GID KGIDT_INIT(-1) > > +#define NOBODY_UID KUIDT_INIT(-2) > +#define NOBODY_GID KGIDT_INIT(-2) > + > static inline bool uid_eq(kuid_t left, kuid_t right) > { > return __kuid_val(left) == __kuid_val(right); > @@ -175,4 +178,44 @@ static inline bool kgid_has_mapping(struct user_namespace *ns, kgid_t gid) > > #endif /* CONFIG_USER_NS */ > > +/** > + * make_kuid_munged - Map a user-namespace uid pair into a kuid > + * @from: User namespace that the uid is in > + * @uid: User identifier > + * > + * Maps a user-namespace uid pair into a kernel internal kuid, > + * and returns that kuid. > + * > + * Unlike make_kuid, make_kuid_munged never fails and always > + * returns a valid uid. If @uid has no mapping in @from > + * NOBODY_UID is returned. > + */ > +static inline kuid_t make_kuid_munged(struct user_namespace *from, uid_t uid) > +{ > + kuid_t kuid = make_kuid(from, uid); > + if (!uid_valid(kuid)) > + kuid = NOBODY_UID; > + return kuid; > +} > + > +/** > + * make_kgid_munged - Map a user-namespace gid pair into a kgid > + * @from: User namespace that the gid is in > + * @gid: User identifier > + * > + * Maps a user-namespace gid pair into a kernel internal kgid, > + * and returns that kgid. > + * > + * Unlike make_kgid, make_kgid_munged never fails and always > + * returns a valid gid. If @gid has no mapping in @from > + * NOBODY_GID is returned. > + */ > +static inline kgid_t make_kgid_munged(struct user_namespace *from, gid_t gid) > +{ > + kgid_t kgid = make_kgid(from, gid); > + if (!gid_valid(kgid)) > + kgid = NOBODY_GID; > + return kgid; > +} > + > #endif /* _LINUX_UIDGID_H */ > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-10 16:21 ` Serge E. Hallyn @ 2014-09-10 16:42 ` Seth Forshee 2014-09-11 18:10 ` Seth Forshee 0 siblings, 1 reply; 39+ messages in thread From: Seth Forshee @ 2014-09-10 16:42 UTC (permalink / raw) To: Serge E. Hallyn Cc: Miklos Szeredi, Alexander Viro, Eric W. Biederman, Serge Hallyn, fuse-devel, linux-kernel, linux-fsdevel On Wed, Sep 10, 2014 at 06:21:55PM +0200, Serge E. Hallyn wrote: > Quoting Seth Forshee (seth.forshee@canonical.com): > > On Tue, Sep 02, 2014 at 10:44:53AM -0500, Seth Forshee wrote: > > > Another issue mentioned by Eric was what to use for i_[ug]id if the ids > > > from userspace don't map into the user namespace, which is going to be a > > > problem for any other filesystems which become mountable from user > > > namespaces as well. We discussed a few options for addressing this, the > > > most promising of which seems to be either using INVALID_[UG]ID for > > > these inodes or creating vfs-wide "nobody" ids for this purpose. After > > > thinking about it for a while I'm favoring using the invalid ids, but > > > I'm hoping to solicit some more feedback. > > > > > > For now these patches are using invalid ids if the user doesn't map into > > > the namespace. I went through the vfs code and found one place where > > > this could be handled better (addressed in patch 1 of the series). The > > > only other issue I found was that currently no one, not even root, can > > > change onwership of such inodes, but I suspect we can find a way around > > > this. > > > > I started playing around with using -2 as a global nobody id. The > > changes below (made on top of this series) are working fine in light > > testing. I'm still not sure about the security implications of doing > > this though. Possibly the nobody id should be removed from init_user_ns > > to prevent any use of the id to gain unauthroized access to such files. > > Thoughts? > > Can you explain the downsides of just using -1? What are we able to do > (as a fuse-in-container user) when we use -2 that we can't do when it > uses -1? The thing that happens with -1 is that checks like capable_wrt_inode_uidgid() always fail on those inodes because INVALID_UID isn't ever mapped into any namespace, even if you're system-wide root. If we use a real id this check would at least pass in init_user_ns, assuming that we don't have to remove -2 from init_user_ns for security reasons. Or we could modify these checks so that CAP_SYS_ADMIN always gets permission or something like that, which might be the better way to go. > > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c > > index c0b9968db6a1..893fc0d6ff96 100644 > > --- a/fs/fuse/dir.c > > +++ b/fs/fuse/dir.c > > @@ -905,8 +905,8 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr, > > stat->ino = attr->ino; > > stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777); > > stat->nlink = attr->nlink; > > - stat->uid = make_kuid(fc->user_ns, attr->uid); > > - stat->gid = make_kgid(fc->user_ns, attr->gid); > > + stat->uid = make_kuid_munged(fc->user_ns, attr->uid); > > + stat->gid = make_kgid_munged(fc->user_ns, attr->gid); > > stat->rdev = inode->i_rdev; > > stat->atime.tv_sec = attr->atime; > > stat->atime.tv_nsec = attr->atimensec; > > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c > > index f3a3ded82f85..330ac3d502a6 100644 > > --- a/fs/fuse/inode.c > > +++ b/fs/fuse/inode.c > > @@ -167,8 +167,8 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr, > > inode->i_ino = fuse_squash_ino(attr->ino); > > inode->i_mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777); > > set_nlink(inode, attr->nlink); > > - inode->i_uid = make_kuid(fc->user_ns, attr->uid); > > - inode->i_gid = make_kgid(fc->user_ns, attr->gid); > > + inode->i_uid = make_kuid_munged(fc->user_ns, attr->uid); > > + inode->i_gid = make_kgid_munged(fc->user_ns, attr->gid); > > inode->i_blocks = attr->blocks; > > inode->i_atime.tv_sec = attr->atime; > > inode->i_atime.tv_nsec = attr->atimensec; > > diff --git a/include/linux/uidgid.h b/include/linux/uidgid.h > > index 6c302267f7cc..9054273af163 100644 > > --- a/include/linux/uidgid.h > > +++ b/include/linux/uidgid.h > > @@ -45,6 +45,9 @@ static inline gid_t __kgid_val(kgid_t gid) > > #define INVALID_UID KUIDT_INIT(-1) > > #define INVALID_GID KGIDT_INIT(-1) > > > > +#define NOBODY_UID KUIDT_INIT(-2) > > +#define NOBODY_GID KGIDT_INIT(-2) > > + > > static inline bool uid_eq(kuid_t left, kuid_t right) > > { > > return __kuid_val(left) == __kuid_val(right); > > @@ -175,4 +178,44 @@ static inline bool kgid_has_mapping(struct user_namespace *ns, kgid_t gid) > > > > #endif /* CONFIG_USER_NS */ > > > > +/** > > + * make_kuid_munged - Map a user-namespace uid pair into a kuid > > + * @from: User namespace that the uid is in > > + * @uid: User identifier > > + * > > + * Maps a user-namespace uid pair into a kernel internal kuid, > > + * and returns that kuid. > > + * > > + * Unlike make_kuid, make_kuid_munged never fails and always > > + * returns a valid uid. If @uid has no mapping in @from > > + * NOBODY_UID is returned. > > + */ > > +static inline kuid_t make_kuid_munged(struct user_namespace *from, uid_t uid) > > +{ > > + kuid_t kuid = make_kuid(from, uid); > > + if (!uid_valid(kuid)) > > + kuid = NOBODY_UID; > > + return kuid; > > +} > > + > > +/** > > + * make_kgid_munged - Map a user-namespace gid pair into a kgid > > + * @from: User namespace that the gid is in > > + * @gid: User identifier > > + * > > + * Maps a user-namespace gid pair into a kernel internal kgid, > > + * and returns that kgid. > > + * > > + * Unlike make_kgid, make_kgid_munged never fails and always > > + * returns a valid gid. If @gid has no mapping in @from > > + * NOBODY_GID is returned. > > + */ > > +static inline kgid_t make_kgid_munged(struct user_namespace *from, gid_t gid) > > +{ > > + kgid_t kgid = make_kgid(from, gid); > > + if (!gid_valid(kgid)) > > + kgid = NOBODY_GID; > > + return kgid; > > +} > > + > > #endif /* _LINUX_UIDGID_H */ > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > Please read the FAQ at http://www.tux.org/lkml/ > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-10 16:42 ` Seth Forshee @ 2014-09-11 18:10 ` Seth Forshee 2014-09-23 22:29 ` Eric W. Biederman 0 siblings, 1 reply; 39+ messages in thread From: Seth Forshee @ 2014-09-11 18:10 UTC (permalink / raw) To: Serge E. Hallyn Cc: Miklos Szeredi, Alexander Viro, Eric W. Biederman, Serge Hallyn, fuse-devel, linux-kernel, linux-fsdevel On Wed, Sep 10, 2014 at 11:42:12AM -0500, Seth Forshee wrote: > On Wed, Sep 10, 2014 at 06:21:55PM +0200, Serge E. Hallyn wrote: > > Quoting Seth Forshee (seth.forshee@canonical.com): > > > On Tue, Sep 02, 2014 at 10:44:53AM -0500, Seth Forshee wrote: > > > > Another issue mentioned by Eric was what to use for i_[ug]id if the ids > > > > from userspace don't map into the user namespace, which is going to be a > > > > problem for any other filesystems which become mountable from user > > > > namespaces as well. We discussed a few options for addressing this, the > > > > most promising of which seems to be either using INVALID_[UG]ID for > > > > these inodes or creating vfs-wide "nobody" ids for this purpose. After > > > > thinking about it for a while I'm favoring using the invalid ids, but > > > > I'm hoping to solicit some more feedback. > > > > > > > > For now these patches are using invalid ids if the user doesn't map into > > > > the namespace. I went through the vfs code and found one place where > > > > this could be handled better (addressed in patch 1 of the series). The > > > > only other issue I found was that currently no one, not even root, can > > > > change onwership of such inodes, but I suspect we can find a way around > > > > this. > > > > > > I started playing around with using -2 as a global nobody id. The > > > changes below (made on top of this series) are working fine in light > > > testing. I'm still not sure about the security implications of doing > > > this though. Possibly the nobody id should be removed from init_user_ns > > > to prevent any use of the id to gain unauthroized access to such files. > > > Thoughts? > > > > Can you explain the downsides of just using -1? What are we able to do > > (as a fuse-in-container user) when we use -2 that we can't do when it > > uses -1? > > The thing that happens with -1 is that checks like > capable_wrt_inode_uidgid() always fail on those inodes because > INVALID_UID isn't ever mapped into any namespace, even if you're > system-wide root. If we use a real id this check would at least pass in > init_user_ns, assuming that we don't have to remove -2 from init_user_ns > for security reasons. > > Or we could modify these checks so that CAP_SYS_ADMIN always gets > permission or something like that, which might be the better way to go. This ought to do (untested as of yet). I think I like this best, but I'm also interested in hearing what Eric has to say. diff --git a/fs/inode.c b/fs/inode.c index 26753ba7b6d6..1029320ff029 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -1840,6 +1840,9 @@ bool inode_owner_or_capable(const struct inode *inode) { struct user_namespace *ns; + if (capable(CAP_SYS_ADMIN)) + return true; + if (uid_eq(current_fsuid(), inode->i_uid)) return true; diff --git a/kernel/capability.c b/kernel/capability.c index 989f5bfc57dc..a472eaa52b6a 100644 --- a/kernel/capability.c +++ b/kernel/capability.c @@ -438,8 +438,12 @@ EXPORT_SYMBOL(capable); */ bool capable_wrt_inode_uidgid(const struct inode *inode, int cap) { - struct user_namespace *ns = current_user_ns(); + struct user_namespace *ns; + if (capable(CAP_SYS_ADMIN)) + return true; + + ns = current_user_ns(); return ns_capable(ns, cap) && kuid_has_mapping(ns, inode->i_uid) && kgid_has_mapping(ns, inode->i_gid); } ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-11 18:10 ` Seth Forshee @ 2014-09-23 22:29 ` Eric W. Biederman 2014-09-24 13:29 ` Seth Forshee 0 siblings, 1 reply; 39+ messages in thread From: Eric W. Biederman @ 2014-09-23 22:29 UTC (permalink / raw) To: Serge E. Hallyn Cc: Miklos Szeredi, Alexander Viro, Serge Hallyn, fuse-devel, linux-kernel, linux-fsdevel Seth Forshee <seth.forshee@canonical.com> writes: > On Wed, Sep 10, 2014 at 11:42:12AM -0500, Seth Forshee wrote: >> On Wed, Sep 10, 2014 at 06:21:55PM +0200, Serge E. Hallyn wrote: >> > Quoting Seth Forshee (seth.forshee@canonical.com): >> > > On Tue, Sep 02, 2014 at 10:44:53AM -0500, Seth Forshee wrote: >> > > > Another issue mentioned by Eric was what to use for i_[ug]id if the ids >> > > > from userspace don't map into the user namespace, which is going to be a >> > > > problem for any other filesystems which become mountable from user >> > > > namespaces as well. We discussed a few options for addressing this, the >> > > > most promising of which seems to be either using INVALID_[UG]ID for >> > > > these inodes or creating vfs-wide "nobody" ids for this purpose. After >> > > > thinking about it for a while I'm favoring using the invalid ids, but >> > > > I'm hoping to solicit some more feedback. >> > > > >> > > > For now these patches are using invalid ids if the user doesn't map into >> > > > the namespace. I went through the vfs code and found one place where >> > > > this could be handled better (addressed in patch 1 of the series). The >> > > > only other issue I found was that currently no one, not even root, can >> > > > change onwership of such inodes, but I suspect we can find a way around >> > > > this. >> > > >> > > I started playing around with using -2 as a global nobody id. The >> > > changes below (made on top of this series) are working fine in light >> > > testing. I'm still not sure about the security implications of doing >> > > this though. Possibly the nobody id should be removed from init_user_ns >> > > to prevent any use of the id to gain unauthroized access to such files. >> > > Thoughts? >> > >> > Can you explain the downsides of just using -1? What are we able to do >> > (as a fuse-in-container user) when we use -2 that we can't do when it >> > uses -1? >> >> The thing that happens with -1 is that checks like >> capable_wrt_inode_uidgid() always fail on those inodes because >> INVALID_UID isn't ever mapped into any namespace, even if you're >> system-wide root. If we use a real id this check would at least pass in >> init_user_ns, assuming that we don't have to remove -2 from init_user_ns >> for security reasons. >> >> Or we could modify these checks so that CAP_SYS_ADMIN always gets >> permission or something like that, which might be the better way to go. > > This ought to do (untested as of yet). I think I like this best, but I'm > also interested in hearing what Eric has to say. So thinking about this and staring at fuse a little more I don't like the approach of mapping bad uids into INVALID_UID in the case of fuse. What scares me is that we are looking at a very uncommon case that no one will test much. And depending for security on a very subtle interpretation of semantics that only matter when someone is attacking us seems scary to me. For fuse at least I would assume that any time a file shows up with a uid that doesn't map, and we map it to INVALID_UID I would assume it is the work of a hostile user. It could be a misconfiguration but it is a broken action on the part of the filesystem in either case. Therefore given that a bad uid can only occur as a result of accidental or hostile action can we please call make_bad_inode in those cases? And thus always return -EIO. That way no one needs to consider what happens if we cache bad data or we try to use bad data, and it is an easy position to retreat from if it happens that we need to do something different. Eric ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-23 22:29 ` Eric W. Biederman @ 2014-09-24 13:29 ` Seth Forshee 2014-09-24 17:10 ` Eric W. Biederman 0 siblings, 1 reply; 39+ messages in thread From: Seth Forshee @ 2014-09-24 13:29 UTC (permalink / raw) To: Eric W. Biederman Cc: Serge E. Hallyn, Miklos Szeredi, Alexander Viro, Serge Hallyn, fuse-devel, linux-kernel, linux-fsdevel On Tue, Sep 23, 2014 at 03:29:57PM -0700, Eric W. Biederman wrote: > Seth Forshee <seth.forshee@canonical.com> writes: > > > On Wed, Sep 10, 2014 at 11:42:12AM -0500, Seth Forshee wrote: > >> On Wed, Sep 10, 2014 at 06:21:55PM +0200, Serge E. Hallyn wrote: > >> > Quoting Seth Forshee (seth.forshee@canonical.com): > >> > > On Tue, Sep 02, 2014 at 10:44:53AM -0500, Seth Forshee wrote: > >> > > > Another issue mentioned by Eric was what to use for i_[ug]id if the ids > >> > > > from userspace don't map into the user namespace, which is going to be a > >> > > > problem for any other filesystems which become mountable from user > >> > > > namespaces as well. We discussed a few options for addressing this, the > >> > > > most promising of which seems to be either using INVALID_[UG]ID for > >> > > > these inodes or creating vfs-wide "nobody" ids for this purpose. After > >> > > > thinking about it for a while I'm favoring using the invalid ids, but > >> > > > I'm hoping to solicit some more feedback. > >> > > > > >> > > > For now these patches are using invalid ids if the user doesn't map into > >> > > > the namespace. I went through the vfs code and found one place where > >> > > > this could be handled better (addressed in patch 1 of the series). The > >> > > > only other issue I found was that currently no one, not even root, can > >> > > > change onwership of such inodes, but I suspect we can find a way around > >> > > > this. > >> > > > >> > > I started playing around with using -2 as a global nobody id. The > >> > > changes below (made on top of this series) are working fine in light > >> > > testing. I'm still not sure about the security implications of doing > >> > > this though. Possibly the nobody id should be removed from init_user_ns > >> > > to prevent any use of the id to gain unauthroized access to such files. > >> > > Thoughts? > >> > > >> > Can you explain the downsides of just using -1? What are we able to do > >> > (as a fuse-in-container user) when we use -2 that we can't do when it > >> > uses -1? > >> > >> The thing that happens with -1 is that checks like > >> capable_wrt_inode_uidgid() always fail on those inodes because > >> INVALID_UID isn't ever mapped into any namespace, even if you're > >> system-wide root. If we use a real id this check would at least pass in > >> init_user_ns, assuming that we don't have to remove -2 from init_user_ns > >> for security reasons. > >> > >> Or we could modify these checks so that CAP_SYS_ADMIN always gets > >> permission or something like that, which might be the better way to go. > > > > This ought to do (untested as of yet). I think I like this best, but I'm > > also interested in hearing what Eric has to say. > > So thinking about this and staring at fuse a little more I don't like > the approach of mapping bad uids into INVALID_UID in the case of fuse. > > What scares me is that we are looking at a very uncommon case that no > one will test much. And depending for security on a very subtle > interpretation of semantics that only matter when someone is attacking > us seems scary to me. > > For fuse at least I would assume that any time a file shows up with a > uid that doesn't map, and we map it to INVALID_UID I would assume it is > the work of a hostile user. It could be a misconfiguration but it is a > broken action on the part of the filesystem in either case. > > Therefore given that a bad uid can only occur as a result of accidental > or hostile action can we please call make_bad_inode in those cases? And > thus always return -EIO. > > That way no one needs to consider what happens if we cache bad data or > we try to use bad data, and it is an easy position to retreat from > if it happens that we need to do something different. One thing I don't understand is why you're qualifying these statements to be limited to fuse. Normal filesystems will have to deal with the same problem if/when they are made mountable from user namespaces; is this concern not valid there? Or would you advocate the same behavior for normal filesystems as well? Otherwise it seems like we're just putting off dealing with it for a while. I'd prefer for the mounts to be as useful as possible when the fs contains ids that don't map into the namespace, but it's also difficult to argue against taking the more restricted approach at first and loosening up if needed. So I guess I'm okay with returning -EIO to start out. Thanks, Seth ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-24 13:29 ` Seth Forshee @ 2014-09-24 17:10 ` Eric W. Biederman 2014-09-25 15:04 ` Miklos Szeredi 0 siblings, 1 reply; 39+ messages in thread From: Eric W. Biederman @ 2014-09-24 17:10 UTC (permalink / raw) To: Serge E. Hallyn Cc: Miklos Szeredi, Alexander Viro, Serge Hallyn, fuse-devel, linux-kernel, linux-fsdevel Seth Forshee <seth.forshee@canonical.com> writes: > On Tue, Sep 23, 2014 at 03:29:57PM -0700, Eric W. Biederman wrote: >> >> So thinking about this and staring at fuse a little more I don't like >> the approach of mapping bad uids into INVALID_UID in the case of fuse. >> >> What scares me is that we are looking at a very uncommon case that no >> one will test much. And depending for security on a very subtle >> interpretation of semantics that only matter when someone is attacking >> us seems scary to me. >> >> For fuse at least I would assume that any time a file shows up with a >> uid that doesn't map, and we map it to INVALID_UID I would assume it is >> the work of a hostile user. It could be a misconfiguration but it is a >> broken action on the part of the filesystem in either case. >> >> Therefore given that a bad uid can only occur as a result of accidental >> or hostile action can we please call make_bad_inode in those cases? And >> thus always return -EIO. >> >> That way no one needs to consider what happens if we cache bad data or >> we try to use bad data, and it is an easy position to retreat from >> if it happens that we need to do something different. > > One thing I don't understand is why you're qualifying these statements > to be limited to fuse. Normal filesystems will have to deal with the > same problem if/when they are made mountable from user namespaces; is > this concern not valid there? Or would you advocate the same behavior > for normal filesystems as well? Otherwise it seems like we're just > putting off dealing with it for a while. I am qualifying them with fuse so that my thinking is concrete. I believe my thinking generalizes to all filesystems, but I there may be cases I have not considered that are different. > I'd prefer for the mounts to be as useful as possible when the fs > contains ids that don't map into the namespace, but it's also difficult > to argue against taking the more restricted approach at first and > loosening up if needed. So I guess I'm okay with returning -EIO to start > out. What bugs me the most about making the mounts as useful as possible is that it makes an assumption about why the uids are corrupt. I look at the situtation and I make a different assumption that the most likely reasons for the uids to be corrupt is someone being malicious. Being as useful as possible when someone is being malicious feels like a recipe for the malicious user to exploit your generousity. So in summary I see: - Low utility in being able to manipulate files with bad uids. - Bad uids are mostly likely malicious action. - make_bad_inode is trivial to analyze. - No impediments to change if I am wrong. So unless there is a compelling case, right now I would recommend returning -EIO initially. That allows us to concentrate on the easier parts of this and it leaves the changes only in fuse. I have no problems revisiting this later and in fact I am half persuaded we should, but simultaneously dealing working on both fuse and the generic problem of how to handle files with bad uids and gids seems too much to analyze and digest all at once. Eric ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-24 17:10 ` Eric W. Biederman @ 2014-09-25 15:04 ` Miklos Szeredi 2014-09-25 16:21 ` Seth Forshee 2014-09-25 18:05 ` Eric W. Biederman 0 siblings, 2 replies; 39+ messages in thread From: Miklos Szeredi @ 2014-09-25 15:04 UTC (permalink / raw) To: Eric W. Biederman Cc: Serge E. Hallyn, Alexander Viro, Serge Hallyn, fuse-devel, Kernel Mailing List, Linux-Fsdevel On Wed, Sep 24, 2014 at 7:10 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > So in summary I see: > - Low utility in being able to manipulate files with bad uids. > - Bad uids are mostly likely malicious action. > - make_bad_inode is trivial to analyze. > - No impediments to change if I am wrong. > > So unless there is a compelling case, right now I would recommend > returning -EIO initially. That allows us to concentrate on the easier > parts of this and it leaves the changes only in fuse. The problem with marking the inode bad is that it will mark it bad for all instances of this filesystem. Including ones which are in a namespace where the UIDs make perfect sense. So that really doesn't look like a good solution. Doing the check in inode_permission() might be too heavyweight, but it's still the only one that looks sane. Thanks, Miklos ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-25 15:04 ` Miklos Szeredi @ 2014-09-25 16:21 ` Seth Forshee 2014-09-25 18:05 ` Eric W. Biederman 1 sibling, 0 replies; 39+ messages in thread From: Seth Forshee @ 2014-09-25 16:21 UTC (permalink / raw) To: Miklos Szeredi Cc: Eric W. Biederman, Serge E. Hallyn, Alexander Viro, Serge Hallyn, fuse-devel, Kernel Mailing List, Linux-Fsdevel On Thu, Sep 25, 2014 at 05:04:04PM +0200, Miklos Szeredi wrote: > On Wed, Sep 24, 2014 at 7:10 PM, Eric W. Biederman > <ebiederm@xmission.com> wrote: > > > > So in summary I see: > > - Low utility in being able to manipulate files with bad uids. > > - Bad uids are mostly likely malicious action. > > - make_bad_inode is trivial to analyze. > > - No impediments to change if I am wrong. > > > > So unless there is a compelling case, right now I would recommend > > returning -EIO initially. That allows us to concentrate on the easier > > parts of this and it leaves the changes only in fuse. > > The problem with marking the inode bad is that it will mark it bad for > all instances of this filesystem. Including ones which are in a > namespace where the UIDs make perfect sense. As far as I can tell there can only be one interpretation of the uid userspace hands us for an inode per superblock. If userspace hands us 1000 it can't map to kuid 101000 in one context and 1000 in another context because there's only one inode. If the uid doesn't map into whatever namespace we're using to interpret it there's no value for i_uid which will cause it to be re-evaluated in a different context. I don't think it would be a good idea to do that anyway. /proc is mounted in my unprivileged container (i.e. uid 0 in the container maps to some unprivileged uid outside the container). Interpreting uid 0 in /proc relative to my container's namespace would give root in the container access to things that it shouldn't have access to. Thanks, Seth ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-25 15:04 ` Miklos Szeredi 2014-09-25 16:21 ` Seth Forshee @ 2014-09-25 18:05 ` Eric W. Biederman 2014-09-25 18:44 ` Seth Forshee 1 sibling, 1 reply; 39+ messages in thread From: Eric W. Biederman @ 2014-09-25 18:05 UTC (permalink / raw) To: Miklos Szeredi Cc: Serge E. Hallyn, Alexander Viro, Serge Hallyn, fuse-devel, Kernel Mailing List, Linux-Fsdevel Miklos Szeredi <miklos@szeredi.hu> writes: > On Wed, Sep 24, 2014 at 7:10 PM, Eric W. Biederman > <ebiederm@xmission.com> wrote: > > >> So in summary I see: >> - Low utility in being able to manipulate files with bad uids. >> - Bad uids are mostly likely malicious action. >> - make_bad_inode is trivial to analyze. >> - No impediments to change if I am wrong. >> >> So unless there is a compelling case, right now I would recommend >> returning -EIO initially. That allows us to concentrate on the easier >> parts of this and it leaves the changes only in fuse. > > The problem with marking the inode bad is that it will mark it bad for > all instances of this filesystem. Including ones which are in a > namespace where the UIDs make perfect sense. There are two cases: app <-> fuse fuse <-> server I proposed mark_bad_inode for "userspace server -> fuse". Where we have one superblock and one server so and one namespace that they decide to talk in when the filesystem was mounted. I think bad_inode is a reasonable response when the filesystem server starts spewing non-sense. > So that really doesn't look like a good solution. > > Doing the check in inode_permission() might be too heavyweight, but > it's still the only one that looks sane. For the "app <-> fuse" case we already have checks in inode_permision that are kuid based that handle that case. We use kuids not for performance (although there is a small advatnage) but to much more to keep the logic simple and maintainable. For the "app -> fuse" case in .setattr we do need a check to verify that the uid and gid are valid. However that check was added with the basic user namespace support and fuse current returns -EOVERFLOW when that happens. Eric ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-25 18:05 ` Eric W. Biederman @ 2014-09-25 18:44 ` Seth Forshee 2014-09-25 18:53 ` Seth Forshee 2014-09-25 19:14 ` Eric W. Biederman 0 siblings, 2 replies; 39+ messages in thread From: Seth Forshee @ 2014-09-25 18:44 UTC (permalink / raw) To: Eric W. Biederman Cc: Miklos Szeredi, Serge E. Hallyn, Alexander Viro, Serge Hallyn, fuse-devel, Kernel Mailing List, Linux-Fsdevel On Thu, Sep 25, 2014 at 11:05:36AM -0700, Eric W. Biederman wrote: > Miklos Szeredi <miklos@szeredi.hu> writes: > > > On Wed, Sep 24, 2014 at 7:10 PM, Eric W. Biederman > > <ebiederm@xmission.com> wrote: > > > > > >> So in summary I see: > >> - Low utility in being able to manipulate files with bad uids. > >> - Bad uids are mostly likely malicious action. > >> - make_bad_inode is trivial to analyze. > >> - No impediments to change if I am wrong. > >> > >> So unless there is a compelling case, right now I would recommend > >> returning -EIO initially. That allows us to concentrate on the easier > >> parts of this and it leaves the changes only in fuse. > > > > The problem with marking the inode bad is that it will mark it bad for > > all instances of this filesystem. Including ones which are in a > > namespace where the UIDs make perfect sense. > > There are two cases: > app <-> fuse > fuse <-> server > > I proposed mark_bad_inode for "userspace server -> fuse". > Where we have one superblock and one server so and one namespace that > they decide to talk in when the filesystem was mounted. > > I think bad_inode is a reasonable response when the filesystem server > starts spewing non-sense. > > > So that really doesn't look like a good solution. > > > > Doing the check in inode_permission() might be too heavyweight, but > > it's still the only one that looks sane. > > For the "app <-> fuse" case we already have checks in inode_permision > that are kuid based that handle that case. We use kuids not for > performance (although there is a small advatnage) but to much more to > keep the logic simple and maintainable. > > > For the "app -> fuse" case in .setattr we do need a check to verify > that the uid and gid are valid. However that check was added with > the basic user namespace support and fuse current returns -EOVERFLOW > when that happens. Where does this happen? I haven't managed to track it down yet. I've also added a check in fuse for this. If a uid/gid passed to fuse_setattr doesn't map into the namespace it will return -EINVAL. Sounds like maybe it should return -EOVERFLOW instead. Thanks, Seth ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-25 18:44 ` Seth Forshee @ 2014-09-25 18:53 ` Seth Forshee 2014-09-25 19:14 ` Eric W. Biederman 1 sibling, 0 replies; 39+ messages in thread From: Seth Forshee @ 2014-09-25 18:53 UTC (permalink / raw) To: Eric W. Biederman, Miklos Szeredi, Serge E. Hallyn, Alexander Viro, Serge Hallyn, fuse-devel, Kernel Mailing List, Linux-Fsdevel On Thu, Sep 25, 2014 at 01:44:03PM -0500, Seth Forshee wrote: > On Thu, Sep 25, 2014 at 11:05:36AM -0700, Eric W. Biederman wrote: > > Miklos Szeredi <miklos@szeredi.hu> writes: > > > > > On Wed, Sep 24, 2014 at 7:10 PM, Eric W. Biederman > > > <ebiederm@xmission.com> wrote: > > > > > > > > >> So in summary I see: > > >> - Low utility in being able to manipulate files with bad uids. > > >> - Bad uids are mostly likely malicious action. > > >> - make_bad_inode is trivial to analyze. > > >> - No impediments to change if I am wrong. > > >> > > >> So unless there is a compelling case, right now I would recommend > > >> returning -EIO initially. That allows us to concentrate on the easier > > >> parts of this and it leaves the changes only in fuse. > > > > > > The problem with marking the inode bad is that it will mark it bad for > > > all instances of this filesystem. Including ones which are in a > > > namespace where the UIDs make perfect sense. > > > > There are two cases: > > app <-> fuse > > fuse <-> server > > > > I proposed mark_bad_inode for "userspace server -> fuse". > > Where we have one superblock and one server so and one namespace that > > they decide to talk in when the filesystem was mounted. > > > > I think bad_inode is a reasonable response when the filesystem server > > starts spewing non-sense. > > > > > So that really doesn't look like a good solution. > > > > > > Doing the check in inode_permission() might be too heavyweight, but > > > it's still the only one that looks sane. > > > > For the "app <-> fuse" case we already have checks in inode_permision > > that are kuid based that handle that case. We use kuids not for > > performance (although there is a small advatnage) but to much more to > > keep the logic simple and maintainable. > > > > > > For the "app -> fuse" case in .setattr we do need a check to verify > > that the uid and gid are valid. However that check was added with > > the basic user namespace support and fuse current returns -EOVERFLOW > > when that happens. > > Where does this happen? I haven't managed to track it down yet. I guess it must be the one in chown_common()? Except that returns EINVAL, not EOVERFLOW. > > I've also added a check in fuse for this. If a uid/gid passed to > fuse_setattr doesn't map into the namespace it will return -EINVAL. > Sounds like maybe it should return -EOVERFLOW instead. > > Thanks, > Seth > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-25 18:44 ` Seth Forshee 2014-09-25 18:53 ` Seth Forshee @ 2014-09-25 19:14 ` Eric W. Biederman 2014-09-25 19:48 ` Seth Forshee 1 sibling, 1 reply; 39+ messages in thread From: Eric W. Biederman @ 2014-09-25 19:14 UTC (permalink / raw) To: Seth Forshee Cc: Serge E. Hallyn, Alexander Viro, Serge Hallyn, fuse-devel, Kernel Mailing List, Linux-Fsdevel, Miklos Szeredi Seth Forshee <seth.forshee@canonical.com> writes: > On Thu, Sep 25, 2014 at 11:05:36AM -0700, Eric W. Biederman wrote: >> Miklos Szeredi <miklos@szeredi.hu> writes: >> >> > On Wed, Sep 24, 2014 at 7:10 PM, Eric W. Biederman >> > <ebiederm@xmission.com> wrote: >> > >> > >> >> So in summary I see: >> >> - Low utility in being able to manipulate files with bad uids. >> >> - Bad uids are mostly likely malicious action. >> >> - make_bad_inode is trivial to analyze. >> >> - No impediments to change if I am wrong. >> >> >> >> So unless there is a compelling case, right now I would recommend >> >> returning -EIO initially. That allows us to concentrate on the easier >> >> parts of this and it leaves the changes only in fuse. >> > >> > The problem with marking the inode bad is that it will mark it bad for >> > all instances of this filesystem. Including ones which are in a >> > namespace where the UIDs make perfect sense. >> >> There are two cases: >> app <-> fuse >> fuse <-> server >> >> I proposed mark_bad_inode for "userspace server -> fuse". >> Where we have one superblock and one server so and one namespace that >> they decide to talk in when the filesystem was mounted. >> >> I think bad_inode is a reasonable response when the filesystem server >> starts spewing non-sense. >> >> > So that really doesn't look like a good solution. >> > >> > Doing the check in inode_permission() might be too heavyweight, but >> > it's still the only one that looks sane. >> >> For the "app <-> fuse" case we already have checks in inode_permision >> that are kuid based that handle that case. We use kuids not for >> performance (although there is a small advatnage) but to much more to >> keep the logic simple and maintainable. >> >> >> For the "app -> fuse" case in .setattr we do need a check to verify >> that the uid and gid are valid. However that check was added with >> the basic user namespace support and fuse current returns -EOVERFLOW >> when that happens. > > Where does this happen? I haven't managed to track it down yet. Sorry iattr_to_setattr look for from_kuid and from_kgid. The call path is fuse_setattr fuse_do_setattr iattr_to_fattr. Keeping everything in kuids for as long as possible and only converting at the edges tends to mean that I caught most of the conversion issues with my basic support for user namespaces. > I've also added a check in fuse for this. If a uid/gid passed to > fuse_setattr doesn't map into the namespace it will return -EINVAL. > Sounds like maybe it should return -EOVERFLOW instead. I am not 100% about -EOVERLFLOW but it is the closest I could come up with. Frankly looking at what I have coded I am inconsistent. In chown_common I use -EINVAL, whereas in fuse I use -EOVERFLOW. So it is probably worth a second look, and probably a patch to a man page or two just to document this weird case. One document that has advised some of my decisions is the rational for how 64bit file offset support was added on 32bit systems ages ago. That is where I got -EOVERFLOW. The logic for handling 64bit file offsets was ultimately that any operation that could cause corruption would return an error (typically EOVERFLOW). Most of the vfs operations with uids are unlikely to cause corruption and are more likely to be problematic if they don't work which is why I tend to use overflow_uid/-1. For when uid/gid values don't map into a kuid/kgid value. Some error is definitely required. I am on the fence about what to do when a uid from the filesystem server or for other filesystems the on-disk data structures does not map, but make_bad_inode is simpler in conception. So make_bad_inode seems like a good place to start. For fuse especially this isn't hard because the make_bad_inode calls are already there to handle a change in i_mode. Eric ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-25 19:14 ` Eric W. Biederman @ 2014-09-25 19:48 ` Seth Forshee 2014-09-27 1:41 ` Eric W. Biederman 0 siblings, 1 reply; 39+ messages in thread From: Seth Forshee @ 2014-09-25 19:48 UTC (permalink / raw) To: Eric W. Biederman Cc: Serge E. Hallyn, Alexander Viro, Serge Hallyn, fuse-devel, Kernel Mailing List, Linux-Fsdevel, Miklos Szeredi On Thu, Sep 25, 2014 at 12:14:01PM -0700, Eric W. Biederman wrote: > Seth Forshee <seth.forshee@canonical.com> writes: > > > On Thu, Sep 25, 2014 at 11:05:36AM -0700, Eric W. Biederman wrote: > >> Miklos Szeredi <miklos@szeredi.hu> writes: > >> > >> > On Wed, Sep 24, 2014 at 7:10 PM, Eric W. Biederman > >> > <ebiederm@xmission.com> wrote: > >> > > >> > > >> >> So in summary I see: > >> >> - Low utility in being able to manipulate files with bad uids. > >> >> - Bad uids are mostly likely malicious action. > >> >> - make_bad_inode is trivial to analyze. > >> >> - No impediments to change if I am wrong. > >> >> > >> >> So unless there is a compelling case, right now I would recommend > >> >> returning -EIO initially. That allows us to concentrate on the easier > >> >> parts of this and it leaves the changes only in fuse. > >> > > >> > The problem with marking the inode bad is that it will mark it bad for > >> > all instances of this filesystem. Including ones which are in a > >> > namespace where the UIDs make perfect sense. > >> > >> There are two cases: > >> app <-> fuse > >> fuse <-> server > >> > >> I proposed mark_bad_inode for "userspace server -> fuse". > >> Where we have one superblock and one server so and one namespace that > >> they decide to talk in when the filesystem was mounted. > >> > >> I think bad_inode is a reasonable response when the filesystem server > >> starts spewing non-sense. > >> > >> > So that really doesn't look like a good solution. > >> > > >> > Doing the check in inode_permission() might be too heavyweight, but > >> > it's still the only one that looks sane. > >> > >> For the "app <-> fuse" case we already have checks in inode_permision > >> that are kuid based that handle that case. We use kuids not for > >> performance (although there is a small advatnage) but to much more to > >> keep the logic simple and maintainable. > >> > >> > >> For the "app -> fuse" case in .setattr we do need a check to verify > >> that the uid and gid are valid. However that check was added with > >> the basic user namespace support and fuse current returns -EOVERFLOW > >> when that happens. > > > > Where does this happen? I haven't managed to track it down yet. > > Sorry iattr_to_setattr look for from_kuid and from_kgid. > > The call path is > fuse_setattr > fuse_do_setattr > iattr_to_fattr. Bah. Sorry, I misread that originally and thought you were talking about something outside of fuse. And I was looking at a tree with my fuse changes, so of course I wouldn't have found it. Actually in 3.17-rc6 I still don't see that iattr_to_fattr (I assume this is what you meant) checks the results of the conversion (not that it really needs to since it uses init_user_ns), nor any use of EOVERFLOW in fuse. Anyway, it's not really important. > Keeping everything in kuids for as long as possible and only converting > at the edges tends to mean that I caught most of the conversion issues > with my basic support for user namespaces. > > > I've also added a check in fuse for this. If a uid/gid passed to > > fuse_setattr doesn't map into the namespace it will return -EINVAL. > > Sounds like maybe it should return -EOVERFLOW instead. > > I am not 100% about -EOVERLFLOW but it is the closest I could come up > with. > > Frankly looking at what I have coded I am inconsistent. In chown_common > I use -EINVAL, whereas in fuse I use -EOVERFLOW. So it is probably > worth a second look, and probably a patch to a man page or two > just to document this weird case. > > One document that has advised some of my decisions is the rational for > how 64bit file offset support was added on 32bit systems ages ago. > That is where I got -EOVERFLOW. The logic for handling 64bit file > offsets was ultimately that any operation that could cause > corruption would return an error (typically EOVERFLOW). > > Most of the vfs operations with uids are unlikely to cause corruption > and are more likely to be problematic if they don't work which is why > I tend to use overflow_uid/-1. > > For when uid/gid values don't map into a kuid/kgid value. Some error > is definitely required. Well, unless you say otherwise I guess I'll leave it -EINVAL to be consistent with chown_common(). > I am on the fence about what to do when a uid from the filesystem server > or for other filesystems the on-disk data structures does not map, but > make_bad_inode is simpler in conception. So make_bad_inode seems like > a good place to start. For fuse especially this isn't hard because > the make_bad_inode calls are already there to handle a change in i_mode. I agree that if we're unsure then make_bad_inode is a more logical place to start, since it's easier to loosen the restrictions later than to tighten them. I've got an initiail implementation that I'm in the process of testing. If you want to take a look I've pushed it to: git://kernel.ubuntu.com/sforshee/linux.git fuse-userns Thanks, Seth ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-25 19:48 ` Seth Forshee @ 2014-09-27 1:41 ` Eric W. Biederman 2014-09-27 4:24 ` Seth Forshee 0 siblings, 1 reply; 39+ messages in thread From: Eric W. Biederman @ 2014-09-27 1:41 UTC (permalink / raw) To: Seth Forshee Cc: Alexander Viro, Serge Hallyn, fuse-devel, Kernel Mailing List, Linux-Fsdevel, Miklos Szeredi, Serge E. Hallyn [-- Attachment #1: Type: text/plain, Size: 2143 bytes --] Seth Forshee <seth.forshee@canonical.com> writes: > On Thu, Sep 25, 2014 at 12:14:01PM -0700, Eric W. Biederman wrote: >> Sorry iattr_to_setattr look for from_kuid and from_kgid. >> >> The call path is >> fuse_setattr >> fuse_do_setattr >> iattr_to_fattr. > > Bah. Sorry, I misread that originally and thought you were talking about > something outside of fuse. And I was looking at a tree with my fuse > changes, so of course I wouldn't have found it. > > Actually in 3.17-rc6 I still don't see that iattr_to_fattr (I assume > this is what you meant) checks the results of the conversion (not that > it really needs to since it uses init_user_ns), nor any use of EOVERFLOW > in fuse. Anyway, it's not really important. True. I also goofed up in that I was looking at the wrong tree. My tree had all of my preliminary fuse patches I worked up a while ago applied so I did have the error handling in there. > Well, unless you say otherwise I guess I'll leave it -EINVAL to be > consistent with chown_common(). That sounds like a good plan. >> I am on the fence about what to do when a uid from the filesystem server >> or for other filesystems the on-disk data structures does not map, but >> make_bad_inode is simpler in conception. So make_bad_inode seems like >> a good place to start. For fuse especially this isn't hard because >> the make_bad_inode calls are already there to handle a change in i_mode. > > I agree that if we're unsure then make_bad_inode is a more logical place > to start, since it's easier to loosen the restrictions later than to > tighten them. I've got an initiail implementation that I'm in the > process of testing. If you want to take a look I've pushed it to: > > git://kernel.ubuntu.com/sforshee/linux.git fuse-userns Thanks. If I can scrape together enough focus I will look at it. As a second best thing here are my prototype from when I was looking at performing this fuse conversion last. Given that you had missed checking the from_kuid permission checks, it might be worth comparing and seeing if there is something else in these patches that would be worth including. Eric [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-userns-Allow-for-fuse-filesystems-outside-the-initia.patch --] [-- Type: text/x-diff, Size: 7489 bytes --] >From 94195ce5b06915846d14eaa35d9274c0315b46a0 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" <ebiederm@xmission.com> Date: Thu, 4 Oct 2012 13:34:45 -0700 Subject: [PATCH 1/4] userns: Allow for fuse filesystems outside the initial user namespace Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- fs/fuse/dev.c | 10 +++++----- fs/fuse/dir.c | 41 ++++++++++++++++++++++++++++++----------- fs/fuse/fuse_i.h | 3 +++ fs/fuse/inode.c | 10 ++++++++-- 4 files changed, 46 insertions(+), 18 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index ca887314aba9..e01f30c51b3c 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -124,10 +124,10 @@ static void __fuse_put_request(struct fuse_req *req) atomic_dec(&req->count); } -static void fuse_req_init_context(struct fuse_req *req) +static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req) { - req->in.h.uid = from_kuid_munged(&init_user_ns, current_fsuid()); - req->in.h.gid = from_kgid_munged(&init_user_ns, current_fsgid()); + req->in.h.uid = from_kuid_munged(fc->user_ns, current_fsuid()); + req->in.h.gid = from_kgid_munged(fc->user_ns, current_fsgid()); req->in.h.pid = current->pid; } @@ -168,7 +168,7 @@ static struct fuse_req *__fuse_get_req(struct fuse_conn *fc, unsigned npages, goto out; } - fuse_req_init_context(req); + fuse_req_init_context(fc, req); req->waiting = 1; req->background = for_background; return req; @@ -257,7 +257,7 @@ struct fuse_req *fuse_get_req_nofail_nopages(struct fuse_conn *fc, if (!req) req = get_reserved_req(fc, file); - fuse_req_init_context(req); + fuse_req_init_context(fc, req); req->waiting = 1; req->background = 0; return req; diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index de1d84af9f7c..d74c75a057cd 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -886,7 +886,7 @@ static int fuse_link(struct dentry *entry, struct inode *newdir, return err; } -static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr, +static int fuse_fillattr(struct inode *inode, struct fuse_attr *attr, struct kstat *stat) { unsigned int blkbits; @@ -905,8 +905,12 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr, stat->ino = attr->ino; stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777); stat->nlink = attr->nlink; - stat->uid = make_kuid(&init_user_ns, attr->uid); - stat->gid = make_kgid(&init_user_ns, attr->gid); + stat->uid = make_kuid(fc->user_ns, attr->uid); + if (!uid_valid(stat->uid)) + return -EOVERFLOW; + stat->gid = make_kgid(fc->user_ns, attr->gid); + if (!gid_valid(stat->gid)) + return -EOVERFLOW; stat->rdev = inode->i_rdev; stat->atime.tv_sec = attr->atime; stat->atime.tv_nsec = attr->atimensec; @@ -923,6 +927,7 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr, blkbits = inode->i_sb->s_blocksize_bits; stat->blksize = 1 << blkbits; + return 0; } static int fuse_do_getattr(struct inode *inode, struct kstat *stat, @@ -973,7 +978,8 @@ static int fuse_do_getattr(struct inode *inode, struct kstat *stat, attr_timeout(&outarg), attr_version); if (stat) - fuse_fillattr(inode, &outarg.attr, stat); + err = fuse_fillattr(inode, &outarg.attr, + stat); } } return err; @@ -1556,17 +1562,25 @@ static bool update_mtime(unsigned ivalid, bool trust_local_mtime) return true; } -static void iattr_to_fattr(struct iattr *iattr, struct fuse_setattr_in *arg, - bool trust_local_cmtime) +static int iattr_to_fattr(struct fuse_conn *fc, struct iattr *iattr, + struct fuse_setattr_in *arg, bool trust_local_cmtime) { unsigned ivalid = iattr->ia_valid; if (ivalid & ATTR_MODE) arg->valid |= FATTR_MODE, arg->mode = iattr->ia_mode; - if (ivalid & ATTR_UID) - arg->valid |= FATTR_UID, arg->uid = from_kuid(&init_user_ns, iattr->ia_uid); - if (ivalid & ATTR_GID) - arg->valid |= FATTR_GID, arg->gid = from_kgid(&init_user_ns, iattr->ia_gid); + if (ivalid & ATTR_UID) { + arg->valid |= FATTR_UID; + arg->uid = from_kuid(fc->user_ns, iattr->ia_uid); + if (arg->uid == (uid_t)-1) + return -EOVERFLOW; + } + if (ivalid & ATTR_GID) { + arg->valid |= FATTR_GID; + arg->gid = from_kgid(fc->user_ns, iattr->ia_gid); + if (arg->gid == (gid_t)-1) + return -EOVERFLOW; + } if (ivalid & ATTR_SIZE) arg->valid |= FATTR_SIZE, arg->size = iattr->ia_size; if (ivalid & ATTR_ATIME) { @@ -1588,6 +1602,7 @@ static void iattr_to_fattr(struct iattr *iattr, struct fuse_setattr_in *arg, arg->ctime = iattr->ia_ctime.tv_sec; arg->ctimensec = iattr->ia_ctime.tv_nsec; } + return 0; } /* @@ -1741,7 +1756,11 @@ int fuse_do_setattr(struct inode *inode, struct iattr *attr, memset(&inarg, 0, sizeof(inarg)); memset(&outarg, 0, sizeof(outarg)); - iattr_to_fattr(attr, &inarg, trust_local_cmtime); + err = iattr_to_fattr(fc, attr, &inarg, trust_local_cmtime); + if (err) { + goto error; + } + if (file) { struct fuse_file *ff = file->private_data; inarg.valid |= FATTR_FH; diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index e8e47a6ab518..e7dbbb5d62b4 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -598,6 +598,9 @@ struct fuse_conn { /** Read/write semaphore to hold when accessing sb. */ struct rw_semaphore killsb; + + /** User namespace to communicate uids and gids to the fuse daemon */ + struct user_namespace *user_ns; }; static inline struct fuse_conn *get_fuse_conn_super(struct super_block *sb) diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 03246cd9d47a..894288f7ad67 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -20,6 +20,7 @@ #include <linux/random.h> #include <linux/sched.h> #include <linux/exportfs.h> +#include <linux/user_namespace.h> MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>"); MODULE_DESCRIPTION("Filesystem in Userspace"); @@ -577,8 +578,8 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root) struct super_block *sb = root->d_sb; struct fuse_conn *fc = get_fuse_conn_super(sb); - seq_printf(m, ",user_id=%u", from_kuid_munged(&init_user_ns, fc->user_id)); - seq_printf(m, ",group_id=%u", from_kgid_munged(&init_user_ns, fc->group_id)); + seq_printf(m, ",user_id=%u", from_kuid_munged(fc->user_ns, fc->user_id)); + seq_printf(m, ",group_id=%u", from_kgid_munged(fc->user_ns, fc->group_id)); if (fc->flags & FUSE_DEFAULT_PERMISSIONS) seq_puts(m, ",default_permissions"); if (fc->flags & FUSE_ALLOW_OTHER) @@ -616,6 +617,7 @@ void fuse_conn_init(struct fuse_conn *fc) fc->initialized = 0; fc->attr_version = 1; get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key)); + fc->user_ns = get_user_ns(current_user_ns()); } EXPORT_SYMBOL_GPL(fuse_conn_init); @@ -624,6 +626,8 @@ void fuse_conn_put(struct fuse_conn *fc) if (atomic_dec_and_test(&fc->count)) { if (fc->destroy_req) fuse_request_free(fc->destroy_req); + put_user_ns(fc->user_ns); + fc->user_ns = NULL; fc->release(fc); } } @@ -1033,6 +1037,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) sb->s_maxbytes = MAX_LFS_FILESIZE; sb->s_time_gran = 1; sb->s_export_op = &fuse_export_operations; + sb->s_user_ns = get_user_ns(current_user_ns()); file = fget(d.fd); err = -EINVAL; @@ -1149,6 +1154,7 @@ static void fuse_kill_sb_anon(struct super_block *sb) } kill_anon_super(sb); + put_user_ns(sb->s_user_ns); } static struct file_system_type fuse_fs_type = { -- 1.9.1 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #3: 0002-fuse-Teach-fuse-how-to-handle-the-pid-namespace.patch --] [-- Type: text/x-diff, Size: 4851 bytes --] >From 9bdaa744858d296f361a92c8940c33f878aec169 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" <ebiederm@xmission.com> Date: Thu, 4 Oct 2012 13:42:34 -0700 Subject: [PATCH 2/4] fuse: Teach fuse how to handle the pid namespace. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- fs/fuse/dev.c | 2 +- fs/fuse/file.c | 17 ++++++++++------- fs/fuse/fuse_i.h | 3 +++ fs/fuse/inode.c | 4 ++++ 4 files changed, 18 insertions(+), 8 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index e01f30c51b3c..448775701763 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -128,7 +128,7 @@ static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req) { req->in.h.uid = from_kuid_munged(fc->user_ns, current_fsuid()); req->in.h.gid = from_kgid_munged(fc->user_ns, current_fsgid()); - req->in.h.pid = current->pid; + req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns); } static bool fuse_block_alloc(struct fuse_conn *fc, bool for_background) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 912061ac4baf..2d52801fa5dd 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -2130,7 +2130,8 @@ static int fuse_direct_mmap(struct file *file, struct vm_area_struct *vma) return generic_file_mmap(file, vma); } -static int convert_fuse_file_lock(const struct fuse_file_lock *ffl, +static int convert_fuse_file_lock(struct fuse_conn *fc, + const struct fuse_file_lock *ffl, struct file_lock *fl) { switch (ffl->type) { @@ -2145,7 +2146,9 @@ static int convert_fuse_file_lock(const struct fuse_file_lock *ffl, fl->fl_start = ffl->start; fl->fl_end = ffl->end; - fl->fl_pid = ffl->pid; + rcu_read_lock(); + fl->fl_pid = pid_vnr(find_pid_ns(ffl->pid, fc->pid_ns)); + rcu_read_unlock(); break; default: @@ -2156,7 +2159,7 @@ static int convert_fuse_file_lock(const struct fuse_file_lock *ffl, } static void fuse_lk_fill(struct fuse_req *req, struct file *file, - const struct file_lock *fl, int opcode, pid_t pid, + const struct file_lock *fl, int opcode, struct pid *pid, int flock) { struct inode *inode = file_inode(file); @@ -2169,7 +2172,7 @@ static void fuse_lk_fill(struct fuse_req *req, struct file *file, arg->lk.start = fl->fl_start; arg->lk.end = fl->fl_end; arg->lk.type = fl->fl_type; - arg->lk.pid = pid; + arg->lk.pid = pid_nr_ns(pid, fc->pid_ns); if (flock) arg->lk_flags |= FUSE_LK_FLOCK; req->in.h.opcode = opcode; @@ -2191,7 +2194,7 @@ static int fuse_getlk(struct file *file, struct file_lock *fl) if (IS_ERR(req)) return PTR_ERR(req); - fuse_lk_fill(req, file, fl, FUSE_GETLK, 0, 0); + fuse_lk_fill(req, file, fl, FUSE_GETLK, NULL, 0); req->out.numargs = 1; req->out.args[0].size = sizeof(outarg); req->out.args[0].value = &outarg; @@ -2199,7 +2202,7 @@ static int fuse_getlk(struct file *file, struct file_lock *fl) err = req->out.h.error; fuse_put_request(fc, req); if (!err) - err = convert_fuse_file_lock(&outarg.lk, fl); + err = convert_fuse_file_lock(fc, &outarg.lk, fl); return err; } @@ -2210,7 +2213,7 @@ static int fuse_setlk(struct file *file, struct file_lock *fl, int flock) struct fuse_conn *fc = get_fuse_conn(inode); struct fuse_req *req; int opcode = (fl->fl_flags & FL_SLEEP) ? FUSE_SETLKW : FUSE_SETLK; - pid_t pid = fl->fl_type != F_UNLCK ? current->tgid : 0; + struct pid *pid = fl->fl_type != F_UNLCK ? task_tgid(current) : NULL; int err; if (fl->fl_lmops && fl->fl_lmops->lm_grant) { diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index e7dbbb5d62b4..5d93f87e9960 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -601,6 +601,9 @@ struct fuse_conn { /** User namespace to communicate uids and gids to the fuse daemon */ struct user_namespace *user_ns; + + /** Pid namespace to communicate pids to the fuse daemon */ + struct pid_namespace *pid_ns; }; static inline struct fuse_conn *get_fuse_conn_super(struct super_block *sb) diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 894288f7ad67..5284d7fda269 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -21,6 +21,7 @@ #include <linux/sched.h> #include <linux/exportfs.h> #include <linux/user_namespace.h> +#include <linux/pid_namespace.h> MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>"); MODULE_DESCRIPTION("Filesystem in Userspace"); @@ -618,6 +619,7 @@ void fuse_conn_init(struct fuse_conn *fc) fc->attr_version = 1; get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key)); fc->user_ns = get_user_ns(current_user_ns()); + fc->pid_ns = get_pid_ns(task_active_pid_ns(current)); } EXPORT_SYMBOL_GPL(fuse_conn_init); @@ -628,6 +630,8 @@ void fuse_conn_put(struct fuse_conn *fc) fuse_request_free(fc->destroy_req); put_user_ns(fc->user_ns); fc->user_ns = NULL; + put_pid_ns(fc->pid_ns); + fc->pid_ns = NULL; fc->release(fc); } } -- 1.9.1 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #4: 0003-userns-fuse-unprivileged-mount-suport.patch --] [-- Type: text/x-diff, Size: 804 bytes --] >From 55226d169826abd110d9bc60a6b079f6be3f6a46 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" <ebiederm@xmission.com> Date: Fri, 5 Oct 2012 10:18:28 -0700 Subject: [PATCH 3/4] userns: fuse unprivileged mount suport Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- fs/fuse/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 5284d7fda269..75f5326868e0 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -1164,7 +1164,7 @@ static void fuse_kill_sb_anon(struct super_block *sb) static struct file_system_type fuse_fs_type = { .owner = THIS_MODULE, .name = "fuse", - .fs_flags = FS_HAS_SUBTYPE, + .fs_flags = FS_HAS_SUBTYPE | FS_USERNS_MOUNT, .mount = fuse_mount, .kill_sb = fuse_kill_sb_anon, }; -- 1.9.1 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #5: 0004-fuse-Only-allow-read-writing-user-xattrs.patch --] [-- Type: text/x-diff, Size: 1500 bytes --] >From 6ae88ecfe4e8c8998478932ca225d1d9753b6c4b Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" <ebiederm@xmission.com> Date: Fri, 5 Oct 2012 14:33:36 -0700 Subject: [PATCH 4/4] fuse: Only allow read/writing user xattrs In the context of unprivileged mounts supporting anything except xattrs with the "user." prefix seems foolish. Return -EOPNOSUPP for all other types of xattrs. Cc: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- fs/fuse/dir.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index d74c75a057cd..d84f5b819fab 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -13,6 +13,7 @@ #include <linux/sched.h> #include <linux/namei.h> #include <linux/slab.h> +#include <linux/xattr.h> static bool fuse_use_readdirplus(struct inode *dir, struct dir_context *ctx) { @@ -1868,6 +1869,9 @@ static int fuse_setxattr(struct dentry *entry, const char *name, if (fc->no_setxattr) return -EOPNOTSUPP; + if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0) + return -EOPNOTSUPP; + req = fuse_get_req_nopages(fc); if (IS_ERR(req)) return PTR_ERR(req); @@ -1911,6 +1915,9 @@ static ssize_t fuse_getxattr(struct dentry *entry, const char *name, if (fc->no_getxattr) return -EOPNOTSUPP; + if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0) + return -EOPNOTSUPP; + req = fuse_get_req_nopages(fc); if (IS_ERR(req)) return PTR_ERR(req); -- 1.9.1 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-27 1:41 ` Eric W. Biederman @ 2014-09-27 4:24 ` Seth Forshee 2014-09-29 19:34 ` Eric W. Biederman 0 siblings, 1 reply; 39+ messages in thread From: Seth Forshee @ 2014-09-27 4:24 UTC (permalink / raw) To: Eric W. Biederman, Miklos Szeredi Cc: Alexander Viro, Serge Hallyn, fuse-devel, Kernel Mailing List, Linux-Fsdevel, Serge E. Hallyn On Fri, Sep 26, 2014 at 06:41:33PM -0700, Eric W. Biederman wrote: > >> I am on the fence about what to do when a uid from the filesystem server > >> or for other filesystems the on-disk data structures does not map, but > >> make_bad_inode is simpler in conception. So make_bad_inode seems like > >> a good place to start. For fuse especially this isn't hard because > >> the make_bad_inode calls are already there to handle a change in i_mode. > > > > I agree that if we're unsure then make_bad_inode is a more logical place > > to start, since it's easier to loosen the restrictions later than to > > tighten them. I've got an initiail implementation that I'm in the > > process of testing. If you want to take a look I've pushed it to: > > > > git://kernel.ubuntu.com/sforshee/linux.git fuse-userns > > Thanks. If I can scrape together enough focus I will look at it. > > As a second best thing here are my prototype from when I was looking at > performing this fuse conversion last. Given that you had missed > checking the from_kuid permission checks, it might be worth comparing > and seeing if there is something else in these patches that would be > worth including. I think all of the from_kuid checks have been there for at least the last two rounds of patches, but let me know if you see one I missed. Looking at your patches, I see a lot that looks familiar. Seems like a good sign :-) I do see a few things I missed though. I've pointed those out below, and in those cases I'm updating my patches. There are also some other differences which I don't believe require any changes on my part, and I've provided explanations for those. I also asked a few questions. Miklos, are you satisfied with Eric's explanations about using a single namespace and the permissions checks on uids? If so I should be able to send updated patches early next week, otherwise let's continue trying to resolve the points of disagreement. > -static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr, > +static int fuse_fillattr(struct inode *inode, struct fuse_attr *attr, > struct kstat *stat) > { > unsigned int blkbits; > @@ -905,8 +905,12 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr, > stat->ino = attr->ino; > stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777); > stat->nlink = attr->nlink; > - stat->uid = make_kuid(&init_user_ns, attr->uid); > - stat->gid = make_kgid(&init_user_ns, attr->gid); > + stat->uid = make_kuid(fc->user_ns, attr->uid); > + if (!uid_valid(stat->uid)) > + return -EOVERFLOW; > + stat->gid = make_kgid(fc->user_ns, attr->gid); > + if (!gid_valid(stat->gid)) > + return -EOVERFLOW; If we were to go with the invalid id approach these checks shouldn't be necessary, and they'll get converted to the overflow ids for userspace. In my make_bad_inode patches I'm doing this check when we update the inode and returning -EINVAL if it doesn't map. Then I changed fuse_fillattr to just copy them from the inode, since fuse always updates the inode right before calling fuse_fillattr anyway and that makes it clear why we don't need to check the results of the conversion. > static int fuse_do_getattr(struct inode *inode, struct kstat *stat, > @@ -973,7 +978,8 @@ static int fuse_do_getattr(struct inode *inode, struct kstat *stat, > attr_timeout(&outarg), > attr_version); > if (stat) > - fuse_fillattr(inode, &outarg.attr, stat); > + err = fuse_fillattr(inode, &outarg.attr, > + stat); > } > } > return err; So right before calling fuse_fillattr here fuse_change_attributes is called to update the inode, so it appears your patches would have allowed setting inode->i_iuid to INVALID_UID? > @@ -624,6 +626,8 @@ void fuse_conn_put(struct fuse_conn *fc) > if (atomic_dec_and_test(&fc->count)) { > if (fc->destroy_req) > fuse_request_free(fc->destroy_req); > + put_user_ns(fc->user_ns); > + fc->user_ns = NULL; > fc->release(fc); > } > } I did this in fuse_free_con, but now looking again this is a bug for cuse since it uses a different release method. I've updated my tree to do it here instead. > @@ -1033,6 +1037,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) > sb->s_maxbytes = MAX_LFS_FILESIZE; > sb->s_time_gran = 1; > sb->s_export_op = &fuse_export_operations; > + sb->s_user_ns = get_user_ns(current_user_ns()); > > file = fget(d.fd); > err = -EINVAL; > @@ -1149,6 +1154,7 @@ static void fuse_kill_sb_anon(struct super_block *sb) > } > > kill_anon_super(sb); > + put_user_ns(sb->s_user_ns); > } What's the point of s_user_ns? It doesn't seem to be used anywhere. > From 9bdaa744858d296f361a92c8940c33f878aec169 Mon Sep 17 00:00:00 2001 > From: "Eric W. Biederman" <ebiederm@xmission.com> > Date: Thu, 4 Oct 2012 13:42:34 -0700 > Subject: [PATCH 2/4] fuse: Teach fuse how to handle the pid namespace. <snip> > -static int convert_fuse_file_lock(const struct fuse_file_lock *ffl, > +static int convert_fuse_file_lock(struct fuse_conn *fc, > + const struct fuse_file_lock *ffl, > struct file_lock *fl) > { > switch (ffl->type) { > @@ -2145,7 +2146,9 @@ static int convert_fuse_file_lock(const struct fuse_file_lock *ffl, > > fl->fl_start = ffl->start; > fl->fl_end = ffl->end; > - fl->fl_pid = ffl->pid; > + rcu_read_lock(); > + fl->fl_pid = pid_vnr(find_pid_ns(ffl->pid, fc->pid_ns)); > + rcu_read_unlock(); > break; > > default: > @@ -2156,7 +2159,7 @@ static int convert_fuse_file_lock(const struct fuse_file_lock *ffl, > } > > static void fuse_lk_fill(struct fuse_req *req, struct file *file, > - const struct file_lock *fl, int opcode, pid_t pid, > + const struct file_lock *fl, int opcode, struct pid *pid, > int flock) > { > struct inode *inode = file_inode(file); > @@ -2169,7 +2172,7 @@ static void fuse_lk_fill(struct fuse_req *req, struct file *file, > arg->lk.start = fl->fl_start; > arg->lk.end = fl->fl_end; > arg->lk.type = fl->fl_type; > - arg->lk.pid = pid; > + arg->lk.pid = pid_nr_ns(pid, fc->pid_ns); > if (flock) > arg->lk_flags |= FUSE_LK_FLOCK; > req->in.h.opcode = opcode; > @@ -2191,7 +2194,7 @@ static int fuse_getlk(struct file *file, struct file_lock *fl) > if (IS_ERR(req)) > return PTR_ERR(req); > > - fuse_lk_fill(req, file, fl, FUSE_GETLK, 0, 0); > + fuse_lk_fill(req, file, fl, FUSE_GETLK, NULL, 0); > req->out.numargs = 1; > req->out.args[0].size = sizeof(outarg); > req->out.args[0].value = &outarg; > @@ -2199,7 +2202,7 @@ static int fuse_getlk(struct file *file, struct file_lock *fl) > err = req->out.h.error; > fuse_put_request(fc, req); > if (!err) > - err = convert_fuse_file_lock(&outarg.lk, fl); > + err = convert_fuse_file_lock(fc, &outarg.lk, fl); > > return err; > } > @@ -2210,7 +2213,7 @@ static int fuse_setlk(struct file *file, struct file_lock *fl, int flock) > struct fuse_conn *fc = get_fuse_conn(inode); > struct fuse_req *req; > int opcode = (fl->fl_flags & FL_SLEEP) ? FUSE_SETLKW : FUSE_SETLK; > - pid_t pid = fl->fl_type != F_UNLCK ? current->tgid : 0; > + struct pid *pid = fl->fl_type != F_UNLCK ? task_tgid(current) : NULL; > int err; > > if (fl->fl_lmops && fl->fl_lmops->lm_grant) { D'oh, I did totally miss this file locking stuff. Your changes look good, so I'll pretty much take them verbatim. > @@ -628,6 +630,8 @@ void fuse_conn_put(struct fuse_conn *fc) > fuse_request_free(fc->destroy_req); > put_user_ns(fc->user_ns); > fc->user_ns = NULL; > + put_pid_ns(fc->pid_ns); > + fc->pid_ns = NULL; > fc->release(fc); > } > } And I had a leak here for cuse as with the userns. > From 55226d169826abd110d9bc60a6b079f6be3f6a46 Mon Sep 17 00:00:00 2001 > From: "Eric W. Biederman" <ebiederm@xmission.com> > Date: Fri, 5 Oct 2012 10:18:28 -0700 > Subject: [PATCH 3/4] userns: fuse unprivileged mount suport > > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > fs/fuse/inode.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c > index 5284d7fda269..75f5326868e0 100644 > --- a/fs/fuse/inode.c > +++ b/fs/fuse/inode.c > @@ -1164,7 +1164,7 @@ static void fuse_kill_sb_anon(struct super_block *sb) > static struct file_system_type fuse_fs_type = { > .owner = THIS_MODULE, > .name = "fuse", > - .fs_flags = FS_HAS_SUBTYPE, > + .fs_flags = FS_HAS_SUBTYPE | FS_USERNS_MOUNT, > .mount = fuse_mount, > .kill_sb = fuse_kill_sb_anon, > }; I have the order of my patches switched, then this one squashed in with the one which adds userns support to fuse. > From 6ae88ecfe4e8c8998478932ca225d1d9753b6c4b Mon Sep 17 00:00:00 2001 > From: "Eric W. Biederman" <ebiederm@xmission.com> > Date: Fri, 5 Oct 2012 14:33:36 -0700 > Subject: [PATCH 4/4] fuse: Only allow read/writing user xattrs > > In the context of unprivileged mounts supporting anything except > xattrs with the "user." prefix seems foolish. Return -EOPNOSUPP > for all other types of xattrs. > > Cc: Miklos Szeredi <miklos@szeredi.hu> > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > fs/fuse/dir.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c > index d74c75a057cd..d84f5b819fab 100644 > --- a/fs/fuse/dir.c > +++ b/fs/fuse/dir.c > @@ -13,6 +13,7 @@ > #include <linux/sched.h> > #include <linux/namei.h> > #include <linux/slab.h> > +#include <linux/xattr.h> > > static bool fuse_use_readdirplus(struct inode *dir, struct dir_context *ctx) > { > @@ -1868,6 +1869,9 @@ static int fuse_setxattr(struct dentry *entry, const char *name, > if (fc->no_setxattr) > return -EOPNOTSUPP; > > + if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0) > + return -EOPNOTSUPP; > + > req = fuse_get_req_nopages(fc); > if (IS_ERR(req)) > return PTR_ERR(req); > @@ -1911,6 +1915,9 @@ static ssize_t fuse_getxattr(struct dentry *entry, const char *name, > if (fc->no_getxattr) > return -EOPNOTSUPP; > > + if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0) > + return -EOPNOTSUPP; > + > req = fuse_get_req_nopages(fc); > if (IS_ERR(req)) > return PTR_ERR(req); This I hadn't considered, but it seems reasonable. Two questions though. Why shouldn't we let root-owned mounts support namespaces other than "user."? Thinking ahead to (hopefully) allowing other filesystems to be mounted from user namespaces, should this be enforced by the vfs instead? Thanks, Seth ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-27 4:24 ` Seth Forshee @ 2014-09-29 19:34 ` Eric W. Biederman 2014-09-30 16:25 ` Seth Forshee 0 siblings, 1 reply; 39+ messages in thread From: Eric W. Biederman @ 2014-09-29 19:34 UTC (permalink / raw) To: Miklos Szeredi Cc: Alexander Viro, Serge Hallyn, fuse-devel, Kernel Mailing List, Linux-Fsdevel, Serge E. Hallyn Seth Forshee <seth.forshee@canonical.com> writes: > On Fri, Sep 26, 2014 at 06:41:33PM -0700, Eric W. Biederman wrote: >> >> I am on the fence about what to do when a uid from the filesystem server >> >> or for other filesystems the on-disk data structures does not map, but >> >> make_bad_inode is simpler in conception. So make_bad_inode seems like >> >> a good place to start. For fuse especially this isn't hard because >> >> the make_bad_inode calls are already there to handle a change in i_mode. >> > >> > I agree that if we're unsure then make_bad_inode is a more logical place >> > to start, since it's easier to loosen the restrictions later than to >> > tighten them. I've got an initiail implementation that I'm in the >> > process of testing. If you want to take a look I've pushed it to: >> > >> > git://kernel.ubuntu.com/sforshee/linux.git fuse-userns >> >> Thanks. If I can scrape together enough focus I will look at it. >> >> As a second best thing here are my prototype from when I was looking at >> performing this fuse conversion last. Given that you had missed >> checking the from_kuid permission checks, it might be worth comparing >> and seeing if there is something else in these patches that would be >> worth including. > > I think all of the from_kuid checks have been there for at least the > last two rounds of patches, but let me know if you see one I missed. > > Looking at your patches, I see a lot that looks familiar. Seems like a > good sign :-) > > I do see a few things I missed though. I've pointed those out below, and > in those cases I'm updating my patches. There are also some other > differences which I don't believe require any changes on my part, and > I've provided explanations for those. I also asked a few questions. > > Miklos, are you satisfied with Eric's explanations about using a single > namespace and the permissions checks on uids? If so I should be able to > send updated patches early next week, otherwise let's continue trying to > resolve the points of disagreement. > >> -static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr, >> +static int fuse_fillattr(struct inode *inode, struct fuse_attr *attr, >> struct kstat *stat) >> { >> unsigned int blkbits; >> @@ -905,8 +905,12 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr, >> stat->ino = attr->ino; >> stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777); >> stat->nlink = attr->nlink; >> - stat->uid = make_kuid(&init_user_ns, attr->uid); >> - stat->gid = make_kgid(&init_user_ns, attr->gid); >> + stat->uid = make_kuid(fc->user_ns, attr->uid); >> + if (!uid_valid(stat->uid)) >> + return -EOVERFLOW; >> + stat->gid = make_kgid(fc->user_ns, attr->gid); >> + if (!gid_valid(stat->gid)) >> + return -EOVERFLOW; > > If we were to go with the invalid id approach these checks shouldn't be > necessary, and they'll get converted to the overflow ids for > userspace. Correct. > In my make_bad_inode patches I'm doing this check when we update the > inode and returning -EINVAL if it doesn't map. In fuse_change_attributes? That seems reasonable. > fuse_fillattr to just copy them from the inode, since fuse always > updates the inode right before calling fuse_fillattr anyway and that > makes it clear why we don't need to check the results of the > conversion. Agreed. There is no point in having that code twice. >> static int fuse_do_getattr(struct inode *inode, struct kstat *stat, >> @@ -973,7 +978,8 @@ static int fuse_do_getattr(struct inode *inode, struct kstat *stat, >> attr_timeout(&outarg), >> attr_version); >> if (stat) >> - fuse_fillattr(inode, &outarg.attr, stat); >> + err = fuse_fillattr(inode, &outarg.attr, >> + stat); >> } >> } >> return err; > > So right before calling fuse_fillattr here fuse_change_attributes is > called to update the inode, so it appears your patches would have > allowed setting inode->i_iuid to INVALID_UID? Quite possibly. I am scratching my head about that one. >> @@ -624,6 +626,8 @@ void fuse_conn_put(struct fuse_conn *fc) >> if (atomic_dec_and_test(&fc->count)) { >> if (fc->destroy_req) >> fuse_request_free(fc->destroy_req); >> + put_user_ns(fc->user_ns); >> + fc->user_ns = NULL; >> fc->release(fc); >> } >> } > > I did this in fuse_free_con, but now looking again this is a bug for > cuse since it uses a different release method. I've updated my tree to > do it here instead. >> @@ -1033,6 +1037,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) >> sb->s_maxbytes = MAX_LFS_FILESIZE; >> sb->s_time_gran = 1; >> sb->s_export_op = &fuse_export_operations; >> + sb->s_user_ns = get_user_ns(current_user_ns()); >> >> file = fget(d.fd); >> err = -EINVAL; >> @@ -1149,6 +1154,7 @@ static void fuse_kill_sb_anon(struct super_block *sb) >> } >> >> kill_anon_super(sb); >> + put_user_ns(sb->s_user_ns); >> } > > What's the point of s_user_ns? It doesn't seem to be used anywhere. Not in these patches, and the field isn't added here either. It was my exporting of the user namespace that ids are enconded in the filesystem to generic code in the vfs. By default that I am setting that field to init_user_ns. Looking through my tree the only real user of it is the quota code. However for acls and other xattrs we may also care. For block based filesystems I woudn't expect to need the fuse connection structure and storing the user namespace there. So that field becomes a lot less redundant in the general case. >> @@ -2210,7 +2213,7 @@ static int fuse_setlk(struct file *file, struct file_lock *fl, int flock) >> struct fuse_conn *fc = get_fuse_conn(inode); >> struct fuse_req *req; >> int opcode = (fl->fl_flags & FL_SLEEP) ? FUSE_SETLKW : FUSE_SETLK; >> - pid_t pid = fl->fl_type != F_UNLCK ? current->tgid : 0; >> + struct pid *pid = fl->fl_type != F_UNLCK ? task_tgid(current) : NULL; >> int err; >> >> if (fl->fl_lmops && fl->fl_lmops->lm_grant) { > > D'oh, I did totally miss this file locking stuff. Your changes look > good, so I'll pretty much take them verbatim. Sounds good. >> @@ -628,6 +630,8 @@ void fuse_conn_put(struct fuse_conn *fc) >> fuse_request_free(fc->destroy_req); >> put_user_ns(fc->user_ns); >> fc->user_ns = NULL; >> + put_pid_ns(fc->pid_ns); >> + fc->pid_ns = NULL; >> fc->release(fc); >> } >> } > > And I had a leak here for cuse as with the userns. > >> From 55226d169826abd110d9bc60a6b079f6be3f6a46 Mon Sep 17 00:00:00 2001 >> From: "Eric W. Biederman" <ebiederm@xmission.com> >> Date: Fri, 5 Oct 2012 10:18:28 -0700 >> Subject: [PATCH 3/4] userns: fuse unprivileged mount suport >> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> >> --- >> fs/fuse/inode.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c >> index 5284d7fda269..75f5326868e0 100644 >> --- a/fs/fuse/inode.c >> +++ b/fs/fuse/inode.c >> @@ -1164,7 +1164,7 @@ static void fuse_kill_sb_anon(struct super_block *sb) >> static struct file_system_type fuse_fs_type = { >> .owner = THIS_MODULE, >> .name = "fuse", >> - .fs_flags = FS_HAS_SUBTYPE, >> + .fs_flags = FS_HAS_SUBTYPE | FS_USERNS_MOUNT, >> .mount = fuse_mount, >> .kill_sb = fuse_kill_sb_anon, >> }; > > I have the order of my patches switched, then this one squashed in with > the one which adds userns support to fuse. It is nice having this as a separate patch because it allows incremental progress. Then this patch can be added when victory is declared. Although looking at my code this looks like this patch is probably a little early in the series ;-) >> From 6ae88ecfe4e8c8998478932ca225d1d9753b6c4b Mon Sep 17 00:00:00 2001 >> From: "Eric W. Biederman" <ebiederm@xmission.com> >> Date: Fri, 5 Oct 2012 14:33:36 -0700 >> Subject: [PATCH 4/4] fuse: Only allow read/writing user xattrs >> >> In the context of unprivileged mounts supporting anything except >> xattrs with the "user." prefix seems foolish. Return -EOPNOSUPP >> for all other types of xattrs. >> >> Cc: Miklos Szeredi <miklos@szeredi.hu> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> >> --- >> fs/fuse/dir.c | 7 +++++++ >> 1 file changed, 7 insertions(+) >> >> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c >> index d74c75a057cd..d84f5b819fab 100644 >> --- a/fs/fuse/dir.c >> +++ b/fs/fuse/dir.c >> @@ -13,6 +13,7 @@ >> #include <linux/sched.h> >> #include <linux/namei.h> >> #include <linux/slab.h> >> +#include <linux/xattr.h> >> >> static bool fuse_use_readdirplus(struct inode *dir, struct dir_context *ctx) >> { >> @@ -1868,6 +1869,9 @@ static int fuse_setxattr(struct dentry *entry, const char *name, >> if (fc->no_setxattr) >> return -EOPNOTSUPP; >> >> + if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0) >> + return -EOPNOTSUPP; >> + >> req = fuse_get_req_nopages(fc); >> if (IS_ERR(req)) >> return PTR_ERR(req); >> @@ -1911,6 +1915,9 @@ static ssize_t fuse_getxattr(struct dentry *entry, const char *name, >> if (fc->no_getxattr) >> return -EOPNOTSUPP; >> >> + if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0) >> + return -EOPNOTSUPP; >> + >> req = fuse_get_req_nopages(fc); >> if (IS_ERR(req)) >> return PTR_ERR(req); > > This I hadn't considered, but it seems reasonable. Two questions though. > > Why shouldn't we let root-owned mounts support namespaces other than > "user."? Are there any users of fuse who care. Certainly the core use case of fuse is unprivileged mounts and in that case nosuid should take care of this. > Thinking ahead to (hopefully) allowing other filesystems to be mounted > from user namespaces, should this be enforced by the vfs instead? Potentially. I would have to look to see how hard it is to lift this code into the vfs. At least historically the xattr interface was ugly enough getting some of this stuff into the vfs would require refactoring. My tenative patches in this area look pretty rough. For ext2 I just implemented: + if (dentry->d_inode->i_sb->s_user_ns != &init_user_ns) + return -EOPNOTSUPP; + So for ext2 I did honor allow things to happen as you would have suspected. But if we are not going to require specifying nosuid looking closely at xattrs and acls and security attributes seems pretty important. Looking at all of this my guess is we probably want to write a list of rules for unprivileged mounting of filesystems. So that people can (a) review the basic theory in case they are aware of anything we may have missed. (b) Have a reference for the whatever filesystems come next. Eric ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-29 19:34 ` Eric W. Biederman @ 2014-09-30 16:25 ` Seth Forshee 2014-10-05 16:48 ` Seth Forshee 0 siblings, 1 reply; 39+ messages in thread From: Seth Forshee @ 2014-09-30 16:25 UTC (permalink / raw) To: Eric W. Biederman Cc: Miklos Szeredi, Alexander Viro, Serge Hallyn, fuse-devel, Kernel Mailing List, Linux-Fsdevel, Serge E. Hallyn On Mon, Sep 29, 2014 at 12:34:44PM -0700, Eric W. Biederman wrote: > >> -static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr, > >> +static int fuse_fillattr(struct inode *inode, struct fuse_attr *attr, > >> struct kstat *stat) > >> { > >> unsigned int blkbits; > >> @@ -905,8 +905,12 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr, > >> stat->ino = attr->ino; > >> stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777); > >> stat->nlink = attr->nlink; > >> - stat->uid = make_kuid(&init_user_ns, attr->uid); > >> - stat->gid = make_kgid(&init_user_ns, attr->gid); > >> + stat->uid = make_kuid(fc->user_ns, attr->uid); > >> + if (!uid_valid(stat->uid)) > >> + return -EOVERFLOW; > >> + stat->gid = make_kgid(fc->user_ns, attr->gid); > >> + if (!gid_valid(stat->gid)) > >> + return -EOVERFLOW; > > > > If we were to go with the invalid id approach these checks shouldn't be > > necessary, and they'll get converted to the overflow ids for > > userspace. > > Correct. > > > In my make_bad_inode patches I'm doing this check when we update the > > inode and returning -EINVAL if it doesn't map. > > In fuse_change_attributes? That seems reasonable. In fuse_change_attributes_common, which then filters up to fuse_change_attributes. > >> @@ -1033,6 +1037,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) > >> sb->s_maxbytes = MAX_LFS_FILESIZE; > >> sb->s_time_gran = 1; > >> sb->s_export_op = &fuse_export_operations; > >> + sb->s_user_ns = get_user_ns(current_user_ns()); > >> > >> file = fget(d.fd); > >> err = -EINVAL; > >> @@ -1149,6 +1154,7 @@ static void fuse_kill_sb_anon(struct super_block *sb) > >> } > >> > >> kill_anon_super(sb); > >> + put_user_ns(sb->s_user_ns); > >> } > > > > What's the point of s_user_ns? It doesn't seem to be used anywhere. > > Not in these patches, and the field isn't added here either. It was my > exporting of the user namespace that ids are enconded in the filesystem > to generic code in the vfs. By default that I am setting that field > to init_user_ns. > > Looking through my tree the only real user of it is the quota code. > > However for acls and other xattrs we may also care. > > For block based filesystems I woudn't expect to need the fuse connection > structure and storing the user namespace there. So that field becomes > a lot less redundant in the general case. Hmm, so that does seem to make sense for acls. The ids should be getting translated relative to the namespace used for the superblock. Currently they're translated relative to init_user_ns. > >> From 55226d169826abd110d9bc60a6b079f6be3f6a46 Mon Sep 17 00:00:00 2001 > >> From: "Eric W. Biederman" <ebiederm@xmission.com> > >> Date: Fri, 5 Oct 2012 10:18:28 -0700 > >> Subject: [PATCH 3/4] userns: fuse unprivileged mount suport > >> > >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > >> --- > >> fs/fuse/inode.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c > >> index 5284d7fda269..75f5326868e0 100644 > >> --- a/fs/fuse/inode.c > >> +++ b/fs/fuse/inode.c > >> @@ -1164,7 +1164,7 @@ static void fuse_kill_sb_anon(struct super_block *sb) > >> static struct file_system_type fuse_fs_type = { > >> .owner = THIS_MODULE, > >> .name = "fuse", > >> - .fs_flags = FS_HAS_SUBTYPE, > >> + .fs_flags = FS_HAS_SUBTYPE | FS_USERNS_MOUNT, > >> .mount = fuse_mount, > >> .kill_sb = fuse_kill_sb_anon, > >> }; > > > > I have the order of my patches switched, then this one squashed in with > > the one which adds userns support to fuse. > > It is nice having this as a separate patch because it allows incremental > progress. Then this patch can be added when victory is declared. > > Although looking at my code this looks like this patch is probably a > little early in the series ;-) It doesn't really make any difference to me, so I'll go ahead and split it out. > >> From 6ae88ecfe4e8c8998478932ca225d1d9753b6c4b Mon Sep 17 00:00:00 2001 > >> From: "Eric W. Biederman" <ebiederm@xmission.com> > >> Date: Fri, 5 Oct 2012 14:33:36 -0700 > >> Subject: [PATCH 4/4] fuse: Only allow read/writing user xattrs > >> > >> In the context of unprivileged mounts supporting anything except > >> xattrs with the "user." prefix seems foolish. Return -EOPNOSUPP > >> for all other types of xattrs. > >> > >> Cc: Miklos Szeredi <miklos@szeredi.hu> > >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > >> --- > >> fs/fuse/dir.c | 7 +++++++ > >> 1 file changed, 7 insertions(+) > >> > >> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c > >> index d74c75a057cd..d84f5b819fab 100644 > >> --- a/fs/fuse/dir.c > >> +++ b/fs/fuse/dir.c > >> @@ -13,6 +13,7 @@ > >> #include <linux/sched.h> > >> #include <linux/namei.h> > >> #include <linux/slab.h> > >> +#include <linux/xattr.h> > >> > >> static bool fuse_use_readdirplus(struct inode *dir, struct dir_context *ctx) > >> { > >> @@ -1868,6 +1869,9 @@ static int fuse_setxattr(struct dentry *entry, const char *name, > >> if (fc->no_setxattr) > >> return -EOPNOTSUPP; > >> > >> + if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0) > >> + return -EOPNOTSUPP; > >> + > >> req = fuse_get_req_nopages(fc); > >> if (IS_ERR(req)) > >> return PTR_ERR(req); > >> @@ -1911,6 +1915,9 @@ static ssize_t fuse_getxattr(struct dentry *entry, const char *name, > >> if (fc->no_getxattr) > >> return -EOPNOTSUPP; > >> > >> + if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0) > >> + return -EOPNOTSUPP; > >> + > >> req = fuse_get_req_nopages(fc); > >> if (IS_ERR(req)) > >> return PTR_ERR(req); > > > > This I hadn't considered, but it seems reasonable. Two questions though. > > > > Why shouldn't we let root-owned mounts support namespaces other than > > "user."? > > Are there any users of fuse who care. Certainly the core use case of > fuse is unprivileged mounts and in that case nosuid should take care of > this. I couldn't say for sure - I'm thinking that there are some filesystems which are only supported via fuse, and if distros are automounting any of those then it would likely be via root-owned fuse mounts. And if any of those supports xattrs then this could be considered a regression. > > Thinking ahead to (hopefully) allowing other filesystems to be mounted > > from user namespaces, should this be enforced by the vfs instead? > > Potentially. I would have to look to see how hard it is to lift this > code into the vfs. At least historically the xattr interface was ugly > enough getting some of this stuff into the vfs would require > refactoring. > > My tenative patches in this area look pretty rough. For ext2 I > just implemented: > > + if (dentry->d_inode->i_sb->s_user_ns != &init_user_ns) > + return -EOPNOTSUPP; > + > > So for ext2 I did honor allow things to happen as you would have > suspected. > > But if we are not going to require specifying nosuid looking closely at > xattrs and acls and security attributes seems pretty important. > > Looking at all of this my guess is we probably want to write a list of > rules for unprivileged mounting of filesystems. So that people can > (a) review the basic theory in case they are aware of anything we may > have missed. > (b) Have a reference for the whatever filesystems come next. Several namespaces are restricted at the vfs level right now, though system.* and security.* are specifically called out as being left to the individual filesystem to decide. I'm not well versed in the use of xattrs, so I'll need to go do some studying before I'll fully understand everything that needs to be done. I guess there are a couple of options: try to tackle all of this before merging any patches for fuse, or put some limits if fuse right now that could potentially be lifted after we have more general rules. I'd prefer the second option so we can get some level of support for fuse in containers sooner. For xattrs I could do something like this to avoid regressions in init_user_ns while restricting to user.* in other namespaces for now: if (fc->user_ns != &init_user_ns && strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0) return -EOPNOTSUPP; Thoughts? Thanks, Seth ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-30 16:25 ` Seth Forshee @ 2014-10-05 16:48 ` Seth Forshee 2014-10-06 16:00 ` Serge Hallyn 0 siblings, 1 reply; 39+ messages in thread From: Seth Forshee @ 2014-10-05 16:48 UTC (permalink / raw) To: Eric W. Biederman, Miklos Szeredi Cc: Alexander Viro, Serge Hallyn, fuse-devel, Kernel Mailing List, Linux-Fsdevel, Serge E. Hallyn On Tue, Sep 30, 2014 at 11:25:59AM -0500, Seth Forshee wrote: > > >> From 6ae88ecfe4e8c8998478932ca225d1d9753b6c4b Mon Sep 17 00:00:00 2001 > > >> From: "Eric W. Biederman" <ebiederm@xmission.com> > > >> Date: Fri, 5 Oct 2012 14:33:36 -0700 > > >> Subject: [PATCH 4/4] fuse: Only allow read/writing user xattrs > > >> > > >> In the context of unprivileged mounts supporting anything except > > >> xattrs with the "user." prefix seems foolish. Return -EOPNOSUPP > > >> for all other types of xattrs. > > >> > > >> Cc: Miklos Szeredi <miklos@szeredi.hu> > > >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > > >> --- > > >> fs/fuse/dir.c | 7 +++++++ > > >> 1 file changed, 7 insertions(+) > > >> > > >> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c > > >> index d74c75a057cd..d84f5b819fab 100644 > > >> --- a/fs/fuse/dir.c > > >> +++ b/fs/fuse/dir.c > > >> @@ -13,6 +13,7 @@ > > >> #include <linux/sched.h> > > >> #include <linux/namei.h> > > >> #include <linux/slab.h> > > >> +#include <linux/xattr.h> > > >> > > >> static bool fuse_use_readdirplus(struct inode *dir, struct dir_context *ctx) > > >> { > > >> @@ -1868,6 +1869,9 @@ static int fuse_setxattr(struct dentry *entry, const char *name, > > >> if (fc->no_setxattr) > > >> return -EOPNOTSUPP; > > >> > > >> + if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0) > > >> + return -EOPNOTSUPP; > > >> + > > >> req = fuse_get_req_nopages(fc); > > >> if (IS_ERR(req)) > > >> return PTR_ERR(req); > > >> @@ -1911,6 +1915,9 @@ static ssize_t fuse_getxattr(struct dentry *entry, const char *name, > > >> if (fc->no_getxattr) > > >> return -EOPNOTSUPP; > > >> > > >> + if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0) > > >> + return -EOPNOTSUPP; > > >> + > > >> req = fuse_get_req_nopages(fc); > > >> if (IS_ERR(req)) > > >> return PTR_ERR(req); > > > > > > This I hadn't considered, but it seems reasonable. Two questions though. > > > > > > Why shouldn't we let root-owned mounts support namespaces other than > > > "user."? > > > > Are there any users of fuse who care. Certainly the core use case of > > fuse is unprivileged mounts and in that case nosuid should take care of > > this. > > I couldn't say for sure - I'm thinking that there are some filesystems > which are only supported via fuse, and if distros are automounting any > of those then it would likely be via root-owned fuse mounts. And if any > of those supports xattrs then this could be considered a regression. > > > > Thinking ahead to (hopefully) allowing other filesystems to be mounted > > > from user namespaces, should this be enforced by the vfs instead? > > > > Potentially. I would have to look to see how hard it is to lift this > > code into the vfs. At least historically the xattr interface was ugly > > enough getting some of this stuff into the vfs would require > > refactoring. > > > > My tenative patches in this area look pretty rough. For ext2 I > > just implemented: > > > > + if (dentry->d_inode->i_sb->s_user_ns != &init_user_ns) > > + return -EOPNOTSUPP; > > + > > > > So for ext2 I did honor allow things to happen as you would have > > suspected. > > > > But if we are not going to require specifying nosuid looking closely at > > xattrs and acls and security attributes seems pretty important. > > > > Looking at all of this my guess is we probably want to write a list of > > rules for unprivileged mounting of filesystems. So that people can > > (a) review the basic theory in case they are aware of anything we may > > have missed. > > (b) Have a reference for the whatever filesystems come next. > > Several namespaces are restricted at the vfs level right now, though > system.* and security.* are specifically called out as being left to the > individual filesystem to decide. > > I'm not well versed in the use of xattrs, so I'll need to go do some > studying before I'll fully understand everything that needs to be done. > I guess there are a couple of options: try to tackle all of this before > merging any patches for fuse, or put some limits if fuse right now that > could potentially be lifted after we have more general rules. I'd prefer > the second option so we can get some level of support for fuse in > containers sooner. > > For xattrs I could do something like this to avoid regressions in > init_user_ns while restricting to user.* in other namespaces for now: > > if (fc->user_ns != &init_user_ns && > strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0) > return -EOPNOTSUPP; After digging into this some more I think I agree with you. At minimum letting users insert arbitrary xattrs via fuse bypasses the usual restrictions on setting xattrs. This is probably mitigated by the limited visibility of the fuse mount in the usual case for unprivileged users, but it does seem like a bad idea fundamentally. So I was thinking of something like the following (untested) to let root in the host support privileged xattrs while limiting unprivileged users to user.*. Miklos, does this look acceptable or would you prefer something different? diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index e3123bfbc711..1a3ee5663dea 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -1882,6 +1882,10 @@ static int fuse_setxattr(struct dentry *entry, const char *name, if (fc->no_setxattr) return -EOPNOTSUPP; + if (!(fc->flags & FUSE_PRIV_XATTRS) && + strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0) + return -EOPNOTSUPP; + req = fuse_get_req_nopages(fc); if (IS_ERR(req)) return PTR_ERR(req); @@ -1925,6 +1929,10 @@ static ssize_t fuse_getxattr(struct dentry *entry, const char *name, if (fc->no_getxattr) return -EOPNOTSUPP; + if (!(fc->flags & FUSE_PRIV_XATTRS) && + strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0) + return -EOPNOTSUPP; + req = fuse_get_req_nopages(fc); if (IS_ERR(req)) return PTR_ERR(req); diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 81187ba04e4a..bc0fd14b962a 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -46,6 +46,11 @@ doing the mount will be allowed to access the filesystem */ #define FUSE_ALLOW_OTHER (1 << 1) +/** If the FUSE_PRIV_XATTRS flag is given, then xattrs outside the + user.* namespace are allowed. This option is only allowed for + system root. */ +#define FUSE_PRIV_XATTRS (1 << 2) + /** Number of page pointers embedded in fuse_req */ #define FUSE_REQ_INLINE_PAGES 1 diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index b88b5a780228..6716b56d43a1 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -493,6 +493,7 @@ enum { OPT_ALLOW_OTHER, OPT_MAX_READ, OPT_BLKSIZE, + OPT_PRIV_XATTRS, OPT_ERR }; @@ -505,6 +506,7 @@ static const match_table_t tokens = { {OPT_ALLOW_OTHER, "allow_other"}, {OPT_MAX_READ, "max_read=%u"}, {OPT_BLKSIZE, "blksize=%u"}, + {OPT_PRIV_XATTRS, "priv_xattr"}, {OPT_ERR, NULL} }; @@ -592,6 +594,12 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev) d->blksize = value; break; + case OPT_PRIV_XATTRS: + if (!capable(CAP_SYS_ADMIN)) + return 0; + d->flags |= FUSE_PRIV_XATTRS; + break; + default: return 0; } ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-10-05 16:48 ` Seth Forshee @ 2014-10-06 16:00 ` Serge Hallyn 2014-10-06 16:31 ` Seth Forshee 0 siblings, 1 reply; 39+ messages in thread From: Serge Hallyn @ 2014-10-06 16:00 UTC (permalink / raw) To: Eric W. Biederman, Miklos Szeredi, Alexander Viro, fuse-devel, Kernel Mailing List, Linux-Fsdevel, Serge E. Hallyn Quoting Seth Forshee (seth.forshee@canonical.com): ... > After digging into this some more I think I agree with you. At minimum > letting users insert arbitrary xattrs via fuse bypasses the usual > restrictions on setting xattrs. This is probably mitigated by the > limited visibility of the fuse mount in the usual case for unprivileged > users, but it does seem like a bad idea fundamentally. > > So I was thinking of something like the following (untested) to let root > in the host support privileged xattrs while limiting unprivileged users > to user.*. Miklos, does this look acceptable or would you prefer > something different? So it won't be possible to set capabilities in a fuse fs? This may be necessary, but it will prevent i.e. live-iso builders from writing for instance a CAP_NET_RAW=pe (instead of setuid-root) /bin/ping in the iso. > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c > index e3123bfbc711..1a3ee5663dea 100644 > --- a/fs/fuse/dir.c > +++ b/fs/fuse/dir.c > @@ -1882,6 +1882,10 @@ static int fuse_setxattr(struct dentry *entry, const char *name, > if (fc->no_setxattr) > return -EOPNOTSUPP; > > + if (!(fc->flags & FUSE_PRIV_XATTRS) && > + strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0) > + return -EOPNOTSUPP; > + > req = fuse_get_req_nopages(fc); > if (IS_ERR(req)) > return PTR_ERR(req); > @@ -1925,6 +1929,10 @@ static ssize_t fuse_getxattr(struct dentry *entry, const char *name, > if (fc->no_getxattr) > return -EOPNOTSUPP; > > + if (!(fc->flags & FUSE_PRIV_XATTRS) && > + strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0) > + return -EOPNOTSUPP; > + > req = fuse_get_req_nopages(fc); > if (IS_ERR(req)) > return PTR_ERR(req); > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h > index 81187ba04e4a..bc0fd14b962a 100644 > --- a/fs/fuse/fuse_i.h > +++ b/fs/fuse/fuse_i.h > @@ -46,6 +46,11 @@ > doing the mount will be allowed to access the filesystem */ > #define FUSE_ALLOW_OTHER (1 << 1) > > +/** If the FUSE_PRIV_XATTRS flag is given, then xattrs outside the > + user.* namespace are allowed. This option is only allowed for > + system root. */ > +#define FUSE_PRIV_XATTRS (1 << 2) > + > /** Number of page pointers embedded in fuse_req */ > #define FUSE_REQ_INLINE_PAGES 1 > > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c > index b88b5a780228..6716b56d43a1 100644 > --- a/fs/fuse/inode.c > +++ b/fs/fuse/inode.c > @@ -493,6 +493,7 @@ enum { > OPT_ALLOW_OTHER, > OPT_MAX_READ, > OPT_BLKSIZE, > + OPT_PRIV_XATTRS, > OPT_ERR > }; > > @@ -505,6 +506,7 @@ static const match_table_t tokens = { > {OPT_ALLOW_OTHER, "allow_other"}, > {OPT_MAX_READ, "max_read=%u"}, > {OPT_BLKSIZE, "blksize=%u"}, > + {OPT_PRIV_XATTRS, "priv_xattr"}, > {OPT_ERR, NULL} > }; > > @@ -592,6 +594,12 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev) > d->blksize = value; > break; > > + case OPT_PRIV_XATTRS: > + if (!capable(CAP_SYS_ADMIN)) > + return 0; > + d->flags |= FUSE_PRIV_XATTRS; > + break; > + > default: > return 0; > } > ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-10-06 16:00 ` Serge Hallyn @ 2014-10-06 16:31 ` Seth Forshee 2014-10-06 16:36 ` Serge Hallyn 0 siblings, 1 reply; 39+ messages in thread From: Seth Forshee @ 2014-10-06 16:31 UTC (permalink / raw) To: Serge Hallyn Cc: Eric W. Biederman, Miklos Szeredi, Alexander Viro, fuse-devel, Kernel Mailing List, Linux-Fsdevel, Serge E. Hallyn On Mon, Oct 06, 2014 at 04:00:06PM +0000, Serge Hallyn wrote: > Quoting Seth Forshee (seth.forshee@canonical.com): > ... > > After digging into this some more I think I agree with you. At minimum > > letting users insert arbitrary xattrs via fuse bypasses the usual > > restrictions on setting xattrs. This is probably mitigated by the > > limited visibility of the fuse mount in the usual case for unprivileged > > users, but it does seem like a bad idea fundamentally. > > > > So I was thinking of something like the following (untested) to let root > > in the host support privileged xattrs while limiting unprivileged users > > to user.*. Miklos, does this look acceptable or would you prefer > > something different? > > So it won't be possible to set capabilities in a fuse fs? This may > be necessary, but it will prevent i.e. live-iso builders from writing > for instance a CAP_NET_RAW=pe (instead of setuid-root) /bin/ping in the > iso. cap_inode_setxattr() already requires CAP_SETFCAP in the host to do this, which I'd think root in in an unpriv container wouldn't have, so aren't you prevented from doing so already? I suppose the LSM could override this restriction though. > > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c > > index e3123bfbc711..1a3ee5663dea 100644 > > --- a/fs/fuse/dir.c > > +++ b/fs/fuse/dir.c > > @@ -1882,6 +1882,10 @@ static int fuse_setxattr(struct dentry *entry, const char *name, > > if (fc->no_setxattr) > > return -EOPNOTSUPP; > > > > + if (!(fc->flags & FUSE_PRIV_XATTRS) && > > + strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0) > > + return -EOPNOTSUPP; > > + > > req = fuse_get_req_nopages(fc); > > if (IS_ERR(req)) > > return PTR_ERR(req); > > @@ -1925,6 +1929,10 @@ static ssize_t fuse_getxattr(struct dentry *entry, const char *name, > > if (fc->no_getxattr) > > return -EOPNOTSUPP; > > > > + if (!(fc->flags & FUSE_PRIV_XATTRS) && > > + strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0) > > + return -EOPNOTSUPP; > > + > > req = fuse_get_req_nopages(fc); > > if (IS_ERR(req)) > > return PTR_ERR(req); > > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h > > index 81187ba04e4a..bc0fd14b962a 100644 > > --- a/fs/fuse/fuse_i.h > > +++ b/fs/fuse/fuse_i.h > > @@ -46,6 +46,11 @@ > > doing the mount will be allowed to access the filesystem */ > > #define FUSE_ALLOW_OTHER (1 << 1) > > > > +/** If the FUSE_PRIV_XATTRS flag is given, then xattrs outside the > > + user.* namespace are allowed. This option is only allowed for > > + system root. */ > > +#define FUSE_PRIV_XATTRS (1 << 2) > > + > > /** Number of page pointers embedded in fuse_req */ > > #define FUSE_REQ_INLINE_PAGES 1 > > > > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c > > index b88b5a780228..6716b56d43a1 100644 > > --- a/fs/fuse/inode.c > > +++ b/fs/fuse/inode.c > > @@ -493,6 +493,7 @@ enum { > > OPT_ALLOW_OTHER, > > OPT_MAX_READ, > > OPT_BLKSIZE, > > + OPT_PRIV_XATTRS, > > OPT_ERR > > }; > > > > @@ -505,6 +506,7 @@ static const match_table_t tokens = { > > {OPT_ALLOW_OTHER, "allow_other"}, > > {OPT_MAX_READ, "max_read=%u"}, > > {OPT_BLKSIZE, "blksize=%u"}, > > + {OPT_PRIV_XATTRS, "priv_xattr"}, > > {OPT_ERR, NULL} > > }; > > > > @@ -592,6 +594,12 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev) > > d->blksize = value; > > break; > > > > + case OPT_PRIV_XATTRS: > > + if (!capable(CAP_SYS_ADMIN)) > > + return 0; > > + d->flags |= FUSE_PRIV_XATTRS; > > + break; > > + > > default: > > return 0; > > } > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-10-06 16:31 ` Seth Forshee @ 2014-10-06 16:36 ` Serge Hallyn 0 siblings, 0 replies; 39+ messages in thread From: Serge Hallyn @ 2014-10-06 16:36 UTC (permalink / raw) To: Eric W. Biederman, Miklos Szeredi, Alexander Viro, fuse-devel, Kernel Mailing List, Linux-Fsdevel, Serge E. Hallyn Quoting Seth Forshee (seth.forshee@canonical.com): > On Mon, Oct 06, 2014 at 04:00:06PM +0000, Serge Hallyn wrote: > > Quoting Seth Forshee (seth.forshee@canonical.com): > > ... > > > After digging into this some more I think I agree with you. At minimum > > > letting users insert arbitrary xattrs via fuse bypasses the usual > > > restrictions on setting xattrs. This is probably mitigated by the > > > limited visibility of the fuse mount in the usual case for unprivileged > > > users, but it does seem like a bad idea fundamentally. > > > > > > So I was thinking of something like the following (untested) to let root > > > in the host support privileged xattrs while limiting unprivileged users > > > to user.*. Miklos, does this look acceptable or would you prefer > > > something different? > > > > So it won't be possible to set capabilities in a fuse fs? This may > > be necessary, but it will prevent i.e. live-iso builders from writing > > for instance a CAP_NET_RAW=pe (instead of setuid-root) /bin/ping in the > > iso. > > cap_inode_setxattr() already requires CAP_SETFCAP in the host to do > this, which I'd think root in in an unpriv container wouldn't have, so > aren't you prevented from doing so already? I suppose the LSM could > override this restriction though. It's true we'd have to complicated that path. And really I was being silly. It's not safe (at least not trivially). > > > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c > > > index e3123bfbc711..1a3ee5663dea 100644 > > > --- a/fs/fuse/dir.c > > > +++ b/fs/fuse/dir.c > > > @@ -1882,6 +1882,10 @@ static int fuse_setxattr(struct dentry *entry, const char *name, > > > if (fc->no_setxattr) > > > return -EOPNOTSUPP; > > > > > > + if (!(fc->flags & FUSE_PRIV_XATTRS) && > > > + strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0) > > > + return -EOPNOTSUPP; > > > + > > > req = fuse_get_req_nopages(fc); > > > if (IS_ERR(req)) > > > return PTR_ERR(req); > > > @@ -1925,6 +1929,10 @@ static ssize_t fuse_getxattr(struct dentry *entry, const char *name, > > > if (fc->no_getxattr) > > > return -EOPNOTSUPP; > > > > > > + if (!(fc->flags & FUSE_PRIV_XATTRS) && > > > + strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0) > > > + return -EOPNOTSUPP; > > > + > > > req = fuse_get_req_nopages(fc); > > > if (IS_ERR(req)) > > > return PTR_ERR(req); > > > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h > > > index 81187ba04e4a..bc0fd14b962a 100644 > > > --- a/fs/fuse/fuse_i.h > > > +++ b/fs/fuse/fuse_i.h > > > @@ -46,6 +46,11 @@ > > > doing the mount will be allowed to access the filesystem */ > > > #define FUSE_ALLOW_OTHER (1 << 1) > > > > > > +/** If the FUSE_PRIV_XATTRS flag is given, then xattrs outside the > > > + user.* namespace are allowed. This option is only allowed for > > > + system root. */ > > > +#define FUSE_PRIV_XATTRS (1 << 2) > > > + > > > /** Number of page pointers embedded in fuse_req */ > > > #define FUSE_REQ_INLINE_PAGES 1 > > > > > > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c > > > index b88b5a780228..6716b56d43a1 100644 > > > --- a/fs/fuse/inode.c > > > +++ b/fs/fuse/inode.c > > > @@ -493,6 +493,7 @@ enum { > > > OPT_ALLOW_OTHER, > > > OPT_MAX_READ, > > > OPT_BLKSIZE, > > > + OPT_PRIV_XATTRS, > > > OPT_ERR > > > }; > > > > > > @@ -505,6 +506,7 @@ static const match_table_t tokens = { > > > {OPT_ALLOW_OTHER, "allow_other"}, > > > {OPT_MAX_READ, "max_read=%u"}, > > > {OPT_BLKSIZE, "blksize=%u"}, > > > + {OPT_PRIV_XATTRS, "priv_xattr"}, > > > {OPT_ERR, NULL} > > > }; > > > > > > @@ -592,6 +594,12 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev) > > > d->blksize = value; > > > break; > > > > > > + case OPT_PRIV_XATTRS: > > > + if (!capable(CAP_SYS_ADMIN)) > > > + return 0; > > > + d->flags |= FUSE_PRIV_XATTRS; > > > + break; > > > + > > > default: > > > return 0; > > > } > > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-02 15:44 [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces Seth Forshee ` (4 preceding siblings ...) 2014-09-10 12:35 ` Seth Forshee @ 2014-09-23 16:07 ` Miklos Szeredi 2014-09-23 16:26 ` Seth Forshee 5 siblings, 1 reply; 39+ messages in thread From: Miklos Szeredi @ 2014-09-23 16:07 UTC (permalink / raw) To: Seth Forshee Cc: Alexander Viro, Eric W. Biederman, Serge Hallyn, fuse-devel, Kernel Mailing List, Linux-Fsdevel On Tue, Sep 2, 2014 at 5:44 PM, Seth Forshee <seth.forshee@canonical.com> wrote: > Here's an updated set of patches for allowing fuse mounts from pid and > user namespaces. I discussed some of the issues we debated with the last > patch set (and a few others) with Eric at LinuxCon, and the updates here > mainly reflect the outcome of those discussions. > > The stickiest issue in the v1 patches was the question of where to get > the user and pid namespaces from that are used for translating ids for > communication with userspace. Eric told me that for user namespaces at > least we need to grab a namespace at open or mount time and use only > that namespace to prevent certain types of attacks. I'm not convinced. Let us have the gory details, please. Thanks, Miklos ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-23 16:07 ` Miklos Szeredi @ 2014-09-23 16:26 ` Seth Forshee 2014-09-23 17:03 ` Miklos Szeredi 0 siblings, 1 reply; 39+ messages in thread From: Seth Forshee @ 2014-09-23 16:26 UTC (permalink / raw) To: Miklos Szeredi, Eric W. Biederman Cc: Alexander Viro, Serge Hallyn, fuse-devel, Kernel Mailing List, Linux-Fsdevel On Tue, Sep 23, 2014 at 06:07:35PM +0200, Miklos Szeredi wrote: > On Tue, Sep 2, 2014 at 5:44 PM, Seth Forshee <seth.forshee@canonical.com> wrote: > > Here's an updated set of patches for allowing fuse mounts from pid and > > user namespaces. I discussed some of the issues we debated with the last > > patch set (and a few others) with Eric at LinuxCon, and the updates here > > mainly reflect the outcome of those discussions. > > > > The stickiest issue in the v1 patches was the question of where to get > > the user and pid namespaces from that are used for translating ids for > > communication with userspace. Eric told me that for user namespaces at > > least we need to grab a namespace at open or mount time and use only > > that namespace to prevent certain types of attacks. > > I'm not convinced. Let us have the gory details, please. I'll do my best, and hopefully Eric will fill in any details I miss. I think there may have been more than one possible scenario that Eric was describing to me, but this is the one I remember. A user could create a namespace and mount a fuse filesystem without nosuid. It could then pass the /dev/fuse fd to a process in init_user_ns, which could expose a suid file owned by root (or any other user) and use it to gain elevated privileges. On the other hand, if file ownership is always interpreted in the context of the namespace from which the filesystem is mounted then suid can only be used to become another uid already under that user's control. Eric, does that sound right? Did I miss anything? Seth ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-23 16:26 ` Seth Forshee @ 2014-09-23 17:03 ` Miklos Szeredi 2014-09-23 17:33 ` Seth Forshee 2014-09-23 21:46 ` Eric W. Biederman 0 siblings, 2 replies; 39+ messages in thread From: Miklos Szeredi @ 2014-09-23 17:03 UTC (permalink / raw) To: Miklos Szeredi, Eric W. Biederman, Alexander Viro, Serge Hallyn, fuse-devel, Kernel Mailing List, Linux-Fsdevel On Tue, Sep 23, 2014 at 6:26 PM, Seth Forshee <seth.forshee@canonical.com> wrote: > On Tue, Sep 23, 2014 at 06:07:35PM +0200, Miklos Szeredi wrote: >> On Tue, Sep 2, 2014 at 5:44 PM, Seth Forshee <seth.forshee@canonical.com> wrote: >> > Here's an updated set of patches for allowing fuse mounts from pid and >> > user namespaces. I discussed some of the issues we debated with the last >> > patch set (and a few others) with Eric at LinuxCon, and the updates here >> > mainly reflect the outcome of those discussions. >> > >> > The stickiest issue in the v1 patches was the question of where to get >> > the user and pid namespaces from that are used for translating ids for >> > communication with userspace. Eric told me that for user namespaces at >> > least we need to grab a namespace at open or mount time and use only >> > that namespace to prevent certain types of attacks. >> >> I'm not convinced. Let us have the gory details, please. > > I'll do my best, and hopefully Eric will fill in any details I miss. > > I think there may have been more than one possible scenario that Eric > was describing to me, but this is the one I remember. A user could > create a namespace and mount a fuse filesystem without nosuid. It could > then pass the /dev/fuse fd to a process in init_user_ns, which could > expose a suid file owned by root (or any other user) and use it to gain > elevated privileges. > > On the other hand, if file ownership is always interpreted in the > context of the namespace from which the filesystem is mounted then suid > can only be used to become another uid already under that user's > control. Much simpler solution: don't allow SUID in unprivileged namespaces. SUID is a really ugly hack with many problems, just get rid of it. Thanks, Miklos ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-23 17:03 ` Miklos Szeredi @ 2014-09-23 17:33 ` Seth Forshee 2014-09-23 21:46 ` Eric W. Biederman 1 sibling, 0 replies; 39+ messages in thread From: Seth Forshee @ 2014-09-23 17:33 UTC (permalink / raw) To: Miklos Szeredi Cc: Eric W. Biederman, Alexander Viro, Serge Hallyn, fuse-devel, Kernel Mailing List, Linux-Fsdevel On Tue, Sep 23, 2014 at 07:03:47PM +0200, Miklos Szeredi wrote: > On Tue, Sep 23, 2014 at 6:26 PM, Seth Forshee > <seth.forshee@canonical.com> wrote: > > On Tue, Sep 23, 2014 at 06:07:35PM +0200, Miklos Szeredi wrote: > >> On Tue, Sep 2, 2014 at 5:44 PM, Seth Forshee <seth.forshee@canonical.com> wrote: > >> > Here's an updated set of patches for allowing fuse mounts from pid and > >> > user namespaces. I discussed some of the issues we debated with the last > >> > patch set (and a few others) with Eric at LinuxCon, and the updates here > >> > mainly reflect the outcome of those discussions. > >> > > >> > The stickiest issue in the v1 patches was the question of where to get > >> > the user and pid namespaces from that are used for translating ids for > >> > communication with userspace. Eric told me that for user namespaces at > >> > least we need to grab a namespace at open or mount time and use only > >> > that namespace to prevent certain types of attacks. > >> > >> I'm not convinced. Let us have the gory details, please. > > > > I'll do my best, and hopefully Eric will fill in any details I miss. > > > > I think there may have been more than one possible scenario that Eric > > was describing to me, but this is the one I remember. A user could > > create a namespace and mount a fuse filesystem without nosuid. It could > > then pass the /dev/fuse fd to a process in init_user_ns, which could > > expose a suid file owned by root (or any other user) and use it to gain > > elevated privileges. > > > > On the other hand, if file ownership is always interpreted in the > > context of the namespace from which the filesystem is mounted then suid > > can only be used to become another uid already under that user's > > control. > > Much simpler solution: don't allow SUID in unprivileged namespaces. > SUID is a really ugly hack with many problems, just get rid of it. Yeah, that's an option but will require some vfs support similar to with nodev, but I wouldn't call it simpler. The implementation of using a single namespace for all uid/gid translation is actually quite a bit simpler than trying to use the context of the /dev/fuse reads and writes. The reason is that much of the translation happens in the context of the process doing fs operations on the fuse mount, not in the context of the /dev/fuse reads and writes, so it involves a lot of grabbing references to namespaces and passing them around. Either that or moving that code to happen in the read/write path, which would also require some substantial changes. Thanks, Seth ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces 2014-09-23 17:03 ` Miklos Szeredi 2014-09-23 17:33 ` Seth Forshee @ 2014-09-23 21:46 ` Eric W. Biederman 1 sibling, 0 replies; 39+ messages in thread From: Eric W. Biederman @ 2014-09-23 21:46 UTC (permalink / raw) To: Miklos Szeredi Cc: Alexander Viro, Serge Hallyn, fuse-devel, Kernel Mailing List, Linux-Fsdevel Miklos Szeredi <miklos@szeredi.hu> writes: > On Tue, Sep 23, 2014 at 6:26 PM, Seth Forshee > <seth.forshee@canonical.com> wrote: >> On Tue, Sep 23, 2014 at 06:07:35PM +0200, Miklos Szeredi wrote: >>> On Tue, Sep 2, 2014 at 5:44 PM, Seth Forshee <seth.forshee@canonical.com> wrote: >>> > Here's an updated set of patches for allowing fuse mounts from pid and >>> > user namespaces. I discussed some of the issues we debated with the last >>> > patch set (and a few others) with Eric at LinuxCon, and the updates here >>> > mainly reflect the outcome of those discussions. >>> > >>> > The stickiest issue in the v1 patches was the question of where to get >>> > the user and pid namespaces from that are used for translating ids for >>> > communication with userspace. Eric told me that for user namespaces at >>> > least we need to grab a namespace at open or mount time and use only >>> > that namespace to prevent certain types of attacks. >>> >>> I'm not convinced. Let us have the gory details, please. >> >> I'll do my best, and hopefully Eric will fill in any details I miss. >> >> I think there may have been more than one possible scenario that Eric >> was describing to me, but this is the one I remember. A user could >> create a namespace and mount a fuse filesystem without nosuid. It could >> then pass the /dev/fuse fd to a process in init_user_ns, which could >> expose a suid file owned by root (or any other user) and use it to gain >> elevated privileges. >> >> On the other hand, if file ownership is always interpreted in the >> context of the namespace from which the filesystem is mounted then suid >> can only be used to become another uid already under that user's >> control. > > Much simpler solution: don't allow SUID in unprivileged namespaces. > SUID is a really ugly hack with many problems, just get rid of it. Except that doesn't solve the problem. The core problem is how do we avoid allowing letting a processes implementing a fuse filesystem to manipulate a process with privileges that it does not. The classic fuse solution to this is to only allow a single uid to access the fuse filesystem. With user namespaces we can relax that restriction when mounted with just user namespace root permissions to allow a filesystem to use any uids mapped into the user namespace, as the mounter of the filesystem. The user namespace root that is mounting the filesystem already has privileges to manipulate those users already. Which means the simple straight forward and understandable restriction to enforce is: - The user namespace of the opener of /dev/fuse is the same as the user namespace the fuse filesystem is mounted in is the same as the user namespace we translate any uids in and out of. By forbidding all uid s and gids not mapped into a user namespace from being used you can not manipulate processes you would not be able to manipulate before. As a nice side effect that happens to make having the setuid bit set something of a fuse filesystem something that is uninteresting. At that point the setuid bit being set is no more dangerous than the setuid being being set on any file you own. Eric ^ permalink raw reply [flat|nested] 39+ messages in thread
end of thread, other threads:[~2014-10-06 16:37 UTC | newest] Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-09-02 15:44 [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces Seth Forshee 2014-09-02 15:44 ` [PATCH v2 1/3] vfs: Check for invalid i_uid in may_follow_link() Seth Forshee 2014-09-05 17:05 ` Serge Hallyn 2014-09-05 19:00 ` Seth Forshee 2014-09-05 19:23 ` Serge Hallyn 2014-09-02 15:44 ` [PATCH v2 2/3] fuse: Translate pids passed to userspace into pid namespaces Seth Forshee 2014-09-05 17:10 ` Serge Hallyn 2014-09-02 15:44 ` [PATCH v2 3/3] fuse: Add support for mounts from user namespaces Seth Forshee 2014-09-05 16:48 ` Serge Hallyn 2014-09-05 17:36 ` Seth Forshee 2014-09-05 19:25 ` Serge Hallyn 2014-09-05 20:40 ` [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces Seth Forshee 2014-09-10 12:35 ` Seth Forshee 2014-09-10 16:21 ` Serge E. Hallyn 2014-09-10 16:42 ` Seth Forshee 2014-09-11 18:10 ` Seth Forshee 2014-09-23 22:29 ` Eric W. Biederman 2014-09-24 13:29 ` Seth Forshee 2014-09-24 17:10 ` Eric W. Biederman 2014-09-25 15:04 ` Miklos Szeredi 2014-09-25 16:21 ` Seth Forshee 2014-09-25 18:05 ` Eric W. Biederman 2014-09-25 18:44 ` Seth Forshee 2014-09-25 18:53 ` Seth Forshee 2014-09-25 19:14 ` Eric W. Biederman 2014-09-25 19:48 ` Seth Forshee 2014-09-27 1:41 ` Eric W. Biederman 2014-09-27 4:24 ` Seth Forshee 2014-09-29 19:34 ` Eric W. Biederman 2014-09-30 16:25 ` Seth Forshee 2014-10-05 16:48 ` Seth Forshee 2014-10-06 16:00 ` Serge Hallyn 2014-10-06 16:31 ` Seth Forshee 2014-10-06 16:36 ` Serge Hallyn 2014-09-23 16:07 ` Miklos Szeredi 2014-09-23 16:26 ` Seth Forshee 2014-09-23 17:03 ` Miklos Szeredi 2014-09-23 17:33 ` Seth Forshee 2014-09-23 21:46 ` Eric W. Biederman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).