[PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
@ 2014-09-02 15:44 Seth Forshee
  2014-09-02 15:44 ` [PATCH v2 1/3] vfs: Check for invalid i_uid in may_follow_link() Seth Forshee
                   ` (5 more replies)
  0 siblings, 6 replies; 39+ messages in thread
From: Seth Forshee @ 2014-09-02 15:44 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Alexander Viro, Eric W. Biederman, Serge Hallyn, fuse-devel,
	linux-kernel, linux-fsdevel, Seth Forshee

Here's an updated set of patches for allowing fuse mounts from pid and
user namespaces. I discussed some of the issues we debated with the last
patch set (and a few others) with Eric at LinuxCon, and the updates here
mainly reflect the outcome of those discussions.

The stickiest issue in the v1 patches was the question of where to get
the user and pid namespaces from that are used for translating ids for
communication with userspace. Eric told me that for user namespaces at
least we need to grab a namespace at open or mount time and use only
that namespace to prevent certain types of attacks. That rules out the
suggestion of using the user ns of current in the read/write paths, and
I think it makes sense to handle pid and user namespaces similarly. So
in these patches I'm still grabbing the namespaces of current during
mount, but I've added an additional check to fail the mount if the
f_cred's userns for the fd to userspace doesn't match.

Another issue mentioned by Eric was what to use for i_[ug]id if the ids
from userspace don't map into the user namespace, which is going to be a
problem for any other filesystems which become mountable from user
namespaces as well. We discussed a few options for addressing this, the
most promising of which seems to be either using INVALID_[UG]ID for
these inodes or creating vfs-wide "nobody" ids for this purpose. After
thinking about it for a while I'm favoring using the invalid ids, but
I'm hoping to solicit some more feedback.

For now these patches are using invalid ids if the user doesn't map into
the namespace. I went through the vfs code and found one place where
this could be handled better (addressed in patch 1 of the series). The
only other issue I found was that currently no one, not even root, can
change onwership of such inodes, but I suspect we can find a way around
this.

The only other change since v1 is that I now fail changing file
ownership if the new uid or gid does not map into the namespace used for
userspace communication.

Thanks,
Seth

Seth Forshee (3):
  vfs: Check for invalid i_uid in may_follow_link()
  fuse: Translate pids passed to userspace into pid namespaces
  fuse: Add support for mounts from user namespaces

 fs/fuse/dev.c    | 13 +++++++------
 fs/fuse/dir.c    | 46 +++++++++++++++++++++++++++++++++-------------
 fs/fuse/fuse_i.h |  8 ++++++++
 fs/fuse/inode.c  | 31 +++++++++++++++++++++++--------
 fs/namei.c       |  2 +-
 5 files changed, 72 insertions(+), 28 deletions(-)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v2 1/3] vfs: Check for invalid i_uid in may_follow_link()
  2014-09-02 15:44 [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces Seth Forshee
@ 2014-09-02 15:44 ` Seth Forshee
  2014-09-05 17:05   ` Serge Hallyn
  2014-09-02 15:44 ` [PATCH v2 2/3] fuse: Translate pids passed to userspace into pid namespaces Seth Forshee
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 39+ messages in thread
From: Seth Forshee @ 2014-09-02 15:44 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Alexander Viro, Eric W. Biederman, Serge Hallyn, fuse-devel,
	linux-kernel, linux-fsdevel, Seth Forshee

Filesystem uids which don't map into a user namespace may result
in inode->i_uid being INVALID_UID. A symlink and its parent
could have different owners in the filesystem can both get
mapped to INVALID_UID, which may result in following a symlink
when this would not have otherwise been permitted. Prevent this
by adding a check that the uid is valid before the comparison.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
---
 fs/namei.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index a996bb48dfab..193da09e903e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -741,7 +741,7 @@ static inline int may_follow_link(struct path *link, struct nameidata *nd)
 		return 0;
 
 	/* Allowed if parent directory and link owner match. */
-	if (uid_eq(parent->i_uid, inode->i_uid))
+	if (uid_valid(inode->i_uid) && uid_eq(parent->i_uid, inode->i_uid))
 		return 0;
 
 	audit_log_link_denied("follow_link", link);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 1/3] vfs: Check for invalid i_uid in may_follow_link()
  2014-09-02 15:44 ` [PATCH v2 1/3] vfs: Check for invalid i_uid in may_follow_link() Seth Forshee
@ 2014-09-05 17:05   ` Serge Hallyn
  2014-09-05 19:00     ` Seth Forshee
  0 siblings, 1 reply; 39+ messages in thread
From: Serge Hallyn @ 2014-09-05 17:05 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Miklos Szeredi, Alexander Viro, Eric W. Biederman, fuse-devel,
	linux-kernel, linux-fsdevel

Quoting Seth Forshee (seth.forshee@canonical.com):
> Filesystem uids which don't map into a user namespace may result
> in inode->i_uid being INVALID_UID. A symlink and its parent
> could have different owners in the filesystem can both get
> mapped to INVALID_UID, which may result in following a symlink
> when this would not have otherwise been permitted. Prevent this
> by adding a check that the uid is valid before the comparison.
> 
> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

I'm a bit uncomfortable about this, but I can't put my finger
on why.  Wonder if it could mess up root looking into
a malicious user's task by looking under /proc/self/root.
I suppose not, as this should only be the case (with root in
init_user_ns) for fuse?

Anyway it seems needed for keeping root from falling into a trap.

Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

> ---
>  fs/namei.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index a996bb48dfab..193da09e903e 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -741,7 +741,7 @@ static inline int may_follow_link(struct path *link, struct nameidata *nd)
>  		return 0;
>  
>  	/* Allowed if parent directory and link owner match. */
> -	if (uid_eq(parent->i_uid, inode->i_uid))
> +	if (uid_valid(inode->i_uid) && uid_eq(parent->i_uid, inode->i_uid))
>  		return 0;
>  
>  	audit_log_link_denied("follow_link", link);
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 1/3] vfs: Check for invalid i_uid in may_follow_link()
  2014-09-05 17:05   ` Serge Hallyn
@ 2014-09-05 19:00     ` Seth Forshee
  2014-09-05 19:23       ` Serge Hallyn
  0 siblings, 1 reply; 39+ messages in thread
From: Seth Forshee @ 2014-09-05 19:00 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Miklos Szeredi, Alexander Viro, Eric W. Biederman, fuse-devel,
	linux-kernel, linux-fsdevel, seth.forshee

On Fri, Sep 05, 2014 at 05:05:09PM +0000, Serge Hallyn wrote:
> Quoting Seth Forshee (seth.forshee@canonical.com):
> > Filesystem uids which don't map into a user namespace may result
> > in inode->i_uid being INVALID_UID. A symlink and its parent
> > could have different owners in the filesystem can both get
> > mapped to INVALID_UID, which may result in following a symlink
> > when this would not have otherwise been permitted. Prevent this
> > by adding a check that the uid is valid before the comparison.
> > 
> > Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> 
> I'm a bit uncomfortable about this, but I can't put my finger
> on why.  Wonder if it could mess up root looking into
> a malicious user's task by looking under /proc/self/root.
> I suppose not, as this should only be the case (with root in
> init_user_ns) for fuse?
> 
> Anyway it seems needed for keeping root from falling into a trap.

This shouldn't mess up looking into /proc/self/root or anything else
where the symlink isn't under control of the malicious user. Plus this
restriction only applies to world-writable directories with the sticky
bit set.

> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
> 
> > ---
> >  fs/namei.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/namei.c b/fs/namei.c
> > index a996bb48dfab..193da09e903e 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -741,7 +741,7 @@ static inline int may_follow_link(struct path *link, struct nameidata *nd)
> >  		return 0;
> >  
> >  	/* Allowed if parent directory and link owner match. */
> > -	if (uid_eq(parent->i_uid, inode->i_uid))
> > +	if (uid_valid(inode->i_uid) && uid_eq(parent->i_uid, inode->i_uid))
> >  		return 0;
> >  
> >  	audit_log_link_denied("follow_link", link);
> > -- 
> > 1.9.1
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 1/3] vfs: Check for invalid i_uid in may_follow_link()
  2014-09-05 19:00     ` Seth Forshee
@ 2014-09-05 19:23       ` Serge Hallyn
  0 siblings, 0 replies; 39+ messages in thread
From: Serge Hallyn @ 2014-09-05 19:23 UTC (permalink / raw)
  To: Miklos Szeredi, Alexander Viro, Eric W. Biederman, fuse-devel,
	linux-kernel, linux-fsdevel

Quoting Seth Forshee (seth.forshee@canonical.com):
> On Fri, Sep 05, 2014 at 05:05:09PM +0000, Serge Hallyn wrote:
> > Quoting Seth Forshee (seth.forshee@canonical.com):
> > > Filesystem uids which don't map into a user namespace may result
> > > in inode->i_uid being INVALID_UID. A symlink and its parent
> > > could have different owners in the filesystem can both get
> > > mapped to INVALID_UID, which may result in following a symlink
> > > when this would not have otherwise been permitted. Prevent this
> > > by adding a check that the uid is valid before the comparison.
> > > 
> > > Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> > 
> > I'm a bit uncomfortable about this, but I can't put my finger
> > on why.  Wonder if it could mess up root looking into
> > a malicious user's task by looking under /proc/self/root.
> > I suppose not, as this should only be the case (with root in
> > init_user_ns) for fuse?
> > 
> > Anyway it seems needed for keeping root from falling into a trap.
> 
> This shouldn't mess up looking into /proc/self/root or anything else
> where the symlink isn't under control of the malicious user. Plus this
> restriction only applies to world-writable directories with the sticky
> bit set.

Oh - yeah, i saw that and glossed over it for some reason.  Thanks.

> > Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
> > 
> > > ---
> > >  fs/namei.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/namei.c b/fs/namei.c
> > > index a996bb48dfab..193da09e903e 100644
> > > --- a/fs/namei.c
> > > +++ b/fs/namei.c
> > > @@ -741,7 +741,7 @@ static inline int may_follow_link(struct path *link, struct nameidata *nd)
> > >  		return 0;
> > >  
> > >  	/* Allowed if parent directory and link owner match. */
> > > -	if (uid_eq(parent->i_uid, inode->i_uid))
> > > +	if (uid_valid(inode->i_uid) && uid_eq(parent->i_uid, inode->i_uid))
> > >  		return 0;
> > >  
> > >  	audit_log_link_denied("follow_link", link);
> > > -- 
> > > 1.9.1
> > > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v2 2/3] fuse: Translate pids passed to userspace into pid namespaces
  2014-09-02 15:44 [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces Seth Forshee
  2014-09-02 15:44 ` [PATCH v2 1/3] vfs: Check for invalid i_uid in may_follow_link() Seth Forshee
@ 2014-09-02 15:44 ` Seth Forshee
  2014-09-05 17:10   ` Serge Hallyn
  2014-09-02 15:44 ` [PATCH v2 3/3] fuse: Add support for mounts from user namespaces Seth Forshee
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 39+ messages in thread
From: Seth Forshee @ 2014-09-02 15:44 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Alexander Viro, Eric W. Biederman, Serge Hallyn, fuse-devel,
	linux-kernel, linux-fsdevel, Seth Forshee

If the process reading on the fuse fd is executing in a pid
namespace then giving it the global pid of the process making
a request doesn't make sense. Instead, capture the pid namespace
when the filesystem is first mounted and translate pids into this
namespace before passing them to userspace.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
---
 fs/fuse/dev.c    | 9 +++++----
 fs/fuse/fuse_i.h | 4 ++++
 fs/fuse/inode.c  | 3 +++
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index ca887314aba9..839caebd34f1 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -20,6 +20,7 @@
 #include <linux/swap.h>
 #include <linux/splice.h>
 #include <linux/aio.h>
+#include <linux/sched.h>
 
 MODULE_ALIAS_MISCDEV(FUSE_MINOR);
 MODULE_ALIAS("devname:fuse");
@@ -124,11 +125,11 @@ static void __fuse_put_request(struct fuse_req *req)
 	atomic_dec(&req->count);
 }
 
-static void fuse_req_init_context(struct fuse_req *req)
+static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req)
 {
 	req->in.h.uid = from_kuid_munged(&init_user_ns, current_fsuid());
 	req->in.h.gid = from_kgid_munged(&init_user_ns, current_fsgid());
-	req->in.h.pid = current->pid;
+	req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns);
 }
 
 static bool fuse_block_alloc(struct fuse_conn *fc, bool for_background)
@@ -168,7 +169,7 @@ static struct fuse_req *__fuse_get_req(struct fuse_conn *fc, unsigned npages,
 		goto out;
 	}
 
-	fuse_req_init_context(req);
+	fuse_req_init_context(fc, req);
 	req->waiting = 1;
 	req->background = for_background;
 	return req;
@@ -257,7 +258,7 @@ struct fuse_req *fuse_get_req_nofail_nopages(struct fuse_conn *fc,
 	if (!req)
 		req = get_reserved_req(fc, file);
 
-	fuse_req_init_context(req);
+	fuse_req_init_context(fc, req);
 	req->waiting = 1;
 	req->background = 0;
 	return req;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e8e47a6ab518..a3ded071e2c6 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -22,6 +22,7 @@
 #include <linux/rbtree.h>
 #include <linux/poll.h>
 #include <linux/workqueue.h>
+#include <linux/pid_namespace.h>
 
 /** Max number of pages that can be used in a single read request */
 #define FUSE_MAX_PAGES_PER_REQ 32
@@ -386,6 +387,9 @@ struct fuse_conn {
 	/** The group id for this mount */
 	kgid_t group_id;
 
+	/** The pid namespace for this mount */
+	struct pid_namespace *pid_ns;
+
 	/** The fuse mount flags for this mount */
 	unsigned flags;
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 03246cd9d47a..c6d8473b1a80 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -20,6 +20,7 @@
 #include <linux/random.h>
 #include <linux/sched.h>
 #include <linux/exportfs.h>
+#include <linux/pid_namespace.h>
 
 MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>");
 MODULE_DESCRIPTION("Filesystem in Userspace");
@@ -616,6 +617,7 @@ void fuse_conn_init(struct fuse_conn *fc)
 	fc->initialized = 0;
 	fc->attr_version = 1;
 	get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key));
+	fc->pid_ns = get_pid_ns(task_active_pid_ns(current));
 }
 EXPORT_SYMBOL_GPL(fuse_conn_init);
 
@@ -953,6 +955,7 @@ static void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req)
 
 static void fuse_free_conn(struct fuse_conn *fc)
 {
+	put_pid_ns(fc->pid_ns);
 	kfree_rcu(fc, rcu);
 }
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/3] fuse: Translate pids passed to userspace into pid namespaces
  2014-09-02 15:44 ` [PATCH v2 2/3] fuse: Translate pids passed to userspace into pid namespaces Seth Forshee
@ 2014-09-05 17:10   ` Serge Hallyn
  0 siblings, 0 replies; 39+ messages in thread
From: Serge Hallyn @ 2014-09-05 17:10 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Miklos Szeredi, Alexander Viro, Eric W. Biederman, fuse-devel,
	linux-kernel, linux-fsdevel

Quoting Seth Forshee (seth.forshee@canonical.com):
> If the process reading on the fuse fd is executing in a pid
> namespace then giving it the global pid of the process making
> a request doesn't make sense. Instead, capture the pid namespace
> when the filesystem is first mounted and translate pids into this
> namespace before passing them to userspace.
> 
> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

> ---
>  fs/fuse/dev.c    | 9 +++++----
>  fs/fuse/fuse_i.h | 4 ++++
>  fs/fuse/inode.c  | 3 +++
>  3 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index ca887314aba9..839caebd34f1 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -20,6 +20,7 @@
>  #include <linux/swap.h>
>  #include <linux/splice.h>
>  #include <linux/aio.h>
> +#include <linux/sched.h>
>  
>  MODULE_ALIAS_MISCDEV(FUSE_MINOR);
>  MODULE_ALIAS("devname:fuse");
> @@ -124,11 +125,11 @@ static void __fuse_put_request(struct fuse_req *req)
>  	atomic_dec(&req->count);
>  }
>  
> -static void fuse_req_init_context(struct fuse_req *req)
> +static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req)
>  {
>  	req->in.h.uid = from_kuid_munged(&init_user_ns, current_fsuid());
>  	req->in.h.gid = from_kgid_munged(&init_user_ns, current_fsgid());
> -	req->in.h.pid = current->pid;
> +	req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns);
>  }
>  
>  static bool fuse_block_alloc(struct fuse_conn *fc, bool for_background)
> @@ -168,7 +169,7 @@ static struct fuse_req *__fuse_get_req(struct fuse_conn *fc, unsigned npages,
>  		goto out;
>  	}
>  
> -	fuse_req_init_context(req);
> +	fuse_req_init_context(fc, req);
>  	req->waiting = 1;
>  	req->background = for_background;
>  	return req;
> @@ -257,7 +258,7 @@ struct fuse_req *fuse_get_req_nofail_nopages(struct fuse_conn *fc,
>  	if (!req)
>  		req = get_reserved_req(fc, file);
>  
> -	fuse_req_init_context(req);
> +	fuse_req_init_context(fc, req);
>  	req->waiting = 1;
>  	req->background = 0;
>  	return req;
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index e8e47a6ab518..a3ded071e2c6 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -22,6 +22,7 @@
>  #include <linux/rbtree.h>
>  #include <linux/poll.h>
>  #include <linux/workqueue.h>
> +#include <linux/pid_namespace.h>
>  
>  /** Max number of pages that can be used in a single read request */
>  #define FUSE_MAX_PAGES_PER_REQ 32
> @@ -386,6 +387,9 @@ struct fuse_conn {
>  	/** The group id for this mount */
>  	kgid_t group_id;
>  
> +	/** The pid namespace for this mount */
> +	struct pid_namespace *pid_ns;
> +
>  	/** The fuse mount flags for this mount */
>  	unsigned flags;
>  
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 03246cd9d47a..c6d8473b1a80 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -20,6 +20,7 @@
>  #include <linux/random.h>
>  #include <linux/sched.h>
>  #include <linux/exportfs.h>
> +#include <linux/pid_namespace.h>
>  
>  MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>");
>  MODULE_DESCRIPTION("Filesystem in Userspace");
> @@ -616,6 +617,7 @@ void fuse_conn_init(struct fuse_conn *fc)
>  	fc->initialized = 0;
>  	fc->attr_version = 1;
>  	get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key));
> +	fc->pid_ns = get_pid_ns(task_active_pid_ns(current));
>  }
>  EXPORT_SYMBOL_GPL(fuse_conn_init);
>  
> @@ -953,6 +955,7 @@ static void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req)
>  
>  static void fuse_free_conn(struct fuse_conn *fc)
>  {
> +	put_pid_ns(fc->pid_ns);
>  	kfree_rcu(fc, rcu);
>  }
>  
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v2 3/3] fuse: Add support for mounts from user namespaces
  2014-09-02 15:44 [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces Seth Forshee
  2014-09-02 15:44 ` [PATCH v2 1/3] vfs: Check for invalid i_uid in may_follow_link() Seth Forshee
  2014-09-02 15:44 ` [PATCH v2 2/3] fuse: Translate pids passed to userspace into pid namespaces Seth Forshee
@ 2014-09-02 15:44 ` Seth Forshee
  2014-09-05 16:48   ` Serge Hallyn
  2014-09-05 20:40 ` [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces Seth Forshee
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 39+ messages in thread
From: Seth Forshee @ 2014-09-02 15:44 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Alexander Viro, Eric W. Biederman, Serge Hallyn, fuse-devel,
	linux-kernel, linux-fsdevel, Seth Forshee

Update fuse to support mounts from within user namespaces. This
is mostly a matter of translating uids and gids into the
namespace of the process reading requests before handing the
requests off to userspace.

Due to security concerns the namespace used should be fixed,
otherwise a user might be able to pass the fuse fd to
init_user_ns and inject suid files owned by a user outside the
namespace in order to gain elevated privileges. For fuse we
stash current_user_ns() when a filesystem is first mounted and
abort the mount if this namespace is different than the one used
to open the fd passed in the mount options.

The allow_others options could also be a problem, as a userns
mount could bypass system policy for this option and thus open
the possiblity of DoS attacks. This is prevented by restricting
the scope of allow_other to apply only to that superblock's
userns and its children, giving the expected behavior within the
userns while preventing DoS attacks on more privileged contexts.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
---
 fs/fuse/dev.c    |  4 ++--
 fs/fuse/dir.c    | 46 +++++++++++++++++++++++++++++++++-------------
 fs/fuse/fuse_i.h |  4 ++++
 fs/fuse/inode.c  | 28 ++++++++++++++++++++--------
 4 files changed, 59 insertions(+), 23 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 839caebd34f1..03c8785ed731 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -127,8 +127,8 @@ static void __fuse_put_request(struct fuse_req *req)
 
 static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req)
 {
-	req->in.h.uid = from_kuid_munged(&init_user_ns, current_fsuid());
-	req->in.h.gid = from_kgid_munged(&init_user_ns, current_fsgid());
+	req->in.h.uid = from_kuid_munged(fc->user_ns, current_fsuid());
+	req->in.h.gid = from_kgid_munged(fc->user_ns, current_fsgid());
 	req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns);
 }
 
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index de1d84af9f7c..c0b9968db6a1 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -905,8 +905,8 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
 	stat->ino = attr->ino;
 	stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
 	stat->nlink = attr->nlink;
-	stat->uid = make_kuid(&init_user_ns, attr->uid);
-	stat->gid = make_kgid(&init_user_ns, attr->gid);
+	stat->uid = make_kuid(fc->user_ns, attr->uid);
+	stat->gid = make_kgid(fc->user_ns, attr->gid);
 	stat->rdev = inode->i_rdev;
 	stat->atime.tv_sec = attr->atime;
 	stat->atime.tv_nsec = attr->atimensec;
@@ -1085,12 +1085,20 @@ int fuse_reverse_inval_entry(struct super_block *sb, u64 parent_nodeid,
  */
 int fuse_allow_current_process(struct fuse_conn *fc)
 {
-	const struct cred *cred;
+	const struct cred *cred = current_cred();
 
-	if (fc->flags & FUSE_ALLOW_OTHER)
-		return 1;
+	if (fc->flags & FUSE_ALLOW_OTHER) {
+		if (kuid_has_mapping(fc->user_ns, cred->euid) &&
+		    kuid_has_mapping(fc->user_ns, cred->suid) &&
+		    kuid_has_mapping(fc->user_ns, cred->uid) &&
+		    kgid_has_mapping(fc->user_ns, cred->egid) &&
+		    kgid_has_mapping(fc->user_ns, cred->sgid) &&
+		    kgid_has_mapping(fc->user_ns, cred->gid))
+			return 1;
+
+		return 0;
+	}
 
-	cred = current_cred();
 	if (uid_eq(cred->euid, fc->user_id) &&
 	    uid_eq(cred->suid, fc->user_id) &&
 	    uid_eq(cred->uid,  fc->user_id) &&
@@ -1556,17 +1564,25 @@ static bool update_mtime(unsigned ivalid, bool trust_local_mtime)
 	return true;
 }
 
-static void iattr_to_fattr(struct iattr *iattr, struct fuse_setattr_in *arg,
-			   bool trust_local_cmtime)
+static int iattr_to_fattr(struct fuse_conn *fc, struct iattr *iattr,
+			   struct fuse_setattr_in *arg, bool trust_local_cmtime)
 {
 	unsigned ivalid = iattr->ia_valid;
 
 	if (ivalid & ATTR_MODE)
 		arg->valid |= FATTR_MODE,   arg->mode = iattr->ia_mode;
-	if (ivalid & ATTR_UID)
-		arg->valid |= FATTR_UID,    arg->uid = from_kuid(&init_user_ns, iattr->ia_uid);
-	if (ivalid & ATTR_GID)
-		arg->valid |= FATTR_GID,    arg->gid = from_kgid(&init_user_ns, iattr->ia_gid);
+	if (ivalid & ATTR_UID) {
+		arg->uid = from_kuid(fc->user_ns, iattr->ia_uid);
+		if (arg->uid == (uid_t)-1)
+			return -EINVAL;
+		arg->valid |= FATTR_UID;
+	}
+	if (ivalid & ATTR_GID) {
+		arg->gid = from_kgid(fc->user_ns, iattr->ia_gid);
+		if (arg->gid == (gid_t)-1)
+			return -EINVAL;
+		arg->valid |= FATTR_GID;
+	}
 	if (ivalid & ATTR_SIZE)
 		arg->valid |= FATTR_SIZE,   arg->size = iattr->ia_size;
 	if (ivalid & ATTR_ATIME) {
@@ -1588,6 +1604,8 @@ static void iattr_to_fattr(struct iattr *iattr, struct fuse_setattr_in *arg,
 		arg->ctime = iattr->ia_ctime.tv_sec;
 		arg->ctimensec = iattr->ia_ctime.tv_nsec;
 	}
+
+	return 0;
 }
 
 /*
@@ -1741,7 +1759,9 @@ int fuse_do_setattr(struct inode *inode, struct iattr *attr,
 
 	memset(&inarg, 0, sizeof(inarg));
 	memset(&outarg, 0, sizeof(outarg));
-	iattr_to_fattr(attr, &inarg, trust_local_cmtime);
+	err = iattr_to_fattr(fc, attr, &inarg, trust_local_cmtime);
+	if (err)
+		goto error;
 	if (file) {
 		struct fuse_file *ff = file->private_data;
 		inarg.valid |= FATTR_FH;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index a3ded071e2c6..2cfd0ca3407a 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -22,6 +22,7 @@
 #include <linux/rbtree.h>
 #include <linux/poll.h>
 #include <linux/workqueue.h>
+#include <linux/user_namespace.h>
 #include <linux/pid_namespace.h>
 
 /** Max number of pages that can be used in a single read request */
@@ -387,6 +388,9 @@ struct fuse_conn {
 	/** The group id for this mount */
 	kgid_t group_id;
 
+	/** The user namespace for this mount */
+	struct user_namespace *user_ns;
+
 	/** The pid namespace for this mount */
 	struct pid_namespace *pid_ns;
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index c6d8473b1a80..f3a3ded82f85 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -167,8 +167,8 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
 	inode->i_ino     = fuse_squash_ino(attr->ino);
 	inode->i_mode    = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
 	set_nlink(inode, attr->nlink);
-	inode->i_uid     = make_kuid(&init_user_ns, attr->uid);
-	inode->i_gid     = make_kgid(&init_user_ns, attr->gid);
+	inode->i_uid     = make_kuid(fc->user_ns, attr->uid);
+	inode->i_gid     = make_kgid(fc->user_ns, attr->gid);
 	inode->i_blocks  = attr->blocks;
 	inode->i_atime.tv_sec   = attr->atime;
 	inode->i_atime.tv_nsec  = attr->atimensec;
@@ -496,6 +496,8 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
 	memset(d, 0, sizeof(struct fuse_mount_data));
 	d->max_read = ~0;
 	d->blksize = FUSE_DEFAULT_BLKSIZE;
+	d->user_id = make_kuid(current_user_ns(), 0);
+	d->group_id = make_kgid(current_user_ns(), 0);
 
 	while ((p = strsep(&opt, ",")) != NULL) {
 		int token;
@@ -578,8 +580,10 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
 	struct super_block *sb = root->d_sb;
 	struct fuse_conn *fc = get_fuse_conn_super(sb);
 
-	seq_printf(m, ",user_id=%u", from_kuid_munged(&init_user_ns, fc->user_id));
-	seq_printf(m, ",group_id=%u", from_kgid_munged(&init_user_ns, fc->group_id));
+	seq_printf(m, ",user_id=%u",
+		   from_kuid_munged(fc->user_ns, fc->user_id));
+	seq_printf(m, ",group_id=%u",
+		   from_kgid_munged(fc->user_ns, fc->group_id));
 	if (fc->flags & FUSE_DEFAULT_PERMISSIONS)
 		seq_puts(m, ",default_permissions");
 	if (fc->flags & FUSE_ALLOW_OTHER)
@@ -618,6 +622,7 @@ void fuse_conn_init(struct fuse_conn *fc)
 	fc->attr_version = 1;
 	get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key));
 	fc->pid_ns = get_pid_ns(task_active_pid_ns(current));
+	fc->user_ns = get_user_ns(current_user_ns());
 }
 EXPORT_SYMBOL_GPL(fuse_conn_init);
 
@@ -956,6 +961,7 @@ static void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req)
 static void fuse_free_conn(struct fuse_conn *fc)
 {
 	put_pid_ns(fc->pid_ns);
+	put_user_ns(fc->user_ns);
 	kfree_rcu(fc, rcu);
 }
 
@@ -1042,8 +1048,14 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
 	if (!file)
 		goto err;
 
-	if ((file->f_op != &fuse_dev_operations) ||
-	    (file->f_cred->user_ns != &init_user_ns))
+	/*
+	 * Require mount to happen from the same user namespace which
+	 * opened /dev/fuse, otherwise users might be able to
+	 * elevate privileges by opening in init_user_ns then
+	 * mounting from a different namespace without MNT_NOSUID.
+	 */
+	if (file->f_op != &fuse_dev_operations ||
+	    file->f_cred->user_ns != current_user_ns())
 		goto err_fput;
 
 	fc = kmalloc(sizeof(*fc), GFP_KERNEL);
@@ -1157,7 +1169,7 @@ static void fuse_kill_sb_anon(struct super_block *sb)
 static struct file_system_type fuse_fs_type = {
 	.owner		= THIS_MODULE,
 	.name		= "fuse",
-	.fs_flags	= FS_HAS_SUBTYPE,
+	.fs_flags	= FS_HAS_SUBTYPE | FS_USERNS_MOUNT,
 	.mount		= fuse_mount,
 	.kill_sb	= fuse_kill_sb_anon,
 };
@@ -1189,7 +1201,7 @@ static struct file_system_type fuseblk_fs_type = {
 	.name		= "fuseblk",
 	.mount		= fuse_mount_blk,
 	.kill_sb	= fuse_kill_sb_blk,
-	.fs_flags	= FS_REQUIRES_DEV | FS_HAS_SUBTYPE,
+	.fs_flags	= FS_REQUIRES_DEV | FS_HAS_SUBTYPE | FS_USERNS_MOUNT,
 };
 MODULE_ALIAS_FS("fuseblk");
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 3/3] fuse: Add support for mounts from user namespaces
  2014-09-02 15:44 ` [PATCH v2 3/3] fuse: Add support for mounts from user namespaces Seth Forshee
@ 2014-09-05 16:48   ` Serge Hallyn
  2014-09-05 17:36     ` Seth Forshee
  0 siblings, 1 reply; 39+ messages in thread
From: Serge Hallyn @ 2014-09-05 16:48 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Miklos Szeredi, Alexander Viro, Eric W. Biederman, fuse-devel,
	linux-kernel, linux-fsdevel

Quoting Seth Forshee (seth.forshee@canonical.com):
> Update fuse to support mounts from within user namespaces. This
> is mostly a matter of translating uids and gids into the
> namespace of the process reading requests before handing the
> requests off to userspace.
> 
> Due to security concerns the namespace used should be fixed,
> otherwise a user might be able to pass the fuse fd to
> init_user_ns and inject suid files owned by a user outside the
> namespace in order to gain elevated privileges. For fuse we
> stash current_user_ns() when a filesystem is first mounted and
> abort the mount if this namespace is different than the one used
> to open the fd passed in the mount options.
> 
> The allow_others options could also be a problem, as a userns
> mount could bypass system policy for this option and thus open
> the possiblity of DoS attacks. This is prevented by restricting
> the scope of allow_other to apply only to that superblock's
> userns and its children, giving the expected behavior within the
> userns while preventing DoS attacks on more privileged contexts.
> 
> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

Thanks, Seth, just two little questions below.

> ---
>  fs/fuse/dev.c    |  4 ++--
>  fs/fuse/dir.c    | 46 +++++++++++++++++++++++++++++++++-------------
>  fs/fuse/fuse_i.h |  4 ++++
>  fs/fuse/inode.c  | 28 ++++++++++++++++++++--------
>  4 files changed, 59 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 839caebd34f1..03c8785ed731 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -127,8 +127,8 @@ static void __fuse_put_request(struct fuse_req *req)
>  
>  static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req)
>  {
> -	req->in.h.uid = from_kuid_munged(&init_user_ns, current_fsuid());
> -	req->in.h.gid = from_kgid_munged(&init_user_ns, current_fsgid());
> +	req->in.h.uid = from_kuid_munged(fc->user_ns, current_fsuid());
> +	req->in.h.gid = from_kgid_munged(fc->user_ns, current_fsgid());
>  	req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns);
>  }
>  
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index de1d84af9f7c..c0b9968db6a1 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -905,8 +905,8 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
>  	stat->ino = attr->ino;
>  	stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
>  	stat->nlink = attr->nlink;
> -	stat->uid = make_kuid(&init_user_ns, attr->uid);
> -	stat->gid = make_kgid(&init_user_ns, attr->gid);
> +	stat->uid = make_kuid(fc->user_ns, attr->uid);
> +	stat->gid = make_kgid(fc->user_ns, attr->gid);
>  	stat->rdev = inode->i_rdev;
>  	stat->atime.tv_sec = attr->atime;
>  	stat->atime.tv_nsec = attr->atimensec;
> @@ -1085,12 +1085,20 @@ int fuse_reverse_inval_entry(struct super_block *sb, u64 parent_nodeid,
>   */
>  int fuse_allow_current_process(struct fuse_conn *fc)
>  {
> -	const struct cred *cred;
> +	const struct cred *cred = current_cred();
>  
> -	if (fc->flags & FUSE_ALLOW_OTHER)
> -		return 1;
> +	if (fc->flags & FUSE_ALLOW_OTHER) {
> +		if (kuid_has_mapping(fc->user_ns, cred->euid) &&
> +		    kuid_has_mapping(fc->user_ns, cred->suid) &&
> +		    kuid_has_mapping(fc->user_ns, cred->uid) &&
> +		    kgid_has_mapping(fc->user_ns, cred->egid) &&
> +		    kgid_has_mapping(fc->user_ns, cred->sgid) &&
> +		    kgid_has_mapping(fc->user_ns, cred->gid))

Should fsuid be checked here?

> +			return 1;
> +
> +		return 0;
> +	}
>  
> -	cred = current_cred();
>  	if (uid_eq(cred->euid, fc->user_id) &&
>  	    uid_eq(cred->suid, fc->user_id) &&
>  	    uid_eq(cred->uid,  fc->user_id) &&
> @@ -1556,17 +1564,25 @@ static bool update_mtime(unsigned ivalid, bool trust_local_mtime)
>  	return true;
>  }
>  
> -static void iattr_to_fattr(struct iattr *iattr, struct fuse_setattr_in *arg,
> -			   bool trust_local_cmtime)
> +static int iattr_to_fattr(struct fuse_conn *fc, struct iattr *iattr,
> +			   struct fuse_setattr_in *arg, bool trust_local_cmtime)
>  {
>  	unsigned ivalid = iattr->ia_valid;
>  
>  	if (ivalid & ATTR_MODE)
>  		arg->valid |= FATTR_MODE,   arg->mode = iattr->ia_mode;
> -	if (ivalid & ATTR_UID)
> -		arg->valid |= FATTR_UID,    arg->uid = from_kuid(&init_user_ns, iattr->ia_uid);
> -	if (ivalid & ATTR_GID)
> -		arg->valid |= FATTR_GID,    arg->gid = from_kgid(&init_user_ns, iattr->ia_gid);
> +	if (ivalid & ATTR_UID) {
> +		arg->uid = from_kuid(fc->user_ns, iattr->ia_uid);
> +		if (arg->uid == (uid_t)-1)

Any reason not to use uid_valid() here (and gid_valid() below)?

> +			return -EINVAL;
> +		arg->valid |= FATTR_UID;
> +	}
> +	if (ivalid & ATTR_GID) {
> +		arg->gid = from_kgid(fc->user_ns, iattr->ia_gid);
> +		if (arg->gid == (gid_t)-1)
> +			return -EINVAL;
> +		arg->valid |= FATTR_GID;
> +	}
>  	if (ivalid & ATTR_SIZE)
>  		arg->valid |= FATTR_SIZE,   arg->size = iattr->ia_size;
>  	if (ivalid & ATTR_ATIME) {
> @@ -1588,6 +1604,8 @@ static void iattr_to_fattr(struct iattr *iattr, struct fuse_setattr_in *arg,
>  		arg->ctime = iattr->ia_ctime.tv_sec;
>  		arg->ctimensec = iattr->ia_ctime.tv_nsec;
>  	}
> +
> +	return 0;
>  }
>  
>  /*
> @@ -1741,7 +1759,9 @@ int fuse_do_setattr(struct inode *inode, struct iattr *attr,
>  
>  	memset(&inarg, 0, sizeof(inarg));
>  	memset(&outarg, 0, sizeof(outarg));
> -	iattr_to_fattr(attr, &inarg, trust_local_cmtime);
> +	err = iattr_to_fattr(fc, attr, &inarg, trust_local_cmtime);
> +	if (err)
> +		goto error;
>  	if (file) {
>  		struct fuse_file *ff = file->private_data;
>  		inarg.valid |= FATTR_FH;
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index a3ded071e2c6..2cfd0ca3407a 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -22,6 +22,7 @@
>  #include <linux/rbtree.h>
>  #include <linux/poll.h>
>  #include <linux/workqueue.h>
> +#include <linux/user_namespace.h>
>  #include <linux/pid_namespace.h>
>  
>  /** Max number of pages that can be used in a single read request */
> @@ -387,6 +388,9 @@ struct fuse_conn {
>  	/** The group id for this mount */
>  	kgid_t group_id;
>  
> +	/** The user namespace for this mount */
> +	struct user_namespace *user_ns;
> +
>  	/** The pid namespace for this mount */
>  	struct pid_namespace *pid_ns;
>  
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index c6d8473b1a80..f3a3ded82f85 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -167,8 +167,8 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
>  	inode->i_ino     = fuse_squash_ino(attr->ino);
>  	inode->i_mode    = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
>  	set_nlink(inode, attr->nlink);
> -	inode->i_uid     = make_kuid(&init_user_ns, attr->uid);
> -	inode->i_gid     = make_kgid(&init_user_ns, attr->gid);
> +	inode->i_uid     = make_kuid(fc->user_ns, attr->uid);
> +	inode->i_gid     = make_kgid(fc->user_ns, attr->gid);
>  	inode->i_blocks  = attr->blocks;
>  	inode->i_atime.tv_sec   = attr->atime;
>  	inode->i_atime.tv_nsec  = attr->atimensec;
> @@ -496,6 +496,8 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
>  	memset(d, 0, sizeof(struct fuse_mount_data));
>  	d->max_read = ~0;
>  	d->blksize = FUSE_DEFAULT_BLKSIZE;
> +	d->user_id = make_kuid(current_user_ns(), 0);
> +	d->group_id = make_kgid(current_user_ns(), 0);
>  
>  	while ((p = strsep(&opt, ",")) != NULL) {
>  		int token;
> @@ -578,8 +580,10 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
>  	struct super_block *sb = root->d_sb;
>  	struct fuse_conn *fc = get_fuse_conn_super(sb);
>  
> -	seq_printf(m, ",user_id=%u", from_kuid_munged(&init_user_ns, fc->user_id));
> -	seq_printf(m, ",group_id=%u", from_kgid_munged(&init_user_ns, fc->group_id));
> +	seq_printf(m, ",user_id=%u",
> +		   from_kuid_munged(fc->user_ns, fc->user_id));
> +	seq_printf(m, ",group_id=%u",
> +		   from_kgid_munged(fc->user_ns, fc->group_id));
>  	if (fc->flags & FUSE_DEFAULT_PERMISSIONS)
>  		seq_puts(m, ",default_permissions");
>  	if (fc->flags & FUSE_ALLOW_OTHER)
> @@ -618,6 +622,7 @@ void fuse_conn_init(struct fuse_conn *fc)
>  	fc->attr_version = 1;
>  	get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key));
>  	fc->pid_ns = get_pid_ns(task_active_pid_ns(current));
> +	fc->user_ns = get_user_ns(current_user_ns());
>  }
>  EXPORT_SYMBOL_GPL(fuse_conn_init);
>  
> @@ -956,6 +961,7 @@ static void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req)
>  static void fuse_free_conn(struct fuse_conn *fc)
>  {
>  	put_pid_ns(fc->pid_ns);
> +	put_user_ns(fc->user_ns);
>  	kfree_rcu(fc, rcu);
>  }
>  
> @@ -1042,8 +1048,14 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
>  	if (!file)
>  		goto err;
>  
> -	if ((file->f_op != &fuse_dev_operations) ||
> -	    (file->f_cred->user_ns != &init_user_ns))
> +	/*
> +	 * Require mount to happen from the same user namespace which
> +	 * opened /dev/fuse, otherwise users might be able to
> +	 * elevate privileges by opening in init_user_ns then
> +	 * mounting from a different namespace without MNT_NOSUID.
> +	 */
> +	if (file->f_op != &fuse_dev_operations ||
> +	    file->f_cred->user_ns != current_user_ns())
>  		goto err_fput;
>  
>  	fc = kmalloc(sizeof(*fc), GFP_KERNEL);
> @@ -1157,7 +1169,7 @@ static void fuse_kill_sb_anon(struct super_block *sb)
>  static struct file_system_type fuse_fs_type = {
>  	.owner		= THIS_MODULE,
>  	.name		= "fuse",
> -	.fs_flags	= FS_HAS_SUBTYPE,
> +	.fs_flags	= FS_HAS_SUBTYPE | FS_USERNS_MOUNT,
>  	.mount		= fuse_mount,
>  	.kill_sb	= fuse_kill_sb_anon,
>  };
> @@ -1189,7 +1201,7 @@ static struct file_system_type fuseblk_fs_type = {
>  	.name		= "fuseblk",
>  	.mount		= fuse_mount_blk,
>  	.kill_sb	= fuse_kill_sb_blk,
> -	.fs_flags	= FS_REQUIRES_DEV | FS_HAS_SUBTYPE,
> +	.fs_flags	= FS_REQUIRES_DEV | FS_HAS_SUBTYPE | FS_USERNS_MOUNT,
>  };
>  MODULE_ALIAS_FS("fuseblk");
>  
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 3/3] fuse: Add support for mounts from user namespaces
  2014-09-05 16:48   ` Serge Hallyn
@ 2014-09-05 17:36     ` Seth Forshee
  2014-09-05 19:25       ` Serge Hallyn
  0 siblings, 1 reply; 39+ messages in thread
From: Seth Forshee @ 2014-09-05 17:36 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Miklos Szeredi, Alexander Viro, Eric W. Biederman, fuse-devel,
	linux-kernel, linux-fsdevel, seth.forshee

On Fri, Sep 05, 2014 at 04:48:11PM +0000, Serge Hallyn wrote:
> Quoting Seth Forshee (seth.forshee@canonical.com):
> > Update fuse to support mounts from within user namespaces. This
> > is mostly a matter of translating uids and gids into the
> > namespace of the process reading requests before handing the
> > requests off to userspace.
> > 
> > Due to security concerns the namespace used should be fixed,
> > otherwise a user might be able to pass the fuse fd to
> > init_user_ns and inject suid files owned by a user outside the
> > namespace in order to gain elevated privileges. For fuse we
> > stash current_user_ns() when a filesystem is first mounted and
> > abort the mount if this namespace is different than the one used
> > to open the fd passed in the mount options.
> > 
> > The allow_others options could also be a problem, as a userns
> > mount could bypass system policy for this option and thus open
> > the possiblity of DoS attacks. This is prevented by restricting
> > the scope of allow_other to apply only to that superblock's
> > userns and its children, giving the expected behavior within the
> > userns while preventing DoS attacks on more privileged contexts.
> > 
> > Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> 
> Thanks, Seth, just two little questions below.
> 
> > ---
> >  fs/fuse/dev.c    |  4 ++--
> >  fs/fuse/dir.c    | 46 +++++++++++++++++++++++++++++++++-------------
> >  fs/fuse/fuse_i.h |  4 ++++
> >  fs/fuse/inode.c  | 28 ++++++++++++++++++++--------
> >  4 files changed, 59 insertions(+), 23 deletions(-)
> > 
> > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > index 839caebd34f1..03c8785ed731 100644
> > --- a/fs/fuse/dev.c
> > +++ b/fs/fuse/dev.c
> > @@ -127,8 +127,8 @@ static void __fuse_put_request(struct fuse_req *req)
> >  
> >  static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req)
> >  {
> > -	req->in.h.uid = from_kuid_munged(&init_user_ns, current_fsuid());
> > -	req->in.h.gid = from_kgid_munged(&init_user_ns, current_fsgid());
> > +	req->in.h.uid = from_kuid_munged(fc->user_ns, current_fsuid());
> > +	req->in.h.gid = from_kgid_munged(fc->user_ns, current_fsgid());
> >  	req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns);
> >  }
> >  
> > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> > index de1d84af9f7c..c0b9968db6a1 100644
> > --- a/fs/fuse/dir.c
> > +++ b/fs/fuse/dir.c
> > @@ -905,8 +905,8 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
> >  	stat->ino = attr->ino;
> >  	stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
> >  	stat->nlink = attr->nlink;
> > -	stat->uid = make_kuid(&init_user_ns, attr->uid);
> > -	stat->gid = make_kgid(&init_user_ns, attr->gid);
> > +	stat->uid = make_kuid(fc->user_ns, attr->uid);
> > +	stat->gid = make_kgid(fc->user_ns, attr->gid);
> >  	stat->rdev = inode->i_rdev;
> >  	stat->atime.tv_sec = attr->atime;
> >  	stat->atime.tv_nsec = attr->atimensec;
> > @@ -1085,12 +1085,20 @@ int fuse_reverse_inval_entry(struct super_block *sb, u64 parent_nodeid,
> >   */
> >  int fuse_allow_current_process(struct fuse_conn *fc)
> >  {
> > -	const struct cred *cred;
> > +	const struct cred *cred = current_cred();
> >  
> > -	if (fc->flags & FUSE_ALLOW_OTHER)
> > -		return 1;
> > +	if (fc->flags & FUSE_ALLOW_OTHER) {
> > +		if (kuid_has_mapping(fc->user_ns, cred->euid) &&
> > +		    kuid_has_mapping(fc->user_ns, cred->suid) &&
> > +		    kuid_has_mapping(fc->user_ns, cred->uid) &&
> > +		    kgid_has_mapping(fc->user_ns, cred->egid) &&
> > +		    kgid_has_mapping(fc->user_ns, cred->sgid) &&
> > +		    kgid_has_mapping(fc->user_ns, cred->gid))
> 
> Should fsuid be checked here?

The point of restricting access here is to prevent a DoS type of attack
on a more privileged context by making a filesystem operation block
indefinitely. Coming from that perspective I was thinking that these
checks ought to be sufficient, but I could be wrong.

> 
> > +			return 1;
> > +
> > +		return 0;
> > +	}
> >  
> > -	cred = current_cred();
> >  	if (uid_eq(cred->euid, fc->user_id) &&
> >  	    uid_eq(cred->suid, fc->user_id) &&
> >  	    uid_eq(cred->uid,  fc->user_id) &&
> > @@ -1556,17 +1564,25 @@ static bool update_mtime(unsigned ivalid, bool trust_local_mtime)
> >  	return true;
> >  }
> >  
> > -static void iattr_to_fattr(struct iattr *iattr, struct fuse_setattr_in *arg,
> > -			   bool trust_local_cmtime)
> > +static int iattr_to_fattr(struct fuse_conn *fc, struct iattr *iattr,
> > +			   struct fuse_setattr_in *arg, bool trust_local_cmtime)
> >  {
> >  	unsigned ivalid = iattr->ia_valid;
> >  
> >  	if (ivalid & ATTR_MODE)
> >  		arg->valid |= FATTR_MODE,   arg->mode = iattr->ia_mode;
> > -	if (ivalid & ATTR_UID)
> > -		arg->valid |= FATTR_UID,    arg->uid = from_kuid(&init_user_ns, iattr->ia_uid);
> > -	if (ivalid & ATTR_GID)
> > -		arg->valid |= FATTR_GID,    arg->gid = from_kgid(&init_user_ns, iattr->ia_gid);
> > +	if (ivalid & ATTR_UID) {
> > +		arg->uid = from_kuid(fc->user_ns, iattr->ia_uid);
> > +		if (arg->uid == (uid_t)-1)
> 
> Any reason not to use uid_valid() here (and gid_valid() below)?

Yes. arg->uid is a uid_t and not a kuid_t, so it wouldn't be valid to
pass that to uid_valid(). And from_kuid() can return -1 for values other
than INVALID_UID.

Thanks,
Seth

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 3/3] fuse: Add support for mounts from user namespaces
  2014-09-05 17:36     ` Seth Forshee
@ 2014-09-05 19:25       ` Serge Hallyn
  0 siblings, 0 replies; 39+ messages in thread
From: Serge Hallyn @ 2014-09-05 19:25 UTC (permalink / raw)
  To: Miklos Szeredi, Alexander Viro, Eric W. Biederman, fuse-devel,
	linux-kernel, linux-fsdevel

Quoting Seth Forshee (seth.forshee@canonical.com):
> On Fri, Sep 05, 2014 at 04:48:11PM +0000, Serge Hallyn wrote:
> > Quoting Seth Forshee (seth.forshee@canonical.com):
> > > Update fuse to support mounts from within user namespaces. This
> > > is mostly a matter of translating uids and gids into the
> > > namespace of the process reading requests before handing the
> > > requests off to userspace.
> > > 
> > > Due to security concerns the namespace used should be fixed,
> > > otherwise a user might be able to pass the fuse fd to
> > > init_user_ns and inject suid files owned by a user outside the
> > > namespace in order to gain elevated privileges. For fuse we
> > > stash current_user_ns() when a filesystem is first mounted and
> > > abort the mount if this namespace is different than the one used
> > > to open the fd passed in the mount options.
> > > 
> > > The allow_others options could also be a problem, as a userns
> > > mount could bypass system policy for this option and thus open
> > > the possiblity of DoS attacks. This is prevented by restricting
> > > the scope of allow_other to apply only to that superblock's
> > > userns and its children, giving the expected behavior within the
> > > userns while preventing DoS attacks on more privileged contexts.
> > > 
> > > Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

> > 
> > Thanks, Seth, just two little questions below.
> > 
> > > ---
> > >  fs/fuse/dev.c    |  4 ++--
> > >  fs/fuse/dir.c    | 46 +++++++++++++++++++++++++++++++++-------------
> > >  fs/fuse/fuse_i.h |  4 ++++
> > >  fs/fuse/inode.c  | 28 ++++++++++++++++++++--------
> > >  4 files changed, 59 insertions(+), 23 deletions(-)
> > > 
> > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > > index 839caebd34f1..03c8785ed731 100644
> > > --- a/fs/fuse/dev.c
> > > +++ b/fs/fuse/dev.c
> > > @@ -127,8 +127,8 @@ static void __fuse_put_request(struct fuse_req *req)
> > >  
> > >  static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req)
> > >  {
> > > -	req->in.h.uid = from_kuid_munged(&init_user_ns, current_fsuid());
> > > -	req->in.h.gid = from_kgid_munged(&init_user_ns, current_fsgid());
> > > +	req->in.h.uid = from_kuid_munged(fc->user_ns, current_fsuid());
> > > +	req->in.h.gid = from_kgid_munged(fc->user_ns, current_fsgid());
> > >  	req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns);
> > >  }
> > >  
> > > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> > > index de1d84af9f7c..c0b9968db6a1 100644
> > > --- a/fs/fuse/dir.c
> > > +++ b/fs/fuse/dir.c
> > > @@ -905,8 +905,8 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
> > >  	stat->ino = attr->ino;
> > >  	stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
> > >  	stat->nlink = attr->nlink;
> > > -	stat->uid = make_kuid(&init_user_ns, attr->uid);
> > > -	stat->gid = make_kgid(&init_user_ns, attr->gid);
> > > +	stat->uid = make_kuid(fc->user_ns, attr->uid);
> > > +	stat->gid = make_kgid(fc->user_ns, attr->gid);
> > >  	stat->rdev = inode->i_rdev;
> > >  	stat->atime.tv_sec = attr->atime;
> > >  	stat->atime.tv_nsec = attr->atimensec;
> > > @@ -1085,12 +1085,20 @@ int fuse_reverse_inval_entry(struct super_block *sb, u64 parent_nodeid,
> > >   */
> > >  int fuse_allow_current_process(struct fuse_conn *fc)
> > >  {
> > > -	const struct cred *cred;
> > > +	const struct cred *cred = current_cred();
> > >  
> > > -	if (fc->flags & FUSE_ALLOW_OTHER)
> > > -		return 1;
> > > +	if (fc->flags & FUSE_ALLOW_OTHER) {
> > > +		if (kuid_has_mapping(fc->user_ns, cred->euid) &&
> > > +		    kuid_has_mapping(fc->user_ns, cred->suid) &&
> > > +		    kuid_has_mapping(fc->user_ns, cred->uid) &&
> > > +		    kgid_has_mapping(fc->user_ns, cred->egid) &&
> > > +		    kgid_has_mapping(fc->user_ns, cred->sgid) &&
> > > +		    kgid_has_mapping(fc->user_ns, cred->gid))
> > 
> > Should fsuid be checked here?
> 
> The point of restricting access here is to prevent a DoS type of attack
> on a more privileged context by making a filesystem operation block
> indefinitely. Coming from that perspective I was thinking that these
> checks ought to be sufficient, but I could be wrong.

I supose it would, as euid 0 would already not be mapped.

> > > +			return 1;
> > > +
> > > +		return 0;
> > > +	}
> > >  
> > > -	cred = current_cred();
> > >  	if (uid_eq(cred->euid, fc->user_id) &&
> > >  	    uid_eq(cred->suid, fc->user_id) &&
> > >  	    uid_eq(cred->uid,  fc->user_id) &&
> > > @@ -1556,17 +1564,25 @@ static bool update_mtime(unsigned ivalid, bool trust_local_mtime)
> > >  	return true;
> > >  }
> > >  
> > > -static void iattr_to_fattr(struct iattr *iattr, struct fuse_setattr_in *arg,
> > > -			   bool trust_local_cmtime)
> > > +static int iattr_to_fattr(struct fuse_conn *fc, struct iattr *iattr,
> > > +			   struct fuse_setattr_in *arg, bool trust_local_cmtime)
> > >  {
> > >  	unsigned ivalid = iattr->ia_valid;
> > >  
> > >  	if (ivalid & ATTR_MODE)
> > >  		arg->valid |= FATTR_MODE,   arg->mode = iattr->ia_mode;
> > > -	if (ivalid & ATTR_UID)
> > > -		arg->valid |= FATTR_UID,    arg->uid = from_kuid(&init_user_ns, iattr->ia_uid);
> > > -	if (ivalid & ATTR_GID)
> > > -		arg->valid |= FATTR_GID,    arg->gid = from_kgid(&init_user_ns, iattr->ia_gid);
> > > +	if (ivalid & ATTR_UID) {
> > > +		arg->uid = from_kuid(fc->user_ns, iattr->ia_uid);
> > > +		if (arg->uid == (uid_t)-1)
> > 
> > Any reason not to use uid_valid() here (and gid_valid() below)?
> 
> Yes. arg->uid is a uid_t and not a kuid_t, so it wouldn't be valid to
> pass that to uid_valid(). And from_kuid() can return -1 for values other
> than INVALID_UID.

D'oh.  Right.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-02 15:44 [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces Seth Forshee
                   ` (2 preceding siblings ...)
  2014-09-02 15:44 ` [PATCH v2 3/3] fuse: Add support for mounts from user namespaces Seth Forshee
@ 2014-09-05 20:40 ` Seth Forshee
  2014-09-10 12:35 ` Seth Forshee
  2014-09-23 16:07 ` Miklos Szeredi
  5 siblings, 0 replies; 39+ messages in thread
From: Seth Forshee @ 2014-09-05 20:40 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Alexander Viro, Eric W. Biederman, Serge Hallyn, fuse-devel,
	linux-kernel, linux-fsdevel, seth.forshee

On Tue, Sep 02, 2014 at 10:44:53AM -0500, Seth Forshee wrote:
> Here's an updated set of patches for allowing fuse mounts from pid and
> user namespaces. I discussed some of the issues we debated with the last
> patch set (and a few others) with Eric at LinuxCon, and the updates here
> mainly reflect the outcome of those discussions.
> 
> The stickiest issue in the v1 patches was the question of where to get
> the user and pid namespaces from that are used for translating ids for
> communication with userspace. Eric told me that for user namespaces at
> least we need to grab a namespace at open or mount time and use only
> that namespace to prevent certain types of attacks. That rules out the
> suggestion of using the user ns of current in the read/write paths, and
> I think it makes sense to handle pid and user namespaces similarly. So
> in these patches I'm still grabbing the namespaces of current during
> mount, but I've added an additional check to fail the mount if the
> f_cred's userns for the fd to userspace doesn't match.
> 
> Another issue mentioned by Eric was what to use for i_[ug]id if the ids
> from userspace don't map into the user namespace, which is going to be a
> problem for any other filesystems which become mountable from user
> namespaces as well. We discussed a few options for addressing this, the
> most promising of which seems to be either using INVALID_[UG]ID for
> these inodes or creating vfs-wide "nobody" ids for this purpose. After
> thinking about it for a while I'm favoring using the invalid ids, but
> I'm hoping to solicit some more feedback.
> 
> For now these patches are using invalid ids if the user doesn't map into
> the namespace. I went through the vfs code and found one place where
> this could be handled better (addressed in patch 1 of the series). The
> only other issue I found was that currently no one, not even root, can
> change onwership of such inodes, but I suspect we can find a way around
> this.
> 
> The only other change since v1 is that I now fail changing file
> ownership if the new uid or gid does not map into the namespace used for
> userspace communication.

I forgot that I did change one other thing. In v1 I didn't allow fuseblk
mounts from user namespaces since I hadn't gotten around to testing or
looking at the differences between it and normal fuse mounts yet. I've
found time to do so since then and everything seems to be in good order,
so I've enabled mounting fuseblk in user namespaces as well in the v2
patches.

Thanks,
Seth

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-02 15:44 [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces Seth Forshee
                   ` (3 preceding siblings ...)
  2014-09-05 20:40 ` [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces Seth Forshee
@ 2014-09-10 12:35 ` Seth Forshee
  2014-09-10 16:21   ` Serge E. Hallyn
  2014-09-23 16:07 ` Miklos Szeredi
  5 siblings, 1 reply; 39+ messages in thread
From: Seth Forshee @ 2014-09-10 12:35 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Alexander Viro, Eric W. Biederman, Serge Hallyn, fuse-devel,
	linux-kernel, linux-fsdevel

On Tue, Sep 02, 2014 at 10:44:53AM -0500, Seth Forshee wrote:
> Another issue mentioned by Eric was what to use for i_[ug]id if the ids
> from userspace don't map into the user namespace, which is going to be a
> problem for any other filesystems which become mountable from user
> namespaces as well. We discussed a few options for addressing this, the
> most promising of which seems to be either using INVALID_[UG]ID for
> these inodes or creating vfs-wide "nobody" ids for this purpose. After
> thinking about it for a while I'm favoring using the invalid ids, but
> I'm hoping to solicit some more feedback.
> 
> For now these patches are using invalid ids if the user doesn't map into
> the namespace. I went through the vfs code and found one place where
> this could be handled better (addressed in patch 1 of the series). The
> only other issue I found was that currently no one, not even root, can
> change onwership of such inodes, but I suspect we can find a way around
> this.

I started playing around with using -2 as a global nobody id. The
changes below (made on top of this series) are working fine in light
testing. I'm still not sure about the security implications of doing
this though. Possibly the nobody id should be removed from init_user_ns
to prevent any use of the id to gain unauthroized access to such files.
Thoughts?


diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index c0b9968db6a1..893fc0d6ff96 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -905,8 +905,8 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
 	stat->ino = attr->ino;
 	stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
 	stat->nlink = attr->nlink;
-	stat->uid = make_kuid(fc->user_ns, attr->uid);
-	stat->gid = make_kgid(fc->user_ns, attr->gid);
+	stat->uid = make_kuid_munged(fc->user_ns, attr->uid);
+	stat->gid = make_kgid_munged(fc->user_ns, attr->gid);
 	stat->rdev = inode->i_rdev;
 	stat->atime.tv_sec = attr->atime;
 	stat->atime.tv_nsec = attr->atimensec;
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index f3a3ded82f85..330ac3d502a6 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -167,8 +167,8 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
 	inode->i_ino     = fuse_squash_ino(attr->ino);
 	inode->i_mode    = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
 	set_nlink(inode, attr->nlink);
-	inode->i_uid     = make_kuid(fc->user_ns, attr->uid);
-	inode->i_gid     = make_kgid(fc->user_ns, attr->gid);
+	inode->i_uid     = make_kuid_munged(fc->user_ns, attr->uid);
+	inode->i_gid     = make_kgid_munged(fc->user_ns, attr->gid);
 	inode->i_blocks  = attr->blocks;
 	inode->i_atime.tv_sec   = attr->atime;
 	inode->i_atime.tv_nsec  = attr->atimensec;
diff --git a/include/linux/uidgid.h b/include/linux/uidgid.h
index 6c302267f7cc..9054273af163 100644
--- a/include/linux/uidgid.h
+++ b/include/linux/uidgid.h
@@ -45,6 +45,9 @@ static inline gid_t __kgid_val(kgid_t gid)
 #define INVALID_UID KUIDT_INIT(-1)
 #define INVALID_GID KGIDT_INIT(-1)
 
+#define NOBODY_UID KUIDT_INIT(-2)
+#define NOBODY_GID KGIDT_INIT(-2)
+
 static inline bool uid_eq(kuid_t left, kuid_t right)
 {
 	return __kuid_val(left) == __kuid_val(right);
@@ -175,4 +178,44 @@ static inline bool kgid_has_mapping(struct user_namespace *ns, kgid_t gid)
 
 #endif /* CONFIG_USER_NS */
 
+/**
+ *	make_kuid_munged - Map a user-namespace uid pair into a kuid
+ * 	@from: User namespace that the uid is in
+ * 	@uid:  User identifier
+ *
+ *	Maps a user-namespace uid pair into a kernel internal kuid,
+ *	and returns that kuid.
+ *
+ *	Unlike make_kuid, make_kuid_munged never fails and always
+ *	returns a valid uid. If @uid has no mapping in @from
+ *	NOBODY_UID is returned.
+ */
+static inline kuid_t make_kuid_munged(struct user_namespace *from, uid_t uid)
+{
+	kuid_t kuid = make_kuid(from, uid);
+	if (!uid_valid(kuid))
+		kuid = NOBODY_UID;
+	return kuid;
+}
+
+/**
+ *	make_kgid_munged - Map a user-namespace gid pair into a kgid
+ * 	@from: User namespace that the gid is in
+ * 	@gid:  User identifier
+ *
+ *	Maps a user-namespace gid pair into a kernel internal kgid,
+ *	and returns that kgid.
+ *
+ *	Unlike make_kgid, make_kgid_munged never fails and always
+ *	returns a valid gid. If @gid has no mapping in @from
+ *	NOBODY_GID is returned.
+ */
+static inline kgid_t make_kgid_munged(struct user_namespace *from, gid_t gid)
+{
+	kgid_t kgid = make_kgid(from, gid);
+	if (!gid_valid(kgid))
+		kgid = NOBODY_GID;
+	return kgid;
+}
+
 #endif /* _LINUX_UIDGID_H */

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-10 12:35 ` Seth Forshee
@ 2014-09-10 16:21   ` Serge E. Hallyn
  2014-09-10 16:42     ` Seth Forshee
  0 siblings, 1 reply; 39+ messages in thread
From: Serge E. Hallyn @ 2014-09-10 16:21 UTC (permalink / raw)
  To: Miklos Szeredi, Alexander Viro, Eric W. Biederman, Serge Hallyn,
	fuse-devel, linux-kernel, linux-fsdevel

Quoting Seth Forshee (seth.forshee@canonical.com):
> On Tue, Sep 02, 2014 at 10:44:53AM -0500, Seth Forshee wrote:
> > Another issue mentioned by Eric was what to use for i_[ug]id if the ids
> > from userspace don't map into the user namespace, which is going to be a
> > problem for any other filesystems which become mountable from user
> > namespaces as well. We discussed a few options for addressing this, the
> > most promising of which seems to be either using INVALID_[UG]ID for
> > these inodes or creating vfs-wide "nobody" ids for this purpose. After
> > thinking about it for a while I'm favoring using the invalid ids, but
> > I'm hoping to solicit some more feedback.
> > 
> > For now these patches are using invalid ids if the user doesn't map into
> > the namespace. I went through the vfs code and found one place where
> > this could be handled better (addressed in patch 1 of the series). The
> > only other issue I found was that currently no one, not even root, can
> > change onwership of such inodes, but I suspect we can find a way around
> > this.
> 
> I started playing around with using -2 as a global nobody id. The
> changes below (made on top of this series) are working fine in light
> testing. I'm still not sure about the security implications of doing
> this though. Possibly the nobody id should be removed from init_user_ns
> to prevent any use of the id to gain unauthroized access to such files.
> Thoughts?

Can you explain the downsides of just using -1?  What are we able to do
(as a fuse-in-container user) when we use -2 that we can't do when it
uses -1?

> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index c0b9968db6a1..893fc0d6ff96 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -905,8 +905,8 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
>  	stat->ino = attr->ino;
>  	stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
>  	stat->nlink = attr->nlink;
> -	stat->uid = make_kuid(fc->user_ns, attr->uid);
> -	stat->gid = make_kgid(fc->user_ns, attr->gid);
> +	stat->uid = make_kuid_munged(fc->user_ns, attr->uid);
> +	stat->gid = make_kgid_munged(fc->user_ns, attr->gid);
>  	stat->rdev = inode->i_rdev;
>  	stat->atime.tv_sec = attr->atime;
>  	stat->atime.tv_nsec = attr->atimensec;
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index f3a3ded82f85..330ac3d502a6 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -167,8 +167,8 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
>  	inode->i_ino     = fuse_squash_ino(attr->ino);
>  	inode->i_mode    = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
>  	set_nlink(inode, attr->nlink);
> -	inode->i_uid     = make_kuid(fc->user_ns, attr->uid);
> -	inode->i_gid     = make_kgid(fc->user_ns, attr->gid);
> +	inode->i_uid     = make_kuid_munged(fc->user_ns, attr->uid);
> +	inode->i_gid     = make_kgid_munged(fc->user_ns, attr->gid);
>  	inode->i_blocks  = attr->blocks;
>  	inode->i_atime.tv_sec   = attr->atime;
>  	inode->i_atime.tv_nsec  = attr->atimensec;
> diff --git a/include/linux/uidgid.h b/include/linux/uidgid.h
> index 6c302267f7cc..9054273af163 100644
> --- a/include/linux/uidgid.h
> +++ b/include/linux/uidgid.h
> @@ -45,6 +45,9 @@ static inline gid_t __kgid_val(kgid_t gid)
>  #define INVALID_UID KUIDT_INIT(-1)
>  #define INVALID_GID KGIDT_INIT(-1)
>  
> +#define NOBODY_UID KUIDT_INIT(-2)
> +#define NOBODY_GID KGIDT_INIT(-2)
> +
>  static inline bool uid_eq(kuid_t left, kuid_t right)
>  {
>  	return __kuid_val(left) == __kuid_val(right);
> @@ -175,4 +178,44 @@ static inline bool kgid_has_mapping(struct user_namespace *ns, kgid_t gid)
>  
>  #endif /* CONFIG_USER_NS */
>  
> +/**
> + *	make_kuid_munged - Map a user-namespace uid pair into a kuid
> + * 	@from: User namespace that the uid is in
> + * 	@uid:  User identifier
> + *
> + *	Maps a user-namespace uid pair into a kernel internal kuid,
> + *	and returns that kuid.
> + *
> + *	Unlike make_kuid, make_kuid_munged never fails and always
> + *	returns a valid uid. If @uid has no mapping in @from
> + *	NOBODY_UID is returned.
> + */
> +static inline kuid_t make_kuid_munged(struct user_namespace *from, uid_t uid)
> +{
> +	kuid_t kuid = make_kuid(from, uid);
> +	if (!uid_valid(kuid))
> +		kuid = NOBODY_UID;
> +	return kuid;
> +}
> +
> +/**
> + *	make_kgid_munged - Map a user-namespace gid pair into a kgid
> + * 	@from: User namespace that the gid is in
> + * 	@gid:  User identifier
> + *
> + *	Maps a user-namespace gid pair into a kernel internal kgid,
> + *	and returns that kgid.
> + *
> + *	Unlike make_kgid, make_kgid_munged never fails and always
> + *	returns a valid gid. If @gid has no mapping in @from
> + *	NOBODY_GID is returned.
> + */
> +static inline kgid_t make_kgid_munged(struct user_namespace *from, gid_t gid)
> +{
> +	kgid_t kgid = make_kgid(from, gid);
> +	if (!gid_valid(kgid))
> +		kgid = NOBODY_GID;
> +	return kgid;
> +}
> +
>  #endif /* _LINUX_UIDGID_H */
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-10 16:21   ` Serge E. Hallyn
@ 2014-09-10 16:42     ` Seth Forshee
  2014-09-11 18:10       ` Seth Forshee
  0 siblings, 1 reply; 39+ messages in thread
From: Seth Forshee @ 2014-09-10 16:42 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Miklos Szeredi, Alexander Viro, Eric W. Biederman, Serge Hallyn,
	fuse-devel, linux-kernel, linux-fsdevel

On Wed, Sep 10, 2014 at 06:21:55PM +0200, Serge E. Hallyn wrote:
> Quoting Seth Forshee (seth.forshee@canonical.com):
> > On Tue, Sep 02, 2014 at 10:44:53AM -0500, Seth Forshee wrote:
> > > Another issue mentioned by Eric was what to use for i_[ug]id if the ids
> > > from userspace don't map into the user namespace, which is going to be a
> > > problem for any other filesystems which become mountable from user
> > > namespaces as well. We discussed a few options for addressing this, the
> > > most promising of which seems to be either using INVALID_[UG]ID for
> > > these inodes or creating vfs-wide "nobody" ids for this purpose. After
> > > thinking about it for a while I'm favoring using the invalid ids, but
> > > I'm hoping to solicit some more feedback.
> > > 
> > > For now these patches are using invalid ids if the user doesn't map into
> > > the namespace. I went through the vfs code and found one place where
> > > this could be handled better (addressed in patch 1 of the series). The
> > > only other issue I found was that currently no one, not even root, can
> > > change onwership of such inodes, but I suspect we can find a way around
> > > this.
> > 
> > I started playing around with using -2 as a global nobody id. The
> > changes below (made on top of this series) are working fine in light
> > testing. I'm still not sure about the security implications of doing
> > this though. Possibly the nobody id should be removed from init_user_ns
> > to prevent any use of the id to gain unauthroized access to such files.
> > Thoughts?
> 
> Can you explain the downsides of just using -1?  What are we able to do
> (as a fuse-in-container user) when we use -2 that we can't do when it
> uses -1?

The thing that happens with -1 is that checks like
capable_wrt_inode_uidgid() always fail on those inodes because
INVALID_UID isn't ever mapped into any namespace, even if you're
system-wide root. If we use a real id this check would at least pass in
init_user_ns, assuming that we don't have to remove -2 from init_user_ns
for security reasons.

Or we could modify these checks so that CAP_SYS_ADMIN always gets
permission or something like that, which might be the better way to go.

> > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> > index c0b9968db6a1..893fc0d6ff96 100644
> > --- a/fs/fuse/dir.c
> > +++ b/fs/fuse/dir.c
> > @@ -905,8 +905,8 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
> >  	stat->ino = attr->ino;
> >  	stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
> >  	stat->nlink = attr->nlink;
> > -	stat->uid = make_kuid(fc->user_ns, attr->uid);
> > -	stat->gid = make_kgid(fc->user_ns, attr->gid);
> > +	stat->uid = make_kuid_munged(fc->user_ns, attr->uid);
> > +	stat->gid = make_kgid_munged(fc->user_ns, attr->gid);
> >  	stat->rdev = inode->i_rdev;
> >  	stat->atime.tv_sec = attr->atime;
> >  	stat->atime.tv_nsec = attr->atimensec;
> > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> > index f3a3ded82f85..330ac3d502a6 100644
> > --- a/fs/fuse/inode.c
> > +++ b/fs/fuse/inode.c
> > @@ -167,8 +167,8 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
> >  	inode->i_ino     = fuse_squash_ino(attr->ino);
> >  	inode->i_mode    = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
> >  	set_nlink(inode, attr->nlink);
> > -	inode->i_uid     = make_kuid(fc->user_ns, attr->uid);
> > -	inode->i_gid     = make_kgid(fc->user_ns, attr->gid);
> > +	inode->i_uid     = make_kuid_munged(fc->user_ns, attr->uid);
> > +	inode->i_gid     = make_kgid_munged(fc->user_ns, attr->gid);
> >  	inode->i_blocks  = attr->blocks;
> >  	inode->i_atime.tv_sec   = attr->atime;
> >  	inode->i_atime.tv_nsec  = attr->atimensec;
> > diff --git a/include/linux/uidgid.h b/include/linux/uidgid.h
> > index 6c302267f7cc..9054273af163 100644
> > --- a/include/linux/uidgid.h
> > +++ b/include/linux/uidgid.h
> > @@ -45,6 +45,9 @@ static inline gid_t __kgid_val(kgid_t gid)
> >  #define INVALID_UID KUIDT_INIT(-1)
> >  #define INVALID_GID KGIDT_INIT(-1)
> >  
> > +#define NOBODY_UID KUIDT_INIT(-2)
> > +#define NOBODY_GID KGIDT_INIT(-2)
> > +
> >  static inline bool uid_eq(kuid_t left, kuid_t right)
> >  {
> >  	return __kuid_val(left) == __kuid_val(right);
> > @@ -175,4 +178,44 @@ static inline bool kgid_has_mapping(struct user_namespace *ns, kgid_t gid)
> >  
> >  #endif /* CONFIG_USER_NS */
> >  
> > +/**
> > + *	make_kuid_munged - Map a user-namespace uid pair into a kuid
> > + * 	@from: User namespace that the uid is in
> > + * 	@uid:  User identifier
> > + *
> > + *	Maps a user-namespace uid pair into a kernel internal kuid,
> > + *	and returns that kuid.
> > + *
> > + *	Unlike make_kuid, make_kuid_munged never fails and always
> > + *	returns a valid uid. If @uid has no mapping in @from
> > + *	NOBODY_UID is returned.
> > + */
> > +static inline kuid_t make_kuid_munged(struct user_namespace *from, uid_t uid)
> > +{
> > +	kuid_t kuid = make_kuid(from, uid);
> > +	if (!uid_valid(kuid))
> > +		kuid = NOBODY_UID;
> > +	return kuid;
> > +}
> > +
> > +/**
> > + *	make_kgid_munged - Map a user-namespace gid pair into a kgid
> > + * 	@from: User namespace that the gid is in
> > + * 	@gid:  User identifier
> > + *
> > + *	Maps a user-namespace gid pair into a kernel internal kgid,
> > + *	and returns that kgid.
> > + *
> > + *	Unlike make_kgid, make_kgid_munged never fails and always
> > + *	returns a valid gid. If @gid has no mapping in @from
> > + *	NOBODY_GID is returned.
> > + */
> > +static inline kgid_t make_kgid_munged(struct user_namespace *from, gid_t gid)
> > +{
> > +	kgid_t kgid = make_kgid(from, gid);
> > +	if (!gid_valid(kgid))
> > +		kgid = NOBODY_GID;
> > +	return kgid;
> > +}
> > +
> >  #endif /* _LINUX_UIDGID_H */
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-10 16:42     ` Seth Forshee
@ 2014-09-11 18:10       ` Seth Forshee
  2014-09-23 22:29         ` Eric W. Biederman
  0 siblings, 1 reply; 39+ messages in thread
From: Seth Forshee @ 2014-09-11 18:10 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Miklos Szeredi, Alexander Viro, Eric W. Biederman, Serge Hallyn,
	fuse-devel, linux-kernel, linux-fsdevel

On Wed, Sep 10, 2014 at 11:42:12AM -0500, Seth Forshee wrote:
> On Wed, Sep 10, 2014 at 06:21:55PM +0200, Serge E. Hallyn wrote:
> > Quoting Seth Forshee (seth.forshee@canonical.com):
> > > On Tue, Sep 02, 2014 at 10:44:53AM -0500, Seth Forshee wrote:
> > > > Another issue mentioned by Eric was what to use for i_[ug]id if the ids
> > > > from userspace don't map into the user namespace, which is going to be a
> > > > problem for any other filesystems which become mountable from user
> > > > namespaces as well. We discussed a few options for addressing this, the
> > > > most promising of which seems to be either using INVALID_[UG]ID for
> > > > these inodes or creating vfs-wide "nobody" ids for this purpose. After
> > > > thinking about it for a while I'm favoring using the invalid ids, but
> > > > I'm hoping to solicit some more feedback.
> > > > 
> > > > For now these patches are using invalid ids if the user doesn't map into
> > > > the namespace. I went through the vfs code and found one place where
> > > > this could be handled better (addressed in patch 1 of the series). The
> > > > only other issue I found was that currently no one, not even root, can
> > > > change onwership of such inodes, but I suspect we can find a way around
> > > > this.
> > > 
> > > I started playing around with using -2 as a global nobody id. The
> > > changes below (made on top of this series) are working fine in light
> > > testing. I'm still not sure about the security implications of doing
> > > this though. Possibly the nobody id should be removed from init_user_ns
> > > to prevent any use of the id to gain unauthroized access to such files.
> > > Thoughts?
> > 
> > Can you explain the downsides of just using -1?  What are we able to do
> > (as a fuse-in-container user) when we use -2 that we can't do when it
> > uses -1?
> 
> The thing that happens with -1 is that checks like
> capable_wrt_inode_uidgid() always fail on those inodes because
> INVALID_UID isn't ever mapped into any namespace, even if you're
> system-wide root. If we use a real id this check would at least pass in
> init_user_ns, assuming that we don't have to remove -2 from init_user_ns
> for security reasons.
> 
> Or we could modify these checks so that CAP_SYS_ADMIN always gets
> permission or something like that, which might be the better way to go.

This ought to do (untested as of yet). I think I like this best, but I'm
also interested in hearing what Eric has to say.


diff --git a/fs/inode.c b/fs/inode.c
index 26753ba7b6d6..1029320ff029 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1840,6 +1840,9 @@ bool inode_owner_or_capable(const struct inode *inode)
 {
        struct user_namespace *ns;
 
+       if (capable(CAP_SYS_ADMIN))
+               return true;
+
        if (uid_eq(current_fsuid(), inode->i_uid))
                return true;
 
diff --git a/kernel/capability.c b/kernel/capability.c
index 989f5bfc57dc..a472eaa52b6a 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -438,8 +438,12 @@ EXPORT_SYMBOL(capable);
  */
 bool capable_wrt_inode_uidgid(const struct inode *inode, int cap)
 {
-       struct user_namespace *ns = current_user_ns();
+       struct user_namespace *ns;
 
+       if (capable(CAP_SYS_ADMIN))
+               return true;
+
+       ns = current_user_ns();
        return ns_capable(ns, cap) && kuid_has_mapping(ns, inode->i_uid) &&
                kgid_has_mapping(ns, inode->i_gid);
 }


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-11 18:10       ` Seth Forshee
@ 2014-09-23 22:29         ` Eric W. Biederman
  2014-09-24 13:29           ` Seth Forshee
  0 siblings, 1 reply; 39+ messages in thread
From: Eric W. Biederman @ 2014-09-23 22:29 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Miklos Szeredi, Alexander Viro, Serge Hallyn, fuse-devel,
	linux-kernel, linux-fsdevel

Seth Forshee <seth.forshee@canonical.com> writes:

> On Wed, Sep 10, 2014 at 11:42:12AM -0500, Seth Forshee wrote:
>> On Wed, Sep 10, 2014 at 06:21:55PM +0200, Serge E. Hallyn wrote:
>> > Quoting Seth Forshee (seth.forshee@canonical.com):
>> > > On Tue, Sep 02, 2014 at 10:44:53AM -0500, Seth Forshee wrote:
>> > > > Another issue mentioned by Eric was what to use for i_[ug]id if the ids
>> > > > from userspace don't map into the user namespace, which is going to be a
>> > > > problem for any other filesystems which become mountable from user
>> > > > namespaces as well. We discussed a few options for addressing this, the
>> > > > most promising of which seems to be either using INVALID_[UG]ID for
>> > > > these inodes or creating vfs-wide "nobody" ids for this purpose. After
>> > > > thinking about it for a while I'm favoring using the invalid ids, but
>> > > > I'm hoping to solicit some more feedback.
>> > > > 
>> > > > For now these patches are using invalid ids if the user doesn't map into
>> > > > the namespace. I went through the vfs code and found one place where
>> > > > this could be handled better (addressed in patch 1 of the series). The
>> > > > only other issue I found was that currently no one, not even root, can
>> > > > change onwership of such inodes, but I suspect we can find a way around
>> > > > this.
>> > > 
>> > > I started playing around with using -2 as a global nobody id. The
>> > > changes below (made on top of this series) are working fine in light
>> > > testing. I'm still not sure about the security implications of doing
>> > > this though. Possibly the nobody id should be removed from init_user_ns
>> > > to prevent any use of the id to gain unauthroized access to such files.
>> > > Thoughts?
>> > 
>> > Can you explain the downsides of just using -1?  What are we able to do
>> > (as a fuse-in-container user) when we use -2 that we can't do when it
>> > uses -1?
>> 
>> The thing that happens with -1 is that checks like
>> capable_wrt_inode_uidgid() always fail on those inodes because
>> INVALID_UID isn't ever mapped into any namespace, even if you're
>> system-wide root. If we use a real id this check would at least pass in
>> init_user_ns, assuming that we don't have to remove -2 from init_user_ns
>> for security reasons.
>> 
>> Or we could modify these checks so that CAP_SYS_ADMIN always gets
>> permission or something like that, which might be the better way to go.
>
> This ought to do (untested as of yet). I think I like this best, but I'm
> also interested in hearing what Eric has to say.

So thinking about this and staring at fuse a little more I don't like
the approach of mapping bad uids into INVALID_UID in the case of fuse.

What scares me is that we are looking at a very uncommon case that no
one will test much.  And depending for security on a very subtle
interpretation of semantics that only matter when someone is attacking
us seems scary to me.

For fuse at least I would assume that any time a file shows up with a
uid that doesn't map, and we map it to INVALID_UID I would assume it is
the work of a hostile user.  It could be a misconfiguration but it is a
broken action on the part of the filesystem in either case.

Therefore given that a bad uid can only occur as a result of accidental
or hostile action can we please call make_bad_inode in those cases?  And
thus always return -EIO.

That way no one needs to consider what happens if we cache bad data or
we try to use bad data, and it is an easy position to retreat from
if it happens that we need to do something different.

Eric

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-23 22:29         ` Eric W. Biederman
@ 2014-09-24 13:29           ` Seth Forshee
  2014-09-24 17:10             ` Eric W. Biederman
  0 siblings, 1 reply; 39+ messages in thread
From: Seth Forshee @ 2014-09-24 13:29 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, Miklos Szeredi, Alexander Viro, Serge Hallyn,
	fuse-devel, linux-kernel, linux-fsdevel

On Tue, Sep 23, 2014 at 03:29:57PM -0700, Eric W. Biederman wrote:
> Seth Forshee <seth.forshee@canonical.com> writes:
> 
> > On Wed, Sep 10, 2014 at 11:42:12AM -0500, Seth Forshee wrote:
> >> On Wed, Sep 10, 2014 at 06:21:55PM +0200, Serge E. Hallyn wrote:
> >> > Quoting Seth Forshee (seth.forshee@canonical.com):
> >> > > On Tue, Sep 02, 2014 at 10:44:53AM -0500, Seth Forshee wrote:
> >> > > > Another issue mentioned by Eric was what to use for i_[ug]id if the ids
> >> > > > from userspace don't map into the user namespace, which is going to be a
> >> > > > problem for any other filesystems which become mountable from user
> >> > > > namespaces as well. We discussed a few options for addressing this, the
> >> > > > most promising of which seems to be either using INVALID_[UG]ID for
> >> > > > these inodes or creating vfs-wide "nobody" ids for this purpose. After
> >> > > > thinking about it for a while I'm favoring using the invalid ids, but
> >> > > > I'm hoping to solicit some more feedback.
> >> > > > 
> >> > > > For now these patches are using invalid ids if the user doesn't map into
> >> > > > the namespace. I went through the vfs code and found one place where
> >> > > > this could be handled better (addressed in patch 1 of the series). The
> >> > > > only other issue I found was that currently no one, not even root, can
> >> > > > change onwership of such inodes, but I suspect we can find a way around
> >> > > > this.
> >> > > 
> >> > > I started playing around with using -2 as a global nobody id. The
> >> > > changes below (made on top of this series) are working fine in light
> >> > > testing. I'm still not sure about the security implications of doing
> >> > > this though. Possibly the nobody id should be removed from init_user_ns
> >> > > to prevent any use of the id to gain unauthroized access to such files.
> >> > > Thoughts?
> >> > 
> >> > Can you explain the downsides of just using -1?  What are we able to do
> >> > (as a fuse-in-container user) when we use -2 that we can't do when it
> >> > uses -1?
> >> 
> >> The thing that happens with -1 is that checks like
> >> capable_wrt_inode_uidgid() always fail on those inodes because
> >> INVALID_UID isn't ever mapped into any namespace, even if you're
> >> system-wide root. If we use a real id this check would at least pass in
> >> init_user_ns, assuming that we don't have to remove -2 from init_user_ns
> >> for security reasons.
> >> 
> >> Or we could modify these checks so that CAP_SYS_ADMIN always gets
> >> permission or something like that, which might be the better way to go.
> >
> > This ought to do (untested as of yet). I think I like this best, but I'm
> > also interested in hearing what Eric has to say.
> 
> So thinking about this and staring at fuse a little more I don't like
> the approach of mapping bad uids into INVALID_UID in the case of fuse.
> 
> What scares me is that we are looking at a very uncommon case that no
> one will test much.  And depending for security on a very subtle
> interpretation of semantics that only matter when someone is attacking
> us seems scary to me.
> 
> For fuse at least I would assume that any time a file shows up with a
> uid that doesn't map, and we map it to INVALID_UID I would assume it is
> the work of a hostile user.  It could be a misconfiguration but it is a
> broken action on the part of the filesystem in either case.
> 
> Therefore given that a bad uid can only occur as a result of accidental
> or hostile action can we please call make_bad_inode in those cases?  And
> thus always return -EIO.
> 
> That way no one needs to consider what happens if we cache bad data or
> we try to use bad data, and it is an easy position to retreat from
> if it happens that we need to do something different.

One thing I don't understand is why you're qualifying these statements
to be limited to fuse. Normal filesystems will have to deal with the
same problem if/when they are made mountable from user namespaces; is
this concern not valid there? Or would you advocate the same behavior
for normal filesystems as well? Otherwise it seems like we're just
putting off dealing with it for a while.

I'd prefer for the mounts to be as useful as possible when the fs
contains ids that don't map into the namespace, but it's also difficult
to argue against taking the more restricted approach at first and
loosening up if needed. So I guess I'm okay with returning -EIO to start
out.

Thanks,
Seth

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-24 13:29           ` Seth Forshee
@ 2014-09-24 17:10             ` Eric W. Biederman
  2014-09-25 15:04               ` Miklos Szeredi
  0 siblings, 1 reply; 39+ messages in thread
From: Eric W. Biederman @ 2014-09-24 17:10 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Miklos Szeredi, Alexander Viro, Serge Hallyn, fuse-devel,
	linux-kernel, linux-fsdevel

Seth Forshee <seth.forshee@canonical.com> writes:

> On Tue, Sep 23, 2014 at 03:29:57PM -0700, Eric W. Biederman wrote:
>> 
>> So thinking about this and staring at fuse a little more I don't like
>> the approach of mapping bad uids into INVALID_UID in the case of fuse.
>> 
>> What scares me is that we are looking at a very uncommon case that no
>> one will test much.  And depending for security on a very subtle
>> interpretation of semantics that only matter when someone is attacking
>> us seems scary to me.
>> 
>> For fuse at least I would assume that any time a file shows up with a
>> uid that doesn't map, and we map it to INVALID_UID I would assume it is
>> the work of a hostile user.  It could be a misconfiguration but it is a
>> broken action on the part of the filesystem in either case.
>> 
>> Therefore given that a bad uid can only occur as a result of accidental
>> or hostile action can we please call make_bad_inode in those cases?  And
>> thus always return -EIO.
>> 
>> That way no one needs to consider what happens if we cache bad data or
>> we try to use bad data, and it is an easy position to retreat from
>> if it happens that we need to do something different.
>
> One thing I don't understand is why you're qualifying these statements
> to be limited to fuse. Normal filesystems will have to deal with the
> same problem if/when they are made mountable from user namespaces; is
> this concern not valid there? Or would you advocate the same behavior
> for normal filesystems as well? Otherwise it seems like we're just
> putting off dealing with it for a while.

I am qualifying them with fuse so that my thinking is concrete.  I
believe my thinking generalizes to all filesystems, but I there may be
cases I have not considered that are different.

> I'd prefer for the mounts to be as useful as possible when the fs
> contains ids that don't map into the namespace, but it's also difficult
> to argue against taking the more restricted approach at first and
> loosening up if needed. So I guess I'm okay with returning -EIO to start
> out.

What bugs me the most about making the mounts as useful as possible is
that it makes an assumption about why the uids are corrupt.

I look at the situtation and I make a different assumption that the most
likely reasons for the uids to be corrupt is someone being malicious.
Being as useful as possible when someone is being malicious feels like
a recipe for the malicious user to exploit your generousity.

So in summary I see:
- Low utility in being able to manipulate files with bad uids.
- Bad uids are mostly likely malicious action.
- make_bad_inode is trivial to analyze.
- No impediments to change if I am wrong.

So unless there is a compelling case, right now I would recommend
returning -EIO initially.   That allows us to concentrate on the easier
parts of this and it leaves the changes only in fuse.

I have no problems revisiting this later and in fact I am half persuaded
we should, but simultaneously dealing working on both fuse and the
generic problem of how to handle files with bad uids and gids seems
too much to analyze and digest all at once.

Eric

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-24 17:10             ` Eric W. Biederman
@ 2014-09-25 15:04               ` Miklos Szeredi
  2014-09-25 16:21                 ` Seth Forshee
  2014-09-25 18:05                 ` Eric W. Biederman
  0 siblings, 2 replies; 39+ messages in thread
From: Miklos Szeredi @ 2014-09-25 15:04 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, Alexander Viro, Serge Hallyn, fuse-devel,
	Kernel Mailing List, Linux-Fsdevel

On Wed, Sep 24, 2014 at 7:10 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:


> So in summary I see:
> - Low utility in being able to manipulate files with bad uids.
> - Bad uids are mostly likely malicious action.
> - make_bad_inode is trivial to analyze.
> - No impediments to change if I am wrong.
>
> So unless there is a compelling case, right now I would recommend
> returning -EIO initially.   That allows us to concentrate on the easier
> parts of this and it leaves the changes only in fuse.

The problem with marking the inode bad is that it will mark it bad for
all instances of this filesystem.  Including ones which are in a
namespace where the UIDs make perfect sense.

So that really doesn't look like a good solution.

Doing the check in inode_permission() might be too heavyweight, but
it's still the only one that looks sane.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-25 15:04               ` Miklos Szeredi
@ 2014-09-25 16:21                 ` Seth Forshee
  2014-09-25 18:05                 ` Eric W. Biederman
  1 sibling, 0 replies; 39+ messages in thread
From: Seth Forshee @ 2014-09-25 16:21 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Eric W. Biederman, Serge E. Hallyn, Alexander Viro, Serge Hallyn,
	fuse-devel, Kernel Mailing List, Linux-Fsdevel

On Thu, Sep 25, 2014 at 05:04:04PM +0200, Miklos Szeredi wrote:
> On Wed, Sep 24, 2014 at 7:10 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
> 
> 
> > So in summary I see:
> > - Low utility in being able to manipulate files with bad uids.
> > - Bad uids are mostly likely malicious action.
> > - make_bad_inode is trivial to analyze.
> > - No impediments to change if I am wrong.
> >
> > So unless there is a compelling case, right now I would recommend
> > returning -EIO initially.   That allows us to concentrate on the easier
> > parts of this and it leaves the changes only in fuse.
> 
> The problem with marking the inode bad is that it will mark it bad for
> all instances of this filesystem.  Including ones which are in a
> namespace where the UIDs make perfect sense.

As far as I can tell there can only be one interpretation of the uid
userspace hands us for an inode per superblock. If userspace hands us
1000 it can't map to kuid 101000 in one context and 1000 in another
context because there's only one inode. If the uid doesn't map into
whatever namespace we're using to interpret it there's no value for
i_uid which will cause it to be re-evaluated in a different context.

I don't think it would be a good idea to do that anyway. /proc is
mounted in my unprivileged container (i.e. uid 0 in the container maps
to some unprivileged uid outside the container). Interpreting uid 0 in
/proc relative to my container's namespace would give root in the
container access to things that it shouldn't have access to.

Thanks,
Seth

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-25 15:04               ` Miklos Szeredi
  2014-09-25 16:21                 ` Seth Forshee
@ 2014-09-25 18:05                 ` Eric W. Biederman
  2014-09-25 18:44                   ` Seth Forshee
  1 sibling, 1 reply; 39+ messages in thread
From: Eric W. Biederman @ 2014-09-25 18:05 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Serge E. Hallyn, Alexander Viro, Serge Hallyn, fuse-devel,
	Kernel Mailing List, Linux-Fsdevel

Miklos Szeredi <miklos@szeredi.hu> writes:

> On Wed, Sep 24, 2014 at 7:10 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>
>
>> So in summary I see:
>> - Low utility in being able to manipulate files with bad uids.
>> - Bad uids are mostly likely malicious action.
>> - make_bad_inode is trivial to analyze.
>> - No impediments to change if I am wrong.
>>
>> So unless there is a compelling case, right now I would recommend
>> returning -EIO initially.   That allows us to concentrate on the easier
>> parts of this and it leaves the changes only in fuse.
>
> The problem with marking the inode bad is that it will mark it bad for
> all instances of this filesystem.  Including ones which are in a
> namespace where the UIDs make perfect sense.

There are two cases:
app <-> fuse
fuse <-> server

I proposed mark_bad_inode for "userspace server -> fuse".
Where we have one superblock and one server so and one namespace that
they decide to talk in when the filesystem was mounted.

I think bad_inode is a reasonable response when the filesystem server
starts spewing non-sense.

> So that really doesn't look like a good solution.
>
> Doing the check in inode_permission() might be too heavyweight, but
> it's still the only one that looks sane.

For the "app <-> fuse" case we already have checks in inode_permision
that are kuid based that handle that case.  We use kuids not for
performance (although there is a small advatnage) but to much more to
keep the logic simple and maintainable.

For the "app -> fuse" case in .setattr we do need a check to verify
that the uid and gid are valid.  However that check was added with
the basic user namespace support and fuse current returns -EOVERFLOW
when that happens.

Eric

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-25 18:05                 ` Eric W. Biederman
@ 2014-09-25 18:44                   ` Seth Forshee
  2014-09-25 18:53                     ` Seth Forshee
  2014-09-25 19:14                     ` Eric W. Biederman
  0 siblings, 2 replies; 39+ messages in thread
From: Seth Forshee @ 2014-09-25 18:44 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Serge E. Hallyn, Alexander Viro, Serge Hallyn,
	fuse-devel, Kernel Mailing List, Linux-Fsdevel

On Thu, Sep 25, 2014 at 11:05:36AM -0700, Eric W. Biederman wrote:
> Miklos Szeredi <miklos@szeredi.hu> writes:
> 
> > On Wed, Sep 24, 2014 at 7:10 PM, Eric W. Biederman
> > <ebiederm@xmission.com> wrote:
> >
> >
> >> So in summary I see:
> >> - Low utility in being able to manipulate files with bad uids.
> >> - Bad uids are mostly likely malicious action.
> >> - make_bad_inode is trivial to analyze.
> >> - No impediments to change if I am wrong.
> >>
> >> So unless there is a compelling case, right now I would recommend
> >> returning -EIO initially.   That allows us to concentrate on the easier
> >> parts of this and it leaves the changes only in fuse.
> >
> > The problem with marking the inode bad is that it will mark it bad for
> > all instances of this filesystem.  Including ones which are in a
> > namespace where the UIDs make perfect sense.
> 
> There are two cases:
> app <-> fuse
> fuse <-> server
> 
> I proposed mark_bad_inode for "userspace server -> fuse".
> Where we have one superblock and one server so and one namespace that
> they decide to talk in when the filesystem was mounted.
> 
> I think bad_inode is a reasonable response when the filesystem server
> starts spewing non-sense.
> 
> > So that really doesn't look like a good solution.
> >
> > Doing the check in inode_permission() might be too heavyweight, but
> > it's still the only one that looks sane.
> 
> For the "app <-> fuse" case we already have checks in inode_permision
> that are kuid based that handle that case.  We use kuids not for
> performance (although there is a small advatnage) but to much more to
> keep the logic simple and maintainable.
> 
> 
> For the "app -> fuse" case in .setattr we do need a check to verify
> that the uid and gid are valid.  However that check was added with
> the basic user namespace support and fuse current returns -EOVERFLOW
> when that happens.

Where does this happen? I haven't managed to track it down yet.

I've also added a check in fuse for this. If a uid/gid passed to
fuse_setattr doesn't map into the namespace it will return -EINVAL.
Sounds like maybe it should return -EOVERFLOW instead.

Thanks,
Seth

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-25 18:44                   ` Seth Forshee
@ 2014-09-25 18:53                     ` Seth Forshee
  2014-09-25 19:14                     ` Eric W. Biederman
  1 sibling, 0 replies; 39+ messages in thread
From: Seth Forshee @ 2014-09-25 18:53 UTC (permalink / raw)
  To: Eric W. Biederman, Miklos Szeredi, Serge E. Hallyn,
	Alexander Viro, Serge Hallyn, fuse-devel, Kernel Mailing List,
	Linux-Fsdevel

On Thu, Sep 25, 2014 at 01:44:03PM -0500, Seth Forshee wrote:
> On Thu, Sep 25, 2014 at 11:05:36AM -0700, Eric W. Biederman wrote:
> > Miklos Szeredi <miklos@szeredi.hu> writes:
> > 
> > > On Wed, Sep 24, 2014 at 7:10 PM, Eric W. Biederman
> > > <ebiederm@xmission.com> wrote:
> > >
> > >
> > >> So in summary I see:
> > >> - Low utility in being able to manipulate files with bad uids.
> > >> - Bad uids are mostly likely malicious action.
> > >> - make_bad_inode is trivial to analyze.
> > >> - No impediments to change if I am wrong.
> > >>
> > >> So unless there is a compelling case, right now I would recommend
> > >> returning -EIO initially.   That allows us to concentrate on the easier
> > >> parts of this and it leaves the changes only in fuse.
> > >
> > > The problem with marking the inode bad is that it will mark it bad for
> > > all instances of this filesystem.  Including ones which are in a
> > > namespace where the UIDs make perfect sense.
> > 
> > There are two cases:
> > app <-> fuse
> > fuse <-> server
> > 
> > I proposed mark_bad_inode for "userspace server -> fuse".
> > Where we have one superblock and one server so and one namespace that
> > they decide to talk in when the filesystem was mounted.
> > 
> > I think bad_inode is a reasonable response when the filesystem server
> > starts spewing non-sense.
> > 
> > > So that really doesn't look like a good solution.
> > >
> > > Doing the check in inode_permission() might be too heavyweight, but
> > > it's still the only one that looks sane.
> > 
> > For the "app <-> fuse" case we already have checks in inode_permision
> > that are kuid based that handle that case.  We use kuids not for
> > performance (although there is a small advatnage) but to much more to
> > keep the logic simple and maintainable.
> > 
> > 
> > For the "app -> fuse" case in .setattr we do need a check to verify
> > that the uid and gid are valid.  However that check was added with
> > the basic user namespace support and fuse current returns -EOVERFLOW
> > when that happens.
> 
> Where does this happen? I haven't managed to track it down yet.

I guess it must be the one in chown_common()? Except that returns
EINVAL, not EOVERFLOW.

> 
> I've also added a check in fuse for this. If a uid/gid passed to
> fuse_setattr doesn't map into the namespace it will return -EINVAL.
> Sounds like maybe it should return -EOVERFLOW instead.
> 
> Thanks,
> Seth
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-25 18:44                   ` Seth Forshee
  2014-09-25 18:53                     ` Seth Forshee
@ 2014-09-25 19:14                     ` Eric W. Biederman
  2014-09-25 19:48                       ` Seth Forshee
  1 sibling, 1 reply; 39+ messages in thread
From: Eric W. Biederman @ 2014-09-25 19:14 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Serge E. Hallyn, Alexander Viro, Serge Hallyn, fuse-devel,
	Kernel Mailing List, Linux-Fsdevel, Miklos Szeredi

Seth Forshee <seth.forshee@canonical.com> writes:

> On Thu, Sep 25, 2014 at 11:05:36AM -0700, Eric W. Biederman wrote:
>> Miklos Szeredi <miklos@szeredi.hu> writes:
>> 
>> > On Wed, Sep 24, 2014 at 7:10 PM, Eric W. Biederman
>> > <ebiederm@xmission.com> wrote:
>> >
>> >
>> >> So in summary I see:
>> >> - Low utility in being able to manipulate files with bad uids.
>> >> - Bad uids are mostly likely malicious action.
>> >> - make_bad_inode is trivial to analyze.
>> >> - No impediments to change if I am wrong.
>> >>
>> >> So unless there is a compelling case, right now I would recommend
>> >> returning -EIO initially.   That allows us to concentrate on the easier
>> >> parts of this and it leaves the changes only in fuse.
>> >
>> > The problem with marking the inode bad is that it will mark it bad for
>> > all instances of this filesystem.  Including ones which are in a
>> > namespace where the UIDs make perfect sense.
>> 
>> There are two cases:
>> app <-> fuse
>> fuse <-> server
>> 
>> I proposed mark_bad_inode for "userspace server -> fuse".
>> Where we have one superblock and one server so and one namespace that
>> they decide to talk in when the filesystem was mounted.
>> 
>> I think bad_inode is a reasonable response when the filesystem server
>> starts spewing non-sense.
>> 
>> > So that really doesn't look like a good solution.
>> >
>> > Doing the check in inode_permission() might be too heavyweight, but
>> > it's still the only one that looks sane.
>> 
>> For the "app <-> fuse" case we already have checks in inode_permision
>> that are kuid based that handle that case.  We use kuids not for
>> performance (although there is a small advatnage) but to much more to
>> keep the logic simple and maintainable.
>> 
>> 
>> For the "app -> fuse" case in .setattr we do need a check to verify
>> that the uid and gid are valid.  However that check was added with
>> the basic user namespace support and fuse current returns -EOVERFLOW
>> when that happens.
>
> Where does this happen? I haven't managed to track it down yet.

Sorry iattr_to_setattr look for from_kuid and from_kgid.

The call path is
fuse_setattr
   fuse_do_setattr
      iattr_to_fattr.

Keeping everything in kuids for as long as possible and only converting
at the edges tends to mean that I caught most of the conversion issues
with my basic support for user namespaces.

> I've also added a check in fuse for this. If a uid/gid passed to
> fuse_setattr doesn't map into the namespace it will return -EINVAL.
> Sounds like maybe it should return -EOVERFLOW instead.

I am not 100% about -EOVERLFLOW but it is the closest I could come up
with.

Frankly looking at what I have coded I am inconsistent.  In chown_common
I use -EINVAL, whereas in fuse I use -EOVERFLOW.  So it is probably
worth a second look, and probably a patch to a man page or two
just to document this weird case.

One document that has advised some of my decisions is the rational for
how 64bit file offset support was added on 32bit systems ages ago.
That is where I got -EOVERFLOW.  The logic for handling 64bit file
offsets was ultimately that any operation that could cause
corruption would return an error (typically EOVERFLOW).

Most of the vfs operations with uids are unlikely to cause corruption
and are more likely to be problematic if they don't work which is why
I tend to use overflow_uid/-1.

For when uid/gid values don't map into a kuid/kgid value. Some error
is definitely required.

I am on the fence about what to do when a uid from the filesystem server
or for other filesystems the on-disk data structures does not map, but
make_bad_inode is simpler in conception.  So make_bad_inode seems like
a good place to start.   For fuse especially this isn't hard because
the make_bad_inode calls are already there to handle a change in i_mode.

Eric

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-25 19:14                     ` Eric W. Biederman
@ 2014-09-25 19:48                       ` Seth Forshee
  2014-09-27  1:41                         ` Eric W. Biederman
  0 siblings, 1 reply; 39+ messages in thread
From: Seth Forshee @ 2014-09-25 19:48 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, Alexander Viro, Serge Hallyn, fuse-devel,
	Kernel Mailing List, Linux-Fsdevel, Miklos Szeredi

On Thu, Sep 25, 2014 at 12:14:01PM -0700, Eric W. Biederman wrote:
> Seth Forshee <seth.forshee@canonical.com> writes:
> 
> > On Thu, Sep 25, 2014 at 11:05:36AM -0700, Eric W. Biederman wrote:
> >> Miklos Szeredi <miklos@szeredi.hu> writes:
> >> 
> >> > On Wed, Sep 24, 2014 at 7:10 PM, Eric W. Biederman
> >> > <ebiederm@xmission.com> wrote:
> >> >
> >> >
> >> >> So in summary I see:
> >> >> - Low utility in being able to manipulate files with bad uids.
> >> >> - Bad uids are mostly likely malicious action.
> >> >> - make_bad_inode is trivial to analyze.
> >> >> - No impediments to change if I am wrong.
> >> >>
> >> >> So unless there is a compelling case, right now I would recommend
> >> >> returning -EIO initially.   That allows us to concentrate on the easier
> >> >> parts of this and it leaves the changes only in fuse.
> >> >
> >> > The problem with marking the inode bad is that it will mark it bad for
> >> > all instances of this filesystem.  Including ones which are in a
> >> > namespace where the UIDs make perfect sense.
> >> 
> >> There are two cases:
> >> app <-> fuse
> >> fuse <-> server
> >> 
> >> I proposed mark_bad_inode for "userspace server -> fuse".
> >> Where we have one superblock and one server so and one namespace that
> >> they decide to talk in when the filesystem was mounted.
> >> 
> >> I think bad_inode is a reasonable response when the filesystem server
> >> starts spewing non-sense.
> >> 
> >> > So that really doesn't look like a good solution.
> >> >
> >> > Doing the check in inode_permission() might be too heavyweight, but
> >> > it's still the only one that looks sane.
> >> 
> >> For the "app <-> fuse" case we already have checks in inode_permision
> >> that are kuid based that handle that case.  We use kuids not for
> >> performance (although there is a small advatnage) but to much more to
> >> keep the logic simple and maintainable.
> >> 
> >> 
> >> For the "app -> fuse" case in .setattr we do need a check to verify
> >> that the uid and gid are valid.  However that check was added with
> >> the basic user namespace support and fuse current returns -EOVERFLOW
> >> when that happens.
> >
> > Where does this happen? I haven't managed to track it down yet.
> 
> Sorry iattr_to_setattr look for from_kuid and from_kgid.
> 
> The call path is
> fuse_setattr
>    fuse_do_setattr
>       iattr_to_fattr.

Bah. Sorry, I misread that originally and thought you were talking about
something outside of fuse. And I was looking at a tree with my fuse
changes, so of course I wouldn't have found it.

Actually in 3.17-rc6 I still don't see that iattr_to_fattr (I assume
this is what you meant) checks the results of the conversion (not that
it really needs to since it uses init_user_ns), nor any use of EOVERFLOW
in fuse. Anyway, it's not really important.

> Keeping everything in kuids for as long as possible and only converting
> at the edges tends to mean that I caught most of the conversion issues
> with my basic support for user namespaces.
> 
> > I've also added a check in fuse for this. If a uid/gid passed to
> > fuse_setattr doesn't map into the namespace it will return -EINVAL.
> > Sounds like maybe it should return -EOVERFLOW instead.
> 
> I am not 100% about -EOVERLFLOW but it is the closest I could come up
> with.
> 
> Frankly looking at what I have coded I am inconsistent.  In chown_common
> I use -EINVAL, whereas in fuse I use -EOVERFLOW.  So it is probably
> worth a second look, and probably a patch to a man page or two
> just to document this weird case.
> 
> One document that has advised some of my decisions is the rational for
> how 64bit file offset support was added on 32bit systems ages ago.
> That is where I got -EOVERFLOW.  The logic for handling 64bit file
> offsets was ultimately that any operation that could cause
> corruption would return an error (typically EOVERFLOW).
> 
> Most of the vfs operations with uids are unlikely to cause corruption
> and are more likely to be problematic if they don't work which is why
> I tend to use overflow_uid/-1.
> 
> For when uid/gid values don't map into a kuid/kgid value. Some error
> is definitely required.

Well, unless you say otherwise I guess I'll leave it -EINVAL to be
consistent with chown_common().

> I am on the fence about what to do when a uid from the filesystem server
> or for other filesystems the on-disk data structures does not map, but
> make_bad_inode is simpler in conception.  So make_bad_inode seems like
> a good place to start.   For fuse especially this isn't hard because
> the make_bad_inode calls are already there to handle a change in i_mode.

I agree that if we're unsure then make_bad_inode is a more logical place
to start, since it's easier to loosen the restrictions later than to
tighten them. I've got an initiail implementation that I'm in the
process of testing. If you want to take a look I've pushed it to:

  git://kernel.ubuntu.com/sforshee/linux.git fuse-userns 

Thanks,
Seth


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-25 19:48                       ` Seth Forshee
@ 2014-09-27  1:41                         ` Eric W. Biederman
  2014-09-27  4:24                           ` Seth Forshee
  0 siblings, 1 reply; 39+ messages in thread
From: Eric W. Biederman @ 2014-09-27  1:41 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Alexander Viro, Serge Hallyn, fuse-devel, Kernel Mailing List,
	Linux-Fsdevel, Miklos Szeredi, Serge E. Hallyn

[-- Attachment #1: Type: text/plain, Size: 2143 bytes --]

Seth Forshee <seth.forshee@canonical.com> writes:

> On Thu, Sep 25, 2014 at 12:14:01PM -0700, Eric W. Biederman wrote:

>> Sorry iattr_to_setattr look for from_kuid and from_kgid.
>> 
>> The call path is
>> fuse_setattr
>>    fuse_do_setattr
>>       iattr_to_fattr.
>
> Bah. Sorry, I misread that originally and thought you were talking about
> something outside of fuse. And I was looking at a tree with my fuse
> changes, so of course I wouldn't have found it.
>
> Actually in 3.17-rc6 I still don't see that iattr_to_fattr (I assume
> this is what you meant) checks the results of the conversion (not that
> it really needs to since it uses init_user_ns), nor any use of EOVERFLOW
> in fuse. Anyway, it's not really important.

True.  I also goofed up in that I was looking at the wrong tree.  My
tree had all of my preliminary fuse patches I worked up a while ago
applied so I did have the error handling in there.

> Well, unless you say otherwise I guess I'll leave it -EINVAL to be
> consistent with chown_common().

That sounds like a good plan.

>> I am on the fence about what to do when a uid from the filesystem server
>> or for other filesystems the on-disk data structures does not map, but
>> make_bad_inode is simpler in conception.  So make_bad_inode seems like
>> a good place to start.   For fuse especially this isn't hard because
>> the make_bad_inode calls are already there to handle a change in i_mode.
>
> I agree that if we're unsure then make_bad_inode is a more logical place
> to start, since it's easier to loosen the restrictions later than to
> tighten them. I've got an initiail implementation that I'm in the
> process of testing. If you want to take a look I've pushed it to:
>
>   git://kernel.ubuntu.com/sforshee/linux.git fuse-userns 

Thanks.  If I can scrape together enough focus I will look at it.

As a second best thing here are my prototype from when I was looking at
performing this fuse conversion last.  Given that you had missed
checking the from_kuid permission checks, it might be worth comparing
and seeing if there is something else in these patches that would be
worth including.

Eric


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-userns-Allow-for-fuse-filesystems-outside-the-initia.patch --]
[-- Type: text/x-diff, Size: 7489 bytes --]

>From 94195ce5b06915846d14eaa35d9274c0315b46a0 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Thu, 4 Oct 2012 13:34:45 -0700
Subject: [PATCH 1/4] userns: Allow for fuse filesystems outside the initial user
 namespace

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/fuse/dev.c    | 10 +++++-----
 fs/fuse/dir.c    | 41 ++++++++++++++++++++++++++++++-----------
 fs/fuse/fuse_i.h |  3 +++
 fs/fuse/inode.c  | 10 ++++++++--
 4 files changed, 46 insertions(+), 18 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index ca887314aba9..e01f30c51b3c 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -124,10 +124,10 @@ static void __fuse_put_request(struct fuse_req *req)
 	atomic_dec(&req->count);
 }
 
-static void fuse_req_init_context(struct fuse_req *req)
+static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req)
 {
-	req->in.h.uid = from_kuid_munged(&init_user_ns, current_fsuid());
-	req->in.h.gid = from_kgid_munged(&init_user_ns, current_fsgid());
+	req->in.h.uid = from_kuid_munged(fc->user_ns, current_fsuid());
+	req->in.h.gid = from_kgid_munged(fc->user_ns, current_fsgid());
 	req->in.h.pid = current->pid;
 }
 
@@ -168,7 +168,7 @@ static struct fuse_req *__fuse_get_req(struct fuse_conn *fc, unsigned npages,
 		goto out;
 	}
 
-	fuse_req_init_context(req);
+	fuse_req_init_context(fc, req);
 	req->waiting = 1;
 	req->background = for_background;
 	return req;
@@ -257,7 +257,7 @@ struct fuse_req *fuse_get_req_nofail_nopages(struct fuse_conn *fc,
 	if (!req)
 		req = get_reserved_req(fc, file);
 
-	fuse_req_init_context(req);
+	fuse_req_init_context(fc, req);
 	req->waiting = 1;
 	req->background = 0;
 	return req;
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index de1d84af9f7c..d74c75a057cd 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -886,7 +886,7 @@ static int fuse_link(struct dentry *entry, struct inode *newdir,
 	return err;
 }
 
-static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
+static int fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
 			  struct kstat *stat)
 {
 	unsigned int blkbits;
@@ -905,8 +905,12 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
 	stat->ino = attr->ino;
 	stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
 	stat->nlink = attr->nlink;
-	stat->uid = make_kuid(&init_user_ns, attr->uid);
-	stat->gid = make_kgid(&init_user_ns, attr->gid);
+	stat->uid = make_kuid(fc->user_ns, attr->uid);
+	if (!uid_valid(stat->uid))
+		return -EOVERFLOW;
+	stat->gid = make_kgid(fc->user_ns, attr->gid);
+	if (!gid_valid(stat->gid))
+		return -EOVERFLOW;
 	stat->rdev = inode->i_rdev;
 	stat->atime.tv_sec = attr->atime;
 	stat->atime.tv_nsec = attr->atimensec;
@@ -923,6 +927,7 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
 		blkbits = inode->i_sb->s_blocksize_bits;
 
 	stat->blksize = 1 << blkbits;
+	return 0;
 }
 
 static int fuse_do_getattr(struct inode *inode, struct kstat *stat,
@@ -973,7 +978,8 @@ static int fuse_do_getattr(struct inode *inode, struct kstat *stat,
 					       attr_timeout(&outarg),
 					       attr_version);
 			if (stat)
-				fuse_fillattr(inode, &outarg.attr, stat);
+				err = fuse_fillattr(inode, &outarg.attr,
+						    stat);
 		}
 	}
 	return err;
@@ -1556,17 +1562,25 @@ static bool update_mtime(unsigned ivalid, bool trust_local_mtime)
 	return true;
 }
 
-static void iattr_to_fattr(struct iattr *iattr, struct fuse_setattr_in *arg,
-			   bool trust_local_cmtime)
+static int iattr_to_fattr(struct fuse_conn *fc, struct iattr *iattr,
+			   struct fuse_setattr_in *arg, bool trust_local_cmtime)
 {
 	unsigned ivalid = iattr->ia_valid;
 
 	if (ivalid & ATTR_MODE)
 		arg->valid |= FATTR_MODE,   arg->mode = iattr->ia_mode;
-	if (ivalid & ATTR_UID)
-		arg->valid |= FATTR_UID,    arg->uid = from_kuid(&init_user_ns, iattr->ia_uid);
-	if (ivalid & ATTR_GID)
-		arg->valid |= FATTR_GID,    arg->gid = from_kgid(&init_user_ns, iattr->ia_gid);
+	if (ivalid & ATTR_UID) {
+		arg->valid |= FATTR_UID;
+		arg->uid = from_kuid(fc->user_ns, iattr->ia_uid);
+		if (arg->uid == (uid_t)-1)
+			return -EOVERFLOW;
+	}
+	if (ivalid & ATTR_GID) {
+		arg->valid |= FATTR_GID;
+		arg->gid = from_kgid(fc->user_ns, iattr->ia_gid);
+		if (arg->gid == (gid_t)-1)
+			return -EOVERFLOW;
+	}
 	if (ivalid & ATTR_SIZE)
 		arg->valid |= FATTR_SIZE,   arg->size = iattr->ia_size;
 	if (ivalid & ATTR_ATIME) {
@@ -1588,6 +1602,7 @@ static void iattr_to_fattr(struct iattr *iattr, struct fuse_setattr_in *arg,
 		arg->ctime = iattr->ia_ctime.tv_sec;
 		arg->ctimensec = iattr->ia_ctime.tv_nsec;
 	}
+	return 0;
 }
 
 /*
@@ -1741,7 +1756,11 @@ int fuse_do_setattr(struct inode *inode, struct iattr *attr,
 
 	memset(&inarg, 0, sizeof(inarg));
 	memset(&outarg, 0, sizeof(outarg));
-	iattr_to_fattr(attr, &inarg, trust_local_cmtime);
+	err = iattr_to_fattr(fc, attr, &inarg, trust_local_cmtime);
+	if (err) {
+		goto error;
+	}
+
 	if (file) {
 		struct fuse_file *ff = file->private_data;
 		inarg.valid |= FATTR_FH;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e8e47a6ab518..e7dbbb5d62b4 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -598,6 +598,9 @@ struct fuse_conn {
 
 	/** Read/write semaphore to hold when accessing sb. */
 	struct rw_semaphore killsb;
+
+	/** User namespace to communicate uids and gids to the fuse daemon */
+	struct user_namespace *user_ns;
 };
 
 static inline struct fuse_conn *get_fuse_conn_super(struct super_block *sb)
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 03246cd9d47a..894288f7ad67 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -20,6 +20,7 @@
 #include <linux/random.h>
 #include <linux/sched.h>
 #include <linux/exportfs.h>
+#include <linux/user_namespace.h>
 
 MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>");
 MODULE_DESCRIPTION("Filesystem in Userspace");
@@ -577,8 +578,8 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
 	struct super_block *sb = root->d_sb;
 	struct fuse_conn *fc = get_fuse_conn_super(sb);
 
-	seq_printf(m, ",user_id=%u", from_kuid_munged(&init_user_ns, fc->user_id));
-	seq_printf(m, ",group_id=%u", from_kgid_munged(&init_user_ns, fc->group_id));
+	seq_printf(m, ",user_id=%u", from_kuid_munged(fc->user_ns, fc->user_id));
+	seq_printf(m, ",group_id=%u", from_kgid_munged(fc->user_ns, fc->group_id));
 	if (fc->flags & FUSE_DEFAULT_PERMISSIONS)
 		seq_puts(m, ",default_permissions");
 	if (fc->flags & FUSE_ALLOW_OTHER)
@@ -616,6 +617,7 @@ void fuse_conn_init(struct fuse_conn *fc)
 	fc->initialized = 0;
 	fc->attr_version = 1;
 	get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key));
+	fc->user_ns = get_user_ns(current_user_ns());
 }
 EXPORT_SYMBOL_GPL(fuse_conn_init);
 
@@ -624,6 +626,8 @@ void fuse_conn_put(struct fuse_conn *fc)
 	if (atomic_dec_and_test(&fc->count)) {
 		if (fc->destroy_req)
 			fuse_request_free(fc->destroy_req);
+		put_user_ns(fc->user_ns);
+		fc->user_ns = NULL;
 		fc->release(fc);
 	}
 }
@@ -1033,6 +1037,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
 	sb->s_maxbytes = MAX_LFS_FILESIZE;
 	sb->s_time_gran = 1;
 	sb->s_export_op = &fuse_export_operations;
+	sb->s_user_ns = get_user_ns(current_user_ns());
 
 	file = fget(d.fd);
 	err = -EINVAL;
@@ -1149,6 +1154,7 @@ static void fuse_kill_sb_anon(struct super_block *sb)
 	}
 
 	kill_anon_super(sb);
+	put_user_ns(sb->s_user_ns);
 }
 
 static struct file_system_type fuse_fs_type = {
-- 
1.9.1


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: 0002-fuse-Teach-fuse-how-to-handle-the-pid-namespace.patch --]
[-- Type: text/x-diff, Size: 4851 bytes --]

>From 9bdaa744858d296f361a92c8940c33f878aec169 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Thu, 4 Oct 2012 13:42:34 -0700
Subject: [PATCH 2/4] fuse: Teach fuse how to handle the pid namespace.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/fuse/dev.c    |  2 +-
 fs/fuse/file.c   | 17 ++++++++++-------
 fs/fuse/fuse_i.h |  3 +++
 fs/fuse/inode.c  |  4 ++++
 4 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index e01f30c51b3c..448775701763 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -128,7 +128,7 @@ static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req)
 {
 	req->in.h.uid = from_kuid_munged(fc->user_ns, current_fsuid());
 	req->in.h.gid = from_kgid_munged(fc->user_ns, current_fsgid());
-	req->in.h.pid = current->pid;
+	req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns);
 }
 
 static bool fuse_block_alloc(struct fuse_conn *fc, bool for_background)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 912061ac4baf..2d52801fa5dd 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2130,7 +2130,8 @@ static int fuse_direct_mmap(struct file *file, struct vm_area_struct *vma)
 	return generic_file_mmap(file, vma);
 }
 
-static int convert_fuse_file_lock(const struct fuse_file_lock *ffl,
+static int convert_fuse_file_lock(struct fuse_conn *fc,
+				  const struct fuse_file_lock *ffl,
 				  struct file_lock *fl)
 {
 	switch (ffl->type) {
@@ -2145,7 +2146,9 @@ static int convert_fuse_file_lock(const struct fuse_file_lock *ffl,
 
 		fl->fl_start = ffl->start;
 		fl->fl_end = ffl->end;
-		fl->fl_pid = ffl->pid;
+		rcu_read_lock();
+		fl->fl_pid = pid_vnr(find_pid_ns(ffl->pid, fc->pid_ns));
+		rcu_read_unlock();
 		break;
 
 	default:
@@ -2156,7 +2159,7 @@ static int convert_fuse_file_lock(const struct fuse_file_lock *ffl,
 }
 
 static void fuse_lk_fill(struct fuse_req *req, struct file *file,
-			 const struct file_lock *fl, int opcode, pid_t pid,
+			 const struct file_lock *fl, int opcode, struct pid *pid,
 			 int flock)
 {
 	struct inode *inode = file_inode(file);
@@ -2169,7 +2172,7 @@ static void fuse_lk_fill(struct fuse_req *req, struct file *file,
 	arg->lk.start = fl->fl_start;
 	arg->lk.end = fl->fl_end;
 	arg->lk.type = fl->fl_type;
-	arg->lk.pid = pid;
+	arg->lk.pid = pid_nr_ns(pid, fc->pid_ns);
 	if (flock)
 		arg->lk_flags |= FUSE_LK_FLOCK;
 	req->in.h.opcode = opcode;
@@ -2191,7 +2194,7 @@ static int fuse_getlk(struct file *file, struct file_lock *fl)
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
-	fuse_lk_fill(req, file, fl, FUSE_GETLK, 0, 0);
+	fuse_lk_fill(req, file, fl, FUSE_GETLK, NULL, 0);
 	req->out.numargs = 1;
 	req->out.args[0].size = sizeof(outarg);
 	req->out.args[0].value = &outarg;
@@ -2199,7 +2202,7 @@ static int fuse_getlk(struct file *file, struct file_lock *fl)
 	err = req->out.h.error;
 	fuse_put_request(fc, req);
 	if (!err)
-		err = convert_fuse_file_lock(&outarg.lk, fl);
+		err = convert_fuse_file_lock(fc, &outarg.lk, fl);
 
 	return err;
 }
@@ -2210,7 +2213,7 @@ static int fuse_setlk(struct file *file, struct file_lock *fl, int flock)
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	struct fuse_req *req;
 	int opcode = (fl->fl_flags & FL_SLEEP) ? FUSE_SETLKW : FUSE_SETLK;
-	pid_t pid = fl->fl_type != F_UNLCK ? current->tgid : 0;
+	struct pid *pid = fl->fl_type != F_UNLCK ? task_tgid(current) : NULL;
 	int err;
 
 	if (fl->fl_lmops && fl->fl_lmops->lm_grant) {
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e7dbbb5d62b4..5d93f87e9960 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -601,6 +601,9 @@ struct fuse_conn {
 
 	/** User namespace to communicate uids and gids to the fuse daemon */
 	struct user_namespace *user_ns;
+
+	/** Pid namespace to communicate pids to the fuse daemon */
+	struct pid_namespace *pid_ns;
 };
 
 static inline struct fuse_conn *get_fuse_conn_super(struct super_block *sb)
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 894288f7ad67..5284d7fda269 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -21,6 +21,7 @@
 #include <linux/sched.h>
 #include <linux/exportfs.h>
 #include <linux/user_namespace.h>
+#include <linux/pid_namespace.h>
 
 MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>");
 MODULE_DESCRIPTION("Filesystem in Userspace");
@@ -618,6 +619,7 @@ void fuse_conn_init(struct fuse_conn *fc)
 	fc->attr_version = 1;
 	get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key));
 	fc->user_ns = get_user_ns(current_user_ns());
+	fc->pid_ns = get_pid_ns(task_active_pid_ns(current));
 }
 EXPORT_SYMBOL_GPL(fuse_conn_init);
 
@@ -628,6 +630,8 @@ void fuse_conn_put(struct fuse_conn *fc)
 			fuse_request_free(fc->destroy_req);
 		put_user_ns(fc->user_ns);
 		fc->user_ns = NULL;
+		put_pid_ns(fc->pid_ns);
+		fc->pid_ns = NULL;
 		fc->release(fc);
 	}
 }
-- 
1.9.1


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #4: 0003-userns-fuse-unprivileged-mount-suport.patch --]
[-- Type: text/x-diff, Size: 804 bytes --]

>From 55226d169826abd110d9bc60a6b079f6be3f6a46 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Fri, 5 Oct 2012 10:18:28 -0700
Subject: [PATCH 3/4] userns: fuse unprivileged mount suport

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/fuse/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 5284d7fda269..75f5326868e0 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1164,7 +1164,7 @@ static void fuse_kill_sb_anon(struct super_block *sb)
 static struct file_system_type fuse_fs_type = {
 	.owner		= THIS_MODULE,
 	.name		= "fuse",
-	.fs_flags	= FS_HAS_SUBTYPE,
+	.fs_flags	= FS_HAS_SUBTYPE | FS_USERNS_MOUNT,
 	.mount		= fuse_mount,
 	.kill_sb	= fuse_kill_sb_anon,
 };
-- 
1.9.1


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #5: 0004-fuse-Only-allow-read-writing-user-xattrs.patch --]
[-- Type: text/x-diff, Size: 1500 bytes --]

>From 6ae88ecfe4e8c8998478932ca225d1d9753b6c4b Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Fri, 5 Oct 2012 14:33:36 -0700
Subject: [PATCH 4/4] fuse: Only allow read/writing user xattrs

In the context of unprivileged mounts supporting anything except
xattrs with the "user." prefix seems foolish.  Return -EOPNOSUPP
for all other types of xattrs.

Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/fuse/dir.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index d74c75a057cd..d84f5b819fab 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -13,6 +13,7 @@
 #include <linux/sched.h>
 #include <linux/namei.h>
 #include <linux/slab.h>
+#include <linux/xattr.h>
 
 static bool fuse_use_readdirplus(struct inode *dir, struct dir_context *ctx)
 {
@@ -1868,6 +1869,9 @@ static int fuse_setxattr(struct dentry *entry, const char *name,
 	if (fc->no_setxattr)
 		return -EOPNOTSUPP;
 
+	if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
+		return -EOPNOTSUPP;
+
 	req = fuse_get_req_nopages(fc);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
@@ -1911,6 +1915,9 @@ static ssize_t fuse_getxattr(struct dentry *entry, const char *name,
 	if (fc->no_getxattr)
 		return -EOPNOTSUPP;
 
+	if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
+		return -EOPNOTSUPP;
+
 	req = fuse_get_req_nopages(fc);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-27  1:41                         ` Eric W. Biederman
@ 2014-09-27  4:24                           ` Seth Forshee
  2014-09-29 19:34                             ` Eric W. Biederman
  0 siblings, 1 reply; 39+ messages in thread
From: Seth Forshee @ 2014-09-27  4:24 UTC (permalink / raw)
  To: Eric W. Biederman, Miklos Szeredi
  Cc: Alexander Viro, Serge Hallyn, fuse-devel, Kernel Mailing List,
	Linux-Fsdevel, Serge E. Hallyn

On Fri, Sep 26, 2014 at 06:41:33PM -0700, Eric W. Biederman wrote:
> >> I am on the fence about what to do when a uid from the filesystem server
> >> or for other filesystems the on-disk data structures does not map, but
> >> make_bad_inode is simpler in conception.  So make_bad_inode seems like
> >> a good place to start.   For fuse especially this isn't hard because
> >> the make_bad_inode calls are already there to handle a change in i_mode.
> >
> > I agree that if we're unsure then make_bad_inode is a more logical place
> > to start, since it's easier to loosen the restrictions later than to
> > tighten them. I've got an initiail implementation that I'm in the
> > process of testing. If you want to take a look I've pushed it to:
> >
> >   git://kernel.ubuntu.com/sforshee/linux.git fuse-userns 
> 
> Thanks.  If I can scrape together enough focus I will look at it.
> 
> As a second best thing here are my prototype from when I was looking at
> performing this fuse conversion last.  Given that you had missed
> checking the from_kuid permission checks, it might be worth comparing
> and seeing if there is something else in these patches that would be
> worth including.

I think all of the from_kuid checks have been there for at least the
last two rounds of patches, but let me know if you see one I missed.

Looking at your patches, I see a lot that looks familiar. Seems like a
good sign :-)

I do see a few things I missed though. I've pointed those out below, and
in those cases I'm updating my patches. There are also some other
differences which I don't believe require any changes on my part, and
I've provided explanations for those. I also asked a few questions.

Miklos, are you satisfied with Eric's explanations about using a single
namespace and the permissions checks on uids? If so I should be able to
send updated patches early next week, otherwise let's continue trying to
resolve the points of disagreement.

> -static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
> +static int fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
>  			  struct kstat *stat)
>  {
>  	unsigned int blkbits;
> @@ -905,8 +905,12 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
>  	stat->ino = attr->ino;
>  	stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
>  	stat->nlink = attr->nlink;
> -	stat->uid = make_kuid(&init_user_ns, attr->uid);
> -	stat->gid = make_kgid(&init_user_ns, attr->gid);
> +	stat->uid = make_kuid(fc->user_ns, attr->uid);
> +	if (!uid_valid(stat->uid))
> +		return -EOVERFLOW;
> +	stat->gid = make_kgid(fc->user_ns, attr->gid);
> +	if (!gid_valid(stat->gid))
> +		return -EOVERFLOW;

If we were to go with the invalid id approach these checks shouldn't be
necessary, and they'll get converted to the overflow ids for userspace.
In my make_bad_inode patches I'm doing this check when we update the
inode and returning -EINVAL if it doesn't map. Then I changed
fuse_fillattr to just copy them from the inode, since fuse always
updates the inode right before calling fuse_fillattr anyway and that
makes it clear why we don't need to check the results of the conversion.

>  static int fuse_do_getattr(struct inode *inode, struct kstat *stat,
> @@ -973,7 +978,8 @@ static int fuse_do_getattr(struct inode *inode, struct kstat *stat,
>  					       attr_timeout(&outarg),
>  					       attr_version);
>  			if (stat)
> -				fuse_fillattr(inode, &outarg.attr, stat);
> +				err = fuse_fillattr(inode, &outarg.attr,
> +						    stat);
>  		}
>  	}
>  	return err;

So right before calling fuse_fillattr here fuse_change_attributes is
called to update the inode, so it appears your patches would have
allowed setting inode->i_iuid to INVALID_UID?

> @@ -624,6 +626,8 @@ void fuse_conn_put(struct fuse_conn *fc)
>  	if (atomic_dec_and_test(&fc->count)) {
>  		if (fc->destroy_req)
>  			fuse_request_free(fc->destroy_req);
> +		put_user_ns(fc->user_ns);
> +		fc->user_ns = NULL;
>  		fc->release(fc);
>  	}
>  }

I did this in fuse_free_con, but now looking again this is a bug for
cuse since it uses a different release method. I've updated my tree to
do it here instead.

> @@ -1033,6 +1037,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
>  	sb->s_maxbytes = MAX_LFS_FILESIZE;
>  	sb->s_time_gran = 1;
>  	sb->s_export_op = &fuse_export_operations;
> +	sb->s_user_ns = get_user_ns(current_user_ns());
>  
>  	file = fget(d.fd);
>  	err = -EINVAL;
> @@ -1149,6 +1154,7 @@ static void fuse_kill_sb_anon(struct super_block *sb)
>  	}
>  
>  	kill_anon_super(sb);
> +	put_user_ns(sb->s_user_ns);
>  }

What's the point of s_user_ns? It doesn't seem to be used anywhere.

> From 9bdaa744858d296f361a92c8940c33f878aec169 Mon Sep 17 00:00:00 2001
> From: "Eric W. Biederman" <ebiederm@xmission.com>
> Date: Thu, 4 Oct 2012 13:42:34 -0700
> Subject: [PATCH 2/4] fuse: Teach fuse how to handle the pid namespace.

<snip>

> -static int convert_fuse_file_lock(const struct fuse_file_lock *ffl,
> +static int convert_fuse_file_lock(struct fuse_conn *fc,
> +				  const struct fuse_file_lock *ffl,
>  				  struct file_lock *fl)
>  {
>  	switch (ffl->type) {
> @@ -2145,7 +2146,9 @@ static int convert_fuse_file_lock(const struct fuse_file_lock *ffl,
>  
>  		fl->fl_start = ffl->start;
>  		fl->fl_end = ffl->end;
> -		fl->fl_pid = ffl->pid;
> +		rcu_read_lock();
> +		fl->fl_pid = pid_vnr(find_pid_ns(ffl->pid, fc->pid_ns));
> +		rcu_read_unlock();
>  		break;
>  
>  	default:
> @@ -2156,7 +2159,7 @@ static int convert_fuse_file_lock(const struct fuse_file_lock *ffl,
>  }
>  
>  static void fuse_lk_fill(struct fuse_req *req, struct file *file,
> -			 const struct file_lock *fl, int opcode, pid_t pid,
> +			 const struct file_lock *fl, int opcode, struct pid *pid,
>  			 int flock)
>  {
>  	struct inode *inode = file_inode(file);
> @@ -2169,7 +2172,7 @@ static void fuse_lk_fill(struct fuse_req *req, struct file *file,
>  	arg->lk.start = fl->fl_start;
>  	arg->lk.end = fl->fl_end;
>  	arg->lk.type = fl->fl_type;
> -	arg->lk.pid = pid;
> +	arg->lk.pid = pid_nr_ns(pid, fc->pid_ns);
>  	if (flock)
>  		arg->lk_flags |= FUSE_LK_FLOCK;
>  	req->in.h.opcode = opcode;
> @@ -2191,7 +2194,7 @@ static int fuse_getlk(struct file *file, struct file_lock *fl)
>  	if (IS_ERR(req))
>  		return PTR_ERR(req);
>  
> -	fuse_lk_fill(req, file, fl, FUSE_GETLK, 0, 0);
> +	fuse_lk_fill(req, file, fl, FUSE_GETLK, NULL, 0);
>  	req->out.numargs = 1;
>  	req->out.args[0].size = sizeof(outarg);
>  	req->out.args[0].value = &outarg;
> @@ -2199,7 +2202,7 @@ static int fuse_getlk(struct file *file, struct file_lock *fl)
>  	err = req->out.h.error;
>  	fuse_put_request(fc, req);
>  	if (!err)
> -		err = convert_fuse_file_lock(&outarg.lk, fl);
> +		err = convert_fuse_file_lock(fc, &outarg.lk, fl);
>  
>  	return err;
>  }
> @@ -2210,7 +2213,7 @@ static int fuse_setlk(struct file *file, struct file_lock *fl, int flock)
>  	struct fuse_conn *fc = get_fuse_conn(inode);
>  	struct fuse_req *req;
>  	int opcode = (fl->fl_flags & FL_SLEEP) ? FUSE_SETLKW : FUSE_SETLK;
> -	pid_t pid = fl->fl_type != F_UNLCK ? current->tgid : 0;
> +	struct pid *pid = fl->fl_type != F_UNLCK ? task_tgid(current) : NULL;
>  	int err;
>  
>  	if (fl->fl_lmops && fl->fl_lmops->lm_grant) {

D'oh, I did totally miss this file locking stuff. Your changes look
good, so I'll pretty much take them verbatim.

> @@ -628,6 +630,8 @@ void fuse_conn_put(struct fuse_conn *fc)
>  			fuse_request_free(fc->destroy_req);
>  		put_user_ns(fc->user_ns);
>  		fc->user_ns = NULL;
> +		put_pid_ns(fc->pid_ns);
> +		fc->pid_ns = NULL;
>  		fc->release(fc);
>  	}
>  }

And I had a leak here for cuse as with the userns.

> From 55226d169826abd110d9bc60a6b079f6be3f6a46 Mon Sep 17 00:00:00 2001
> From: "Eric W. Biederman" <ebiederm@xmission.com>
> Date: Fri, 5 Oct 2012 10:18:28 -0700
> Subject: [PATCH 3/4] userns: fuse unprivileged mount suport
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  fs/fuse/inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 5284d7fda269..75f5326868e0 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -1164,7 +1164,7 @@ static void fuse_kill_sb_anon(struct super_block *sb)
>  static struct file_system_type fuse_fs_type = {
>  	.owner		= THIS_MODULE,
>  	.name		= "fuse",
> -	.fs_flags	= FS_HAS_SUBTYPE,
> +	.fs_flags	= FS_HAS_SUBTYPE | FS_USERNS_MOUNT,
>  	.mount		= fuse_mount,
>  	.kill_sb	= fuse_kill_sb_anon,
>  };

I have the order of my patches switched, then this one squashed in with
the one which adds userns support to fuse.

> From 6ae88ecfe4e8c8998478932ca225d1d9753b6c4b Mon Sep 17 00:00:00 2001
> From: "Eric W. Biederman" <ebiederm@xmission.com>
> Date: Fri, 5 Oct 2012 14:33:36 -0700
> Subject: [PATCH 4/4] fuse: Only allow read/writing user xattrs
> 
> In the context of unprivileged mounts supporting anything except
> xattrs with the "user." prefix seems foolish.  Return -EOPNOSUPP
> for all other types of xattrs.
> 
> Cc: Miklos Szeredi <miklos@szeredi.hu>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  fs/fuse/dir.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index d74c75a057cd..d84f5b819fab 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -13,6 +13,7 @@
>  #include <linux/sched.h>
>  #include <linux/namei.h>
>  #include <linux/slab.h>
> +#include <linux/xattr.h>
>  
>  static bool fuse_use_readdirplus(struct inode *dir, struct dir_context *ctx)
>  {
> @@ -1868,6 +1869,9 @@ static int fuse_setxattr(struct dentry *entry, const char *name,
>  	if (fc->no_setxattr)
>  		return -EOPNOTSUPP;
>  
> +	if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
> +		return -EOPNOTSUPP;
> +
>  	req = fuse_get_req_nopages(fc);
>  	if (IS_ERR(req))
>  		return PTR_ERR(req);
> @@ -1911,6 +1915,9 @@ static ssize_t fuse_getxattr(struct dentry *entry, const char *name,
>  	if (fc->no_getxattr)
>  		return -EOPNOTSUPP;
>  
> +	if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
> +		return -EOPNOTSUPP;
> +
>  	req = fuse_get_req_nopages(fc);
>  	if (IS_ERR(req))
>  		return PTR_ERR(req);

This I hadn't considered, but it seems reasonable. Two questions though.

Why shouldn't we let root-owned mounts support namespaces other than
"user."?

Thinking ahead to (hopefully) allowing other filesystems to be mounted
from user namespaces, should this be enforced by the vfs instead?

Thanks,
Seth


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-27  4:24                           ` Seth Forshee
@ 2014-09-29 19:34                             ` Eric W. Biederman
  2014-09-30 16:25                               ` Seth Forshee
  0 siblings, 1 reply; 39+ messages in thread
From: Eric W. Biederman @ 2014-09-29 19:34 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Alexander Viro, Serge Hallyn, fuse-devel, Kernel Mailing List,
	Linux-Fsdevel, Serge E. Hallyn

Seth Forshee <seth.forshee@canonical.com> writes:

> On Fri, Sep 26, 2014 at 06:41:33PM -0700, Eric W. Biederman wrote:
>> >> I am on the fence about what to do when a uid from the filesystem server
>> >> or for other filesystems the on-disk data structures does not map, but
>> >> make_bad_inode is simpler in conception.  So make_bad_inode seems like
>> >> a good place to start.   For fuse especially this isn't hard because
>> >> the make_bad_inode calls are already there to handle a change in i_mode.
>> >
>> > I agree that if we're unsure then make_bad_inode is a more logical place
>> > to start, since it's easier to loosen the restrictions later than to
>> > tighten them. I've got an initiail implementation that I'm in the
>> > process of testing. If you want to take a look I've pushed it to:
>> >
>> >   git://kernel.ubuntu.com/sforshee/linux.git fuse-userns 
>> 
>> Thanks.  If I can scrape together enough focus I will look at it.
>> 
>> As a second best thing here are my prototype from when I was looking at
>> performing this fuse conversion last.  Given that you had missed
>> checking the from_kuid permission checks, it might be worth comparing
>> and seeing if there is something else in these patches that would be
>> worth including.
>
> I think all of the from_kuid checks have been there for at least the
> last two rounds of patches, but let me know if you see one I missed.
>
> Looking at your patches, I see a lot that looks familiar. Seems like a
> good sign :-)
>
> I do see a few things I missed though. I've pointed those out below, and
> in those cases I'm updating my patches. There are also some other
> differences which I don't believe require any changes on my part, and
> I've provided explanations for those. I also asked a few questions.
>
> Miklos, are you satisfied with Eric's explanations about using a single
> namespace and the permissions checks on uids? If so I should be able to
> send updated patches early next week, otherwise let's continue trying to
> resolve the points of disagreement.
>
>> -static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
>> +static int fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
>>  			  struct kstat *stat)
>>  {
>>  	unsigned int blkbits;
>> @@ -905,8 +905,12 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
>>  	stat->ino = attr->ino;
>>  	stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
>>  	stat->nlink = attr->nlink;
>> -	stat->uid = make_kuid(&init_user_ns, attr->uid);
>> -	stat->gid = make_kgid(&init_user_ns, attr->gid);
>> +	stat->uid = make_kuid(fc->user_ns, attr->uid);
>> +	if (!uid_valid(stat->uid))
>> +		return -EOVERFLOW;
>> +	stat->gid = make_kgid(fc->user_ns, attr->gid);
>> +	if (!gid_valid(stat->gid))
>> +		return -EOVERFLOW;
>
> If we were to go with the invalid id approach these checks shouldn't be
> necessary, and they'll get converted to the overflow ids for
> userspace.

Correct.  

> In my make_bad_inode patches I'm doing this check when we update the
> inode and returning -EINVAL if it doesn't map. 

In fuse_change_attributes?  That seems reasonable.

> fuse_fillattr to just copy them from the inode, since fuse always
> updates the inode right before calling fuse_fillattr anyway and that
> makes it clear why we don't need to check the results of the
> conversion.

Agreed.  There is no point in having that code twice.

>>  static int fuse_do_getattr(struct inode *inode, struct kstat *stat,
>> @@ -973,7 +978,8 @@ static int fuse_do_getattr(struct inode *inode, struct kstat *stat,
>>  					       attr_timeout(&outarg),
>>  					       attr_version);
>>  			if (stat)
>> -				fuse_fillattr(inode, &outarg.attr, stat);
>> +				err = fuse_fillattr(inode, &outarg.attr,
>> +						    stat);
>>  		}
>>  	}
>>  	return err;
>
> So right before calling fuse_fillattr here fuse_change_attributes is
> called to update the inode, so it appears your patches would have
> allowed setting inode->i_iuid to INVALID_UID?

Quite possibly. I am scratching my head about that one.

>> @@ -624,6 +626,8 @@ void fuse_conn_put(struct fuse_conn *fc)
>>  	if (atomic_dec_and_test(&fc->count)) {
>>  		if (fc->destroy_req)
>>  			fuse_request_free(fc->destroy_req);
>> +		put_user_ns(fc->user_ns);
>> +		fc->user_ns = NULL;
>>  		fc->release(fc);
>>  	}
>>  }
>
> I did this in fuse_free_con, but now looking again this is a bug for
> cuse since it uses a different release method. I've updated my tree to
> do it here instead.


>> @@ -1033,6 +1037,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
>>  	sb->s_maxbytes = MAX_LFS_FILESIZE;
>>  	sb->s_time_gran = 1;
>>  	sb->s_export_op = &fuse_export_operations;
>> +	sb->s_user_ns = get_user_ns(current_user_ns());
>>  
>>  	file = fget(d.fd);
>>  	err = -EINVAL;
>> @@ -1149,6 +1154,7 @@ static void fuse_kill_sb_anon(struct super_block *sb)
>>  	}
>>  
>>  	kill_anon_super(sb);
>> +	put_user_ns(sb->s_user_ns);
>>  }
>
> What's the point of s_user_ns? It doesn't seem to be used anywhere.

Not in these patches, and the field isn't added here either.  It was my
exporting of the user namespace that ids are enconded in the filesystem
to generic code in the vfs.  By default that I am setting that field
to init_user_ns.

Looking through my tree the only real user of it is the quota code.

However for acls and other xattrs we may also care.  

For block based filesystems I woudn't expect to need the fuse connection
structure and storing the user namespace there.  So that field becomes
a lot less redundant in the general case.

>> @@ -2210,7 +2213,7 @@ static int fuse_setlk(struct file *file, struct file_lock *fl, int flock)
>>  	struct fuse_conn *fc = get_fuse_conn(inode);
>>  	struct fuse_req *req;
>>  	int opcode = (fl->fl_flags & FL_SLEEP) ? FUSE_SETLKW : FUSE_SETLK;
>> -	pid_t pid = fl->fl_type != F_UNLCK ? current->tgid : 0;
>> +	struct pid *pid = fl->fl_type != F_UNLCK ? task_tgid(current) : NULL;
>>  	int err;
>>  
>>  	if (fl->fl_lmops && fl->fl_lmops->lm_grant) {
>
> D'oh, I did totally miss this file locking stuff. Your changes look
> good, so I'll pretty much take them verbatim.

Sounds good.

>> @@ -628,6 +630,8 @@ void fuse_conn_put(struct fuse_conn *fc)
>>  			fuse_request_free(fc->destroy_req);
>>  		put_user_ns(fc->user_ns);
>>  		fc->user_ns = NULL;
>> +		put_pid_ns(fc->pid_ns);
>> +		fc->pid_ns = NULL;
>>  		fc->release(fc);
>>  	}
>>  }
>
> And I had a leak here for cuse as with the userns.
>
>> From 55226d169826abd110d9bc60a6b079f6be3f6a46 Mon Sep 17 00:00:00 2001
>> From: "Eric W. Biederman" <ebiederm@xmission.com>
>> Date: Fri, 5 Oct 2012 10:18:28 -0700
>> Subject: [PATCH 3/4] userns: fuse unprivileged mount suport
>> 
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>  fs/fuse/inode.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
>> index 5284d7fda269..75f5326868e0 100644
>> --- a/fs/fuse/inode.c
>> +++ b/fs/fuse/inode.c
>> @@ -1164,7 +1164,7 @@ static void fuse_kill_sb_anon(struct super_block *sb)
>>  static struct file_system_type fuse_fs_type = {
>>  	.owner		= THIS_MODULE,
>>  	.name		= "fuse",
>> -	.fs_flags	= FS_HAS_SUBTYPE,
>> +	.fs_flags	= FS_HAS_SUBTYPE | FS_USERNS_MOUNT,
>>  	.mount		= fuse_mount,
>>  	.kill_sb	= fuse_kill_sb_anon,
>>  };
>
> I have the order of my patches switched, then this one squashed in with
> the one which adds userns support to fuse.

It is nice having this as a separate patch because it allows incremental
progress.   Then this patch can be added when victory is declared.

Although looking at my code this looks like this patch is probably a
little early in the series ;-)

>> From 6ae88ecfe4e8c8998478932ca225d1d9753b6c4b Mon Sep 17 00:00:00 2001
>> From: "Eric W. Biederman" <ebiederm@xmission.com>
>> Date: Fri, 5 Oct 2012 14:33:36 -0700
>> Subject: [PATCH 4/4] fuse: Only allow read/writing user xattrs
>> 
>> In the context of unprivileged mounts supporting anything except
>> xattrs with the "user." prefix seems foolish.  Return -EOPNOSUPP
>> for all other types of xattrs.
>> 
>> Cc: Miklos Szeredi <miklos@szeredi.hu>
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>  fs/fuse/dir.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>> 
>> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
>> index d74c75a057cd..d84f5b819fab 100644
>> --- a/fs/fuse/dir.c
>> +++ b/fs/fuse/dir.c
>> @@ -13,6 +13,7 @@
>>  #include <linux/sched.h>
>>  #include <linux/namei.h>
>>  #include <linux/slab.h>
>> +#include <linux/xattr.h>
>>  
>>  static bool fuse_use_readdirplus(struct inode *dir, struct dir_context *ctx)
>>  {
>> @@ -1868,6 +1869,9 @@ static int fuse_setxattr(struct dentry *entry, const char *name,
>>  	if (fc->no_setxattr)
>>  		return -EOPNOTSUPP;
>>  
>> +	if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
>> +		return -EOPNOTSUPP;
>> +
>>  	req = fuse_get_req_nopages(fc);
>>  	if (IS_ERR(req))
>>  		return PTR_ERR(req);
>> @@ -1911,6 +1915,9 @@ static ssize_t fuse_getxattr(struct dentry *entry, const char *name,
>>  	if (fc->no_getxattr)
>>  		return -EOPNOTSUPP;
>>  
>> +	if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
>> +		return -EOPNOTSUPP;
>> +
>>  	req = fuse_get_req_nopages(fc);
>>  	if (IS_ERR(req))
>>  		return PTR_ERR(req);
>
> This I hadn't considered, but it seems reasonable. Two questions though.
>
> Why shouldn't we let root-owned mounts support namespaces other than
> "user."?

Are there any users of fuse who care.  Certainly the core use case of
fuse is unprivileged mounts and in that case nosuid should take care of
this.

> Thinking ahead to (hopefully) allowing other filesystems to be mounted
> from user namespaces, should this be enforced by the vfs instead?

Potentially.  I would have to look to see how hard it is to lift this
code into the vfs.  At least historically the xattr interface was ugly
enough getting some of this stuff into the vfs would require
refactoring.

My tenative patches in this area look pretty rough.  For ext2 I
just implemented:

+       if (dentry->d_inode->i_sb->s_user_ns != &init_user_ns)
+               return -EOPNOTSUPP;
+

So for ext2 I did honor allow things to happen as you would have
suspected.

But if we are not going to require specifying nosuid looking closely at
xattrs and acls and security attributes seems pretty important.

Looking at all of this my guess is we probably want to write a list of
rules for unprivileged mounting of filesystems.  So that people can
(a) review the basic theory in case they are aware of anything we may
    have missed.
(b) Have a reference for the whatever filesystems come next.

Eric




^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-29 19:34                             ` Eric W. Biederman
@ 2014-09-30 16:25                               ` Seth Forshee
  2014-10-05 16:48                                 ` Seth Forshee
  0 siblings, 1 reply; 39+ messages in thread
From: Seth Forshee @ 2014-09-30 16:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Alexander Viro, Serge Hallyn, fuse-devel,
	Kernel Mailing List, Linux-Fsdevel, Serge E. Hallyn

On Mon, Sep 29, 2014 at 12:34:44PM -0700, Eric W. Biederman wrote:
> >> -static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
> >> +static int fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
> >>  			  struct kstat *stat)
> >>  {
> >>  	unsigned int blkbits;
> >> @@ -905,8 +905,12 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
> >>  	stat->ino = attr->ino;
> >>  	stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
> >>  	stat->nlink = attr->nlink;
> >> -	stat->uid = make_kuid(&init_user_ns, attr->uid);
> >> -	stat->gid = make_kgid(&init_user_ns, attr->gid);
> >> +	stat->uid = make_kuid(fc->user_ns, attr->uid);
> >> +	if (!uid_valid(stat->uid))
> >> +		return -EOVERFLOW;
> >> +	stat->gid = make_kgid(fc->user_ns, attr->gid);
> >> +	if (!gid_valid(stat->gid))
> >> +		return -EOVERFLOW;
> >
> > If we were to go with the invalid id approach these checks shouldn't be
> > necessary, and they'll get converted to the overflow ids for
> > userspace.
> 
> Correct.  
> 
> > In my make_bad_inode patches I'm doing this check when we update the
> > inode and returning -EINVAL if it doesn't map. 
> 
> In fuse_change_attributes?  That seems reasonable.

In fuse_change_attributes_common, which then filters up to
fuse_change_attributes.

> >> @@ -1033,6 +1037,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
> >>  	sb->s_maxbytes = MAX_LFS_FILESIZE;
> >>  	sb->s_time_gran = 1;
> >>  	sb->s_export_op = &fuse_export_operations;
> >> +	sb->s_user_ns = get_user_ns(current_user_ns());
> >>  
> >>  	file = fget(d.fd);
> >>  	err = -EINVAL;
> >> @@ -1149,6 +1154,7 @@ static void fuse_kill_sb_anon(struct super_block *sb)
> >>  	}
> >>  
> >>  	kill_anon_super(sb);
> >> +	put_user_ns(sb->s_user_ns);
> >>  }
> >
> > What's the point of s_user_ns? It doesn't seem to be used anywhere.
> 
> Not in these patches, and the field isn't added here either.  It was my
> exporting of the user namespace that ids are enconded in the filesystem
> to generic code in the vfs.  By default that I am setting that field
> to init_user_ns.
> 
> Looking through my tree the only real user of it is the quota code.
> 
> However for acls and other xattrs we may also care.  
> 
> For block based filesystems I woudn't expect to need the fuse connection
> structure and storing the user namespace there.  So that field becomes
> a lot less redundant in the general case.

Hmm, so that does seem to make sense for acls. The ids should be getting
translated relative to the namespace used for the superblock.  Currently
they're translated relative to init_user_ns.

> >> From 55226d169826abd110d9bc60a6b079f6be3f6a46 Mon Sep 17 00:00:00 2001
> >> From: "Eric W. Biederman" <ebiederm@xmission.com>
> >> Date: Fri, 5 Oct 2012 10:18:28 -0700
> >> Subject: [PATCH 3/4] userns: fuse unprivileged mount suport
> >> 
> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> >> ---
> >>  fs/fuse/inode.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >> 
> >> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> >> index 5284d7fda269..75f5326868e0 100644
> >> --- a/fs/fuse/inode.c
> >> +++ b/fs/fuse/inode.c
> >> @@ -1164,7 +1164,7 @@ static void fuse_kill_sb_anon(struct super_block *sb)
> >>  static struct file_system_type fuse_fs_type = {
> >>  	.owner		= THIS_MODULE,
> >>  	.name		= "fuse",
> >> -	.fs_flags	= FS_HAS_SUBTYPE,
> >> +	.fs_flags	= FS_HAS_SUBTYPE | FS_USERNS_MOUNT,
> >>  	.mount		= fuse_mount,
> >>  	.kill_sb	= fuse_kill_sb_anon,
> >>  };
> >
> > I have the order of my patches switched, then this one squashed in with
> > the one which adds userns support to fuse.
> 
> It is nice having this as a separate patch because it allows incremental
> progress.   Then this patch can be added when victory is declared.
> 
> Although looking at my code this looks like this patch is probably a
> little early in the series ;-)

It doesn't really make any difference to me, so I'll go ahead and split
it out.

> >> From 6ae88ecfe4e8c8998478932ca225d1d9753b6c4b Mon Sep 17 00:00:00 2001
> >> From: "Eric W. Biederman" <ebiederm@xmission.com>
> >> Date: Fri, 5 Oct 2012 14:33:36 -0700
> >> Subject: [PATCH 4/4] fuse: Only allow read/writing user xattrs
> >> 
> >> In the context of unprivileged mounts supporting anything except
> >> xattrs with the "user." prefix seems foolish.  Return -EOPNOSUPP
> >> for all other types of xattrs.
> >> 
> >> Cc: Miklos Szeredi <miklos@szeredi.hu>
> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> >> ---
> >>  fs/fuse/dir.c | 7 +++++++
> >>  1 file changed, 7 insertions(+)
> >> 
> >> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> >> index d74c75a057cd..d84f5b819fab 100644
> >> --- a/fs/fuse/dir.c
> >> +++ b/fs/fuse/dir.c
> >> @@ -13,6 +13,7 @@
> >>  #include <linux/sched.h>
> >>  #include <linux/namei.h>
> >>  #include <linux/slab.h>
> >> +#include <linux/xattr.h>
> >>  
> >>  static bool fuse_use_readdirplus(struct inode *dir, struct dir_context *ctx)
> >>  {
> >> @@ -1868,6 +1869,9 @@ static int fuse_setxattr(struct dentry *entry, const char *name,
> >>  	if (fc->no_setxattr)
> >>  		return -EOPNOTSUPP;
> >>  
> >> +	if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
> >> +		return -EOPNOTSUPP;
> >> +
> >>  	req = fuse_get_req_nopages(fc);
> >>  	if (IS_ERR(req))
> >>  		return PTR_ERR(req);
> >> @@ -1911,6 +1915,9 @@ static ssize_t fuse_getxattr(struct dentry *entry, const char *name,
> >>  	if (fc->no_getxattr)
> >>  		return -EOPNOTSUPP;
> >>  
> >> +	if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
> >> +		return -EOPNOTSUPP;
> >> +
> >>  	req = fuse_get_req_nopages(fc);
> >>  	if (IS_ERR(req))
> >>  		return PTR_ERR(req);
> >
> > This I hadn't considered, but it seems reasonable. Two questions though.
> >
> > Why shouldn't we let root-owned mounts support namespaces other than
> > "user."?
> 
> Are there any users of fuse who care.  Certainly the core use case of
> fuse is unprivileged mounts and in that case nosuid should take care of
> this.

I couldn't say for sure - I'm thinking that there are some filesystems
which are only supported via fuse, and if distros are automounting any
of those then it would likely be via root-owned fuse mounts. And if any
of those supports xattrs then this could be considered a regression.

> > Thinking ahead to (hopefully) allowing other filesystems to be mounted
> > from user namespaces, should this be enforced by the vfs instead?
> 
> Potentially.  I would have to look to see how hard it is to lift this
> code into the vfs.  At least historically the xattr interface was ugly
> enough getting some of this stuff into the vfs would require
> refactoring.
> 
> My tenative patches in this area look pretty rough.  For ext2 I
> just implemented:
> 
> +       if (dentry->d_inode->i_sb->s_user_ns != &init_user_ns)
> +               return -EOPNOTSUPP;
> +
> 
> So for ext2 I did honor allow things to happen as you would have
> suspected.
> 
> But if we are not going to require specifying nosuid looking closely at
> xattrs and acls and security attributes seems pretty important.
> 
> Looking at all of this my guess is we probably want to write a list of
> rules for unprivileged mounting of filesystems.  So that people can
> (a) review the basic theory in case they are aware of anything we may
>     have missed.
> (b) Have a reference for the whatever filesystems come next.

Several namespaces are restricted at the vfs level right now, though
system.* and security.* are specifically called out as being left to the
individual filesystem to decide.

I'm not well versed in the use of xattrs, so I'll need to go do some
studying before I'll fully understand everything that needs to be done.
I guess there are a couple of options: try to tackle all of this before
merging any patches for fuse, or put some limits if fuse right now that
could potentially be lifted after we have more general rules. I'd prefer
the second option so we can get some level of support for fuse in
containers sooner.

For xattrs I could do something like this to avoid regressions in
init_user_ns while restricting to user.* in other namespaces for now:

        if (fc->user_ns != &init_user_ns &&
            strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
        	return -EOPNOTSUPP;

Thoughts?

Thanks,
Seth

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-30 16:25                               ` Seth Forshee
@ 2014-10-05 16:48                                 ` Seth Forshee
  2014-10-06 16:00                                   ` Serge Hallyn
  0 siblings, 1 reply; 39+ messages in thread
From: Seth Forshee @ 2014-10-05 16:48 UTC (permalink / raw)
  To: Eric W. Biederman, Miklos Szeredi
  Cc: Alexander Viro, Serge Hallyn, fuse-devel, Kernel Mailing List,
	Linux-Fsdevel, Serge E. Hallyn

On Tue, Sep 30, 2014 at 11:25:59AM -0500, Seth Forshee wrote:
> > >> From 6ae88ecfe4e8c8998478932ca225d1d9753b6c4b Mon Sep 17 00:00:00 2001
> > >> From: "Eric W. Biederman" <ebiederm@xmission.com>
> > >> Date: Fri, 5 Oct 2012 14:33:36 -0700
> > >> Subject: [PATCH 4/4] fuse: Only allow read/writing user xattrs
> > >> 
> > >> In the context of unprivileged mounts supporting anything except
> > >> xattrs with the "user." prefix seems foolish.  Return -EOPNOSUPP
> > >> for all other types of xattrs.
> > >> 
> > >> Cc: Miklos Szeredi <miklos@szeredi.hu>
> > >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> > >> ---
> > >>  fs/fuse/dir.c | 7 +++++++
> > >>  1 file changed, 7 insertions(+)
> > >> 
> > >> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> > >> index d74c75a057cd..d84f5b819fab 100644
> > >> --- a/fs/fuse/dir.c
> > >> +++ b/fs/fuse/dir.c
> > >> @@ -13,6 +13,7 @@
> > >>  #include <linux/sched.h>
> > >>  #include <linux/namei.h>
> > >>  #include <linux/slab.h>
> > >> +#include <linux/xattr.h>
> > >>  
> > >>  static bool fuse_use_readdirplus(struct inode *dir, struct dir_context *ctx)
> > >>  {
> > >> @@ -1868,6 +1869,9 @@ static int fuse_setxattr(struct dentry *entry, const char *name,
> > >>  	if (fc->no_setxattr)
> > >>  		return -EOPNOTSUPP;
> > >>  
> > >> +	if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
> > >> +		return -EOPNOTSUPP;
> > >> +
> > >>  	req = fuse_get_req_nopages(fc);
> > >>  	if (IS_ERR(req))
> > >>  		return PTR_ERR(req);
> > >> @@ -1911,6 +1915,9 @@ static ssize_t fuse_getxattr(struct dentry *entry, const char *name,
> > >>  	if (fc->no_getxattr)
> > >>  		return -EOPNOTSUPP;
> > >>  
> > >> +	if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
> > >> +		return -EOPNOTSUPP;
> > >> +
> > >>  	req = fuse_get_req_nopages(fc);
> > >>  	if (IS_ERR(req))
> > >>  		return PTR_ERR(req);
> > >
> > > This I hadn't considered, but it seems reasonable. Two questions though.
> > >
> > > Why shouldn't we let root-owned mounts support namespaces other than
> > > "user."?
> > 
> > Are there any users of fuse who care.  Certainly the core use case of
> > fuse is unprivileged mounts and in that case nosuid should take care of
> > this.
> 
> I couldn't say for sure - I'm thinking that there are some filesystems
> which are only supported via fuse, and if distros are automounting any
> of those then it would likely be via root-owned fuse mounts. And if any
> of those supports xattrs then this could be considered a regression.
> 
> > > Thinking ahead to (hopefully) allowing other filesystems to be mounted
> > > from user namespaces, should this be enforced by the vfs instead?
> > 
> > Potentially.  I would have to look to see how hard it is to lift this
> > code into the vfs.  At least historically the xattr interface was ugly
> > enough getting some of this stuff into the vfs would require
> > refactoring.
> > 
> > My tenative patches in this area look pretty rough.  For ext2 I
> > just implemented:
> > 
> > +       if (dentry->d_inode->i_sb->s_user_ns != &init_user_ns)
> > +               return -EOPNOTSUPP;
> > +
> > 
> > So for ext2 I did honor allow things to happen as you would have
> > suspected.
> > 
> > But if we are not going to require specifying nosuid looking closely at
> > xattrs and acls and security attributes seems pretty important.
> > 
> > Looking at all of this my guess is we probably want to write a list of
> > rules for unprivileged mounting of filesystems.  So that people can
> > (a) review the basic theory in case they are aware of anything we may
> >     have missed.
> > (b) Have a reference for the whatever filesystems come next.
> 
> Several namespaces are restricted at the vfs level right now, though
> system.* and security.* are specifically called out as being left to the
> individual filesystem to decide.
> 
> I'm not well versed in the use of xattrs, so I'll need to go do some
> studying before I'll fully understand everything that needs to be done.
> I guess there are a couple of options: try to tackle all of this before
> merging any patches for fuse, or put some limits if fuse right now that
> could potentially be lifted after we have more general rules. I'd prefer
> the second option so we can get some level of support for fuse in
> containers sooner.
> 
> For xattrs I could do something like this to avoid regressions in
> init_user_ns while restricting to user.* in other namespaces for now:
> 
>         if (fc->user_ns != &init_user_ns &&
>             strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
>         	return -EOPNOTSUPP;

After digging into this some more I think I agree with you. At minimum
letting users insert arbitrary xattrs via fuse bypasses the usual
restrictions on setting xattrs. This is probably mitigated by the
limited visibility of the fuse mount in the usual case for unprivileged
users, but it does seem like a bad idea fundamentally.

So I was thinking of something like the following (untested) to let root
in the host support privileged xattrs while limiting unprivileged users
to user.*. Miklos, does this look acceptable or would you prefer
something different?


diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index e3123bfbc711..1a3ee5663dea 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1882,6 +1882,10 @@ static int fuse_setxattr(struct dentry *entry, const char *name,
 	if (fc->no_setxattr)
 		return -EOPNOTSUPP;
 
+	if (!(fc->flags & FUSE_PRIV_XATTRS) &&
+	    strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
+		return -EOPNOTSUPP;
+
 	req = fuse_get_req_nopages(fc);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
@@ -1925,6 +1929,10 @@ static ssize_t fuse_getxattr(struct dentry *entry, const char *name,
 	if (fc->no_getxattr)
 		return -EOPNOTSUPP;
 
+	if (!(fc->flags & FUSE_PRIV_XATTRS) &&
+	    strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
+		return -EOPNOTSUPP;
+
 	req = fuse_get_req_nopages(fc);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 81187ba04e4a..bc0fd14b962a 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -46,6 +46,11 @@
     doing the mount will be allowed to access the filesystem */
 #define FUSE_ALLOW_OTHER         (1 << 1)
 
+/** If the FUSE_PRIV_XATTRS flag is given, then xattrs outside the
+    user.* namespace are allowed. This option is only allowed for
+    system root. */
+#define FUSE_PRIV_XATTRS	(1 << 2)
+
 /** Number of page pointers embedded in fuse_req */
 #define FUSE_REQ_INLINE_PAGES 1
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index b88b5a780228..6716b56d43a1 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -493,6 +493,7 @@ enum {
 	OPT_ALLOW_OTHER,
 	OPT_MAX_READ,
 	OPT_BLKSIZE,
+	OPT_PRIV_XATTRS,
 	OPT_ERR
 };
 
@@ -505,6 +506,7 @@ static const match_table_t tokens = {
 	{OPT_ALLOW_OTHER,		"allow_other"},
 	{OPT_MAX_READ,			"max_read=%u"},
 	{OPT_BLKSIZE,			"blksize=%u"},
+	{OPT_PRIV_XATTRS,		"priv_xattr"},
 	{OPT_ERR,			NULL}
 };
 
@@ -592,6 +594,12 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
 			d->blksize = value;
 			break;
 
+		case OPT_PRIV_XATTRS:
+			if (!capable(CAP_SYS_ADMIN))
+				return 0;
+			d->flags |= FUSE_PRIV_XATTRS;
+			break;
+
 		default:
 			return 0;
 		}


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-10-05 16:48                                 ` Seth Forshee
@ 2014-10-06 16:00                                   ` Serge Hallyn
  2014-10-06 16:31                                     ` Seth Forshee
  0 siblings, 1 reply; 39+ messages in thread
From: Serge Hallyn @ 2014-10-06 16:00 UTC (permalink / raw)
  To: Eric W. Biederman, Miklos Szeredi, Alexander Viro, fuse-devel,
	Kernel Mailing List, Linux-Fsdevel, Serge E. Hallyn

Quoting Seth Forshee (seth.forshee@canonical.com):
...
> After digging into this some more I think I agree with you. At minimum
> letting users insert arbitrary xattrs via fuse bypasses the usual
> restrictions on setting xattrs. This is probably mitigated by the
> limited visibility of the fuse mount in the usual case for unprivileged
> users, but it does seem like a bad idea fundamentally.
> 
> So I was thinking of something like the following (untested) to let root
> in the host support privileged xattrs while limiting unprivileged users
> to user.*. Miklos, does this look acceptable or would you prefer
> something different?

So it won't be possible to set capabilities in a fuse fs?  This may
be necessary, but it will prevent i.e. live-iso builders from writing
for instance a CAP_NET_RAW=pe (instead of setuid-root) /bin/ping in the
iso.

> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index e3123bfbc711..1a3ee5663dea 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -1882,6 +1882,10 @@ static int fuse_setxattr(struct dentry *entry, const char *name,
>  	if (fc->no_setxattr)
>  		return -EOPNOTSUPP;
>  
> +	if (!(fc->flags & FUSE_PRIV_XATTRS) &&
> +	    strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
> +		return -EOPNOTSUPP;
> +
>  	req = fuse_get_req_nopages(fc);
>  	if (IS_ERR(req))
>  		return PTR_ERR(req);
> @@ -1925,6 +1929,10 @@ static ssize_t fuse_getxattr(struct dentry *entry, const char *name,
>  	if (fc->no_getxattr)
>  		return -EOPNOTSUPP;
>  
> +	if (!(fc->flags & FUSE_PRIV_XATTRS) &&
> +	    strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
> +		return -EOPNOTSUPP;
> +
>  	req = fuse_get_req_nopages(fc);
>  	if (IS_ERR(req))
>  		return PTR_ERR(req);
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 81187ba04e4a..bc0fd14b962a 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -46,6 +46,11 @@
>      doing the mount will be allowed to access the filesystem */
>  #define FUSE_ALLOW_OTHER         (1 << 1)
>  
> +/** If the FUSE_PRIV_XATTRS flag is given, then xattrs outside the
> +    user.* namespace are allowed. This option is only allowed for
> +    system root. */
> +#define FUSE_PRIV_XATTRS	(1 << 2)
> +
>  /** Number of page pointers embedded in fuse_req */
>  #define FUSE_REQ_INLINE_PAGES 1
>  
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index b88b5a780228..6716b56d43a1 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -493,6 +493,7 @@ enum {
>  	OPT_ALLOW_OTHER,
>  	OPT_MAX_READ,
>  	OPT_BLKSIZE,
> +	OPT_PRIV_XATTRS,
>  	OPT_ERR
>  };
>  
> @@ -505,6 +506,7 @@ static const match_table_t tokens = {
>  	{OPT_ALLOW_OTHER,		"allow_other"},
>  	{OPT_MAX_READ,			"max_read=%u"},
>  	{OPT_BLKSIZE,			"blksize=%u"},
> +	{OPT_PRIV_XATTRS,		"priv_xattr"},
>  	{OPT_ERR,			NULL}
>  };
>  
> @@ -592,6 +594,12 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
>  			d->blksize = value;
>  			break;
>  
> +		case OPT_PRIV_XATTRS:
> +			if (!capable(CAP_SYS_ADMIN))
> +				return 0;
> +			d->flags |= FUSE_PRIV_XATTRS;
> +			break;
> +
>  		default:
>  			return 0;
>  		}
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-10-06 16:00                                   ` Serge Hallyn
@ 2014-10-06 16:31                                     ` Seth Forshee
  2014-10-06 16:36                                       ` Serge Hallyn
  0 siblings, 1 reply; 39+ messages in thread
From: Seth Forshee @ 2014-10-06 16:31 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Eric W. Biederman, Miklos Szeredi, Alexander Viro, fuse-devel,
	Kernel Mailing List, Linux-Fsdevel, Serge E. Hallyn

On Mon, Oct 06, 2014 at 04:00:06PM +0000, Serge Hallyn wrote:
> Quoting Seth Forshee (seth.forshee@canonical.com):
> ...
> > After digging into this some more I think I agree with you. At minimum
> > letting users insert arbitrary xattrs via fuse bypasses the usual
> > restrictions on setting xattrs. This is probably mitigated by the
> > limited visibility of the fuse mount in the usual case for unprivileged
> > users, but it does seem like a bad idea fundamentally.
> > 
> > So I was thinking of something like the following (untested) to let root
> > in the host support privileged xattrs while limiting unprivileged users
> > to user.*. Miklos, does this look acceptable or would you prefer
> > something different?
> 
> So it won't be possible to set capabilities in a fuse fs?  This may
> be necessary, but it will prevent i.e. live-iso builders from writing
> for instance a CAP_NET_RAW=pe (instead of setuid-root) /bin/ping in the
> iso.

cap_inode_setxattr() already requires CAP_SETFCAP in the host to do
this, which I'd think root in in an unpriv container wouldn't have, so
aren't you prevented from doing so already? I suppose the LSM could
override this restriction though.

> > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> > index e3123bfbc711..1a3ee5663dea 100644
> > --- a/fs/fuse/dir.c
> > +++ b/fs/fuse/dir.c
> > @@ -1882,6 +1882,10 @@ static int fuse_setxattr(struct dentry *entry, const char *name,
> >  	if (fc->no_setxattr)
> >  		return -EOPNOTSUPP;
> >  
> > +	if (!(fc->flags & FUSE_PRIV_XATTRS) &&
> > +	    strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
> > +		return -EOPNOTSUPP;
> > +
> >  	req = fuse_get_req_nopages(fc);
> >  	if (IS_ERR(req))
> >  		return PTR_ERR(req);
> > @@ -1925,6 +1929,10 @@ static ssize_t fuse_getxattr(struct dentry *entry, const char *name,
> >  	if (fc->no_getxattr)
> >  		return -EOPNOTSUPP;
> >  
> > +	if (!(fc->flags & FUSE_PRIV_XATTRS) &&
> > +	    strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
> > +		return -EOPNOTSUPP;
> > +
> >  	req = fuse_get_req_nopages(fc);
> >  	if (IS_ERR(req))
> >  		return PTR_ERR(req);
> > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> > index 81187ba04e4a..bc0fd14b962a 100644
> > --- a/fs/fuse/fuse_i.h
> > +++ b/fs/fuse/fuse_i.h
> > @@ -46,6 +46,11 @@
> >      doing the mount will be allowed to access the filesystem */
> >  #define FUSE_ALLOW_OTHER         (1 << 1)
> >  
> > +/** If the FUSE_PRIV_XATTRS flag is given, then xattrs outside the
> > +    user.* namespace are allowed. This option is only allowed for
> > +    system root. */
> > +#define FUSE_PRIV_XATTRS	(1 << 2)
> > +
> >  /** Number of page pointers embedded in fuse_req */
> >  #define FUSE_REQ_INLINE_PAGES 1
> >  
> > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> > index b88b5a780228..6716b56d43a1 100644
> > --- a/fs/fuse/inode.c
> > +++ b/fs/fuse/inode.c
> > @@ -493,6 +493,7 @@ enum {
> >  	OPT_ALLOW_OTHER,
> >  	OPT_MAX_READ,
> >  	OPT_BLKSIZE,
> > +	OPT_PRIV_XATTRS,
> >  	OPT_ERR
> >  };
> >  
> > @@ -505,6 +506,7 @@ static const match_table_t tokens = {
> >  	{OPT_ALLOW_OTHER,		"allow_other"},
> >  	{OPT_MAX_READ,			"max_read=%u"},
> >  	{OPT_BLKSIZE,			"blksize=%u"},
> > +	{OPT_PRIV_XATTRS,		"priv_xattr"},
> >  	{OPT_ERR,			NULL}
> >  };
> >  
> > @@ -592,6 +594,12 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
> >  			d->blksize = value;
> >  			break;
> >  
> > +		case OPT_PRIV_XATTRS:
> > +			if (!capable(CAP_SYS_ADMIN))
> > +				return 0;
> > +			d->flags |= FUSE_PRIV_XATTRS;
> > +			break;
> > +
> >  		default:
> >  			return 0;
> >  		}
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-10-06 16:31                                     ` Seth Forshee
@ 2014-10-06 16:36                                       ` Serge Hallyn
  0 siblings, 0 replies; 39+ messages in thread
From: Serge Hallyn @ 2014-10-06 16:36 UTC (permalink / raw)
  To: Eric W. Biederman, Miklos Szeredi, Alexander Viro, fuse-devel,
	Kernel Mailing List, Linux-Fsdevel, Serge E. Hallyn

Quoting Seth Forshee (seth.forshee@canonical.com):
> On Mon, Oct 06, 2014 at 04:00:06PM +0000, Serge Hallyn wrote:
> > Quoting Seth Forshee (seth.forshee@canonical.com):
> > ...
> > > After digging into this some more I think I agree with you. At minimum
> > > letting users insert arbitrary xattrs via fuse bypasses the usual
> > > restrictions on setting xattrs. This is probably mitigated by the
> > > limited visibility of the fuse mount in the usual case for unprivileged
> > > users, but it does seem like a bad idea fundamentally.
> > > 
> > > So I was thinking of something like the following (untested) to let root
> > > in the host support privileged xattrs while limiting unprivileged users
> > > to user.*. Miklos, does this look acceptable or would you prefer
> > > something different?
> > 
> > So it won't be possible to set capabilities in a fuse fs?  This may
> > be necessary, but it will prevent i.e. live-iso builders from writing
> > for instance a CAP_NET_RAW=pe (instead of setuid-root) /bin/ping in the
> > iso.
> 
> cap_inode_setxattr() already requires CAP_SETFCAP in the host to do
> this, which I'd think root in in an unpriv container wouldn't have, so
> aren't you prevented from doing so already? I suppose the LSM could
> override this restriction though.

It's true we'd have to complicated that path.  And really I was being silly.
It's not safe (at least not trivially).

> > > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> > > index e3123bfbc711..1a3ee5663dea 100644
> > > --- a/fs/fuse/dir.c
> > > +++ b/fs/fuse/dir.c
> > > @@ -1882,6 +1882,10 @@ static int fuse_setxattr(struct dentry *entry, const char *name,
> > >  	if (fc->no_setxattr)
> > >  		return -EOPNOTSUPP;
> > >  
> > > +	if (!(fc->flags & FUSE_PRIV_XATTRS) &&
> > > +	    strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
> > > +		return -EOPNOTSUPP;
> > > +
> > >  	req = fuse_get_req_nopages(fc);
> > >  	if (IS_ERR(req))
> > >  		return PTR_ERR(req);
> > > @@ -1925,6 +1929,10 @@ static ssize_t fuse_getxattr(struct dentry *entry, const char *name,
> > >  	if (fc->no_getxattr)
> > >  		return -EOPNOTSUPP;
> > >  
> > > +	if (!(fc->flags & FUSE_PRIV_XATTRS) &&
> > > +	    strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
> > > +		return -EOPNOTSUPP;
> > > +
> > >  	req = fuse_get_req_nopages(fc);
> > >  	if (IS_ERR(req))
> > >  		return PTR_ERR(req);
> > > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> > > index 81187ba04e4a..bc0fd14b962a 100644
> > > --- a/fs/fuse/fuse_i.h
> > > +++ b/fs/fuse/fuse_i.h
> > > @@ -46,6 +46,11 @@
> > >      doing the mount will be allowed to access the filesystem */
> > >  #define FUSE_ALLOW_OTHER         (1 << 1)
> > >  
> > > +/** If the FUSE_PRIV_XATTRS flag is given, then xattrs outside the
> > > +    user.* namespace are allowed. This option is only allowed for
> > > +    system root. */
> > > +#define FUSE_PRIV_XATTRS	(1 << 2)
> > > +
> > >  /** Number of page pointers embedded in fuse_req */
> > >  #define FUSE_REQ_INLINE_PAGES 1
> > >  
> > > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> > > index b88b5a780228..6716b56d43a1 100644
> > > --- a/fs/fuse/inode.c
> > > +++ b/fs/fuse/inode.c
> > > @@ -493,6 +493,7 @@ enum {
> > >  	OPT_ALLOW_OTHER,
> > >  	OPT_MAX_READ,
> > >  	OPT_BLKSIZE,
> > > +	OPT_PRIV_XATTRS,
> > >  	OPT_ERR
> > >  };
> > >  
> > > @@ -505,6 +506,7 @@ static const match_table_t tokens = {
> > >  	{OPT_ALLOW_OTHER,		"allow_other"},
> > >  	{OPT_MAX_READ,			"max_read=%u"},
> > >  	{OPT_BLKSIZE,			"blksize=%u"},
> > > +	{OPT_PRIV_XATTRS,		"priv_xattr"},
> > >  	{OPT_ERR,			NULL}
> > >  };
> > >  
> > > @@ -592,6 +594,12 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
> > >  			d->blksize = value;
> > >  			break;
> > >  
> > > +		case OPT_PRIV_XATTRS:
> > > +			if (!capable(CAP_SYS_ADMIN))
> > > +				return 0;
> > > +			d->flags |= FUSE_PRIV_XATTRS;
> > > +			break;
> > > +
> > >  		default:
> > >  			return 0;
> > >  		}
> > > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-02 15:44 [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces Seth Forshee
                   ` (4 preceding siblings ...)
  2014-09-10 12:35 ` Seth Forshee
@ 2014-09-23 16:07 ` Miklos Szeredi
  2014-09-23 16:26   ` Seth Forshee
  5 siblings, 1 reply; 39+ messages in thread
From: Miklos Szeredi @ 2014-09-23 16:07 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Alexander Viro, Eric W. Biederman, Serge Hallyn, fuse-devel,
	Kernel Mailing List, Linux-Fsdevel

On Tue, Sep 2, 2014 at 5:44 PM, Seth Forshee <seth.forshee@canonical.com> wrote:
> Here's an updated set of patches for allowing fuse mounts from pid and
> user namespaces. I discussed some of the issues we debated with the last
> patch set (and a few others) with Eric at LinuxCon, and the updates here
> mainly reflect the outcome of those discussions.
>
> The stickiest issue in the v1 patches was the question of where to get
> the user and pid namespaces from that are used for translating ids for
> communication with userspace. Eric told me that for user namespaces at
> least we need to grab a namespace at open or mount time and use only
> that namespace to prevent certain types of attacks.

I'm not convinced.  Let us have the gory details, please.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-23 16:07 ` Miklos Szeredi
@ 2014-09-23 16:26   ` Seth Forshee
  2014-09-23 17:03     ` Miklos Szeredi
  0 siblings, 1 reply; 39+ messages in thread
From: Seth Forshee @ 2014-09-23 16:26 UTC (permalink / raw)
  To: Miklos Szeredi, Eric W. Biederman
  Cc: Alexander Viro, Serge Hallyn, fuse-devel, Kernel Mailing List,
	Linux-Fsdevel

On Tue, Sep 23, 2014 at 06:07:35PM +0200, Miklos Szeredi wrote:
> On Tue, Sep 2, 2014 at 5:44 PM, Seth Forshee <seth.forshee@canonical.com> wrote:
> > Here's an updated set of patches for allowing fuse mounts from pid and
> > user namespaces. I discussed some of the issues we debated with the last
> > patch set (and a few others) with Eric at LinuxCon, and the updates here
> > mainly reflect the outcome of those discussions.
> >
> > The stickiest issue in the v1 patches was the question of where to get
> > the user and pid namespaces from that are used for translating ids for
> > communication with userspace. Eric told me that for user namespaces at
> > least we need to grab a namespace at open or mount time and use only
> > that namespace to prevent certain types of attacks.
> 
> I'm not convinced.  Let us have the gory details, please.

I'll do my best, and hopefully Eric will fill in any details I miss.

I think there may have been more than one possible scenario that Eric
was describing to me, but this is the one I remember. A user could
create a namespace and mount a fuse filesystem without nosuid. It could
then pass the /dev/fuse fd to a process in init_user_ns, which could
expose a suid file owned by root (or any other user) and use it to gain
elevated privileges.

On the other hand, if file ownership is always interpreted in the
context of the namespace from which the filesystem is mounted then suid
can only be used to become another uid already under that user's
control.

Eric, does that sound right? Did I miss anything?

Seth

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-23 16:26   ` Seth Forshee
@ 2014-09-23 17:03     ` Miklos Szeredi
  2014-09-23 17:33       ` Seth Forshee
  2014-09-23 21:46       ` Eric W. Biederman
  0 siblings, 2 replies; 39+ messages in thread
From: Miklos Szeredi @ 2014-09-23 17:03 UTC (permalink / raw)
  To: Miklos Szeredi, Eric W. Biederman, Alexander Viro, Serge Hallyn,
	fuse-devel, Kernel Mailing List, Linux-Fsdevel

On Tue, Sep 23, 2014 at 6:26 PM, Seth Forshee
<seth.forshee@canonical.com> wrote:
> On Tue, Sep 23, 2014 at 06:07:35PM +0200, Miklos Szeredi wrote:
>> On Tue, Sep 2, 2014 at 5:44 PM, Seth Forshee <seth.forshee@canonical.com> wrote:
>> > Here's an updated set of patches for allowing fuse mounts from pid and
>> > user namespaces. I discussed some of the issues we debated with the last
>> > patch set (and a few others) with Eric at LinuxCon, and the updates here
>> > mainly reflect the outcome of those discussions.
>> >
>> > The stickiest issue in the v1 patches was the question of where to get
>> > the user and pid namespaces from that are used for translating ids for
>> > communication with userspace. Eric told me that for user namespaces at
>> > least we need to grab a namespace at open or mount time and use only
>> > that namespace to prevent certain types of attacks.
>>
>> I'm not convinced.  Let us have the gory details, please.
>
> I'll do my best, and hopefully Eric will fill in any details I miss.
>
> I think there may have been more than one possible scenario that Eric
> was describing to me, but this is the one I remember. A user could
> create a namespace and mount a fuse filesystem without nosuid. It could
> then pass the /dev/fuse fd to a process in init_user_ns, which could
> expose a suid file owned by root (or any other user) and use it to gain
> elevated privileges.
>
> On the other hand, if file ownership is always interpreted in the
> context of the namespace from which the filesystem is mounted then suid
> can only be used to become another uid already under that user's
> control.

Much simpler solution: don't allow SUID in unprivileged namespaces.
SUID is a really ugly hack with many problems, just get rid of it.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-23 17:03     ` Miklos Szeredi
@ 2014-09-23 17:33       ` Seth Forshee
  2014-09-23 21:46       ` Eric W. Biederman
  1 sibling, 0 replies; 39+ messages in thread
From: Seth Forshee @ 2014-09-23 17:33 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Eric W. Biederman, Alexander Viro, Serge Hallyn, fuse-devel,
	Kernel Mailing List, Linux-Fsdevel

On Tue, Sep 23, 2014 at 07:03:47PM +0200, Miklos Szeredi wrote:
> On Tue, Sep 23, 2014 at 6:26 PM, Seth Forshee
> <seth.forshee@canonical.com> wrote:
> > On Tue, Sep 23, 2014 at 06:07:35PM +0200, Miklos Szeredi wrote:
> >> On Tue, Sep 2, 2014 at 5:44 PM, Seth Forshee <seth.forshee@canonical.com> wrote:
> >> > Here's an updated set of patches for allowing fuse mounts from pid and
> >> > user namespaces. I discussed some of the issues we debated with the last
> >> > patch set (and a few others) with Eric at LinuxCon, and the updates here
> >> > mainly reflect the outcome of those discussions.
> >> >
> >> > The stickiest issue in the v1 patches was the question of where to get
> >> > the user and pid namespaces from that are used for translating ids for
> >> > communication with userspace. Eric told me that for user namespaces at
> >> > least we need to grab a namespace at open or mount time and use only
> >> > that namespace to prevent certain types of attacks.
> >>
> >> I'm not convinced.  Let us have the gory details, please.
> >
> > I'll do my best, and hopefully Eric will fill in any details I miss.
> >
> > I think there may have been more than one possible scenario that Eric
> > was describing to me, but this is the one I remember. A user could
> > create a namespace and mount a fuse filesystem without nosuid. It could
> > then pass the /dev/fuse fd to a process in init_user_ns, which could
> > expose a suid file owned by root (or any other user) and use it to gain
> > elevated privileges.
> >
> > On the other hand, if file ownership is always interpreted in the
> > context of the namespace from which the filesystem is mounted then suid
> > can only be used to become another uid already under that user's
> > control.
> 
> Much simpler solution: don't allow SUID in unprivileged namespaces.
> SUID is a really ugly hack with many problems, just get rid of it.

Yeah, that's an option but will require some vfs support similar to with
nodev, but I wouldn't call it simpler. The implementation of using a
single namespace for all uid/gid translation is actually quite a bit
simpler than trying to use the context of the /dev/fuse reads and
writes. The reason is that much of the translation happens in the
context of the process doing fs operations on the fuse mount, not in the
context of the /dev/fuse reads and writes, so it involves a lot of
grabbing references to namespaces and passing them around. Either that
or moving that code to happen in the read/write path, which would also
require some substantial changes.

Thanks,
Seth


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces
  2014-09-23 17:03     ` Miklos Szeredi
  2014-09-23 17:33       ` Seth Forshee
@ 2014-09-23 21:46       ` Eric W. Biederman
  1 sibling, 0 replies; 39+ messages in thread
From: Eric W. Biederman @ 2014-09-23 21:46 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Alexander Viro, Serge Hallyn, fuse-devel, Kernel Mailing List,
	Linux-Fsdevel

Miklos Szeredi <miklos@szeredi.hu> writes:

> On Tue, Sep 23, 2014 at 6:26 PM, Seth Forshee
> <seth.forshee@canonical.com> wrote:
>> On Tue, Sep 23, 2014 at 06:07:35PM +0200, Miklos Szeredi wrote:
>>> On Tue, Sep 2, 2014 at 5:44 PM, Seth Forshee <seth.forshee@canonical.com> wrote:
>>> > Here's an updated set of patches for allowing fuse mounts from pid and
>>> > user namespaces. I discussed some of the issues we debated with the last
>>> > patch set (and a few others) with Eric at LinuxCon, and the updates here
>>> > mainly reflect the outcome of those discussions.
>>> >
>>> > The stickiest issue in the v1 patches was the question of where to get
>>> > the user and pid namespaces from that are used for translating ids for
>>> > communication with userspace. Eric told me that for user namespaces at
>>> > least we need to grab a namespace at open or mount time and use only
>>> > that namespace to prevent certain types of attacks.
>>>
>>> I'm not convinced.  Let us have the gory details, please.
>>
>> I'll do my best, and hopefully Eric will fill in any details I miss.
>>
>> I think there may have been more than one possible scenario that Eric
>> was describing to me, but this is the one I remember. A user could
>> create a namespace and mount a fuse filesystem without nosuid. It could
>> then pass the /dev/fuse fd to a process in init_user_ns, which could
>> expose a suid file owned by root (or any other user) and use it to gain
>> elevated privileges.
>>
>> On the other hand, if file ownership is always interpreted in the
>> context of the namespace from which the filesystem is mounted then suid
>> can only be used to become another uid already under that user's
>> control.
>
> Much simpler solution: don't allow SUID in unprivileged namespaces.
> SUID is a really ugly hack with many problems, just get rid of it.

Except that doesn't solve the problem.

The core problem is how do we avoid allowing letting a processes
implementing a fuse filesystem to manipulate a process with privileges
that it does not.

The classic fuse solution to this is to only allow a single uid to
access the fuse filesystem.

With user namespaces we can relax that restriction when mounted with
just user namespace root permissions to allow a filesystem to use any
uids mapped into the user namespace, as the mounter of the filesystem.
The user namespace root that is mounting the filesystem already has
privileges to manipulate those users already.

Which means the simple straight forward and understandable restriction
to enforce is:

- The user namespace of the opener of /dev/fuse is the same as
  the user namespace the fuse filesystem is mounted in is the same as
  the user namespace we translate any uids in and out of.

By forbidding all uid s and gids not mapped into a user namespace from
being used you can not manipulate processes you would not be able to
manipulate before.

As a nice side effect that happens to make having the setuid bit set
something of a fuse filesystem something that is uninteresting.  At
that point the setuid bit being set is no more dangerous than the setuid
being being set on any file you own.

Eric

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2014-10-06 16:37 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-02 15:44 [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces Seth Forshee
2014-09-02 15:44 ` [PATCH v2 1/3] vfs: Check for invalid i_uid in may_follow_link() Seth Forshee
2014-09-05 17:05   ` Serge Hallyn
2014-09-05 19:00     ` Seth Forshee
2014-09-05 19:23       ` Serge Hallyn
2014-09-02 15:44 ` [PATCH v2 2/3] fuse: Translate pids passed to userspace into pid namespaces Seth Forshee
2014-09-05 17:10   ` Serge Hallyn
2014-09-02 15:44 ` [PATCH v2 3/3] fuse: Add support for mounts from user namespaces Seth Forshee
2014-09-05 16:48   ` Serge Hallyn
2014-09-05 17:36     ` Seth Forshee
2014-09-05 19:25       ` Serge Hallyn
2014-09-05 20:40 ` [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces Seth Forshee
2014-09-10 12:35 ` Seth Forshee
2014-09-10 16:21   ` Serge E. Hallyn
2014-09-10 16:42     ` Seth Forshee
2014-09-11 18:10       ` Seth Forshee
2014-09-23 22:29         ` Eric W. Biederman
2014-09-24 13:29           ` Seth Forshee
2014-09-24 17:10             ` Eric W. Biederman
2014-09-25 15:04               ` Miklos Szeredi
2014-09-25 16:21                 ` Seth Forshee
2014-09-25 18:05                 ` Eric W. Biederman
2014-09-25 18:44                   ` Seth Forshee
2014-09-25 18:53                     ` Seth Forshee
2014-09-25 19:14                     ` Eric W. Biederman
2014-09-25 19:48                       ` Seth Forshee
2014-09-27  1:41                         ` Eric W. Biederman
2014-09-27  4:24                           ` Seth Forshee
2014-09-29 19:34                             ` Eric W. Biederman
2014-09-30 16:25                               ` Seth Forshee
2014-10-05 16:48                                 ` Seth Forshee
2014-10-06 16:00                                   ` Serge Hallyn
2014-10-06 16:31                                     ` Seth Forshee
2014-10-06 16:36                                       ` Serge Hallyn
2014-09-23 16:07 ` Miklos Szeredi
2014-09-23 16:26   ` Seth Forshee
2014-09-23 17:03     ` Miklos Szeredi
2014-09-23 17:33       ` Seth Forshee
2014-09-23 21:46       ` Eric W. Biederman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).