linux-kernel.vger.kernel.org archive mirror
* [RFC 0/1] shiftfs: uid/gid shifting filesystem
@ 2016-05-12 19:06 James Bottomley
  2016-05-12 19:07 ` [RFC 1/1] shiftfs: uid/gid shifting bind mount James Bottomley
  0 siblings, 1 reply; 9+ messages in thread
From: James Bottomley @ 2016-05-12 19:06 UTC (permalink / raw)
  To: Djalal Harouni, Chris Mason, tytso, Serge Hallyn, Josh Triplett,
	Eric W. Biederman, Andy Lutomirski, Seth Forshee, linux-fsdevel,
	linux-kernel, linux-security-module, Dongsu Park, David Herrmann,
	Miklos Szeredi, Alban Crequy, Al Viro

This is currently an RFC because the patch applies to Linus head, but
needs altering for the vfs tree, so I'll respin and resend after the
merge window closes.

My use case for this is that I run a lot of unprivileged architectural
emulation containers on my system using user namespaces.  Details here:

http://blog.hansenpartnership.com/unprivileged-build-containers/

They're mostly for building non-x86 stuff (like aarch64 and arm secure
boot and mips images).  For builds, I have all the environments in my
home directory with downshifted uids; however, sometimes I need to use
them to administer real images that run on systems, meaning the uids
are the usual privileged ones not the downshifted ones.  The only
current choice I have is to start the emulation as root so the uid/gids
match.  The reason for this filesystem is to use my standard
unprivileged containers to maintain these images.  The way I do this is
crack the image with a loop and then shift the uids before bringing up
the container.  I usually loop mount into /var/tmp/images/, so it's
owned by real root there:

jarvis:~ # ls -l /var/tmp/images/mips|head -4
total 0
drwxr-xr-x 1 root root 8192 May 12 08:33 bin
drwxr-xr-x 1 root root    6 May 12 08:33 boot
drwxr-xr-x 1 root root  167 May 12 08:33 dev

And I usually run my build containers with a uid_map of 

         0     100000       1000
      1000       1000          1
     65534     101000          1

(maps 0-999 shifted by 100000, maps nobody [65534] to 101000 and keeps
my uid [1000] fixed so I can mount my home directory into the namespace)
and something similar with gid_map.  So I shift mount the mips image with

mount -t shiftfs -o uidmap=0:100000:1000,uidmap=65534:101000:1,gidmap=0:100000:100,gidmap=101:100101:899,gidmap=65533:101000:2 /var/tmp/images/mips /home/jejb/containers/mips

and I now see it as

jejb@jarvis:~> ls -l containers/mips|head -4
total 0
drwxr-xr-x 1 100000 100000 8192 May 12 08:33 bin/
drwxr-xr-x 1 100000 100000    6 May 12 08:33 boot/
drwxr-xr-x 1 100000 100000  167 May 12 08:33 dev/

just like my usual unprivileged build roots, and I can now use an
unprivileged container to enter and administer the image.
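
For anyone who wants to check the arithmetic behind these listings, here
is a minimal userspace sketch of how a uidmap=first:lower_first:count
extent is applied to an owner id.  It is only modelled on the
map_id_down() helper in the patch; the names shift_extent, shift_map,
shift_map_add and shift_down, and the SHIFT_MAX_EXTENTS limit, are
hypothetical and not kernel API.  As in the patch, an id that falls in
no extent is passed through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHIFT_MAX_EXTENTS 5	/* assumption: small fixed capacity for the sketch */

/* One "first:lower_first:count" triple from a uidmap=/gidmap= option. */
struct shift_extent {
	uint32_t first;		/* start of the range on the underlying filesystem */
	uint32_t lower_first;	/* start of the range as seen through the shift mount */
	uint32_t count;		/* number of consecutive ids covered */
};

struct shift_map {
	size_t nr_extents;
	struct shift_extent extent[SHIFT_MAX_EXTENTS];
};

/* Append an extent; the caller must stay within the fixed capacity. */
void shift_map_add(struct shift_map *map, uint32_t first,
		   uint32_t lower_first, uint32_t count)
{
	assert(map->nr_extents < SHIFT_MAX_EXTENTS);
	map->extent[map->nr_extents++] =
		(struct shift_extent){ first, lower_first, count };
}

/* Underlying id -> id shown through the mount; unmapped ids pass through. */
uint32_t shift_down(const struct shift_map *map, uint32_t id)
{
	for (size_t i = 0; i < map->nr_extents; i++) {
		const struct shift_extent *e = &map->extent[i];

		if (id >= e->first && id - e->first < e->count)
			return e->lower_first + (id - e->first);
	}
	return id;
}
```

With the two uidmap extents from the mount command above (0:100000:1000
and 65534:101000:1), shift_down() turns underlying uid 0 into 100000,
which matches the ls output, and leaves uid 1000 untouched.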

It seems like a lot of container systems need to do something similar
when they try to provide unprivileged access to standard images.
At the moment, the security mechanism only allows root in the
host to use this, but it's not impossible to come up with a scheme for
marking trees that can safely be shift mounted by unprivileged user
namespaces.

James

---

 fs/Kconfig                 |   8 +
 fs/Makefile                |   1 +
 fs/shiftfs.c               | 833 +++++++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/magic.h |   2 +
 4 files changed, 844 insertions(+)


* [RFC 1/1] shiftfs: uid/gid shifting bind mount
  2016-05-12 19:06 [RFC 0/1] shiftfs: uid/gid shifting filesystem James Bottomley
@ 2016-05-12 19:07 ` James Bottomley
  2016-05-16 19:41   ` Serge Hallyn
  0 siblings, 1 reply; 9+ messages in thread
From: James Bottomley @ 2016-05-12 19:07 UTC (permalink / raw)
  To: Djalal Harouni, Chris Mason, tytso, Serge Hallyn, Josh Triplett,
	Eric W. Biederman, Andy Lutomirski, Seth Forshee, linux-fsdevel,
	linux-kernel, linux-security-module, Dongsu Park, David Herrmann,
	Miklos Szeredi, Alban Crequy, Al Viro

This allows any subtree to be uid/gid shifted and bound elsewhere.  It
does this by operating similarly to overlayfs, except that since
there's only a single underlying layer, all dentry lookups go through
this.  Its primary use is for shifting the underlying uids of
filesystems used to support unprivileged (uid shifted) containers.
The usual use case here is that the container is operating with a uid
shifted unprivileged root but sometimes needs to make use of or work
with a filesystem image that has root at real uid 0.
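
To illustrate the two directions of the shift, here is a rough userspace
sketch, not the kernel code itself.  It assumes a single extent such as
uidmap=0:100000:1000 from the cover letter; the names shift_range,
range_up and range_down are hypothetical.  The idea is that inode owners
are shifted down for display, while the caller's fsuid/fsgid are shifted
back up before an operation is passed to the underlying filesystem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-extent mapping, e.g. from "uidmap=0:100000:1000". */
struct shift_range {
	uint32_t first;		/* id on the underlying filesystem */
	uint32_t lower_first;	/* id used inside the container */
	uint32_t count;		/* number of consecutive ids covered */
};

/* Container-side id -> underlying id (the direction used for fsuid/fsgid). */
uint32_t range_up(const struct shift_range *r, uint32_t id)
{
	assert(r != NULL);
	if (id >= r->lower_first && id - r->lower_first < r->count)
		return r->first + (id - r->lower_first);
	return id;	/* unmapped ids pass through unchanged */
}

/* Underlying id -> container-side id (the direction used for inode owners). */
uint32_t range_down(const struct shift_range *r, uint32_t id)
{
	assert(r != NULL);
	if (id >= r->first && id - r->first < r->count)
		return r->lower_first + (id - r->first);
	return id;	/* unmapped ids pass through unchanged */
}
```

Under that assumed extent, a container root running as uid 100000 is
treated as uid 0 on the underlying image, and a file owned by real uid 0
shows up as owned by 100000.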

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

---

Changes so far: fixed up locking and addressed viro's comments

use negative dentries on the underlying cached in d_fsdata to remove
the extra lookup_one_len() calls

Add show_options/statfs callbacks

Add proper Kconfig plumbing

diff --git a/fs/Kconfig b/fs/Kconfig
index 6725f59..a9b0834 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -94,6 +94,14 @@ source "fs/autofs4/Kconfig"
 source "fs/fuse/Kconfig"
 source "fs/overlayfs/Kconfig"
 
+config SHIFT_FS
+	tristate "UID/GID shifting overlay filesystem for containers"
+	help
+	  This filesystem can overlay any mounted filesystem and shift
+	  the uid/gid the files appear at.  The idea is that
+	  unprivileged containers can use this to mount root volumes
+	  using this technique.
+
 menu "Caches"
 
 source "fs/fscache/Kconfig"
diff --git a/fs/Makefile b/fs/Makefile
index 85b6e13..ff9890e 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -128,3 +128,4 @@ obj-y				+= exofs/ # Multiple modules
 obj-$(CONFIG_CEPH_FS)		+= ceph/
 obj-$(CONFIG_PSTORE)		+= pstore/
 obj-$(CONFIG_EFIVAR_FS)		+= efivarfs/
+obj-$(CONFIG_SHIFT_FS)		+= shiftfs.o
diff --git a/fs/shiftfs.c b/fs/shiftfs.c
new file mode 100644
index 0000000..d352377
--- /dev/null
+++ b/fs/shiftfs.c
@@ -0,0 +1,833 @@
+#include <linux/cred.h>
+#include <linux/mount.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/magic.h>
+#include <linux/parser.h>
+#include <linux/seq_file.h>
+#include <linux/statfs.h>
+#include <linux/slab.h>
+#include <linux/user_namespace.h>
+#include <linux/uidgid.h>
+
+struct shiftfs_super_info {
+	struct vfsmount *mnt;
+	struct uid_gid_map uid_map, gid_map;
+};
+
+static struct inode *shiftfs_new_inode(struct super_block *sb, umode_t mode,
+				       struct dentry *dentry);
+
+enum {
+	OPT_UIDMAP,
+	OPT_GIDMAP,
+	OPT_LAST,
+};
+
+/* global filesystem options */
+static const match_table_t tokens = {
+	{ OPT_UIDMAP, "uidmap=%u:%u:%u" },
+	{ OPT_GIDMAP, "gidmap=%u:%u:%u" },
+	{ OPT_LAST, NULL }
+};
+
+/*
+ * code stolen from user_namespace.c ... except that these functions
+ * return the same id back if unmapped ... should probably have a
+ * library?
+ */
+static u32 map_id_down(struct uid_gid_map *map, u32 id)
+{
+	unsigned idx, extents;
+	u32 first, last;
+
+	/* Find the matching extent */
+	extents = map->nr_extents;
+	smp_rmb();
+	for (idx = 0; idx < extents; idx++) {
+		first = map->extent[idx].first;
+		last = first + map->extent[idx].count - 1;
+		if (id >= first && id <= last)
+			break;
+	}
+	/* Map the id, or return it unchanged if no extent matched */
+	if (idx < extents)
+		id = (id - first) + map->extent[idx].lower_first;
+
+	return id;
+}
+
+static u32 map_id_up(struct uid_gid_map *map, u32 id)
+{
+	unsigned idx, extents;
+	u32 first, last;
+
+	/* Find the matching extent */
+	extents = map->nr_extents;
+	smp_rmb();
+	for (idx = 0; idx < extents; idx++) {
+		first = map->extent[idx].lower_first;
+		last = first + map->extent[idx].count - 1;
+		if (id >= first && id <= last)
+			break;
+	}
+	/* Map the id, or return it unchanged if no extent matched */
+	if (idx < extents)
+		id = (id - first) + map->extent[idx].first;
+
+	return id;
+}
+
+static bool mappings_overlap(struct uid_gid_map *new_map,
+			     struct uid_gid_extent *extent)
+{
+	u32 upper_first, lower_first, upper_last, lower_last;
+	unsigned idx;
+
+	upper_first = extent->first;
+	lower_first = extent->lower_first;
+	upper_last = upper_first + extent->count - 1;
+	lower_last = lower_first + extent->count - 1;
+
+	for (idx = 0; idx < new_map->nr_extents; idx++) {
+		u32 prev_upper_first, prev_lower_first;
+		u32 prev_upper_last, prev_lower_last;
+		struct uid_gid_extent *prev;
+
+		prev = &new_map->extent[idx];
+
+		prev_upper_first = prev->first;
+		prev_lower_first = prev->lower_first;
+		prev_upper_last = prev_upper_first + prev->count - 1;
+		prev_lower_last = prev_lower_first + prev->count - 1;
+
+		/* Does the upper range intersect a previous extent? */
+		if ((prev_upper_first <= upper_last) &&
+		    (prev_upper_last >= upper_first))
+			return true;
+
+		/* Does the lower range intersect a previous extent? */
+		if ((prev_lower_first <= lower_last) &&
+		    (prev_lower_last >= lower_first))
+			return true;
+	}
+	return false;
+}
+/* end code stolen from user_namespace.c */
+
+static const struct cred *shiftfs_get_up_creds(struct super_block *sb)
+{
+	struct cred *cred = prepare_creds();
+	struct shiftfs_super_info *ssi = sb->s_fs_info;
+
+	if (!cred)
+		return NULL;
+
+	cred->fsuid = KUIDT_INIT(map_id_up(&ssi->uid_map, __kuid_val(cred->fsuid)));
+	cred->fsgid = KGIDT_INIT(map_id_up(&ssi->gid_map, __kgid_val(cred->fsgid)));
+
+	return cred;
+}
+
+static const struct cred *shiftfs_new_creds(const struct cred **newcred,
+					    struct super_block *sb)
+{
+	const struct cred *cred = shiftfs_get_up_creds(sb);
+
+	*newcred = cred;
+
+	if (cred)
+		cred = override_creds(cred);
+	else
+		printk(KERN_ERR "Credential override failed: no memory\n");
+
+	return cred;
+}
+
+static void shiftfs_old_creds(const struct cred *oldcred,
+			      const struct cred **newcred)
+{
+	if (!*newcred)
+		return;
+
+	revert_creds(oldcred);
+	put_cred(*newcred);
+}
+
+static int shiftfs_parse_options(struct shiftfs_super_info *ssi, char *options)
+{
+	char *p;
+	substring_t args[MAX_OPT_ARGS];
+	int from, to, count;
+	struct uid_gid_map *map, *maps[2] = {
+		[OPT_UIDMAP] = &ssi->uid_map,
+		[OPT_GIDMAP] = &ssi->gid_map,
+	};
+
+	while ((p = strsep(&options, ",")) != NULL) {
+		int token;
+		struct uid_gid_extent ext;
+
+		if (!*p)
+			continue;
+
+		token = match_token(p, tokens, args);
+		if (token != OPT_UIDMAP && token != OPT_GIDMAP)
+			return -EINVAL;
+		if (match_int(&args[0], &from) ||
+		    match_int(&args[1], &to) ||
+		    match_int(&args[2], &count))
+			return -EINVAL;
+		map = maps[token];
+		if (map->nr_extents >= UID_GID_MAP_MAX_EXTENTS)
+			return -EINVAL;
+		ext.first = from;
+		ext.lower_first = to;
+		ext.count = count;
+		if (mappings_overlap(map, &ext))
+			return -EINVAL;
+		map->extent[map->nr_extents++] = ext;
+	}
+	return 0;
+}
+
+static void shiftfs_d_release(struct dentry *dentry)
+{
+	struct dentry *real = dentry->d_fsdata;
+
+	dput(real);
+}
+
+static const struct dentry_operations shiftfs_dentry_ops = {
+	.d_release	= shiftfs_d_release,
+};
+
+static int shiftfs_readlink(struct dentry *dentry, char __user *data,
+			    int flags)
+{
+	struct dentry *real = dentry->d_fsdata;
+	const struct inode_operations *iop = real->d_inode->i_op;
+
+	if (iop->readlink)
+		return iop->readlink(real, data, flags);
+
+	return -EINVAL;
+}
+
+static const char *shiftfs_get_link(struct dentry *dentry, struct inode *inode,
+				    struct delayed_call *done)
+{
+	if (dentry) {
+		struct dentry *real = dentry->d_fsdata;
+		struct inode *reali = real->d_inode;
+		const struct inode_operations *iop = reali->i_op;
+		const char *res = ERR_PTR(-EPERM);
+
+		if (iop->get_link)
+			res = iop->get_link(real, reali, done);
+
+		return res;
+	} else {
+		/* RCU lookup not supported */
+		return ERR_PTR(-ECHILD);
+	}
+}
+
+static int shiftfs_setxattr(struct dentry *dentry, const char *name,
+			    const void *value, size_t size, int flags)
+{
+	struct dentry *real = dentry->d_fsdata;
+	const struct inode_operations *iop = real->d_inode->i_op;
+	int err = -EOPNOTSUPP;
+
+	if (iop->setxattr) {
+		const struct cred *oldcred, *newcred;
+
+		oldcred = shiftfs_new_creds(&newcred, dentry->d_sb);
+		err = iop->setxattr(real, name, value, size, flags);
+		shiftfs_old_creds(oldcred, &newcred);
+	}
+
+	return err;
+}
+
+static ssize_t shiftfs_getxattr(struct dentry *dentry, const char *name,
+				void *value, size_t size)
+{
+	struct dentry *real = dentry->d_fsdata;
+	const struct inode_operations *iop = real->d_inode->i_op;
+	int err = -EOPNOTSUPP;
+
+	if (iop->getxattr) {
+		const struct cred *oldcred, *newcred;
+
+		oldcred = shiftfs_new_creds(&newcred, dentry->d_sb);
+		err = iop->getxattr(real, name, value, size);
+		shiftfs_old_creds(oldcred, &newcred);
+	}
+
+	return err;
+}
+
+static ssize_t shiftfs_listxattr(struct dentry *dentry, char *list,
+				 size_t size)
+{
+	struct dentry *real = dentry->d_fsdata;
+	const struct inode_operations *iop = real->d_inode->i_op;
+
+	if (iop->listxattr)
+		return iop->listxattr(real, list, size);
+
+	return -EINVAL;
+}
+
+static int shiftfs_removexattr(struct dentry *dentry, const char *name)
+{
+	struct dentry *real = dentry->d_fsdata;
+	const struct inode_operations *iop = real->d_inode->i_op;
+
+	if (iop->removexattr)
+		return iop->removexattr(real, name);
+
+	return -EINVAL;
+}
+
+static void shiftfs_fill_inode(struct inode *inode, struct dentry *dentry)
+{
+	struct inode *reali;
+	struct shiftfs_super_info *ssi = inode->i_sb->s_fs_info;
+
+	if (!dentry)
+		return;
+
+	reali = dentry->d_inode;
+
+	if (!reali->i_op->get_link)
+		inode->i_opflags |= IOP_NOFOLLOW;
+
+	inode->i_mapping = reali->i_mapping;
+	inode->i_private = dentry;
+
+	inode->i_uid = KUIDT_INIT(map_id_down(&ssi->uid_map, __kuid_val(reali->i_uid)));
+	inode->i_gid = KGIDT_INIT(map_id_down(&ssi->gid_map, __kgid_val(reali->i_gid)));
+}
+
+static int shiftfs_make_object(struct inode *dir, struct dentry *dentry,
+			       umode_t mode, const char *symlink,
+			       struct dentry *hardlink, bool excl)
+{
+	struct dentry *real = dir->i_private, *new = dentry->d_fsdata;
+	struct inode *reali = real->d_inode, *newi;
+	const struct inode_operations *iop = reali->i_op;
+	int err;
+	const struct cred *oldcred, *newcred;
+	bool op_ok = false;
+
+	if (hardlink) {
+		op_ok = iop->link;
+	} else {
+		switch (mode & S_IFMT) {
+		case S_IFDIR:
+			op_ok = iop->mkdir;
+			break;
+		case S_IFREG:
+			op_ok = iop->create;
+			break;
+		case S_IFLNK:
+			op_ok = iop->symlink;
+		}
+	}
+	if (!op_ok)
+		return -EINVAL;
+
+
+	newi = shiftfs_new_inode(dentry->d_sb, mode, NULL);
+	if (!newi)
+		return -ENOMEM;
+
+	oldcred = shiftfs_new_creds(&newcred, dentry->d_sb);
+
+	inode_lock_nested(reali, I_MUTEX_PARENT);
+
+	err = -EINVAL;		/* shut gcc up about uninit var */
+	if (hardlink) {
+		struct dentry *realhardlink = hardlink->d_fsdata;
+
+		err = vfs_link(realhardlink, reali, new, NULL);
+	} else {
+		switch (mode & S_IFMT) {
+		case S_IFDIR:
+			err = vfs_mkdir(reali, new, mode);
+			break;
+		case S_IFREG:
+			err = vfs_create(reali, new, mode, excl);
+			break;
+		case S_IFLNK:
+			err = vfs_symlink(reali, new, symlink);
+		}
+	}
+
+	shiftfs_old_creds(oldcred, &newcred);
+
+	if (err)
+		goto out_dput;
+
+	shiftfs_fill_inode(newi, new);
+
+	d_instantiate(dentry, newi);
+
+	new = NULL;
+	newi = NULL;
+
+ out_dput:
+	dput(new);
+	iput(newi);
+	inode_unlock(reali);
+
+	return err;
+}
+
+static int shiftfs_create(struct inode *dir, struct dentry *dentry,
+			  umode_t mode,  bool excl)
+{
+	mode |= S_IFREG;
+
+	return shiftfs_make_object(dir, dentry, mode, NULL, NULL, excl);
+}
+
+static int shiftfs_mkdir(struct inode *dir, struct dentry *dentry,
+			 umode_t mode)
+{
+	mode |= S_IFDIR;
+
+	return shiftfs_make_object(dir, dentry, mode, NULL, NULL, false);
+}
+
+static int shiftfs_link(struct dentry *hardlink, struct inode *dir,
+			struct dentry *dentry)
+{
+	return shiftfs_make_object(dir, dentry, 0, NULL, hardlink, false);
+}
+
+static int shiftfs_symlink(struct inode *dir, struct dentry *dentry,
+			   const char *symlink)
+{
+	return shiftfs_make_object(dir, dentry, S_IFLNK, symlink, NULL, false);
+}
+
+static int shiftfs_rm(struct inode *dir, struct dentry *dentry, bool rmdir)
+{
+	struct dentry *real = dir->i_private, *new = dentry->d_fsdata;
+	struct inode *reali = real->d_inode;
+	int err;
+	const struct cred *oldcred, *newcred;
+
+	inode_lock_nested(reali, I_MUTEX_PARENT);
+
+	oldcred = shiftfs_new_creds(&newcred, dentry->d_sb);
+
+	if (rmdir)
+		err = vfs_rmdir(reali, new);
+	else
+		err = vfs_unlink(reali, new, NULL);
+
+	shiftfs_old_creds(oldcred, &newcred);
+	inode_unlock(reali);
+
+	return err;
+}
+
+static int shiftfs_unlink(struct inode *dir, struct dentry *dentry)
+{
+	return shiftfs_rm(dir, dentry, false);
+}
+
+static int shiftfs_rmdir(struct inode *dir, struct dentry *dentry)
+{
+	return shiftfs_rm(dir, dentry, true);
+}
+
+static int shiftfs_rename2(struct inode *olddir, struct dentry *old,
+			   struct inode *newdir, struct dentry *new,
+			   unsigned int flags)
+{
+	struct dentry *rodd = olddir->i_private, *rndd = newdir->i_private,
+		*realold = old->d_fsdata,
+		*realnew = new->d_fsdata, *trap;
+	struct inode *realolddir = rodd->d_inode, *realnewdir = rndd->d_inode;
+	int err = -EINVAL;
+	const struct cred *oldcred, *newcred;
+
+	trap = lock_rename(rndd, rodd);
+
+	if (trap == realold || trap == realnew)
+		goto out_unlock;
+
+	oldcred = shiftfs_new_creds(&newcred, old->d_sb);
+
+	err = vfs_rename(realolddir, realold, realnewdir,
+			 realnew, NULL, flags);
+
+	shiftfs_old_creds(oldcred, &newcred);
+
+ out_unlock:
+	unlock_rename(rndd, rodd);
+
+	return err;
+}
+
+static struct dentry *shiftfs_lookup(struct inode *dir, struct dentry *dentry,
+				     unsigned int flags)
+{
+	struct dentry *real = dir->i_private, *new;
+	struct inode *reali = real->d_inode, *newi;
+	const struct cred *oldcred, *newcred;
+
+	/* note: violation of usual fs rules here: dentries are never
+	 * added with d_add.  This is because we want no dentry cache
+	 * for shiftfs.  All lookups proceed through the dentry cache
+	 * of the underlying filesystem, meaning we always see any
+	 * changes in the underlying */
+
+	inode_lock(reali);
+	oldcred = shiftfs_new_creds(&newcred, dentry->d_sb);
+	new = lookup_one_len(dentry->d_name.name, real, dentry->d_name.len);
+	shiftfs_old_creds(oldcred, &newcred);
+	inode_unlock(reali);
+
+	if (IS_ERR(new))
+		return new;
+
+	dentry->d_fsdata = new;
+
+	if (!new->d_inode)
+		return NULL;
+
+	newi = shiftfs_new_inode(dentry->d_sb, new->d_inode->i_mode, new);
+	if (!newi) {
+		dput(new);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	d_instantiate(dentry, newi);
+
+	return NULL;
+}
+
+static int shiftfs_permission(struct inode *inode, int mask)
+{
+	struct dentry *real = inode->i_private;
+	struct inode *reali = real->d_inode;
+	const struct inode_operations *iop = reali->i_op;
+	int err;
+	const struct cred *oldcred, *newcred;
+
+	oldcred = shiftfs_new_creds(&newcred, inode->i_sb);
+	if (iop->permission)
+		err = iop->permission(reali, mask);
+	else
+		err = generic_permission(reali, mask);
+	shiftfs_old_creds(oldcred, &newcred);
+
+	return err;
+}
+
+static int shiftfs_setattr(struct dentry *dentry, struct iattr *attr)
+{
+	struct dentry *real = dentry->d_fsdata;
+	struct inode *reali = real->d_inode;
+	const struct inode_operations *iop = reali->i_op;
+	struct iattr newattr = *attr;
+	const struct cred *oldcred, *newcred;
+	struct shiftfs_super_info *ssi = dentry->d_sb->s_fs_info;
+	int err;
+
+	newattr.ia_uid = KUIDT_INIT(map_id_up(&ssi->uid_map, __kuid_val(attr->ia_uid)));
+	newattr.ia_gid = KGIDT_INIT(map_id_up(&ssi->gid_map, __kgid_val(attr->ia_gid)));
+
+	oldcred = shiftfs_new_creds(&newcred, dentry->d_sb);
+	if (iop->setattr)
+		err = iop->setattr(real, &newattr);
+	else
+		err = simple_setattr(real, &newattr);
+	shiftfs_old_creds(oldcred, &newcred);
+
+	return err;
+}
+
+static int shiftfs_getattr(struct vfsmount *mnt, struct dentry *dentry,
+			   struct kstat *stat)
+{
+	struct inode *inode = dentry->d_inode;
+	struct dentry *real = inode->i_private;
+	struct inode *reali = real->d_inode;
+	const struct inode_operations *iop = reali->i_op;
+	int err = 0;
+
+	mnt = ((struct shiftfs_super_info *)dentry->d_sb->s_fs_info)->mnt;
+
+	if (iop->getattr)
+		err = iop->getattr(mnt, real, stat);
+	else
+		generic_fillattr(reali, stat);
+
+	if (err)
+		return err;
+
+	stat->uid = inode->i_uid;
+	stat->gid = inode->i_gid;
+	return 0;
+}
+
+struct shiftfs_fop_carrier {
+	struct inode *inode;
+	int (*release)(struct inode *, struct file *);
+	struct file_operations fop;
+};
+
+static int shiftfs_release(struct inode *inode, struct file *file)
+{
+	struct shiftfs_fop_carrier *sfc;
+	int err = 0;
+
+	sfc = container_of(file->f_op, struct shiftfs_fop_carrier, fop);
+
+	if (sfc->release)
+		err = sfc->release(inode, file);
+
+	file->f_inode = sfc->inode;
+	file->f_op = sfc->inode->i_fop;
+	fops_put(inode->i_fop);
+
+	kfree(sfc);
+
+	return err;
+}
+
+static int shiftfs_open(struct inode *inode, struct file *file)
+{
+	struct dentry *real = inode->i_private;
+	struct inode *reali = real->d_inode;
+	const struct file_operations *fop;
+	struct shiftfs_fop_carrier *sfc;
+	int err = 0;
+
+	sfc = kmalloc(sizeof(*sfc), GFP_KERNEL);
+	if (!sfc)
+		return -ENOMEM;
+
+	if (real->d_flags & DCACHE_OP_SELECT_INODE)
+		reali = real->d_op->d_select_inode(real, file->f_flags);
+
+	fop = fops_get(reali->i_fop);
+	sfc->inode = inode;
+	memcpy(&sfc->fop, fop, sizeof(*fop));
+	sfc->release = sfc->fop.release;
+	sfc->fop.release = shiftfs_release;
+
+	file->f_op = &sfc->fop;
+	file->f_inode = reali;
+
+	if (fop->open)
+		err = fop->open(reali, file);
+
+	return err;
+}
+
+static const struct inode_operations shiftfs_inode_ops = {
+	/* intercepted */
+	.lookup		= shiftfs_lookup,
+	.getattr	= shiftfs_getattr,
+	.setattr	= shiftfs_setattr,
+	.permission	= shiftfs_permission,
+
+	/* pass through */
+	.mkdir		= shiftfs_mkdir,
+	.symlink	= shiftfs_symlink,
+	.get_link	= shiftfs_get_link,
+	.readlink	= shiftfs_readlink,
+	.unlink		= shiftfs_unlink,
+	.rmdir		= shiftfs_rmdir,
+	.rename2	= shiftfs_rename2,
+	.link		= shiftfs_link,
+	.create		= shiftfs_create,
+	.mknod		= NULL,	/* no special files currently */
+	.setxattr	= shiftfs_setxattr,
+	.getxattr	= shiftfs_getxattr,
+	.listxattr	= shiftfs_listxattr,
+	.removexattr	= shiftfs_removexattr,
+};
+
+static const struct file_operations shiftfs_file_ops = {
+	.open		= shiftfs_open,
+};
+
+static struct inode *shiftfs_new_inode(struct super_block *sb, umode_t mode,
+				       struct dentry *dentry)
+{
+	struct inode *inode;
+
+	inode = new_inode(sb);
+	if (!inode)
+		return NULL;
+
+	mode &= S_IFMT;
+
+	inode->i_ino = get_next_ino();
+	inode->i_mode = mode;
+	inode->i_flags |= S_NOATIME | S_NOCMTIME;
+
+	inode->i_op = &shiftfs_inode_ops;
+	inode->i_fop = &shiftfs_file_ops;
+
+	shiftfs_fill_inode(inode, dentry);
+
+	return inode;
+}
+
+static int shiftfs_show_options(struct seq_file *m, struct dentry *dentry)
+{
+	struct super_block *sb = dentry->d_sb;
+	struct shiftfs_super_info *ssi = sb->s_fs_info;
+
+	static const char *options[] = { "uidmap", "gidmap" };
+	const struct uid_gid_map *map[ARRAY_SIZE(options)] =
+		{ &ssi->uid_map, &ssi->gid_map };
+	int i, j;
+
+	for (i = 0; i < ARRAY_SIZE(options); i++) {
+		for (j = 0; j < map[i]->nr_extents; j++) {
+			const struct uid_gid_extent *ext = &map[i]->extent[j];
+
+			seq_show_option(m, options[i], NULL);
+			seq_printf(m, "=%u:%u:%u", ext->first,
+				   ext->lower_first, ext->count);
+		}
+	}
+
+	return 0;
+}
+
+static int shiftfs_statfs(struct dentry *dentry, struct kstatfs *buf)
+{
+	struct super_block *sb = dentry->d_sb;
+	struct shiftfs_super_info *ssi = sb->s_fs_info;
+	struct dentry *root = sb->s_root;
+	struct dentry *realroot = root->d_fsdata;
+	struct path realpath = { .mnt = ssi->mnt, .dentry = realroot };
+	int err;
+
+	err = vfs_statfs(&realpath, buf);
+	if (err)
+		return err;
+
+	buf->f_type = sb->s_magic;
+
+	return 0;
+}
+
+static void shiftfs_put_super(struct super_block *sb)
+{
+	struct shiftfs_super_info *ssi = sb->s_fs_info;
+
+	mntput(ssi->mnt);
+	kfree(ssi);
+}
+
+static const struct super_operations shiftfs_super_ops = {
+	.put_super	= shiftfs_put_super,
+	.show_options	= shiftfs_show_options,
+	.statfs		= shiftfs_statfs,
+};
+
+struct shiftfs_data {
+	void *data;
+	const char *path;
+};
+
+static int shiftfs_fill_super(struct super_block *sb, void *raw_data,
+			      int silent)
+{
+	struct shiftfs_data *data = raw_data;
+	char *name = kstrdup(data->path, GFP_KERNEL);
+	int err = -ENOMEM;
+	struct shiftfs_super_info *ssi = NULL;
+	struct path path;
+
+	if (!name)
+		goto out;
+
+	ssi = kzalloc(sizeof(*ssi), GFP_KERNEL);
+	if (!ssi)
+		goto out;
+
+	err = -EPERM;
+	if (!capable(CAP_SYS_ADMIN))
+		goto out;
+
+	err = shiftfs_parse_options(ssi, data->data);
+	if (err)
+		goto out;
+
+	err = kern_path(name, LOOKUP_FOLLOW, &path);
+	if (err)
+		goto out;
+
+	if (!S_ISDIR(path.dentry->d_inode->i_mode)) {
+		err = -ENOTDIR;
+		goto out_put;
+	}
+	ssi->mnt = path.mnt;
+
+	sb->s_fs_info = ssi;
+	sb->s_magic = SHIFTFS_MAGIC;
+	sb->s_op = &shiftfs_super_ops;
+	sb->s_d_op = &shiftfs_dentry_ops;
+	sb->s_root = d_make_root(shiftfs_new_inode(sb, S_IFDIR, path.dentry));
+	sb->s_root->d_fsdata = path.dentry;
+
+	return 0;
+
+ out_put:
+	path_put(&path);
+ out:
+	kfree(name);
+	if (err)
+		kfree(ssi);
+	return err;
+}
+
+static struct dentry *shiftfs_mount(struct file_system_type *fs_type,
+				    int flags, const char *dev_name, void *data)
+{
+	struct shiftfs_data d = { data, dev_name };
+
+	return mount_nodev(fs_type, flags, &d, shiftfs_fill_super);
+}
+
+static struct file_system_type shiftfs_type = {
+	.owner		= THIS_MODULE,
+	.name		= "shiftfs",
+	.mount		= shiftfs_mount,
+	.kill_sb	= kill_anon_super,
+};
+
+static int __init shiftfs_init(void)
+{
+	return register_filesystem(&shiftfs_type);
+}
+
+static void __exit shiftfs_exit(void)
+{
+	unregister_filesystem(&shiftfs_type);
+}
+
+MODULE_ALIAS_FS("shiftfs");
+MODULE_AUTHOR("James Bottomley");
+MODULE_DESCRIPTION("uid/gid shifting bind filesystem");
+MODULE_LICENSE("GPL v2");
+module_init(shiftfs_init)
+module_exit(shiftfs_exit)
diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index 0de181a..d7992f5 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -79,4 +79,6 @@
 #define NSFS_MAGIC		0x6e736673
 #define BPF_FS_MAGIC		0xcafe4a11
 
+#define SHIFTFS_MAGIC		0x6a656a62
+
 #endif /* __LINUX_MAGIC_H__ */


* Re: [RFC 1/1] shiftfs: uid/gid shifting bind mount
  2016-05-12 19:07 ` [RFC 1/1] shiftfs: uid/gid shifting bind mount James Bottomley
@ 2016-05-16 19:41   ` Serge Hallyn
  2016-05-17  2:28     ` James Bottomley
  0 siblings, 1 reply; 9+ messages in thread
From: Serge Hallyn @ 2016-05-16 19:41 UTC (permalink / raw)
  To: James Bottomley
  Cc: Djalal Harouni, Chris Mason, tytso, Serge Hallyn, Josh Triplett,
	Eric W. Biederman, Andy Lutomirski, Seth Forshee, linux-fsdevel,
	linux-kernel, linux-security-module, Dongsu Park, David Herrmann,
	Miklos Szeredi, Alban Crequy, Al Viro

Hey James,

I probably did something wrong - but I applied your patch onto 4.6,
compiled in shiftfs, did

mount -t shiftfs -o uidmap=0:100000:65536,gidmap=0:100000:65536 /home/ubuntu /mnt

and ls segfaults and gives me kernel syslog msgs like:


[ 1089.744726] ===============================
[ 1089.748851] [ INFO: suspicious RCU usage. ]
[ 1089.752901] 4.6.0-rc5+ #10 Not tainted
[ 1089.756315] -------------------------------
[ 1089.760021] include/linux/rcupdate.h:569 Illegal context switch in RCU read-side critical section!
[ 1089.767348]
               other info that might help us debug this:

[ 1089.773401]
               rcu_scheduler_active = 1, debug_locks = 0
[ 1089.778417] 1 lock held by ls/3053:
[ 1089.781112]  #0:  (rcu_read_lock){......}, at: [<ffffffff81270907>] path_init+0x667/0x770
[ 1089.787492]
               stack backtrace:
[ 1089.790827] CPU: 0 PID: 3053 Comm: ls Not tainted 4.6.0-rc5+ #10
[ 1089.795304] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 1089.801376]  0000000000000286 000000005ed87b3e ffff88007a70bb10 ffffffff8145daa3
[ 1089.807098]  ffff88007a688000 0000000000000001 ffff88007a70bb40 ffffffff810e7587
[ 1089.812793]  0000000000000000 ffffffff81ca8baf 0000000000000184 ffff88007d08f640
[ 1089.818320] Call Trace:
[ 1089.820205]  [<ffffffff8145daa3>] dump_stack+0x85/0xc2
[ 1089.824046]  [<ffffffff810e7587>] lockdep_rcu_suspicious+0xd7/0x110
[ 1089.828871]  [<ffffffff810baf97>] ___might_sleep+0xa7/0x230
[ 1089.833024]  [<ffffffff810bb169>] __might_sleep+0x49/0x80
[ 1089.837118]  [<ffffffff81238109>] kmem_cache_alloc+0x1d9/0x2d0
[ 1089.841725]  [<ffffffff810b667a>] prepare_creds+0x3a/0x130
[ 1089.845827]  [<ffffffff813954a7>] shiftfs_new_creds+0x17/0x120
[ 1089.850170]  [<ffffffff81395cb2>] shiftfs_permission+0x42/0xd0
[ 1089.854507]  [<ffffffff8126d58b>] __inode_permission+0x6b/0xb0
[ 1089.858925]  [<ffffffff8126d5e4>] inode_permission+0x14/0x50
[ 1089.863190]  [<ffffffff812710cd>] link_path_walk+0x7d/0x510
[ 1089.867454]  [<ffffffff812707cb>] ? path_init+0x52b/0x770
[ 1089.871570]  [<ffffffff81270907>] ? path_init+0x667/0x770
[ 1089.875577]  [<ffffffff8127165c>] path_lookupat+0x7c/0x110
[ 1089.879830]  [<ffffffff812732c1>] filename_lookup+0xb1/0x180
[ 1089.883937]  [<ffffffff81272ec6>] ? getname_flags+0x56/0x1f0
[ 1089.888042]  [<ffffffff8110a25d>] ? rcu_read_lock_sched_held+0x6d/0x80
[ 1089.892841]  [<ffffffff81238193>] ? kmem_cache_alloc+0x263/0x2d0
[ 1089.897282]  [<ffffffff81272ee2>] ? getname_flags+0x72/0x1f0
[ 1089.901483]  [<ffffffff81273466>] user_path_at_empty+0x36/0x40
[ 1089.905768]  [<ffffffff81267166>] vfs_fstatat+0x66/0xc0
[ 1089.909596]  [<ffffffff81267761>] SYSC_newlstat+0x31/0x60
[ 1089.913616]  [<ffffffff81202d16>] ? __might_fault+0x96/0xa0
[ 1089.917684]  [<ffffffff81202ccd>] ? __might_fault+0x4d/0xa0
[ 1089.922750]  [<ffffffff810e9879>] ? trace_hardirqs_on_caller+0x129/0x1b0
[ 1089.928605]  [<ffffffff8100301b>] ? trace_hardirqs_on_thunk+0x1b/0x1d
[ 1089.934347]  [<ffffffff8126789e>] SyS_newlstat+0xe/0x10
[ 1089.939193]  [<ffffffff81904000>] entry_SYSCALL_64_fastpath+0x23/0xc1
[ 1089.945045] BUG: sleeping function called from invalid context at mm/slab.h:388
[ 1089.951474] in_atomic(): 1, irqs_disabled(): 0, pid: 3053, name: ls
[ 1089.957214] INFO: lockdep is turned off.
[ 1089.961166] CPU: 0 PID: 3053 Comm: ls Not tainted 4.6.0-rc5+ #10
[ 1089.966739] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 1089.973975]  0000000000000286 000000005ed87b3e ffff88007a70bb40 ffffffff8145daa3
[ 1089.980644]  ffff88007a688000 ffffffff81ca8baf ffff88007a70bb68 ffffffff810bb069
[ 1089.987297]  ffffffff81ca8baf 0000000000000184 0000000000000000 ffff88007a70bb90
[ 1089.994180] Call Trace:
[ 1089.997097]  [<ffffffff8145daa3>] dump_stack+0x85/0xc2
[ 1090.002051]  [<ffffffff810bb069>] ___might_sleep+0x179/0x230
[ 1090.007255]  [<ffffffff810bb169>] __might_sleep+0x49/0x80
[ 1090.012290]  [<ffffffff81238109>] kmem_cache_alloc+0x1d9/0x2d0
[ 1090.017679]  [<ffffffff810b667a>] prepare_creds+0x3a/0x130
[ 1090.022736]  [<ffffffff813954a7>] shiftfs_new_creds+0x17/0x120
[ 1090.028090]  [<ffffffff81395cb2>] shiftfs_permission+0x42/0xd0
[ 1090.033454]  [<ffffffff8126d58b>] __inode_permission+0x6b/0xb0
[ 1090.039006]  [<ffffffff8126d5e4>] inode_permission+0x14/0x50
[ 1090.044304]  [<ffffffff812710cd>] link_path_walk+0x7d/0x510
[ 1090.049593]  [<ffffffff812707cb>] ? path_init+0x52b/0x770
[ 1090.054795]  [<ffffffff81270907>] ? path_init+0x667/0x770
[ 1090.059950]  [<ffffffff8127165c>] path_lookupat+0x7c/0x110
[ 1090.065218]  [<ffffffff812732c1>] filename_lookup+0xb1/0x180
[ 1090.070629]  [<ffffffff81272ec6>] ? getname_flags+0x56/0x1f0
[ 1090.076265]  [<ffffffff8110a25d>] ? rcu_read_lock_sched_held+0x6d/0x80
[ 1090.082559]  [<ffffffff81238193>] ? kmem_cache_alloc+0x263/0x2d0
[ 1090.088153]  [<ffffffff81272ee2>] ? getname_flags+0x72/0x1f0
[ 1090.093478]  [<ffffffff81273466>] user_path_at_empty+0x36/0x40
[ 1090.099164]  [<ffffffff81267166>] vfs_fstatat+0x66/0xc0
[ 1090.104236]  [<ffffffff81267761>] SYSC_newlstat+0x31/0x60
[ 1090.109449]  [<ffffffff81202d16>] ? __might_fault+0x96/0xa0
[ 1090.115506]  [<ffffffff81202ccd>] ? __might_fault+0x4d/0xa0
[ 1090.120418]  [<ffffffff810e9879>] ? trace_hardirqs_on_caller+0x129/0x1b0
[ 1090.126325]  [<ffffffff8100301b>] ? trace_hardirqs_on_thunk+0x1b/0x1d
[ 1090.133230]  [<ffffffff8126789e>] SyS_newlstat+0xe/0x10
[ 1090.138320]  [<ffffffff81904000>] entry_SYSCALL_64_fastpath+0x23/0xc1
[ 1090.146513] ------------[ cut here ]------------
[ 1090.151061] kernel BUG at include/linux/fs.h:2574!
[ 1090.155883] invalid opcode: 0000 [#1] SMP
[ 1090.160131] Modules linked in: binfmt_misc veth ip6t_MASQUERADE nf_nat_masquerade_ipv6 ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6_tables xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp bridge stp llc iptable_filter ip_tables x_tables ppdev kvm_intel kvm irqbypass joydev input_leds serio_raw nls_utf8 isofs i2c_piix4 mac_hid parport_pc parport 8250_fintek pvpanic ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt psmouse
[ 1090.223228]  fb_sys_fops drm pata_acpi floppy
[ 1090.226948] CPU: 0 PID: 3053 Comm: ls Not tainted 4.6.0-rc5+ #10
[ 1090.232806] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 1090.240377] task: ffff88007a688000 ti: ffff88007a708000 task.ti: ffff88007a708000
[ 1090.247359] RIP: 0010:[<ffffffff81263ef5>]  [<ffffffff81263ef5>] __fput+0x235/0x240
[ 1090.254759] RSP: 0018:ffff88007a70be70  EFLAGS: 00010246
[ 1090.260430] RAX: 0000000000000000 RBX: ffff880035739a00 RCX: 000000000007937c
[ 1090.267476] RDX: 0000000000000001 RSI: ffff88007fddada0 RDI: 0000000000000000
[ 1090.274538] RBP: ffff88007a70bea8 R08: 0000000000000000 R09: ffff8800367ff270
[ 1090.281637] R10: ffff880079d66c10 R11: ffff880035739a10 R12: 0000000040000010
[ 1090.288731] R13: ffff880079d66c10 R14: ffff88007a1b63a0 R15: ffff880050e6b000
[ 1090.295648] FS:  00007fec3f20c800(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 1090.303194] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1090.308945] CR2: 00007f7fe394c000 CR3: 000000007a72e000 CR4: 00000000000006f0
[ 1090.315954] Stack:
[ 1090.318947]  ffff880079d66c10 ffff880035739a10 ffffffff822ebab0 ffff88007a688710
[ 1090.326268]  ffff88007a688000 0000000000000000 ffff88007a688000 ffff88007a70beb8
[ 1090.333392]  ffffffff81263f3e ffff88007a70bee8 ffffffff810b2153 0000000000000002
[ 1090.340618] Call Trace:
[ 1090.343863]  [<ffffffff81263f3e>] ____fput+0xe/0x10
[ 1090.349178]  [<ffffffff810b2153>] task_work_run+0x73/0xa0
[ 1090.354941]  [<ffffffff810032bc>] exit_to_usermode_loop+0xcc/0xd0
[ 1090.361297]  [<ffffffff81003f0c>] syscall_return_slowpath+0xcc/0xe0
[ 1090.367735]  [<ffffffff8190409c>] entry_SYSCALL_64_fastpath+0xbf/0xc1
[ 1090.374412] Code: 00 e9 be fe ff ff 48 8b 43 28 48 8b 80 80 00 00 00 48 85 c0 0f 84 bf fe ff ff 31 d2 48 89 de bf ff ff ff ff ff d0 e9 ae fe ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 31 ff 48 87 3d
[ 1090.394163] RIP  [<ffffffff81263ef5>] __fput+0x235/0x240
[ 1090.399624]  RSP <ffff88007a70be70>
[ 1090.406515] ---[ end trace 909301922855c45e ]---
[ 1121.390946] audit: type=1400 audit(1463427449.647:19): apparmor="STATUS" operation="profile_load" name="lxd-x1_</var/lib/lxd>" pid=3076 comm="apparmor_parser"
[ 1121.427553] lxdbr0: port 1(vethBUS8OC) entered blocking state
[ 1121.432842] lxdbr0: port 1(vethBUS8OC) entered disabled state
[ 1121.439138] device vethBUS8OC entered promiscuous mode
[ 1121.449963] IPv6: ADDRCONF(NETDEV_UP): vethBUS8OC: link is not ready
[ 1121.494963] eth0: renamed from vethVNDWLE
[ 1121.502817] IPv6: ADDRCONF(NETDEV_CHANGE): vethBUS8OC: link becomes ready
[ 1121.512573] lxdbr0: port 1(vethBUS8OC) entered blocking state
[ 1121.518224] lxdbr0: port 1(vethBUS8OC) entered forwarding state
[ 1125.274210] BUG: sleeping function called from invalid context at mm/slab.h:388
[ 1125.280904] in_atomic(): 1, irqs_disabled(): 0, pid: 3760, name: ls
[ 1125.286508] INFO: lockdep is turned off.
[ 1125.290856] CPU: 0 PID: 3760 Comm: ls Tainted: G      D         4.6.0-rc5+ #10
[ 1125.298026] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 1125.305921]  0000000000000286 00000000323611df ffff88003099bb20 ffffffff8145daa3
[ 1125.313356]  ffff88002f1fe500 ffffffff81ca8baf ffff88003099bb48 ffffffff810bb069
[ 1125.320806]  ffffffff81ca8baf 0000000000000184 0000000000000000 ffff88003099bb70
[ 1125.328228] Call Trace:
[ 1125.331545]  [<ffffffff8145daa3>] dump_stack+0x85/0xc2
[ 1125.336984]  [<ffffffff810bb069>] ___might_sleep+0x179/0x230
[ 1125.342816]  [<ffffffff810bb169>] __might_sleep+0x49/0x80
[ 1125.348595]  [<ffffffff81238109>] kmem_cache_alloc+0x1d9/0x2d0
[ 1125.354678]  [<ffffffff810b667a>] prepare_creds+0x3a/0x130
[ 1125.360259]  [<ffffffff813954a7>] shiftfs_new_creds+0x17/0x120
[ 1125.366258]  [<ffffffff81395cb2>] shiftfs_permission+0x42/0xd0
[ 1125.372281]  [<ffffffff8126d58b>] __inode_permission+0x6b/0xb0
[ 1125.378283]  [<ffffffff8126d5e4>] inode_permission+0x14/0x50
[ 1125.384105]  [<ffffffff812710cd>] link_path_walk+0x7d/0x510
[ 1125.389733]  [<ffffffff812707cb>] ? path_init+0x52b/0x770
[ 1125.395147]  [<ffffffff81270907>] ? path_init+0x667/0x770
[ 1125.400481]  [<ffffffff8127165c>] path_lookupat+0x7c/0x110
[ 1125.405974]  [<ffffffff812732c1>] filename_lookup+0xb1/0x180
[ 1125.411831]  [<ffffffff81238126>] ? kmem_cache_alloc+0x1f6/0x2d0
[ 1125.417833]  [<ffffffff81273466>] user_path_at_empty+0x36/0x40
[ 1125.423601]  [<ffffffff81267166>] vfs_fstatat+0x66/0xc0
[ 1125.428933]  [<ffffffff81267761>] SYSC_newlstat+0x31/0x60
[ 1125.434390]  [<ffffffff81003a68>] ? syscall_trace_enter_phase1+0xc8/0x140
[ 1125.441067]  [<ffffffff8126789e>] SyS_newlstat+0xe/0x10
[ 1125.446541]  [<ffffffff81003f89>] do_syscall_64+0x69/0x160
[ 1125.452315]  [<ffffffff819040c3>] entry_SYSCALL64_slow_path+0x25/0x25
[ 1125.791437] ------------[ cut here ]------------
[ 1125.795754] kernel BUG at include/linux/fs.h:2574!
[ 1125.800529] invalid opcode: 0000 [#2] SMP
[ 1125.804923] Modules linked in: binfmt_misc veth ip6t_MASQUERADE nf_nat_masquerade_ipv6 ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6_tables xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp bridge stp llc iptable_filter ip_tables x_tables ppdev kvm_intel kvm irqbypass joydev input_leds serio_raw nls_utf8 isofs i2c_piix4 mac_hid parport_pc parport 8250_fintek pvpanic ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt psmouse
[ 1125.871862]  fb_sys_fops drm pata_acpi floppy
[ 1125.875745] CPU: 0 PID: 3760 Comm: ls Tainted: G      D         4.6.0-rc5+ #10
[ 1125.882927] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 1125.890945] task: ffff88002f1fe500 ti: ffff880030998000 task.ti: ffff880030998000
[ 1125.898617] RIP: 0010:[<ffffffff81263ef5>]  [<ffffffff81263ef5>] __fput+0x235/0x240
[ 1125.906342] RSP: 0018:ffff88003099be70  EFLAGS: 00010246
[ 1125.912078] RAX: 0000000000000000 RBX: ffff880030846600 RCX: 0000000000085f05
[ 1125.919331] RDX: 0000000000000001 RSI: ffff88007fddada0 RDI: 0000000000000000
[ 1125.926545] RBP: ffff88003099bea8 R08: 0000000000000000 R09: ffff8800770bc2a8
[ 1125.933706] R10: 000000000010000f R11: ffff880030846601 R12: 0000000040000010
[ 1125.940782] R13: ffff880079d66c10 R14: ffff88007990cc60 R15: ffff880050e6b000
[ 1125.947844] FS:  00007f8297abc800(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 1125.955772] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1125.961908] CR2: 000055918a8d9018 CR3: 00000000309a4000 CR4: 00000000000006f0
[ 1125.969232] Stack:
[ 1125.972341]  ffff880079d66c10 ffff880030846610 ffffffff822ebab0 ffff88002f1fec10
[ 1125.979890]  ffff88002f1fe500 0000000000000000 ffff88002f1fe500 ffff88003099beb8
[ 1125.987279]  ffffffff81263f3e ffff88003099bee8 ffffffff810b2153 0000000000000102
[ 1125.994850] Call Trace:
[ 1125.998345]  [<ffffffff81263f3e>] ____fput+0xe/0x10
[ 1126.003695]  [<ffffffff810b2153>] task_work_run+0x73/0xa0
[ 1126.009377]  [<ffffffff810032bc>] exit_to_usermode_loop+0xcc/0xd0
[ 1126.015880]  [<ffffffff81004000>] do_syscall_64+0xe0/0x160
[ 1126.021848]  [<ffffffff819040c3>] entry_SYSCALL64_slow_path+0x25/0x25
[ 1126.028612] Code: 00 e9 be fe ff ff 48 8b 43 28 48 8b 80 80 00 00 00 48 85 c0 0f 84 bf fe ff ff 31 d2 48 89 de bf ff ff ff ff ff d0 e9 ae fe ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 31 ff 48 87 3d
[ 1126.049139] RIP  [<ffffffff81263ef5>] __fput+0x235/0x240
[ 1126.055150]  RSP <ffff88003099be70>
[ 1126.059746] ---[ end trace 909301922855c45f ]---
root@shiftfs:~#

* Re: [RFC 1/1] shiftfs: uid/gid shifting bind mount
  2016-05-16 19:41   ` Serge Hallyn
@ 2016-05-17  2:28     ` James Bottomley
  2016-05-17  3:47       ` Serge E. Hallyn
  0 siblings, 1 reply; 9+ messages in thread
From: James Bottomley @ 2016-05-17  2:28 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Djalal Harouni, Chris Mason, tytso, Serge Hallyn, Josh Triplett,
	Eric W. Biederman, Andy Lutomirski, Seth Forshee, linux-fsdevel,
	linux-kernel, linux-security-module, Dongsu Park, David Herrmann,
	Miklos Szeredi, Alban Crequy, Al Viro

On Mon, 2016-05-16 at 19:41 +0000, Serge Hallyn wrote:
> Hey James,
> 
> I probably did something wrong - but i applied your patch onto 4.6,
> compiled in shiftfs, did
> 
> mount -t shiftfs -o uidmap=0:100000:65536,gidmap=0:100000:65536
> /home/ubuntu /mnt
> 
> and ls segfaults and gives me kernel syslog msgs like:

Hm, it looks to be something IMA related: the SUSE default is no IMA,
and this BUG in the filesystem is to do with the IMA version of
i_readcount_dec.  I'll recompile my kernel to see if I can reproduce.
Just in case, what's the underlying filesystem on /home/ubuntu?

Thanks,

James
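
As a rough illustration of the kind of failure suspected above, here is
a userspace C model of a read counter that is incremented on one inode
and decremented on another.  It assumes, as a simplification, that the
IMA-enabled i_readcount_dec refuses to go below zero (the kernel uses a
BUG for this); the struct and every model_* name are hypothetical and
are not taken from the shiftfs patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-inode read counter. */
struct model_inode {
	int readcount;
};

static void model_readcount_inc(struct model_inode *inode)
{
	assert(inode->readcount >= 0);
	inode->readcount++;
}

/* Returns false where the real kernel helper would hit its BUG. */
static bool model_readcount_dec(struct model_inode *inode)
{
	if (inode->readcount == 0)
		return false;
	inode->readcount--;
	return true;
}

/* Open/close pair where the increment and decrement hit different inodes. */
static bool model_unbalanced_open_close(struct model_inode *upper,
					struct model_inode *lower)
{
	model_readcount_inc(upper);
	return model_readcount_dec(lower);
}

/* Open/close pair where both operations hit the same inode. */
static bool model_balanced_open_close(struct model_inode *lower)
{
	model_readcount_inc(lower);
	return model_readcount_dec(lower);
}
```

In this model the unbalanced pair leaves the upper counter stuck at one
and trips the underflow check on the lower inode, which is the shape of
problem a stacking filesystem can hit when it swaps file->f_inode
between ->open and ->release.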

* Re: [RFC 1/1] shiftfs: uid/gid shifting bind mount
  2016-05-17  2:28     ` James Bottomley
@ 2016-05-17  3:47       ` Serge E. Hallyn
  2016-05-17 10:23         ` James Bottomley
  0 siblings, 1 reply; 9+ messages in thread
From: Serge E. Hallyn @ 2016-05-17  3:47 UTC (permalink / raw)
  To: James Bottomley
  Cc: Serge Hallyn, Djalal Harouni, Chris Mason, tytso, Serge Hallyn,
	Josh Triplett, Eric W. Biederman, Andy Lutomirski, Seth Forshee,
	linux-fsdevel, linux-kernel, linux-security-module, Dongsu Park,
	David Herrmann, Miklos Szeredi, Alban Crequy, Al Viro

On Mon, May 16, 2016 at 10:28:32PM -0400, James Bottomley wrote:
> On Mon, 2016-05-16 at 19:41 +0000, Serge Hallyn wrote:
> > Hey James,
> > 
> > I probably did something wrong - but i applied your patch onto 4.6,
> > compiled in shiftfs, did
> > 
> > mount -t shiftfs -o uidmap=0:100000:65536,gidmap=0:100000:65536
> > /home/ubuntu /mnt
> > 
> > and ls segfaults and gives me kernel syslog msgs like:
> 
> Hm, it looks to be something IMA related, since the SUSE default is no
> IMA and this BUG in the filesystem is to do with the IMA version of
> i_readcount_dec.  I'll recompile my kernel to see if I can reproduce. 
>  Just in case, what's the underlying filesystem on /home/ubuntu?

It was ext4

* Re: [RFC 1/1] shiftfs: uid/gid shifting bind mount
  2016-05-17  3:47       ` Serge E. Hallyn
@ 2016-05-17 10:23         ` James Bottomley
  2016-05-17 20:59           ` James Bottomley
  0 siblings, 1 reply; 9+ messages in thread
From: James Bottomley @ 2016-05-17 10:23 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Serge Hallyn, Djalal Harouni, Chris Mason, tytso, Serge Hallyn,
	Josh Triplett, Eric W. Biederman, Andy Lutomirski, Seth Forshee,
	linux-fsdevel, linux-kernel, linux-security-module, Dongsu Park,
	David Herrmann, Miklos Szeredi, Alban Crequy, Al Viro

On Mon, 2016-05-16 at 22:47 -0500, Serge E. Hallyn wrote:
> On Mon, May 16, 2016 at 10:28:32PM -0400, James Bottomley wrote:
> > On Mon, 2016-05-16 at 19:41 +0000, Serge Hallyn wrote:
> > > Hey James,
> > > 
> > > I probably did something wrong - but i applied your patch onto
> > > 4.6,
> > > compiled in shiftfs, did
> > > 
> > > mount -t shiftfs -o uidmap=0:100000:65536,gidmap=0:100000:65536
> > > /home/ubuntu /mnt
> > > 
> > > and ls segfaults and gives me kernel syslog msgs like:
> > 
> > Hm, it looks to be something IMA related, since the SUSE default is
> > no
> > IMA and this BUG in the filesystem is to do with the IMA version of
> > i_readcount_dec.  I'll recompile my kernel to see if I can
> > reproduce. 
> >  Just in case, what's the underlying filesystem on /home/ubuntu?
> 
> It was ext4

Thanks.  I've got it to reproduce with CONFIG_IMA set ... just
debugging now.

James

* Re: [RFC 1/1] shiftfs: uid/gid shifting bind mount
  2016-05-17 10:23         ` James Bottomley
@ 2016-05-17 20:59           ` James Bottomley
  2016-05-19  2:28             ` Serge E. Hallyn
  0 siblings, 1 reply; 9+ messages in thread
From: James Bottomley @ 2016-05-17 20:59 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Serge Hallyn, Djalal Harouni, Chris Mason, tytso, Serge Hallyn,
	Josh Triplett, Eric W. Biederman, Andy Lutomirski, Seth Forshee,
	linux-fsdevel, linux-kernel, linux-security-module, Dongsu Park,
	David Herrmann, Miklos Szeredi, Alban Crequy, Al Viro

On Tue, 2016-05-17 at 06:23 -0400, James Bottomley wrote:
> On Mon, 2016-05-16 at 22:47 -0500, Serge E. Hallyn wrote:
> > On Mon, May 16, 2016 at 10:28:32PM -0400, James Bottomley wrote:
> > > On Mon, 2016-05-16 at 19:41 +0000, Serge Hallyn wrote:
> > > > Hey James,
> > > > 
> > > > I probably did something wrong - but i applied your patch onto
> > > > 4.6,
> > > > compiled in shiftfs, did
> > > > 
> > > > mount -t shiftfs -o uidmap=0:100000:65536,gidmap=0:100000:65536
> > > > /home/ubuntu /mnt
> > > > 
> > > > and ls segfaults and gives me kernel syslog msgs like:
> > > 
> > > Hm, it looks to be something IMA related, since the SUSE default
> > > is
> > > no
> > > IMA and this BUG in the filesystem is to do with the IMA version
> > > of
> > > i_readcount_dec.  I'll recompile my kernel to see if I can
> > > reproduce. 
> > >  Just in case, what's the underlying filesystem on /home/ubuntu?
> > 
> > It was ext4
> 
> Thanks.  I've got it to reproduce with CONFIG_IMA set ... just
> debugging now.

OK, I think this is the fix; can you apply it on top of what you have?
It's two fixes: one for the RCU lookup and the other for the IMA
problem.

This probably has to be fixed in the VFS, but at least it will prove
I've got the correct problem and diagnosis.

Thanks,

James

---

diff --git a/fs/shiftfs.c b/fs/shiftfs.c
index d352377..2699b95 100644
--- a/fs/shiftfs.c
+++ b/fs/shiftfs.c
@@ -525,6 +525,9 @@ static int shiftfs_permission(struct inode *inode, int mask)
 	int err;
 	const struct cred *oldcred, *newcred;
 
+	if (mask & MAY_NOT_BLOCK)
+		return -ECHILD;
+
 	oldcred = shiftfs_new_creds(&newcred, inode->i_sb);
 	if (iop->permission)
 		err = iop->permission(reali, mask);
@@ -598,6 +601,15 @@ static int shiftfs_release(struct inode *inode, struct file *file)
 	if (sfc->release)
 		err = sfc->release(inode, file);
 
+#ifdef CONFIG_IMA
+	/* FIXME: IMA calls aren't balanced across ->open ->release
+	 * they occur after ->open and after ->release, so manually
+	 * swizzle here */
+
+	if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
+		i_readcount_dec(sfc->inode);
+#endif
+
 	file->f_inode = sfc->inode;
 	file->f_op = sfc->inode->i_fop;
 	fops_put(inode->i_fop);
@@ -631,6 +643,16 @@ static int shiftfs_open(struct inode *inode, struct file *file)
 	file->f_op = &sfc->fop;
 	file->f_inode = reali;
 
+#ifdef CONFIG_IMA
+	/* FIXME: IMA calls always operate on a saved copy of the
+	 * inode so they increment the above and decrement the
+	 * underlying. fix that here */
+
+	if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
+		i_readcount_inc(reali);
+#endif
+
+
 	if (fop->open)
 		err = fop->open(reali, file);
 

* Re: [RFC 1/1] shiftfs: uid/gid shifting bind mount
  2016-05-17 20:59           ` James Bottomley
@ 2016-05-19  2:28             ` Serge E. Hallyn
  2016-05-19 10:53               ` James Bottomley
  0 siblings, 1 reply; 9+ messages in thread
From: Serge E. Hallyn @ 2016-05-19  2:28 UTC (permalink / raw)
  To: James Bottomley
  Cc: Serge E. Hallyn, Serge Hallyn, Djalal Harouni, Chris Mason,
	tytso, Serge Hallyn, Josh Triplett, Eric W. Biederman,
	Andy Lutomirski, Seth Forshee, linux-fsdevel, linux-kernel,
	linux-security-module, Dongsu Park, David Herrmann,
	Miklos Szeredi, Alban Crequy, Al Viro

Hey James,

Yeah, that's a lot better.  I do still get some syslog messages,
but I was trivially able to bind a shiftfs into a container and
use it the way I'd want.

[  209.452274] ------------[ cut here ]------------
[  209.452296] WARNING: CPU: 0 PID: 3072 at fs/ext4/inode.c:3977 ext4_truncate+0x3f5/0x5b0
[  209.452299] Modules linked in: binfmt_misc veth ip6t_MASQUERADE nf_nat_masquerade_ipv6 ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6_tables xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp bridge stp llc iptable_filter ip_tables x_tables ppdev kvm_intel kvm irqbypass nls_utf8 isofs joydev input_leds serio_raw i2c_piix4 pvpanic parport_pc 8250_fintek mac_hid parport ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
[  209.452388]  psmouse drm pata_acpi floppy
[  209.452401] CPU: 0 PID: 3072 Comm: bash Not tainted 4.6.0-rc5+ #11
[  209.452404] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[  209.452407]  0000000000000286 00000000ccc8425d ffff88007a1cfa98 ffffffff8145dae3
[  209.452412]  0000000000000000 0000000000000000 ffff88007a1cfad8 ffffffff8108c25b
[  209.452416]  00000f897a1cfaf8 ffff880052efe340 ffff88007a1cfbb8 ffff880052efe560
[  209.452421] Call Trace:
[  209.452431]  [<ffffffff8145dae3>] dump_stack+0x85/0xc2
[  209.452437]  [<ffffffff8108c25b>] __warn+0xcb/0xf0
[  209.452440]  [<ffffffff8108c38d>] warn_slowpath_null+0x1d/0x20
[  209.452444]  [<ffffffff81306d45>] ext4_truncate+0x3f5/0x5b0
[  209.452447]  [<ffffffff81309447>] ext4_setattr+0x627/0xa40
[  209.452457]  [<ffffffff813b6483>] ? security_prepare_creds+0x43/0x60
[  209.452468]  [<ffffffff810b63d2>] ? creds_are_invalid.part.1+0x12/0x40
[  209.452478]  [<ffffffff81396491>] shiftfs_setattr+0x181/0x202
[  209.452492]  [<ffffffff812831f5>] notify_change+0x235/0x360
[  209.452500]  [<ffffffff8125f057>] do_truncate+0x77/0xc0
[  209.452505]  [<ffffffff81271959>] path_openat+0x269/0x1350
[  209.452509]  [<ffffffff81273f01>] do_filp_open+0x91/0x100
[  209.452517]  [<ffffffff819036d7>] ? _raw_spin_unlock+0x27/0x40
[  209.452522]  [<ffffffff81284799>] ? __alloc_fd+0xf9/0x210
[  209.452526]  [<ffffffff81260654>] do_sys_open+0x124/0x210
[  209.452529]  [<ffffffff8126075e>] SyS_open+0x1e/0x20
[  209.452534]  [<ffffffff81003f89>] do_syscall_64+0x69/0x160
[  209.452537]  [<ffffffff81904103>] entry_SYSCALL64_slow_path+0x25/0x25
[  209.452541] ---[ end trace b995e24e590f8b85 ]---
[  209.452790] ------------[ cut here ]------------
[  209.452800] WARNING: CPU: 0 PID: 3072 at fs/ext4/namei.c:2778 ext4_orphan_add+0x11a/0x290
[  209.452803] Modules linked in: binfmt_misc veth ip6t_MASQUERADE nf_nat_masquerade_ipv6 ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6_tables xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp bridge stp llc iptable_filter ip_tables x_tables ppdev kvm_intel kvm irqbypass nls_utf8 isofs joydev input_leds serio_raw i2c_piix4 pvpanic parport_pc 8250_fintek mac_hid parport ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
[  209.452896]  psmouse drm pata_acpi floppy
[  209.452903] CPU: 0 PID: 3072 Comm: bash Tainted: G        W       4.6.0-rc5+ #11
[  209.452905] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[  209.452907]  0000000000000286 00000000ccc8425d ffff88007a1cfa30 ffffffff8145dae3
[  209.452912]  0000000000000000 0000000000000000 ffff88007a1cfa70 ffffffff8108c25b
[  209.452917]  00000ada00000008 ffff880052efe340 ffff88007c3ba0c0 ffff880036806000
[  209.452921] Call Trace:
[  209.452925]  [<ffffffff8145dae3>] dump_stack+0x85/0xc2
[  209.452929]  [<ffffffff8108c25b>] __warn+0xcb/0xf0
[  209.452933]  [<ffffffff8108c38d>] warn_slowpath_null+0x1d/0x20
[  209.452936]  [<ffffffff813126ca>] ext4_orphan_add+0x11a/0x290
[  209.452940]  [<ffffffff81306a9e>] ? ext4_truncate+0x14e/0x5b0
[  209.452948]  [<ffffffff81338b98>] ? __ext4_journal_start_sb+0x88/0x1f0
[  209.452953]  [<ffffffff81306ad1>] ext4_truncate+0x181/0x5b0
[  209.452956]  [<ffffffff81309447>] ext4_setattr+0x627/0xa40
[  209.452960]  [<ffffffff813b6483>] ? security_prepare_creds+0x43/0x60
[  209.452964]  [<ffffffff810b63d2>] ? creds_are_invalid.part.1+0x12/0x40
[  209.452967]  [<ffffffff81396491>] shiftfs_setattr+0x181/0x202
[  209.452971]  [<ffffffff812831f5>] notify_change+0x235/0x360
[  209.452975]  [<ffffffff8125f057>] do_truncate+0x77/0xc0
[  209.452978]  [<ffffffff81271959>] path_openat+0x269/0x1350
[  209.452982]  [<ffffffff81273f01>] do_filp_open+0x91/0x100
[  209.452986]  [<ffffffff819036d7>] ? _raw_spin_unlock+0x27/0x40
[  209.452989]  [<ffffffff81284799>] ? __alloc_fd+0xf9/0x210
[  209.452993]  [<ffffffff81260654>] do_sys_open+0x124/0x210
[  209.452997]  [<ffffffff8126075e>] SyS_open+0x1e/0x20
[  209.453001]  [<ffffffff81003f89>] do_syscall_64+0x69/0x160
[  209.453004]  [<ffffffff81904103>] entry_SYSCALL64_slow_path+0x25/0x25
[  209.453007] ---[ end trace b995e24e590f8b86 ]---
[  209.453541] ------------[ cut here ]------------
[  209.453548] WARNING: CPU: 0 PID: 3072 at fs/ext4/namei.c:2860 ext4_orphan_del+0x18c/0x2a0
[  209.453550] Modules linked in: binfmt_misc veth ip6t_MASQUERADE nf_nat_masquerade_ipv6 ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6_tables xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp bridge stp llc iptable_filter ip_tables x_tables ppdev kvm_intel kvm irqbypass nls_utf8 isofs joydev input_leds serio_raw i2c_piix4 pvpanic parport_pc 8250_fintek mac_hid parport ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
[  209.453625]  psmouse drm pata_acpi floppy
[  209.453632] CPU: 0 PID: 3072 Comm: bash Tainted: G        W       4.6.0-rc5+ #11
[  209.453635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[  209.453637]  0000000000000286 00000000ccc8425d ffff88007a1cfa18 ffffffff8145dae3
[  209.453641]  0000000000000000 0000000000000000 ffff88007a1cfa58 ffffffff8108c25b
[  209.453646]  00000b2c8103fca9 ffff880052efe340 ffff88007c3ba0c0 ffff88007c3ba0c0
[  209.453650] Call Trace:
[  209.453655]  [<ffffffff8145dae3>] dump_stack+0x85/0xc2
[  209.453658]  [<ffffffff8108c25b>] __warn+0xcb/0xf0
[  209.453662]  [<ffffffff8108c38d>] warn_slowpath_null+0x1d/0x20
[  209.453665]  [<ffffffff81313d0c>] ext4_orphan_del+0x18c/0x2a0
[  209.453668]  [<ffffffff81903cf7>] ? _raw_write_unlock+0x27/0x40
[  209.453673]  [<ffffffff81306d72>] ext4_truncate+0x422/0x5b0
[  209.453692]  [<ffffffff81309447>] ext4_setattr+0x627/0xa40
[  209.453697]  [<ffffffff813b6483>] ? security_prepare_creds+0x43/0x60
[  209.453701]  [<ffffffff810b63d2>] ? creds_are_invalid.part.1+0x12/0x40
[  209.453705]  [<ffffffff81396491>] shiftfs_setattr+0x181/0x202
[  209.453709]  [<ffffffff812831f5>] notify_change+0x235/0x360
[  209.453712]  [<ffffffff8125f057>] do_truncate+0x77/0xc0
[  209.453716]  [<ffffffff81271959>] path_openat+0x269/0x1350
[  209.453720]  [<ffffffff81273f01>] do_filp_open+0x91/0x100
[  209.453724]  [<ffffffff819036d7>] ? _raw_spin_unlock+0x27/0x40
[  209.453727]  [<ffffffff81284799>] ? __alloc_fd+0xf9/0x210
[  209.453731]  [<ffffffff81260654>] do_sys_open+0x124/0x210
[  209.453734]  [<ffffffff8126075e>] SyS_open+0x1e/0x20
[  209.453738]  [<ffffffff81003f89>] do_syscall_64+0x69/0x160
[  209.453741]  [<ffffffff81904103>] entry_SYSCALL64_slow_path+0x25/0x25
[  209.453745] ---[ end trace b995e24e590f8b87 ]---

* Re: [RFC 1/1] shiftfs: uid/gid shifting bind mount
  2016-05-19  2:28             ` Serge E. Hallyn
@ 2016-05-19 10:53               ` James Bottomley
  0 siblings, 0 replies; 9+ messages in thread
From: James Bottomley @ 2016-05-19 10:53 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Serge Hallyn, Djalal Harouni, Chris Mason, tytso, Serge Hallyn,
	Josh Triplett, Eric W. Biederman, Andy Lutomirski, Seth Forshee,
	linux-fsdevel, linux-kernel, linux-security-module, Dongsu Park,
	David Herrmann, Miklos Szeredi, Alban Crequy, Al Viro

On Wed, 2016-05-18 at 21:28 -0500, Serge E. Hallyn wrote:
> Hey James,
> 
> yeah that's a lot better.  I do still get some syslog messages,
> but i was trivially able to bind a shiftfs into a container and
> use it the way I'd want.
> 
> [  209.452274] ------------[ cut here ]------------
> [  209.452296] WARNING: CPU: 0 PID: 3072 at fs/ext4/inode.c:3977 
> ext4_truncate+0x3f5/0x5b0

Heh, I really need to test with ext4; it seems much more careful.  XFS
doesn't warn on any of this.  These are both inode locking problems
with setattr.  It also looks like I'd have the same problem with
setxattr and removexattr.  Does this additional patch allow you to
operate without any warnings?

There's also something else you'll be running into soon: the xattr
calls aren't uid shifted.  I was a bit worried about how to do this
without leaking root attribute setting capability, but I'll think a bit
more carefully about how to do it.

Thanks,

James

---

diff --git a/fs/shiftfs.c b/fs/shiftfs.c
index d352377..29f343f 100644
--- a/fs/shiftfs.c
+++ b/fs/shiftfs.c
@@ -240,14 +240,17 @@ static int shiftfs_setxattr(struct dentry *dentry, const char *name,
 			    const void *value, size_t size, int flags)
 {
 	struct dentry *real = dentry->d_fsdata;
-	const struct inode_operations *iop = real->d_inode->i_op;
+	struct inode *reali = real->d_inode;
+	const struct inode_operations *iop = reali->i_op;
 	int err = -EOPNOTSUPP;
 
 	if (iop->setxattr) {
 		const struct cred *oldcred, *newcred;
 
 		oldcred = shiftfs_new_creds(&newcred, dentry->d_sb);
+		inode_lock(reali);
 		err = iop->setxattr(real, name, value, size, flags);
+		inode_unlock(reali);
 		shiftfs_old_creds(oldcred, &newcred);
 	}
 
@@ -287,12 +290,17 @@ static ssize_t shiftfs_listxattr(struct dentry *dentry, char *list,
 static int shiftfs_removexattr(struct dentry *dentry, const char *name)
 {
 	struct dentry *real = dentry->d_fsdata;
-	const struct inode_operations *iop = real->d_inode->i_op;
+	struct inode *reali = real->d_inode;
+	const struct inode_operations *iop = reali->i_op;
+	int err = -EINVAL;
 
-	if (iop->removexattr)
-		return iop->removexattr(real, name);
+	if (iop->removexattr) {
+		inode_lock(reali);
+		err = iop->removexattr(real, name);
+		inode_unlock(reali);
+	}
 
-	return -EINVAL;
+	return err;
 }
 
 static void shiftfs_fill_inode(struct inode *inode, struct dentry *dentry)
@@ -548,11 +556,13 @@ static int shiftfs_setattr(struct dentry *dentry, struct iattr *attr)
 	newattr.ia_uid = KUIDT_INIT(map_id_up(&ssi->uid_map, __kuid_val(attr->ia_uid)));
 	newattr.ia_gid = KGIDT_INIT(map_id_up(&ssi->gid_map, __kgid_val(attr->ia_gid)));
 
+	inode_lock(reali);
 	oldcred = shiftfs_new_creds(&newcred, dentry->d_sb);
 	if (iop->setattr)
 		err = iop->setattr(real, &newattr);
 	else
 		err = simple_setattr(real, &newattr);
+	inode_unlock(reali);
 	shiftfs_old_creds(oldcred, &newcred);
 
 	return err;
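
The locking rule this patch applies can be modelled in userspace: the
lower filesystem's setattr expects its caller to hold the inode lock and
warns otherwise, so a stacking layer that forwards the call has to take
that lock on the real inode itself.  The sketch below is an assumption-
laden model rather than kernel code; struct model_inode and all model_*
and stacked_* names are hypothetical, and a boolean stands in for the
real lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode with a lock and a warning count. */
struct model_inode {
	bool locked;
	int warnings;
	long size;
};

static void model_inode_lock(struct model_inode *inode)
{
	assert(!inode->locked);
	inode->locked = true;
}

static void model_inode_unlock(struct model_inode *inode)
{
	assert(inode->locked);
	inode->locked = false;
}

/* Lower filesystem: warns when the caller did not take the lock. */
static int model_lower_setattr(struct model_inode *inode, long new_size)
{
	if (!inode->locked)
		inode->warnings++;
	inode->size = new_size;
	return 0;
}

/* Stacking layer before the fix: forwards without locking. */
static int stacked_setattr_unlocked(struct model_inode *real, long new_size)
{
	return model_lower_setattr(real, new_size);
}

/* Stacking layer after the fix: locks the real inode around the call. */
static int stacked_setattr_locked(struct model_inode *real, long new_size)
{
	int err;

	model_inode_lock(real);
	err = model_lower_setattr(real, new_size);
	model_inode_unlock(real);
	return err;
}
```

Both variants change the size, but only the unlocked one bumps the
warning counter, mirroring the ext4 warnings reported on the truncate
path earlier in the thread.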

Thread overview: 9+ messages
2016-05-12 19:06 [RFC 0/1] shiftfs: uid/gid shifting filesystem James Bottomley
2016-05-12 19:07 ` [RFC 1/1] shiftfs: uid/gid shifting bind mount James Bottomley
2016-05-16 19:41   ` Serge Hallyn
2016-05-17  2:28     ` James Bottomley
2016-05-17  3:47       ` Serge E. Hallyn
2016-05-17 10:23         ` James Bottomley
2016-05-17 20:59           ` James Bottomley
2016-05-19  2:28             ` Serge E. Hallyn
2016-05-19 10:53               ` James Bottomley