All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/18] overlay: support idmapped layers
@ 2022-03-29 10:35 Christian Brauner
  2022-03-29 10:35 ` [PATCH 01/18] fs: add two trivial lookup helpers Christian Brauner
                   ` (20 more replies)
  0 siblings, 21 replies; 29+ messages in thread
From: Christian Brauner @ 2022-03-29 10:35 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi
  Cc: Christian Brauner (Microsoft),
	linux-unionfs, Christoph Hellwig, Aleksa Sarai,
	Giuseppe Scrivano, Rodrigo Campos Catelin, Seth Forshee,
	Luca Bocassi, Lennart Poettering, Stéphane Graber

From: "Christian Brauner (Microsoft)" <brauner@kernel.org>

Hey,

This adds support for mounting overlay on top of idmapped layers.

I have to start by saying a massive thank you to Amir! He did not just
answer my constant overlay questions but also provided quite a few
patches himself in this series in addition to reviews, comments and a
lot of suggestions. Thank you!

There have been a lot of requests to unblock this. For just a few select
examples see [3], [4], and [5]. I've worked closely with various
communities among them containerd, Kubernetes, Podman, LXD, runC, crun,
and systemd (For the curious please see the various pull-request and
issues below.) a lot of them already support idmapped mounts since they
are enabled for btrfs, ext4, and xfs (and f2fs and fat fwiw). In
additon, a few colleagues at Microsoft and from Red Hat work on a
Kubernetes Enhancement Proposals (KEP) that also relies on overlayfs
supporting idmapped layers, see [12].

Overlayfs on top of idmapped layers will be used in various ways:

* Container managers use overlayfs to efficiently share layers between
  containers. However, this only works for privileged containers.
  Layers cannot be shared if both privileged and unprivileged containers
  are used. Layers can also not be shared if non-overlapping idmappings
  are used for unprivileged containers. Layers cannot be shared because
  of the conflicting ownership requirements between the containers.

  Both the KEP proposal (see [13]) and LXD (see [14]) use
  non-overlapping idmappings to increase isolation.

  All of these cases can be supported if overlayfs can be mounted on
  idmapped layers. The container runtime will create idmapped mounts for
  the layers supposed to be shared. Each container will be given
  idmapped mounts with the correct idmapping. Either by attaching a
  custom or its own user namespace depending on the use-case. Then an
  overlay mount can be mounted on top of the idmapped layers. The
  underlying idmapped mounts can then be unmounted and the mount table
  will be in a clean state.
  This approach has been tested an verified by Giuseppe and others.

* Because of the inability to share layers preparing a layer causes a
  big runtime overhead as ownership needs to be recursively changed.
  This becomes increasingly costly with the bigger the layers are.
  Especially for a large rootfs it becomes prohibitively expensive.

  This recursive ownership change also causes a lot storage overhead.
  Without metacopy it can quickly become unmanagable. With metacopy it
  still wastes a lot of space and inodes.

  For both the KEP and container manager being able to use idmapped
  layer with overlayfs on top of it solves all these problems.

* Container managers such as LXD do run full system containers. Such
  systems are managed like virtual machines and users run application
  containers inside. Since LXD uses an idmapped mounts for the rootfs of
  the container with the container's user namespace attached to the
  mount container runtimes cannot user overlayfs inside.

  Once overlayfs can be mounted on top of idmapped layers it will be
  possible for container runtimes to use overlayfs inside LXD
  containers.

* The systemd-homed daemon and toolsuite is a new component of systemd
  to manage home areas in a portable way. systemd-homed makes use of
  idmapped mounts for the home areas. If the kernel and used file system
  support it the user's home area will be mounted as an idmapped mount
  when they log into their system.

  All files are internally owned by the "nobody" user (i.e. the user
  typically used for indicating "this ownership is not mapped"), and
  dynamically mapped to the {g,u}id used locally on the system via
  idmapped mounts. This makes migrating home areas between different
  systems trivial because recursively chown()ing file system trees is no
  longer necessary. This also means that it is impossible to store files
  as {g,u}id 0 on disk.

  Once overlayfs can be mounted on top of idmapped layers it will be
  possible for container runtimes to work better together with
  systemd-homed.

* systemd provides various sandboxing features (see [11]) for services.
  These serve as hardening measures. In this context, idmapped mounts
  can be used to prevent services from creating files on disk as {g,u}id
  0, making specific files inaccessible, or to prevent access to whole
  directories. Since such services may also make use of overlayfs for
  e.g. the ExtensionImages= option supporting overlayfs on top of
  idmapped layers would be another huge hardening win.

* systemd provides the ability to use system extension images [15] for
  /usr and /opt (with a look to /etc in the future). Such system
  extension images contain files and directories similar in fashion to
  a regular OS system tree. When one or more system extension images are
  activated, their /usr/ and /opt/ hierarchies are combined via
  overlay with the same hierarchies of the host OS, and the host /usr/
  and /opt/ overmounted with it ("merging"). These images are read-only.

  This feature is available to unprivileged container and sandboxed
  services as well. Idmapped layers are used here to avoid runtime and
  storage overhead from recursively changing ownership and ultimately an
  overlay mount is supposed to be created on top of it.

Giuseppe provided testing for runC/crun/Podman and he put up a pull
request to support overlayfs on top of idmapped layers at [16]. That
already covers a lot of users. Other tools have put up pull requests as
well and they are linked below.

The patchset has been extensively tested for about 2 weeks with
xfstests. The tests and results are explained in the following
paragraphs.

In order to test overlayfs with idmapped mounts a simple patch to
xfstests has been added which is part of this series. The patchset
simply allows for each test in the xfstests suite to be run on top of
idmapped mounts. That is in addition to the generic idmapped mount tests
that have existed and are already run for a long time.

Since idmapped mounts can be created on top of btrfs, ext4, and xfs and
these are the most relevant filesystems for users they were taken into
the test matrix.

Amir ensured that the test matrix also includes metacopy. So all tests
are run once with metacopy=on and once with metacopy=off.

Additionally, the unionmount tesuite that Amir maintains was run as part
of xfstests. This brings testing for multi layer overlay and a few
rarely used overlayfs use-cases.

And last, xfstests were run with and without idmapped mounts.

Since the patch series is based on Linux 5.17 the 5.17 kernel was chosen
as a baseline kernel. The baseline kernel is pure 5.17 upstream without
any of the patches in this series. The baseline kernel was used to run
all xfstests with the same parameters minus the idmapped mount part of
the test matrix. This ensured that regressions would be immediately
noticeable.

So the full test matrix is:

ext4 x metacopy=off x -idmapped mounts
ext4 x metacopy=on  x -idmapped mounts
ext4 x metacopy=off x +idmapped mounts
ext4 x metacopy=on  x +idmapped mounts

xfs x metacopy=off x -idmapped mounts
xfs x metacopy=on  x -idmapped mounts
xfs x metacopy=off x +idmapped mounts
xfs x metacopy=on  x +idmapped mounts

btrfs x metacopy=off x -idmapped mounts
btrfs x metacopy=on  x -idmapped mounts
btrfs x metacopy=off x +idmapped mounts
btrfs x metacopy=on  x +idmapped mounts

The test runs were started with:

sudo ./check -overlay

so they encompass all xfstests apart from those that needed to be
excluded because they triggered known issues (One such example is
generic/530 which causes an xfs corruption that is currently under
discussion for a fix upstream.).

A single testrun with over 750 tests takes a bonkers long time given
that I run them in a beefy VM on top of an ssd and not on bare metal.

The successes and failures are identical for the whole test matrix with
and without idmapped mounts. The tests and failures are also identical
compared to the baseline kernel. All in all this should provide a good
amount of convidence. The full test output can be found in the following
repo: https://gitlab.com/brauner/fs.idmapped.overlay.xfstests.output

In order to create an idmapped mount the mount_setattr() system call
(see [17]) can be used together with the MOUNT_ATTR_IDMAP flag to attach
a userns fd to a detached mount created with open_tree()'s
OPEN_TREE_CLONE flag. The only effect is that the idmapping of the
userns becomes the idmapping of the relevant. The links below should
serve to illustrate how widely they are already used.

For people who would like to test I've created a tag fs.idmapped.overlay.v1
which can be fetched:

git fetch git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git fs.idmapped.overlay.v1

the same goes for xfstests:

git fetch git://git.kernel.org/pub/scm/linux/kernel/git/brauner/xfstests-dev.git fs.idmapped.overlay.v1

Thanks!
Christian

[1]: OCI runtime-spec extension for idmapped mounts
     https://github.com/opencontainers/runtime-spec/pull/1143

[2]: Support for idmapped mounts for shared volumes
     https://github.com/opencontainers/runc/pull/3429

[3]: containerd support for idmapped mounts with focus on overlayfs
     https://github.com/containerd/containerd/pull/5890

[4]: runC support for idmapped mounts with focus on overlayfs
     https://github.com/opencontainers/runc/issues/2821

[5]: Podman support for idmapped mounts with focus on overlayfs
     https://github.com/containers/podman/issues/10374

[6]: systemd-nspawn support for idmapped mounts
     https://github.com/systemd/systemd/pull/19438

[7]: systemd-homed support for idmapped mounts
     https://github.com/systemd/systemd/pull/21136

[8]: LXD support for idmapped mounts
     https://github.com/lxc/lxd/pull/8778

[9]: LXC support for idmapped mounts
     https://github.com/lxc/lxc/pull/3709

[10]: crun support for idmapped mounts
      https://github.com/containers/crun/pull/874

[11]: https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Sandboxing

[12]: https://github.com/kubernetes/enhancements/pull/3065

[13]: https://github.com/kubernetes/enhancements/pull/3065/files#diff-a9ca9fbce1538447b92f03125ca2b8474e2d875071f1172d2afa0b1e8cadeabaR118

[14]: https://github.com/lxc/lxd/blob/master/doc/instances.md
      (The relevant entry can be found by looking for the
       "security.idmap.isolated" key.)

[15]: https://www.freedesktop.org/software/systemd/man/systemd-sysext.html

[16]: https://github.com/containers/storage/pull/1180

[17]: https://man7.org/linux/man-pages/man2/mount_setattr.2.html

Amir Goldstein (3):
  ovl: use wrappers to all vfs_*xattr() calls
  ovl: pass layer mnt to ovl_open_realfile()
  ovl: store lower path in ovl_inode

Christian Brauner (15):
  fs: add two trivial lookup helpers
  exportfs: support idmapped mounts
  ovl: pass ofs to creation operations
  ovl: handle idmappings in creation operations
  ovl: pass ofs to setattr operations
  ovl: use ovl_do_notify_change() wrapper
  ovl: use ovl_lookup_upper() wrapper
  ovl: use ovl_path_getxattr() wrapper
  ovl: handle idmappings for layer fileattrs
  ovl: handle idmappings for layer lookup
  ovl: use ovl_copy_{real,upper}attr() wrappers
  ovl: handle idmappings in ovl_permission()
  ovl: handle idmappings in layer open helpers
  ovl: handle idmappings in ovl_xattr_{g,s}et()
  ovl: support idmapped layers

 fs/exportfs/expfs.c      |   5 +-
 fs/namei.c               |  52 +++++++--
 fs/overlayfs/copy_up.c   |  97 +++++++++-------
 fs/overlayfs/dir.c       | 133 +++++++++++-----------
 fs/overlayfs/export.c    |   3 +-
 fs/overlayfs/file.c      |  44 ++++----
 fs/overlayfs/inode.c     |  63 +++++++----
 fs/overlayfs/namei.c     |  49 ++++++---
 fs/overlayfs/overlayfs.h | 231 ++++++++++++++++++++++++++++-----------
 fs/overlayfs/ovl_entry.h |   7 +-
 fs/overlayfs/readdir.c   |  48 ++++----
 fs/overlayfs/super.c     |  57 +++++-----
 fs/overlayfs/util.c      | 142 +++++++++++++++++++-----
 include/linux/namei.h    |   2 +
 14 files changed, 607 insertions(+), 326 deletions(-)


base-commit: f443e374ae131c168a065ea1748feac6b2e76613
-- 
2.32.0


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 01/18] fs: add two trivial lookup helpers
  2022-03-29 10:35 [PATCH 00/18] overlay: support idmapped layers Christian Brauner
@ 2022-03-29 10:35 ` Christian Brauner
  2022-03-29 10:35 ` [PATCH 02/18] exportfs: support idmapped mounts Christian Brauner
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christian Brauner @ 2022-03-29 10:35 UTC (permalink / raw)
  To: Amir Goldstein, Christoph Hellwig, Miklos Szeredi, Al Viro
  Cc: Christian Brauner, linux-fsdevel, linux-unionfs, Aleksa Sarai,
	Giuseppe Scrivano, Rodrigo Campos Catelin, Seth Forshee,
	Luca Bocassi, Lennart Poettering, Stéphane Graber

Similar to the addition of lookup_one() add a version of
lookup_one_unlocked() and lookup_one_positive_unlocked() that take
idmapped mounts into account. This is required to port overlay to
support idmapped base layers.

Cc: <linux-fsdevel@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
 fs/namei.c            | 52 ++++++++++++++++++++++++++++++++++---------
 include/linux/namei.h |  2 ++
 2 files changed, 44 insertions(+), 10 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3f1829b3ab5b..ca2a490a1f6b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2768,7 +2768,8 @@ struct dentry *lookup_one(struct user_namespace *mnt_userns, const char *name,
 EXPORT_SYMBOL(lookup_one);
 
 /**
- * lookup_one_len_unlocked - filesystem helper to lookup single pathname component
+ * lookup_one_unlocked - filesystem helper to lookup single pathname component
+ * @mnt_userns:	idmapping of the mount the lookup is performed from
  * @name:	pathname component to lookup
  * @base:	base directory to lookup from
  * @len:	maximum length @len should be interpreted to
@@ -2779,14 +2780,15 @@ EXPORT_SYMBOL(lookup_one);
  * Unlike lookup_one_len, it should be called without the parent
  * i_mutex held, and will take the i_mutex itself if necessary.
  */
-struct dentry *lookup_one_len_unlocked(const char *name,
-				       struct dentry *base, int len)
+struct dentry *lookup_one_unlocked(struct user_namespace *mnt_userns,
+				   const char *name, struct dentry *base,
+				   int len)
 {
 	struct qstr this;
 	int err;
 	struct dentry *ret;
 
-	err = lookup_one_common(&init_user_ns, name, base, len, &this);
+	err = lookup_one_common(mnt_userns, name, base, len, &this);
 	if (err)
 		return ERR_PTR(err);
 
@@ -2795,6 +2797,41 @@ struct dentry *lookup_one_len_unlocked(const char *name,
 		ret = lookup_slow(&this, base, 0);
 	return ret;
 }
+EXPORT_SYMBOL(lookup_one_unlocked);
+
+/*
+ * Like lookup_positive_unlocked() but takes a mount's idmapping into account.
+ */
+struct dentry *lookup_one_positive_unlocked(struct user_namespace *mnt_userns,
+					    const char *name,
+					    struct dentry *base, int len)
+{
+	struct dentry *ret = lookup_one_unlocked(mnt_userns, name, base, len);
+	if (!IS_ERR(ret) && d_flags_negative(smp_load_acquire(&ret->d_flags))) {
+		dput(ret);
+		ret = ERR_PTR(-ENOENT);
+	}
+	return ret;
+}
+EXPORT_SYMBOL(lookup_one_positive_unlocked);
+
+/**
+ * lookup_one_len_unlocked - filesystem helper to lookup single pathname component
+ * @name:	pathname component to lookup
+ * @base:	base directory to lookup from
+ * @len:	maximum length @len should be interpreted to
+ *
+ * Note that this routine is purely a helper for filesystem usage and should
+ * not be called by generic code.
+ *
+ * Unlike lookup_one_len, it should be called without the parent
+ * i_mutex held, and will take the i_mutex itself if necessary.
+ */
+struct dentry *lookup_one_len_unlocked(const char *name,
+				       struct dentry *base, int len)
+{
+	return lookup_one_unlocked(&init_user_ns, name, base, len);
+}
 EXPORT_SYMBOL(lookup_one_len_unlocked);
 
 /*
@@ -2808,12 +2845,7 @@ EXPORT_SYMBOL(lookup_one_len_unlocked);
 struct dentry *lookup_positive_unlocked(const char *name,
 				       struct dentry *base, int len)
 {
-	struct dentry *ret = lookup_one_len_unlocked(name, base, len);
-	if (!IS_ERR(ret) && d_flags_negative(smp_load_acquire(&ret->d_flags))) {
-		dput(ret);
-		ret = ERR_PTR(-ENOENT);
-	}
-	return ret;
+	return lookup_one_positive_unlocked(&init_user_ns, name, base, len);
 }
 EXPORT_SYMBOL(lookup_positive_unlocked);
 
diff --git a/include/linux/namei.h b/include/linux/namei.h
index e89329bb3134..759b996b9e1a 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -69,6 +69,8 @@ extern struct dentry *lookup_one_len(const char *, struct dentry *, int);
 extern struct dentry *lookup_one_len_unlocked(const char *, struct dentry *, int);
 extern struct dentry *lookup_positive_unlocked(const char *, struct dentry *, int);
 struct dentry *lookup_one(struct user_namespace *, const char *, struct dentry *, int);
+struct dentry *lookup_one_unlocked(struct user_namespace *, const char *, struct dentry *, int);
+struct dentry *lookup_one_positive_unlocked(struct user_namespace *, const char *, struct dentry *, int);
 
 extern int follow_down_one(struct path *);
 extern int follow_down(struct path *);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 02/18] exportfs: support idmapped mounts
  2022-03-29 10:35 [PATCH 00/18] overlay: support idmapped layers Christian Brauner
  2022-03-29 10:35 ` [PATCH 01/18] fs: add two trivial lookup helpers Christian Brauner
@ 2022-03-29 10:35 ` Christian Brauner
  2022-03-29 10:35 ` [PATCH 03/18] ovl: use wrappers to all vfs_*xattr() calls Christian Brauner
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christian Brauner @ 2022-03-29 10:35 UTC (permalink / raw)
  To: Amir Goldstein, Christoph Hellwig, Miklos Szeredi, Al Viro
  Cc: Christian Brauner, linux-fsdevel, linux-unionfs, Aleksa Sarai,
	Giuseppe Scrivano, Rodrigo Campos Catelin, Seth Forshee,
	Luca Bocassi, Lennart Poettering, Stéphane Graber, stable

Make the two locations where exportfs helpers check permission to lookup
a given inode idmapped mount aware by switching it to the lookup_one()
helper. This is a bugfix for the open_by_handle_at() system call which
doesn't take idmapped mounts into account currently. It's not tied to a
specific commit so we'll just Cc stable.

In addition this is required to support idmapped base layers in overlay.
The overlay filesystem uses exportfs to encode and decode file handles
for its index=on mount option and when nfs_export=on.

Cc: <stable@vger.kernel.org>
Cc: <linux-fsdevel@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
 fs/exportfs/expfs.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index 0106eba46d5a..3ef80d000e13 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -145,7 +145,7 @@ static struct dentry *reconnect_one(struct vfsmount *mnt,
 	if (err)
 		goto out_err;
 	dprintk("%s: found name: %s\n", __func__, nbuf);
-	tmp = lookup_one_len_unlocked(nbuf, parent, strlen(nbuf));
+	tmp = lookup_one_unlocked(mnt_user_ns(mnt), nbuf, parent, strlen(nbuf));
 	if (IS_ERR(tmp)) {
 		dprintk("%s: lookup failed: %d\n", __func__, PTR_ERR(tmp));
 		err = PTR_ERR(tmp);
@@ -525,7 +525,8 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
 		}
 
 		inode_lock(target_dir->d_inode);
-		nresult = lookup_one_len(nbuf, target_dir, strlen(nbuf));
+		nresult = lookup_one(mnt_user_ns(mnt), nbuf,
+				     target_dir, strlen(nbuf));
 		if (!IS_ERR(nresult)) {
 			if (unlikely(nresult->d_inode != result->d_inode)) {
 				dput(nresult);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 03/18] ovl: use wrappers to all vfs_*xattr() calls
  2022-03-29 10:35 [PATCH 00/18] overlay: support idmapped layers Christian Brauner
  2022-03-29 10:35 ` [PATCH 01/18] fs: add two trivial lookup helpers Christian Brauner
  2022-03-29 10:35 ` [PATCH 02/18] exportfs: support idmapped mounts Christian Brauner
@ 2022-03-29 10:35 ` Christian Brauner
  2022-03-29 11:18   ` Miklos Szeredi
  2022-03-29 10:35 ` [PATCH 04/18] ovl: pass ofs to creation operations Christian Brauner
                   ` (17 subsequent siblings)
  20 siblings, 1 reply; 29+ messages in thread
From: Christian Brauner @ 2022-03-29 10:35 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi
  Cc: Christoph Hellwig, linux-unionfs, Aleksa Sarai,
	Giuseppe Scrivano, Rodrigo Campos Catelin, Seth Forshee,
	Luca Bocassi, Lennart Poettering, Stéphane Graber,
	Christian Brauner

From: Amir Goldstein <amir73il@gmail.com>

Use helpers ovl_*xattr() to access user/trusted.overlay.* xattrs
and use helpers ovl_do_*xattr() to access generic xattrs. This is a
preparatory patch for using idmapped base layers with overlay.

Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c   | 22 ++++++++++++----------
 fs/overlayfs/dir.c       |  4 +++-
 fs/overlayfs/inode.c     | 13 +++++++------
 fs/overlayfs/namei.c     |  6 +++---
 fs/overlayfs/overlayfs.h | 37 +++++++++++++++++++++++++++----------
 fs/overlayfs/readdir.c   |  4 ++--
 fs/overlayfs/super.c     | 12 ++++++------
 fs/overlayfs/util.c      | 16 ++++++++--------
 8 files changed, 68 insertions(+), 46 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index e040970408d4..104de97f85d6 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -47,6 +47,7 @@ static bool ovl_must_copy_xattr(const char *name)
 int ovl_copy_xattr(struct super_block *sb, struct dentry *old,
 		   struct dentry *new)
 {
+	struct ovl_fs *ofs = OVL_FS(sb);
 	ssize_t list_size, size, value_size = 0;
 	char *buf, *name, *value = NULL;
 	int error = 0;
@@ -117,7 +118,7 @@ int ovl_copy_xattr(struct super_block *sb, struct dentry *old,
 			goto retry;
 		}
 
-		error = vfs_setxattr(&init_user_ns, new, name, value, size, 0);
+		error = ovl_do_setxattr(ofs, new, name, value, size, 0);
 		if (error) {
 			if (error != -EOPNOTSUPP || ovl_must_copy_xattr(name))
 				break;
@@ -433,7 +434,7 @@ static int ovl_set_upper_fh(struct ovl_fs *ofs, struct dentry *upper,
 	if (IS_ERR(fh))
 		return PTR_ERR(fh);
 
-	err = ovl_do_setxattr(ofs, index, OVL_XATTR_UPPER, fh->buf, fh->fb.len);
+	err = ovl_setxattr(ofs, index, OVL_XATTR_UPPER, fh->buf, fh->fb.len);
 
 	kfree(fh);
 	return err;
@@ -865,12 +866,13 @@ static bool ovl_need_meta_copy_up(struct dentry *dentry, umode_t mode,
 	return true;
 }
 
-static ssize_t ovl_getxattr(struct dentry *dentry, char *name, char **value)
+static ssize_t ovl_getxattr_value(struct ovl_fs *ofs, struct dentry *dentry,
+				  char *name, char **value)
 {
 	ssize_t res;
 	char *buf;
 
-	res = vfs_getxattr(&init_user_ns, dentry, name, NULL, 0);
+	res = ovl_do_getxattr(ofs, dentry, name, NULL, 0);
 	if (res == -ENODATA || res == -EOPNOTSUPP)
 		res = 0;
 
@@ -879,7 +881,7 @@ static ssize_t ovl_getxattr(struct dentry *dentry, char *name, char **value)
 		if (!buf)
 			return -ENOMEM;
 
-		res = vfs_getxattr(&init_user_ns, dentry, name, buf, res);
+		res = ovl_do_getxattr(ofs, dentry, name, buf, res);
 		if (res < 0)
 			kfree(buf);
 		else
@@ -906,8 +908,8 @@ static int ovl_copy_up_meta_inode_data(struct ovl_copy_up_ctx *c)
 		return -EIO;
 
 	if (c->stat.size) {
-		err = cap_size = ovl_getxattr(upperpath.dentry, XATTR_NAME_CAPS,
-					      &capability);
+		err = cap_size = ovl_getxattr_value(ofs, upperpath.dentry,
+						    XATTR_NAME_CAPS, &capability);
 		if (cap_size < 0)
 			goto out;
 	}
@@ -921,14 +923,14 @@ static int ovl_copy_up_meta_inode_data(struct ovl_copy_up_ctx *c)
 	 * don't want that to happen for normal copy-up operation.
 	 */
 	if (capability) {
-		err = vfs_setxattr(&init_user_ns, upperpath.dentry,
-				   XATTR_NAME_CAPS, capability, cap_size, 0);
+		err = ovl_do_setxattr(ofs, upperpath.dentry, XATTR_NAME_CAPS,
+				      capability, cap_size, 0);
 		if (err)
 			goto out_free;
 	}
 
 
-	err = ovl_do_removexattr(ofs, upperpath.dentry, OVL_XATTR_METACOPY);
+	err = ovl_removexattr(ofs, upperpath.dentry, OVL_XATTR_METACOPY);
 	if (err)
 		goto out_free;
 
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index f18490813170..d21e3bbcf082 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -435,6 +435,7 @@ static struct dentry *ovl_clear_empty(struct dentry *dentry,
 static int ovl_set_upper_acl(struct dentry *upperdentry, const char *name,
 			     const struct posix_acl *acl)
 {
+	struct ovl_fs *ofs = OVL_FS(upperdentry->d_sb);
 	void *buffer;
 	size_t size;
 	int err;
@@ -451,7 +452,8 @@ static int ovl_set_upper_acl(struct dentry *upperdentry, const char *name,
 	if (err < 0)
 		goto out_free;
 
-	err = vfs_setxattr(&init_user_ns, upperdentry, name, buffer, size, XATTR_CREATE);
+	err = ovl_do_setxattr(ofs, upperdentry, name, buffer, size,
+			      XATTR_CREATE);
 out_free:
 	kfree(buffer);
 	return err;
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 1f36158c7dbe..c51a9dd36cc7 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -342,6 +342,7 @@ int ovl_xattr_set(struct dentry *dentry, struct inode *inode, const char *name,
 		  const void *value, size_t size, int flags)
 {
 	int err;
+	struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
 	struct dentry *upperdentry = ovl_i_dentry_upper(inode);
 	struct dentry *realdentry = upperdentry ?: ovl_dentry_lower(dentry);
 	const struct cred *old_cred;
@@ -368,11 +369,11 @@ int ovl_xattr_set(struct dentry *dentry, struct inode *inode, const char *name,
 
 	old_cred = ovl_override_creds(dentry->d_sb);
 	if (value)
-		err = vfs_setxattr(&init_user_ns, realdentry, name, value, size,
-				   flags);
+		err = ovl_do_setxattr(ofs, realdentry, name, value, size,
+				      flags);
 	else {
 		WARN_ON(flags != XATTR_REPLACE);
-		err = vfs_removexattr(&init_user_ns, realdentry, name);
+		err = ovl_do_removexattr(ofs, realdentry, name);
 	}
 	revert_creds(old_cred);
 
@@ -871,8 +872,8 @@ static int ovl_set_nlink_common(struct dentry *dentry,
 	if (WARN_ON(len >= sizeof(buf)))
 		return -EIO;
 
-	return ovl_do_setxattr(OVL_FS(inode->i_sb), ovl_dentry_upper(dentry),
-			       OVL_XATTR_NLINK, buf, len);
+	return ovl_setxattr(OVL_FS(inode->i_sb), ovl_dentry_upper(dentry),
+			    OVL_XATTR_NLINK, buf, len);
 }
 
 int ovl_set_nlink_upper(struct dentry *dentry)
@@ -897,7 +898,7 @@ unsigned int ovl_get_nlink(struct ovl_fs *ofs, struct dentry *lowerdentry,
 	if (!lowerdentry || !upperdentry || d_inode(lowerdentry)->i_nlink == 1)
 		return fallback;
 
-	err = ovl_do_getxattr(ofs, upperdentry, OVL_XATTR_NLINK,
+	err = ovl_getxattr(ofs, upperdentry, OVL_XATTR_NLINK,
 			      &buf, sizeof(buf) - 1);
 	if (err < 0)
 		goto fail;
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 1a9b515fc45d..042f6394a3d5 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -111,7 +111,7 @@ static struct ovl_fh *ovl_get_fh(struct ovl_fs *ofs, struct dentry *dentry,
 	int res, err;
 	struct ovl_fh *fh = NULL;
 
-	res = ovl_do_getxattr(ofs, dentry, ox, NULL, 0);
+	res = ovl_getxattr(ofs, dentry, ox, NULL, 0);
 	if (res < 0) {
 		if (res == -ENODATA || res == -EOPNOTSUPP)
 			return NULL;
@@ -125,7 +125,7 @@ static struct ovl_fh *ovl_get_fh(struct ovl_fs *ofs, struct dentry *dentry,
 	if (!fh)
 		return ERR_PTR(-ENOMEM);
 
-	res = ovl_do_getxattr(ofs, dentry, ox, fh->buf, res);
+	res = ovl_getxattr(ofs, dentry, ox, fh->buf, res);
 	if (res < 0)
 		goto fail;
 
@@ -464,7 +464,7 @@ int ovl_verify_set_fh(struct ovl_fs *ofs, struct dentry *dentry,
 
 	err = ovl_verify_fh(ofs, dentry, ox, fh);
 	if (set && err == -ENODATA)
-		err = ovl_do_setxattr(ofs, dentry, ox, fh->buf, fh->fb.len);
+		err = ovl_setxattr(ofs, dentry, ox, fh->buf, fh->fb.len);
 	if (err)
 		goto fail;
 
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 2cd5741c873b..6a53ca0d2c96 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -183,10 +183,9 @@ static inline int ovl_do_symlink(struct inode *dir, struct dentry *dentry,
 }
 
 static inline ssize_t ovl_do_getxattr(struct ovl_fs *ofs, struct dentry *dentry,
-				      enum ovl_xattr ox, void *value,
+				      const char *name, void *value,
 				      size_t size)
 {
-	const char *name = ovl_xattr(ofs, ox);
 	int err = vfs_getxattr(&init_user_ns, dentry, name, value, size);
 	int len = (value && err > 0) ? err : 0;
 
@@ -195,26 +194,44 @@ static inline ssize_t ovl_do_getxattr(struct ovl_fs *ofs, struct dentry *dentry,
 	return err;
 }
 
+static inline ssize_t ovl_getxattr(struct ovl_fs *ofs, struct dentry *dentry,
+				   enum ovl_xattr ox, void *value,
+				   size_t size)
+{
+	return ovl_do_getxattr(ofs, dentry, ovl_xattr(ofs, ox), value, size);
+}
+
 static inline int ovl_do_setxattr(struct ovl_fs *ofs, struct dentry *dentry,
-				  enum ovl_xattr ox, const void *value,
-				  size_t size)
+				  const char *name, const void *value,
+				  size_t size, int flags)
 {
-	const char *name = ovl_xattr(ofs, ox);
-	int err = vfs_setxattr(&init_user_ns, dentry, name, value, size, 0);
-	pr_debug("setxattr(%pd2, \"%s\", \"%*pE\", %zu, 0) = %i\n",
-		 dentry, name, min((int)size, 48), value, size, err);
+	int err = vfs_setxattr(&init_user_ns, dentry, name, value, size, flags);
+	pr_debug("setxattr(%pd2, \"%s\", \"%*pE\", %zu, %d) = %i\n",
+		 dentry, name, min((int)size, 48), value, size, flags, err);
 	return err;
 }
 
+static inline int ovl_setxattr(struct ovl_fs *ofs, struct dentry *dentry,
+			       enum ovl_xattr ox, const void *value,
+			       size_t size)
+{
+	return ovl_do_setxattr(ofs, dentry, ovl_xattr(ofs, ox), value, size, 0);
+}
+
 static inline int ovl_do_removexattr(struct ovl_fs *ofs, struct dentry *dentry,
-				     enum ovl_xattr ox)
+				     const char *name)
 {
-	const char *name = ovl_xattr(ofs, ox);
 	int err = vfs_removexattr(&init_user_ns, dentry, name);
 	pr_debug("removexattr(%pd2, \"%s\") = %i\n", dentry, name, err);
 	return err;
 }
 
+static inline int ovl_removexattr(struct ovl_fs *ofs, struct dentry *dentry,
+				  enum ovl_xattr ox)
+{
+	return ovl_do_removexattr(ofs, dentry, ovl_xattr(ofs, ox));
+}
+
 static inline int ovl_do_rename(struct inode *olddir, struct dentry *olddentry,
 				struct inode *newdir, struct dentry *newdentry,
 				unsigned int flags)
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 150fdf3bc68d..c7b542331065 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -623,8 +623,8 @@ static struct ovl_dir_cache *ovl_cache_get_impure(struct path *path)
 		 * Removing the "impure" xattr is best effort.
 		 */
 		if (!ovl_want_write(dentry)) {
-			ovl_do_removexattr(ofs, ovl_dentry_upper(dentry),
-					   OVL_XATTR_IMPURE);
+			ovl_removexattr(ofs, ovl_dentry_upper(dentry),
+					OVL_XATTR_IMPURE);
 			ovl_drop_write(dentry);
 		}
 		ovl_clear_flag(OVL_IMPURE, d_inode(dentry));
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 7bb0a47cb615..f35116db6e94 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -809,13 +809,13 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
 		 * allowed as upper are limited to "normal" ones, where checking
 		 * for the above two errors is sufficient.
 		 */
-		err = vfs_removexattr(&init_user_ns, work,
-				      XATTR_NAME_POSIX_ACL_DEFAULT);
+		err = ovl_do_removexattr(ofs, work,
+					 XATTR_NAME_POSIX_ACL_DEFAULT);
 		if (err && err != -ENODATA && err != -EOPNOTSUPP)
 			goto out_dput;
 
-		err = vfs_removexattr(&init_user_ns, work,
-				      XATTR_NAME_POSIX_ACL_ACCESS);
+		err = ovl_do_removexattr(ofs, work,
+					 XATTR_NAME_POSIX_ACL_ACCESS);
 		if (err && err != -ENODATA && err != -EOPNOTSUPP)
 			goto out_dput;
 
@@ -1411,7 +1411,7 @@ static int ovl_make_workdir(struct super_block *sb, struct ovl_fs *ofs,
 	/*
 	 * Check if upper/work fs supports (trusted|user).overlay.* xattr
 	 */
-	err = ovl_do_setxattr(ofs, ofs->workdir, OVL_XATTR_OPAQUE, "0", 1);
+	err = ovl_setxattr(ofs, ofs->workdir, OVL_XATTR_OPAQUE, "0", 1);
 	if (err) {
 		ofs->noxattr = true;
 		if (ofs->config.index || ofs->config.metacopy) {
@@ -1429,7 +1429,7 @@ static int ovl_make_workdir(struct super_block *sb, struct ovl_fs *ofs,
 		}
 		err = 0;
 	} else {
-		ovl_do_removexattr(ofs, ofs->workdir, OVL_XATTR_OPAQUE);
+		ovl_removexattr(ofs, ofs->workdir, OVL_XATTR_OPAQUE);
 	}
 
 	/*
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index f48284a2a896..453551f56467 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -554,7 +554,7 @@ bool ovl_check_origin_xattr(struct ovl_fs *ofs, struct dentry *dentry)
 {
 	int res;
 
-	res = ovl_do_getxattr(ofs, dentry, OVL_XATTR_ORIGIN, NULL, 0);
+	res = ovl_getxattr(ofs, dentry, OVL_XATTR_ORIGIN, NULL, 0);
 
 	/* Zero size value means "copied up but origin unknown" */
 	if (res >= 0)
@@ -572,7 +572,7 @@ bool ovl_check_dir_xattr(struct super_block *sb, struct dentry *dentry,
 	if (!d_is_dir(dentry))
 		return false;
 
-	res = ovl_do_getxattr(OVL_FS(sb), dentry, ox, &val, 1);
+	res = ovl_getxattr(OVL_FS(sb), dentry, ox, &val, 1);
 	if (res == 1 && val == 'y')
 		return true;
 
@@ -612,7 +612,7 @@ int ovl_check_setxattr(struct ovl_fs *ofs, struct dentry *upperdentry,
 	if (ofs->noxattr)
 		return xerr;
 
-	err = ovl_do_setxattr(ofs, upperdentry, ox, value, size);
+	err = ovl_setxattr(ofs, upperdentry, ox, value, size);
 
 	if (err == -EOPNOTSUPP) {
 		pr_warn("cannot set %s xattr on upper\n", ovl_xattr(ofs, ox));
@@ -652,7 +652,7 @@ void ovl_check_protattr(struct inode *inode, struct dentry *upper)
 	char buf[OVL_PROTATTR_MAX+1];
 	int res, n;
 
-	res = ovl_do_getxattr(ofs, upper, OVL_XATTR_PROTATTR, buf,
+	res = ovl_getxattr(ofs, upper, OVL_XATTR_PROTATTR, buf,
 			      OVL_PROTATTR_MAX);
 	if (res < 0)
 		return;
@@ -708,7 +708,7 @@ int ovl_set_protattr(struct inode *inode, struct dentry *upper,
 		err = ovl_check_setxattr(ofs, upper, OVL_XATTR_PROTATTR,
 					 buf, len, -EPERM);
 	} else if (inode->i_flags & OVL_PROT_I_FLAGS_MASK) {
-		err = ovl_do_removexattr(ofs, upper, OVL_XATTR_PROTATTR);
+		err = ovl_removexattr(ofs, upper, OVL_XATTR_PROTATTR);
 		if (err == -EOPNOTSUPP || err == -ENODATA)
 			err = 0;
 	}
@@ -951,7 +951,7 @@ int ovl_check_metacopy_xattr(struct ovl_fs *ofs, struct dentry *dentry)
 	if (!S_ISREG(d_inode(dentry)->i_mode))
 		return 0;
 
-	res = ovl_do_getxattr(ofs, dentry, OVL_XATTR_METACOPY, NULL, 0);
+	res = ovl_getxattr(ofs, dentry, OVL_XATTR_METACOPY, NULL, 0);
 	if (res < 0) {
 		if (res == -ENODATA || res == -EOPNOTSUPP)
 			return 0;
@@ -993,7 +993,7 @@ char *ovl_get_redirect_xattr(struct ovl_fs *ofs, struct dentry *dentry,
 	int res;
 	char *s, *next, *buf = NULL;
 
-	res = ovl_do_getxattr(ofs, dentry, OVL_XATTR_REDIRECT, NULL, 0);
+	res = ovl_getxattr(ofs, dentry, OVL_XATTR_REDIRECT, NULL, 0);
 	if (res == -ENODATA || res == -EOPNOTSUPP)
 		return NULL;
 	if (res < 0)
@@ -1005,7 +1005,7 @@ char *ovl_get_redirect_xattr(struct ovl_fs *ofs, struct dentry *dentry,
 	if (!buf)
 		return ERR_PTR(-ENOMEM);
 
-	res = ovl_do_getxattr(ofs, dentry, OVL_XATTR_REDIRECT, buf, res);
+	res = ovl_getxattr(ofs, dentry, OVL_XATTR_REDIRECT, buf, res);
 	if (res < 0)
 		goto fail;
 	if (res == 0)
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 04/18] ovl: pass ofs to creation operations
  2022-03-29 10:35 [PATCH 00/18] overlay: support idmapped layers Christian Brauner
                   ` (2 preceding siblings ...)
  2022-03-29 10:35 ` [PATCH 03/18] ovl: use wrappers to all vfs_*xattr() calls Christian Brauner
@ 2022-03-29 10:35 ` Christian Brauner
  2022-03-29 10:35 ` [PATCH 05/18] ovl: handle idmappings in " Christian Brauner
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christian Brauner @ 2022-03-29 10:35 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi
  Cc: Christian Brauner, Christoph Hellwig, linux-unionfs,
	Aleksa Sarai, Giuseppe Scrivano, Rodrigo Campos Catelin,
	Seth Forshee, Luca Bocassi, Lennart Poettering,
	Stéphane Graber

Pass down struct ovl_fs to all creation helpers so we can ultimately
retrieve the relevant upper mount and take the mount's idmapping into
account when creating new filesystem objects. This is needed to support
idmapped base layers with overlay.

Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
 fs/overlayfs/copy_up.c   | 21 +++++-----
 fs/overlayfs/dir.c       | 86 +++++++++++++++++++++-------------------
 fs/overlayfs/overlayfs.h | 54 +++++++++++++++----------
 fs/overlayfs/readdir.c   | 28 +++++++------
 fs/overlayfs/super.c     | 28 +++++++------
 fs/overlayfs/util.c      |  2 +-
 6 files changed, 122 insertions(+), 97 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 104de97f85d6..44605c51a382 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -475,7 +475,7 @@ static int ovl_create_index(struct dentry *dentry, struct dentry *origin,
 	if (err)
 		return err;
 
-	temp = ovl_create_temp(indexdir, OVL_CATTR(S_IFDIR | 0));
+	temp = ovl_create_temp(ofs, indexdir, OVL_CATTR(S_IFDIR | 0));
 	err = PTR_ERR(temp);
 	if (IS_ERR(temp))
 		goto free_name;
@@ -488,12 +488,12 @@ static int ovl_create_index(struct dentry *dentry, struct dentry *origin,
 	if (IS_ERR(index)) {
 		err = PTR_ERR(index);
 	} else {
-		err = ovl_do_rename(dir, temp, dir, index, 0);
+		err = ovl_do_rename(ofs, dir, temp, dir, index, 0);
 		dput(index);
 	}
 out:
 	if (err)
-		ovl_cleanup(dir, temp);
+		ovl_cleanup(ofs, dir, temp);
 	dput(temp);
 free_name:
 	kfree(name.name);
@@ -520,6 +520,7 @@ static int ovl_link_up(struct ovl_copy_up_ctx *c)
 	int err;
 	struct dentry *upper;
 	struct dentry *upperdir = ovl_dentry_upper(c->parent);
+	struct ovl_fs *ofs = OVL_FS(c->dentry->d_sb);
 	struct inode *udir = d_inode(upperdir);
 
 	/* Mark parent "impure" because it may now contain non-pure upper */
@@ -536,7 +537,7 @@ static int ovl_link_up(struct ovl_copy_up_ctx *c)
 			       c->dentry->d_name.len);
 	err = PTR_ERR(upper);
 	if (!IS_ERR(upper)) {
-		err = ovl_do_link(ovl_dentry_upper(c->dentry), udir, upper);
+		err = ovl_do_link(ofs, ovl_dentry_upper(c->dentry), udir, upper);
 		dput(upper);
 
 		if (!err) {
@@ -657,6 +658,7 @@ static void ovl_revert_cu_creds(struct ovl_cu_creds *cc)
  */
 static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c)
 {
+	struct ovl_fs *ofs = OVL_FS(c->dentry->d_sb);
 	struct inode *inode;
 	struct inode *udir = d_inode(c->destdir), *wdir = d_inode(c->workdir);
 	struct dentry *temp, *upper;
@@ -678,7 +680,7 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c)
 	if (err)
 		goto unlock;
 
-	temp = ovl_create_temp(c->workdir, &cattr);
+	temp = ovl_create_temp(ofs, c->workdir, &cattr);
 	ovl_revert_cu_creds(&cc);
 
 	err = PTR_ERR(temp);
@@ -700,7 +702,7 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c)
 	if (IS_ERR(upper))
 		goto cleanup;
 
-	err = ovl_do_rename(wdir, temp, udir, upper, 0);
+	err = ovl_do_rename(ofs, wdir, temp, udir, upper, 0);
 	dput(upper);
 	if (err)
 		goto cleanup;
@@ -717,7 +719,7 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c)
 	return err;
 
 cleanup:
-	ovl_cleanup(wdir, temp);
+	ovl_cleanup(ofs, wdir, temp);
 	dput(temp);
 	goto unlock;
 }
@@ -725,6 +727,7 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c)
 /* Copyup using O_TMPFILE which does not require cross dir locking */
 static int ovl_copy_up_tmpfile(struct ovl_copy_up_ctx *c)
 {
+	struct ovl_fs *ofs = OVL_FS(c->dentry->d_sb);
 	struct inode *udir = d_inode(c->destdir);
 	struct dentry *temp, *upper;
 	struct ovl_cu_creds cc;
@@ -734,7 +737,7 @@ static int ovl_copy_up_tmpfile(struct ovl_copy_up_ctx *c)
 	if (err)
 		return err;
 
-	temp = ovl_do_tmpfile(c->workdir, c->stat.mode);
+	temp = ovl_do_tmpfile(ofs, c->workdir, c->stat.mode);
 	ovl_revert_cu_creds(&cc);
 
 	if (IS_ERR(temp))
@@ -749,7 +752,7 @@ static int ovl_copy_up_tmpfile(struct ovl_copy_up_ctx *c)
 	upper = lookup_one_len(c->destname.name, c->destdir, c->destname.len);
 	err = PTR_ERR(upper);
 	if (!IS_ERR(upper)) {
-		err = ovl_do_link(temp, udir, upper);
+		err = ovl_do_link(ofs, temp, udir, upper);
 		dput(upper);
 	}
 	inode_unlock(udir);
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index d21e3bbcf082..8da72b1ebafc 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -23,15 +23,15 @@ MODULE_PARM_DESC(redirect_max,
 
 static int ovl_set_redirect(struct dentry *dentry, bool samedir);
 
-int ovl_cleanup(struct inode *wdir, struct dentry *wdentry)
+int ovl_cleanup(struct ovl_fs *ofs, struct inode *wdir, struct dentry *wdentry)
 {
 	int err;
 
 	dget(wdentry);
 	if (d_is_dir(wdentry))
-		err = ovl_do_rmdir(wdir, wdentry);
+		err = ovl_do_rmdir(ofs, wdir, wdentry);
 	else
-		err = ovl_do_unlink(wdir, wdentry);
+		err = ovl_do_unlink(ofs, wdir, wdentry);
 	dput(wdentry);
 
 	if (err) {
@@ -42,7 +42,7 @@ int ovl_cleanup(struct inode *wdir, struct dentry *wdentry)
 	return err;
 }
 
-struct dentry *ovl_lookup_temp(struct dentry *workdir)
+struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
 {
 	struct dentry *temp;
 	char name[20];
@@ -70,11 +70,11 @@ static struct dentry *ovl_whiteout(struct ovl_fs *ofs)
 	struct inode *wdir = workdir->d_inode;
 
 	if (!ofs->whiteout) {
-		whiteout = ovl_lookup_temp(workdir);
+		whiteout = ovl_lookup_temp(ofs, workdir);
 		if (IS_ERR(whiteout))
 			goto out;
 
-		err = ovl_do_whiteout(wdir, whiteout);
+		err = ovl_do_whiteout(ofs, wdir, whiteout);
 		if (err) {
 			dput(whiteout);
 			whiteout = ERR_PTR(err);
@@ -84,11 +84,11 @@ static struct dentry *ovl_whiteout(struct ovl_fs *ofs)
 	}
 
 	if (ofs->share_whiteout) {
-		whiteout = ovl_lookup_temp(workdir);
+		whiteout = ovl_lookup_temp(ofs, workdir);
 		if (IS_ERR(whiteout))
 			goto out;
 
-		err = ovl_do_link(ofs->whiteout, wdir, whiteout);
+		err = ovl_do_link(ofs, ofs->whiteout, wdir, whiteout);
 		if (!err)
 			goto out;
 
@@ -122,27 +122,28 @@ int ovl_cleanup_and_whiteout(struct ovl_fs *ofs, struct inode *dir,
 	if (d_is_dir(dentry))
 		flags = RENAME_EXCHANGE;
 
-	err = ovl_do_rename(wdir, whiteout, dir, dentry, flags);
+	err = ovl_do_rename(ofs, wdir, whiteout, dir, dentry, flags);
 	if (err)
 		goto kill_whiteout;
 	if (flags)
-		ovl_cleanup(wdir, dentry);
+		ovl_cleanup(ofs, wdir, dentry);
 
 out:
 	dput(whiteout);
 	return err;
 
 kill_whiteout:
-	ovl_cleanup(wdir, whiteout);
+	ovl_cleanup(ofs, wdir, whiteout);
 	goto out;
 }
 
-int ovl_mkdir_real(struct inode *dir, struct dentry **newdentry, umode_t mode)
+int ovl_mkdir_real(struct ovl_fs *ofs, struct inode *dir,
+		   struct dentry **newdentry, umode_t mode)
 {
 	int err;
 	struct dentry *d, *dentry = *newdentry;
 
-	err = ovl_do_mkdir(dir, dentry, mode);
+	err = ovl_do_mkdir(ofs, dir, dentry, mode);
 	if (err)
 		return err;
 
@@ -167,8 +168,8 @@ int ovl_mkdir_real(struct inode *dir, struct dentry **newdentry, umode_t mode)
 	return 0;
 }
 
-struct dentry *ovl_create_real(struct inode *dir, struct dentry *newdentry,
-			       struct ovl_cattr *attr)
+struct dentry *ovl_create_real(struct ovl_fs *ofs, struct inode *dir,
+			       struct dentry *newdentry, struct ovl_cattr *attr)
 {
 	int err;
 
@@ -180,28 +181,28 @@ struct dentry *ovl_create_real(struct inode *dir, struct dentry *newdentry,
 		goto out;
 
 	if (attr->hardlink) {
-		err = ovl_do_link(attr->hardlink, dir, newdentry);
+		err = ovl_do_link(ofs, attr->hardlink, dir, newdentry);
 	} else {
 		switch (attr->mode & S_IFMT) {
 		case S_IFREG:
-			err = ovl_do_create(dir, newdentry, attr->mode);
+			err = ovl_do_create(ofs, dir, newdentry, attr->mode);
 			break;
 
 		case S_IFDIR:
 			/* mkdir is special... */
-			err =  ovl_mkdir_real(dir, &newdentry, attr->mode);
+			err =  ovl_mkdir_real(ofs, dir, &newdentry, attr->mode);
 			break;
 
 		case S_IFCHR:
 		case S_IFBLK:
 		case S_IFIFO:
 		case S_IFSOCK:
-			err = ovl_do_mknod(dir, newdentry, attr->mode,
+			err = ovl_do_mknod(ofs, dir, newdentry, attr->mode,
 					   attr->rdev);
 			break;
 
 		case S_IFLNK:
-			err = ovl_do_symlink(dir, newdentry, attr->link);
+			err = ovl_do_symlink(ofs, dir, newdentry, attr->link);
 			break;
 
 		default:
@@ -223,10 +224,11 @@ struct dentry *ovl_create_real(struct inode *dir, struct dentry *newdentry,
 	return newdentry;
 }
 
-struct dentry *ovl_create_temp(struct dentry *workdir, struct ovl_cattr *attr)
+struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir,
+			       struct ovl_cattr *attr)
 {
-	return ovl_create_real(d_inode(workdir), ovl_lookup_temp(workdir),
-			       attr);
+	return ovl_create_real(ofs, d_inode(workdir),
+			       ovl_lookup_temp(ofs, workdir), attr);
 }
 
 static int ovl_set_opaque_xerr(struct dentry *dentry, struct dentry *upper,
@@ -330,7 +332,7 @@ static int ovl_create_upper(struct dentry *dentry, struct inode *inode,
 		attr->mode &= ~current_umask();
 
 	inode_lock_nested(udir, I_MUTEX_PARENT);
-	newdentry = ovl_create_real(udir,
+	newdentry = ovl_create_real(ofs, udir,
 				    lookup_one_len(dentry->d_name.name,
 						   upperdir,
 						   dentry->d_name.len),
@@ -353,7 +355,7 @@ static int ovl_create_upper(struct dentry *dentry, struct inode *inode,
 	return err;
 
 out_cleanup:
-	ovl_cleanup(udir, newdentry);
+	ovl_cleanup(ofs, udir, newdentry);
 	dput(newdentry);
 	goto out_unlock;
 }
@@ -361,6 +363,7 @@ static int ovl_create_upper(struct dentry *dentry, struct inode *inode,
 static struct dentry *ovl_clear_empty(struct dentry *dentry,
 				      struct list_head *list)
 {
+	struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
 	struct dentry *workdir = ovl_workdir(dentry);
 	struct inode *wdir = workdir->d_inode;
 	struct dentry *upperdir = ovl_dentry_upper(dentry->d_parent);
@@ -391,7 +394,7 @@ static struct dentry *ovl_clear_empty(struct dentry *dentry,
 	if (upper->d_parent->d_inode != udir)
 		goto out_unlock;
 
-	opaquedir = ovl_create_temp(workdir, OVL_CATTR(stat.mode));
+	opaquedir = ovl_create_temp(ofs, workdir, OVL_CATTR(stat.mode));
 	err = PTR_ERR(opaquedir);
 	if (IS_ERR(opaquedir))
 		goto out_unlock;
@@ -410,12 +413,12 @@ static struct dentry *ovl_clear_empty(struct dentry *dentry,
 	if (err)
 		goto out_cleanup;
 
-	err = ovl_do_rename(wdir, opaquedir, udir, upper, RENAME_EXCHANGE);
+	err = ovl_do_rename(ofs, wdir, opaquedir, udir, upper, RENAME_EXCHANGE);
 	if (err)
 		goto out_cleanup;
 
-	ovl_cleanup_whiteouts(upper, list);
-	ovl_cleanup(wdir, upper);
+	ovl_cleanup_whiteouts(ofs, upper, list);
+	ovl_cleanup(ofs, wdir, upper);
 	unlock_rename(workdir, upperdir);
 
 	/* dentry's upper doesn't match now, get rid of it */
@@ -424,7 +427,7 @@ static struct dentry *ovl_clear_empty(struct dentry *dentry,
 	return opaquedir;
 
 out_cleanup:
-	ovl_cleanup(wdir, opaquedir);
+	ovl_cleanup(ofs, wdir, opaquedir);
 	dput(opaquedir);
 out_unlock:
 	unlock_rename(workdir, upperdir);
@@ -462,6 +465,7 @@ static int ovl_set_upper_acl(struct dentry *upperdentry, const char *name,
 static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
 				    struct ovl_cattr *cattr)
 {
+	struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
 	struct dentry *workdir = ovl_workdir(dentry);
 	struct inode *wdir = workdir->d_inode;
 	struct dentry *upperdir = ovl_dentry_upper(dentry->d_parent);
@@ -496,7 +500,7 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
 	if (d_is_negative(upper) || !IS_WHITEOUT(d_inode(upper)))
 		goto out_dput;
 
-	newdentry = ovl_create_temp(workdir, cattr);
+	newdentry = ovl_create_temp(ofs, workdir, cattr);
 	err = PTR_ERR(newdentry);
 	if (IS_ERR(newdentry))
 		goto out_dput;
@@ -534,20 +538,20 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
 		if (err)
 			goto out_cleanup;
 
-		err = ovl_do_rename(wdir, newdentry, udir, upper,
+		err = ovl_do_rename(ofs, wdir, newdentry, udir, upper,
 				    RENAME_EXCHANGE);
 		if (err)
 			goto out_cleanup;
 
-		ovl_cleanup(wdir, upper);
+		ovl_cleanup(ofs, wdir, upper);
 	} else {
-		err = ovl_do_rename(wdir, newdentry, udir, upper, 0);
+		err = ovl_do_rename(ofs, wdir, newdentry, udir, upper, 0);
 		if (err)
 			goto out_cleanup;
 	}
 	err = ovl_instantiate(dentry, inode, newdentry, hardlink);
 	if (err) {
-		ovl_cleanup(udir, newdentry);
+		ovl_cleanup(ofs, udir, newdentry);
 		dput(newdentry);
 	}
 out_dput:
@@ -562,7 +566,7 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
 	return err;
 
 out_cleanup:
-	ovl_cleanup(wdir, newdentry);
+	ovl_cleanup(ofs, wdir, newdentry);
 	dput(newdentry);
 	goto out_dput;
 }
@@ -802,6 +806,7 @@ static int ovl_remove_and_whiteout(struct dentry *dentry,
 static int ovl_remove_upper(struct dentry *dentry, bool is_dir,
 			    struct list_head *list)
 {
+	struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
 	struct dentry *upperdir = ovl_dentry_upper(dentry->d_parent);
 	struct inode *dir = upperdir->d_inode;
 	struct dentry *upper;
@@ -828,9 +833,9 @@ static int ovl_remove_upper(struct dentry *dentry, bool is_dir,
 		goto out_dput_upper;
 
 	if (is_dir)
-		err = vfs_rmdir(&init_user_ns, dir, upper);
+		err = ovl_do_rmdir(ofs, dir, upper);
 	else
-		err = vfs_unlink(&init_user_ns, dir, upper, NULL);
+		err = ovl_do_unlink(ofs, dir, upper);
 	ovl_dir_modified(dentry->d_parent, ovl_type_origin(dentry));
 
 	/*
@@ -1097,6 +1102,7 @@ static int ovl_rename(struct user_namespace *mnt_userns, struct inode *olddir,
 	bool samedir = olddir == newdir;
 	struct dentry *opaquedir = NULL;
 	const struct cred *old_cred = NULL;
+	struct ovl_fs *ofs = OVL_FS(old->d_sb);
 	LIST_HEAD(list);
 
 	err = -EINVAL;
@@ -1253,13 +1259,13 @@ static int ovl_rename(struct user_namespace *mnt_userns, struct inode *olddir,
 	if (err)
 		goto out_dput;
 
-	err = ovl_do_rename(old_upperdir->d_inode, olddentry,
+	err = ovl_do_rename(ofs, old_upperdir->d_inode, olddentry,
 			    new_upperdir->d_inode, newdentry, flags);
 	if (err)
 		goto out_dput;
 
 	if (cleanup_whiteout)
-		ovl_cleanup(old_upperdir->d_inode, newdentry);
+		ovl_cleanup(ofs, old_upperdir->d_inode, newdentry);
 
 	if (overwrite && d_inode(new)) {
 		if (new_is_dir)
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 6a53ca0d2c96..8fae64722eda 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -122,7 +122,8 @@ static inline const char *ovl_xattr(struct ovl_fs *ofs, enum ovl_xattr ox)
 	return ovl_xattr_table[ox][ofs->config.userxattr];
 }
 
-static inline int ovl_do_rmdir(struct inode *dir, struct dentry *dentry)
+static inline int ovl_do_rmdir(struct ovl_fs *ofs,
+			       struct inode *dir, struct dentry *dentry)
 {
 	int err = vfs_rmdir(&init_user_ns, dir, dentry);
 
@@ -130,7 +131,8 @@ static inline int ovl_do_rmdir(struct inode *dir, struct dentry *dentry)
 	return err;
 }
 
-static inline int ovl_do_unlink(struct inode *dir, struct dentry *dentry)
+static inline int ovl_do_unlink(struct ovl_fs *ofs, struct inode *dir,
+				struct dentry *dentry)
 {
 	int err = vfs_unlink(&init_user_ns, dir, dentry, NULL);
 
@@ -138,8 +140,8 @@ static inline int ovl_do_unlink(struct inode *dir, struct dentry *dentry)
 	return err;
 }
 
-static inline int ovl_do_link(struct dentry *old_dentry, struct inode *dir,
-			      struct dentry *new_dentry)
+static inline int ovl_do_link(struct ovl_fs *ofs, struct dentry *old_dentry,
+			      struct inode *dir, struct dentry *new_dentry)
 {
 	int err = vfs_link(old_dentry, &init_user_ns, dir, new_dentry, NULL);
 
@@ -147,7 +149,8 @@ static inline int ovl_do_link(struct dentry *old_dentry, struct inode *dir,
 	return err;
 }
 
-static inline int ovl_do_create(struct inode *dir, struct dentry *dentry,
+static inline int ovl_do_create(struct ovl_fs *ofs,
+				struct inode *dir, struct dentry *dentry,
 				umode_t mode)
 {
 	int err = vfs_create(&init_user_ns, dir, dentry, mode, true);
@@ -156,7 +159,8 @@ static inline int ovl_do_create(struct inode *dir, struct dentry *dentry,
 	return err;
 }
 
-static inline int ovl_do_mkdir(struct inode *dir, struct dentry *dentry,
+static inline int ovl_do_mkdir(struct ovl_fs *ofs,
+			       struct inode *dir, struct dentry *dentry,
 			       umode_t mode)
 {
 	int err = vfs_mkdir(&init_user_ns, dir, dentry, mode);
@@ -164,7 +168,8 @@ static inline int ovl_do_mkdir(struct inode *dir, struct dentry *dentry,
 	return err;
 }
 
-static inline int ovl_do_mknod(struct inode *dir, struct dentry *dentry,
+static inline int ovl_do_mknod(struct ovl_fs *ofs,
+			       struct inode *dir, struct dentry *dentry,
 			       umode_t mode, dev_t dev)
 {
 	int err = vfs_mknod(&init_user_ns, dir, dentry, mode, dev);
@@ -173,7 +178,8 @@ static inline int ovl_do_mknod(struct inode *dir, struct dentry *dentry,
 	return err;
 }
 
-static inline int ovl_do_symlink(struct inode *dir, struct dentry *dentry,
+static inline int ovl_do_symlink(struct ovl_fs *ofs,
+				 struct inode *dir, struct dentry *dentry,
 				 const char *oldname)
 {
 	int err = vfs_symlink(&init_user_ns, dir, dentry, oldname);
@@ -232,9 +238,9 @@ static inline int ovl_removexattr(struct ovl_fs *ofs, struct dentry *dentry,
 	return ovl_do_removexattr(ofs, dentry, ovl_xattr(ofs, ox));
 }
 
-static inline int ovl_do_rename(struct inode *olddir, struct dentry *olddentry,
-				struct inode *newdir, struct dentry *newdentry,
-				unsigned int flags)
+static inline int ovl_do_rename(struct ovl_fs *ofs, struct inode *olddir,
+				struct dentry *olddentry, struct inode *newdir,
+				struct dentry *newdentry, unsigned int flags)
 {
 	int err;
 	struct renamedata rd = {
@@ -256,14 +262,16 @@ static inline int ovl_do_rename(struct inode *olddir, struct dentry *olddentry,
 	return err;
 }
 
-static inline int ovl_do_whiteout(struct inode *dir, struct dentry *dentry)
+static inline int ovl_do_whiteout(struct ovl_fs *ofs,
+				  struct inode *dir, struct dentry *dentry)
 {
 	int err = vfs_whiteout(&init_user_ns, dir, dentry);
 	pr_debug("whiteout(%pd2) = %i\n", dentry, err);
 	return err;
 }
 
-static inline struct dentry *ovl_do_tmpfile(struct dentry *dentry, umode_t mode)
+static inline struct dentry *ovl_do_tmpfile(struct ovl_fs *ofs,
+					    struct dentry *dentry, umode_t mode)
 {
 	struct dentry *ret = vfs_tmpfile(&init_user_ns, dentry, mode, 0);
 	int err = PTR_ERR_OR_ZERO(ret);
@@ -478,12 +486,13 @@ static inline int ovl_verify_upper(struct ovl_fs *ofs, struct dentry *index,
 extern const struct file_operations ovl_dir_operations;
 struct file *ovl_dir_real_file(const struct file *file, bool want_upper);
 int ovl_check_empty_dir(struct dentry *dentry, struct list_head *list);
-void ovl_cleanup_whiteouts(struct dentry *upper, struct list_head *list);
+void ovl_cleanup_whiteouts(struct ovl_fs *ofs, struct dentry *upper,
+			   struct list_head *list);
 void ovl_cache_free(struct list_head *list);
 void ovl_dir_cache_free(struct inode *inode);
 int ovl_check_d_type_supported(struct path *realpath);
-int ovl_workdir_cleanup(struct inode *dir, struct vfsmount *mnt,
-			struct dentry *dentry, int level);
+int ovl_workdir_cleanup(struct ovl_fs *ofs, struct inode *dir,
+			struct vfsmount *mnt, struct dentry *dentry, int level);
 int ovl_indexdir_cleanup(struct ovl_fs *ofs);
 
 /*
@@ -587,12 +596,15 @@ struct ovl_cattr {
 
 #define OVL_CATTR(m) (&(struct ovl_cattr) { .mode = (m) })
 
-int ovl_mkdir_real(struct inode *dir, struct dentry **newdentry, umode_t mode);
-struct dentry *ovl_create_real(struct inode *dir, struct dentry *newdentry,
+int ovl_mkdir_real(struct ovl_fs *ofs, struct inode *dir,
+		   struct dentry **newdentry, umode_t mode);
+struct dentry *ovl_create_real(struct ovl_fs *ofs,
+			       struct inode *dir, struct dentry *newdentry,
+			       struct ovl_cattr *attr);
+int ovl_cleanup(struct ovl_fs *ofs, struct inode *dir, struct dentry *dentry);
+struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir);
+struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir,
 			       struct ovl_cattr *attr);
-int ovl_cleanup(struct inode *dir, struct dentry *dentry);
-struct dentry *ovl_lookup_temp(struct dentry *workdir);
-struct dentry *ovl_create_temp(struct dentry *workdir, struct ovl_cattr *attr);
 
 /* file.c */
 extern const struct file_operations ovl_file_operations;
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index c7b542331065..9c580ef8cd6f 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -1001,7 +1001,8 @@ int ovl_check_empty_dir(struct dentry *dentry, struct list_head *list)
 	return err;
 }
 
-void ovl_cleanup_whiteouts(struct dentry *upper, struct list_head *list)
+void ovl_cleanup_whiteouts(struct ovl_fs *ofs, struct dentry *upper,
+			   struct list_head *list)
 {
 	struct ovl_cache_entry *p;
 
@@ -1020,7 +1021,7 @@ void ovl_cleanup_whiteouts(struct dentry *upper, struct list_head *list)
 			continue;
 		}
 		if (dentry->d_inode)
-			ovl_cleanup(upper->d_inode, dentry);
+			ovl_cleanup(ofs, upper->d_inode, dentry);
 		dput(dentry);
 	}
 	inode_unlock(upper->d_inode);
@@ -1064,7 +1065,8 @@ int ovl_check_d_type_supported(struct path *realpath)
 
 #define OVL_INCOMPATDIR_NAME "incompat"
 
-static int ovl_workdir_cleanup_recurse(struct path *path, int level)
+static int ovl_workdir_cleanup_recurse(struct ovl_fs *ofs, struct path *path,
+				       int level)
 {
 	int err;
 	struct inode *dir = path->dentry->d_inode;
@@ -1115,7 +1117,7 @@ static int ovl_workdir_cleanup_recurse(struct path *path, int level)
 		if (IS_ERR(dentry))
 			continue;
 		if (dentry->d_inode)
-			err = ovl_workdir_cleanup(dir, path->mnt, dentry, level);
+			err = ovl_workdir_cleanup(ofs, dir, path->mnt, dentry, level);
 		dput(dentry);
 		if (err)
 			break;
@@ -1126,24 +1128,24 @@ static int ovl_workdir_cleanup_recurse(struct path *path, int level)
 	return err;
 }
 
-int ovl_workdir_cleanup(struct inode *dir, struct vfsmount *mnt,
-			 struct dentry *dentry, int level)
+int ovl_workdir_cleanup(struct ovl_fs *ofs, struct inode *dir,
+			struct vfsmount *mnt, struct dentry *dentry, int level)
 {
 	int err;
 
 	if (!d_is_dir(dentry) || level > 1) {
-		return ovl_cleanup(dir, dentry);
+		return ovl_cleanup(ofs, dir, dentry);
 	}
 
-	err = ovl_do_rmdir(dir, dentry);
+	err = ovl_do_rmdir(ofs, dir, dentry);
 	if (err) {
 		struct path path = { .mnt = mnt, .dentry = dentry };
 
 		inode_unlock(dir);
-		err = ovl_workdir_cleanup_recurse(&path, level + 1);
+		err = ovl_workdir_cleanup_recurse(ofs, &path, level + 1);
 		inode_lock_nested(dir, I_MUTEX_PARENT);
 		if (!err)
-			err = ovl_cleanup(dir, dentry);
+			err = ovl_cleanup(ofs, dir, dentry);
 	}
 
 	return err;
@@ -1187,7 +1189,7 @@ int ovl_indexdir_cleanup(struct ovl_fs *ofs)
 		}
 		/* Cleanup leftover from index create/cleanup attempt */
 		if (index->d_name.name[0] == '#') {
-			err = ovl_workdir_cleanup(dir, path.mnt, index, 1);
+			err = ovl_workdir_cleanup(ofs, dir, path.mnt, index, 1);
 			if (err)
 				break;
 			goto next;
@@ -1197,7 +1199,7 @@ int ovl_indexdir_cleanup(struct ovl_fs *ofs)
 			goto next;
 		} else if (err == -ESTALE) {
 			/* Cleanup stale index entries */
-			err = ovl_cleanup(dir, index);
+			err = ovl_cleanup(ofs, dir, index);
 		} else if (err != -ENOENT) {
 			/*
 			 * Abort mount to avoid corrupting the index if
@@ -1213,7 +1215,7 @@ int ovl_indexdir_cleanup(struct ovl_fs *ofs)
 			err = ovl_cleanup_and_whiteout(ofs, dir, index);
 		} else {
 			/* Cleanup orphan index entries */
-			err = ovl_cleanup(dir, index);
+			err = ovl_cleanup(ofs, dir, index);
 		}
 
 		if (err)
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index f35116db6e94..f1deb84aebcf 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -778,7 +778,7 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
 				goto out_unlock;
 
 			retried = true;
-			err = ovl_workdir_cleanup(dir, mnt, work, 0);
+			err = ovl_workdir_cleanup(ofs, dir, mnt, work, 0);
 			dput(work);
 			if (err == -EINVAL) {
 				work = ERR_PTR(err);
@@ -787,7 +787,7 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
 			goto retry;
 		}
 
-		err = ovl_mkdir_real(dir, &work, attr.ia_mode);
+		err = ovl_mkdir_real(ofs, dir, &work, attr.ia_mode);
 		if (err)
 			goto out_dput;
 
@@ -1256,8 +1256,9 @@ static int ovl_get_upper(struct super_block *sb, struct ovl_fs *ofs,
  * Returns 1 if RENAME_WHITEOUT is supported, 0 if not supported and
  * negative values if error is encountered.
  */
-static int ovl_check_rename_whiteout(struct dentry *workdir)
+static int ovl_check_rename_whiteout(struct ovl_fs *ofs)
 {
+	struct dentry *workdir = ofs->workdir;
 	struct inode *dir = d_inode(workdir);
 	struct dentry *temp;
 	struct dentry *dest;
@@ -1267,12 +1268,12 @@ static int ovl_check_rename_whiteout(struct dentry *workdir)
 
 	inode_lock_nested(dir, I_MUTEX_PARENT);
 
-	temp = ovl_create_temp(workdir, OVL_CATTR(S_IFREG | 0));
+	temp = ovl_create_temp(ofs, workdir, OVL_CATTR(S_IFREG | 0));
 	err = PTR_ERR(temp);
 	if (IS_ERR(temp))
 		goto out_unlock;
 
-	dest = ovl_lookup_temp(workdir);
+	dest = ovl_lookup_temp(ofs, workdir);
 	err = PTR_ERR(dest);
 	if (IS_ERR(dest)) {
 		dput(temp);
@@ -1281,7 +1282,7 @@ static int ovl_check_rename_whiteout(struct dentry *workdir)
 
 	/* Name is inline and stable - using snapshot as a copy helper */
 	take_dentry_name_snapshot(&name, temp);
-	err = ovl_do_rename(dir, temp, dir, dest, RENAME_WHITEOUT);
+	err = ovl_do_rename(ofs, dir, temp, dir, dest, RENAME_WHITEOUT);
 	if (err) {
 		if (err == -EINVAL)
 			err = 0;
@@ -1297,11 +1298,11 @@ static int ovl_check_rename_whiteout(struct dentry *workdir)
 
 	/* Best effort cleanup of whiteout and temp file */
 	if (err)
-		ovl_cleanup(dir, whiteout);
+		ovl_cleanup(ofs, dir, whiteout);
 	dput(whiteout);
 
 cleanup_temp:
-	ovl_cleanup(dir, temp);
+	ovl_cleanup(ofs, dir, temp);
 	release_dentry_name_snapshot(&name);
 	dput(temp);
 	dput(dest);
@@ -1312,7 +1313,8 @@ static int ovl_check_rename_whiteout(struct dentry *workdir)
 	return err;
 }
 
-static struct dentry *ovl_lookup_or_create(struct dentry *parent,
+static struct dentry *ovl_lookup_or_create(struct ovl_fs *ofs,
+					   struct dentry *parent,
 					   const char *name, umode_t mode)
 {
 	size_t len = strlen(name);
@@ -1321,7 +1323,7 @@ static struct dentry *ovl_lookup_or_create(struct dentry *parent,
 	inode_lock_nested(parent->d_inode, I_MUTEX_PARENT);
 	child = lookup_one_len(name, parent, len);
 	if (!IS_ERR(child) && !child->d_inode)
-		child = ovl_create_real(parent->d_inode, child,
+		child = ovl_create_real(ofs, parent->d_inode, child,
 					OVL_CATTR(mode));
 	inode_unlock(parent->d_inode);
 	dput(parent);
@@ -1343,7 +1345,7 @@ static int ovl_create_volatile_dirty(struct ovl_fs *ofs)
 	const char *const *name = volatile_path;
 
 	for (ctr = ARRAY_SIZE(volatile_path); ctr; ctr--, name++) {
-		d = ovl_lookup_or_create(d, *name, ctr > 1 ? S_IFDIR : S_IFREG);
+		d = ovl_lookup_or_create(ofs, d, *name, ctr > 1 ? S_IFDIR : S_IFREG);
 		if (IS_ERR(d))
 			return PTR_ERR(d);
 	}
@@ -1391,7 +1393,7 @@ static int ovl_make_workdir(struct super_block *sb, struct ovl_fs *ofs,
 		pr_warn("upper fs needs to support d_type.\n");
 
 	/* Check if upper/work fs supports O_TMPFILE */
-	temp = ovl_do_tmpfile(ofs->workdir, S_IFREG | 0);
+	temp = ovl_do_tmpfile(ofs, ofs->workdir, S_IFREG | 0);
 	ofs->tmpfile = !IS_ERR(temp);
 	if (ofs->tmpfile)
 		dput(temp);
@@ -1400,7 +1402,7 @@ static int ovl_make_workdir(struct super_block *sb, struct ovl_fs *ofs,
 
 
 	/* Check if upper/work fs supports RENAME_WHITEOUT */
-	err = ovl_check_rename_whiteout(ofs->workdir);
+	err = ovl_check_rename_whiteout(ofs);
 	if (err < 0)
 		goto out;
 
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 453551f56467..5a7e5d1e884b 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -834,7 +834,7 @@ static void ovl_cleanup_index(struct dentry *dentry)
 					       dir, index);
 	} else {
 		/* Cleanup orphan index entries */
-		err = ovl_cleanup(dir, index);
+		err = ovl_cleanup(ofs, dir, index);
 	}
 
 	inode_unlock(dir);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 05/18] ovl: handle idmappings in creation operations
  2022-03-29 10:35 [PATCH 00/18] overlay: support idmapped layers Christian Brauner
                   ` (3 preceding siblings ...)
  2022-03-29 10:35 ` [PATCH 04/18] ovl: pass ofs to creation operations Christian Brauner
@ 2022-03-29 10:35 ` Christian Brauner
  2022-03-29 11:22   ` Miklos Szeredi
  2022-03-29 10:35 ` [PATCH 06/18] ovl: pass ofs to setattr operations Christian Brauner
                   ` (15 subsequent siblings)
  20 siblings, 1 reply; 29+ messages in thread
From: Christian Brauner @ 2022-03-29 10:35 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi
  Cc: Christian Brauner, Christoph Hellwig, linux-unionfs,
	Aleksa Sarai, Giuseppe Scrivano, Rodrigo Campos Catelin,
	Seth Forshee, Luca Bocassi, Lennart Poettering,
	Stéphane Graber

When creating objects in the upper layer we need to pass down the upper
idmapping into the respective vfs helpers in order to support idmapped
base layers. The vfs helpers will take care of the rest.

Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
 fs/overlayfs/overlayfs.h | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 8fae64722eda..27f79be097b1 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -125,7 +125,7 @@ static inline const char *ovl_xattr(struct ovl_fs *ofs, enum ovl_xattr ox)
 static inline int ovl_do_rmdir(struct ovl_fs *ofs,
 			       struct inode *dir, struct dentry *dentry)
 {
-	int err = vfs_rmdir(&init_user_ns, dir, dentry);
+	int err = vfs_rmdir(ovl_upper_idmap(ofs), dir, dentry);
 
 	pr_debug("rmdir(%pd2) = %i\n", dentry, err);
 	return err;
@@ -134,7 +134,7 @@ static inline int ovl_do_rmdir(struct ovl_fs *ofs,
 static inline int ovl_do_unlink(struct ovl_fs *ofs, struct inode *dir,
 				struct dentry *dentry)
 {
-	int err = vfs_unlink(&init_user_ns, dir, dentry, NULL);
+	int err = vfs_unlink(ovl_upper_idmap(ofs), dir, dentry, NULL);
 
 	pr_debug("unlink(%pd2) = %i\n", dentry, err);
 	return err;
@@ -143,7 +143,7 @@ static inline int ovl_do_unlink(struct ovl_fs *ofs, struct inode *dir,
 static inline int ovl_do_link(struct ovl_fs *ofs, struct dentry *old_dentry,
 			      struct inode *dir, struct dentry *new_dentry)
 {
-	int err = vfs_link(old_dentry, &init_user_ns, dir, new_dentry, NULL);
+	int err = vfs_link(old_dentry, ovl_upper_idmap(ofs), dir, new_dentry, NULL);
 
 	pr_debug("link(%pd2, %pd2) = %i\n", old_dentry, new_dentry, err);
 	return err;
@@ -153,7 +153,7 @@ static inline int ovl_do_create(struct ovl_fs *ofs,
 				struct inode *dir, struct dentry *dentry,
 				umode_t mode)
 {
-	int err = vfs_create(&init_user_ns, dir, dentry, mode, true);
+	int err = vfs_create(ovl_upper_idmap(ofs), dir, dentry, mode, true);
 
 	pr_debug("create(%pd2, 0%o) = %i\n", dentry, mode, err);
 	return err;
@@ -163,7 +163,7 @@ static inline int ovl_do_mkdir(struct ovl_fs *ofs,
 			       struct inode *dir, struct dentry *dentry,
 			       umode_t mode)
 {
-	int err = vfs_mkdir(&init_user_ns, dir, dentry, mode);
+	int err = vfs_mkdir(ovl_upper_idmap(ofs), dir, dentry, mode);
 	pr_debug("mkdir(%pd2, 0%o) = %i\n", dentry, mode, err);
 	return err;
 }
@@ -172,7 +172,7 @@ static inline int ovl_do_mknod(struct ovl_fs *ofs,
 			       struct inode *dir, struct dentry *dentry,
 			       umode_t mode, dev_t dev)
 {
-	int err = vfs_mknod(&init_user_ns, dir, dentry, mode, dev);
+	int err = vfs_mknod(ovl_upper_idmap(ofs), dir, dentry, mode, dev);
 
 	pr_debug("mknod(%pd2, 0%o, 0%o) = %i\n", dentry, mode, dev, err);
 	return err;
@@ -182,7 +182,7 @@ static inline int ovl_do_symlink(struct ovl_fs *ofs,
 				 struct inode *dir, struct dentry *dentry,
 				 const char *oldname)
 {
-	int err = vfs_symlink(&init_user_ns, dir, dentry, oldname);
+	int err = vfs_symlink(ovl_upper_idmap(ofs), dir, dentry, oldname);
 
 	pr_debug("symlink(\"%s\", %pd2) = %i\n", oldname, dentry, err);
 	return err;
@@ -244,10 +244,10 @@ static inline int ovl_do_rename(struct ovl_fs *ofs, struct inode *olddir,
 {
 	int err;
 	struct renamedata rd = {
-		.old_mnt_userns	= &init_user_ns,
+		.old_mnt_userns	= ovl_upper_idmap(ofs),
 		.old_dir 	= olddir,
 		.old_dentry 	= olddentry,
-		.new_mnt_userns	= &init_user_ns,
+		.new_mnt_userns	= ovl_upper_idmap(ofs),
 		.new_dir 	= newdir,
 		.new_dentry 	= newdentry,
 		.flags 		= flags,
@@ -265,7 +265,7 @@ static inline int ovl_do_rename(struct ovl_fs *ofs, struct inode *olddir,
 static inline int ovl_do_whiteout(struct ovl_fs *ofs,
 				  struct inode *dir, struct dentry *dentry)
 {
-	int err = vfs_whiteout(&init_user_ns, dir, dentry);
+	int err = vfs_whiteout(ovl_upper_idmap(ofs), dir, dentry);
 	pr_debug("whiteout(%pd2) = %i\n", dentry, err);
 	return err;
 }
@@ -273,7 +273,7 @@ static inline int ovl_do_whiteout(struct ovl_fs *ofs,
 static inline struct dentry *ovl_do_tmpfile(struct ovl_fs *ofs,
 					    struct dentry *dentry, umode_t mode)
 {
-	struct dentry *ret = vfs_tmpfile(&init_user_ns, dentry, mode, 0);
+	struct dentry *ret = vfs_tmpfile(ovl_upper_idmap(ofs), dentry, mode, 0);
 	int err = PTR_ERR_OR_ZERO(ret);
 
 	pr_debug("tmpfile(%pd2, 0%o) = %i\n", dentry, mode, err);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 06/18] ovl: pass ofs to setattr operations
  2022-03-29 10:35 [PATCH 00/18] overlay: support idmapped layers Christian Brauner
                   ` (4 preceding siblings ...)
  2022-03-29 10:35 ` [PATCH 05/18] ovl: handle idmappings in " Christian Brauner
@ 2022-03-29 10:35 ` Christian Brauner
  2022-03-29 10:35 ` [PATCH 07/18] ovl: pass layer mnt to ovl_open_realfile() Christian Brauner
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christian Brauner @ 2022-03-29 10:35 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi
  Cc: Christian Brauner, Christoph Hellwig, linux-unionfs,
	Aleksa Sarai, Giuseppe Scrivano, Rodrigo Campos Catelin,
	Seth Forshee, Luca Bocassi, Lennart Poettering,
	Stéphane Graber

Pass down struct ovl_fs to setattr operations so we can ultimately
retrieve the relevant upper mount and take the mount's idmapping into
account when creating new filesystem objects. This is needed to support
idmapped base layers with overlay.

Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
 fs/overlayfs/copy_up.c   | 19 +++++++++++--------
 fs/overlayfs/dir.c       |  2 +-
 fs/overlayfs/overlayfs.h |  2 +-
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 44605c51a382..2c336acb2ba0 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -293,7 +293,8 @@ static int ovl_copy_up_data(struct ovl_fs *ofs, struct path *old,
 	return error;
 }
 
-static int ovl_set_size(struct dentry *upperdentry, struct kstat *stat)
+static int ovl_set_size(struct ovl_fs *ofs,
+			struct dentry *upperdentry, struct kstat *stat)
 {
 	struct iattr attr = {
 		.ia_valid = ATTR_SIZE,
@@ -303,7 +304,8 @@ static int ovl_set_size(struct dentry *upperdentry, struct kstat *stat)
 	return notify_change(&init_user_ns, upperdentry, &attr, NULL);
 }
 
-static int ovl_set_timestamps(struct dentry *upperdentry, struct kstat *stat)
+static int ovl_set_timestamps(struct ovl_fs *ofs, struct dentry *upperdentry,
+			      struct kstat *stat)
 {
 	struct iattr attr = {
 		.ia_valid =
@@ -315,7 +317,8 @@ static int ovl_set_timestamps(struct dentry *upperdentry, struct kstat *stat)
 	return notify_change(&init_user_ns, upperdentry, &attr, NULL);
 }
 
-int ovl_set_attr(struct dentry *upperdentry, struct kstat *stat)
+int ovl_set_attr(struct ovl_fs *ofs, struct dentry *upperdentry,
+		 struct kstat *stat)
 {
 	int err = 0;
 
@@ -335,7 +338,7 @@ int ovl_set_attr(struct dentry *upperdentry, struct kstat *stat)
 		err = notify_change(&init_user_ns, upperdentry, &attr, NULL);
 	}
 	if (!err)
-		ovl_set_timestamps(upperdentry, stat);
+		ovl_set_timestamps(ofs, upperdentry, stat);
 
 	return err;
 }
@@ -542,7 +545,7 @@ static int ovl_link_up(struct ovl_copy_up_ctx *c)
 
 		if (!err) {
 			/* Restore timestamps on parent (best effort) */
-			ovl_set_timestamps(upperdir, &c->pstat);
+			ovl_set_timestamps(ofs, upperdir, &c->pstat);
 			ovl_dentry_set_upper_alias(c->dentry);
 		}
 	}
@@ -616,9 +619,9 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
 
 	inode_lock(temp->d_inode);
 	if (S_ISREG(c->stat.mode))
-		err = ovl_set_size(temp, &c->stat);
+		err = ovl_set_size(ofs, temp, &c->stat);
 	if (!err)
-		err = ovl_set_attr(temp, &c->stat);
+		err = ovl_set_attr(ofs, temp, &c->stat);
 	inode_unlock(temp->d_inode);
 
 	return err;
@@ -840,7 +843,7 @@ static int ovl_do_copy_up(struct ovl_copy_up_ctx *c)
 
 		/* Restore timestamps on parent (best effort) */
 		inode_lock(udir);
-		ovl_set_timestamps(c->destdir, &c->pstat);
+		ovl_set_timestamps(ofs, c->destdir, &c->pstat);
 		inode_unlock(udir);
 
 		ovl_dentry_set_upper_alias(c->dentry);
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 8da72b1ebafc..27a40b6754f4 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -408,7 +408,7 @@ static struct dentry *ovl_clear_empty(struct dentry *dentry,
 		goto out_cleanup;
 
 	inode_lock(opaquedir->d_inode);
-	err = ovl_set_attr(opaquedir, &stat);
+	err = ovl_set_attr(ofs, opaquedir, &stat);
 	inode_unlock(opaquedir->d_inode);
 	if (err)
 		goto out_cleanup;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 27f79be097b1..b003652f8827 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -622,7 +622,7 @@ int ovl_copy_up_with_data(struct dentry *dentry);
 int ovl_maybe_copy_up(struct dentry *dentry, int flags);
 int ovl_copy_xattr(struct super_block *sb, struct dentry *old,
 		   struct dentry *new);
-int ovl_set_attr(struct dentry *upper, struct kstat *stat);
+int ovl_set_attr(struct ovl_fs *ofs, struct dentry *upper, struct kstat *stat);
 struct ovl_fh *ovl_encode_real_fh(struct ovl_fs *ofs, struct dentry *real,
 				  bool is_upper);
 int ovl_set_origin(struct ovl_fs *ofs, struct dentry *lower,
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 07/18] ovl: pass layer mnt to ovl_open_realfile()
  2022-03-29 10:35 [PATCH 00/18] overlay: support idmapped layers Christian Brauner
                   ` (5 preceding siblings ...)
  2022-03-29 10:35 ` [PATCH 06/18] ovl: pass ofs to setattr operations Christian Brauner
@ 2022-03-29 10:35 ` Christian Brauner
  2022-03-29 10:35 ` [PATCH 08/18] ovl: use ovl_do_notify_change() wrapper Christian Brauner
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christian Brauner @ 2022-03-29 10:35 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi
  Cc: Christoph Hellwig, linux-unionfs, Aleksa Sarai,
	Giuseppe Scrivano, Rodrigo Campos Catelin, Seth Forshee,
	Luca Bocassi, Lennart Poettering, Stéphane Graber,
	Christian Brauner

From: Amir Goldstein <amir73il@gmail.com>

Ensure that ovl_open_realfile() takes the mount's idmapping into
account. We add a new helper ovl_path_realdata() that can be used to
easily retrieve the relevant path which we can pass down. This is needed
to support idmapped base layers with overlay.

Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/file.c      | 22 +++++++++++++---------
 fs/overlayfs/overlayfs.h |  1 +
 fs/overlayfs/util.c      | 14 ++++++++++++++
 3 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index fa125feed0ff..9250e04e97af 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -38,8 +38,9 @@ static char ovl_whatisit(struct inode *inode, struct inode *realinode)
 #define OVL_OPEN_FLAGS (O_NOATIME | FMODE_NONOTIFY)
 
 static struct file *ovl_open_realfile(const struct file *file,
-				      struct inode *realinode)
+				      struct path *realpath)
 {
+	struct inode *realinode = d_inode(realpath->dentry);
 	struct inode *inode = file_inode(file);
 	struct file *realfile;
 	const struct cred *old_cred;
@@ -104,21 +105,21 @@ static int ovl_change_flags(struct file *file, unsigned int flags)
 static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
 			       bool allow_meta)
 {
-	struct inode *inode = file_inode(file);
-	struct inode *realinode;
+	struct dentry *dentry = file_dentry(file);
+	struct path realpath;
 
 	real->flags = 0;
 	real->file = file->private_data;
 
 	if (allow_meta)
-		realinode = ovl_inode_real(inode);
+		ovl_path_real(dentry, &realpath);
 	else
-		realinode = ovl_inode_realdata(inode);
+		ovl_path_realdata(dentry, &realpath);
 
 	/* Has it been copied up since we'd opened it? */
-	if (unlikely(file_inode(real->file) != realinode)) {
+	if (unlikely(file_inode(real->file) != d_inode(realpath.dentry))) {
 		real->flags = FDPUT_FPUT;
-		real->file = ovl_open_realfile(file, realinode);
+		real->file = ovl_open_realfile(file, &realpath);
 
 		return PTR_ERR_OR_ZERO(real->file);
 	}
@@ -144,17 +145,20 @@ static int ovl_real_fdget(const struct file *file, struct fd *real)
 
 static int ovl_open(struct inode *inode, struct file *file)
 {
+	struct dentry *dentry = file_dentry(file);
 	struct file *realfile;
+	struct path realpath;
 	int err;
 
-	err = ovl_maybe_copy_up(file_dentry(file), file->f_flags);
+	err = ovl_maybe_copy_up(dentry, file->f_flags);
 	if (err)
 		return err;
 
 	/* No longer need these flags, so don't pass them on to underlying fs */
 	file->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
 
-	realfile = ovl_open_realfile(file, ovl_inode_realdata(inode));
+	ovl_path_realdata(dentry, &realpath);
+	realfile = ovl_open_realfile(file, &realpath);
 	if (IS_ERR(realfile))
 		return PTR_ERR(realfile);
 
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index b003652f8827..816a69b46b67 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -319,6 +319,7 @@ void ovl_path_upper(struct dentry *dentry, struct path *path);
 void ovl_path_lower(struct dentry *dentry, struct path *path);
 void ovl_path_lowerdata(struct dentry *dentry, struct path *path);
 enum ovl_path_type ovl_path_real(struct dentry *dentry, struct path *path);
+enum ovl_path_type ovl_path_realdata(struct dentry *dentry, struct path *path);
 struct dentry *ovl_dentry_upper(struct dentry *dentry);
 struct dentry *ovl_dentry_lower(struct dentry *dentry);
 struct dentry *ovl_dentry_lowerdata(struct dentry *dentry);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 5a7e5d1e884b..42293610f64e 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -194,6 +194,20 @@ enum ovl_path_type ovl_path_real(struct dentry *dentry, struct path *path)
 	return type;
 }
 
+enum ovl_path_type ovl_path_realdata(struct dentry *dentry, struct path *path)
+{
+	enum ovl_path_type type = ovl_path_type(dentry);
+
+	WARN_ON_ONCE(d_is_dir(dentry));
+
+	if (!OVL_TYPE_UPPER(type) || OVL_TYPE_MERGE(type))
+		ovl_path_lowerdata(dentry, path);
+	else
+		ovl_path_upper(dentry, path);
+
+	return type;
+}
+
 struct dentry *ovl_dentry_upper(struct dentry *dentry)
 {
 	return ovl_upperdentry_dereference(OVL_I(d_inode(dentry)));
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 08/18] ovl: use ovl_do_notify_change() wrapper
  2022-03-29 10:35 [PATCH 00/18] overlay: support idmapped layers Christian Brauner
                   ` (6 preceding siblings ...)
  2022-03-29 10:35 ` [PATCH 07/18] ovl: pass layer mnt to ovl_open_realfile() Christian Brauner
@ 2022-03-29 10:35 ` Christian Brauner
  2022-03-29 11:56   ` Miklos Szeredi
  2022-03-29 10:35 ` [PATCH 09/18] ovl: use ovl_lookup_upper() wrapper Christian Brauner
                   ` (12 subsequent siblings)
  20 siblings, 1 reply; 29+ messages in thread
From: Christian Brauner @ 2022-03-29 10:35 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi
  Cc: Christian Brauner, Christoph Hellwig, linux-unionfs,
	Aleksa Sarai, Giuseppe Scrivano, Rodrigo Campos Catelin,
	Seth Forshee, Luca Bocassi, Lennart Poettering,
	Stéphane Graber

Introduce ovl_do_notify_change() as a simple wrapper around
notify_change() to support idmapped layers. The helper mirrors other
ovl_do_*() helpers that operate on the upper layers.

When changing ownership of an upper object the intended ownership needs
to be mapped according to the upper layer's idmapping. This mapping is
the inverse to the mapping applied when copying inode information from
an upper layer to the corresponding overlay inode. So e.g., when an
upper mount maps files that are stored on-disk as owned by id 1001 to
1000 this means that calling stat on this object from an idmapped mount
will report the file as being owned by id 1000. Consequently in order to
change ownership of an object in this filesystem so it appears as being
owned by id 1000 in the upper idmapped layer it needs to store id 1001
on disk. The mnt mapping helpers take care of this.

All idmapping helpers are nops when no idmapped base layers are used.

Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
 fs/overlayfs/copy_up.c   |  8 ++++----
 fs/overlayfs/dir.c       |  2 +-
 fs/overlayfs/inode.c     |  3 ++-
 fs/overlayfs/overlayfs.h | 25 +++++++++++++++++++++++++
 fs/overlayfs/ovl_entry.h |  5 +++++
 fs/overlayfs/super.c     |  2 +-
 6 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 2c336acb2ba0..a5d68302693f 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -301,7 +301,7 @@ static int ovl_set_size(struct ovl_fs *ofs,
 		.ia_size = stat->size,
 	};
 
-	return notify_change(&init_user_ns, upperdentry, &attr, NULL);
+	return ovl_do_notify_change(ofs, upperdentry, &attr);
 }
 
 static int ovl_set_timestamps(struct ovl_fs *ofs, struct dentry *upperdentry,
@@ -314,7 +314,7 @@ static int ovl_set_timestamps(struct ovl_fs *ofs, struct dentry *upperdentry,
 		.ia_mtime = stat->mtime,
 	};
 
-	return notify_change(&init_user_ns, upperdentry, &attr, NULL);
+	return ovl_do_notify_change(ofs, upperdentry, &attr);
 }
 
 int ovl_set_attr(struct ovl_fs *ofs, struct dentry *upperdentry,
@@ -327,7 +327,7 @@ int ovl_set_attr(struct ovl_fs *ofs, struct dentry *upperdentry,
 			.ia_valid = ATTR_MODE,
 			.ia_mode = stat->mode,
 		};
-		err = notify_change(&init_user_ns, upperdentry, &attr, NULL);
+		err = ovl_do_notify_change(ofs, upperdentry, &attr);
 	}
 	if (!err) {
 		struct iattr attr = {
@@ -335,7 +335,7 @@ int ovl_set_attr(struct ovl_fs *ofs, struct dentry *upperdentry,
 			.ia_uid = stat->uid,
 			.ia_gid = stat->gid,
 		};
-		err = notify_change(&init_user_ns, upperdentry, &attr, NULL);
+		err = ovl_do_notify_change(ofs, upperdentry, &attr);
 	}
 	if (!err)
 		ovl_set_timestamps(ofs, upperdentry, stat);
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 27a40b6754f4..9ae0352ff52a 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -516,7 +516,7 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
 			.ia_mode = cattr->mode,
 		};
 		inode_lock(newdentry->d_inode);
-		err = notify_change(&init_user_ns, newdentry, &attr, NULL);
+		err = ovl_do_notify_change(ofs, newdentry, &attr);
 		inode_unlock(newdentry->d_inode);
 		if (err)
 			goto out_cleanup;
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index c51a9dd36cc7..9a8e6b94d9e8 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -21,6 +21,7 @@ int ovl_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
 		struct iattr *attr)
 {
 	int err;
+	struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
 	bool full_copy_up = false;
 	struct dentry *upperdentry;
 	const struct cred *old_cred;
@@ -77,7 +78,7 @@ int ovl_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
 
 		inode_lock(upperdentry->d_inode);
 		old_cred = ovl_override_creds(dentry->d_sb);
-		err = notify_change(&init_user_ns, upperdentry, attr, NULL);
+		err = ovl_do_notify_change(ofs, upperdentry, attr);
 		revert_creds(old_cred);
 		if (!err)
 			ovl_copyattr(upperdentry->d_inode, dentry->d_inode);
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 816a69b46b67..eff8a1edb693 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -122,6 +122,31 @@ static inline const char *ovl_xattr(struct ovl_fs *ofs, enum ovl_xattr ox)
 	return ovl_xattr_table[ox][ofs->config.userxattr];
 }
 
+/*
+ * When changing ownership of an upper object map the intended ownership
+ * according to the upper layer's idmapping. When an upper mount idmaps files
+ * that are stored on-disk as owned by id 1001 to id 1000 this means stat on
+ * this object will report it as being owned by id 1000 when calling via the
+ * upper mount. So in order to change ownership of an object so it reports id
+ * 1000 when calling stat on it through the upper mount the value written to
+ * disk must be 1001. The mount mapping helper will take of this. The mnt
+ * idmapping helpers are nops if the upper layer isn't idmapped.
+ */
+static inline int ovl_do_notify_change(struct ovl_fs *ofs,
+				       struct dentry *upperdentry,
+				       struct iattr *attr)
+{
+	struct user_namespace *upper_idmap = ovl_upper_idmap(ofs);
+	struct user_namespace *fs_idmap = i_user_ns(d_inode(upperdentry));
+
+	if (attr->ia_valid & ATTR_UID)
+		attr->ia_uid = mapped_kuid_user(upper_idmap, fs_idmap, attr->ia_uid);
+	if (attr->ia_valid & ATTR_GID)
+		attr->ia_gid = mapped_kgid_user(upper_idmap, fs_idmap, attr->ia_gid);
+
+	return notify_change(ovl_upper_idmap(ofs), upperdentry, attr, NULL);
+}
+
 static inline int ovl_do_rmdir(struct ovl_fs *ofs,
 			       struct inode *dir, struct dentry *dentry)
 {
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 63efee554f69..cf6aebcf893c 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -90,6 +90,11 @@ static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)
 	return ofs->layers[0].mnt;
 }
 
+static inline struct user_namespace *ovl_upper_idmap(struct ovl_fs *ofs)
+{
+	return mnt_user_ns(ovl_upper_mnt(ofs));
+}
+
 static inline struct ovl_fs *OVL_FS(struct super_block *sb)
 {
 	return (struct ovl_fs *)sb->s_fs_info;
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index f1deb84aebcf..2cc27e707cb3 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -821,7 +821,7 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
 
 		/* Clear any inherited mode bits */
 		inode_lock(work->d_inode);
-		err = notify_change(&init_user_ns, work, &attr, NULL);
+		err = ovl_do_notify_change(ofs, work, &attr);
 		inode_unlock(work->d_inode);
 		if (err)
 			goto out_dput;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 09/18] ovl: use ovl_lookup_upper() wrapper
  2022-03-29 10:35 [PATCH 00/18] overlay: support idmapped layers Christian Brauner
                   ` (7 preceding siblings ...)
  2022-03-29 10:35 ` [PATCH 08/18] ovl: use ovl_do_notify_change() wrapper Christian Brauner
@ 2022-03-29 10:35 ` Christian Brauner
  2022-03-29 10:35 ` [PATCH 10/18] ovl: use ovl_path_getxattr() wrapper Christian Brauner
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christian Brauner @ 2022-03-29 10:35 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi
  Cc: Christian Brauner, Christoph Hellwig, linux-unionfs,
	Aleksa Sarai, Giuseppe Scrivano, Rodrigo Campos Catelin,
	Seth Forshee, Luca Bocassi, Lennart Poettering,
	Stéphane Graber

Introduce ovl_lookup_upper() as a simple wrapper around lookup_one().
Make it clear in the helper's name that this only operates on the upper
layer. The wrapper will take upper layer's idmapping into account when
checking permission in lookup_one().

Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
 fs/overlayfs/copy_up.c   | 12 +++++++-----
 fs/overlayfs/dir.c       | 31 +++++++++++++++----------------
 fs/overlayfs/overlayfs.h |  8 ++++++++
 fs/overlayfs/readdir.c   |  6 +++---
 fs/overlayfs/super.c     |  6 +++---
 fs/overlayfs/util.c      |  2 +-
 6 files changed, 37 insertions(+), 28 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index a5d68302693f..19b5f75d1fe2 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -487,7 +487,7 @@ static int ovl_create_index(struct dentry *dentry, struct dentry *origin,
 	if (err)
 		goto out;
 
-	index = lookup_one_len(name.name, indexdir, name.len);
+	index = ovl_lookup_upper(ofs, name.name, indexdir, name.len);
 	if (IS_ERR(index)) {
 		err = PTR_ERR(index);
 	} else {
@@ -536,8 +536,8 @@ static int ovl_link_up(struct ovl_copy_up_ctx *c)
 		return err;
 
 	inode_lock_nested(udir, I_MUTEX_PARENT);
-	upper = lookup_one_len(c->dentry->d_name.name, upperdir,
-			       c->dentry->d_name.len);
+	upper = ovl_lookup_upper(ofs, c->dentry->d_name.name, upperdir,
+				 c->dentry->d_name.len);
 	err = PTR_ERR(upper);
 	if (!IS_ERR(upper)) {
 		err = ovl_do_link(ofs, ovl_dentry_upper(c->dentry), udir, upper);
@@ -700,7 +700,8 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c)
 			goto cleanup;
 	}
 
-	upper = lookup_one_len(c->destname.name, c->destdir, c->destname.len);
+	upper = ovl_lookup_upper(ofs, c->destname.name, c->destdir,
+				 c->destname.len);
 	err = PTR_ERR(upper);
 	if (IS_ERR(upper))
 		goto cleanup;
@@ -752,7 +753,8 @@ static int ovl_copy_up_tmpfile(struct ovl_copy_up_ctx *c)
 
 	inode_lock_nested(udir, I_MUTEX_PARENT);
 
-	upper = lookup_one_len(c->destname.name, c->destdir, c->destname.len);
+	upper = ovl_lookup_upper(ofs, c->destname.name, c->destdir,
+				 c->destname.len);
 	err = PTR_ERR(upper);
 	if (!IS_ERR(upper)) {
 		err = ovl_do_link(ofs, temp, udir, upper);
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 9ae0352ff52a..d470eedfd42c 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -51,7 +51,7 @@ struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
 	/* counter is allowed to wrap, since temp dentries are ephemeral */
 	snprintf(name, sizeof(name), "#%x", atomic_inc_return(&temp_id));
 
-	temp = lookup_one_len(name, workdir, strlen(name));
+	temp = ovl_lookup_upper(ofs, name, workdir, strlen(name));
 	if (!IS_ERR(temp) && temp->d_inode) {
 		pr_err("workdir/%s already exists\n", name);
 		dput(temp);
@@ -155,8 +155,8 @@ int ovl_mkdir_real(struct ovl_fs *ofs, struct inode *dir,
 	 * to it unhashed and negative. If that happens, try to
 	 * lookup a new hashed and positive dentry.
 	 */
-	d = lookup_one_len(dentry->d_name.name, dentry->d_parent,
-			   dentry->d_name.len);
+	d = ovl_lookup_upper(ofs, dentry->d_name.name, dentry->d_parent,
+			     dentry->d_name.len);
 	if (IS_ERR(d)) {
 		pr_warn("failed lookup after mkdir (%pd2, err=%i).\n",
 			dentry, err);
@@ -333,9 +333,8 @@ static int ovl_create_upper(struct dentry *dentry, struct inode *inode,
 
 	inode_lock_nested(udir, I_MUTEX_PARENT);
 	newdentry = ovl_create_real(ofs, udir,
-				    lookup_one_len(dentry->d_name.name,
-						   upperdir,
-						   dentry->d_name.len),
+				    ovl_lookup_upper(ofs, dentry->d_name.name,
+						     upperdir, dentry->d_name.len),
 				    attr);
 	err = PTR_ERR(newdentry);
 	if (IS_ERR(newdentry))
@@ -490,8 +489,8 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
 	if (err)
 		goto out;
 
-	upper = lookup_one_len(dentry->d_name.name, upperdir,
-			       dentry->d_name.len);
+	upper = ovl_lookup_upper(ofs, dentry->d_name.name, upperdir,
+				 dentry->d_name.len);
 	err = PTR_ERR(upper);
 	if (IS_ERR(upper))
 		goto out_unlock;
@@ -773,8 +772,8 @@ static int ovl_remove_and_whiteout(struct dentry *dentry,
 	if (err)
 		goto out_dput;
 
-	upper = lookup_one_len(dentry->d_name.name, upperdir,
-			       dentry->d_name.len);
+	upper = ovl_lookup_upper(ofs, dentry->d_name.name, upperdir,
+				 dentry->d_name.len);
 	err = PTR_ERR(upper);
 	if (IS_ERR(upper))
 		goto out_unlock;
@@ -821,8 +820,8 @@ static int ovl_remove_upper(struct dentry *dentry, bool is_dir,
 	}
 
 	inode_lock_nested(dir, I_MUTEX_PARENT);
-	upper = lookup_one_len(dentry->d_name.name, upperdir,
-			       dentry->d_name.len);
+	upper = ovl_lookup_upper(ofs, dentry->d_name.name, upperdir,
+				 dentry->d_name.len);
 	err = PTR_ERR(upper);
 	if (IS_ERR(upper))
 		goto out_unlock;
@@ -1197,8 +1196,8 @@ static int ovl_rename(struct user_namespace *mnt_userns, struct inode *olddir,
 
 	trap = lock_rename(new_upperdir, old_upperdir);
 
-	olddentry = lookup_one_len(old->d_name.name, old_upperdir,
-				   old->d_name.len);
+	olddentry = ovl_lookup_upper(ofs, old->d_name.name, old_upperdir,
+				     old->d_name.len);
 	err = PTR_ERR(olddentry);
 	if (IS_ERR(olddentry))
 		goto out_unlock;
@@ -1207,8 +1206,8 @@ static int ovl_rename(struct user_namespace *mnt_userns, struct inode *olddir,
 	if (!ovl_matches_upper(old, olddentry))
 		goto out_dput_old;
 
-	newdentry = lookup_one_len(new->d_name.name, new_upperdir,
-				   new->d_name.len);
+	newdentry = ovl_lookup_upper(ofs, new->d_name.name, new_upperdir,
+				     new->d_name.len);
 	err = PTR_ERR(newdentry);
 	if (IS_ERR(newdentry))
 		goto out_dput_old;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index eff8a1edb693..6b0f39ef58d0 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -7,6 +7,7 @@
 #include <linux/kernel.h>
 #include <linux/uuid.h>
 #include <linux/fs.h>
+#include <linux/namei.h>
 #include "ovl_entry.h"
 
 #undef pr_fmt
@@ -305,6 +306,13 @@ static inline struct dentry *ovl_do_tmpfile(struct ovl_fs *ofs,
 	return ret;
 }
 
+static inline struct dentry *ovl_lookup_upper(struct ovl_fs *ofs,
+					      const char *name,
+					      struct dentry *base, int len)
+{
+	return lookup_one(ovl_upper_idmap(ofs), name, base, len);
+}
+
 static inline bool ovl_open_flags_need_copy_up(int flags)
 {
 	if (!flags)
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 9c580ef8cd6f..1d06222a496c 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -1013,7 +1013,7 @@ void ovl_cleanup_whiteouts(struct ovl_fs *ofs, struct dentry *upper,
 		if (WARN_ON(!p->is_whiteout || !p->is_upper))
 			continue;
 
-		dentry = lookup_one_len(p->name, upper, p->len);
+		dentry = ovl_lookup_upper(ofs, p->name, upper, p->len);
 		if (IS_ERR(dentry)) {
 			pr_err("lookup '%s/%.*s' failed (%i)\n",
 			       upper->d_name.name, p->len, p->name,
@@ -1113,7 +1113,7 @@ static int ovl_workdir_cleanup_recurse(struct ovl_fs *ofs, struct path *path,
 			err = -EINVAL;
 			break;
 		}
-		dentry = lookup_one_len(p->name, path->dentry, p->len);
+		dentry = ovl_lookup_upper(ofs, p->name, path->dentry, p->len);
 		if (IS_ERR(dentry))
 			continue;
 		if (dentry->d_inode)
@@ -1181,7 +1181,7 @@ int ovl_indexdir_cleanup(struct ovl_fs *ofs)
 			if (p->len == 2 && p->name[1] == '.')
 				continue;
 		}
-		index = lookup_one_len(p->name, indexdir, p->len);
+		index = ovl_lookup_upper(ofs, p->name, indexdir, p->len);
 		if (IS_ERR(index)) {
 			err = PTR_ERR(index);
 			index = NULL;
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 2cc27e707cb3..1ed230c7baf1 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -761,7 +761,7 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
 
 	inode_lock_nested(dir, I_MUTEX_PARENT);
 retry:
-	work = lookup_one_len(name, ofs->workbasedir, strlen(name));
+	work = ovl_lookup_upper(ofs, name, ofs->workbasedir, strlen(name));
 
 	if (!IS_ERR(work)) {
 		struct iattr attr = {
@@ -1289,7 +1289,7 @@ static int ovl_check_rename_whiteout(struct ovl_fs *ofs)
 		goto cleanup_temp;
 	}
 
-	whiteout = lookup_one_len(name.name.name, workdir, name.name.len);
+	whiteout = ovl_lookup_upper(ofs, name.name.name, workdir, name.name.len);
 	err = PTR_ERR(whiteout);
 	if (IS_ERR(whiteout))
 		goto cleanup_temp;
@@ -1321,7 +1321,7 @@ static struct dentry *ovl_lookup_or_create(struct ovl_fs *ofs,
 	struct dentry *child;
 
 	inode_lock_nested(parent->d_inode, I_MUTEX_PARENT);
-	child = lookup_one_len(name, parent, len);
+	child = ovl_lookup_upper(ofs, name, parent, len);
 	if (!IS_ERR(child) && !child->d_inode)
 		child = ovl_create_real(ofs, parent->d_inode, child,
 					OVL_CATTR(mode));
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 42293610f64e..79e5a22a3c7c 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -838,7 +838,7 @@ static void ovl_cleanup_index(struct dentry *dentry)
 	}
 
 	inode_lock_nested(dir, I_MUTEX_PARENT);
-	index = lookup_one_len(name.name, indexdir, name.len);
+	index = ovl_lookup_upper(ofs, name.name, indexdir, name.len);
 	err = PTR_ERR(index);
 	if (IS_ERR(index)) {
 		index = NULL;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 10/18] ovl: use ovl_path_getxattr() wrapper
  2022-03-29 10:35 [PATCH 00/18] overlay: support idmapped layers Christian Brauner
                   ` (8 preceding siblings ...)
  2022-03-29 10:35 ` [PATCH 09/18] ovl: use ovl_lookup_upper() wrapper Christian Brauner
@ 2022-03-29 10:35 ` Christian Brauner
  2022-03-29 10:35 ` [PATCH 11/18] ovl: handle idmappings for layer fileattrs Christian Brauner
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christian Brauner @ 2022-03-29 10:35 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi
  Cc: Christian Brauner, Christoph Hellwig, linux-unionfs,
	Aleksa Sarai, Giuseppe Scrivano, Rodrigo Campos Catelin,
	Seth Forshee, Luca Bocassi, Lennart Poettering,
	Stéphane Graber

Add a helper that allows to retrieve ovl xattrs from either lower or
upper layers. To stop passing mnt and dentry separately everywhere use
struct path which more accurately reflects the tight coupling between
mount and dentry in this helper. Swich over all places to pass a path
argument that can operate on either upper or lower layers. This is
needed to support idmapped base layers with overlayfs.

Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
 fs/overlayfs/copy_up.c   | 19 +++++++-----
 fs/overlayfs/dir.c       |  2 +-
 fs/overlayfs/inode.c     |  7 ++++-
 fs/overlayfs/namei.c     | 29 +++++++++++++------
 fs/overlayfs/overlayfs.h | 62 +++++++++++++++++++++++++++++-----------
 fs/overlayfs/util.c      | 25 ++++++++--------
 6 files changed, 96 insertions(+), 48 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 19b5f75d1fe2..b673481eb47a 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -44,10 +44,10 @@ static bool ovl_must_copy_xattr(const char *name)
 	       !strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN);
 }
 
-int ovl_copy_xattr(struct super_block *sb, struct dentry *old,
-		   struct dentry *new)
+int ovl_copy_xattr(struct super_block *sb, struct path *path, struct dentry *new)
 {
 	struct ovl_fs *ofs = OVL_FS(sb);
+	struct dentry *old = path->dentry;
 	ssize_t list_size, size, value_size = 0;
 	char *buf, *name, *value = NULL;
 	int error = 0;
@@ -95,9 +95,9 @@ int ovl_copy_xattr(struct super_block *sb, struct dentry *old,
 			continue; /* Discard */
 		}
 retry:
-		size = vfs_getxattr(&init_user_ns, old, name, value, value_size);
+		size = ovl_do_getxattr(path, name, value, value_size);
 		if (size == -ERANGE)
-			size = vfs_getxattr(&init_user_ns, old, name, NULL, 0);
+			size = ovl_do_getxattr(path, name, NULL, 0);
 
 		if (size < 0) {
 			error = size;
@@ -583,7 +583,7 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
 			return err;
 	}
 
-	err = ovl_copy_xattr(c->dentry->d_sb, c->lowerpath.dentry, temp);
+	err = ovl_copy_xattr(c->dentry->d_sb, &c->lowerpath, temp);
 	if (err)
 		return err;
 
@@ -879,8 +879,12 @@ static ssize_t ovl_getxattr_value(struct ovl_fs *ofs, struct dentry *dentry,
 {
 	ssize_t res;
 	char *buf;
+	struct path upperpath = {
+		.dentry = dentry,
+		.mnt = ovl_upper_mnt(ofs),
+	};
 
-	res = ovl_do_getxattr(ofs, dentry, name, NULL, 0);
+	res = ovl_do_getxattr(&upperpath, name, NULL, 0);
 	if (res == -ENODATA || res == -EOPNOTSUPP)
 		res = 0;
 
@@ -889,7 +893,7 @@ static ssize_t ovl_getxattr_value(struct ovl_fs *ofs, struct dentry *dentry,
 		if (!buf)
 			return -ENOMEM;
 
-		res = ovl_do_getxattr(ofs, dentry, name, buf, res);
+		res = ovl_do_getxattr(&upperpath, name, buf, res);
 		if (res < 0)
 			kfree(buf);
 		else
@@ -965,6 +969,7 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 		return -EROFS;
 
 	ovl_path_lower(dentry, &ctx.lowerpath);
+
 	err = vfs_getattr(&ctx.lowerpath, &ctx.stat,
 			  STATX_BASIC_STATS, AT_STATX_SYNC_AS_STAT);
 	if (err)
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index d470eedfd42c..cacb34c0094d 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -398,7 +398,7 @@ static struct dentry *ovl_clear_empty(struct dentry *dentry,
 	if (IS_ERR(opaquedir))
 		goto out_unlock;
 
-	err = ovl_copy_xattr(dentry->d_sb, upper, opaquedir);
+	err = ovl_copy_xattr(dentry->d_sb, &upperpath, opaquedir);
 	if (err)
 		goto out_cleanup;
 
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 9a8e6b94d9e8..763dada2ae68 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -1176,8 +1176,13 @@ struct inode *ovl_get_inode(struct super_block *sb,
 
 	/* Check for non-merge dir that may have whiteouts */
 	if (is_dir) {
+		struct path realpath = {
+			.dentry = upperdentry ?: lowerdentry,
+			.mnt = upperdentry ? ovl_upper_mnt(ofs) : lowerpath->layer->mnt,
+		};
+
 		if (((upperdentry && lowerdentry) || oip->numlower > 1) ||
-		    ovl_check_origin_xattr(ofs, upperdentry ?: lowerdentry)) {
+		    ovl_path_check_origin_xattr(ofs, &realpath)) {
 			ovl_set_flag(OVL_WHITEOUTS, inode);
 		}
 	}
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 042f6394a3d5..32f9d9089059 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -16,6 +16,7 @@
 
 struct ovl_lookup_data {
 	struct super_block *sb;
+	struct vfsmount *mnt;
 	struct qstr name;
 	bool is_dir;
 	bool opaque;
@@ -25,14 +26,14 @@ struct ovl_lookup_data {
 	bool metacopy;
 };
 
-static int ovl_check_redirect(struct dentry *dentry, struct ovl_lookup_data *d,
+static int ovl_check_redirect(struct path *path, struct ovl_lookup_data *d,
 			      size_t prelen, const char *post)
 {
 	int res;
 	char *buf;
 	struct ovl_fs *ofs = OVL_FS(d->sb);
 
-	buf = ovl_get_redirect_xattr(ofs, dentry, prelen + strlen(post));
+	buf = ovl_get_redirect_xattr(ofs, path, prelen + strlen(post));
 	if (IS_ERR_OR_NULL(buf))
 		return PTR_ERR(buf);
 
@@ -193,9 +194,9 @@ struct dentry *ovl_decode_real_fh(struct ovl_fs *ofs, struct ovl_fh *fh,
 	return real;
 }
 
-static bool ovl_is_opaquedir(struct super_block *sb, struct dentry *dentry)
+static bool ovl_is_opaquedir(struct ovl_fs *ofs, struct path *path)
 {
-	return ovl_check_dir_xattr(sb, dentry, OVL_XATTR_OPAQUE);
+	return ovl_path_check_dir_xattr(ofs, path, OVL_XATTR_OPAQUE);
 }
 
 static struct dentry *ovl_lookup_positive_unlocked(const char *name,
@@ -224,6 +225,7 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
 			     struct dentry **ret, bool drop_negative)
 {
 	struct dentry *this;
+	struct path path;
 	int err;
 	bool last_element = !post[0];
 
@@ -253,12 +255,15 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
 		d->stop = true;
 		goto put_and_out;
 	}
+
+	path.dentry = this;
+	path.mnt = d->mnt;
 	if (!d_can_lookup(this)) {
 		if (d->is_dir || !last_element) {
 			d->stop = true;
 			goto put_and_out;
 		}
-		err = ovl_check_metacopy_xattr(OVL_FS(d->sb), this);
+		err = ovl_check_metacopy_xattr(OVL_FS(d->sb), &path);
 		if (err < 0)
 			goto out_err;
 
@@ -278,14 +283,14 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
 		if (d->last)
 			goto out;
 
-		if (ovl_is_opaquedir(d->sb, this)) {
+		if (ovl_is_opaquedir(OVL_FS(d->sb), &path)) {
 			d->stop = true;
 			if (last_element)
 				d->opaque = true;
 			goto out;
 		}
 	}
-	err = ovl_check_redirect(this, d, prelen, post);
+	err = ovl_check_redirect(&path, d, prelen, post);
 	if (err)
 		goto out_err;
 out:
@@ -856,6 +861,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	old_cred = ovl_override_creds(dentry->d_sb);
 	upperdir = ovl_dentry_upper(dentry->d_parent);
 	if (upperdir) {
+		d.mnt = ovl_upper_mnt(ofs);
 		err = ovl_lookup_layer(upperdir, &d, &upperdentry, true);
 		if (err)
 			goto out;
@@ -911,6 +917,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 		else
 			d.last = lower.layer->idx == roe->numlower;
 
+		d.mnt = lower.layer->mnt;
 		err = ovl_lookup_layer(lower.dentry, &d, &this, false);
 		if (err)
 			goto out_put;
@@ -1071,14 +1078,18 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	if (upperdentry)
 		ovl_dentry_set_upper_alias(dentry);
 	else if (index) {
+		struct path upperpath;
+
 		upperdentry = dget(index);
-		upperredirect = ovl_get_redirect_xattr(ofs, upperdentry, 0);
+		upperpath.dentry = upperdentry;
+		upperpath.mnt = ovl_upper_mnt(ofs);
+		upperredirect = ovl_get_redirect_xattr(ofs, &upperpath, 0);
 		if (IS_ERR(upperredirect)) {
 			err = PTR_ERR(upperredirect);
 			upperredirect = NULL;
 			goto out_free_oe;
 		}
-		err = ovl_check_metacopy_xattr(ofs, upperdentry);
+		err = ovl_check_metacopy_xattr(ofs, &upperpath);
 		if (err < 0)
 			goto out_free_oe;
 		uppermetacopy = err;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 6b0f39ef58d0..470c8b1912fe 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -214,15 +214,15 @@ static inline int ovl_do_symlink(struct ovl_fs *ofs,
 	return err;
 }
 
-static inline ssize_t ovl_do_getxattr(struct ovl_fs *ofs, struct dentry *dentry,
-				      const char *name, void *value,
-				      size_t size)
+static inline ssize_t ovl_do_getxattr(struct path *path, const char *name,
+				      void *value, size_t size)
 {
-	int err = vfs_getxattr(&init_user_ns, dentry, name, value, size);
+	int err = vfs_getxattr(mnt_user_ns(path->mnt), path->dentry,
+			       name, value, size);
 	int len = (value && err > 0) ? err : 0;
 
 	pr_debug("getxattr(%pd2, \"%s\", \"%*pE\", %zu, 0) = %i\n",
-		 dentry, name, min(len, 48), value, size, err);
+		 path->dentry, name, min(len, 48), value, size, err);
 	return err;
 }
 
@@ -230,14 +230,26 @@ static inline ssize_t ovl_getxattr(struct ovl_fs *ofs, struct dentry *dentry,
 				   enum ovl_xattr ox, void *value,
 				   size_t size)
 {
-	return ovl_do_getxattr(ofs, dentry, ovl_xattr(ofs, ox), value, size);
+	struct path upperpath = {
+		.dentry = dentry,
+		.mnt = ovl_upper_mnt(ofs),
+	};
+	return ovl_do_getxattr(&upperpath, ovl_xattr(ofs, ox), value, size);
+}
+
+static inline ssize_t ovl_path_getxattr(struct ovl_fs *ofs,
+					 struct path *path,
+					 enum ovl_xattr ox, void *value,
+					 size_t size)
+{
+	return ovl_do_getxattr(path, ovl_xattr(ofs, ox), value, size);
 }
 
 static inline int ovl_do_setxattr(struct ovl_fs *ofs, struct dentry *dentry,
 				  const char *name, const void *value,
 				  size_t size, int flags)
 {
-	int err = vfs_setxattr(&init_user_ns, dentry, name, value, size, flags);
+	int err = vfs_setxattr(ovl_upper_idmap(ofs), dentry, name, value, size, flags);
 	pr_debug("setxattr(%pd2, \"%s\", \"%*pE\", %zu, %d) = %i\n",
 		 dentry, name, min((int)size, 48), value, size, flags, err);
 	return err;
@@ -253,7 +265,7 @@ static inline int ovl_setxattr(struct ovl_fs *ofs, struct dentry *dentry,
 static inline int ovl_do_removexattr(struct ovl_fs *ofs, struct dentry *dentry,
 				     const char *name)
 {
-	int err = vfs_removexattr(&init_user_ns, dentry, name);
+	int err = vfs_removexattr(ovl_upper_idmap(ofs), dentry, name);
 	pr_debug("removexattr(%pd2, \"%s\") = %i\n", dentry, name, err);
 	return err;
 }
@@ -389,9 +401,27 @@ struct file *ovl_path_open(struct path *path, int flags);
 int ovl_copy_up_start(struct dentry *dentry, int flags);
 void ovl_copy_up_end(struct dentry *dentry);
 bool ovl_already_copied_up(struct dentry *dentry, int flags);
-bool ovl_check_origin_xattr(struct ovl_fs *ofs, struct dentry *dentry);
-bool ovl_check_dir_xattr(struct super_block *sb, struct dentry *dentry,
-			 enum ovl_xattr ox);
+bool ovl_path_check_dir_xattr(struct ovl_fs *ofs, struct path *path,
+			      enum ovl_xattr ox);
+static inline bool ovl_check_dir_xattr(struct ovl_fs *ofs,
+				       struct dentry *dentry, enum ovl_xattr ox)
+{
+	struct path upperpath = {
+		.dentry = dentry,
+		.mnt = ovl_upper_mnt(ofs),
+	};
+	return ovl_path_check_dir_xattr(ofs, &upperpath, ox);
+}
+bool ovl_path_check_origin_xattr(struct ovl_fs *ofs, struct path *path);
+static inline bool ovl_check_origin_xattr(struct ovl_fs *ofs,
+					  struct dentry *dentry)
+{
+	struct path upperpath = {
+		.dentry = dentry,
+		.mnt = ovl_upper_mnt(ofs),
+	};
+	return ovl_path_check_origin_xattr(ofs, &upperpath);
+}
 int ovl_check_setxattr(struct ovl_fs *ofs, struct dentry *upperdentry,
 		       enum ovl_xattr ox, const void *value, size_t size,
 		       int xerr);
@@ -403,10 +433,9 @@ bool ovl_need_index(struct dentry *dentry);
 int ovl_nlink_start(struct dentry *dentry);
 void ovl_nlink_end(struct dentry *dentry);
 int ovl_lock_rename_workdir(struct dentry *workdir, struct dentry *upperdir);
-int ovl_check_metacopy_xattr(struct ovl_fs *ofs, struct dentry *dentry);
+int ovl_check_metacopy_xattr(struct ovl_fs *ofs, struct path *path);
 bool ovl_is_metacopy_dentry(struct dentry *dentry);
-char *ovl_get_redirect_xattr(struct ovl_fs *ofs, struct dentry *dentry,
-			     int padding);
+char *ovl_get_redirect_xattr(struct ovl_fs *ofs, struct path *path, int padding);
 int ovl_sync_status(struct ovl_fs *ofs);
 
 static inline void ovl_set_flag(unsigned long flag, struct inode *inode)
@@ -427,7 +456,7 @@ static inline bool ovl_test_flag(unsigned long flag, struct inode *inode)
 static inline bool ovl_is_impuredir(struct super_block *sb,
 				    struct dentry *dentry)
 {
-	return ovl_check_dir_xattr(sb, dentry, OVL_XATTR_IMPURE);
+	return ovl_check_dir_xattr(OVL_FS(sb), dentry, OVL_XATTR_IMPURE);
 }
 
 /*
@@ -654,8 +683,7 @@ int ovl_fileattr_set(struct user_namespace *mnt_userns,
 int ovl_copy_up(struct dentry *dentry);
 int ovl_copy_up_with_data(struct dentry *dentry);
 int ovl_maybe_copy_up(struct dentry *dentry, int flags);
-int ovl_copy_xattr(struct super_block *sb, struct dentry *old,
-		   struct dentry *new);
+int ovl_copy_xattr(struct super_block *sb, struct path *path, struct dentry *new);
 int ovl_set_attr(struct ovl_fs *ofs, struct dentry *upper, struct kstat *stat);
 struct ovl_fh *ovl_encode_real_fh(struct ovl_fs *ofs, struct dentry *real,
 				  bool is_upper);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 79e5a22a3c7c..3065393d143e 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -564,11 +564,11 @@ void ovl_copy_up_end(struct dentry *dentry)
 	ovl_inode_unlock(d_inode(dentry));
 }
 
-bool ovl_check_origin_xattr(struct ovl_fs *ofs, struct dentry *dentry)
+bool ovl_path_check_origin_xattr(struct ovl_fs *ofs, struct path *path)
 {
 	int res;
 
-	res = ovl_getxattr(ofs, dentry, OVL_XATTR_ORIGIN, NULL, 0);
+	res = ovl_path_getxattr(ofs, path, OVL_XATTR_ORIGIN, NULL, 0);
 
 	/* Zero size value means "copied up but origin unknown" */
 	if (res >= 0)
@@ -577,16 +577,16 @@ bool ovl_check_origin_xattr(struct ovl_fs *ofs, struct dentry *dentry)
 	return false;
 }
 
-bool ovl_check_dir_xattr(struct super_block *sb, struct dentry *dentry,
-			 enum ovl_xattr ox)
+bool ovl_path_check_dir_xattr(struct ovl_fs *ofs, struct path *path,
+			       enum ovl_xattr ox)
 {
 	int res;
 	char val;
 
-	if (!d_is_dir(dentry))
+	if (!d_is_dir(path->dentry))
 		return false;
 
-	res = ovl_getxattr(OVL_FS(sb), dentry, ox, &val, 1);
+	res = ovl_path_getxattr(ofs, path, ox, &val, 1);
 	if (res == 1 && val == 'y')
 		return true;
 
@@ -957,15 +957,15 @@ int ovl_lock_rename_workdir(struct dentry *workdir, struct dentry *upperdir)
 }
 
 /* err < 0, 0 if no metacopy xattr, 1 if metacopy xattr found */
-int ovl_check_metacopy_xattr(struct ovl_fs *ofs, struct dentry *dentry)
+int ovl_check_metacopy_xattr(struct ovl_fs *ofs, struct path *path)
 {
 	int res;
 
 	/* Only regular files can have metacopy xattr */
-	if (!S_ISREG(d_inode(dentry)->i_mode))
+	if (!S_ISREG(d_inode(path->dentry)->i_mode))
 		return 0;
 
-	res = ovl_getxattr(ofs, dentry, OVL_XATTR_METACOPY, NULL, 0);
+	res = ovl_path_getxattr(ofs, path, OVL_XATTR_METACOPY, NULL, 0);
 	if (res < 0) {
 		if (res == -ENODATA || res == -EOPNOTSUPP)
 			return 0;
@@ -1001,13 +1001,12 @@ bool ovl_is_metacopy_dentry(struct dentry *dentry)
 	return (oe->numlower > 1);
 }
 
-char *ovl_get_redirect_xattr(struct ovl_fs *ofs, struct dentry *dentry,
-			     int padding)
+char *ovl_get_redirect_xattr(struct ovl_fs *ofs, struct path *path, int padding)
 {
 	int res;
 	char *s, *next, *buf = NULL;
 
-	res = ovl_getxattr(ofs, dentry, OVL_XATTR_REDIRECT, NULL, 0);
+	res = ovl_path_getxattr(ofs, path, OVL_XATTR_REDIRECT, NULL, 0);
 	if (res == -ENODATA || res == -EOPNOTSUPP)
 		return NULL;
 	if (res < 0)
@@ -1019,7 +1018,7 @@ char *ovl_get_redirect_xattr(struct ovl_fs *ofs, struct dentry *dentry,
 	if (!buf)
 		return ERR_PTR(-ENOMEM);
 
-	res = ovl_getxattr(ofs, dentry, OVL_XATTR_REDIRECT, buf, res);
+	res = ovl_path_getxattr(ofs, path, OVL_XATTR_REDIRECT, buf, res);
 	if (res < 0)
 		goto fail;
 	if (res == 0)
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 11/18] ovl: handle idmappings for layer fileattrs
  2022-03-29 10:35 [PATCH 00/18] overlay: support idmapped layers Christian Brauner
                   ` (9 preceding siblings ...)
  2022-03-29 10:35 ` [PATCH 10/18] ovl: use ovl_path_getxattr() wrapper Christian Brauner
@ 2022-03-29 10:35 ` Christian Brauner
  2022-03-29 10:35 ` [PATCH 12/18] ovl: handle idmappings for layer lookup Christian Brauner
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christian Brauner @ 2022-03-29 10:35 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi
  Cc: Christian Brauner, Christoph Hellwig, linux-unionfs,
	Aleksa Sarai, Giuseppe Scrivano, Rodrigo Campos Catelin,
	Seth Forshee, Luca Bocassi, Lennart Poettering,
	Stéphane Graber

Take the upper mount's idmapping into account when setting fileattrs on
the upper layer. This is needed to support idmapped base layers with
overlay.

Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
 fs/overlayfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 763dada2ae68..f18b02b9dd53 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -537,7 +537,7 @@ int ovl_real_fileattr_set(struct path *realpath, struct fileattr *fa)
 	if (err)
 		return err;
 
-	return vfs_fileattr_set(&init_user_ns, realpath->dentry, fa);
+	return vfs_fileattr_set(mnt_user_ns(realpath->mnt), realpath->dentry, fa);
 }
 
 int ovl_fileattr_set(struct user_namespace *mnt_userns,
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 12/18] ovl: handle idmappings for layer lookup
  2022-03-29 10:35 [PATCH 00/18] overlay: support idmapped layers Christian Brauner
                   ` (10 preceding siblings ...)
  2022-03-29 10:35 ` [PATCH 11/18] ovl: handle idmappings for layer fileattrs Christian Brauner
@ 2022-03-29 10:35 ` Christian Brauner
  2022-03-29 10:35 ` [PATCH 13/18] ovl: store lower path in ovl_inode Christian Brauner
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christian Brauner @ 2022-03-29 10:35 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi
  Cc: Christian Brauner, Christoph Hellwig, linux-unionfs,
	Aleksa Sarai, Giuseppe Scrivano, Rodrigo Campos Catelin,
	Seth Forshee, Luca Bocassi, Lennart Poettering,
	Stéphane Graber

Make the two places where lookup helpers can be called either on lower
or upper layers take the mount's idmapping into account. To this end we
pass down the mount in struct ovl_lookup_data. It can later also be used
to construct struct path for various other helpers. This is needed to
support idmapped base layers with overlay.

Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
 fs/overlayfs/export.c  |  3 ++-
 fs/overlayfs/namei.c   | 14 ++++++++------
 fs/overlayfs/readdir.c | 10 +++++-----
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index ebde05c9cf62..5acf353d160b 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -391,7 +391,8 @@ static struct dentry *ovl_lookup_real_one(struct dentry *connected,
 	 * pointer because we hold no lock on the real dentry.
 	 */
 	take_dentry_name_snapshot(&name, real);
-	this = lookup_one_len(name.name.name, connected, name.name.len);
+	this = lookup_one(mnt_user_ns(layer->mnt), name.name.name,
+			  connected, name.name.len);
 	release_dentry_name_snapshot(&name);
 	err = PTR_ERR(this);
 	if (IS_ERR(this)) {
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 32f9d9089059..f7b550eafc1b 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -199,11 +199,12 @@ static bool ovl_is_opaquedir(struct ovl_fs *ofs, struct path *path)
 	return ovl_path_check_dir_xattr(ofs, path, OVL_XATTR_OPAQUE);
 }
 
-static struct dentry *ovl_lookup_positive_unlocked(const char *name,
+static struct dentry *ovl_lookup_positive_unlocked(struct ovl_lookup_data *d,
+						   const char *name,
 						   struct dentry *base, int len,
 						   bool drop_negative)
 {
-	struct dentry *ret = lookup_one_len_unlocked(name, base, len);
+	struct dentry *ret = lookup_one_unlocked(mnt_user_ns(d->mnt), name, base, len);
 
 	if (!IS_ERR(ret) && d_flags_negative(smp_load_acquire(&ret->d_flags))) {
 		if (drop_negative && ret->d_lockref.count == 1) {
@@ -229,7 +230,7 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
 	int err;
 	bool last_element = !post[0];
 
-	this = ovl_lookup_positive_unlocked(name, base, namelen, drop_negative);
+	this = ovl_lookup_positive_unlocked(d, name, base, namelen, drop_negative);
 	if (IS_ERR(this)) {
 		err = PTR_ERR(this);
 		this = NULL;
@@ -709,7 +710,8 @@ struct dentry *ovl_lookup_index(struct ovl_fs *ofs, struct dentry *upper,
 	if (err)
 		return ERR_PTR(err);
 
-	index = lookup_positive_unlocked(name.name, ofs->indexdir, name.len);
+	index = lookup_one_positive_unlocked(ovl_upper_idmap(ofs), name.name,
+					     ofs->indexdir, name.len);
 	if (IS_ERR(index)) {
 		err = PTR_ERR(index);
 		if (err == -ENOENT) {
@@ -1174,8 +1176,8 @@ bool ovl_lower_positive(struct dentry *dentry)
 		struct dentry *this;
 		struct dentry *lowerdir = poe->lowerstack[i].dentry;
 
-		this = lookup_positive_unlocked(name->name, lowerdir,
-					       name->len);
+		this = lookup_one_positive_unlocked(mnt_user_ns(poe->lowerstack[i].layer->mnt),
+						   name->name, lowerdir, name->len);
 		if (IS_ERR(this)) {
 			switch (PTR_ERR(this)) {
 			case -ENOENT:
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 1d06222a496c..78f62cc1797b 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -264,11 +264,11 @@ static int ovl_fill_merge(struct dir_context *ctx, const char *name,
 		return ovl_fill_lowest(rdd, name, namelen, offset, ino, d_type);
 }
 
-static int ovl_check_whiteouts(struct dentry *dir, struct ovl_readdir_data *rdd)
+static int ovl_check_whiteouts(struct path *path, struct ovl_readdir_data *rdd)
 {
 	int err;
 	struct ovl_cache_entry *p;
-	struct dentry *dentry;
+	struct dentry *dentry, *dir = path->dentry;
 	const struct cred *old_cred;
 
 	old_cred = ovl_override_creds(rdd->dentry->d_sb);
@@ -278,7 +278,7 @@ static int ovl_check_whiteouts(struct dentry *dir, struct ovl_readdir_data *rdd)
 		while (rdd->first_maybe_whiteout) {
 			p = rdd->first_maybe_whiteout;
 			rdd->first_maybe_whiteout = p->next_maybe_whiteout;
-			dentry = lookup_one_len(p->name, dir, p->len);
+			dentry = lookup_one(mnt_user_ns(path->mnt), p->name, dir, p->len);
 			if (!IS_ERR(dentry)) {
 				p->is_whiteout = ovl_is_whiteout(dentry);
 				dput(dentry);
@@ -312,7 +312,7 @@ static inline int ovl_dir_read(struct path *realpath,
 	} while (!err && rdd->count);
 
 	if (!err && rdd->first_maybe_whiteout && rdd->dentry)
-		err = ovl_check_whiteouts(realpath->dentry, rdd);
+		err = ovl_check_whiteouts(realpath, rdd);
 
 	fput(realfile);
 
@@ -479,7 +479,7 @@ static int ovl_cache_update_ino(struct path *path, struct ovl_cache_entry *p)
 			goto get;
 		}
 	}
-	this = lookup_one_len(p->name, dir, p->len);
+	this = lookup_one(mnt_user_ns(path->mnt), p->name, dir, p->len);
 	if (IS_ERR_OR_NULL(this) || !this->d_inode) {
 		/* Mark a stale entry */
 		p->is_whiteout = true;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 13/18] ovl: store lower path in ovl_inode
  2022-03-29 10:35 [PATCH 00/18] overlay: support idmapped layers Christian Brauner
                   ` (11 preceding siblings ...)
  2022-03-29 10:35 ` [PATCH 12/18] ovl: handle idmappings for layer lookup Christian Brauner
@ 2022-03-29 10:35 ` Christian Brauner
  2022-03-29 10:35 ` [PATCH 14/18] ovl: use ovl_copy_{real,upper}attr() wrappers Christian Brauner
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christian Brauner @ 2022-03-29 10:35 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi
  Cc: Christoph Hellwig, linux-unionfs, Aleksa Sarai,
	Giuseppe Scrivano, Rodrigo Campos Catelin, Seth Forshee,
	Luca Bocassi, Lennart Poettering, Stéphane Graber,
	Christian Brauner

From: Amir Goldstein <amir73il@gmail.com>

Create some ovl_i_* helpers to get real path from ovl inode. Instead of
just stashing struct inode for the lower layer we stash struct path for
the lower layer. The helpers allow to retrieve a struct path for the
relevant upper or lower layer. This will be used when retrieving
information based on struct inode when copying up inode attributes from
upper or lower inodes to ovl inodes and when checking permissions in
ovl_permission() in following patches. This is needed to support
idmapped base layers with overlay.

Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
 fs/overlayfs/inode.c     | 11 +++++---
 fs/overlayfs/overlayfs.h |  6 ++++
 fs/overlayfs/ovl_entry.h |  2 +-
 fs/overlayfs/super.c     |  5 ++--
 fs/overlayfs/util.c      | 61 ++++++++++++++++++++++++++++++++++------
 5 files changed, 69 insertions(+), 16 deletions(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index f18b02b9dd53..e28b7ed755b3 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -779,13 +779,16 @@ void ovl_inode_init(struct inode *inode, struct ovl_inode_params *oip,
 		    unsigned long ino, int fsid)
 {
 	struct inode *realinode;
+	struct ovl_inode *oi = OVL_I(inode);
 
 	if (oip->upperdentry)
-		OVL_I(inode)->__upperdentry = oip->upperdentry;
-	if (oip->lowerpath && oip->lowerpath->dentry)
-		OVL_I(inode)->lower = igrab(d_inode(oip->lowerpath->dentry));
+		oi->__upperdentry = oip->upperdentry;
+	if (oip->lowerpath && oip->lowerpath->dentry) {
+		oi->lowerpath.dentry = dget(oip->lowerpath->dentry);
+		oi->lowerpath.layer = oip->lowerpath->layer;
+	}
 	if (oip->lowerdata)
-		OVL_I(inode)->lowerdata = igrab(d_inode(oip->lowerdata));
+		oi->lowerdata = igrab(d_inode(oip->lowerdata));
 
 	realinode = ovl_inode_real(inode);
 	ovl_copyattr(realinode, inode);
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 470c8b1912fe..c3f7e5602699 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -359,18 +359,24 @@ bool ovl_dentry_remote(struct dentry *dentry);
 void ovl_dentry_update_reval(struct dentry *dentry, struct dentry *upperdentry,
 			     unsigned int mask);
 bool ovl_dentry_weird(struct dentry *dentry);
+enum ovl_path_type ovl_i_path_type(struct inode *inode, bool is_dir,
+				   int numlower);
 enum ovl_path_type ovl_path_type(struct dentry *dentry);
 void ovl_path_upper(struct dentry *dentry, struct path *path);
 void ovl_path_lower(struct dentry *dentry, struct path *path);
 void ovl_path_lowerdata(struct dentry *dentry, struct path *path);
+enum ovl_path_type ovl_i_path_real(struct inode *inode, struct path *path);
 enum ovl_path_type ovl_path_real(struct dentry *dentry, struct path *path);
 enum ovl_path_type ovl_path_realdata(struct dentry *dentry, struct path *path);
 struct dentry *ovl_dentry_upper(struct dentry *dentry);
 struct dentry *ovl_dentry_lower(struct dentry *dentry);
 struct dentry *ovl_dentry_lowerdata(struct dentry *dentry);
+const struct ovl_layer *ovl_i_layer_lower(struct inode *inode);
 const struct ovl_layer *ovl_layer_lower(struct dentry *dentry);
 struct dentry *ovl_dentry_real(struct dentry *dentry);
+struct dentry *ovl_i_dentry_real(struct inode *inode);
 struct dentry *ovl_i_dentry_upper(struct inode *inode);
+struct dentry *ovl_i_dentry_lower(struct inode *inode);
 struct inode *ovl_inode_upper(struct inode *inode);
 struct inode *ovl_inode_lower(struct inode *inode);
 struct inode *ovl_inode_lowerdata(struct inode *inode);
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index cf6aebcf893c..898b002a5c6f 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -134,7 +134,7 @@ struct ovl_inode {
 	unsigned long flags;
 	struct inode vfs_inode;
 	struct dentry *__upperdentry;
-	struct inode *lower;
+	struct ovl_path lowerpath;
 
 	/* synchronize copy up and more */
 	struct mutex lock;
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 1ed230c7baf1..9a656a24f7b1 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -184,7 +184,8 @@ static struct inode *ovl_alloc_inode(struct super_block *sb)
 	oi->version = 0;
 	oi->flags = 0;
 	oi->__upperdentry = NULL;
-	oi->lower = NULL;
+	oi->lowerpath.dentry = NULL;
+	oi->lowerpath.layer = NULL;
 	oi->lowerdata = NULL;
 	mutex_init(&oi->lock);
 
@@ -205,7 +206,7 @@ static void ovl_destroy_inode(struct inode *inode)
 	struct ovl_inode *oi = OVL_I(inode);
 
 	dput(oi->__upperdentry);
-	iput(oi->lower);
+	dput(oi->lowerpath.dentry);
 	if (S_ISDIR(inode->i_mode))
 		ovl_dir_cache_free(inode);
 	else
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 3065393d143e..7dd9901c9d17 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -125,31 +125,37 @@ bool ovl_dentry_weird(struct dentry *dentry)
 				  DCACHE_OP_COMPARE);
 }
 
-enum ovl_path_type ovl_path_type(struct dentry *dentry)
+enum ovl_path_type ovl_i_path_type(struct inode *inode, bool is_dir,
+				   int numlower)
 {
-	struct ovl_entry *oe = dentry->d_fsdata;
 	enum ovl_path_type type = 0;
 
-	if (ovl_dentry_upper(dentry)) {
+	if (ovl_i_dentry_upper(inode)) {
 		type = __OVL_PATH_UPPER;
 
 		/*
 		 * Non-dir dentry can hold lower dentry of its copy up origin.
 		 */
-		if (oe->numlower) {
-			if (ovl_test_flag(OVL_CONST_INO, d_inode(dentry)))
+		if (numlower) {
+			if (ovl_test_flag(OVL_CONST_INO, inode))
 				type |= __OVL_PATH_ORIGIN;
-			if (d_is_dir(dentry) ||
-			    !ovl_has_upperdata(d_inode(dentry)))
+			if (is_dir || !ovl_has_upperdata(inode))
 				type |= __OVL_PATH_MERGE;
 		}
 	} else {
-		if (oe->numlower > 1)
+		if (numlower > 1)
 			type |= __OVL_PATH_MERGE;
 	}
 	return type;
 }
 
+enum ovl_path_type ovl_path_type(struct dentry *dentry)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
+
+	return ovl_i_path_type(d_inode(dentry), d_is_dir(dentry), oe->numlower);
+}
+
 void ovl_path_upper(struct dentry *dentry, struct path *path)
 {
 	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
@@ -250,6 +256,41 @@ struct dentry *ovl_i_dentry_upper(struct inode *inode)
 	return ovl_upperdentry_dereference(OVL_I(inode));
 }
 
+struct dentry *ovl_i_dentry_lower(struct inode *inode)
+{
+	return OVL_I(inode)->lowerpath.dentry;
+}
+
+struct dentry *ovl_i_dentry_real(struct inode *inode)
+{
+	return ovl_i_dentry_upper(inode) ?: ovl_i_dentry_lower(inode);
+}
+
+const struct ovl_layer *ovl_i_layer_lower(struct inode *inode)
+{
+	return OVL_I(inode)->lowerpath.layer;
+}
+
+enum ovl_path_type ovl_i_path_real(struct inode *inode, struct path *path)
+{
+	struct dentry *lowerdentry = ovl_i_dentry_lower(inode);
+	/* Will not set the __OVL_PATH_MERGE bit for merge lowers dir */
+	enum ovl_path_type type = ovl_i_path_type(inode, S_ISDIR(inode->i_mode),
+						  !!lowerdentry);
+
+	if (OVL_TYPE_UPPER(type)) {
+		path->dentry = ovl_i_dentry_upper(inode);
+		path->mnt = ovl_upper_mnt(OVL_FS(inode->i_sb));
+	} else if (lowerdentry) {
+		path->dentry = lowerdentry;
+		path->mnt = ovl_i_layer_lower(inode)->mnt;
+	} else {
+		*path = (struct path) { };
+	}
+
+	return type;
+}
+
 struct inode *ovl_inode_upper(struct inode *inode)
 {
 	struct dentry *upperdentry = ovl_i_dentry_upper(inode);
@@ -259,7 +300,9 @@ struct inode *ovl_inode_upper(struct inode *inode)
 
 struct inode *ovl_inode_lower(struct inode *inode)
 {
-	return OVL_I(inode)->lower;
+	struct dentry *lowerdentry = ovl_i_dentry_lower(inode);
+
+	return lowerdentry ? d_inode(lowerdentry) : NULL;
 }
 
 struct inode *ovl_inode_real(struct inode *inode)
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 14/18] ovl: use ovl_copy_{real,upper}attr() wrappers
  2022-03-29 10:35 [PATCH 00/18] overlay: support idmapped layers Christian Brauner
                   ` (12 preceding siblings ...)
  2022-03-29 10:35 ` [PATCH 13/18] ovl: store lower path in ovl_inode Christian Brauner
@ 2022-03-29 10:35 ` Christian Brauner
  2022-03-29 10:35 ` [PATCH 15/18] ovl: handle idmappings in ovl_permission() Christian Brauner
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christian Brauner @ 2022-03-29 10:35 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi
  Cc: Christian Brauner, Christoph Hellwig, linux-unionfs,
	Aleksa Sarai, Giuseppe Scrivano, Rodrigo Campos Catelin,
	Seth Forshee, Luca Bocassi, Lennart Poettering,
	Stéphane Graber

When copying inode attributes we from the upper or lower layer to ovl
inodes we need to take the upper or lower layer's mount's idmapping into
account. In a lot of places we call ovl_copyattr() only on upper inodes
and in some we call it on either upper or lower inodes. Split this into
two separate helpers.

The first one should only be called on upper
inodes and is thus called ovl_copy_upperattr(). The second one can be
called on upper or lower inodes. We add ovl_copy_realattr() for this
task. The new helper makes use of the previously added ovl_i_path_real()
helper. This is needed to support idmapped base layers with overlay.

When overlay copies the inode information from an upper or lower layer
to the relevant overlay inode it will apply the idmapping of the upper
or lower layer when doing so. The ovl inode ownership will thus always
correctly reflect the ownership of the idmapped upper or lower layer.

All idmapping helpers are nops when no idmapped base layers are used.

Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
 fs/overlayfs/dir.c       |  6 +++---
 fs/overlayfs/file.c      | 15 +++++++--------
 fs/overlayfs/inode.c     |  8 ++++----
 fs/overlayfs/overlayfs.h | 22 +++++++++++++---------
 fs/overlayfs/util.c      | 27 ++++++++++++++++++++++++++-
 5 files changed, 53 insertions(+), 25 deletions(-)

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index cacb34c0094d..bc0525cbcf82 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -931,7 +931,7 @@ static int ovl_do_remove(struct dentry *dentry, bool is_dir)
 	 */
 	upperdentry = ovl_dentry_upper(dentry);
 	if (upperdentry)
-		ovl_copyattr(d_inode(upperdentry), d_inode(dentry));
+		ovl_copy_upperattr(d_inode(upperdentry), d_inode(dentry));
 
 out_drop_write:
 	ovl_drop_write(dentry);
@@ -1279,9 +1279,9 @@ static int ovl_rename(struct user_namespace *mnt_userns, struct inode *olddir,
 			 (d_inode(new) && ovl_type_origin(new)));
 
 	/* copy ctime: */
-	ovl_copyattr(d_inode(olddentry), d_inode(old));
+	ovl_copy_upperattr(d_inode(olddentry), d_inode(old));
 	if (d_inode(new) && ovl_dentry_upper(new))
-		ovl_copyattr(d_inode(newdentry), d_inode(new));
+		ovl_copy_upperattr(d_inode(newdentry), d_inode(new));
 
 out_dput:
 	dput(newdentry);
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 9250e04e97af..656c30bf20a6 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -277,7 +277,7 @@ static void ovl_aio_cleanup_handler(struct ovl_aio_req *aio_req)
 		__sb_writers_acquired(file_inode(iocb->ki_filp)->i_sb,
 				      SB_FREEZE_WRITE);
 		file_end_write(iocb->ki_filp);
-		ovl_copyattr(ovl_inode_real(inode), inode);
+		ovl_copy_realattr(inode);
 	}
 
 	orig_iocb->ki_pos = iocb->ki_pos;
@@ -360,7 +360,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 
 	inode_lock(inode);
 	/* Update mode */
-	ovl_copyattr(ovl_inode_real(inode), inode);
+	ovl_copy_realattr(inode);
 	ret = file_remove_privs(file);
 	if (ret)
 		goto out_unlock;
@@ -385,7 +385,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 				     ovl_iocb_to_rwf(ifl));
 		file_end_write(real.file);
 		/* Update size */
-		ovl_copyattr(ovl_inode_real(inode), inode);
+		ovl_copy_realattr(inode);
 	} else {
 		struct ovl_aio_req *aio_req;
 
@@ -435,12 +435,11 @@ static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
 	struct fd real;
 	const struct cred *old_cred;
 	struct inode *inode = file_inode(out);
-	struct inode *realinode = ovl_inode_real(inode);
 	ssize_t ret;
 
 	inode_lock(inode);
 	/* Update mode */
-	ovl_copyattr(realinode, inode);
+	ovl_copy_realattr(inode);
 	ret = file_remove_privs(out);
 	if (ret)
 		goto out_unlock;
@@ -456,7 +455,7 @@ static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
 
 	file_end_write(real.file);
 	/* Update size */
-	ovl_copyattr(realinode, inode);
+	ovl_copy_realattr(inode);
 	revert_creds(old_cred);
 	fdput(real);
 
@@ -530,7 +529,7 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
 	revert_creds(old_cred);
 
 	/* Update size */
-	ovl_copyattr(ovl_inode_real(inode), inode);
+	ovl_copy_realattr(inode);
 
 	fdput(real);
 
@@ -602,7 +601,7 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
 	revert_creds(old_cred);
 
 	/* Update size */
-	ovl_copyattr(ovl_inode_real(inode_out), inode_out);
+	ovl_copy_realattr(inode_out);
 
 	fdput(real_in);
 	fdput(real_out);
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index e28b7ed755b3..44fa578267fa 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -81,7 +81,7 @@ int ovl_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
 		err = ovl_do_notify_change(ofs, upperdentry, attr);
 		revert_creds(old_cred);
 		if (!err)
-			ovl_copyattr(upperdentry->d_inode, dentry->d_inode);
+			ovl_copy_upperattr(upperdentry->d_inode, dentry->d_inode);
 		inode_unlock(upperdentry->d_inode);
 
 		if (winode)
@@ -379,7 +379,7 @@ int ovl_xattr_set(struct dentry *dentry, struct inode *inode, const char *name,
 	revert_creds(old_cred);
 
 	/* copy c/mtime */
-	ovl_copyattr(d_inode(realdentry), inode);
+	ovl_copy_upperattr(d_inode(realdentry), inode);
 
 out_drop_write:
 	ovl_drop_write(dentry);
@@ -581,7 +581,7 @@ int ovl_fileattr_set(struct user_namespace *mnt_userns,
 		inode_set_flags(inode, flags, OVL_COPY_I_FLAGS_MASK);
 
 		/* Update ctime */
-		ovl_copyattr(ovl_inode_real(inode), inode);
+		ovl_copy_realattr(inode);
 	}
 	ovl_drop_write(dentry);
 out:
@@ -791,7 +791,7 @@ void ovl_inode_init(struct inode *inode, struct ovl_inode_params *oip,
 		oi->lowerdata = igrab(d_inode(oip->lowerdata));
 
 	realinode = ovl_inode_real(inode);
-	ovl_copyattr(realinode, inode);
+	ovl_copy_realattr(inode);
 	ovl_copyflags(realinode, inode);
 	ovl_map_ino(inode, ino, fsid);
 }
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index c3f7e5602699..1715cbfae2a0 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -615,15 +615,19 @@ bool ovl_lookup_trap_inode(struct super_block *sb, struct dentry *dir);
 struct inode *ovl_get_trap_inode(struct super_block *sb, struct dentry *dir);
 struct inode *ovl_get_inode(struct super_block *sb,
 			    struct ovl_inode_params *oip);
-static inline void ovl_copyattr(struct inode *from, struct inode *to)
-{
-	to->i_uid = from->i_uid;
-	to->i_gid = from->i_gid;
-	to->i_mode = from->i_mode;
-	to->i_atime = from->i_atime;
-	to->i_mtime = from->i_mtime;
-	to->i_ctime = from->i_ctime;
-	i_size_write(to, i_size_read(from));
+void ovl_do_copyattr(struct vfsmount *realmnt, struct inode *realinode,
+		     struct inode *inode);
+static inline void ovl_copy_upperattr(struct inode *upperinode, struct inode *to)
+{
+	ovl_do_copyattr(ovl_upper_mnt(OVL_FS(to->i_sb)), upperinode, to);
+}
+
+static inline void ovl_copy_realattr(struct inode *to)
+{
+	struct path realpath;
+
+	ovl_i_path_real(to, &realpath);
+	ovl_do_copyattr(realpath.mnt, d_inode(realpath.dentry), to);
 }
 
 /* vfs inode flags copied from real to ovl inode */
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 7dd9901c9d17..79fae06ee10a 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -500,7 +500,7 @@ static void ovl_dir_version_inc(struct dentry *dentry, bool impurity)
 void ovl_dir_modified(struct dentry *dentry, bool impurity)
 {
 	/* Copy mtime/ctime */
-	ovl_copyattr(d_inode(ovl_dentry_upper(dentry)), d_inode(dentry));
+	ovl_copy_upperattr(d_inode(ovl_dentry_upper(dentry)), d_inode(dentry));
 
 	ovl_dir_version_inc(dentry, impurity);
 }
@@ -1116,3 +1116,28 @@ int ovl_sync_status(struct ovl_fs *ofs)
 
 	return errseq_check(&mnt->mnt_sb->s_wb_err, ofs->errseq);
 }
+
+/*
+ * ovl_do_copyattr() - copy inode attributes from layer to ovl inode
+ *
+ * When overlay copies inode information from an upper or lower layer to the
+ * relevant overlay inode it will apply the idmapping of the upper or lower
+ * layer when doing so ensuring that the ovl inode ownership will correctly
+ * reflect the ownership of the idmapped upper or lower layer. For example, an
+ * idmapped upper or lower layer mapping id 1001 to id 1000 will take care to
+ * map any lower or upper inode owned by id 1001 to id 1000. These mapping
+ * helpers are nops when the relevant layer isn't idmapped.
+ */
+void ovl_do_copyattr(struct vfsmount *realmnt, struct inode *realinode,
+		     struct inode *inode)
+{
+	struct user_namespace *real_idmap = mnt_user_ns(realmnt);
+
+	inode->i_uid = i_uid_into_mnt(real_idmap, realinode);
+	inode->i_gid = i_gid_into_mnt(real_idmap, realinode);
+	inode->i_mode = realinode->i_mode;
+	inode->i_atime = realinode->i_atime;
+	inode->i_mtime = realinode->i_mtime;
+	inode->i_ctime = realinode->i_ctime;
+	i_size_write(inode, i_size_read(realinode));
+}
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 15/18] ovl: handle idmappings in ovl_permission()
  2022-03-29 10:35 [PATCH 00/18] overlay: support idmapped layers Christian Brauner
                   ` (13 preceding siblings ...)
  2022-03-29 10:35 ` [PATCH 14/18] ovl: use ovl_copy_{real,upper}attr() wrappers Christian Brauner
@ 2022-03-29 10:35 ` Christian Brauner
  2022-03-29 10:35 ` [PATCH 16/18] ovl: handle idmappings in layer open helpers Christian Brauner
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christian Brauner @ 2022-03-29 10:35 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi
  Cc: Christian Brauner, Christoph Hellwig, linux-unionfs,
	Aleksa Sarai, Giuseppe Scrivano, Rodrigo Campos Catelin,
	Seth Forshee, Luca Bocassi, Lennart Poettering,
	Stéphane Graber

Use the previously introduced ovl_i_path_real() helper to retrieve the
relevant upper or lower path and take the mount's idmapping into account
for the lower layer permission check. This is needed to support idmapped
base layers with overlay.

Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
 fs/overlayfs/inode.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 44fa578267fa..0b09e62091da 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -280,12 +280,14 @@ int ovl_permission(struct user_namespace *mnt_userns,
 		   struct inode *inode, int mask)
 {
 	struct inode *upperinode = ovl_inode_upper(inode);
-	struct inode *realinode = upperinode ?: ovl_inode_lower(inode);
+	struct inode *realinode;
+	struct path realpath;
 	const struct cred *old_cred;
 	int err;
 
 	/* Careful in RCU walk mode */
-	if (!realinode) {
+	ovl_i_path_real(inode, &realpath);
+	if (!realpath.dentry) {
 		WARN_ON(!(mask & MAY_NOT_BLOCK));
 		return -ECHILD;
 	}
@@ -298,6 +300,7 @@ int ovl_permission(struct user_namespace *mnt_userns,
 	if (err)
 		return err;
 
+	realinode = d_inode(realpath.dentry);
 	old_cred = ovl_override_creds(inode->i_sb);
 	if (!upperinode &&
 	    !special_file(realinode->i_mode) && mask & MAY_WRITE) {
@@ -305,7 +308,7 @@ int ovl_permission(struct user_namespace *mnt_userns,
 		/* Make sure mounter can read file for copy up later */
 		mask |= MAY_READ;
 	}
-	err = inode_permission(&init_user_ns, realinode, mask);
+	err = inode_permission(mnt_user_ns(realpath.mnt), realinode, mask);
 	revert_creds(old_cred);
 
 	return err;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 16/18] ovl: handle idmappings in layer open helpers
  2022-03-29 10:35 [PATCH 00/18] overlay: support idmapped layers Christian Brauner
                   ` (14 preceding siblings ...)
  2022-03-29 10:35 ` [PATCH 15/18] ovl: handle idmappings in ovl_permission() Christian Brauner
@ 2022-03-29 10:35 ` Christian Brauner
  2022-03-29 10:35 ` [PATCH 17/18] ovl: handle idmappings in ovl_xattr_{g,s}et() Christian Brauner
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christian Brauner @ 2022-03-29 10:35 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi
  Cc: Christian Brauner, Christoph Hellwig, linux-unionfs,
	Aleksa Sarai, Giuseppe Scrivano, Rodrigo Campos Catelin,
	Seth Forshee, Luca Bocassi, Lennart Poettering,
	Stéphane Graber

In earlier patches we already passed down the relevant upper or lower
path to ovl_open_realfile(). Now let the open helpers actually take the
idmapping of the relevant mount into account when checking permissions.
This is needed to support idmapped base layers with overlay.

Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
 fs/overlayfs/file.c | 7 +++++--
 fs/overlayfs/util.c | 5 +++--
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 656c30bf20a6..7dd44f4e2757 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -42,6 +42,7 @@ static struct file *ovl_open_realfile(const struct file *file,
 {
 	struct inode *realinode = d_inode(realpath->dentry);
 	struct inode *inode = file_inode(file);
+	struct user_namespace *real_idmap;
 	struct file *realfile;
 	const struct cred *old_cred;
 	int flags = file->f_flags | OVL_OPEN_FLAGS;
@@ -51,12 +52,14 @@ static struct file *ovl_open_realfile(const struct file *file,
 	if (flags & O_APPEND)
 		acc_mode |= MAY_APPEND;
 
+
 	old_cred = ovl_override_creds(inode->i_sb);
-	err = inode_permission(&init_user_ns, realinode, MAY_OPEN | acc_mode);
+	real_idmap = mnt_user_ns(realpath->mnt);
+	err = inode_permission(real_idmap, realinode, MAY_OPEN | acc_mode);
 	if (err) {
 		realfile = ERR_PTR(err);
 	} else {
-		if (!inode_owner_or_capable(&init_user_ns, realinode))
+		if (!inode_owner_or_capable(real_idmap, realinode))
 			flags &= ~O_NOATIME;
 
 		realfile = open_with_fake_path(&file->f_path, flags, realinode,
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 79fae06ee10a..7dd2e5e6662a 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -523,6 +523,7 @@ bool ovl_is_whiteout(struct dentry *dentry)
 struct file *ovl_path_open(struct path *path, int flags)
 {
 	struct inode *inode = d_inode(path->dentry);
+	struct user_namespace *real_idmap = mnt_user_ns(path->mnt);
 	int err, acc_mode;
 
 	if (flags & ~(O_ACCMODE | O_LARGEFILE))
@@ -539,12 +540,12 @@ struct file *ovl_path_open(struct path *path, int flags)
 		BUG();
 	}
 
-	err = inode_permission(&init_user_ns, inode, acc_mode | MAY_OPEN);
+	err = inode_permission(real_idmap, inode, acc_mode | MAY_OPEN);
 	if (err)
 		return ERR_PTR(err);
 
 	/* O_NOATIME is an optimization, don't fail if not permitted */
-	if (inode_owner_or_capable(&init_user_ns, inode))
+	if (inode_owner_or_capable(real_idmap, inode))
 		flags |= O_NOATIME;
 
 	return dentry_open(path, flags, current_cred());
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 17/18] ovl: handle idmappings in ovl_xattr_{g,s}et()
  2022-03-29 10:35 [PATCH 00/18] overlay: support idmapped layers Christian Brauner
                   ` (15 preceding siblings ...)
  2022-03-29 10:35 ` [PATCH 16/18] ovl: handle idmappings in layer open helpers Christian Brauner
@ 2022-03-29 10:35 ` Christian Brauner
  2022-03-29 10:35 ` [PATCH 18/18] ovl: support idmapped layers Christian Brauner
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christian Brauner @ 2022-03-29 10:35 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi
  Cc: Christian Brauner, Christoph Hellwig, linux-unionfs,
	Aleksa Sarai, Giuseppe Scrivano, Rodrigo Campos Catelin,
	Seth Forshee, Luca Bocassi, Lennart Poettering,
	Stéphane Graber

When retrieving xattrs from the upper or lower layers take the relevant
mount's idmapping into account. We rely on the previously introduced
ovl_i_path_real() helper to retrieve the relevant path. This is needed
to support idmapped base layers with overlay.

Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
 fs/overlayfs/inode.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 0b09e62091da..a3fcb61844ff 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -349,6 +349,7 @@ int ovl_xattr_set(struct dentry *dentry, struct inode *inode, const char *name,
 	struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
 	struct dentry *upperdentry = ovl_i_dentry_upper(inode);
 	struct dentry *realdentry = upperdentry ?: ovl_dentry_lower(dentry);
+	struct path realpath;
 	const struct cred *old_cred;
 
 	err = ovl_want_write(dentry);
@@ -356,8 +357,9 @@ int ovl_xattr_set(struct dentry *dentry, struct inode *inode, const char *name,
 		goto out;
 
 	if (!value && !upperdentry) {
+		ovl_path_lower(dentry, &realpath);
 		old_cred = ovl_override_creds(dentry->d_sb);
-		err = vfs_getxattr(&init_user_ns, realdentry, name, NULL, 0);
+		err = vfs_getxattr(mnt_user_ns(realpath.mnt), realdentry, name, NULL, 0);
 		revert_creds(old_cred);
 		if (err < 0)
 			goto out_drop_write;
@@ -395,11 +397,11 @@ int ovl_xattr_get(struct dentry *dentry, struct inode *inode, const char *name,
 {
 	ssize_t res;
 	const struct cred *old_cred;
-	struct dentry *realdentry =
-		ovl_i_dentry_upper(inode) ?: ovl_dentry_lower(dentry);
+	struct path realpath;
 
+	ovl_i_path_real(inode, &realpath);
 	old_cred = ovl_override_creds(dentry->d_sb);
-	res = vfs_getxattr(&init_user_ns, realdentry, name, value, size);
+	res = vfs_getxattr(mnt_user_ns(realpath.mnt), realpath.dentry, name, value, size);
 	revert_creds(old_cred);
 	return res;
 }
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 18/18] ovl: support idmapped layers
  2022-03-29 10:35 [PATCH 00/18] overlay: support idmapped layers Christian Brauner
                   ` (16 preceding siblings ...)
  2022-03-29 10:35 ` [PATCH 17/18] ovl: handle idmappings in ovl_xattr_{g,s}et() Christian Brauner
@ 2022-03-29 10:35 ` Christian Brauner
  2022-03-29 10:35 ` [PATCH] common: allow to run all tests on idmapped mounts Christian Brauner
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christian Brauner @ 2022-03-29 10:35 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi
  Cc: Christian Brauner, Christoph Hellwig, linux-unionfs,
	Aleksa Sarai, Giuseppe Scrivano, Rodrigo Campos Catelin,
	Seth Forshee, Luca Bocassi, Lennart Poettering,
	Stéphane Graber

Now that overlay is able to take a layers idmapping into account allow
overlay mounts to be created on top of idmapped mounts. Since NFS
doesn't support idmapped mounts we don't allow idmapped base layers in
combination with the nfs_export=on mount option.

Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
 fs/overlayfs/super.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 9a656a24f7b1..d4cc07f7a2ef 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -874,10 +874,6 @@ static int ovl_mount_dir_noesc(const char *name, struct path *path)
 		pr_err("filesystem on '%s' not supported\n", name);
 		goto out_put;
 	}
-	if (is_idmapped_mnt(path->mnt)) {
-		pr_err("idmapped layers are currently not supported\n");
-		goto out_put;
-	}
 	if (!d_is_dir(path->dentry)) {
 		pr_err("'%s' not a directory\n", name);
 		goto out_put;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH] common: allow to run all tests on idmapped mounts
  2022-03-29 10:35 [PATCH 00/18] overlay: support idmapped layers Christian Brauner
                   ` (17 preceding siblings ...)
  2022-03-29 10:35 ` [PATCH 18/18] ovl: support idmapped layers Christian Brauner
@ 2022-03-29 10:35 ` Christian Brauner
  2022-03-29 12:25 ` [PATCH 00/18] overlay: support idmapped layers Miklos Szeredi
  2022-03-30 20:58 ` Vivek Goyal
  20 siblings, 0 replies; 29+ messages in thread
From: Christian Brauner @ 2022-03-29 10:35 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi, Eryu Guan, fstests
  Cc: Christian Brauner, Christoph Hellwig, Aleksa Sarai,
	Giuseppe Scrivano, Rodrigo Campos Catelin, Seth Forshee,
	Luca Bocassi, Lennart Poettering, Stéphane Graber,
	Eryu Guan

In addition to the generic and filesystem-specific idmapped mount
testsuites that already exist upstream today add simple infrastructure
so any test can be run on idmapped mounts simply by setting
IDMAPPED_MOUNTS=true in the config file or section. The main user for
now will be overlay to verify it works correctly on idmapped mounts.

Cc: Eryu Guan <guaneryu@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <fstests@vger.kernel.org>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
 common/config  |  1 +
 common/overlay |  2 ++
 common/rc      | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 53 insertions(+)

diff --git a/common/config b/common/config
index 479e50d1..1033b890 100644
--- a/common/config
+++ b/common/config
@@ -647,6 +647,7 @@ _overlay_config_override()
 	# Set fsck options, use default if user not set directly.
 	export FSCK_OPTIONS="$OVERLAY_FSCK_OPTIONS"
 	[ -z "$FSCK_OPTIONS" ] && _fsck_opts
+	export IDMAPPED_MOUNTS="$IDMAPPED_MOUNTS"
 }
 
 _overlay_config_restore()
diff --git a/common/overlay b/common/overlay
index 1ca37e29..fff67ba1 100644
--- a/common/overlay
+++ b/common/overlay
@@ -73,6 +73,7 @@ _overlay_base_mount()
 
 	if [ -z "$dev" -o -z "$mnt" ] || \
 		_check_mounted_on $devname $dev $mntname $mnt; then
+		_idmapped_mount $dev $mnt
 		# no base fs or already mounted
 		return 0
 	elif [ $? -ne 1 ]; then
@@ -81,6 +82,7 @@ _overlay_base_mount()
 	fi
 
 	_mount $* $dev $mnt
+	_idmapped_mount $dev $mnt
 }
 
 _overlay_base_test_mount()
diff --git a/common/rc b/common/rc
index faf54ef9..5090cbf8 100644
--- a/common/rc
+++ b/common/rc
@@ -334,6 +334,7 @@ _try_scratch_mount()
 		return $?
 	fi
 	_mount -t $FSTYP `_scratch_mount_options $*`
+	_idmapped_mount $SCRATCH_DEV $SCRATCH_MNT
 }
 
 # mount scratch device with given options and _fail if mount fails
@@ -444,6 +445,53 @@ _scratch_shutdown_handle()
 	fi
 }
 
+_move_mount()
+{
+	local mnt=$1
+	local tmp=$2
+
+	# Replace $mnt with $tmp. Use a temporary bind-mount because
+	# mount --move will fail with certain mount propagation layouts.
+	$UMOUNT_PROG $mnt || _fail "Failed to unmount $mnt"
+	_mount --bind $tmp $mnt || _fail "Failed to bind-mount $tmp to $mnt"
+	$UMOUNT_PROG $tmp || _fail "Failed to unmount $tmp"
+	rmdir $tmp
+}
+
+_idmapped_mount()
+{
+	[ "$IDMAPPED_MOUNTS" = "true" ] || return 0
+
+	local dev=$1
+	local mnt=$2
+	local status=0
+	local tmp=`mktemp -d`
+
+	local mount_rec=`findmnt -rncv -S $dev -o OPTIONS`
+	if [[ "$mount_rec" == *"idmapped"* ]]; then
+		return 0
+	fi
+
+	# We create an idmapped mount where {g,u}id 0 writes to disk as
+	# {g,u}id 10000000 and $(id -u fsgqa) + 10000000. We change ownership
+        # of $mnt so {g,u} id 0 can actually create objects in there.
+	chown 10000000:10000000 $mnt || return 1
+	$here/src/idmapped-mounts/mount-idmapped \
+		--map-mount b:10000000:0:100000000000 \
+		$mnt $tmp
+	if [ $? -ne 0 ]; then
+		rmdir $tmp
+		return 1
+	fi
+
+	# The next call ensures we don't end up stacking an idmapped mount on
+	# top of the original mount. Instead we fully replace the original
+	# mount with the idmapped mount. This will not just allow a clean mount
+        # layout it also makes unmount and remounting way simpler.
+	_move_mount $mnt $tmp
+	return $?
+}
+
 _test_mount()
 {
     if [ "$FSTYP" == "overlay" ]; then
@@ -452,6 +500,7 @@ _test_mount()
     fi
     _test_options mount
     _mount -t $FSTYP $TEST_OPTIONS $TEST_FS_MOUNT_OPTS $SELINUX_MOUNT_OPTIONS $* $TEST_DEV $TEST_DIR
+    _idmapped_mount $TEST_DEV $TEST_DIR
 }
 
 _test_unmount()
@@ -3007,6 +3056,7 @@ _mount_or_remount_rw()
 	if [ $USE_REMOUNT -eq 0 ]; then
 		if [ "$FSTYP" != "overlay" ]; then
 			_mount -t $FSTYP $mount_opts $device $mountpoint
+			_idmapped_mount $device $mountpoint
 		else
 			_overlay_mount $device $mountpoint
 		fi
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 03/18] ovl: use wrappers to all vfs_*xattr() calls
  2022-03-29 10:35 ` [PATCH 03/18] ovl: use wrappers to all vfs_*xattr() calls Christian Brauner
@ 2022-03-29 11:18   ` Miklos Szeredi
  0 siblings, 0 replies; 29+ messages in thread
From: Miklos Szeredi @ 2022-03-29 11:18 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Amir Goldstein, Miklos Szeredi, Christoph Hellwig, overlayfs,
	Aleksa Sarai, Giuseppe Scrivano, Rodrigo Campos Catelin,
	Seth Forshee, Luca Bocassi, Lennart Poettering,
	Stéphane Graber

On Tue, 29 Mar 2022 at 12:36, Christian Brauner <brauner@kernel.org> wrote:
>

> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index 2cd5741c873b..6a53ca0d2c96 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -183,10 +183,9 @@ static inline int ovl_do_symlink(struct inode *dir, struct dentry *dentry,
>  }
>
>  static inline ssize_t ovl_do_getxattr(struct ovl_fs *ofs, struct dentry *dentry,
> -                                     enum ovl_xattr ox, void *value,
> +                                     const char *name, void *value,
>                                       size_t size)
>  {
> -       const char *name = ovl_xattr(ofs, ox);
>         int err = vfs_getxattr(&init_user_ns, dentry, name, value, size);
>         int len = (value && err > 0) ? err : 0;

Previously direct calls to vfs_*xattr() didn't print debug info.  This
was deliberate as debugging normal operations would drown out the
interesting calls.  But perhaps it doesn't matter nowadays, since
overlayfs is stable and nobody enables debugging anymore...

Anyway, I feel that this should be a separate change, or at least
documented in the patch header.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 05/18] ovl: handle idmappings in creation operations
  2022-03-29 10:35 ` [PATCH 05/18] ovl: handle idmappings in " Christian Brauner
@ 2022-03-29 11:22   ` Miklos Szeredi
  0 siblings, 0 replies; 29+ messages in thread
From: Miklos Szeredi @ 2022-03-29 11:22 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Amir Goldstein, Miklos Szeredi, Christoph Hellwig, overlayfs,
	Aleksa Sarai, Giuseppe Scrivano, Rodrigo Campos Catelin,
	Seth Forshee, Luca Bocassi, Lennart Poettering,
	Stéphane Graber

On Tue, 29 Mar 2022 at 12:36, Christian Brauner <brauner@kernel.org> wrote:
>
> When creating objects in the upper layer we need to pass down the upper
> idmapping into the respective vfs helpers in order to support idmapped
> base layers. The vfs helpers will take care of the rest.
>
> Cc: <linux-unionfs@vger.kernel.org>
> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
> ---
>  fs/overlayfs/overlayfs.h | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index 8fae64722eda..27f79be097b1 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -125,7 +125,7 @@ static inline const char *ovl_xattr(struct ovl_fs *ofs, enum ovl_xattr ox)
>  static inline int ovl_do_rmdir(struct ovl_fs *ofs,
>                                struct inode *dir, struct dentry *dentry)
>  {
> -       int err = vfs_rmdir(&init_user_ns, dir, dentry);
> +       int err = vfs_rmdir(ovl_upper_idmap(ofs), dir, dentry);

ovl_upper_idmap() is not defined by this or earlier patches.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 08/18] ovl: use ovl_do_notify_change() wrapper
  2022-03-29 10:35 ` [PATCH 08/18] ovl: use ovl_do_notify_change() wrapper Christian Brauner
@ 2022-03-29 11:56   ` Miklos Szeredi
  2022-03-29 12:36     ` Christian Brauner
  0 siblings, 1 reply; 29+ messages in thread
From: Miklos Szeredi @ 2022-03-29 11:56 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Amir Goldstein, Miklos Szeredi, Christoph Hellwig, overlayfs,
	Aleksa Sarai, Giuseppe Scrivano, Rodrigo Campos Catelin,
	Seth Forshee, Luca Bocassi, Lennart Poettering,
	Stéphane Graber

On Tue, 29 Mar 2022 at 12:37, Christian Brauner <brauner@kernel.org> wrote:
>
> Introduce ovl_do_notify_change() as a simple wrapper around
> notify_change() to support idmapped layers. The helper mirrors other
> ovl_do_*() helpers that operate on the upper layers.
>
> When changing ownership of an upper object the intended ownership needs
> to be mapped according to the upper layer's idmapping. This mapping is
> the inverse to the mapping applied when copying inode information from
> an upper layer to the corresponding overlay inode. So e.g., when an
> upper mount maps files that are stored on-disk as owned by id 1001 to
> 1000 this means that calling stat on this object from an idmapped mount
> will report the file as being owned by id 1000. Consequently in order to
> change ownership of an object in this filesystem so it appears as being
> owned by id 1000 in the upper idmapped layer it needs to store id 1001
> on disk. The mnt mapping helpers take care of this.
>
> All idmapping helpers are nops when no idmapped base layers are used.
>
> Cc: <linux-unionfs@vger.kernel.org>
> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
> ---
>  fs/overlayfs/copy_up.c   |  8 ++++----
>  fs/overlayfs/dir.c       |  2 +-
>  fs/overlayfs/inode.c     |  3 ++-
>  fs/overlayfs/overlayfs.h | 25 +++++++++++++++++++++++++
>  fs/overlayfs/ovl_entry.h |  5 +++++
>  fs/overlayfs/super.c     |  2 +-
>  6 files changed, 38 insertions(+), 7 deletions(-)
>
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index 2c336acb2ba0..a5d68302693f 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -301,7 +301,7 @@ static int ovl_set_size(struct ovl_fs *ofs,
>                 .ia_size = stat->size,
>         };
>
> -       return notify_change(&init_user_ns, upperdentry, &attr, NULL);
> +       return ovl_do_notify_change(ofs, upperdentry, &attr);
>  }
>
>  static int ovl_set_timestamps(struct ovl_fs *ofs, struct dentry *upperdentry,
> @@ -314,7 +314,7 @@ static int ovl_set_timestamps(struct ovl_fs *ofs, struct dentry *upperdentry,
>                 .ia_mtime = stat->mtime,
>         };
>
> -       return notify_change(&init_user_ns, upperdentry, &attr, NULL);
> +       return ovl_do_notify_change(ofs, upperdentry, &attr);
>  }
>
>  int ovl_set_attr(struct ovl_fs *ofs, struct dentry *upperdentry,
> @@ -327,7 +327,7 @@ int ovl_set_attr(struct ovl_fs *ofs, struct dentry *upperdentry,
>                         .ia_valid = ATTR_MODE,
>                         .ia_mode = stat->mode,
>                 };
> -               err = notify_change(&init_user_ns, upperdentry, &attr, NULL);
> +               err = ovl_do_notify_change(ofs, upperdentry, &attr);
>         }
>         if (!err) {
>                 struct iattr attr = {
> @@ -335,7 +335,7 @@ int ovl_set_attr(struct ovl_fs *ofs, struct dentry *upperdentry,
>                         .ia_uid = stat->uid,
>                         .ia_gid = stat->gid,
>                 };
> -               err = notify_change(&init_user_ns, upperdentry, &attr, NULL);
> +               err = ovl_do_notify_change(ofs, upperdentry, &attr);
>         }
>         if (!err)
>                 ovl_set_timestamps(ofs, upperdentry, stat);
> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> index 27a40b6754f4..9ae0352ff52a 100644
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -516,7 +516,7 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
>                         .ia_mode = cattr->mode,
>                 };
>                 inode_lock(newdentry->d_inode);
> -               err = notify_change(&init_user_ns, newdentry, &attr, NULL);
> +               err = ovl_do_notify_change(ofs, newdentry, &attr);
>                 inode_unlock(newdentry->d_inode);
>                 if (err)
>                         goto out_cleanup;
> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> index c51a9dd36cc7..9a8e6b94d9e8 100644
> --- a/fs/overlayfs/inode.c
> +++ b/fs/overlayfs/inode.c
> @@ -21,6 +21,7 @@ int ovl_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
>                 struct iattr *attr)
>  {
>         int err;
> +       struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
>         bool full_copy_up = false;
>         struct dentry *upperdentry;
>         const struct cred *old_cred;
> @@ -77,7 +78,7 @@ int ovl_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
>
>                 inode_lock(upperdentry->d_inode);
>                 old_cred = ovl_override_creds(dentry->d_sb);
> -               err = notify_change(&init_user_ns, upperdentry, attr, NULL);
> +               err = ovl_do_notify_change(ofs, upperdentry, attr);
>                 revert_creds(old_cred);
>                 if (!err)
>                         ovl_copyattr(upperdentry->d_inode, dentry->d_inode);
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index 816a69b46b67..eff8a1edb693 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -122,6 +122,31 @@ static inline const char *ovl_xattr(struct ovl_fs *ofs, enum ovl_xattr ox)
>         return ovl_xattr_table[ox][ofs->config.userxattr];
>  }
>
> +/*
> + * When changing ownership of an upper object map the intended ownership
> + * according to the upper layer's idmapping. When an upper mount idmaps files
> + * that are stored on-disk as owned by id 1001 to id 1000 this means stat on
> + * this object will report it as being owned by id 1000 when calling via the
> + * upper mount. So in order to change ownership of an object so it reports id
> + * 1000 when calling stat on it through the upper mount the value written to
> + * disk must be 1001. The mount mapping helper will take of this. The mnt
> + * idmapping helpers are nops if the upper layer isn't idmapped.
> + */
> +static inline int ovl_do_notify_change(struct ovl_fs *ofs,
> +                                      struct dentry *upperdentry,
> +                                      struct iattr *attr)
> +{
> +       struct user_namespace *upper_idmap = ovl_upper_idmap(ofs);
> +       struct user_namespace *fs_idmap = i_user_ns(d_inode(upperdentry));
> +
> +       if (attr->ia_valid & ATTR_UID)
> +               attr->ia_uid = mapped_kuid_user(upper_idmap, fs_idmap, attr->ia_uid);
> +       if (attr->ia_valid & ATTR_GID)
> +               attr->ia_gid = mapped_kgid_user(upper_idmap, fs_idmap, attr->ia_gid);

I see a similar transformation in chown_common() but I can't say I
fully understand how this works...

> +
> +       return notify_change(ovl_upper_idmap(ofs), upperdentry, attr, NULL);

First argument can use upper_idmap, right?

Thanks,
Miklos

> +}
> +
>  static inline int ovl_do_rmdir(struct ovl_fs *ofs,
>                                struct inode *dir, struct dentry *dentry)
>  {
> diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
> index 63efee554f69..cf6aebcf893c 100644
> --- a/fs/overlayfs/ovl_entry.h
> +++ b/fs/overlayfs/ovl_entry.h
> @@ -90,6 +90,11 @@ static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)
>         return ofs->layers[0].mnt;
>  }
>
> +static inline struct user_namespace *ovl_upper_idmap(struct ovl_fs *ofs)
> +{
> +       return mnt_user_ns(ovl_upper_mnt(ofs));
> +}

Ah, here it is.

Can this function be introduced early in the series, but return &init_userns?

Then when everything in place the mnt userns support can be switched on.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 00/18] overlay: support idmapped layers
  2022-03-29 10:35 [PATCH 00/18] overlay: support idmapped layers Christian Brauner
                   ` (18 preceding siblings ...)
  2022-03-29 10:35 ` [PATCH] common: allow to run all tests on idmapped mounts Christian Brauner
@ 2022-03-29 12:25 ` Miklos Szeredi
  2022-03-29 15:02   ` Amir Goldstein
  2022-03-30 20:58 ` Vivek Goyal
  20 siblings, 1 reply; 29+ messages in thread
From: Miklos Szeredi @ 2022-03-29 12:25 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Amir Goldstein, Miklos Szeredi, overlayfs, Christoph Hellwig,
	Aleksa Sarai, Giuseppe Scrivano, Rodrigo Campos Catelin,
	Seth Forshee, Luca Bocassi, Lennart Poettering,
	Stéphane Graber

On Tue, 29 Mar 2022 at 12:35, Christian Brauner <brauner@kernel.org> wrote:
>
> From: "Christian Brauner (Microsoft)" <brauner@kernel.org>
>
> Hey,
>
> This adds support for mounting overlay on top of idmapped layers.

Looks good at first glance.

I'm wondering if there's a way to deduplicate the info stored in
ovl_entry and ovl_inode, but that can come later...

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 08/18] ovl: use ovl_do_notify_change() wrapper
  2022-03-29 11:56   ` Miklos Szeredi
@ 2022-03-29 12:36     ` Christian Brauner
  0 siblings, 0 replies; 29+ messages in thread
From: Christian Brauner @ 2022-03-29 12:36 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Amir Goldstein, Miklos Szeredi, Christoph Hellwig, overlayfs,
	Aleksa Sarai, Giuseppe Scrivano, Rodrigo Campos Catelin,
	Seth Forshee, Luca Bocassi, Lennart Poettering,
	Stéphane Graber

On Tue, Mar 29, 2022 at 01:56:06PM +0200, Miklos Szeredi wrote:
> On Tue, 29 Mar 2022 at 12:37, Christian Brauner <brauner@kernel.org> wrote:
> >
> > Introduce ovl_do_notify_change() as a simple wrapper around
> > notify_change() to support idmapped layers. The helper mirrors other
> > ovl_do_*() helpers that operate on the upper layers.
> >
> > When changing ownership of an upper object the intended ownership needs
> > to be mapped according to the upper layer's idmapping. This mapping is
> > the inverse to the mapping applied when copying inode information from
> > an upper layer to the corresponding overlay inode. So e.g., when an
> > upper mount maps files that are stored on-disk as owned by id 1001 to
> > 1000 this means that calling stat on this object from an idmapped mount
> > will report the file as being owned by id 1000. Consequently in order to
> > change ownership of an object in this filesystem so it appears as being
> > owned by id 1000 in the upper idmapped layer it needs to store id 1001
> > on disk. The mnt mapping helpers take care of this.
> >
> > All idmapping helpers are nops when no idmapped base layers are used.
> >
> > Cc: <linux-unionfs@vger.kernel.org>
> > Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
> > Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> > Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
> > ---
> >  fs/overlayfs/copy_up.c   |  8 ++++----
> >  fs/overlayfs/dir.c       |  2 +-
> >  fs/overlayfs/inode.c     |  3 ++-
> >  fs/overlayfs/overlayfs.h | 25 +++++++++++++++++++++++++
> >  fs/overlayfs/ovl_entry.h |  5 +++++
> >  fs/overlayfs/super.c     |  2 +-
> >  6 files changed, 38 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> > index 2c336acb2ba0..a5d68302693f 100644
> > --- a/fs/overlayfs/copy_up.c
> > +++ b/fs/overlayfs/copy_up.c
> > @@ -301,7 +301,7 @@ static int ovl_set_size(struct ovl_fs *ofs,
> >                 .ia_size = stat->size,
> >         };
> >
> > -       return notify_change(&init_user_ns, upperdentry, &attr, NULL);
> > +       return ovl_do_notify_change(ofs, upperdentry, &attr);
> >  }
> >
> >  static int ovl_set_timestamps(struct ovl_fs *ofs, struct dentry *upperdentry,
> > @@ -314,7 +314,7 @@ static int ovl_set_timestamps(struct ovl_fs *ofs, struct dentry *upperdentry,
> >                 .ia_mtime = stat->mtime,
> >         };
> >
> > -       return notify_change(&init_user_ns, upperdentry, &attr, NULL);
> > +       return ovl_do_notify_change(ofs, upperdentry, &attr);
> >  }
> >
> >  int ovl_set_attr(struct ovl_fs *ofs, struct dentry *upperdentry,
> > @@ -327,7 +327,7 @@ int ovl_set_attr(struct ovl_fs *ofs, struct dentry *upperdentry,
> >                         .ia_valid = ATTR_MODE,
> >                         .ia_mode = stat->mode,
> >                 };
> > -               err = notify_change(&init_user_ns, upperdentry, &attr, NULL);
> > +               err = ovl_do_notify_change(ofs, upperdentry, &attr);
> >         }
> >         if (!err) {
> >                 struct iattr attr = {
> > @@ -335,7 +335,7 @@ int ovl_set_attr(struct ovl_fs *ofs, struct dentry *upperdentry,
> >                         .ia_uid = stat->uid,
> >                         .ia_gid = stat->gid,
> >                 };
> > -               err = notify_change(&init_user_ns, upperdentry, &attr, NULL);
> > +               err = ovl_do_notify_change(ofs, upperdentry, &attr);
> >         }
> >         if (!err)
> >                 ovl_set_timestamps(ofs, upperdentry, stat);
> > diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> > index 27a40b6754f4..9ae0352ff52a 100644
> > --- a/fs/overlayfs/dir.c
> > +++ b/fs/overlayfs/dir.c
> > @@ -516,7 +516,7 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
> >                         .ia_mode = cattr->mode,
> >                 };
> >                 inode_lock(newdentry->d_inode);
> > -               err = notify_change(&init_user_ns, newdentry, &attr, NULL);
> > +               err = ovl_do_notify_change(ofs, newdentry, &attr);
> >                 inode_unlock(newdentry->d_inode);
> >                 if (err)
> >                         goto out_cleanup;
> > diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> > index c51a9dd36cc7..9a8e6b94d9e8 100644
> > --- a/fs/overlayfs/inode.c
> > +++ b/fs/overlayfs/inode.c
> > @@ -21,6 +21,7 @@ int ovl_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
> >                 struct iattr *attr)
> >  {
> >         int err;
> > +       struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
> >         bool full_copy_up = false;
> >         struct dentry *upperdentry;
> >         const struct cred *old_cred;
> > @@ -77,7 +78,7 @@ int ovl_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
> >
> >                 inode_lock(upperdentry->d_inode);
> >                 old_cred = ovl_override_creds(dentry->d_sb);
> > -               err = notify_change(&init_user_ns, upperdentry, attr, NULL);
> > +               err = ovl_do_notify_change(ofs, upperdentry, attr);
> >                 revert_creds(old_cred);
> >                 if (!err)
> >                         ovl_copyattr(upperdentry->d_inode, dentry->d_inode);
> > diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> > index 816a69b46b67..eff8a1edb693 100644
> > --- a/fs/overlayfs/overlayfs.h
> > +++ b/fs/overlayfs/overlayfs.h
> > @@ -122,6 +122,31 @@ static inline const char *ovl_xattr(struct ovl_fs *ofs, enum ovl_xattr ox)
> >         return ovl_xattr_table[ox][ofs->config.userxattr];
> >  }
> >
> > +/*
> > + * When changing ownership of an upper object map the intended ownership
> > + * according to the upper layer's idmapping. When an upper mount idmaps files
> > + * that are stored on-disk as owned by id 1001 to id 1000 this means stat on
> > + * this object will report it as being owned by id 1000 when calling via the
> > + * upper mount. So in order to change ownership of an object so it reports id
> > + * 1000 when calling stat on it through the upper mount the value written to
> > + * disk must be 1001. The mount mapping helper will take of this. The mnt
> > + * idmapping helpers are nops if the upper layer isn't idmapped.
> > + */
> > +static inline int ovl_do_notify_change(struct ovl_fs *ofs,
> > +                                      struct dentry *upperdentry,
> > +                                      struct iattr *attr)
> > +{
> > +       struct user_namespace *upper_idmap = ovl_upper_idmap(ofs);
> > +       struct user_namespace *fs_idmap = i_user_ns(d_inode(upperdentry));
> > +
> > +       if (attr->ia_valid & ATTR_UID)
> > +               attr->ia_uid = mapped_kuid_user(upper_idmap, fs_idmap, attr->ia_uid);
> > +       if (attr->ia_valid & ATTR_GID)
> > +               attr->ia_gid = mapped_kgid_user(upper_idmap, fs_idmap, attr->ia_gid);
> 
> I see a similar transformation in chown_common() but I can't say I
> fully understand how this works...

I'll take the systemd-homed example. systemd-homed will keep all files
owned by 65534 on-disk. Now when a user logs into their machine with id
1000 systemd-homed will create and idmapped mount where 65534 is mapped
to id 1000. That means when a user calls ls -al in that idmapped mount
they will see all files reported as being owned by id 1000 since
i_uid_into_mnt() will be called before reporting ownership to the user.
This maps the 65534 to 1000.
Now let's say someone with appropriate privileges will want change to
some file in that idmapped mount so that it is owned by id 1000 when
calling ls -al in that mount. In order to do that they need to write the
file to disk as 65534, ia_uid contains 65534. And that is what
mapped_kuid_user() is doing. This is extensively verified in xfstests in
generic/633 and a bunch of other places.
There's a lenghty explanation also in
Documentation/filesystems/idmappings.rst fwiw (Although I need to update
that since the functions have been renamed.).

> 
> > +
> > +       return notify_change(ovl_upper_idmap(ofs), upperdentry, attr, NULL);
> 
> First argument can use upper_idmap, right?

Yes, indeed.

> 
> Thanks,
> Miklos
> 
> > +}
> > +
> >  static inline int ovl_do_rmdir(struct ovl_fs *ofs,
> >                                struct inode *dir, struct dentry *dentry)
> >  {
> > diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
> > index 63efee554f69..cf6aebcf893c 100644
> > --- a/fs/overlayfs/ovl_entry.h
> > +++ b/fs/overlayfs/ovl_entry.h
> > @@ -90,6 +90,11 @@ static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)
> >         return ofs->layers[0].mnt;
> >  }
> >
> > +static inline struct user_namespace *ovl_upper_idmap(struct ovl_fs *ofs)
> > +{
> > +       return mnt_user_ns(ovl_upper_mnt(ofs));
> > +}
> 
> Ah, here it is.

Ah, sorry. That got messed up my final rebase. Sorry about that. I
usually do a git rebase -x make v5.17 but apparently I missed it after
the rebase.

> 
> Can this function be introduced early in the series, but return &init_userns?

Yes, of course. I had a series like this. Amir suggested to to it a
little later.

> 
> Then when everything in place the mnt userns support can be switched on.

Yes, will do.

Thanks,
Christian

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 00/18] overlay: support idmapped layers
  2022-03-29 12:25 ` [PATCH 00/18] overlay: support idmapped layers Miklos Szeredi
@ 2022-03-29 15:02   ` Amir Goldstein
  0 siblings, 0 replies; 29+ messages in thread
From: Amir Goldstein @ 2022-03-29 15:02 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Christian Brauner, Miklos Szeredi, overlayfs, Christoph Hellwig,
	Aleksa Sarai, Giuseppe Scrivano, Rodrigo Campos Catelin,
	Seth Forshee, Luca Bocassi, Lennart Poettering,
	Stéphane Graber

On Tue, Mar 29, 2022 at 3:25 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Tue, 29 Mar 2022 at 12:35, Christian Brauner <brauner@kernel.org> wrote:
> >
> > From: "Christian Brauner (Microsoft)" <brauner@kernel.org>
> >
> > Hey,
> >
> > This adds support for mounting overlay on top of idmapped layers.
>
> Looks good at first glance.
>
> I'm wondering if there's a way to deduplicate the info stored in
> ovl_entry and ovl_inode, but that can come later...
>

Yes at this point the deduplication of lowerstack elements became a bit odd,
but the final step is not obvious.

I don't think we want a variable size ovl_inode. That would be suboptimal and
make inode cache monitoring hard.

I suppose we could have lowerstack or 'stack' allocated and hanging
off ovl_inode.
At that point, dentry->d_fsdata can become just flat flags.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 00/18] overlay: support idmapped layers
  2022-03-29 10:35 [PATCH 00/18] overlay: support idmapped layers Christian Brauner
                   ` (19 preceding siblings ...)
  2022-03-29 12:25 ` [PATCH 00/18] overlay: support idmapped layers Miklos Szeredi
@ 2022-03-30 20:58 ` Vivek Goyal
  2022-03-31  8:47   ` Christian Brauner
  20 siblings, 1 reply; 29+ messages in thread
From: Vivek Goyal @ 2022-03-30 20:58 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, Christoph Hellwig,
	Aleksa Sarai, Giuseppe Scrivano, Rodrigo Campos Catelin,
	Seth Forshee, Luca Bocassi, Lennart Poettering,
	Stéphane Graber

On Tue, Mar 29, 2022 at 12:35:07PM +0200, Christian Brauner wrote:
> From: "Christian Brauner (Microsoft)" <brauner@kernel.org>
> 
> Hey,
> 
> This adds support for mounting overlay on top of idmapped layers.
> 
> I have to start by saying a massive thank you to Amir! He did not just
> answer my constant overlay questions but also provided quite a few
> patches himself in this series in addition to reviews, comments and a
> lot of suggestions. Thank you!
> 
> There have been a lot of requests to unblock this. For just a few select
> examples see [3], [4], and [5]. I've worked closely with various
> communities among them containerd, Kubernetes, Podman, LXD, runC, crun,
> and systemd (For the curious please see the various pull-request and
> issues below.) a lot of them already support idmapped mounts since they
> are enabled for btrfs, ext4, and xfs (and f2fs and fat fwiw). In
> additon, a few colleagues at Microsoft and from Red Hat work on a
> Kubernetes Enhancement Proposals (KEP) that also relies on overlayfs
> supporting idmapped layers, see [12].
> 
> Overlayfs on top of idmapped layers will be used in various ways:
> 
> * Container managers use overlayfs to efficiently share layers between
>   containers. However, this only works for privileged containers.
>   Layers cannot be shared if both privileged and unprivileged containers
>   are used. Layers can also not be shared if non-overlapping idmappings
>   are used for unprivileged containers. Layers cannot be shared because
>   of the conflicting ownership requirements between the containers.

Hi Christian,

Thank you for this work. This is awesome.

Wanted to test it. I was wondering how to test it. Some simple
instructions will help.

I am assuming I will need to be priviliged to be able to setup idmapped
mounts. And then overlay can be mounted on top of these layers (either
as a priviliged operation outside container or unprivileged overlayfs
inside container). Sorry, just trying to wrap my head around which
operations will require me to be priviliged (root in init_user_ns)
and which operations I can do unprivileged (as normal user).

Thanks
Vivek

> 
>   Both the KEP proposal (see [13]) and LXD (see [14]) use
>   non-overlapping idmappings to increase isolation.
> 
>   All of these cases can be supported if overlayfs can be mounted on
>   idmapped layers. The container runtime will create idmapped mounts for
>   the layers supposed to be shared. Each container will be given
>   idmapped mounts with the correct idmapping. Either by attaching a
>   custom or its own user namespace depending on the use-case. Then an
>   overlay mount can be mounted on top of the idmapped layers. The
>   underlying idmapped mounts can then be unmounted and the mount table
>   will be in a clean state.
>   This approach has been tested an verified by Giuseppe and others.
> 
> * Because of the inability to share layers preparing a layer causes a
>   big runtime overhead as ownership needs to be recursively changed.
>   This becomes increasingly costly with the bigger the layers are.
>   Especially for a large rootfs it becomes prohibitively expensive.
> 
>   This recursive ownership change also causes a lot storage overhead.
>   Without metacopy it can quickly become unmanagable. With metacopy it
>   still wastes a lot of space and inodes.
> 
>   For both the KEP and container manager being able to use idmapped
>   layer with overlayfs on top of it solves all these problems.
> 
> * Container managers such as LXD do run full system containers. Such
>   systems are managed like virtual machines and users run application
>   containers inside. Since LXD uses an idmapped mounts for the rootfs of
>   the container with the container's user namespace attached to the
>   mount container runtimes cannot user overlayfs inside.
> 
>   Once overlayfs can be mounted on top of idmapped layers it will be
>   possible for container runtimes to use overlayfs inside LXD
>   containers.
> 
> * The systemd-homed daemon and toolsuite is a new component of systemd
>   to manage home areas in a portable way. systemd-homed makes use of
>   idmapped mounts for the home areas. If the kernel and used file system
>   support it the user's home area will be mounted as an idmapped mount
>   when they log into their system.
> 
>   All files are internally owned by the "nobody" user (i.e. the user
>   typically used for indicating "this ownership is not mapped"), and
>   dynamically mapped to the {g,u}id used locally on the system via
>   idmapped mounts. This makes migrating home areas between different
>   systems trivial because recursively chown()ing file system trees is no
>   longer necessary. This also means that it is impossible to store files
>   as {g,u}id 0 on disk.
> 
>   Once overlayfs can be mounted on top of idmapped layers it will be
>   possible for container runtimes to work better together with
>   systemd-homed.
> 
> * systemd provides various sandboxing features (see [11]) for services.
>   These serve as hardening measures. In this context, idmapped mounts
>   can be used to prevent services from creating files on disk as {g,u}id
>   0, making specific files inaccessible, or to prevent access to whole
>   directories. Since such services may also make use of overlayfs for
>   e.g. the ExtensionImages= option supporting overlayfs on top of
>   idmapped layers would be another huge hardening win.
> 
> * systemd provides the ability to use system extension images [15] for
>   /usr and /opt (with a look to /etc in the future). Such system
>   extension images contain files and directories similar in fashion to
>   a regular OS system tree. When one or more system extension images are
>   activated, their /usr/ and /opt/ hierarchies are combined via
>   overlay with the same hierarchies of the host OS, and the host /usr/
>   and /opt/ overmounted with it ("merging"). These images are read-only.
> 
>   This feature is available to unprivileged container and sandboxed
>   services as well. Idmapped layers are used here to avoid runtime and
>   storage overhead from recursively changing ownership and ultimately an
>   overlay mount is supposed to be created on top of it.
> 
> Giuseppe provided testing for runC/crun/Podman and he put up a pull
> request to support overlayfs on top of idmapped layers at [16]. That
> already covers a lot of users. Other tools have put up pull requests as
> well and they are linked below.
> 
> The patchset has been extensively tested for about 2 weeks with
> xfstests. The tests and results are explained in the following
> paragraphs.
> 
> In order to test overlayfs with idmapped mounts a simple patch to
> xfstests has been added which is part of this series. The patchset
> simply allows for each test in the xfstests suite to be run on top of
> idmapped mounts. That is in addition to the generic idmapped mount tests
> that have existed and are already run for a long time.
> 
> Since idmapped mounts can be created on top of btrfs, ext4, and xfs and
> these are the most relevant filesystems for users they were taken into
> the test matrix.
> 
> Amir ensured that the test matrix also includes metacopy. So all tests
> are run once with metacopy=on and once with metacopy=off.
> 
> Additionally, the unionmount tesuite that Amir maintains was run as part
> of xfstests. This brings testing for multi layer overlay and a few
> rarely used overlayfs use-cases.
> 
> And last, xfstests were run with and without idmapped mounts.
> 
> Since the patch series is based on Linux 5.17 the 5.17 kernel was chosen
> as a baseline kernel. The baseline kernel is pure 5.17 upstream without
> any of the patches in this series. The baseline kernel was used to run
> all xfstests with the same parameters minus the idmapped mount part of
> the test matrix. This ensured that regressions would be immediately
> noticeable.
> 
> So the full test matrix is:
> 
> ext4 x metacopy=off x -idmapped mounts
> ext4 x metacopy=on  x -idmapped mounts
> ext4 x metacopy=off x +idmapped mounts
> ext4 x metacopy=on  x +idmapped mounts
> 
> xfs x metacopy=off x -idmapped mounts
> xfs x metacopy=on  x -idmapped mounts
> xfs x metacopy=off x +idmapped mounts
> xfs x metacopy=on  x +idmapped mounts
> 
> btrfs x metacopy=off x -idmapped mounts
> btrfs x metacopy=on  x -idmapped mounts
> btrfs x metacopy=off x +idmapped mounts
> btrfs x metacopy=on  x +idmapped mounts
> 
> The test runs were started with:
> 
> sudo ./check -overlay
> 
> so they encompass all xfstests apart from those that needed to be
> excluded because they triggered known issues (One such example is
> generic/530 which causes an xfs corruption that is currently under
> discussion for a fix upstream.).
> 
> A single testrun with over 750 tests takes a bonkers long time given
> that I run them in a beefy VM on top of an ssd and not on bare metal.
> 
> The successes and failures are identical for the whole test matrix with
> and without idmapped mounts. The tests and failures are also identical
> compared to the baseline kernel. All in all this should provide a good
> amount of convidence. The full test output can be found in the following
> repo: https://gitlab.com/brauner/fs.idmapped.overlay.xfstests.output
> 
> In order to create an idmapped mount the mount_setattr() system call
> (see [17]) can be used together with the MOUNT_ATTR_IDMAP flag to attach
> a userns fd to a detached mount created with open_tree()'s
> OPEN_TREE_CLONE flag. The only effect is that the idmapping of the
> userns becomes the idmapping of the relevant. The links below should
> serve to illustrate how widely they are already used.
> 
> For people who would like to test I've created a tag fs.idmapped.overlay.v1
> which can be fetched:
> 
> git fetch git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git fs.idmapped.overlay.v1
> 
> the same goes for xfstests:
> 
> git fetch git://git.kernel.org/pub/scm/linux/kernel/git/brauner/xfstests-dev.git fs.idmapped.overlay.v1
> 
> Thanks!
> Christian
> 
> [1]: OCI runtime-spec extension for idmapped mounts
>      https://github.com/opencontainers/runtime-spec/pull/1143
> 
> [2]: Support for idmapped mounts for shared volumes
>      https://github.com/opencontainers/runc/pull/3429
> 
> [3]: containerd support for idmapped mounts with focus on overlayfs
>      https://github.com/containerd/containerd/pull/5890
> 
> [4]: runC support for idmapped mounts with focus on overlayfs
>      https://github.com/opencontainers/runc/issues/2821
> 
> [5]: Podman support for idmapped mounts with focus on overlayfs
>      https://github.com/containers/podman/issues/10374
> 
> [6]: systemd-nspawn support for idmapped mounts
>      https://github.com/systemd/systemd/pull/19438
> 
> [7]: systemd-homed support for idmapped mounts
>      https://github.com/systemd/systemd/pull/21136
> 
> [8]: LXD support for idmapped mounts
>      https://github.com/lxc/lxd/pull/8778
> 
> [9]: LXC support for idmapped mounts
>      https://github.com/lxc/lxc/pull/3709
> 
> [10]: crun support for idmapped mounts
>       https://github.com/containers/crun/pull/874
> 
> [11]: https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Sandboxing
> 
> [12]: https://github.com/kubernetes/enhancements/pull/3065
> 
> [13]: https://github.com/kubernetes/enhancements/pull/3065/files#diff-a9ca9fbce1538447b92f03125ca2b8474e2d875071f1172d2afa0b1e8cadeabaR118
> 
> [14]: https://github.com/lxc/lxd/blob/master/doc/instances.md
>       (The relevant entry can be found by looking for the
>        "security.idmap.isolated" key.)
> 
> [15]: https://www.freedesktop.org/software/systemd/man/systemd-sysext.html
> 
> [16]: https://github.com/containers/storage/pull/1180
> 
> [17]: https://man7.org/linux/man-pages/man2/mount_setattr.2.html
> 
> Amir Goldstein (3):
>   ovl: use wrappers to all vfs_*xattr() calls
>   ovl: pass layer mnt to ovl_open_realfile()
>   ovl: store lower path in ovl_inode
> 
> Christian Brauner (15):
>   fs: add two trivial lookup helpers
>   exportfs: support idmapped mounts
>   ovl: pass ofs to creation operations
>   ovl: handle idmappings in creation operations
>   ovl: pass ofs to setattr operations
>   ovl: use ovl_do_notify_change() wrapper
>   ovl: use ovl_lookup_upper() wrapper
>   ovl: use ovl_path_getxattr() wrapper
>   ovl: handle idmappings for layer fileattrs
>   ovl: handle idmappings for layer lookup
>   ovl: use ovl_copy_{real,upper}attr() wrappers
>   ovl: handle idmappings in ovl_permission()
>   ovl: handle idmappings in layer open helpers
>   ovl: handle idmappings in ovl_xattr_{g,s}et()
>   ovl: support idmapped layers
> 
>  fs/exportfs/expfs.c      |   5 +-
>  fs/namei.c               |  52 +++++++--
>  fs/overlayfs/copy_up.c   |  97 +++++++++-------
>  fs/overlayfs/dir.c       | 133 +++++++++++-----------
>  fs/overlayfs/export.c    |   3 +-
>  fs/overlayfs/file.c      |  44 ++++----
>  fs/overlayfs/inode.c     |  63 +++++++----
>  fs/overlayfs/namei.c     |  49 ++++++---
>  fs/overlayfs/overlayfs.h | 231 ++++++++++++++++++++++++++++-----------
>  fs/overlayfs/ovl_entry.h |   7 +-
>  fs/overlayfs/readdir.c   |  48 ++++----
>  fs/overlayfs/super.c     |  57 +++++-----
>  fs/overlayfs/util.c      | 142 +++++++++++++++++++-----
>  include/linux/namei.h    |   2 +
>  14 files changed, 607 insertions(+), 326 deletions(-)
> 
> 
> base-commit: f443e374ae131c168a065ea1748feac6b2e76613
> -- 
> 2.32.0
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 00/18] overlay: support idmapped layers
  2022-03-30 20:58 ` Vivek Goyal
@ 2022-03-31  8:47   ` Christian Brauner
  2022-03-31  9:55     ` Giuseppe Scrivano
  0 siblings, 1 reply; 29+ messages in thread
From: Christian Brauner @ 2022-03-31  8:47 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, Christoph Hellwig,
	Aleksa Sarai, Giuseppe Scrivano, Rodrigo Campos Catelin,
	Seth Forshee, Luca Bocassi, Lennart Poettering,
	Stéphane Graber

On Wed, Mar 30, 2022 at 04:58:03PM -0400, Vivek Goyal wrote:
> On Tue, Mar 29, 2022 at 12:35:07PM +0200, Christian Brauner wrote:
> > From: "Christian Brauner (Microsoft)" <brauner@kernel.org>
> > 
> > Hey,
> > 
> > This adds support for mounting overlay on top of idmapped layers.
> > 
> > I have to start by saying a massive thank you to Amir! He did not just
> > answer my constant overlay questions but also provided quite a few
> > patches himself in this series in addition to reviews, comments and a
> > lot of suggestions. Thank you!
> > 
> > There have been a lot of requests to unblock this. For just a few select
> > examples see [3], [4], and [5]. I've worked closely with various
> > communities among them containerd, Kubernetes, Podman, LXD, runC, crun,
> > and systemd (For the curious please see the various pull-request and
> > issues below.) a lot of them already support idmapped mounts since they
> > are enabled for btrfs, ext4, and xfs (and f2fs and fat fwiw). In
> > additon, a few colleagues at Microsoft and from Red Hat work on a
> > Kubernetes Enhancement Proposals (KEP) that also relies on overlayfs
> > supporting idmapped layers, see [12].
> > 
> > Overlayfs on top of idmapped layers will be used in various ways:
> > 
> > * Container managers use overlayfs to efficiently share layers between
> >   containers. However, this only works for privileged containers.
> >   Layers cannot be shared if both privileged and unprivileged containers
> >   are used. Layers can also not be shared if non-overlapping idmappings
> >   are used for unprivileged containers. Layers cannot be shared because
> >   of the conflicting ownership requirements between the containers.
> 
> Hi Christian,
> 
> Thank you for this work. This is awesome.

Thanks for saying that! Appreciate it.

> 
> Wanted to test it. I was wondering how to test it. Some simple
> instructions will help.

Of course. There are various ways:

1. Giuseppe has already put up support the containers/storage go library:
   https://github.com/containers/storage/pull/1180
   and he's been running workloads with that branch. If you're familiar
   with that tooling you can just use that.

2. xfstests have - for a long time already - support for a generic
   idmapped mount testsuite. Alongside that testsuite I added the
   "mount-idmapped" binary which makes it easy to create idmapped
   mounts.

   You can compile it via xfstests:
   https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/src/idmapped-mounts/mount-idmapped.c

I've pushed detailed instructions to the test repo at 
https://gitlab.com/brauner/fs.idmapped.overlay.xfstests.output
They are avaiable in the "manual.test.instructions" file.

> 
> I am assuming I will need to be priviliged to be able to setup idmapped
> mounts. And then overlay can be mounted on top of these layers (either

Yes.

> as a priviliged operation outside container or unprivileged overlayfs
> inside container). Sorry, just trying to wrap my head around which

Correct.

> operations will require me to be priviliged (root in init_user_ns)
> and which operations I can do unprivileged (as normal user).
> 
> Thanks
> Vivek
> 
> > 
> >   Both the KEP proposal (see [13]) and LXD (see [14]) use
> >   non-overlapping idmappings to increase isolation.
> > 
> >   All of these cases can be supported if overlayfs can be mounted on
> >   idmapped layers. The container runtime will create idmapped mounts for
> >   the layers supposed to be shared. Each container will be given
> >   idmapped mounts with the correct idmapping. Either by attaching a
> >   custom or its own user namespace depending on the use-case. Then an
> >   overlay mount can be mounted on top of the idmapped layers. The
> >   underlying idmapped mounts can then be unmounted and the mount table
> >   will be in a clean state.
> >   This approach has been tested an verified by Giuseppe and others.
> > 
> > * Because of the inability to share layers preparing a layer causes a
> >   big runtime overhead as ownership needs to be recursively changed.
> >   This becomes increasingly costly with the bigger the layers are.
> >   Especially for a large rootfs it becomes prohibitively expensive.
> > 
> >   This recursive ownership change also causes a lot storage overhead.
> >   Without metacopy it can quickly become unmanagable. With metacopy it
> >   still wastes a lot of space and inodes.
> > 
> >   For both the KEP and container manager being able to use idmapped
> >   layer with overlayfs on top of it solves all these problems.
> > 
> > * Container managers such as LXD do run full system containers. Such
> >   systems are managed like virtual machines and users run application
> >   containers inside. Since LXD uses an idmapped mounts for the rootfs of
> >   the container with the container's user namespace attached to the
> >   mount container runtimes cannot user overlayfs inside.
> > 
> >   Once overlayfs can be mounted on top of idmapped layers it will be
> >   possible for container runtimes to use overlayfs inside LXD
> >   containers.
> > 
> > * The systemd-homed daemon and toolsuite is a new component of systemd
> >   to manage home areas in a portable way. systemd-homed makes use of
> >   idmapped mounts for the home areas. If the kernel and used file system
> >   support it the user's home area will be mounted as an idmapped mount
> >   when they log into their system.
> > 
> >   All files are internally owned by the "nobody" user (i.e. the user
> >   typically used for indicating "this ownership is not mapped"), and
> >   dynamically mapped to the {g,u}id used locally on the system via
> >   idmapped mounts. This makes migrating home areas between different
> >   systems trivial because recursively chown()ing file system trees is no
> >   longer necessary. This also means that it is impossible to store files
> >   as {g,u}id 0 on disk.
> > 
> >   Once overlayfs can be mounted on top of idmapped layers it will be
> >   possible for container runtimes to work better together with
> >   systemd-homed.
> > 
> > * systemd provides various sandboxing features (see [11]) for services.
> >   These serve as hardening measures. In this context, idmapped mounts
> >   can be used to prevent services from creating files on disk as {g,u}id
> >   0, making specific files inaccessible, or to prevent access to whole
> >   directories. Since such services may also make use of overlayfs for
> >   e.g. the ExtensionImages= option supporting overlayfs on top of
> >   idmapped layers would be another huge hardening win.
> > 
> > * systemd provides the ability to use system extension images [15] for
> >   /usr and /opt (with a look to /etc in the future). Such system
> >   extension images contain files and directories similar in fashion to
> >   a regular OS system tree. When one or more system extension images are
> >   activated, their /usr/ and /opt/ hierarchies are combined via
> >   overlay with the same hierarchies of the host OS, and the host /usr/
> >   and /opt/ overmounted with it ("merging"). These images are read-only.
> > 
> >   This feature is available to unprivileged container and sandboxed
> >   services as well. Idmapped layers are used here to avoid runtime and
> >   storage overhead from recursively changing ownership and ultimately an
> >   overlay mount is supposed to be created on top of it.
> > 
> > Giuseppe provided testing for runC/crun/Podman and he put up a pull
> > request to support overlayfs on top of idmapped layers at [16]. That
> > already covers a lot of users. Other tools have put up pull requests as
> > well and they are linked below.
> > 
> > The patchset has been extensively tested for about 2 weeks with
> > xfstests. The tests and results are explained in the following
> > paragraphs.
> > 
> > In order to test overlayfs with idmapped mounts a simple patch to
> > xfstests has been added which is part of this series. The patchset
> > simply allows for each test in the xfstests suite to be run on top of
> > idmapped mounts. That is in addition to the generic idmapped mount tests
> > that have existed and are already run for a long time.
> > 
> > Since idmapped mounts can be created on top of btrfs, ext4, and xfs and
> > these are the most relevant filesystems for users they were taken into
> > the test matrix.
> > 
> > Amir ensured that the test matrix also includes metacopy. So all tests
> > are run once with metacopy=on and once with metacopy=off.
> > 
> > Additionally, the unionmount tesuite that Amir maintains was run as part
> > of xfstests. This brings testing for multi layer overlay and a few
> > rarely used overlayfs use-cases.
> > 
> > And last, xfstests were run with and without idmapped mounts.
> > 
> > Since the patch series is based on Linux 5.17 the 5.17 kernel was chosen
> > as a baseline kernel. The baseline kernel is pure 5.17 upstream without
> > any of the patches in this series. The baseline kernel was used to run
> > all xfstests with the same parameters minus the idmapped mount part of
> > the test matrix. This ensured that regressions would be immediately
> > noticeable.
> > 
> > So the full test matrix is:
> > 
> > ext4 x metacopy=off x -idmapped mounts
> > ext4 x metacopy=on  x -idmapped mounts
> > ext4 x metacopy=off x +idmapped mounts
> > ext4 x metacopy=on  x +idmapped mounts
> > 
> > xfs x metacopy=off x -idmapped mounts
> > xfs x metacopy=on  x -idmapped mounts
> > xfs x metacopy=off x +idmapped mounts
> > xfs x metacopy=on  x +idmapped mounts
> > 
> > btrfs x metacopy=off x -idmapped mounts
> > btrfs x metacopy=on  x -idmapped mounts
> > btrfs x metacopy=off x +idmapped mounts
> > btrfs x metacopy=on  x +idmapped mounts
> > 
> > The test runs were started with:
> > 
> > sudo ./check -overlay
> > 
> > so they encompass all xfstests apart from those that needed to be
> > excluded because they triggered known issues (One such example is
> > generic/530 which causes an xfs corruption that is currently under
> > discussion for a fix upstream.).
> > 
> > A single testrun with over 750 tests takes a bonkers long time given
> > that I run them in a beefy VM on top of an ssd and not on bare metal.
> > 
> > The successes and failures are identical for the whole test matrix with
> > and without idmapped mounts. The tests and failures are also identical
> > compared to the baseline kernel. All in all this should provide a good
> > amount of convidence. The full test output can be found in the following
> > repo: https://gitlab.com/brauner/fs.idmapped.overlay.xfstests.output
> > 
> > In order to create an idmapped mount the mount_setattr() system call
> > (see [17]) can be used together with the MOUNT_ATTR_IDMAP flag to attach
> > a userns fd to a detached mount created with open_tree()'s
> > OPEN_TREE_CLONE flag. The only effect is that the idmapping of the
> > userns becomes the idmapping of the relevant. The links below should
> > serve to illustrate how widely they are already used.
> > 
> > For people who would like to test I've created a tag fs.idmapped.overlay.v1
> > which can be fetched:
> > 
> > git fetch git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git fs.idmapped.overlay.v1
> > 
> > the same goes for xfstests:
> > 
> > git fetch git://git.kernel.org/pub/scm/linux/kernel/git/brauner/xfstests-dev.git fs.idmapped.overlay.v1
> > 
> > Thanks!
> > Christian
> > 
> > [1]: OCI runtime-spec extension for idmapped mounts
> >      https://github.com/opencontainers/runtime-spec/pull/1143
> > 
> > [2]: Support for idmapped mounts for shared volumes
> >      https://github.com/opencontainers/runc/pull/3429
> > 
> > [3]: containerd support for idmapped mounts with focus on overlayfs
> >      https://github.com/containerd/containerd/pull/5890
> > 
> > [4]: runC support for idmapped mounts with focus on overlayfs
> >      https://github.com/opencontainers/runc/issues/2821
> > 
> > [5]: Podman support for idmapped mounts with focus on overlayfs
> >      https://github.com/containers/podman/issues/10374
> > 
> > [6]: systemd-nspawn support for idmapped mounts
> >      https://github.com/systemd/systemd/pull/19438
> > 
> > [7]: systemd-homed support for idmapped mounts
> >      https://github.com/systemd/systemd/pull/21136
> > 
> > [8]: LXD support for idmapped mounts
> >      https://github.com/lxc/lxd/pull/8778
> > 
> > [9]: LXC support for idmapped mounts
> >      https://github.com/lxc/lxc/pull/3709
> > 
> > [10]: crun support for idmapped mounts
> >       https://github.com/containers/crun/pull/874
> > 
> > [11]: https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Sandboxing
> > 
> > [12]: https://github.com/kubernetes/enhancements/pull/3065
> > 
> > [13]: https://github.com/kubernetes/enhancements/pull/3065/files#diff-a9ca9fbce1538447b92f03125ca2b8474e2d875071f1172d2afa0b1e8cadeabaR118
> > 
> > [14]: https://github.com/lxc/lxd/blob/master/doc/instances.md
> >       (The relevant entry can be found by looking for the
> >        "security.idmap.isolated" key.)
> > 
> > [15]: https://www.freedesktop.org/software/systemd/man/systemd-sysext.html
> > 
> > [16]: https://github.com/containers/storage/pull/1180
> > 
> > [17]: https://man7.org/linux/man-pages/man2/mount_setattr.2.html
> > 
> > Amir Goldstein (3):
> >   ovl: use wrappers to all vfs_*xattr() calls
> >   ovl: pass layer mnt to ovl_open_realfile()
> >   ovl: store lower path in ovl_inode
> > 
> > Christian Brauner (15):
> >   fs: add two trivial lookup helpers
> >   exportfs: support idmapped mounts
> >   ovl: pass ofs to creation operations
> >   ovl: handle idmappings in creation operations
> >   ovl: pass ofs to setattr operations
> >   ovl: use ovl_do_notify_change() wrapper
> >   ovl: use ovl_lookup_upper() wrapper
> >   ovl: use ovl_path_getxattr() wrapper
> >   ovl: handle idmappings for layer fileattrs
> >   ovl: handle idmappings for layer lookup
> >   ovl: use ovl_copy_{real,upper}attr() wrappers
> >   ovl: handle idmappings in ovl_permission()
> >   ovl: handle idmappings in layer open helpers
> >   ovl: handle idmappings in ovl_xattr_{g,s}et()
> >   ovl: support idmapped layers
> > 
> >  fs/exportfs/expfs.c      |   5 +-
> >  fs/namei.c               |  52 +++++++--
> >  fs/overlayfs/copy_up.c   |  97 +++++++++-------
> >  fs/overlayfs/dir.c       | 133 +++++++++++-----------
> >  fs/overlayfs/export.c    |   3 +-
> >  fs/overlayfs/file.c      |  44 ++++----
> >  fs/overlayfs/inode.c     |  63 +++++++----
> >  fs/overlayfs/namei.c     |  49 ++++++---
> >  fs/overlayfs/overlayfs.h | 231 ++++++++++++++++++++++++++++-----------
> >  fs/overlayfs/ovl_entry.h |   7 +-
> >  fs/overlayfs/readdir.c   |  48 ++++----
> >  fs/overlayfs/super.c     |  57 +++++-----
> >  fs/overlayfs/util.c      | 142 +++++++++++++++++++-----
> >  include/linux/namei.h    |   2 +
> >  14 files changed, 607 insertions(+), 326 deletions(-)
> > 
> > 
> > base-commit: f443e374ae131c168a065ea1748feac6b2e76613
> > -- 
> > 2.32.0
> > 
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 00/18] overlay: support idmapped layers
  2022-03-31  8:47   ` Christian Brauner
@ 2022-03-31  9:55     ` Giuseppe Scrivano
  0 siblings, 0 replies; 29+ messages in thread
From: Giuseppe Scrivano @ 2022-03-31  9:55 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Vivek Goyal, Amir Goldstein, Miklos Szeredi, linux-unionfs,
	Christoph Hellwig, Aleksa Sarai, Rodrigo Campos Catelin,
	Seth Forshee, Luca Bocassi, Lennart Poettering,
	Stéphane Graber

Christian Brauner <brauner@kernel.org> writes:

> On Wed, Mar 30, 2022 at 04:58:03PM -0400, Vivek Goyal wrote:
>> On Tue, Mar 29, 2022 at 12:35:07PM +0200, Christian Brauner wrote:
>> > From: "Christian Brauner (Microsoft)" <brauner@kernel.org>
>> > 
>> > Hey,
>> > 
>> > This adds support for mounting overlay on top of idmapped layers.
>> > 
>> > I have to start by saying a massive thank you to Amir! He did not just
>> > answer my constant overlay questions but also provided quite a few
>> > patches himself in this series in addition to reviews, comments and a
>> > lot of suggestions. Thank you!
>> > 
>> > There have been a lot of requests to unblock this. For just a few select
>> > examples see [3], [4], and [5]. I've worked closely with various
>> > communities among them containerd, Kubernetes, Podman, LXD, runC, crun,
>> > and systemd (For the curious please see the various pull-request and
>> > issues below.) a lot of them already support idmapped mounts since they
>> > are enabled for btrfs, ext4, and xfs (and f2fs and fat fwiw). In
>> > additon, a few colleagues at Microsoft and from Red Hat work on a
>> > Kubernetes Enhancement Proposals (KEP) that also relies on overlayfs
>> > supporting idmapped layers, see [12].
>> > 
>> > Overlayfs on top of idmapped layers will be used in various ways:
>> > 
>> > * Container managers use overlayfs to efficiently share layers between
>> >   containers. However, this only works for privileged containers.
>> >   Layers cannot be shared if both privileged and unprivileged containers
>> >   are used. Layers can also not be shared if non-overlapping idmappings
>> >   are used for unprivileged containers. Layers cannot be shared because
>> >   of the conflicting ownership requirements between the containers.
>> 
>> Hi Christian,
>> 
>> Thank you for this work. This is awesome.
>
> Thanks for saying that! Appreciate it.
>
>> 
>> Wanted to test it. I was wondering how to test it. Some simple
>> instructions will help.
>
> Of course. There are various ways:
>
> 1. Giuseppe has already put up support the containers/storage go library:
>    https://github.com/containers/storage/pull/1180
>    and he's been running workloads with that branch. If you're familiar
>    with that tooling you can just use that.

Vivek, if you vendor the containers/storage branch into Podman you can run something like:

# podman run --uidmap 0:2000:40000 --gidmap 0:8000:50000 --rm fedora sh -c 'grep . /proc/self/?id_map; ls -l /'
/proc/self/gid_map:         0       8000      50000
/proc/self/uid_map:         0       2000      40000
total 16
lrwxrwxrwx.   1 root   root      7 Jul 21  2021 bin -> usr/bin
dr-xr-xr-x.   1 root   root      0 Jul 21  2021 boot
drwxr-xr-x.   5 root   root    340 Mar 31 09:46 dev
drwxr-xr-x.   1 root   root   1682 Mar 31 09:45 etc
drwxr-xr-x.   1 root   root      0 Jul 21  2021 home
lrwxrwxrwx.   1 root   root      7 Jul 21  2021 lib -> usr/lib
lrwxrwxrwx.   1 root   root      9 Jul 21  2021 lib64 -> usr/lib64
drwx------.   1 root   root      0 Feb 21 06:47 lost+found
drwxr-xr-x.   1 root   root      0 Jul 21  2021 media
drwxr-xr-x.   1 root   root      0 Jul 21  2021 mnt
drwxr-xr-x.   1 root   root      0 Jul 21  2021 opt
dr-xr-xr-x. 262 nobody nobody    0 Mar 31 09:46 proc
dr-xr-x---.   1 root   root    206 Mar 31 09:45 root
drwxr-xr-x.   1 root   root     40 Mar 31 09:46 run
lrwxrwxrwx.   1 root   root      8 Jul 21  2021 sbin -> usr/sbin
drwxr-xr-x.   1 root   root      0 Jul 21  2021 srv
dr-xr-xr-x.  12 nobody nobody    0 Mar 31 09:42 sys
drwxrwxrwt.   1 root   root      0 Feb 21 06:47 tmp
drwxr-xr-x.   1 root   root    100 Feb 21 06:47 usr
drwxr-xr-x.   1 root   root    154 Feb 21 06:47 var

you'll notice that it is using idmapped mounts because the container
starts immediately when you change the mappings; without these kernel
patches Podman needs to chown the image first.

Regards,
Giuseppe


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2022-03-31  9:56 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-29 10:35 [PATCH 00/18] overlay: support idmapped layers Christian Brauner
2022-03-29 10:35 ` [PATCH 01/18] fs: add two trivial lookup helpers Christian Brauner
2022-03-29 10:35 ` [PATCH 02/18] exportfs: support idmapped mounts Christian Brauner
2022-03-29 10:35 ` [PATCH 03/18] ovl: use wrappers to all vfs_*xattr() calls Christian Brauner
2022-03-29 11:18   ` Miklos Szeredi
2022-03-29 10:35 ` [PATCH 04/18] ovl: pass ofs to creation operations Christian Brauner
2022-03-29 10:35 ` [PATCH 05/18] ovl: handle idmappings in " Christian Brauner
2022-03-29 11:22   ` Miklos Szeredi
2022-03-29 10:35 ` [PATCH 06/18] ovl: pass ofs to setattr operations Christian Brauner
2022-03-29 10:35 ` [PATCH 07/18] ovl: pass layer mnt to ovl_open_realfile() Christian Brauner
2022-03-29 10:35 ` [PATCH 08/18] ovl: use ovl_do_notify_change() wrapper Christian Brauner
2022-03-29 11:56   ` Miklos Szeredi
2022-03-29 12:36     ` Christian Brauner
2022-03-29 10:35 ` [PATCH 09/18] ovl: use ovl_lookup_upper() wrapper Christian Brauner
2022-03-29 10:35 ` [PATCH 10/18] ovl: use ovl_path_getxattr() wrapper Christian Brauner
2022-03-29 10:35 ` [PATCH 11/18] ovl: handle idmappings for layer fileattrs Christian Brauner
2022-03-29 10:35 ` [PATCH 12/18] ovl: handle idmappings for layer lookup Christian Brauner
2022-03-29 10:35 ` [PATCH 13/18] ovl: store lower path in ovl_inode Christian Brauner
2022-03-29 10:35 ` [PATCH 14/18] ovl: use ovl_copy_{real,upper}attr() wrappers Christian Brauner
2022-03-29 10:35 ` [PATCH 15/18] ovl: handle idmappings in ovl_permission() Christian Brauner
2022-03-29 10:35 ` [PATCH 16/18] ovl: handle idmappings in layer open helpers Christian Brauner
2022-03-29 10:35 ` [PATCH 17/18] ovl: handle idmappings in ovl_xattr_{g,s}et() Christian Brauner
2022-03-29 10:35 ` [PATCH 18/18] ovl: support idmapped layers Christian Brauner
2022-03-29 10:35 ` [PATCH] common: allow to run all tests on idmapped mounts Christian Brauner
2022-03-29 12:25 ` [PATCH 00/18] overlay: support idmapped layers Miklos Szeredi
2022-03-29 15:02   ` Amir Goldstein
2022-03-30 20:58 ` Vivek Goyal
2022-03-31  8:47   ` Christian Brauner
2022-03-31  9:55     ` Giuseppe Scrivano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.