linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PFC][PATCH 0/4] Overlayfs SHUTDOWN ioctl
@ 2019-07-15 13:38 Amir Goldstein
  2019-07-15 13:38 ` [PATCH 1/4] ovl: support [S|G]ETFLAGS ioctl for directories Amir Goldstein
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Amir Goldstein @ 2019-07-15 13:38 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, linux-fsdevel, linux-unionfs

Miklos,

I was trying to think of a way forward w.r.t container runtime
mount leaks and overlayfs mount failures, see [1][2][3].

It does not seem reasonable to expect they will fix all the mount
leaks and that new ones will not show up. It's a hard problem to
solve.

I posted a fix patch to mitigate the recent regression with
index=off [2], but it does not seem reasonable to hold index=on feature
hostage for eternity or until all mount leaks are fixed.

The proposal I have come up with is to provide an API for containers
to shutdown the instance before unmount, so the leaked mounts become
"zombies" with no ability to do any harm beyond hogging resources.

The SHUTDOWN ioctl, used by xfs/ext4/f2fs to stop any access to
underlying blockdev is implemented in overlayfs to stop any access
to underlying layers. The real objects are still referenced, but no
vfs API can be called on underlying layers.

Naturally, SHUTDOWN releases the inuse locks to mitigate the mount
failures.

I wrote an xfstest to verify SHUTDOWN solves the mount leak issue [5].

Thoughts?

Thanks,
Amir.

[1] https://github.com/containers/libpod/issues/3540
[2] https://github.com/moby/moby/issues/39475
[3] https://github.com/coreos/linux/pull/339
[4] https://github.com/amir73il/linux/commits/ovl-overlap-detect-regression-fix
[5] https://github.com/amir73il/xfstests/commit/a56d5560e404cc8052a8e47850676364b5e8c76c

Amir Goldstein (4):
  ovl: support [S|G]ETFLAGS ioctl for directories
  ovl: use generic vfs_ioc_setflags_prepare() helper
  ovl: add pre/post access hooks to underlying layers
  ovl: add support for SHUTDOWN ioctl

 fs/overlayfs/copy_up.c   |  10 +-
 fs/overlayfs/dir.c       |  26 +++--
 fs/overlayfs/file.c      | 200 ++++++++++++++++++++++++++-------------
 fs/overlayfs/inode.c     |  64 +++++++++----
 fs/overlayfs/namei.c     |  15 ++-
 fs/overlayfs/overlayfs.h |  15 ++-
 fs/overlayfs/ovl_entry.h |   7 ++
 fs/overlayfs/readdir.c   |  17 +++-
 fs/overlayfs/super.c     |   9 +-
 fs/overlayfs/util.c      |  75 +++++++++++++--
 10 files changed, 318 insertions(+), 120 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/4] ovl: support [S|G]ETFLAGS ioctl for directories
  2019-07-15 13:38 [PFC][PATCH 0/4] Overlayfs SHUTDOWN ioctl Amir Goldstein
@ 2019-07-15 13:38 ` Amir Goldstein
  2019-07-26  6:57   ` Amir Goldstein
  2019-07-15 13:38 ` [PATCH 2/4] ovl: use generic vfs_ioc_setflags_prepare() helper Amir Goldstein
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 7+ messages in thread
From: Amir Goldstein @ 2019-07-15 13:38 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, linux-fsdevel, linux-unionfs

[S|G]ETFLAGS and FS[S|G]ETXATTR ioctls are applicable to both files and
directories, so add ioctl operations to dir as well.

ifdef away compat ioctl implementation to conform to standard practice.

With this change, xfstest generic/079 which tests these ioctls on files
and directories passes.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/file.c      | 10 ++++++----
 fs/overlayfs/overlayfs.h |  2 ++
 fs/overlayfs/readdir.c   |  4 ++++
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index e235a635d9ec..c6426e4d3f1f 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -502,7 +502,7 @@ static long ovl_ioctl_set_fsxflags(struct file *file, unsigned int cmd,
 				   ovl_fsxflags_to_iflags(fa.fsx_xflags));
 }
 
-static long ovl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+long ovl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
 	long ret;
 
@@ -527,8 +527,8 @@ static long ovl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	return ret;
 }
 
-static long ovl_compat_ioctl(struct file *file, unsigned int cmd,
-			     unsigned long arg)
+#ifdef CONFIG_COMPAT
+long ovl_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
 	switch (cmd) {
 	case FS_IOC32_GETFLAGS:
@@ -545,6 +545,7 @@ static long ovl_compat_ioctl(struct file *file, unsigned int cmd,
 
 	return ovl_ioctl(file, cmd, arg);
 }
+#endif
 
 enum ovl_copyop {
 	OVL_COPY,
@@ -646,8 +647,9 @@ const struct file_operations ovl_file_operations = {
 	.fallocate	= ovl_fallocate,
 	.fadvise	= ovl_fadvise,
 	.unlocked_ioctl	= ovl_ioctl,
+#ifdef CONFIG_COMPAT
 	.compat_ioctl	= ovl_compat_ioctl,
-
+#endif
 	.copy_file_range	= ovl_copy_file_range,
 	.remap_file_range	= ovl_remap_file_range,
 };
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 6934bcf030f0..7c94cc3521cb 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -416,6 +416,8 @@ struct dentry *ovl_create_temp(struct dentry *workdir, struct ovl_cattr *attr);
 
 /* file.c */
 extern const struct file_operations ovl_file_operations;
+long ovl_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+long ovl_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
 
 /* copy_up.c */
 int ovl_copy_up(struct dentry *dentry);
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 47a91c9733a5..eff8fbfccc7c 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -907,6 +907,10 @@ const struct file_operations ovl_dir_operations = {
 	.llseek		= ovl_dir_llseek,
 	.fsync		= ovl_dir_fsync,
 	.release	= ovl_dir_release,
+	.unlocked_ioctl	= ovl_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl	= ovl_compat_ioctl,
+#endif
 };
 
 int ovl_check_empty_dir(struct dentry *dentry, struct list_head *list)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/4] ovl: use generic vfs_ioc_setflags_prepare() helper
  2019-07-15 13:38 [PFC][PATCH 0/4] Overlayfs SHUTDOWN ioctl Amir Goldstein
  2019-07-15 13:38 ` [PATCH 1/4] ovl: support [S|G]ETFLAGS ioctl for directories Amir Goldstein
@ 2019-07-15 13:38 ` Amir Goldstein
  2019-07-15 13:38 ` [PATCH 3/4] ovl: add pre/post access hooks to underlying layers Amir Goldstein
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Amir Goldstein @ 2019-07-15 13:38 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, linux-fsdevel, linux-unionfs

Canonalize to ioctl FS_* flags instead of inode S_* flags.

Note that we do not call the helper vfs_ioc_fssetxattr_check()
for FS_IOC_FSSETXATTR ioctl. The reason is that underlying filesystem
will perform all the checks. We only need to perform the capability
check before overriding credentials.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/file.c | 62 ++++++++++++++++++++++-----------------------
 1 file changed, 30 insertions(+), 32 deletions(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index c6426e4d3f1f..1ba40845fba0 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -406,12 +406,28 @@ static long ovl_real_ioctl(struct file *file, unsigned int cmd,
 	return ret;
 }
 
+static unsigned int ovl_iflags_to_fsflags(unsigned int iflags)
+{
+	unsigned int flags = 0;
+
+	if (iflags & S_SYNC)
+		flags |= FS_SYNC_FL;
+	if (iflags & S_APPEND)
+		flags |= FS_APPEND_FL;
+	if (iflags & S_IMMUTABLE)
+		flags |= FS_IMMUTABLE_FL;
+	if (iflags & S_NOATIME)
+		flags |= FS_NOATIME_FL;
+
+	return flags;
+}
+
 static long ovl_ioctl_set_flags(struct file *file, unsigned int cmd,
-				unsigned long arg, unsigned int iflags)
+				unsigned long arg, unsigned int flags)
 {
 	long ret;
 	struct inode *inode = file_inode(file);
-	unsigned int old_iflags;
+	unsigned int oldflags;
 
 	if (!inode_owner_or_capable(inode))
 		return -EACCES;
@@ -423,10 +439,9 @@ static long ovl_ioctl_set_flags(struct file *file, unsigned int cmd,
 	inode_lock(inode);
 
 	/* Check the capability before cred override */
-	ret = -EPERM;
-	old_iflags = READ_ONCE(inode->i_flags);
-	if (((iflags ^ old_iflags) & (S_APPEND | S_IMMUTABLE)) &&
-	    !capable(CAP_LINUX_IMMUTABLE))
+	oldflags = ovl_iflags_to_fsflags(READ_ONCE(inode->i_flags));
+	ret = vfs_ioc_setflags_prepare(inode, oldflags, flags);
+	if (ret)
 		goto unlock;
 
 	ret = ovl_maybe_copy_up(file_dentry(file), O_WRONLY);
@@ -445,22 +460,6 @@ static long ovl_ioctl_set_flags(struct file *file, unsigned int cmd,
 
 }
 
-static unsigned int ovl_fsflags_to_iflags(unsigned int flags)
-{
-	unsigned int iflags = 0;
-
-	if (flags & FS_SYNC_FL)
-		iflags |= S_SYNC;
-	if (flags & FS_APPEND_FL)
-		iflags |= S_APPEND;
-	if (flags & FS_IMMUTABLE_FL)
-		iflags |= S_IMMUTABLE;
-	if (flags & FS_NOATIME_FL)
-		iflags |= S_NOATIME;
-
-	return iflags;
-}
-
 static long ovl_ioctl_set_fsflags(struct file *file, unsigned int cmd,
 				  unsigned long arg)
 {
@@ -469,24 +468,23 @@ static long ovl_ioctl_set_fsflags(struct file *file, unsigned int cmd,
 	if (get_user(flags, (int __user *) arg))
 		return -EFAULT;
 
-	return ovl_ioctl_set_flags(file, cmd, arg,
-				   ovl_fsflags_to_iflags(flags));
+	return ovl_ioctl_set_flags(file, cmd, arg, flags);
 }
 
-static unsigned int ovl_fsxflags_to_iflags(unsigned int xflags)
+static unsigned int ovl_fsxflags_to_fsflags(unsigned int xflags)
 {
-	unsigned int iflags = 0;
+	unsigned int flags = 0;
 
 	if (xflags & FS_XFLAG_SYNC)
-		iflags |= S_SYNC;
+		flags |= FS_SYNC_FL;
 	if (xflags & FS_XFLAG_APPEND)
-		iflags |= S_APPEND;
+		flags |= FS_APPEND_FL;
 	if (xflags & FS_XFLAG_IMMUTABLE)
-		iflags |= S_IMMUTABLE;
+		flags |= FS_IMMUTABLE_FL;
 	if (xflags & FS_XFLAG_NOATIME)
-		iflags |= S_NOATIME;
+		flags |= FS_NOATIME_FL;
 
-	return iflags;
+	return flags;
 }
 
 static long ovl_ioctl_set_fsxflags(struct file *file, unsigned int cmd,
@@ -499,7 +497,7 @@ static long ovl_ioctl_set_fsxflags(struct file *file, unsigned int cmd,
 		return -EFAULT;
 
 	return ovl_ioctl_set_flags(file, cmd, arg,
-				   ovl_fsxflags_to_iflags(fa.fsx_xflags));
+				   ovl_fsxflags_to_fsflags(fa.fsx_xflags));
 }
 
 long ovl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/4] ovl: add pre/post access hooks to underlying layers
  2019-07-15 13:38 [PFC][PATCH 0/4] Overlayfs SHUTDOWN ioctl Amir Goldstein
  2019-07-15 13:38 ` [PATCH 1/4] ovl: support [S|G]ETFLAGS ioctl for directories Amir Goldstein
  2019-07-15 13:38 ` [PATCH 2/4] ovl: use generic vfs_ioc_setflags_prepare() helper Amir Goldstein
@ 2019-07-15 13:38 ` Amir Goldstein
  2019-07-15 13:38 ` [PATCH 4/4] ovl: add support for SHUTDOWN ioctl Amir Goldstein
  2019-07-16  5:35 ` [PFC][PATCH 0/4] Overlayfs " Amir Goldstein
  4 siblings, 0 replies; 7+ messages in thread
From: Amir Goldstein @ 2019-07-15 13:38 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, linux-fsdevel, linux-unionfs

write access to underlying layers is surrounded with ovl_want_write()/
ovl_drop_write() pair and error may be returned when write access
is denied.

read/lookup access to underlying layers is also surrounded with
ovl_override_creds()/revert_creds(), but access cannot be denied.

Change all call sites to use ovl_override_creds()/ovl_revert_creds()
pair and allow error to be returned when access to underlying layers
is denied.

This is a prep patch for adding SHUTDOWN ioctl support.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c   | 10 +++--
 fs/overlayfs/dir.c       | 26 +++++++----
 fs/overlayfs/file.c      | 94 +++++++++++++++++++++++++++-------------
 fs/overlayfs/inode.c     | 64 +++++++++++++++++++--------
 fs/overlayfs/namei.c     | 15 ++++---
 fs/overlayfs/overlayfs.h |  3 +-
 fs/overlayfs/readdir.c   | 13 ++++--
 fs/overlayfs/util.c      | 26 ++++++++---
 8 files changed, 174 insertions(+), 77 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index b801c6353100..47a924973b32 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -850,10 +850,14 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 
 int ovl_copy_up_flags(struct dentry *dentry, int flags)
 {
-	int err = 0;
-	const struct cred *old_cred = ovl_override_creds(dentry->d_sb);
+	int err;
+	const struct cred *old_cred;
 	bool disconnected = (dentry->d_flags & DCACHE_DISCONNECTED);
 
+	err = ovl_override_creds(dentry->d_sb, &old_cred);
+	if (err)
+		return err;
+
 	/*
 	 * With NFS export, copy up can get called for a disconnected non-dir.
 	 * In this case, we will copy up lower inode to index dir without
@@ -886,7 +890,7 @@ int ovl_copy_up_flags(struct dentry *dentry, int flags)
 		dput(parent);
 		dput(next);
 	}
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 
 	return err;
 }
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 702aa63f6774..cb43064a6b4e 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -544,7 +544,9 @@ static int ovl_create_or_link(struct dentry *dentry, struct inode *inode,
 	if (err)
 		return err;
 
-	old_cred = ovl_override_creds(dentry->d_sb);
+	err = ovl_override_creds(dentry->d_sb, &old_cred);
+	if (err)
+		return err;
 
 	/*
 	 * When linking a file with copy up origin into a new parent, mark the
@@ -579,7 +581,7 @@ static int ovl_create_or_link(struct dentry *dentry, struct inode *inode,
 			err = ovl_create_over_whiteout(dentry, inode, attr);
 	}
 out_revert_creds:
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 	return err;
 }
 
@@ -653,9 +655,12 @@ static int ovl_set_link_redirect(struct dentry *dentry)
 	const struct cred *old_cred;
 	int err;
 
-	old_cred = ovl_override_creds(dentry->d_sb);
+	err = ovl_override_creds(dentry->d_sb, &old_cred);
+	if (err)
+		return err;
+
 	err = ovl_set_redirect(dentry, false);
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 
 	return err;
 }
@@ -846,12 +851,15 @@ static int ovl_do_remove(struct dentry *dentry, bool is_dir)
 	if (err)
 		goto out_drop_write;
 
-	old_cred = ovl_override_creds(dentry->d_sb);
+	err = ovl_override_creds(dentry->d_sb, &old_cred);
+	if (err)
+		goto out_drop_write;
+
 	if (!lower_positive)
 		err = ovl_remove_upper(dentry, is_dir, &list);
 	else
 		err = ovl_remove_and_whiteout(dentry, &list);
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 	if (!err) {
 		if (is_dir)
 			clear_nlink(dentry->d_inode);
@@ -1096,7 +1104,9 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 		update_nlink = true;
 	}
 
-	old_cred = ovl_override_creds(old->d_sb);
+	err = ovl_override_creds(old->d_sb, &old_cred);
+	if (err)
+		goto out_drop_write;
 
 	if (!list_empty(&list)) {
 		opaquedir = ovl_clear_empty(new, &list);
@@ -1221,7 +1231,7 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
 out_unlock:
 	unlock_rename(new_upperdir, old_upperdir);
 out_revert_creds:
-	revert_creds(old_cred);
+	ovl_revert_creds(old->d_sb, old_cred);
 	if (update_nlink)
 		ovl_nlink_end(new);
 out_drop_write:
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 1ba40845fba0..dbcf7549068d 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -28,11 +28,15 @@ static struct file *ovl_open_realfile(const struct file *file,
 	struct file *realfile;
 	const struct cred *old_cred;
 	int flags = file->f_flags | O_NOATIME | FMODE_NONOTIFY;
+	int err;
+
+	err = ovl_override_creds(inode->i_sb, &old_cred);
+	if (err)
+		return ERR_PTR(err);
 
-	old_cred = ovl_override_creds(inode->i_sb);
 	realfile = open_with_fake_path(&file->f_path, flags, realinode,
 				       current_cred());
-	revert_creds(old_cred);
+	ovl_revert_creds(inode->i_sb, old_cred);
 
 	pr_debug("open(%p[%pD2/%c], 0%o) -> (%p, 0%o)\n",
 		 file, file, ovl_whatisit(inode, realinode), file->f_flags,
@@ -174,9 +178,11 @@ static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
 	inode_lock(inode);
 	real.file->f_pos = file->f_pos;
 
-	old_cred = ovl_override_creds(inode->i_sb);
-	ret = vfs_llseek(real.file, offset, whence);
-	revert_creds(old_cred);
+	ret = ovl_override_creds(inode->i_sb, &old_cred);
+	if (!ret) {
+		ret = vfs_llseek(real.file, offset, whence);
+		ovl_revert_creds(inode->i_sb, old_cred);
+	}
 
 	file->f_pos = real.file->f_pos;
 	inode_unlock(inode);
@@ -228,6 +234,7 @@ static rwf_t ovl_iocb_to_rwf(struct kiocb *iocb)
 static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct file *file = iocb->ki_filp;
+	struct inode *inode = file_inode(file);
 	struct fd real;
 	const struct cred *old_cred;
 	ssize_t ret;
@@ -239,13 +246,17 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 	if (ret)
 		return ret;
 
-	old_cred = ovl_override_creds(file_inode(file)->i_sb);
+	ret = ovl_override_creds(inode->i_sb, &old_cred);
+	if (ret)
+		goto out_fdput;
+
 	ret = vfs_iter_read(real.file, iter, &iocb->ki_pos,
 			    ovl_iocb_to_rwf(iocb));
-	revert_creds(old_cred);
+	ovl_revert_creds(inode->i_sb, old_cred);
 
 	ovl_file_accessed(file);
 
+out_fdput:
 	fdput(real);
 
 	return ret;
@@ -273,16 +284,20 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	if (ret)
 		goto out_unlock;
 
-	old_cred = ovl_override_creds(file_inode(file)->i_sb);
+	ret = ovl_override_creds(inode->i_sb, &old_cred);
+	if (ret)
+		goto out_fdput;
+
 	file_start_write(real.file);
 	ret = vfs_iter_write(real.file, iter, &iocb->ki_pos,
 			     ovl_iocb_to_rwf(iocb));
 	file_end_write(real.file);
-	revert_creds(old_cred);
+	ovl_revert_creds(inode->i_sb, old_cred);
 
 	/* Update size */
 	ovl_copyattr(ovl_inode_real(inode), inode);
 
+out_fdput:
 	fdput(real);
 
 out_unlock:
@@ -293,6 +308,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 
 static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 {
+	struct inode *inode = file_inode(file);
 	struct fd real;
 	const struct cred *old_cred;
 	int ret;
@@ -302,10 +318,12 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 		return ret;
 
 	/* Don't sync lower file for fear of receiving EROFS error */
-	if (file_inode(real.file) == ovl_inode_upper(file_inode(file))) {
-		old_cred = ovl_override_creds(file_inode(file)->i_sb);
-		ret = vfs_fsync_range(real.file, start, end, datasync);
-		revert_creds(old_cred);
+	if (file_inode(real.file) == ovl_inode_upper(inode)) {
+		ret = ovl_override_creds(inode->i_sb, &old_cred);
+		if (!ret) {
+			ret = vfs_fsync_range(real.file, start, end, datasync);
+			ovl_revert_creds(inode->i_sb, old_cred);
+		}
 	}
 
 	fdput(real);
@@ -316,6 +334,7 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct file *realfile = file->private_data;
+	struct inode *inode = file_inode(file);
 	const struct cred *old_cred;
 	int ret;
 
@@ -327,24 +346,25 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 
 	vma->vm_file = get_file(realfile);
 
-	old_cred = ovl_override_creds(file_inode(file)->i_sb);
-	ret = call_mmap(vma->vm_file, vma);
-	revert_creds(old_cred);
-
+	ret = ovl_override_creds(inode->i_sb, &old_cred);
+	if (!ret) {
+		ret = call_mmap(vma->vm_file, vma);
+		ovl_revert_creds(inode->i_sb, old_cred);
+	}
 	if (ret) {
 		/* Drop reference count from new vm_file value */
 		fput(realfile);
 	} else {
 		/* Drop reference count from previous vm_file value */
 		fput(file);
+		ovl_file_accessed(file);
 	}
 
-	ovl_file_accessed(file);
-
 	return ret;
 }
 
-static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
+static long ovl_fallocate(struct file *file, int mode, loff_t offset,
+			  loff_t len)
 {
 	struct inode *inode = file_inode(file);
 	struct fd real;
@@ -355,13 +375,17 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
 	if (ret)
 		return ret;
 
-	old_cred = ovl_override_creds(file_inode(file)->i_sb);
+	ret = ovl_override_creds(inode->i_sb, &old_cred);
+	if (ret)
+		goto out_fdput;
+
 	ret = vfs_fallocate(real.file, mode, offset, len);
-	revert_creds(old_cred);
+	ovl_revert_creds(inode->i_sb, old_cred);
 
 	/* Update size */
 	ovl_copyattr(ovl_inode_real(inode), inode);
 
+out_fdput:
 	fdput(real);
 
 	return ret;
@@ -369,6 +393,7 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
 
 static int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 {
+	struct inode *inode = file_inode(file);
 	struct fd real;
 	const struct cred *old_cred;
 	int ret;
@@ -377,9 +402,11 @@ static int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 	if (ret)
 		return ret;
 
-	old_cred = ovl_override_creds(file_inode(file)->i_sb);
-	ret = vfs_fadvise(real.file, offset, len, advice);
-	revert_creds(old_cred);
+	ret = ovl_override_creds(inode->i_sb, &old_cred);
+	if (!ret) {
+		ret = vfs_fadvise(real.file, offset, len, advice);
+		ovl_revert_creds(inode->i_sb, old_cred);
+	}
 
 	fdput(real);
 
@@ -389,6 +416,7 @@ static int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 static long ovl_real_ioctl(struct file *file, unsigned int cmd,
 			   unsigned long arg)
 {
+	struct inode *inode = file_inode(file);
 	struct fd real;
 	const struct cred *old_cred;
 	long ret;
@@ -397,9 +425,11 @@ static long ovl_real_ioctl(struct file *file, unsigned int cmd,
 	if (ret)
 		return ret;
 
-	old_cred = ovl_override_creds(file_inode(file)->i_sb);
-	ret = vfs_ioctl(real.file, cmd, arg);
-	revert_creds(old_cred);
+	ret = ovl_override_creds(inode->i_sb, &old_cred);
+	if (!ret) {
+		ret = vfs_ioctl(real.file, cmd, arg);
+		ovl_revert_creds(inode->i_sb, old_cred);
+	}
 
 	fdput(real);
 
@@ -570,7 +600,10 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
 		return ret;
 	}
 
-	old_cred = ovl_override_creds(file_inode(file_out)->i_sb);
+	ret = ovl_override_creds(inode_out->i_sb, &old_cred);
+	if (ret)
+		goto out_fdput;
+
 	switch (op) {
 	case OVL_COPY:
 		ret = vfs_copy_file_range(real_in.file, pos_in,
@@ -588,11 +621,12 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
 						flags);
 		break;
 	}
-	revert_creds(old_cred);
+	ovl_revert_creds(inode_out->i_sb, old_cred);
 
 	/* Update size */
 	ovl_copyattr(ovl_inode_real(inode_out), inode_out);
 
+out_fdput:
 	fdput(real_in);
 	fdput(real_out);
 
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 7663aeb85fa3..f99833161e2b 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -59,9 +59,11 @@ int ovl_setattr(struct dentry *dentry, struct iattr *attr)
 			attr->ia_valid &= ~ATTR_MODE;
 
 		inode_lock(upperdentry->d_inode);
-		old_cred = ovl_override_creds(dentry->d_sb);
-		err = notify_change(upperdentry, attr, NULL);
-		revert_creds(old_cred);
+		err = ovl_override_creds(dentry->d_sb, &old_cred);
+		if (!err) {
+			err = notify_change(upperdentry, attr, NULL);
+			ovl_revert_creds(dentry->d_sb, old_cred);
+		}
 		if (!err)
 			ovl_copyattr(upperdentry->d_inode, dentry->d_inode);
 		inode_unlock(upperdentry->d_inode);
@@ -154,7 +156,10 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
 	metacopy_blocks = ovl_is_metacopy_dentry(dentry);
 
 	type = ovl_path_real(dentry, &realpath);
-	old_cred = ovl_override_creds(dentry->d_sb);
+	err = ovl_override_creds(dentry->d_sb, &old_cred);
+	if (err)
+		return err;
+
 	err = vfs_getattr(&realpath, stat, request_mask, flags);
 	if (err)
 		goto out;
@@ -257,7 +262,7 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
 		stat->nlink = dentry->d_inode->i_nlink;
 
 out:
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 
 	return err;
 }
@@ -283,7 +288,10 @@ int ovl_permission(struct inode *inode, int mask)
 	if (err)
 		return err;
 
-	old_cred = ovl_override_creds(inode->i_sb);
+	err = ovl_override_creds(inode->i_sb, &old_cred);
+	if (err)
+		return err;
+
 	if (!upperinode &&
 	    !special_file(realinode->i_mode) && mask & MAY_WRITE) {
 		mask &= ~(MAY_WRITE | MAY_APPEND);
@@ -291,7 +299,7 @@ int ovl_permission(struct inode *inode, int mask)
 		mask |= MAY_READ;
 	}
 	err = inode_permission(realinode, mask);
-	revert_creds(old_cred);
+	ovl_revert_creds(inode->i_sb, old_cred);
 
 	return err;
 }
@@ -302,13 +310,17 @@ static const char *ovl_get_link(struct dentry *dentry,
 {
 	const struct cred *old_cred;
 	const char *p;
+	int err;
 
 	if (!dentry)
 		return ERR_PTR(-ECHILD);
 
-	old_cred = ovl_override_creds(dentry->d_sb);
+	err = ovl_override_creds(dentry->d_sb, &old_cred);
+	if (err)
+		return ERR_PTR(err);
+
 	p = vfs_get_link(ovl_dentry_real(dentry), done);
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 	return p;
 }
 
@@ -344,14 +356,17 @@ int ovl_xattr_set(struct dentry *dentry, struct inode *inode, const char *name,
 		realdentry = ovl_dentry_upper(dentry);
 	}
 
-	old_cred = ovl_override_creds(dentry->d_sb);
+	err = ovl_override_creds(dentry->d_sb, &old_cred);
+	if (err)
+		goto out_drop_write;
+
 	if (value)
 		err = vfs_setxattr(realdentry, name, value, size, flags);
 	else {
 		WARN_ON(flags != XATTR_REPLACE);
 		err = vfs_removexattr(realdentry, name);
 	}
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 
 	/* copy c/mtime */
 	ovl_copyattr(d_inode(realdentry), inode);
@@ -370,9 +385,12 @@ int ovl_xattr_get(struct dentry *dentry, struct inode *inode, const char *name,
 	struct dentry *realdentry =
 		ovl_i_dentry_upper(inode) ?: ovl_dentry_lower(dentry);
 
-	old_cred = ovl_override_creds(dentry->d_sb);
+	res = ovl_override_creds(dentry->d_sb, &old_cred);
+	if (res)
+		return res;
+
 	res = vfs_getxattr(realdentry, name, value, size);
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 	return res;
 }
 
@@ -394,9 +412,13 @@ ssize_t ovl_listxattr(struct dentry *dentry, char *list, size_t size)
 	char *s;
 	const struct cred *old_cred;
 
-	old_cred = ovl_override_creds(dentry->d_sb);
+	res = ovl_override_creds(dentry->d_sb, &old_cred);
+	if (res)
+		return res;
+
 	res = vfs_listxattr(realdentry, list, size);
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
+
 	if (res <= 0 || size == 0)
 		return res;
 
@@ -429,9 +451,11 @@ struct posix_acl *ovl_get_acl(struct inode *inode, int type)
 	if (!IS_ENABLED(CONFIG_FS_POSIX_ACL) || !IS_POSIXACL(realinode))
 		return NULL;
 
-	old_cred = ovl_override_creds(inode->i_sb);
+	if (ovl_override_creds(inode->i_sb, &old_cred))
+		return NULL;
+
 	acl = get_acl(realinode, type);
-	revert_creds(old_cred);
+	ovl_revert_creds(inode->i_sb, old_cred);
 
 	return acl;
 }
@@ -463,13 +487,15 @@ static int ovl_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 	if (!realinode->i_op->fiemap)
 		return -EOPNOTSUPP;
 
-	old_cred = ovl_override_creds(inode->i_sb);
+	err = ovl_override_creds(inode->i_sb, &old_cred);
+	if (err)
+		return err;
 
 	if (fieinfo->fi_flags & FIEMAP_FLAG_SYNC)
 		filemap_write_and_wait(realinode->i_mapping);
 
 	err = realinode->i_op->fiemap(realinode, fieinfo, start, len);
-	revert_creds(old_cred);
+	ovl_revert_creds(inode->i_sb, old_cred);
 
 	return err;
 }
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index e9717c2f7d45..5aa448ca0bb5 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -836,7 +836,10 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	if (dentry->d_name.len > ofs->namelen)
 		return ERR_PTR(-ENAMETOOLONG);
 
-	old_cred = ovl_override_creds(dentry->d_sb);
+	err = ovl_override_creds(dentry->d_sb, &old_cred);
+	if (err)
+		return ERR_PTR(err);
+
 	upperdir = ovl_dentry_upper(dentry->d_parent);
 	if (upperdir) {
 		err = ovl_lookup_layer(upperdir, &d, &upperdentry);
@@ -1074,7 +1077,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 			goto out_free_oe;
 	}
 
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 	if (origin_path) {
 		dput(origin_path->dentry);
 		kfree(origin_path);
@@ -1101,7 +1104,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	kfree(upperredirect);
 out:
 	kfree(d.redirect);
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 	return ERR_PTR(err);
 }
 
@@ -1125,7 +1128,9 @@ bool ovl_lower_positive(struct dentry *dentry)
 	if (!ovl_dentry_upper(dentry))
 		return true;
 
-	old_cred = ovl_override_creds(dentry->d_sb);
+	if (ovl_override_creds(dentry->d_sb, &old_cred))
+		return false;
+
 	/* Positive upper -> have to look up lower to see whether it exists */
 	for (i = 0; !done && !positive && i < poe->numlower; i++) {
 		struct dentry *this;
@@ -1155,7 +1160,7 @@ bool ovl_lower_positive(struct dentry *dentry)
 			dput(this);
 		}
 	}
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 
 	return positive;
 }
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 7c94cc3521cb..72f0d84d2d4b 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -204,7 +204,8 @@ static inline bool ovl_open_flags_need_copy_up(int flags)
 int ovl_want_write(struct dentry *dentry);
 void ovl_drop_write(struct dentry *dentry);
 struct dentry *ovl_workdir(struct dentry *dentry);
-const struct cred *ovl_override_creds(struct super_block *sb);
+int ovl_override_creds(struct super_block *sb, const struct cred **old_cred);
+void ovl_revert_creds(struct super_block *sb, const struct cred *old_cred);
 struct super_block *ovl_same_sb(struct super_block *sb);
 int ovl_can_decode_fh(struct super_block *sb);
 struct dentry *ovl_indexdir(struct super_block *sb);
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index eff8fbfccc7c..b13f4e983784 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -271,7 +271,9 @@ static int ovl_check_whiteouts(struct dentry *dir, struct ovl_readdir_data *rdd)
 	struct dentry *dentry;
 	const struct cred *old_cred;
 
-	old_cred = ovl_override_creds(rdd->dentry->d_sb);
+	err = ovl_override_creds(rdd->dentry->d_sb, &old_cred);
+	if (err)
+		return err;
 
 	err = down_write_killable(&dir->d_inode->i_rwsem);
 	if (!err) {
@@ -286,7 +288,7 @@ static int ovl_check_whiteouts(struct dentry *dir, struct ovl_readdir_data *rdd)
 		}
 		inode_unlock(dir->d_inode);
 	}
-	revert_creds(old_cred);
+	ovl_revert_creds(rdd->dentry->d_sb, old_cred);
 
 	return err;
 }
@@ -920,9 +922,12 @@ int ovl_check_empty_dir(struct dentry *dentry, struct list_head *list)
 	struct rb_root root = RB_ROOT;
 	const struct cred *old_cred;
 
-	old_cred = ovl_override_creds(dentry->d_sb);
+	err = ovl_override_creds(dentry->d_sb, &old_cred);
+	if (err)
+		return err;
+
 	err = ovl_dir_read_merged(dentry, list, &root);
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 	if (err)
 		return err;
 
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index f5678a3f8350..146b351a0d84 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -33,11 +33,17 @@ struct dentry *ovl_workdir(struct dentry *dentry)
 	return ofs->workdir;
 }
 
-const struct cred *ovl_override_creds(struct super_block *sb)
+int ovl_override_creds(struct super_block *sb, const struct cred **old_cred)
 {
 	struct ovl_fs *ofs = sb->s_fs_info;
 
-	return override_creds(ofs->creator_cred);
+	*old_cred = override_creds(ofs->creator_cred);
+	return 0;
+}
+
+void ovl_revert_creds(struct super_block *sb, const struct cred *old_cred)
+{
+	revert_creds(old_cred);
 }
 
 struct super_block *ovl_same_sb(struct super_block *sb)
@@ -783,7 +789,10 @@ int ovl_nlink_start(struct dentry *dentry)
 	if (d_is_dir(dentry) || !ovl_test_flag(OVL_INDEX, inode))
 		goto out;
 
-	old_cred = ovl_override_creds(dentry->d_sb);
+	err = ovl_override_creds(dentry->d_sb, &old_cred);
+	if (err)
+		goto out;
+
 	/*
 	 * The overlay inode nlink should be incremented/decremented IFF the
 	 * upper operation succeeds, along with nlink change of upper inode.
@@ -791,7 +800,7 @@ int ovl_nlink_start(struct dentry *dentry)
 	 * value relative to the upper inode nlink in an upper inode xattr.
 	 */
 	err = ovl_set_nlink_upper(dentry);
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 
 out:
 	if (err)
@@ -806,10 +815,13 @@ void ovl_nlink_end(struct dentry *dentry)
 
 	if (ovl_test_flag(OVL_INDEX, inode) && inode->i_nlink == 0) {
 		const struct cred *old_cred;
+		int err;
 
-		old_cred = ovl_override_creds(dentry->d_sb);
-		ovl_cleanup_index(dentry);
-		revert_creds(old_cred);
+		err = ovl_override_creds(dentry->d_sb, &old_cred);
+		if (!err) {
+			ovl_cleanup_index(dentry);
+			ovl_revert_creds(dentry->d_sb, old_cred);
+		}
 	}
 
 	ovl_inode_unlock(inode);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 4/4] ovl: add support for SHUTDOWN ioctl
  2019-07-15 13:38 [PFC][PATCH 0/4] Overlayfs SHUTDOWN ioctl Amir Goldstein
                   ` (2 preceding siblings ...)
  2019-07-15 13:38 ` [PATCH 3/4] ovl: add pre/post access hooks to underlying layers Amir Goldstein
@ 2019-07-15 13:38 ` Amir Goldstein
  2019-07-16  5:35 ` [PFC][PATCH 0/4] Overlayfs " Amir Goldstein
  4 siblings, 0 replies; 7+ messages in thread
From: Amir Goldstein @ 2019-07-15 13:38 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, linux-fsdevel, linux-unionfs

Keep accounting of active users accessing underlying layers.
On SHUTDOWN ioctl, deny new users from accessing underlying layers.

After SHUTDOWN ioctl, when active users count drops to zero, we release
the inuse exclusive locks on upperdir and workdir, because no user can
access those dirs from this instance anymore.

This will allow container runtimes to issue the SHUTDOWN ioctl before
unmounting overlayfs to mitigate overlayfs mount failures that result
from mount leaks of old overlayfs mounts.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/file.c      | 34 ++++++++++++++++++++++++++++
 fs/overlayfs/overlayfs.h | 10 +++++++-
 fs/overlayfs/ovl_entry.h |  7 ++++++
 fs/overlayfs/super.c     |  9 +++-----
 fs/overlayfs/util.c      | 49 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 102 insertions(+), 7 deletions(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index dbcf7549068d..2668a48046ef 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -530,6 +530,33 @@ static long ovl_ioctl_set_fsxflags(struct file *file, unsigned int cmd,
 				   ovl_fsxflags_to_fsflags(fa.fsx_xflags));
 }
 
+static int ovl_ioctl_shutdown(struct super_block *sb, unsigned long arg)
+{
+	struct ovl_fs *ofs = sb->s_fs_info;
+	__u32 flags;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (get_user(flags, (__u32 __user *)arg))
+		return -EFAULT;
+
+	if (flags != OVL_SHUTDOWN_FLAGS_NOSYNC)
+		return -EINVAL;
+
+	down_write(&sb->s_umount);
+
+	if (!ofs->goingdown) {
+		pr_info("overlayfs: shutdown requested\n");
+		ofs->goingdown = true;
+		ovl_drop_active(ofs);
+	}
+
+	up_write(&sb->s_umount);
+
+	return 0;
+}
+
 long ovl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
 	long ret;
@@ -548,6 +575,10 @@ long ovl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 		ret = ovl_ioctl_set_fsxflags(file, cmd, arg);
 		break;
 
+	case OVL_IOC_SHUTDOWN:
+		ret = ovl_ioctl_shutdown(file_inode(file)->i_sb, arg);
+		break;
+
 	default:
 		ret = -ENOTTY;
 	}
@@ -567,6 +598,9 @@ long ovl_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 		cmd = FS_IOC_SETFLAGS;
 		break;
 
+	case OVL_IOC_SHUTDOWN:
+		break;
+
 	default:
 		return -ENOIOCTLCMD;
 	}
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 72f0d84d2d4b..8f282c722b00 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -28,6 +28,14 @@ enum ovl_path_type {
 #define OVL_XATTR_UPPER OVL_XATTR_PREFIX "upper"
 #define OVL_XATTR_METACOPY OVL_XATTR_PREFIX "metacopy"
 
+/*
+ * Should be same as XFS_IOC_GOINGDOWN.
+ * We only support the NOSYNC flag.
+ */
+#define OVL_IOC_SHUTDOWN		_IOR('X', 125, __u32)
+#define OVL_SHUTDOWN_FLAGS_NOSYNC	0x2
+
+
 enum ovl_inode_flag {
 	/* Pure upper dir that may contain non pure upper entries */
 	OVL_IMPURE,
@@ -201,6 +209,7 @@ static inline bool ovl_open_flags_need_copy_up(int flags)
 }
 
 /* util.c */
+void ovl_drop_active(struct ovl_fs *ofs);
 int ovl_want_write(struct dentry *dentry);
 void ovl_drop_write(struct dentry *dentry);
 struct dentry *ovl_workdir(struct dentry *dentry);
@@ -301,7 +310,6 @@ static inline void ovl_inode_unlock(struct inode *inode)
 	mutex_unlock(&OVL_I(inode)->lock);
 }
 
-
 /* namei.c */
 int ovl_check_fh_len(struct ovl_fh *fh, int fh_len);
 struct dentry *ovl_decode_real_fh(struct ovl_fh *fh, struct vfsmount *mnt,
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index a8279280e88d..5449222621c0 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -71,6 +71,13 @@ struct ovl_fs {
 	struct inode *indexdir_trap;
 	/* Inode numbers in all layers do not use the high xino_bits */
 	unsigned int xino_bits;
+	/*
+	 * Number of users currently accessing underlying layers (+1)
+	 * When zero, access to underlying layers is denied.
+	 */
+	atomic_t active;
+	/* Filesystem will shutdown when the last active reference drops */
+	bool goingdown;
 };
 
 /* private information held for every overlayfs dentry */
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index afbcb116a7f1..119610bef31a 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -212,17 +212,14 @@ static void ovl_free_fs(struct ovl_fs *ofs)
 {
 	unsigned i;
 
+	ovl_drop_active(ofs);
 	iput(ofs->workbasedir_trap);
 	iput(ofs->indexdir_trap);
 	iput(ofs->workdir_trap);
 	iput(ofs->upperdir_trap);
 	dput(ofs->indexdir);
 	dput(ofs->workdir);
-	if (ofs->workdir_locked)
-		ovl_inuse_unlock(ofs->workbasedir);
 	dput(ofs->workbasedir);
-	if (ofs->upperdir_locked)
-		ovl_inuse_unlock(ofs->upper_mnt->mnt_root);
 	mntput(ofs->upper_mnt);
 	for (i = 0; i < ofs->numlower; i++) {
 		iput(ofs->lower_layers[i].trap);
@@ -237,8 +234,6 @@ static void ovl_free_fs(struct ovl_fs *ofs)
 	kfree(ofs->config.upperdir);
 	kfree(ofs->config.workdir);
 	kfree(ofs->config.redirect_mode);
-	if (ofs->creator_cred)
-		put_cred(ofs->creator_cred);
 	kfree(ofs);
 }
 
@@ -1570,6 +1565,8 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	if (!ofs)
 		goto out;
 
+	atomic_set(&ofs->active, 1);
+
 	ofs->creator_cred = cred = prepare_creds();
 	if (!cred)
 		goto out_err;
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 146b351a0d84..5e622c00bd82 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -15,16 +15,58 @@
 #include <linux/ratelimit.h>
 #include "overlayfs.h"
 
+static bool ovl_get_active(struct ovl_fs *ofs)
+{
+	if (!atomic_inc_not_zero(&ofs->active))
+		return false;
+
+	/*
+	 * Not handing out any more active references when filesystem is
+	 * going down. The atomic ops on ofs->active serve as memory barriers
+	 * for the ofs->goingdown test.
+	 */
+	if (!ofs->goingdown)
+		return true;
+
+	ovl_drop_active(ofs);
+	return false;
+}
+
+void ovl_drop_active(struct ovl_fs *ofs)
+{
+	if (atomic_dec_and_test(&ofs->active)) {
+		/*
+		 * No users will ever access underlying layers from this
+		 * instance, so we can release exclusive inuse locks.
+		 */
+		if (ofs->workdir_locked)
+			ovl_inuse_unlock(ofs->workbasedir);
+		if (ofs->upperdir_locked)
+			ovl_inuse_unlock(ofs->upper_mnt->mnt_root);
+		/* "revoke" credentials to access underlying layers */
+		put_cred(ofs->creator_cred);
+		ofs->creator_cred = NULL;
+		if (ofs->goingdown)
+			pr_info("overlayfs: filesystem is shutdown\n");
+	}
+}
+
 int ovl_want_write(struct dentry *dentry)
 {
 	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
+
+	if (!ovl_get_active(ofs))
+		return -EIO;
+
 	return mnt_want_write(ofs->upper_mnt);
 }
 
 void ovl_drop_write(struct dentry *dentry)
 {
 	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
+
 	mnt_drop_write(ofs->upper_mnt);
+	ovl_drop_active(ofs);
 }
 
 struct dentry *ovl_workdir(struct dentry *dentry)
@@ -37,6 +79,12 @@ int ovl_override_creds(struct super_block *sb, const struct cred **old_cred)
 {
 	struct ovl_fs *ofs = sb->s_fs_info;
 
+	if (!ovl_get_active(ofs))
+		return -EIO;
+
+	if (WARN_ON_ONCE(!ofs->creator_cred))
+		return -EIO;
+
 	*old_cred = override_creds(ofs->creator_cred);
 	return 0;
 }
@@ -44,6 +92,7 @@ int ovl_override_creds(struct super_block *sb, const struct cred **old_cred)
 void ovl_revert_creds(struct super_block *sb, const struct cred *old_cred)
 {
 	revert_creds(old_cred);
+	ovl_drop_active(sb->s_fs_info);
 }
 
 struct super_block *ovl_same_sb(struct super_block *sb)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PFC][PATCH 0/4] Overlayfs SHUTDOWN ioctl
  2019-07-15 13:38 [PFC][PATCH 0/4] Overlayfs SHUTDOWN ioctl Amir Goldstein
                   ` (3 preceding siblings ...)
  2019-07-15 13:38 ` [PATCH 4/4] ovl: add support for SHUTDOWN ioctl Amir Goldstein
@ 2019-07-16  5:35 ` Amir Goldstein
  4 siblings, 0 replies; 7+ messages in thread
From: Amir Goldstein @ 2019-07-16  5:35 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Vivek Goyal, linux-fsdevel, overlayfs, Linux Containers, Colin Walters

CC: containers folks for feedback

On Mon, Jul 15, 2019 at 4:38 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> Miklos,
>
> I was trying to think of a way forward w.r.t container runtime
> mount leaks and overlayfs mount failures, see [1][2][3].
>
> It does not seem reasonable to expect they will fix all the mount
> leaks and that new ones will not show up. It's a hard problem to
> solve.
>
> I posted a fix patch to mitigate the recent regression with
> index=off [2], but it does not seem reasonable to hold index=on feature
> hostage for eternity or until all mount leaks are fixed.
>
> The proposal I have come up with is to provide an API for containers
> to shutdown the instance before unmount, so the leaked mounts become
> "zombies" with no ability to do any harm beyond hogging resources.
>
> The SHUTDOWN ioctl, used by xfs/ext4/f2fs to stop any access to
> underlying blockdev is implemented in overlayfs to stop any access
> to underlying layers. The real objects are still referenced, but no
> vfs API can be called on underlying layers.
>
> Naturally, SHUTDOWN releases the inuse locks to mitigate the mount
> failures.
>
> I wrote an xfstest to verify SHUTDOWN solves the mount leak issue [5].
>
> Thoughts?

I had one thought myself:
On kernel < v4.19 a SHUTDOWN ioctl on an overlayfs file will have
a completely different consequence - the underlying fs would shutdown!
So we have several options:
1. Allow SHUTDOWN only on dir inode (it's usually executed on the
mount root anyway)
2. Introduce a new SHUTDOWN flag that existing no fs (kernel < v4.19) supports
3. Use one of the new mount APIs to request "umount & shutdown sb"
4. Other ideas?

I personally like the compromise of option #1, but I have not spent
time studying option #3 which could be better.

>
> Thanks,
> Amir.
>
> [1] https://github.com/containers/libpod/issues/3540
> [2] https://github.com/moby/moby/issues/39475
> [3] https://github.com/coreos/linux/pull/339
> [4] https://github.com/amir73il/linux/commits/ovl-overlap-detect-regression-fix
> [5] https://github.com/amir73il/xfstests/commit/a56d5560e404cc8052a8e47850676364b5e8c76c
>
> Amir Goldstein (4):
>   ovl: support [S|G]ETFLAGS ioctl for directories
>   ovl: use generic vfs_ioc_setflags_prepare() helper
>   ovl: add pre/post access hooks to underlying layers
>   ovl: add support for SHUTDOWN ioctl
>
>  fs/overlayfs/copy_up.c   |  10 +-
>  fs/overlayfs/dir.c       |  26 +++--
>  fs/overlayfs/file.c      | 200 ++++++++++++++++++++++++++-------------
>  fs/overlayfs/inode.c     |  64 +++++++++----
>  fs/overlayfs/namei.c     |  15 ++-
>  fs/overlayfs/overlayfs.h |  15 ++-
>  fs/overlayfs/ovl_entry.h |   7 ++
>  fs/overlayfs/readdir.c   |  17 +++-
>  fs/overlayfs/super.c     |   9 +-
>  fs/overlayfs/util.c      |  75 +++++++++++++--
>  10 files changed, 318 insertions(+), 120 deletions(-)
>
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/4] ovl: support [S|G]ETFLAGS ioctl for directories
  2019-07-15 13:38 ` [PATCH 1/4] ovl: support [S|G]ETFLAGS ioctl for directories Amir Goldstein
@ 2019-07-26  6:57   ` Amir Goldstein
  0 siblings, 0 replies; 7+ messages in thread
From: Amir Goldstein @ 2019-07-26  6:57 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, linux-fsdevel, overlayfs

On Mon, Jul 15, 2019 at 4:38 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> [S|G]ETFLAGS and FS[S|G]ETXATTR ioctls are applicable to both files and
> directories, so add ioctl operations to dir as well.
>
> ifdef away compat ioctl implementation to conform to standard practice.
>
> With this change, xfstest generic/079 which tests these ioctls on files
> and directories passes.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/file.c      | 10 ++++++----
>  fs/overlayfs/overlayfs.h |  2 ++
>  fs/overlayfs/readdir.c   |  4 ++++
>  3 files changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index e235a635d9ec..c6426e4d3f1f 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -502,7 +502,7 @@ static long ovl_ioctl_set_fsxflags(struct file *file, unsigned int cmd,
>                                    ovl_fsxflags_to_iflags(fa.fsx_xflags));
>  }
>
> -static long ovl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> +long ovl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
>  {
>         long ret;
>
> @@ -527,8 +527,8 @@ static long ovl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
>         return ret;
>  }
>
> -static long ovl_compat_ioctl(struct file *file, unsigned int cmd,
> -                            unsigned long arg)
> +#ifdef CONFIG_COMPAT
> +long ovl_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
>  {
>         switch (cmd) {
>         case FS_IOC32_GETFLAGS:
> @@ -545,6 +545,7 @@ static long ovl_compat_ioctl(struct file *file, unsigned int cmd,
>
>         return ovl_ioctl(file, cmd, arg);
>  }
> +#endif
>
>  enum ovl_copyop {
>         OVL_COPY,
> @@ -646,8 +647,9 @@ const struct file_operations ovl_file_operations = {
>         .fallocate      = ovl_fallocate,
>         .fadvise        = ovl_fadvise,
>         .unlocked_ioctl = ovl_ioctl,
> +#ifdef CONFIG_COMPAT
>         .compat_ioctl   = ovl_compat_ioctl,
> -
> +#endif
>         .copy_file_range        = ovl_copy_file_range,
>         .remap_file_range       = ovl_remap_file_range,
>  };
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index 6934bcf030f0..7c94cc3521cb 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -416,6 +416,8 @@ struct dentry *ovl_create_temp(struct dentry *workdir, struct ovl_cattr *attr);
>
>  /* file.c */
>  extern const struct file_operations ovl_file_operations;
> +long ovl_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
> +long ovl_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
>
>  /* copy_up.c */
>  int ovl_copy_up(struct dentry *dentry);
> diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> index 47a91c9733a5..eff8fbfccc7c 100644
> --- a/fs/overlayfs/readdir.c
> +++ b/fs/overlayfs/readdir.c
> @@ -907,6 +907,10 @@ const struct file_operations ovl_dir_operations = {
>         .llseek         = ovl_dir_llseek,
>         .fsync          = ovl_dir_fsync,
>         .release        = ovl_dir_release,
> +       .unlocked_ioctl = ovl_ioctl,
> +#ifdef CONFIG_COMPAT
> +       .compat_ioctl   = ovl_compat_ioctl,
> +#endif
>  };
>

Big self NACK!!!

Cannot call ovl_ioctl => ovl_real_ioctl => ovl_real_fdget with a directory.
If we do this need to implement ovl_dir_ioctl and refactor the ioctl helpers.

Sorry,
Amir.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-07-26  6:57 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-15 13:38 [PFC][PATCH 0/4] Overlayfs SHUTDOWN ioctl Amir Goldstein
2019-07-15 13:38 ` [PATCH 1/4] ovl: support [S|G]ETFLAGS ioctl for directories Amir Goldstein
2019-07-26  6:57   ` Amir Goldstein
2019-07-15 13:38 ` [PATCH 2/4] ovl: use generic vfs_ioc_setflags_prepare() helper Amir Goldstein
2019-07-15 13:38 ` [PATCH 3/4] ovl: add pre/post access hooks to underlying layers Amir Goldstein
2019-07-15 13:38 ` [PATCH 4/4] ovl: add support for SHUTDOWN ioctl Amir Goldstein
2019-07-16  5:35 ` [PFC][PATCH 0/4] Overlayfs " Amir Goldstein

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).