* [PATCH 0/2] overlayfs multiple mount protection
@ 2017-05-23  9:50 Amir Goldstein
  2017-05-23  9:50 ` [PATCH 1/2] vfs: introduce inode 'inuse' lock Amir Goldstein
  2017-05-23  9:50 ` [PATCH 2/2] ovl: get exclusive ownership on upper/work dirs Amir Goldstein
  0 siblings, 2 replies; 11+ messages in thread
From: Amir Goldstein @ 2017-05-23  9:50 UTC
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

Miklos,

I've implemented verification that the lower root dir matches the origin
file handle stored at the upper root dir, with the verify_lower mount
option [1].

As you correctly noted, before we move on to verifying that the upper dir
'belongs to' the lower dir and that the index dir 'belongs to' the upper
dir for mounts that do not happen at the same time, we first need to cover
the case of mounts that happen at the same time.

This patch set provides protection against reuse of upperdir and workdir
by two different overlay instances at the same time, e.g.:

root@kvm-xfstests:~/unionmount-testsuite# mount -t overlay
overlay on /mnt type overlay (rw,noatime,lowerdir=/lower,upperdir=/upper/0,workdir=/upper/work)
root@kvm-xfstests:~/unionmount-testsuite# mount -t overlay overlay /backup/ -o rw,noatime,lowerdir=/lower,upperdir=/upper/0,workdir=/upper/work
overlayfs: upperdir in-use by another overlay mount?
mount: overlay is already mounted or /backup busy
       overlay is already mounted on /mnt
root@kvm-xfstests:~/unionmount-testsuite# mkdir /upper/1
root@kvm-xfstests:~/unionmount-testsuite# mount -t overlay overlay /snapshot/ -o rw,noatime,lowerdir=/lower,upperdir=/upper/1,workdir=/upper/work
overlayfs: workdir in-use by another overlay mount?
mount: overlay is already mounted or /snapshot busy
       overlay is already mounted on /mnt

It also provides protection against removal of workdir just after mount,
which would have caused failures to copy up:

root@kvm-xfstests:~/unionmount-testsuite# rmdir /upper/work/work/
rmdir: failed to remove '/upper/work/work/': Device or resource busy

[1] https://github.com/amir73il/linux/commits/ovl-verify-dir

Amir Goldstein (2):
  vfs: introduce inode 'inuse' lock
  ovl: get exclusive ownership on upper/work dirs

 fs/btrfs/ioctl.c     |  3 +++
 fs/inode.c           | 40 ++++++++++++++++++++++++++++++
 fs/namei.c           |  3 +++
 fs/overlayfs/super.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++---
 include/linux/fs.h   | 16 ++++++++++++
 5 files changed, 129 insertions(+), 3 deletions(-)

-- 
2.7.4


* [PATCH 1/2] vfs: introduce inode 'inuse' lock
  2017-05-23  9:50 [PATCH 0/2] overlayfs multiple mount protection Amir Goldstein
@ 2017-05-23  9:50 ` Amir Goldstein
  2017-05-31 10:09   ` Miklos Szeredi
  2017-05-23  9:50 ` [PATCH 2/2] ovl: get exclusive ownership on upper/work dirs Amir Goldstein
  1 sibling, 1 reply; 11+ messages in thread
From: Amir Goldstein @ 2017-05-23  9:50 UTC
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

Added an i_state flag I_INUSE and helpers to set/clear/test the bit.

The 'inuse' lock is an 'advisory' inode lock, which also provides
may_delete() protection, so can be used to extend exclusive create
protection beyond parent->i_mutex lock among cooperating users.

This is going to be used by overlayfs to get exclusive ownership
on upper and work dirs among overlayfs mounts.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/btrfs/ioctl.c   |  3 +++
 fs/inode.c         | 40 ++++++++++++++++++++++++++++++++++++++++
 fs/namei.c         |  3 +++
 include/linux/fs.h | 16 ++++++++++++++++
 4 files changed, 62 insertions(+)
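
For readers following along outside a kernel tree, the locking protocol
described in the commit message can be sketched as a small userspace
model. This is only an illustration, not the code in this patch: the
model_* names and the MODEL_I_* bit values are made up for the example,
and a C11 compare-and-swap stands in for the inode->i_lock spinlock that
the patch takes around the i_state update.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's i_state bits. */
#define MODEL_I_FREEING   (1UL << 0)
#define MODEL_I_WILL_FREE (1UL << 1)
#define MODEL_I_INUSE     (1UL << 2)

/* Minimal model of struct inode: only the state word matters here. */
struct model_inode {
	_Atomic unsigned long i_state;
};

/* Test the advisory bit; the answer may be stale as soon as it returns. */
static bool model_inode_inuse(struct model_inode *inode)
{
	return (atomic_load(&inode->i_state) & MODEL_I_INUSE) != 0;
}

/*
 * Try to take the advisory lock.  Fails if the inode is being freed or
 * if somebody else already holds the lock.  Returns true only if this
 * call set the bit.
 */
static bool model_inuse_trylock(struct model_inode *inode)
{
	unsigned long old = atomic_load(&inode->i_state);

	do {
		if (old & (MODEL_I_FREEING | MODEL_I_WILL_FREE | MODEL_I_INUSE))
			return false;
		/* On CAS failure 'old' is reloaded and the check repeats. */
	} while (!atomic_compare_exchange_weak(&inode->i_state, &old,
					       old | MODEL_I_INUSE));
	return true;
}

/* Drop the advisory lock; only the holder is supposed to call this. */
static void model_inuse_unlock(struct model_inode *inode)
{
	atomic_fetch_and(&inode->i_state, ~MODEL_I_INUSE);
}

/* Models the check added to may_delete(): refuse to remove a held inode. */
static int model_may_delete(struct model_inode *victim)
{
	return model_inode_inuse(victim) ? -EBUSY : 0;
}
```

In this model a second model_inuse_trylock() on the same inode returns
false until the holder calls model_inuse_unlock(), and model_may_delete()
reports -EBUSY in between, which mirrors the rmdir failure shown in the
cover letter.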

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index e176375..17fa239 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -782,6 +782,7 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
  *  9. We can't remove a root or mountpoint.
  * 10. We don't allow removal of NFS sillyrenamed files; it's handled by
  *     nfs_async_unlink().
+ * 11. We don't allow removal of inodes marked 'inuse'.
  */
 
 static int btrfs_may_delete(struct inode *dir, struct dentry *victim, int isdir)
@@ -813,6 +814,8 @@ static int btrfs_may_delete(struct inode *dir, struct dentry *victim, int isdir)
 		return -ENOENT;
 	if (victim->d_flags & DCACHE_NFSFS_RENAMED)
 		return -EBUSY;
+	if (inode_inuse(d_inode(victim)))
+		return -EBUSY;
 	return 0;
 }
 
diff --git a/fs/inode.c b/fs/inode.c
index db59147..0552c8b 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2120,3 +2120,43 @@ struct timespec current_time(struct inode *inode)
 	return timespec_trunc(now, inode->i_sb->s_time_gran);
 }
 EXPORT_SYMBOL(current_time);
+
+/**
+ * inode_inuse_trylock - try to get an exclusive 'inuse' lock on inode
+ * @inode: inode being locked
+ *
+ * The 'inuse' lock is an 'advisory' inode lock, which also provides
+ * may_delete() protection, so can be used to extend exclusive create
+ * protection beyond parent->i_mutex lock among cooperating users.
+ * Used by overlayfs to get exclusive ownership on upper and work dirs
+ * among overlayfs mounts.
+ *
+ * Return true if I_INUSE flag was set by this call.
+ */
+bool inode_inuse_trylock(struct inode *inode)
+{
+	bool locked = false;
+
+	spin_lock(&inode->i_lock);
+	if (!(inode->i_state & (I_FREEING|I_WILL_FREE|I_INUSE))) {
+		inode->i_state |= I_INUSE;
+		locked = true;
+	}
+	spin_unlock(&inode->i_lock);
+	return locked;
+}
+EXPORT_SYMBOL(inode_inuse_trylock);
+
+/*
+ * Non-cooperating users should not be calling this function and cooperating
+ * users should call this function only if they have the exclusive 'inuse' lock.
+ */
+void inode_inuse_unlock(struct inode *inode)
+{
+	WARN_ON(!inode_inuse(inode));
+
+	spin_lock(&inode->i_lock);
+	inode->i_state &= ~I_INUSE;
+	spin_unlock(&inode->i_lock);
+}
+EXPORT_SYMBOL(inode_inuse_unlock);
diff --git a/fs/namei.c b/fs/namei.c
index 837da8b..c371b25 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2790,6 +2790,7 @@ EXPORT_SYMBOL(__check_sticky);
  * 10. We can't remove a root or mountpoint.
  * 11. We don't allow removal of NFS sillyrenamed files; it's handled by
  *     nfs_async_unlink().
+ * 12. We don't allow removal of inodes marked 'inuse'.
  */
 static int may_delete(struct inode *dir, struct dentry *victim, bool isdir)
 {
@@ -2823,6 +2824,8 @@ static int may_delete(struct inode *dir, struct dentry *victim, bool isdir)
 		return -ENOENT;
 	if (victim->d_flags & DCACHE_NFSFS_RENAMED)
 		return -EBUSY;
+	if (inode_inuse(d_inode(victim)))
+		return -EBUSY;
 	return 0;
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index aab10f9..1420e8b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1864,6 +1864,7 @@ struct super_operations {
 #define IS_AUTOMOUNT(inode)	((inode)->i_flags & S_AUTOMOUNT)
 #define IS_NOSEC(inode)		((inode)->i_flags & S_NOSEC)
 #define IS_DAX(inode)		((inode)->i_flags & S_DAX)
+#define IS_INUSE(inode)		((inode)->i_flags & S_INUSE)
 
 #define IS_WHITEOUT(inode)	(S_ISCHR(inode->i_mode) && \
 				 (inode)->i_rdev == WHITEOUT_DEV)
@@ -1929,6 +1930,13 @@ static inline bool HAS_UNMAPPED_ID(struct inode *inode)
  *			wb stat updates to grab mapping->tree_lock.  See
  *			inode_switch_wb_work_fn() for details.
  *
+ * I_INUSE		An 'advisory' bit to get exclusive ownership on inode
+ *			using inode_inuse_trylock().  Also provides may_delete()
+ *			protection, so can be used to extend exclusive create
+ *			protection beyond parent->i_mutex lock.
+ *			Used by overlayfs to get exclusive ownership on upper
+ *			and work dirs among overlayfs mounts.
+ *
  * Q: What is the difference between I_WILL_FREE and I_FREEING?
  */
 #define I_DIRTY_SYNC		(1 << 0)
@@ -1949,6 +1957,7 @@ static inline bool HAS_UNMAPPED_ID(struct inode *inode)
 #define __I_DIRTY_TIME_EXPIRED	12
 #define I_DIRTY_TIME_EXPIRED	(1 << __I_DIRTY_TIME_EXPIRED)
 #define I_WB_SWITCH		(1 << 13)
+#define I_INUSE			(1 << 14)
 
 #define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)
 #define I_DIRTY_ALL (I_DIRTY | I_DIRTY_TIME)
@@ -3258,5 +3267,12 @@ static inline bool dir_relax_shared(struct inode *inode)
 
 extern bool path_noexec(const struct path *path);
 extern void inode_nohighmem(struct inode *inode);
+extern bool inode_inuse_trylock(struct inode *inode);
+extern void inode_inuse_unlock(struct inode *inode);
+
+static inline bool inode_inuse(struct inode *inode)
+{
+	return inode->i_state & I_INUSE;
+}
 
 #endif /* _LINUX_FS_H */
-- 
2.7.4


* [PATCH 2/2] ovl: get exclusive ownership on upper/work dirs
  2017-05-23  9:50 [PATCH 0/2] overlayfs multiple mount protection Amir Goldstein
  2017-05-23  9:50 ` [PATCH 1/2] vfs: introduce inode 'inuse' lock Amir Goldstein
@ 2017-05-23  9:50 ` Amir Goldstein
  2017-05-31 10:18   ` Miklos Szeredi
  1 sibling, 1 reply; 11+ messages in thread
From: Amir Goldstein @ 2017-05-23  9:50 UTC
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

Bad things can happen if several concurrent overlay mounts try to
use the same upperdir path and/or workdir path.

Try to get the 'inuse' advisory lock on upper and work dir.
Fail mount if another overlay mount instance or another user
holds the 'inuse' lock.

Note that this provides no protection for concurrent overlay
mounts that use overlapping (i.e. descendant) upper dirs or
work dirs.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/super.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 67 insertions(+), 3 deletions(-)
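
The acquire and unwind order used at mount time can be sketched in
userspace as follows. This is a simplified, single-threaded model under
stated assumptions: the model_* types and functions are hypothetical, a
plain bool stands in for the I_INUSE bit, and the real ovl_fill_super()
has many more steps between the two lock attempts.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a directory inode carrying the 'inuse' bit. */
struct model_dir {
	bool inuse;
};

/* Hypothetical stand-in for the per-mount state (struct ovl_fs). */
struct model_mount {
	struct model_dir *upper;
	struct model_dir *work;
};

static bool model_dir_trylock(struct model_dir *dir)
{
	if (dir->inuse)
		return false;
	dir->inuse = true;
	return true;
}

static void model_dir_unlock(struct model_dir *dir)
{
	dir->inuse = false;
}

/*
 * Take upper first, then work.  If work is busy, release upper again so
 * a failed mount leaves no lock behind.
 */
static int model_mount_setup(struct model_mount *m,
			     struct model_dir *upper, struct model_dir *work)
{
	int err = -EBUSY;

	if (!model_dir_trylock(upper))
		goto out;
	if (!model_dir_trylock(work))
		goto out_unlock_upper;

	m->upper = upper;
	m->work = work;
	return 0;

out_unlock_upper:
	model_dir_unlock(upper);
out:
	return err;
}

/* Unmount: release in the reverse order of acquisition. */
static void model_mount_teardown(struct model_mount *m)
{
	model_dir_unlock(m->work);
	model_dir_unlock(m->upper);
	m->work = NULL;
	m->upper = NULL;
}
```

With one mount holding upper/0 and work, a second setup that reuses
either directory returns -EBUSY, and a setup with a fresh upper dir but
the same work dir leaves the fresh upper dir unlocked after failing.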

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 4882ffb..ac9212d 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -165,12 +165,42 @@ static const struct dentry_operations ovl_reval_dentry_operations = {
 	.d_weak_revalidate = ovl_dentry_weak_revalidate,
 };
 
+/* Get exclusive ownership on upper/work dir among overlay mounts */
+static bool ovl_dir_lock(struct dentry *dentry)
+{
+	struct inode *inode;
+
+	if (!dentry)
+		return false;
+
+	inode = d_inode(dentry);
+	if (!inode || inode_inuse(inode))
+		return false;
+
+	return inode_inuse_trylock(inode);
+}
+
+static void ovl_dir_unlock(struct dentry *dentry)
+{
+	struct inode *inode;
+
+	if (!dentry)
+		return;
+
+	inode = d_inode(dentry);
+	if (inode && inode_inuse(inode))
+		inode_inuse_unlock(inode);
+}
+
 static void ovl_put_super(struct super_block *sb)
 {
 	struct ovl_fs *ufs = sb->s_fs_info;
 	unsigned i;
 
+	ovl_dir_unlock(ufs->workdir);
 	dput(ufs->workdir);
+	if (ufs->upper_mnt)
+		ovl_dir_unlock(ufs->upper_mnt->mnt_root);
 	mntput(ufs->upper_mnt);
 	for (i = 0; i < ufs->numlower; i++)
 		mntput(ufs->lower_mnt[i]);
@@ -407,6 +437,14 @@ static struct dentry *ovl_workdir_create(struct vfsmount *mnt,
 			if (retried)
 				goto out_dput;
 
+			/*
+			 * We have parent i_mutex, so this test is race free
+			 * w.r.t. ovl_dir_lock() below by another overlay mount.
+			 */
+			err = -EBUSY;
+			if (inode_inuse(work->d_inode))
+				goto out_dput;
+
 			retried = true;
 			ovl_workdir_cleanup(dir, mnt, work, 0);
 			dput(work);
@@ -446,6 +484,14 @@ static struct dentry *ovl_workdir_create(struct vfsmount *mnt,
 		inode_unlock(work->d_inode);
 		if (err)
 			goto out_dput;
+
+		/*
+		 * Protect our work dir from being deleted/renamed and from
+		 * being reused by another overlay mount.
+		 */
+		err = -EBUSY;
+		if (!ovl_dir_lock(work))
+			goto out_dput;
 	}
 out_unlock:
 	inode_unlock(dir);
@@ -849,6 +895,16 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 			pr_err("overlayfs: failed to clone upperpath\n");
 			goto out_put_lowerpath;
 		}
+		/*
+		 * Protect our upper dir from being deleted/renamed and from
+		 * being reused by another overlay mount.
+		 */
+		err = -EBUSY;
+		if (!ovl_dir_lock(upperpath.dentry)) {
+			pr_err("overlayfs: upperdir in-use by another overlay mount?\n");
+			goto out_put_upper_mnt;
+		}
+
 		/* Don't inherit atime flags */
 		ufs->upper_mnt->mnt_flags &= ~(MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME);
 
@@ -857,6 +913,10 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 		ufs->workdir = ovl_workdir_create(ufs->upper_mnt, workpath.dentry);
 		err = PTR_ERR(ufs->workdir);
 		if (IS_ERR(ufs->workdir)) {
+			if (err == -EBUSY) {
+				pr_err("overlayfs: workdir in-use by another overlay mount?\n");
+				goto out_unlock_upperdir;
+			}
 			pr_warn("overlayfs: failed to create directory %s/%s (errno: %i); mounting read-only\n",
 				ufs->config.workdir, OVL_WORKDIR_NAME, -err);
 			sb->s_flags |= MS_RDONLY;
@@ -874,7 +934,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 
 			err = ovl_check_d_type_supported(&workpath);
 			if (err < 0)
-				goto out_put_workdir;
+				goto out_unlock_workdir;
 
 			/*
 			 * We allowed this configuration and don't want to
@@ -910,7 +970,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	err = -ENOMEM;
 	ufs->lower_mnt = kcalloc(numlower, sizeof(struct vfsmount *), GFP_KERNEL);
 	if (ufs->lower_mnt == NULL)
-		goto out_put_workdir;
+		goto out_unlock_workdir;
 	for (i = 0; i < numlower; i++) {
 		struct vfsmount *mnt = clone_private_mount(&stack[i]);
 
@@ -1002,8 +1062,12 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	for (i = 0; i < ufs->numlower; i++)
 		mntput(ufs->lower_mnt[i]);
 	kfree(ufs->lower_mnt);
-out_put_workdir:
+out_unlock_workdir:
+	ovl_dir_unlock(ufs->workdir);
 	dput(ufs->workdir);
+out_unlock_upperdir:
+	ovl_dir_unlock(upperpath.dentry);
+out_put_upper_mnt:
 	mntput(ufs->upper_mnt);
 out_put_lowerpath:
 	for (i = 0; i < numlower; i++)
-- 
2.7.4


* Re: [PATCH 1/2] vfs: introduce inode 'inuse' lock
  2017-05-23  9:50 ` [PATCH 1/2] vfs: introduce inode 'inuse' lock Amir Goldstein
@ 2017-05-31 10:09   ` Miklos Szeredi
  2017-05-31 13:54     ` Amir Goldstein
  0 siblings, 1 reply; 11+ messages in thread
From: Miklos Szeredi @ 2017-05-31 10:09 UTC
  To: Amir Goldstein; +Cc: Al Viro, linux-unionfs, linux-fsdevel

On Tue, May 23, 2017 at 11:50 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> Added an i_state flag I_INUSE and helpers to set/clear/test the bit.
>
> The 'inuse' lock is an 'advisory' inode lock, which also provides
> may_delete() protection, so can be used to extend exclusive create
> protection beyond parent->i_mutex lock among cooperating users.
>
> This is going to be used by overlayfs to get exclusive ownership
> on upper and work dirs among overlayfs mounts.

Not sure I like the delete protection.  Any modification of workdir or
layers while mounted might cause inconsistencies or errors in the
overlay.  So why single out deletion of base directories?

Otherwise okay from me.

Thanks,
Miklos

>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/btrfs/ioctl.c   |  3 +++
>  fs/inode.c         | 40 ++++++++++++++++++++++++++++++++++++++++
>  fs/namei.c         |  3 +++
>  include/linux/fs.h | 16 ++++++++++++++++
>  4 files changed, 62 insertions(+)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index e176375..17fa239 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -782,6 +782,7 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
>   *  9. We can't remove a root or mountpoint.
>   * 10. We don't allow removal of NFS sillyrenamed files; it's handled by
>   *     nfs_async_unlink().
> + * 11. We don't allow removal of inodes marked 'inuse'.
>   */
>
>  static int btrfs_may_delete(struct inode *dir, struct dentry *victim, int isdir)
> @@ -813,6 +814,8 @@ static int btrfs_may_delete(struct inode *dir, struct dentry *victim, int isdir)
>                 return -ENOENT;
>         if (victim->d_flags & DCACHE_NFSFS_RENAMED)
>                 return -EBUSY;
> +       if (inode_inuse(d_inode(victim)))
> +               return -EBUSY;
>         return 0;
>  }
>
> diff --git a/fs/inode.c b/fs/inode.c
> index db59147..0552c8b 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -2120,3 +2120,43 @@ struct timespec current_time(struct inode *inode)
>         return timespec_trunc(now, inode->i_sb->s_time_gran);
>  }
>  EXPORT_SYMBOL(current_time);
> +
> +/**
> + * inode_inuse_trylock - try to get an exclusive 'inuse' lock on inode
> + * @inode: inode being locked
> + *
> + * The 'inuse' lock is an 'advisory' inode lock, which also provides
> + * may_delete() protection, so can be used to extend exclusive create
> + * protection beyond parent->i_mutex lock among cooperating users.
> + * Used by overlayfs to get exclusive ownership on upper and work dirs
> + * among overlayfs mounts.
> + *
> + * Return true if I_INUSE flag was set by this call.
> + */
> +bool inode_inuse_trylock(struct inode *inode)
> +{
> +       bool locked = false;
> +
> +       spin_lock(&inode->i_lock);
> +       if (!(inode->i_state & (I_FREEING|I_WILL_FREE|I_INUSE))) {
> +               inode->i_state |= I_INUSE;
> +               locked = true;
> +       }
> +       spin_unlock(&inode->i_lock);
> +       return locked;
> +}
> +EXPORT_SYMBOL(inode_inuse_trylock);
> +
> +/*
> + * Non-cooperating users should not be calling this function and cooperating
> + * users should call this function only if they have the exclusive 'inuse' lock.
> + */
> +void inode_inuse_unlock(struct inode *inode)
> +{
> +       WARN_ON(!inode_inuse(inode));
> +
> +       spin_lock(&inode->i_lock);
> +       inode->i_state &= ~I_INUSE;
> +       spin_unlock(&inode->i_lock);
> +}
> +EXPORT_SYMBOL(inode_inuse_unlock);
> diff --git a/fs/namei.c b/fs/namei.c
> index 837da8b..c371b25 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -2790,6 +2790,7 @@ EXPORT_SYMBOL(__check_sticky);
>   * 10. We can't remove a root or mountpoint.
>   * 11. We don't allow removal of NFS sillyrenamed files; it's handled by
>   *     nfs_async_unlink().
> + * 12. We don't allow removal of inodes marked 'inuse'.
>   */
>  static int may_delete(struct inode *dir, struct dentry *victim, bool isdir)
>  {
> @@ -2823,6 +2824,8 @@ static int may_delete(struct inode *dir, struct dentry *victim, bool isdir)
>                 return -ENOENT;
>         if (victim->d_flags & DCACHE_NFSFS_RENAMED)
>                 return -EBUSY;
> +       if (inode_inuse(d_inode(victim)))
> +               return -EBUSY;
>         return 0;
>  }
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index aab10f9..1420e8b 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1864,6 +1864,7 @@ struct super_operations {
>  #define IS_AUTOMOUNT(inode)    ((inode)->i_flags & S_AUTOMOUNT)
>  #define IS_NOSEC(inode)                ((inode)->i_flags & S_NOSEC)
>  #define IS_DAX(inode)          ((inode)->i_flags & S_DAX)
> +#define IS_INUSE(inode)                ((inode)->i_flags & S_INUSE)
>
>  #define IS_WHITEOUT(inode)     (S_ISCHR(inode->i_mode) && \
>                                  (inode)->i_rdev == WHITEOUT_DEV)
> @@ -1929,6 +1930,13 @@ static inline bool HAS_UNMAPPED_ID(struct inode *inode)
>   *                     wb stat updates to grab mapping->tree_lock.  See
>   *                     inode_switch_wb_work_fn() for details.
>   *
> + * I_INUSE             An 'advisory' bit to get exclusive ownership on inode
> + *                     using inode_inuse_trylock().  Also provides may_delete()
> + *                     protection, so can be used to extend exclusive create
> + *                     protection beyond parent->i_mutex lock.
> + *                     Used by overlayfs to get exclusive ownership on upper
> + *                     and work dirs among overlayfs mounts.
> + *
>   * Q: What is the difference between I_WILL_FREE and I_FREEING?
>   */
>  #define I_DIRTY_SYNC           (1 << 0)
> @@ -1949,6 +1957,7 @@ static inline bool HAS_UNMAPPED_ID(struct inode *inode)
>  #define __I_DIRTY_TIME_EXPIRED 12
>  #define I_DIRTY_TIME_EXPIRED   (1 << __I_DIRTY_TIME_EXPIRED)
>  #define I_WB_SWITCH            (1 << 13)
> +#define I_INUSE                        (1 << 14)
>
>  #define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)
>  #define I_DIRTY_ALL (I_DIRTY | I_DIRTY_TIME)
> @@ -3258,5 +3267,12 @@ static inline bool dir_relax_shared(struct inode *inode)
>
>  extern bool path_noexec(const struct path *path);
>  extern void inode_nohighmem(struct inode *inode);
> +extern bool inode_inuse_trylock(struct inode *inode);
> +extern void inode_inuse_unlock(struct inode *inode);
> +
> +static inline bool inode_inuse(struct inode *inode)
> +{
> +       return inode->i_state & I_INUSE;
> +}
>
>  #endif /* _LINUX_FS_H */
> --
> 2.7.4
>


* Re: [PATCH 2/2] ovl: get exclusive ownership on upper/work dirs
  2017-05-23  9:50 ` [PATCH 2/2] ovl: get exclusive ownership on upper/work dirs Amir Goldstein
@ 2017-05-31 10:18   ` Miklos Szeredi
  2017-05-31 12:47     ` Amir Goldstein
  0 siblings, 1 reply; 11+ messages in thread
From: Miklos Szeredi @ 2017-05-31 10:18 UTC
  To: Amir Goldstein; +Cc: Al Viro, linux-unionfs, linux-fsdevel

On Tue, May 23, 2017 at 11:50 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> Bad things can happen if several concurrent overlay mounts try to
> use the same upperdir path and/or workdir path.
>
> Try to get the 'inuse' advisory lock on upper and work dir.
> Fail mount if another overlay mount instance or another user
> holds the 'inuse' lock.
>
> Note that this provides no protection for concurrent overlay
> mounts that use overlapping (i.e. descendant) upper dirs or
> work dirs.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/super.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 67 insertions(+), 3 deletions(-)
>
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index 4882ffb..ac9212d 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -165,12 +165,42 @@ static const struct dentry_operations ovl_reval_dentry_operations = {
>         .d_weak_revalidate = ovl_dentry_weak_revalidate,
>  };
>
> +/* Get exclusive ownership on upper/work dir among overlay mounts */
> +static bool ovl_dir_lock(struct dentry *dentry)
> +{
> +       struct inode *inode;
> +
> +       if (!dentry)
> +               return false;
> +
> +       inode = d_inode(dentry);
> +       if (!inode || inode_inuse(inode))
> +               return false;
> +
> +       return inode_inuse_trylock(inode);
> +}
> +
> +static void ovl_dir_unlock(struct dentry *dentry)
> +{
> +       struct inode *inode;
> +
> +       if (!dentry)
> +               return;
> +
> +       inode = d_inode(dentry);
> +       if (inode && inode_inuse(inode))
> +               inode_inuse_unlock(inode);
> +}

Seems a bit overcomplicated.   Aren't we always dealing with positive
dentries?   In which case these can just be trivial wrappers around
inode_inuse_{try|un}lock(), or can be gotten rid of completely.

> +
>  static void ovl_put_super(struct super_block *sb)
>  {
>         struct ovl_fs *ufs = sb->s_fs_info;
>         unsigned i;
>
> +       ovl_dir_unlock(ufs->workdir);
>         dput(ufs->workdir);
> +       if (ufs->upper_mnt)
> +               ovl_dir_unlock(ufs->upper_mnt->mnt_root);
>         mntput(ufs->upper_mnt);
>         for (i = 0; i < ufs->numlower; i++)
>                 mntput(ufs->lower_mnt[i]);
> @@ -407,6 +437,14 @@ static struct dentry *ovl_workdir_create(struct vfsmount *mnt,
>                         if (retried)
>                                 goto out_dput;
>
> +                       /*
> +                        * We have parent i_mutex, so this test is race free
> +                        * w.r.t. ovl_dir_lock() below by another overlay mount.
> +                        */
> +                       err = -EBUSY;
> +                       if (inode_inuse(work->d_inode))
> +                               goto out_dput;
> +

Why not lock it here?

>                         retried = true;
>                         ovl_workdir_cleanup(dir, mnt, work, 0);
>                         dput(work);
> @@ -446,6 +484,14 @@ static struct dentry *ovl_workdir_create(struct vfsmount *mnt,
>                 inode_unlock(work->d_inode);
>                 if (err)
>                         goto out_dput;
> +
> +               /*
> +                * Protect our work dir from being deleted/renamed and from
> +                * being reused by another overlay mount.
> +                */
> +               err = -EBUSY;
> +               if (!ovl_dir_lock(work))
> +                       goto out_dput;
>         }
>  out_unlock:
>         inode_unlock(dir);
> @@ -849,6 +895,16 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>                         pr_err("overlayfs: failed to clone upperpath\n");
>                         goto out_put_lowerpath;
>                 }
> +               /*
> +                * Protect our upper dir from being deleted/renamed and from
> +                * being reused by another overlay mount.
> +                */
> +               err = -EBUSY;
> +               if (!ovl_dir_lock(upperpath.dentry)) {
> +                       pr_err("overlayfs: upperdir in-use by another overlay mount?\n");
> +                       goto out_put_upper_mnt;
> +               }
> +
>                 /* Don't inherit atime flags */
>                 ufs->upper_mnt->mnt_flags &= ~(MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME);
>
> @@ -857,6 +913,10 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>                 ufs->workdir = ovl_workdir_create(ufs->upper_mnt, workpath.dentry);
>                 err = PTR_ERR(ufs->workdir);
>                 if (IS_ERR(ufs->workdir)) {
> +                       if (err == -EBUSY) {
> +                               pr_err("overlayfs: workdir in-use by another overlay mount?\n");

Why ask?  Aren't we sure?

> +                               goto out_unlock_upperdir;
> +                       }
>                         pr_warn("overlayfs: failed to create directory %s/%s (errno: %i); mounting read-only\n",
>                                 ufs->config.workdir, OVL_WORKDIR_NAME, -err);
>                         sb->s_flags |= MS_RDONLY;
> @@ -874,7 +934,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>
>                         err = ovl_check_d_type_supported(&workpath);
>                         if (err < 0)
> -                               goto out_put_workdir;
> +                               goto out_unlock_workdir;
>
>                         /*
>                          * We allowed this configuration and don't want to
> @@ -910,7 +970,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>         err = -ENOMEM;
>         ufs->lower_mnt = kcalloc(numlower, sizeof(struct vfsmount *), GFP_KERNEL);
>         if (ufs->lower_mnt == NULL)
> -               goto out_put_workdir;
> +               goto out_unlock_workdir;
>         for (i = 0; i < numlower; i++) {
>                 struct vfsmount *mnt = clone_private_mount(&stack[i]);
>
> @@ -1002,8 +1062,12 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>         for (i = 0; i < ufs->numlower; i++)
>                 mntput(ufs->lower_mnt[i]);
>         kfree(ufs->lower_mnt);
> -out_put_workdir:
> +out_unlock_workdir:
> +       ovl_dir_unlock(ufs->workdir);
>         dput(ufs->workdir);
> +out_unlock_upperdir:
> +       ovl_dir_unlock(upperpath.dentry);
> +out_put_upper_mnt:
>         mntput(ufs->upper_mnt);
>  out_put_lowerpath:
>         for (i = 0; i < numlower; i++)
> --
> 2.7.4
>


* Re: [PATCH 2/2] ovl: get exclusive ownership on upper/work dirs
  2017-05-31 10:18   ` Miklos Szeredi
@ 2017-05-31 12:47     ` Amir Goldstein
  2017-05-31 13:05       ` Amir Goldstein
  0 siblings, 1 reply; 11+ messages in thread
From: Amir Goldstein @ 2017-05-31 12:47 UTC
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

On Wed, May 31, 2017 at 1:18 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Tue, May 23, 2017 at 11:50 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> Bad things can happen if several concurrent overlay mounts try to
>> use the same upperdir path and/or workdir path.
>>
>> Try to get the 'inuse' advisory lock on upper and work dir.
>> Fail mount if another overlay mount instance or another user
>> holds the 'inuse' lock.
>>
>> Note that this provides no protection for concurrent overlay
>> mounts that use overlapping (i.e. descendant) upper dirs or
>> work dirs.
>>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>>  fs/overlayfs/super.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 67 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
>> index 4882ffb..ac9212d 100644
>> --- a/fs/overlayfs/super.c
>> +++ b/fs/overlayfs/super.c
>> @@ -165,12 +165,42 @@ static const struct dentry_operations ovl_reval_dentry_operations = {
>>         .d_weak_revalidate = ovl_dentry_weak_revalidate,
>>  };
>>
>> +/* Get exclusive ownership on upper/work dir among overlay mounts */
>> +static bool ovl_dir_lock(struct dentry *dentry)
>> +{
>> +       struct inode *inode;
>> +
>> +       if (!dentry)
>> +               return false;
>> +
>> +       inode = d_inode(dentry);
>> +       if (!inode || inode_inuse(inode))
>> +               return false;
>> +
>> +       return inode_inuse_trylock(inode);
>> +}
>> +
>> +static void ovl_dir_unlock(struct dentry *dentry)
>> +{
>> +       struct inode *inode;
>> +
>> +       if (!dentry)
>> +               return;
>> +
>> +       inode = d_inode(dentry);
>> +       if (inode && inode_inuse(inode))
>> +               inode_inuse_unlock(inode);
>> +}
>
> Seems a bit overcomplicated.   Aren't we always dealing with positive
> dentries?   In which case these can just be trivial wrappers around
> inode_inuse_{try|un}lock(), or can be gotten rid of completely.
>

I find the wrappers convenient for cleanup code, but sure
I can make them much thinner, dput() style.
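
One possible reading of "thinner, dput() style" is a pair of wrappers
that tolerate a NULL dentry, the way dput() does, and otherwise assume a
positive dentry. The sketch below is a userspace model of that idea, not
a follow-up patch: struct model_inode, struct model_dentry and all
model_* functions are hypothetical stand-ins, and a plain bool replaces
the I_INUSE bit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct inode and struct dentry. */
struct model_inode {
	bool inuse;
};

struct model_dentry {
	struct model_inode *d_inode;	/* assumed non-NULL (positive dentry) */
};

/* Model of inode_inuse_trylock(): true only if this call took the lock. */
static bool model_inode_inuse_trylock(struct model_inode *inode)
{
	if (inode->inuse)
		return false;
	inode->inuse = true;
	return true;
}

/* Model of inode_inuse_unlock(). */
static void model_inode_inuse_unlock(struct model_inode *inode)
{
	inode->inuse = false;
}

/* Thin wrapper: a NULL dentry simply cannot be locked. */
static bool model_ovl_inuse_trylock(struct model_dentry *dentry)
{
	return dentry && model_inode_inuse_trylock(dentry->d_inode);
}

/* Thin wrapper: a NULL dentry is a no-op, which keeps cleanup paths short. */
static void model_ovl_inuse_unlock(struct model_dentry *dentry)
{
	if (dentry)
		model_inode_inuse_unlock(dentry->d_inode);
}
```

Note the trade-off in this variant: without the inode_inuse() test in the
unlock path, a caller must only unlock a dentry it actually locked,
otherwise it would release a lock held by another mount.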

>> +
>>  static void ovl_put_super(struct super_block *sb)
>>  {
>>         struct ovl_fs *ufs = sb->s_fs_info;
>>         unsigned i;
>>
>> +       ovl_dir_unlock(ufs->workdir);
>>         dput(ufs->workdir);
>> +       if (ufs->upper_mnt)
>> +               ovl_dir_unlock(ufs->upper_mnt->mnt_root);
>>         mntput(ufs->upper_mnt);
>>         for (i = 0; i < ufs->numlower; i++)
>>                 mntput(ufs->lower_mnt[i]);
>> @@ -407,6 +437,14 @@ static struct dentry *ovl_workdir_create(struct vfsmount *mnt,
>>                         if (retried)
>>                                 goto out_dput;
>>
>> +                       /*
>> +                        * We have parent i_mutex, so this test is race free
>> +                        * w.r.t. ovl_dir_lock() below by another overlay mount.
>> +                        */
>> +                       err = -EBUSY;
>> +                       if (inode_inuse(work->d_inode))
>> +                               goto out_dput;
>> +
>
> Why not lock it here?
>
>>                         retried = true;
>>                         ovl_workdir_cleanup(dir, mnt, work, 0);
>>                         dput(work);
>> @@ -446,6 +484,14 @@ static struct dentry *ovl_workdir_create(struct vfsmount *mnt,
>>                 inode_unlock(work->d_inode);
>>                 if (err)
>>                         goto out_dput;
>> +
>> +               /*
>> +                * Protect our work dir from being deleted/renamed and from
>> +                * being reused by another overlay mount.
>> +                */
>> +               err = -EBUSY;
>> +               if (!ovl_dir_lock(work))
>> +                       goto out_dput;
>>         }
>>  out_unlock:
>>         inode_unlock(dir);
>> @@ -849,6 +895,16 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>>                         pr_err("overlayfs: failed to clone upperpath\n");
>>                         goto out_put_lowerpath;
>>                 }
>> +               /*
>> +                * Protect our upper dir from being deleted/renamed and from
>> +                * being reused by another overlay mount.
>> +                */
>> +               err = -EBUSY;
>> +               if (!ovl_dir_lock(upperpath.dentry)) {
>> +                       pr_err("overlayfs: upperdir in-use by another overlay mount?\n");
>> +                       goto out_put_upper_mnt;
>> +               }
>> +
>>                 /* Don't inherit atime flags */
>>                 ufs->upper_mnt->mnt_flags &= ~(MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME);
>>
>> @@ -857,6 +913,10 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>>                 ufs->workdir = ovl_workdir_create(ufs->upper_mnt, workpath.dentry);
>>                 err = PTR_ERR(ufs->workdir);
>>                 if (IS_ERR(ufs->workdir)) {
>> +                       if (err == -EBUSY) {
>> +                               pr_err("overlayfs: workdir in-use by another overlay mount?\n");
>
> Why ask?  Aren't we sure?
>
>> +                               goto out_unlock_upperdir;
>> +                       }
>>                         pr_warn("overlayfs: failed to create directory %s/%s (errno: %i); mounting read-only\n",
>>                                 ufs->config.workdir, OVL_WORKDIR_NAME, -err);
>>                         sb->s_flags |= MS_RDONLY;
>> @@ -874,7 +934,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>>
>>                         err = ovl_check_d_type_supported(&workpath);
>>                         if (err < 0)
>> -                               goto out_put_workdir;
>> +                               goto out_unlock_workdir;
>>
>>                         /*
>>                          * We allowed this configuration and don't want to
>> @@ -910,7 +970,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>>         err = -ENOMEM;
>>         ufs->lower_mnt = kcalloc(numlower, sizeof(struct vfsmount *), GFP_KERNEL);
>>         if (ufs->lower_mnt == NULL)
>> -               goto out_put_workdir;
>> +               goto out_unlock_workdir;
>>         for (i = 0; i < numlower; i++) {
>>                 struct vfsmount *mnt = clone_private_mount(&stack[i]);
>>
>> @@ -1002,8 +1062,12 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>>         for (i = 0; i < ufs->numlower; i++)
>>                 mntput(ufs->lower_mnt[i]);
>>         kfree(ufs->lower_mnt);
>> -out_put_workdir:
>> +out_unlock_workdir:
>> +       ovl_dir_unlock(ufs->workdir);
>>         dput(ufs->workdir);
>> +out_unlock_upperdir:
>> +       ovl_dir_unlock(upperpath.dentry);
>> +out_put_upper_mnt:
>>         mntput(ufs->upper_mnt);
>>  out_put_lowerpath:
>>         for (i = 0; i < numlower; i++)
>> --
>> 2.7.4
>>


* Re: [PATCH 2/2] ovl: get exclusive ownership on upper/work dirs
  2017-05-31 12:47     ` Amir Goldstein
@ 2017-05-31 13:05       ` Amir Goldstein
  2017-05-31 13:24         ` Miklos Szeredi
  0 siblings, 1 reply; 11+ messages in thread
From: Amir Goldstein @ 2017-05-31 13:05 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

On Wed, May 31, 2017 at 3:47 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, May 31, 2017 at 1:18 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Tue, May 23, 2017 at 11:50 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> Bad things can happen if several concurrent overlay mounts try to
>>> use the same upperdir path and/or workdir path.
>>>
>>> Try to get the 'inuse' advisory lock on upper and work dir.
>>> Fail mount if another overlay mount instance or another user
>>> holds the 'inuse' lock.
>>>
>>> Note that this provides no protection for concurrent overlay
>>> mount that use overlapping (i.e. descendant) upper dirs or
>>> work dirs.
>>>
>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>> ---
...
>>> @@ -407,6 +437,14 @@ static struct dentry *ovl_workdir_create(struct vfsmount *mnt,
>>>                         if (retried)
>>>                                 goto out_dput;
>>>
>>> +                       /*
>>> +                        * We have parent i_mutex, so this test is race free
>>> +                        * w.r.t. ovl_dir_lock() below by another overlay mount.
>>> +                        */
>>> +                       err = -EBUSY;
>>> +                       if (inode_inuse(work->d_inode))
>>> +                               goto out_dput;
>>> +
>>
>> Why not lock it here?

Because we are locking the 'work' inode, not workdir, and the 'work'
inode is about to be zapped and replaced with a new work inode on retry.
Are you suggesting moving the inuse lock to the workdir inode? Doable.
I guess I chose to lock workdir/work because of the may_delete
protection it provides, but you questioned that part anyway.

>>
>>>                         retried = true;
>>>                         ovl_workdir_cleanup(dir, mnt, work, 0);
>>>                         dput(work);
>>> @@ -446,6 +484,14 @@ static struct dentry *ovl_workdir_create(struct vfsmount *mnt,
>>>                 inode_unlock(work->d_inode);
>>>                 if (err)
>>>                         goto out_dput;
>>> +
>>> +               /*
>>> +                * Protect our work dir from being deleted/renamed and from
>>> +                * being reused by another overlay mount.
>>> +                */
>>> +               err = -EBUSY;
>>> +               if (!ovl_dir_lock(work))
>>> +                       goto out_dput;
>>>         }
>>>  out_unlock:
>>>         inode_unlock(dir);
>>> @@ -849,6 +895,16 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>>>                         pr_err("overlayfs: failed to clone upperpath\n");
>>>                         goto out_put_lowerpath;
>>>                 }
>>> +               /*
>>> +                * Protect our upper dir from being deleted/renamed and from
>>> +                * being reused by another overlay mount.
>>> +                */
>>> +               err = -EBUSY;
>>> +               if (!ovl_dir_lock(upperpath.dentry)) {
>>> +                       pr_err("overlayfs: upperdir in-use by another overlay mount?\n");
>>> +                       goto out_put_upper_mnt;
>>> +               }
>>> +
>>>                 /* Don't inherit atime flags */
>>>                 ufs->upper_mnt->mnt_flags &= ~(MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME);
>>>
>>> @@ -857,6 +913,10 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>>>                 ufs->workdir = ovl_workdir_create(ufs->upper_mnt, workpath.dentry);
>>>                 err = PTR_ERR(ufs->workdir);
>>>                 if (IS_ERR(ufs->workdir)) {
>>> +                       if (err == -EBUSY) {
>>> +                               pr_err("overlayfs: workdir in-use by another overlay mount?\n");
>>
>> Why ask?  Aren't we sure?
>>

I think ovl_workdir_cleanup() can also return EBUSY if one of the
dirs/files inside it is used as a mount point, or a dir is used as a
root dir.
Also, since inode_inuse() is not overlay-specific, we cannot rule out
some other code setting inuse on workdir, hence the "?".
I don't mind dropping the "?" though - perhaps phrase the error more
generically:
    pr_err("overlayfs: workdir is in-use by another mount\n");

The man page for mount(2) has a broad phrasing for EBUSY -
"target is still busy (... etc.)".


* Re: [PATCH 2/2] ovl: get exclusive ownership on upper/work dirs
  2017-05-31 13:05       ` Amir Goldstein
@ 2017-05-31 13:24         ` Miklos Szeredi
  0 siblings, 0 replies; 11+ messages in thread
From: Miklos Szeredi @ 2017-05-31 13:24 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Al Viro, linux-unionfs, linux-fsdevel

On Wed, May 31, 2017 at 3:05 PM, Amir Goldstein <amir73il@gmail.com> wrote:

> Are you suggesting moving the inuse lock to the workdir inode? Doable.
> I guess I chose to lock workdir/work because of the may_delete
> protection it provides,
> but you questioned that part anyway.

Yes, I think we should protect workdir/upperdir with I_INUSE.

> I don't mind dropping the "?" though - perhaps phrase the error more
> generically:
>     pr_err("overlayfs: workdir is in-use by another mount\n");

Okay.

Thanks,
Miklos


* Re: [PATCH 1/2] vfs: introduce inode 'inuse' lock
  2017-05-31 10:09   ` Miklos Szeredi
@ 2017-05-31 13:54     ` Amir Goldstein
  2017-05-31 14:30       ` Miklos Szeredi
  0 siblings, 1 reply; 11+ messages in thread
From: Amir Goldstein @ 2017-05-31 13:54 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

On Wed, May 31, 2017 at 1:09 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Tue, May 23, 2017 at 11:50 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> Added an i_state flag I_INUSE and helpers to set/clear/test the bit.
>>
>> The 'inuse' lock is an 'advisory' inode lock, which also provides
>> may_delete() protection, so can be used to extend exclusive create
>> protection beyond parent->i_mutex lock among cooperating users.
>>
>> This is going to be used by overlayfs to get exclusive ownership
>> on upper and work dirs among overlayfs mounts.
>
> Not sure I like the delete protection.  Any modification of workdir or
> layers while mounted might cause inconsistencies or errors in the
> overlay.  So why single out deletion of base directories?
>

There are a few reasons why 'inuse' inode should not be deleted,
regardless of whether delete protection is needed by overlayfs or not
(I think we don't need it).

1. Setting INUSE on a FREEING|WILL_FREE inode is not allowed,
so preventing delete on INUSE makes the possible states fewer and
easier to manage.

2. With the latest patchset I also implemented wait_on_inode_inuse()
https://github.com/amir73il/linux/blob/ovl-dir-lock/fs/inode.c#L2175
which is later used by the copy up code for index hardlink.
By preventing delete, I can isolate I_INUSE waiters from I_NEW waiters
and don't need to deal with INUSE waiters and inode delete.

3. Backwards justification: the man pages for unlink(2) and rmdir(2)
already explain EBUSY in a generic way:
"pathname cannot be unlinked because it is being used by the system ..."
"pathname is currently in use by the system or ..."

So you may think of the new INUSE flag as a declaration by any
module in the system to make the inode qualify for "in use by the system".
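
Points 1 and 3 can be modelled in userspace roughly as follows. This is
a sketch only: the flag values, the struct, and the names
inode_inuse_trylock_model() and may_delete_model() are hypothetical, and
whether the real helper refuses or merely warns on a FREEING inode is an
assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bit values; the kernel's i_state flags differ. */
#define I_FREEING   0x1u
#define I_WILL_FREE 0x2u
#define I_INUSE     0x4u

/* Simplified stand-in for the kernel inode (assumption). */
struct inode { unsigned int i_state; };

/* Point 1: refuse to mark an inode that is in use or being freed. */
static inline bool inode_inuse_trylock_model(struct inode *inode)
{
	if (inode->i_state & (I_INUSE | I_FREEING | I_WILL_FREE))
		return false;
	inode->i_state |= I_INUSE;
	return true;
}

/* Point 3: unlink/rmdir of an in-use inode reports EBUSY. */
static inline int may_delete_model(const struct inode *victim)
{
	return (victim->i_state & I_INUSE) ? -EBUSY : 0;
}
```

In this model, as long as delete is refused while the flag is set, an
inode that holds I_INUSE never reaches the freeing states through
unlink or rmdir, which is the state reduction point 1 argues for.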

Did any of the arguments above convince you to keep delete protection?
If I keep delete protection in v2, I agree the reasons should
be better documented.

Amir.


* Re: [PATCH 1/2] vfs: introduce inode 'inuse' lock
  2017-05-31 13:54     ` Amir Goldstein
@ 2017-05-31 14:30       ` Miklos Szeredi
  2017-05-31 15:16         ` Amir Goldstein
  0 siblings, 1 reply; 11+ messages in thread
From: Miklos Szeredi @ 2017-05-31 14:30 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Al Viro, linux-unionfs, linux-fsdevel

On Wed, May 31, 2017 at 3:54 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, May 31, 2017 at 1:09 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Tue, May 23, 2017 at 11:50 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> Added an i_state flag I_INUSE and helpers to set/clear/test the bit.
>>>
>>> The 'inuse' lock is an 'advisory' inode lock, which also provides
>>> may_delete() protection, so can be used to extend exclusive create
>>> protection beyond parent->i_mutex lock among cooperating users.
>>>
>>> This is going to be used by overlayfs to get exclusive ownership
>>> on upper and work dirs among overlayfs mounts.
>>
>> Not sure I like the delete protection.  Any modification of workdir or
>> layers while mounted might cause inconsistencies or errors in the
>> overlay.  So why single out deletion of base directories?
>>
>
> There are a few reasons why 'inuse' inode should not be deleted,
> regardless of whether delete protection is needed by overlayfs or not
> (I think we don't need it).
>
> 1. setting INUSE on a  FREEING|WILL_FREE inode is not allowed
> so preventing delete on INUSE makes the possible states fewer and
> easier to manage.

Overlayfs keeps a ref on upperdir, so the inode cannot be deleted, only
unhashed.  No ref is kept on workdir, because we don't currently use
it for anything other than creating the empty work directory inside,
but if we mark it inuse, we should keep a ref on it as well.

Maybe the interface should be:

struct dentry *d_try_to_use(struct dentry *dentry)
{
    struct inode *inode = d_inode(dentry);

    spin_lock(&inode->i_lock);
    if (inode->i_state & I_INUSE) {
        spin_unlock(&inode->i_lock);
        return NULL;
    }
    inode->i_state |= I_INUSE;
    spin_unlock(&inode->i_lock);

    return dget(dentry);
}

void d_unuse(struct dentry *dentry)
{
    struct inode *inode = d_inode(dentry);

    WARN_ON(!(inode->i_state & I_INUSE));

    spin_lock(&inode->i_lock);
    inode->i_state &= ~I_INUSE;
    spin_unlock(&inode->i_lock);

    dput(dentry);
}
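
The proposed interface can be exercised outside the kernel with a small
model. This is only a sketch: it swaps the i_lock spinlock for a C11
atomic fetch-or (a plain test-and-set), uses a counter in place of
dget()/dput(), and the I_INUSE bit value is made up.

```c
#include <stdatomic.h>
#include <stddef.h>

#define I_INUSE 0x1u	/* hypothetical bit value for this sketch */

/* Simplified stand-ins for the kernel structures (assumption). */
struct inode { atomic_uint i_state; };
struct dentry { struct inode *d_inode; atomic_int d_count; };

/* Mark the inode in use and take a reference; NULL if already in use. */
static inline struct dentry *d_try_to_use(struct dentry *dentry)
{
	unsigned int old = atomic_fetch_or(&dentry->d_inode->i_state, I_INUSE);

	if (old & I_INUSE)
		return NULL;	/* someone else already owns it */
	atomic_fetch_add(&dentry->d_count, 1);	/* stands in for dget() */
	return dentry;
}

/* Clear the mark and drop the reference taken by d_try_to_use(). */
static inline void d_unuse(struct dentry *dentry)
{
	atomic_fetch_and(&dentry->d_inode->i_state, ~I_INUSE);
	atomic_fetch_sub(&dentry->d_count, 1);	/* stands in for dput() */
}
```

Tying the reference to the flag means the marked inode stays pinned for
as long as it is in use, so the owner need not reason separately about
the inode going away underneath it.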


>
> 2. With latest patchset I also implemented wait_on_inode_inuse()
> https://github.com/amir73il/linux/blob/ovl-dir-lock/fs/inode.c#L2175
> which is later used by to copy up code for index hardlink.

Need to see these patches to see what's going on here.

> By preventing delete, I can isolate I_INUSE waiters from I_NEW waiters
> and don't need to deal with INUSE waiters and inode delete.
>
> 3. Backwards justification: the man page for unlink(2) and rmdir(2)
> already explain EBUSY in a generic way:
> "pathname cannot be unlinked because it is being used by the system ..."
> "pathname is currently in use by the system or ..."

That's fine.  I'm not objecting to the error value.

I'm objecting to special casing the root upperdentry wrt.
delete/modification protection.

Make I_INUSE recursive?  I think it would be overkill.  Just let it
do the minimal thing that needs to be done to prevent unobvious
configuration errors.  Removing upperdir or workdir is pretty
obviously going to break the overlay, so I don't think we need to
worry about that.

Thanks,
Miklos


* Re: [PATCH 1/2] vfs: introduce inode 'inuse' lock
  2017-05-31 14:30       ` Miklos Szeredi
@ 2017-05-31 15:16         ` Amir Goldstein
  0 siblings, 0 replies; 11+ messages in thread
From: Amir Goldstein @ 2017-05-31 15:16 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-unionfs, linux-fsdevel

On Wed, May 31, 2017 at 5:30 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Wed, May 31, 2017 at 3:54 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Wed, May 31, 2017 at 1:09 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> On Tue, May 23, 2017 at 11:50 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>> Added an i_state flag I_INUSE and helpers to set/clear/test the bit.
>>>>
>>>> The 'inuse' lock is an 'advisory' inode lock, which also provides
>>>> may_delete() protection, so can be used to extend exclusive create
>>>> protection beyond parent->i_mutex lock among cooperating users.
>>>>
>>>> This is going to be used by overlayfs to get exclusive ownership
>>>> on upper and work dirs among overlayfs mounts.
>>>
>>> Not sure I like the delete protection.  Any modification of workdir or
>>> layers while mounted might cause inconsistencies or errors in the
>>> overlay.  So why single out deletion of base directories?
>>>
>>
>> There are a few reasons why 'inuse' inode should not be deleted,
>> regardless of whether delete protection is needed by overlayfs or not
>> (I think we don't need it).
>>
>> 1. setting INUSE on a  FREEING|WILL_FREE inode is not allowed
>> so preventing delete on INUSE makes the possible states fewer and
>> easier to manage.
>
> Overlayfs keeps a ref on upperdir, so the inode cannot be deleted only
> unhashed.  No ref is kept on workdir, because we don't currently use
> it for anything other than creating the empty work directory inside,
> but if we mark it inuse, we should keep a ref on it as well.
>
> Maybe the interface should be:
>
> struct dentry *d_try_to_use(struct dentry *dentry)
> {
>     struct inode *inode = d_inode(dentry);
>
>     spin_lock(&inode->i_lock);
>     if (inode->i_state & I_INUSE) {
>         spin_unlock(&inode->i_lock);
>         return NULL;
>     }
>     inode->i_state |= I_INUSE;
>     spin_unlock(&inode->i_lock);
>
>     return dget(dentry);
> }
>
> void d_unuse(struct dentry *dentry)
> {
>     struct inode *inode = d_inode(dentry);
>
>     WARN_ON(!(inode->i_state & I_INUSE));
>
>     spin_lock(&inode->i_lock);
>     inode->i_state &= ~I_INUSE;
>     spin_unlock(&inode->i_lock);
>
>     dput(dentry);
> }
>
>
>>
>> 2. With latest patchset I also implemented wait_on_inode_inuse()
>> https://github.com/amir73il/linux/blob/ovl-dir-lock/fs/inode.c#L2175
>> which is later used by the copy up code for index hardlink.
>
> Need to see these patches to see what's going on here.
>

Sure. I'll post the patch bomb tomorrow.

>> By preventing delete, I can isolate I_INUSE waiters from I_NEW waiters
>> and don't need to deal with INUSE waiters and inode delete.
>>
>> 3. Backwards justification: the man page for unlink(2) and rmdir(2)
>> already explain EBUSY in a generic way:
>> "pathname cannot be unlinked because it is being used by the system ..."
>> "pathname is currently in use by the system or ..."
>
> That's fine.  I'm not objecting to the error value.
>
> I'm objecting to special casing the root upperdentry wrt.
> delete/modification protection.
>
> Make I_INUSE recursive?  I think it would be an overkill.  Just let it
> do the minimal thing that needs to be done to prevent unobvious
> configuration errors.  Removing upperdir or workdir is pretty
> obviously going to break the overlay, so I don't think we need to
> worry about that.
>

Again, I don't think we need inuse to provide delete protection
for overlayfs dirs. I think implementing the inuse API is simpler
without having to deal with inode lifetime considerations.

I'll see if getting rid of delete protection can be done without
too many complications.

Amir.


end of thread, other threads:[~2017-05-31 15:16 UTC | newest]

Thread overview: 11+ messages
2017-05-23  9:50 [PATCH 0/2] overlayfs multiple mount protection Amir Goldstein
2017-05-23  9:50 ` [PATCH 1/2] vfs: introduce inode 'inuse' lock Amir Goldstein
2017-05-31 10:09   ` Miklos Szeredi
2017-05-31 13:54     ` Amir Goldstein
2017-05-31 14:30       ` Miklos Szeredi
2017-05-31 15:16         ` Amir Goldstein
2017-05-23  9:50 ` [PATCH 2/2] ovl: get exclusive ownership on upper/work dirs Amir Goldstein
2017-05-31 10:18   ` Miklos Szeredi
2017-05-31 12:47     ` Amir Goldstein
2017-05-31 13:05       ` Amir Goldstein
2017-05-31 13:24         ` Miklos Szeredi
