linux-unionfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v3 00/10] implement containerized syncfs for overlayfs
@ 2020-11-08 14:02 Chengguang Xu
  2020-11-08 14:02 ` [RFC PATCH v3 01/10] ovl: setup overlayfs' private bdi Chengguang Xu
                   ` (9 more replies)
  0 siblings, 10 replies; 23+ messages in thread
From: Chengguang Xu @ 2020-11-08 14:02 UTC (permalink / raw)
  To: miklos, jack, amir73il; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu

Current syncfs(2) syscall on overlayfs just calls sync_filesystem()
on upper_sb to synchronize whole dirty inodes in upper filesystem
regardless of the overlay ownership of the inode. In the use case of
container, when multiple containers using the same underlying upper
filesystem, it has some shortcomings as below.

(1) Performance
Synchronization is probably heavy because it actually syncs unnecessary
inodes for target overlayfs.

(2) Interference
Unplanned synchronization will probably impact IO performance of
unrelated container processes on the other overlayfs.

This series try to implement containerized syncfs for overlayfs so that
only sync target dirty upper inodes which are belong to specific overlayfs
instance. By doing this, it is able to reduce cost of synchronization and
will not seriously impact IO performance of unrelated processes.

v1->v2:
- Mark overlayfs' inode dirty itself instead of adding notification
mechanism to vfs inode.

v2->v3:
- Introduce overlayfs' extra syncfs wait list to wait target upper inodes
in ->sync_fs.

Chengguang Xu (10):
  ovl: setup overlayfs' private bdi
  ovl: introduce waiting list for syncfs
  ovl: implement ->writepages operation
  ovl: implement overlayfs' ->evict_inode operation
  ovl: mark overlayfs' inode dirty on modification
  ovl: mark overlayfs' inode dirty on shared mmap
  ovl: implement overlayfs' ->write_inode operation
  ovl: cache dirty overlayfs' inode
  ovl: introduce helper of syncfs writeback inode waiting
  ovl: implement containerized syncfs for overlayfs

 fs/overlayfs/file.c      |   2 +
 fs/overlayfs/inode.c     |  25 ++++++++++
 fs/overlayfs/overlayfs.h |   4 ++
 fs/overlayfs/ovl_entry.h |   5 ++
 fs/overlayfs/super.c     | 104 +++++++++++++++++++++++++++++++++++++--
 fs/overlayfs/util.c      |  14 ++++++
 6 files changed, 150 insertions(+), 4 deletions(-)

-- 
2.26.2



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC PATCH v3 01/10] ovl: setup overlayfs' private bdi
  2020-11-08 14:02 [RFC PATCH v3 00/10] implement containerized syncfs for overlayfs Chengguang Xu
@ 2020-11-08 14:02 ` Chengguang Xu
  2020-11-08 14:02 ` [RFC PATCH v3 02/10] ovl: introduce waiting list for syncfs Chengguang Xu
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Chengguang Xu @ 2020-11-08 14:02 UTC (permalink / raw)
  To: miklos, jack, amir73il; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu

Setup overlayfs' private bdi so that we can collect
overlayfs' own dirty inodes.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/super.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 290983bcfbb3..d1e546abce87 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1869,6 +1869,10 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	if (!ofs)
 		goto out;
 
+	err = super_setup_bdi(sb);
+	if (err)
+		goto out_err;
+
 	ofs->creator_cred = cred = prepare_creds();
 	if (!cred)
 		goto out_err;
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH v3 02/10] ovl: introduce waiting list for syncfs
  2020-11-08 14:02 [RFC PATCH v3 00/10] implement containerized syncfs for overlayfs Chengguang Xu
  2020-11-08 14:02 ` [RFC PATCH v3 01/10] ovl: setup overlayfs' private bdi Chengguang Xu
@ 2020-11-08 14:02 ` Chengguang Xu
  2020-11-08 14:03 ` [RFC PATCH v3 03/10] ovl: implement ->writepages operation Chengguang Xu
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Chengguang Xu @ 2020-11-08 14:02 UTC (permalink / raw)
  To: miklos, jack, amir73il; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu

Introduce extra waiting list to collect inodes
which are in writeback process.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/ovl_entry.h | 5 +++++
 fs/overlayfs/super.c     | 4 ++++
 2 files changed, 9 insertions(+)

diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 1b5a2094df8e..72334662253a 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -79,6 +79,9 @@ struct ovl_fs {
 	atomic_long_t last_ino;
 	/* Whiteout dentry cache */
 	struct dentry *whiteout;
+	/* syncfs waiting list and lock */
+	struct list_head syncfs_wait_list;
+	spinlock_t syncfs_wait_list_lock;
 };
 
 static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)
@@ -129,6 +132,8 @@ struct ovl_inode {
 
 	/* synchronize copy up and more */
 	struct mutex lock;
+	/* link to ofs->syncfs_wait_list */
+	struct list_head wait_list;
 };
 
 static inline struct ovl_inode *OVL_I(struct inode *inode)
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index d1e546abce87..1e21feb87297 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -184,6 +184,7 @@ static struct inode *ovl_alloc_inode(struct super_block *sb)
 	oi->lower = NULL;
 	oi->lowerdata = NULL;
 	mutex_init(&oi->lock);
+	INIT_LIST_HEAD(&oi->wait_list);
 
 	return &oi->vfs_inode;
 }
@@ -1869,6 +1870,9 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	if (!ofs)
 		goto out;
 
+	INIT_LIST_HEAD(&ofs->syncfs_wait_list);
+	spin_lock_init(&ofs->syncfs_wait_list_lock);
+
 	err = super_setup_bdi(sb);
 	if (err)
 		goto out_err;
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH v3 03/10] ovl: implement ->writepages operation
  2020-11-08 14:02 [RFC PATCH v3 00/10] implement containerized syncfs for overlayfs Chengguang Xu
  2020-11-08 14:02 ` [RFC PATCH v3 01/10] ovl: setup overlayfs' private bdi Chengguang Xu
  2020-11-08 14:02 ` [RFC PATCH v3 02/10] ovl: introduce waiting list for syncfs Chengguang Xu
@ 2020-11-08 14:03 ` Chengguang Xu
  2020-11-08 14:03 ` [RFC PATCH v3 04/10] ovl: implement overlayfs' ->evict_inode operation Chengguang Xu
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Chengguang Xu @ 2020-11-08 14:03 UTC (permalink / raw)
  To: miklos, jack, amir73il; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu

Implement overlayfs' ->writepages operation so that
we can sync dirty data/metadata to upper filesystem.
If writeback mode is sync_mode we add overlayfs inode
to extra syncfs wait list so that we can wait completion
in ->syncfs.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/inode.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index b584dca845ba..1f5cbbc60b28 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -11,6 +11,7 @@
 #include <linux/posix_acl.h>
 #include <linux/ratelimit.h>
 #include <linux/fiemap.h>
+#include <linux/writeback.h>
 #include "overlayfs.h"
 
 
@@ -516,7 +517,30 @@ static const struct inode_operations ovl_special_inode_operations = {
 	.update_time	= ovl_update_time,
 };
 
+static int ovl_writepages(struct address_space *mapping,
+			  struct writeback_control *wbc)
+{
+	struct inode *inode = mapping->host;
+	struct ovl_fs *ofs = inode->i_sb->s_fs_info;
+	struct inode *upper = ovl_inode_upper(inode);
+	int ret;
+
+	if (!ovl_should_sync(ofs))
+		return 0;
+	ret = sync_inode(upper, wbc);
+	if (!ret && wbc->for_sync == 1) {
+		spin_lock(&ofs->syncfs_wait_list_lock);
+		if (list_empty(&OVL_I(inode)->wait_list))
+			list_add_tail(&OVL_I(inode)->wait_list,
+				      &ofs->syncfs_wait_list);
+		spin_unlock(&ofs->syncfs_wait_list_lock);
+	}
+
+	return ret;
+}
+
 static const struct address_space_operations ovl_aops = {
+	.writepages		= ovl_writepages,
 	/* For O_DIRECT dentry_open() checks f_mapping->a_ops->direct_IO */
 	.direct_IO		= noop_direct_IO,
 };
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH v3 04/10] ovl: implement overlayfs' ->evict_inode operation
  2020-11-08 14:02 [RFC PATCH v3 00/10] implement containerized syncfs for overlayfs Chengguang Xu
                   ` (2 preceding siblings ...)
  2020-11-08 14:03 ` [RFC PATCH v3 03/10] ovl: implement ->writepages operation Chengguang Xu
@ 2020-11-08 14:03 ` Chengguang Xu
  2020-11-08 14:03 ` [RFC PATCH v3 05/10] ovl: mark overlayfs' inode dirty on modification Chengguang Xu
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Chengguang Xu @ 2020-11-08 14:03 UTC (permalink / raw)
  To: miklos, jack, amir73il; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu

Implement overlayfs' ->evict_inode operation,
so that we can clear all inode dirty flags.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/super.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 1e21feb87297..694eff76eb09 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -391,11 +391,26 @@ static int ovl_remount(struct super_block *sb, int *flags, char *data)
 	return ret;
 }
 
+static void ovl_evict_inode(struct inode *inode)
+{
+	struct ovl_fs *ofs = inode->i_sb->s_fs_info;
+
+	if (!list_empty(&OVL_I(inode)->wait_list)) {
+		spin_lock(&ofs->syncfs_wait_list_lock);
+		if (!list_empty(&OVL_I(inode)->wait_list))
+			list_del(&OVL_I(inode)->wait_list);
+		spin_unlock(&ofs->syncfs_wait_list_lock);
+	}
+	inode->i_state &= ~I_DIRTY_ALL;
+	clear_inode(inode);
+}
+
 static const struct super_operations ovl_super_operations = {
 	.alloc_inode	= ovl_alloc_inode,
 	.free_inode	= ovl_free_inode,
 	.destroy_inode	= ovl_destroy_inode,
 	.drop_inode	= generic_delete_inode,
+	.evict_inode	= ovl_evict_inode,
 	.put_super	= ovl_put_super,
 	.sync_fs	= ovl_sync_fs,
 	.statfs		= ovl_statfs,
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH v3 05/10] ovl: mark overlayfs' inode dirty on modification
  2020-11-08 14:02 [RFC PATCH v3 00/10] implement containerized syncfs for overlayfs Chengguang Xu
                   ` (3 preceding siblings ...)
  2020-11-08 14:03 ` [RFC PATCH v3 04/10] ovl: implement overlayfs' ->evict_inode operation Chengguang Xu
@ 2020-11-08 14:03 ` Chengguang Xu
  2020-11-08 14:03 ` [RFC PATCH v3 06/10] ovl: mark overlayfs' inode dirty on shared mmap Chengguang Xu
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Chengguang Xu @ 2020-11-08 14:03 UTC (permalink / raw)
  To: miklos, jack, amir73il; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu

Mark overlayfs' inode dirty on modification so that
we can recognize target inodes during syncfs.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/inode.c     |  1 +
 fs/overlayfs/overlayfs.h |  4 ++++
 fs/overlayfs/util.c      | 14 ++++++++++++++
 3 files changed, 19 insertions(+)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 1f5cbbc60b28..faa5c345f0cf 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -468,6 +468,7 @@ int ovl_update_time(struct inode *inode, struct timespec64 *ts, int flags)
 		if (upperpath.dentry) {
 			touch_atime(&upperpath);
 			inode->i_atime = d_inode(upperpath.dentry)->i_atime;
+			ovl_mark_inode_dirty(inode);
 		}
 	}
 	return 0;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index f8880aa2ba0e..eaf1d5b05d8e 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -247,6 +247,7 @@ static inline bool ovl_open_flags_need_copy_up(int flags)
 }
 
 /* util.c */
+void ovl_mark_inode_dirty(struct inode *inode);
 int ovl_want_write(struct dentry *dentry);
 void ovl_drop_write(struct dentry *dentry);
 struct dentry *ovl_workdir(struct dentry *dentry);
@@ -472,6 +473,9 @@ static inline void ovl_copyattr(struct inode *from, struct inode *to)
 	to->i_mtime = from->i_mtime;
 	to->i_ctime = from->i_ctime;
 	i_size_write(to, i_size_read(from));
+
+	if (ovl_inode_upper(to) && from->i_state & I_DIRTY_ALL)
+		ovl_mark_inode_dirty(to);
 }
 
 static inline void ovl_copyflags(struct inode *from, struct inode *to)
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 23f475627d07..a6f59df744ae 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -950,3 +950,17 @@ char *ovl_get_redirect_xattr(struct ovl_fs *ofs, struct dentry *dentry,
 	kfree(buf);
 	return ERR_PTR(res);
 }
+
+/*
+ * We intentionally add I_DIRTY_SYNC flag regardless dirty flag
+ * of upper inode so that we have chance to invoke ->write_inode
+ * to re-dirty overlayfs' inode during writeback process.
+ */
+void ovl_mark_inode_dirty(struct inode *inode)
+{
+	struct inode *upper = ovl_inode_upper(inode);
+	unsigned long iflag = I_DIRTY_SYNC;
+
+	iflag |= upper->i_state & I_DIRTY_ALL;
+	__mark_inode_dirty(inode, iflag);
+}
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH v3 06/10] ovl: mark overlayfs' inode dirty on shared mmap
  2020-11-08 14:02 [RFC PATCH v3 00/10] implement containerized syncfs for overlayfs Chengguang Xu
                   ` (4 preceding siblings ...)
  2020-11-08 14:03 ` [RFC PATCH v3 05/10] ovl: mark overlayfs' inode dirty on modification Chengguang Xu
@ 2020-11-08 14:03 ` Chengguang Xu
  2020-11-11 13:05   ` 回复:[RFC " Chengguang Xu
  2020-11-08 14:03 ` [RFC PATCH v3 07/10] ovl: implement overlayfs' ->write_inode operation Chengguang Xu
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Chengguang Xu @ 2020-11-08 14:03 UTC (permalink / raw)
  To: miklos, jack, amir73il; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu

Overlayfs cannot be notified when mmapped area gets dirty,
so we need to proactively mark inode dirty in ->mmap operation.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/file.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index efccb7c1f9bc..662252047fff 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -486,6 +486,8 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 		/* Drop reference count from new vm_file value */
 		fput(realfile);
 	} else {
+		if (vma->vm_flags & (VM_SHARED|VM_MAYSHARE))
+			ovl_mark_inode_dirty(file_inode(file));
 		/* Drop reference count from previous vm_file value */
 		fput(file);
 	}
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH v3 07/10] ovl: implement overlayfs' ->write_inode operation
  2020-11-08 14:02 [RFC PATCH v3 00/10] implement containerized syncfs for overlayfs Chengguang Xu
                   ` (5 preceding siblings ...)
  2020-11-08 14:03 ` [RFC PATCH v3 06/10] ovl: mark overlayfs' inode dirty on shared mmap Chengguang Xu
@ 2020-11-08 14:03 ` Chengguang Xu
  2020-11-10 13:45   ` Jan Kara
  2020-11-08 14:03 ` [RFC PATCH v3 08/10] ovl: cache dirty overlayfs' inode Chengguang Xu
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Chengguang Xu @ 2020-11-08 14:03 UTC (permalink / raw)
  To: miklos, jack, amir73il; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu

Implement overlayfs' ->write_inode to sync dirty data
and redirty overlayfs' inode if necessary.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/super.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 694eff76eb09..0ddee18b0bab 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -391,6 +391,35 @@ static int ovl_remount(struct super_block *sb, int *flags, char *data)
 	return ret;
 }
 
+static int ovl_write_inode(struct inode *inode,
+			   struct writeback_control *wbc)
+{
+	struct ovl_fs *ofs = inode->i_sb->s_fs_info;
+	struct inode *upper = ovl_inode_upper(inode);
+	unsigned long iflag = 0;
+	int ret = 0;
+
+	if (!upper)
+		return 0;
+
+	if (!ovl_should_sync(ofs))
+		return 0;
+
+	if (upper->i_sb->s_op->write_inode)
+		ret = upper->i_sb->s_op->write_inode(inode, wbc);
+
+	iflag |= upper->i_state & I_DIRTY_ALL;
+
+	if (mapping_writably_mapped(upper->i_mapping) ||
+	    mapping_tagged(upper->i_mapping, PAGECACHE_TAG_WRITEBACK))
+		iflag |= I_DIRTY_PAGES;
+
+	if (iflag)
+		ovl_mark_inode_dirty(inode);
+
+	return ret;
+}
+
 static void ovl_evict_inode(struct inode *inode)
 {
 	struct ovl_fs *ofs = inode->i_sb->s_fs_info;
@@ -411,6 +440,7 @@ static const struct super_operations ovl_super_operations = {
 	.destroy_inode	= ovl_destroy_inode,
 	.drop_inode	= generic_delete_inode,
 	.evict_inode	= ovl_evict_inode,
+	.write_inode	= ovl_write_inode,
 	.put_super	= ovl_put_super,
 	.sync_fs	= ovl_sync_fs,
 	.statfs		= ovl_statfs,
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH v3 08/10] ovl: cache dirty overlayfs' inode
  2020-11-08 14:02 [RFC PATCH v3 00/10] implement containerized syncfs for overlayfs Chengguang Xu
                   ` (6 preceding siblings ...)
  2020-11-08 14:03 ` [RFC PATCH v3 07/10] ovl: implement overlayfs' ->write_inode operation Chengguang Xu
@ 2020-11-08 14:03 ` Chengguang Xu
  2020-11-08 14:03 ` [RFC PATCH v3 09/10] ovl: introduce helper of syncfs writeback inode waiting Chengguang Xu
  2020-11-08 14:03 ` [RFC PATCH v3 10/10] ovl: implement containerized syncfs for overlayfs Chengguang Xu
  9 siblings, 0 replies; 23+ messages in thread
From: Chengguang Xu @ 2020-11-08 14:03 UTC (permalink / raw)
  To: miklos, jack, amir73il; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu

Now drop overlayfs' inode will sync dirty data,
so we change to only drop clean inode.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/super.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 0ddee18b0bab..e5607a908d82 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -434,11 +434,20 @@ static void ovl_evict_inode(struct inode *inode)
 	clear_inode(inode);
 }
 
+static int ovl_drop_inode(struct inode *inode)
+{
+	struct inode *upper = ovl_inode_upper(inode);
+
+	if (!upper || !(inode->i_state & I_DIRTY_ALL))
+		return 1;
+	return generic_drop_inode(inode);
+}
+
 static const struct super_operations ovl_super_operations = {
 	.alloc_inode	= ovl_alloc_inode,
 	.free_inode	= ovl_free_inode,
 	.destroy_inode	= ovl_destroy_inode,
-	.drop_inode	= generic_delete_inode,
+	.drop_inode	= ovl_drop_inode,
 	.evict_inode	= ovl_evict_inode,
 	.write_inode	= ovl_write_inode,
 	.put_super	= ovl_put_super,
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH v3 09/10] ovl: introduce helper of syncfs writeback inode waiting
  2020-11-08 14:02 [RFC PATCH v3 00/10] implement containerized syncfs for overlayfs Chengguang Xu
                   ` (7 preceding siblings ...)
  2020-11-08 14:03 ` [RFC PATCH v3 08/10] ovl: cache dirty overlayfs' inode Chengguang Xu
@ 2020-11-08 14:03 ` Chengguang Xu
  2020-11-09  3:33   ` 回复:[RFC " Chengguang Xu
  2020-11-08 14:03 ` [RFC PATCH v3 10/10] ovl: implement containerized syncfs for overlayfs Chengguang Xu
  9 siblings, 1 reply; 23+ messages in thread
From: Chengguang Xu @ 2020-11-08 14:03 UTC (permalink / raw)
  To: miklos, jack, amir73il; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu

Introduce a helper ovl_wait_wb_inodes() to wait until all
target upper inodes finish writeback.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/super.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index e5607a908d82..9a535fc11221 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -255,6 +255,36 @@ static void ovl_put_super(struct super_block *sb)
 	ovl_free_fs(ofs);
 }
 
+void ovl_wait_wb_inodes(struct ovl_fs *ofs)
+{
+	LIST_HEAD(tmp_list);
+	struct ovl_inode *oi;
+	struct inode *upper;
+
+	spin_lock(&ofs->syncfs_wait_list_lock);
+	list_splice_init(&ofs->syncfs_wait_list, &tmp_list);
+
+	while (!list_empty(&tmp_list)) {
+		oi = list_first_entry(&tmp_list, struct ovl_inode, wait_list);
+		list_del_init(&oi->wait_list);
+		ihold(&oi->vfs_inode);
+		spin_unlock(&ofs->syncfs_wait_list_lock);
+
+		upper = ovl_inode_upper(&oi->vfs_inode);
+		if (!mapping_tagged(upper->i_mapping, PAGECACHE_TAG_WRITEBACK)) {
+			iput(&oi->vfs_inode);
+			goto wait_next;
+		}
+
+		filemap_fdatawait_keep_errors(upper->i_mapping);
+		iput(&oi->vfs_inode);
+		cond_resched();
+wait_next:
+		spin_lock(&ofs->syncfs_wait_list_lock);
+	}
+	spin_unlock(&ofs->syncfs_wait_list_lock);
+}
+
 /* Sync real dirty inodes in upper filesystem (if it exists) */
 static int ovl_sync_fs(struct super_block *sb, int wait)
 {
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH v3 10/10] ovl: implement containerized syncfs for overlayfs
  2020-11-08 14:02 [RFC PATCH v3 00/10] implement containerized syncfs for overlayfs Chengguang Xu
                   ` (8 preceding siblings ...)
  2020-11-08 14:03 ` [RFC PATCH v3 09/10] ovl: introduce helper of syncfs writeback inode waiting Chengguang Xu
@ 2020-11-08 14:03 ` Chengguang Xu
  9 siblings, 0 replies; 23+ messages in thread
From: Chengguang Xu @ 2020-11-08 14:03 UTC (permalink / raw)
  To: miklos, jack, amir73il; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu

Now overlayfs can only sync dirty inode during
syncfs, so remove unnecessary sync_filesystem()
on upper file system.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/super.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 9a535fc11221..9fc66563d749 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -15,6 +15,7 @@
 #include <linux/seq_file.h>
 #include <linux/posix_acl_xattr.h>
 #include <linux/exportfs.h>
+#include <linux/blkdev.h>
 #include "overlayfs.h"
 
 MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>");
@@ -301,8 +302,7 @@ static int ovl_sync_fs(struct super_block *sb, int wait)
 	 * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
 	 * All the super blocks will be iterated, including upper_sb.
 	 *
-	 * If this is a syncfs(2) call, then we do need to call
-	 * sync_filesystem() on upper_sb, but enough if we do it when being
+	 * If this is a syncfs(2) call, it will be enough we do it when being
 	 * called with wait == 1.
 	 */
 	if (!wait)
@@ -311,7 +311,11 @@ static int ovl_sync_fs(struct super_block *sb, int wait)
 	upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
 
 	down_read(&upper_sb->s_umount);
-	ret = sync_filesystem(upper_sb);
+	ovl_wait_wb_inodes(ofs);
+	if (upper_sb->s_op->sync_fs)
+		ret = upper_sb->s_op->sync_fs(upper_sb, wait);
+	if (!ret)
+		ret = sync_blockdev(upper_sb->s_bdev);
 	up_read(&upper_sb->s_umount);
 
 	return ret;
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* 回复:[RFC PATCH v3 09/10] ovl: introduce helper of syncfs writeback inode waiting
  2020-11-08 14:03 ` [RFC PATCH v3 09/10] ovl: introduce helper of syncfs writeback inode waiting Chengguang Xu
@ 2020-11-09  3:33   ` Chengguang Xu
  2020-11-09  7:07     ` [RFC " Amir Goldstein
  0 siblings, 1 reply; 23+ messages in thread
From: Chengguang Xu @ 2020-11-09  3:33 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: miklos, jack, amir73il, linux-unionfs, linux-fsdevel

 ---- 在 星期日, 2020-11-08 22:03:06 Chengguang Xu <cgxu519@mykernel.net> 撰写 ----
 > Introduce a helper ovl_wait_wb_inodes() to wait until all
 > target upper inodes finish writeback.
 > 
 > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
 > ---
 >  fs/overlayfs/super.c | 30 ++++++++++++++++++++++++++++++
 >  1 file changed, 30 insertions(+)
 > 
 > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
 > index e5607a908d82..9a535fc11221 100644
 > --- a/fs/overlayfs/super.c
 > +++ b/fs/overlayfs/super.c
 > @@ -255,6 +255,36 @@ static void ovl_put_super(struct super_block *sb)
 >      ovl_free_fs(ofs);
 >  }
 >  
 > +void ovl_wait_wb_inodes(struct ovl_fs *ofs)
 > +{
 > +    LIST_HEAD(tmp_list);
 > +    struct ovl_inode *oi;
 > +    struct inode *upper;
 > +
 > +    spin_lock(&ofs->syncfs_wait_list_lock);
 > +    list_splice_init(&ofs->syncfs_wait_list, &tmp_list);
 > +
 > +    while (!list_empty(&tmp_list)) {
 > +        oi = list_first_entry(&tmp_list, struct ovl_inode, wait_list);
 > +        list_del_init(&oi->wait_list);
 > +        ihold(&oi->vfs_inode);

Maybe I overlooked race condition with inode eviction, so still need to introduce
OVL_EVICT_PENDING flag just like we did in old syncfs efficiency patch series.


Thanks,
Chengguang

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v3 09/10] ovl: introduce helper of syncfs writeback inode waiting
  2020-11-09  3:33   ` 回复:[RFC " Chengguang Xu
@ 2020-11-09  7:07     ` Amir Goldstein
  2020-11-09  8:34       ` Chengguang Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Amir Goldstein @ 2020-11-09  7:07 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: miklos, jack, linux-unionfs, linux-fsdevel

On Mon, Nov 9, 2020 at 5:34 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
>  ---- 在 星期日, 2020-11-08 22:03:06 Chengguang Xu <cgxu519@mykernel.net> 撰写 ----
>  > Introduce a helper ovl_wait_wb_inodes() to wait until all
>  > target upper inodes finish writeback.
>  >
>  > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
>  > ---
>  >  fs/overlayfs/super.c | 30 ++++++++++++++++++++++++++++++
>  >  1 file changed, 30 insertions(+)
>  >
>  > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
>  > index e5607a908d82..9a535fc11221 100644
>  > --- a/fs/overlayfs/super.c
>  > +++ b/fs/overlayfs/super.c
>  > @@ -255,6 +255,36 @@ static void ovl_put_super(struct super_block *sb)
>  >      ovl_free_fs(ofs);
>  >  }
>  >
>  > +void ovl_wait_wb_inodes(struct ovl_fs *ofs)
>  > +{
>  > +    LIST_HEAD(tmp_list);
>  > +    struct ovl_inode *oi;
>  > +    struct inode *upper;
>  > +
>  > +    spin_lock(&ofs->syncfs_wait_list_lock);
>  > +    list_splice_init(&ofs->syncfs_wait_list, &tmp_list);
>  > +
>  > +    while (!list_empty(&tmp_list)) {
>  > +        oi = list_first_entry(&tmp_list, struct ovl_inode, wait_list);
>  > +        list_del_init(&oi->wait_list);
>  > +        ihold(&oi->vfs_inode);
>
> Maybe I overlooked race condition with inode eviction, so still need to introduce
> OVL_EVICT_PENDING flag just like we did in old syncfs efficiency patch series.
>

I am not sure why you added the ovl wait list.

I think you misunderstood Jan's suggestion.
I think what Jan meant is that ovl_sync_fs() should call
wait_sb_inodes(upper_sb)
to wait for writeback of ALL upper inodes after sync_filesystem()
started writeback
only on this ovl instance upper inodes.

I am not sure if this is acceptable or not - it is certainly an improvement over
current situation, but I have a feeling that on a large scale (many
containers) it
won't be enough.

The idea was to keep it simple without over optimizing, since anyway
you are going for the "correct" solution long term (ovl inode aops),
so I wouldn't
add the wait list.

As long as the upper inode is still dirty, we can keep the ovl inode in cache,
so the worst outcome is that drop_caches needs to get called twice before the
ovl inode can be evicted, no?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v3 09/10] ovl: introduce helper of syncfs writeback inode waiting
  2020-11-09  7:07     ` [RFC " Amir Goldstein
@ 2020-11-09  8:34       ` Chengguang Xu
  2020-11-09 10:07         ` Amir Goldstein
  0 siblings, 1 reply; 23+ messages in thread
From: Chengguang Xu @ 2020-11-09  8:34 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: miklos, jack, linux-unionfs, linux-fsdevel

 ---- 在 星期一, 2020-11-09 15:07:18 Amir Goldstein <amir73il@gmail.com> 撰写 ----
 > On Mon, Nov 9, 2020 at 5:34 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >
 > >  ---- 在 星期日, 2020-11-08 22:03:06 Chengguang Xu <cgxu519@mykernel.net> 撰写 ----
 > >  > Introduce a helper ovl_wait_wb_inodes() to wait until all
 > >  > target upper inodes finish writeback.
 > >  >
 > >  > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
 > >  > ---
 > >  >  fs/overlayfs/super.c | 30 ++++++++++++++++++++++++++++++
 > >  >  1 file changed, 30 insertions(+)
 > >  >
 > >  > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
 > >  > index e5607a908d82..9a535fc11221 100644
 > >  > --- a/fs/overlayfs/super.c
 > >  > +++ b/fs/overlayfs/super.c
 > >  > @@ -255,6 +255,36 @@ static void ovl_put_super(struct super_block *sb)
 > >  >      ovl_free_fs(ofs);
 > >  >  }
 > >  >
 > >  > +void ovl_wait_wb_inodes(struct ovl_fs *ofs)
 > >  > +{
 > >  > +    LIST_HEAD(tmp_list);
 > >  > +    struct ovl_inode *oi;
 > >  > +    struct inode *upper;
 > >  > +
 > >  > +    spin_lock(&ofs->syncfs_wait_list_lock);
 > >  > +    list_splice_init(&ofs->syncfs_wait_list, &tmp_list);
 > >  > +
 > >  > +    while (!list_empty(&tmp_list)) {
 > >  > +        oi = list_first_entry(&tmp_list, struct ovl_inode, wait_list);
 > >  > +        list_del_init(&oi->wait_list);
 > >  > +        ihold(&oi->vfs_inode);
 > >
 > > Maybe I overlooked race condition with inode eviction, so still need to introduce
 > > OVL_EVICT_PENDING flag just like we did in old syncfs efficiency patch series.
 > >
 > 
 > I am not sure why you added the ovl wait list.
 > 
 > I think you misunderstood Jan's suggestion.
 > I think what Jan meant is that ovl_sync_fs() should call
 > wait_sb_inodes(upper_sb)
 > to wait for writeback of ALL upper inodes after sync_filesystem()
 > started writeback
 > only on this ovl instance upper inodes.
 > 


Maybe you are right, the wait list is just for accuracy that can completely
avoid interferes between ovl instances, otherwise we may need to face
waiting interferes  in high density environment. 


 > I am not sure if this is acceptable or not - it is certainly an improvement over
 > current situation, but I have a feeling that on a large scale (many
 > containers) it
 > won't be enough.
 > 

The same as your thought. 


 > The idea was to keep it simple without over optimizing, since anyway
 > you are going for the "correct" solution long term (ovl inode aops),
 > so I wouldn't
 > add the wait list.
 > 

Maybe, I think it depends on how to implement ovl page-cache, so at current
stage I have no idea for the wait list.


 > As long as the upper inode is still dirty, we can keep the ovl inode in cache,
 > so the worst outcome is that drop_caches needs to get called twice before the
 > ovl inode can be evicted, no?
 > 
 
IIUC, since currently ovl does not have it's own page-cache, so there is no affect to page-cache reclaim, 
also  there is no ovl shrinker to reclaim slab because we drop ovl inode directly after final iput. 
So should we add a shrinker in this series?


Thanks,
Chengguang




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v3 09/10] ovl: introduce helper of syncfs writeback inode waiting
  2020-11-09  8:34       ` Chengguang Xu
@ 2020-11-09 10:07         ` Amir Goldstein
  2020-11-09 12:06           ` Chengguang Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Amir Goldstein @ 2020-11-09 10:07 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: miklos, jack, linux-unionfs, linux-fsdevel

On Mon, Nov 9, 2020 at 10:34 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
>  ---- 在 星期一, 2020-11-09 15:07:18 Amir Goldstein <amir73il@gmail.com> 撰写 ----
>  > On Mon, Nov 9, 2020 at 5:34 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>  > >
>  > >  ---- 在 星期日, 2020-11-08 22:03:06 Chengguang Xu <cgxu519@mykernel.net> 撰写 ----
>  > >  > Introduce a helper ovl_wait_wb_inodes() to wait until all
>  > >  > target upper inodes finish writeback.
>  > >  >
>  > >  > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
>  > >  > ---
>  > >  >  fs/overlayfs/super.c | 30 ++++++++++++++++++++++++++++++
>  > >  >  1 file changed, 30 insertions(+)
>  > >  >
>  > >  > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
>  > >  > index e5607a908d82..9a535fc11221 100644
>  > >  > --- a/fs/overlayfs/super.c
>  > >  > +++ b/fs/overlayfs/super.c
>  > >  > @@ -255,6 +255,36 @@ static void ovl_put_super(struct super_block *sb)
>  > >  >      ovl_free_fs(ofs);
>  > >  >  }
>  > >  >
>  > >  > +void ovl_wait_wb_inodes(struct ovl_fs *ofs)
>  > >  > +{
>  > >  > +    LIST_HEAD(tmp_list);
>  > >  > +    struct ovl_inode *oi;
>  > >  > +    struct inode *upper;
>  > >  > +
>  > >  > +    spin_lock(&ofs->syncfs_wait_list_lock);
>  > >  > +    list_splice_init(&ofs->syncfs_wait_list, &tmp_list);
>  > >  > +
>  > >  > +    while (!list_empty(&tmp_list)) {
>  > >  > +        oi = list_first_entry(&tmp_list, struct ovl_inode, wait_list);
>  > >  > +        list_del_init(&oi->wait_list);
>  > >  > +        ihold(&oi->vfs_inode);
>  > >
>  > > Maybe I overlooked race condition with inode eviction, so still need to introduce
>  > > OVL_EVICT_PENDING flag just like we did in old syncfs efficiency patch series.
>  > >
>  >
>  > I am not sure why you added the ovl wait list.
>  >
>  > I think you misunderstood Jan's suggestion.
>  > I think what Jan meant is that ovl_sync_fs() should call
>  > wait_sb_inodes(upper_sb)
>  > to wait for writeback of ALL upper inodes after sync_filesystem()
>  > started writeback
>  > only on this ovl instance upper inodes.
>  >
>
>
> Maybe you are right, the wait list is just for accuracy that can completely
> avoid interferes between ovl instances, otherwise we may need to face
> waiting interferes  in high density environment.
>
>
>  > I am not sure if this is acceptable or not - it is certainly an improvement over
>  > current situation, but I have a feeling that on a large scale (many
>  > containers) it
>  > won't be enough.
>  >
>
> The same as your thought.
>
>
>  > The idea was to keep it simple without over optimizing, since anyway
>  > you are going for the "correct" solution long term (ovl inode aops),
>  > so I wouldn't
>  > add the wait list.
>  >
>
> Maybe, I think it depends on how to implement ovl page-cache, so at current
> stage I have no idea for the wait list.
>
>
>  > As long as the upper inode is still dirty, we can keep the ovl inode in cache,
>  > so the worst outcome is that drop_caches needs to get called twice before the
>  > ovl inode can be evicted, no?
>  >
>
> IIUC, since currently ovl does not have it's own page-cache, so there is no affect to page-cache reclaim,
> also  there is no ovl shrinker to reclaim slab because we drop ovl inode directly after final iput.
> So should we add a shrinker in this series?
>

Would that add a lot of complexity?
Thinking out loud: maybe we follow Jan's suggestion and fix remaining
performance with followup series?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v3 09/10] ovl: introduce helper of syncfs writeback inode waiting
  2020-11-09 10:07         ` Amir Goldstein
@ 2020-11-09 12:06           ` Chengguang Xu
  0 siblings, 0 replies; 23+ messages in thread
From: Chengguang Xu @ 2020-11-09 12:06 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: miklos, jack, linux-unionfs, linux-fsdevel

 ---- 在 星期一, 2020-11-09 18:07:18 Amir Goldstein <amir73il@gmail.com> 撰写 ----
 > On Mon, Nov 9, 2020 at 10:34 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >
 > >  ---- 在 星期一, 2020-11-09 15:07:18 Amir Goldstein <amir73il@gmail.com> 撰写 ----
 > >  > On Mon, Nov 9, 2020 at 5:34 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >  > >
 > >  > >  ---- 在 星期日, 2020-11-08 22:03:06 Chengguang Xu <cgxu519@mykernel.net> 撰写 ----
 > >  > >  > Introduce a helper ovl_wait_wb_inodes() to wait until all
 > >  > >  > target upper inodes finish writeback.
 > >  > >  >
 > >  > >  > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
 > >  > >  > ---
 > >  > >  >  fs/overlayfs/super.c | 30 ++++++++++++++++++++++++++++++
 > >  > >  >  1 file changed, 30 insertions(+)
 > >  > >  >
 > >  > >  > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
 > >  > >  > index e5607a908d82..9a535fc11221 100644
 > >  > >  > --- a/fs/overlayfs/super.c
 > >  > >  > +++ b/fs/overlayfs/super.c
 > >  > >  > @@ -255,6 +255,36 @@ static void ovl_put_super(struct super_block *sb)
 > >  > >  >      ovl_free_fs(ofs);
 > >  > >  >  }
 > >  > >  >
 > >  > >  > +void ovl_wait_wb_inodes(struct ovl_fs *ofs)
 > >  > >  > +{
 > >  > >  > +    LIST_HEAD(tmp_list);
 > >  > >  > +    struct ovl_inode *oi;
 > >  > >  > +    struct inode *upper;
 > >  > >  > +
 > >  > >  > +    spin_lock(&ofs->syncfs_wait_list_lock);
 > >  > >  > +    list_splice_init(&ofs->syncfs_wait_list, &tmp_list);
 > >  > >  > +
 > >  > >  > +    while (!list_empty(&tmp_list)) {
 > >  > >  > +        oi = list_first_entry(&tmp_list, struct ovl_inode, wait_list);
 > >  > >  > +        list_del_init(&oi->wait_list);
 > >  > >  > +        ihold(&oi->vfs_inode);
 > >  > >
 > >  > > Maybe I overlooked race condition with inode eviction, so still need to introduce
 > >  > > OVL_EVICT_PENDING flag just like we did in old syncfs efficiency patch series.
 > >  > >
 > >  >
 > >  > I am not sure why you added the ovl wait list.
 > >  >
 > >  > I think you misunderstood Jan's suggestion.
 > >  > I think what Jan meant is that ovl_sync_fs() should call
 > >  > wait_sb_inodes(upper_sb)
 > >  > to wait for writeback of ALL upper inodes after sync_filesystem()
 > >  > started writeback
 > >  > only on this ovl instance upper inodes.
 > >  >
 > >
 > >
 > > Maybe you are right, the wait list is just for accuracy that can completely
 > > avoid interferes between ovl instances, otherwise we may need to face
 > > waiting interferes  in high density environment.
 > >
 > >
 > >  > I am not sure if this is acceptable or not - it is certainly an improvement over
 > >  > current situation, but I have a feeling that on a large scale (many
 > >  > containers) it
 > >  > won't be enough.
 > >  >
 > >
 > > The same as your thought.
 > >
 > >
 > >  > The idea was to keep it simple without over optimizing, since anyway
 > >  > you are going for the "correct" solution long term (ovl inode aops),
 > >  > so I wouldn't
 > >  > add the wait list.
 > >  >
 > >
 > > Maybe, I think it depends on how to implement ovl page-cache, so at current
 > > stage I have no idea for the wait list.
 > >
 > >
 > >  > As long as the upper inode is still dirty, we can keep the ovl inode in cache,
 > >  > so the worst outcome is that drop_caches needs to get called twice before the
 > >  > ovl inode can be evicted, no?
 > >  >
 > >
 > > IIUC, since currently ovl does not have it's own page-cache, so there is no affect to page-cache reclaim,
 > > also  there is no ovl shrinker to reclaim slab because we drop ovl inode directly after final iput.
 > > So should we add a shrinker in this series?
 > >
 > 
 > Would that add a lot of complexity?

Sorry, don't need any other shrinker because inode and dentry use common vfs shrinker.

 > Thinking out loud: maybe we follow Jan's suggestion and fix remaining
 > performance with followup series?
 > 

Okay,  so let's leave it as homework.


Thanks,
Chengguang

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v3 07/10] ovl: implement overlayfs' ->write_inode operation
  2020-11-08 14:03 ` [RFC PATCH v3 07/10] ovl: implement overlayfs' ->write_inode operation Chengguang Xu
@ 2020-11-10 13:45   ` Jan Kara
  2020-11-10 15:12     ` Chengguang Xu
  2020-11-10 16:18     ` Amir Goldstein
  0 siblings, 2 replies; 23+ messages in thread
From: Jan Kara @ 2020-11-10 13:45 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: miklos, jack, amir73il, linux-unionfs, linux-fsdevel

On Sun 08-11-20 22:03:04, Chengguang Xu wrote:
> +static int ovl_write_inode(struct inode *inode,
> +			   struct writeback_control *wbc)
> +{
> +	struct ovl_fs *ofs = inode->i_sb->s_fs_info;
> +	struct inode *upper = ovl_inode_upper(inode);
> +	unsigned long iflag = 0;
> +	int ret = 0;
> +
> +	if (!upper)
> +		return 0;
> +
> +	if (!ovl_should_sync(ofs))
> +		return 0;
> +
> +	if (upper->i_sb->s_op->write_inode)
> +		ret = upper->i_sb->s_op->write_inode(inode, wbc);
> +
> +	iflag |= upper->i_state & I_DIRTY_ALL;
> +
> +	if (mapping_writably_mapped(upper->i_mapping) ||
> +	    mapping_tagged(upper->i_mapping, PAGECACHE_TAG_WRITEBACK))
> +		iflag |= I_DIRTY_PAGES;
> +
> +	if (iflag)
> +		ovl_mark_inode_dirty(inode);

I think you didn't incorporate feedback we were speaking about in the last
version of the series. May comment in [1] still applies - you can miss
inodes dirtied through mmap when you decide to clean the inode here. So
IMHO you need something like:

	if (inode_is_open_for_write(inode))
		ovl_mark_inode_dirty(inode);

here to keep inode dirty while it is open for write (and not based on upper
inode state which is unreliable).

								Honza

[1] https://lore.kernel.org/linux-fsdevel/20201105140332.GG32718@quack2.suse.cz/

> +
> +	return ret;
> +}
> +
>  static void ovl_evict_inode(struct inode *inode)
>  {
>  	struct ovl_fs *ofs = inode->i_sb->s_fs_info;
> @@ -411,6 +440,7 @@ static const struct super_operations ovl_super_operations = {
>  	.destroy_inode	= ovl_destroy_inode,
>  	.drop_inode	= generic_delete_inode,
>  	.evict_inode	= ovl_evict_inode,
> +	.write_inode	= ovl_write_inode,
>  	.put_super	= ovl_put_super,
>  	.sync_fs	= ovl_sync_fs,
>  	.statfs		= ovl_statfs,
> -- 
> 2.26.2
> 
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v3 07/10] ovl: implement overlayfs' ->write_inode operation
  2020-11-10 13:45   ` Jan Kara
@ 2020-11-10 15:12     ` Chengguang Xu
  2020-11-11 10:54       ` Jan Kara
  2020-11-10 16:18     ` Amir Goldstein
  1 sibling, 1 reply; 23+ messages in thread
From: Chengguang Xu @ 2020-11-10 15:12 UTC (permalink / raw)
  To: Jan Kara; +Cc: miklos, amir73il, linux-unionfs, linux-fsdevel

 ---- 在 星期二, 2020-11-10 21:45:51 Jan Kara <jack@suse.cz> 撰写 ----
 > On Sun 08-11-20 22:03:04, Chengguang Xu wrote:
 > > +static int ovl_write_inode(struct inode *inode,
 > > +               struct writeback_control *wbc)
 > > +{
 > > +    struct ovl_fs *ofs = inode->i_sb->s_fs_info;
 > > +    struct inode *upper = ovl_inode_upper(inode);
 > > +    unsigned long iflag = 0;
 > > +    int ret = 0;
 > > +
 > > +    if (!upper)
 > > +        return 0;
 > > +
 > > +    if (!ovl_should_sync(ofs))
 > > +        return 0;
 > > +
 > > +    if (upper->i_sb->s_op->write_inode)
 > > +        ret = upper->i_sb->s_op->write_inode(inode, wbc);
 > > +
 > > +    iflag |= upper->i_state & I_DIRTY_ALL;
 > > +
 > > +    if (mapping_writably_mapped(upper->i_mapping) ||
 > > +        mapping_tagged(upper->i_mapping, PAGECACHE_TAG_WRITEBACK))
 > > +        iflag |= I_DIRTY_PAGES;
 > > +
 > > +    if (iflag)
 > > +        ovl_mark_inode_dirty(inode);
 > 
 > I think you didn't incorporate feedback we were speaking about in the last
 > version of the series. May comment in [1] still applies - you can miss
 > inodes dirtied through mmap when you decide to clean the inode here. So
 > IMHO you need something like:
 > 
 >     if (inode_is_open_for_write(inode))
 >         ovl_mark_inode_dirty(inode);
 > 
 > here to keep inode dirty while it is open for write (and not based on upper
 > inode state which is unreliable).

Hi Jan,

I not only checked upper inode state but also checked upper inode mmap(shared) state
using  mapping_writably_mapped(upper->i_mapping). Maybe it's better to move i_state check
after mmap check but isn't above checks enough for mmapped file? 

Below code is the definition of mmapping_writably_mapped(), I think it will check shared mmap
regardless write or read permission though the function name is quite confusable.

static inline int mapping_writably_mapped(struct address_space *mapping)
{
	return atomic_read(&mapping->i_mmap_writable) > 0;
}



Thanks,
Chengguang


 > 
 >                                 Honza
 > 
 > [1] https://lore.kernel.org/linux-fsdevel/20201105140332.GG32718@quack2.suse.cz/
 > 
 > > +
 > > +    return ret;
 > > +}
 > > +
 > >  static void ovl_evict_inode(struct inode *inode)
 > >  {
 > >      struct ovl_fs *ofs = inode->i_sb->s_fs_info;
 > > @@ -411,6 +440,7 @@ static const struct super_operations ovl_super_operations = {
 > >      .destroy_inode    = ovl_destroy_inode,
 > >      .drop_inode    = generic_delete_inode,
 > >      .evict_inode    = ovl_evict_inode,
 > > +    .write_inode    = ovl_write_inode,
 > >      .put_super    = ovl_put_super,
 > >      .sync_fs    = ovl_sync_fs,
 > >      .statfs        = ovl_statfs,
 > > -- 
 > > 2.26.2
 > > 
 > > 
 > -- 
 > Jan Kara <jack@suse.com>
 > SUSE Labs, CR
 > 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v3 07/10] ovl: implement overlayfs' ->write_inode operation
  2020-11-10 13:45   ` Jan Kara
  2020-11-10 15:12     ` Chengguang Xu
@ 2020-11-10 16:18     ` Amir Goldstein
  1 sibling, 0 replies; 23+ messages in thread
From: Amir Goldstein @ 2020-11-10 16:18 UTC (permalink / raw)
  To: Jan Kara; +Cc: Chengguang Xu, Miklos Szeredi, overlayfs, linux-fsdevel

On Tue, Nov 10, 2020 at 3:45 PM Jan Kara <jack@suse.cz> wrote:
>
> On Sun 08-11-20 22:03:04, Chengguang Xu wrote:
> > +static int ovl_write_inode(struct inode *inode,
> > +                        struct writeback_control *wbc)
> > +{
> > +     struct ovl_fs *ofs = inode->i_sb->s_fs_info;
> > +     struct inode *upper = ovl_inode_upper(inode);
> > +     unsigned long iflag = 0;
> > +     int ret = 0;
> > +
> > +     if (!upper)
> > +             return 0;
> > +
> > +     if (!ovl_should_sync(ofs))
> > +             return 0;
> > +
> > +     if (upper->i_sb->s_op->write_inode)
> > +             ret = upper->i_sb->s_op->write_inode(inode, wbc);
> > +
> > +     iflag |= upper->i_state & I_DIRTY_ALL;
> > +
> > +     if (mapping_writably_mapped(upper->i_mapping) ||
> > +         mapping_tagged(upper->i_mapping, PAGECACHE_TAG_WRITEBACK))
> > +             iflag |= I_DIRTY_PAGES;
> > +
> > +     if (iflag)
> > +             ovl_mark_inode_dirty(inode);
>
> I think you didn't incorporate feedback we were speaking about in the last
> version of the series. May comment in [1] still applies - you can miss
> inodes dirtied through mmap when you decide to clean the inode here. So
> IMHO you need something like:
>
>         if (inode_is_open_for_write(inode))
>                 ovl_mark_inode_dirty(inode);
>
> here to keep inode dirty while it is open for write (and not based on upper
> inode state which is unreliable).
>

Just to be clear, as long as the ovl inode is open for write, the upper inode
is also open for write via the realfile reference, but not the other
way around -
after open(); mmap(); close() of the overlay file, the ovl inode is not
open for write, but the upper inode is, because ovl_mmap() maps the
realfile and upper inode without taking any reference on the ovl file/inode.

Hence the check for mapping_writably_mapped(upper->i_mapping)
above.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v3 07/10] ovl: implement overlayfs' ->write_inode operation
  2020-11-10 15:12     ` Chengguang Xu
@ 2020-11-11 10:54       ` Jan Kara
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Kara @ 2020-11-11 10:54 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: Jan Kara, miklos, amir73il, linux-unionfs, linux-fsdevel

On Tue 10-11-20 23:12:14, Chengguang Xu wrote:
>  ---- 在 星期二, 2020-11-10 21:45:51 Jan Kara <jack@suse.cz> 撰写 ----
>  > On Sun 08-11-20 22:03:04, Chengguang Xu wrote:
>  > > +static int ovl_write_inode(struct inode *inode,
>  > > +               struct writeback_control *wbc)
>  > > +{
>  > > +    struct ovl_fs *ofs = inode->i_sb->s_fs_info;
>  > > +    struct inode *upper = ovl_inode_upper(inode);
>  > > +    unsigned long iflag = 0;
>  > > +    int ret = 0;
>  > > +
>  > > +    if (!upper)
>  > > +        return 0;
>  > > +
>  > > +    if (!ovl_should_sync(ofs))
>  > > +        return 0;
>  > > +
>  > > +    if (upper->i_sb->s_op->write_inode)
>  > > +        ret = upper->i_sb->s_op->write_inode(inode, wbc);
>  > > +
>  > > +    iflag |= upper->i_state & I_DIRTY_ALL;
>  > > +
>  > > +    if (mapping_writably_mapped(upper->i_mapping) ||
>  > > +        mapping_tagged(upper->i_mapping, PAGECACHE_TAG_WRITEBACK))
>  > > +        iflag |= I_DIRTY_PAGES;
>  > > +
>  > > +    if (iflag)
>  > > +        ovl_mark_inode_dirty(inode);
>  > 
>  > I think you didn't incorporate feedback we were speaking about in the last
>  > version of the series. May comment in [1] still applies - you can miss
>  > inodes dirtied through mmap when you decide to clean the inode here. So
>  > IMHO you need something like:
>  > 
>  >     if (inode_is_open_for_write(inode))
>  >         ovl_mark_inode_dirty(inode);
>  > 
>  > here to keep inode dirty while it is open for write (and not based on upper
>  > inode state which is unreliable).
> 
> Hi Jan,
> 
> I not only checked upper inode state but also checked upper inode
> mmap(shared) state using  mapping_writably_mapped(upper->i_mapping).
> Maybe it's better to move i_state check after mmap check but isn't above
> checks enough for mmapped file? 

Ah, sorry, I'm blind! I missed the mapping_writably_mapped() check. Thanks
for explanation.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 23+ messages in thread

* 回复:[RFC PATCH v3 06/10] ovl: mark overlayfs' inode dirty on shared mmap
  2020-11-08 14:03 ` [RFC PATCH v3 06/10] ovl: mark overlayfs' inode dirty on shared mmap Chengguang Xu
@ 2020-11-11 13:05   ` Chengguang Xu
  2020-11-11 15:20     ` [RFC " Amir Goldstein
  0 siblings, 1 reply; 23+ messages in thread
From: Chengguang Xu @ 2020-11-11 13:05 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: miklos, jack, amir73il, linux-unionfs, linux-fsdevel

 ---- 在 星期日, 2020-11-08 22:03:03 Chengguang Xu <cgxu519@mykernel.net> 撰写 ----
 > Overlayfs cannot be notified when mmapped area gets dirty,
 > so we need to proactively mark inode dirty in ->mmap operation.
 > 
 > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
 > ---
 >  fs/overlayfs/file.c | 2 ++
 >  1 file changed, 2 insertions(+)
 > 
 > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
 > index efccb7c1f9bc..662252047fff 100644
 > --- a/fs/overlayfs/file.c
 > +++ b/fs/overlayfs/file.c
 > @@ -486,6 +486,8 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 >          /* Drop reference count from new vm_file value */
 >          fput(realfile);
 >      } else {
 > +        if (vma->vm_flags & (VM_SHARED|VM_MAYSHARE))

Maybe it's better to mark dirty only having upper inode.


 > +            ovl_mark_inode_dirty(file_inode(file));
 >          /* Drop reference count from previous vm_file value */
 >          fput(file);
 >      }
 > -- 
 > 2.26.2
 > 
 > 
 > 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v3 06/10] ovl: mark overlayfs' inode dirty on shared mmap
  2020-11-11 13:05   ` 回复:[RFC " Chengguang Xu
@ 2020-11-11 15:20     ` Amir Goldstein
  2020-11-11 16:09       ` Chengguang Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Amir Goldstein @ 2020-11-11 15:20 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: miklos, jack, linux-unionfs, linux-fsdevel

On Wed, Nov 11, 2020 at 3:05 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
>  ---- 在 星期日, 2020-11-08 22:03:03 Chengguang Xu <cgxu519@mykernel.net> 撰写 ----
>  > Overlayfs cannot be notified when mmapped area gets dirty,
>  > so we need to proactively mark inode dirty in ->mmap operation.
>  >
>  > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
>  > ---
>  >  fs/overlayfs/file.c | 2 ++
>  >  1 file changed, 2 insertions(+)
>  >
>  > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
>  > index efccb7c1f9bc..662252047fff 100644
>  > --- a/fs/overlayfs/file.c
>  > +++ b/fs/overlayfs/file.c
>  > @@ -486,6 +486,8 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
>  >          /* Drop reference count from new vm_file value */
>  >          fput(realfile);
>  >      } else {
>  > +        if (vma->vm_flags & (VM_SHARED|VM_MAYSHARE))
>
> Maybe it's better to mark dirty only having upper inode.
>

Yeh.

And since mapping_map_writable() is only called if VM_SHARED flag
is set (and not VM_MAYSHARE), we are not going to re-dirty an inode on
account of VM_MAYSHARE alone, so I wonder why we need to mark it
dirty here on account of VM_MAYSHARE?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v3 06/10] ovl: mark overlayfs' inode dirty on shared mmap
  2020-11-11 15:20     ` [RFC " Amir Goldstein
@ 2020-11-11 16:09       ` Chengguang Xu
  0 siblings, 0 replies; 23+ messages in thread
From: Chengguang Xu @ 2020-11-11 16:09 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: miklos, jack, linux-unionfs, linux-fsdevel

 ---- 在 星期三, 2020-11-11 23:20:56 Amir Goldstein <amir73il@gmail.com> 撰写 ----
 > On Wed, Nov 11, 2020 at 3:05 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >
 > >  ---- 在 星期日, 2020-11-08 22:03:03 Chengguang Xu <cgxu519@mykernel.net> 撰写 ----
 > >  > Overlayfs cannot be notified when mmapped area gets dirty,
 > >  > so we need to proactively mark inode dirty in ->mmap operation.
 > >  >
 > >  > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
 > >  > ---
 > >  >  fs/overlayfs/file.c | 2 ++
 > >  >  1 file changed, 2 insertions(+)
 > >  >
 > >  > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
 > >  > index efccb7c1f9bc..662252047fff 100644
 > >  > --- a/fs/overlayfs/file.c
 > >  > +++ b/fs/overlayfs/file.c
 > >  > @@ -486,6 +486,8 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 > >  >          /* Drop reference count from new vm_file value */
 > >  >          fput(realfile);
 > >  >      } else {
 > >  > +        if (vma->vm_flags & (VM_SHARED|VM_MAYSHARE))
 > >
 > > Maybe it's better to mark dirty only having upper inode.
 > >
 > 
 > Yeh.
 > 
 > And since mapping_map_writable() is only called if VM_SHARED flag
 > is set (and not VM_MAYSHARE), we are not going to re-dirty an inode on
 > account of VM_MAYSHARE alone, so I wonder why we need to mark it
 > dirty here on account of VM_MAYSHARE?
 > 

Yeah, you are right. It just means the pages in this memory region can be mapped
by other process with shared+write mode and this is actually meaningless in our case.
So let's just ignore it. :-)


Thanks,
Chengguang




^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2020-11-11 16:09 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-08 14:02 [RFC PATCH v3 00/10] implement containerized syncfs for overlayfs Chengguang Xu
2020-11-08 14:02 ` [RFC PATCH v3 01/10] ovl: setup overlayfs' private bdi Chengguang Xu
2020-11-08 14:02 ` [RFC PATCH v3 02/10] ovl: introduce waiting list for syncfs Chengguang Xu
2020-11-08 14:03 ` [RFC PATCH v3 03/10] ovl: implement ->writepages operation Chengguang Xu
2020-11-08 14:03 ` [RFC PATCH v3 04/10] ovl: implement overlayfs' ->evict_inode operation Chengguang Xu
2020-11-08 14:03 ` [RFC PATCH v3 05/10] ovl: mark overlayfs' inode dirty on modification Chengguang Xu
2020-11-08 14:03 ` [RFC PATCH v3 06/10] ovl: mark overlayfs' inode dirty on shared mmap Chengguang Xu
2020-11-11 13:05   ` 回复:[RFC " Chengguang Xu
2020-11-11 15:20     ` [RFC " Amir Goldstein
2020-11-11 16:09       ` Chengguang Xu
2020-11-08 14:03 ` [RFC PATCH v3 07/10] ovl: implement overlayfs' ->write_inode operation Chengguang Xu
2020-11-10 13:45   ` Jan Kara
2020-11-10 15:12     ` Chengguang Xu
2020-11-11 10:54       ` Jan Kara
2020-11-10 16:18     ` Amir Goldstein
2020-11-08 14:03 ` [RFC PATCH v3 08/10] ovl: cache dirty overlayfs' inode Chengguang Xu
2020-11-08 14:03 ` [RFC PATCH v3 09/10] ovl: introduce helper of syncfs writeback inode waiting Chengguang Xu
2020-11-09  3:33   ` 回复:[RFC " Chengguang Xu
2020-11-09  7:07     ` [RFC " Amir Goldstein
2020-11-09  8:34       ` Chengguang Xu
2020-11-09 10:07         ` Amir Goldstein
2020-11-09 12:06           ` Chengguang Xu
2020-11-08 14:03 ` [RFC PATCH v3 10/10] ovl: implement containerized syncfs for overlayfs Chengguang Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).