All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/5] implement containerized syncfs for overlayfs
@ 2020-10-10 14:23 Chengguang Xu
  2020-10-10 14:23 ` [RFC PATCH 1/5] fs: introduce notifier list for vfs inode Chengguang Xu
                   ` (4 more replies)
  0 siblings, 5 replies; 30+ messages in thread
From: Chengguang Xu @ 2020-10-10 14:23 UTC (permalink / raw)
  To: miklos, amir73il, jack; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu

Current syncfs(2) syscall on overlayfs just calls sync_filesystem()
on upper_sb to synchronize whole dirty inodes in upper filesystem
regardless of the overlay ownership of the inode. In the use case of
container, when multiple containers using the same underlying upper
filesystem, it has some shortcomings as below.

(1) Performance
Synchronization is probably heavy because it actually syncs unnecessary
inodes for target overlayfs.

(2) Interference
Unplanned synchronization will probably impact IO performance of
unrelated container processes on the other overlayfs.

This series try to implement containerized syncfs for overlayfs so that
only sync target dirty upper inodes which are belong to specific overlayfs
instance. By doing this, it is able to reduce cost of synchronization and
will not seriously impact IO performance of unrelated processes.

Chengguang Xu (5):
  fs: introduce notifier list for vfs inode
  fs: export symbol of writeback_single_inode()
  ovl: setup overlayfs' private bdi
  ovl: monitor marking dirty activity of underlying upper inode
  ovl: impement containerized syncfs for overlayfs

 fs/fs-writeback.c         |  7 ++++-
 fs/inode.c                |  5 ++++
 fs/overlayfs/inode.c      | 28 +++++++++++++++++++-
 fs/overlayfs/overlayfs.h  |  2 ++
 fs/overlayfs/ovl_entry.h  |  2 ++
 fs/overlayfs/super.c      | 65 ++++++++++++++++++++++++++++++++++++++++++++---
 fs/overlayfs/util.c       | 33 ++++++++++++++++++++++++
 include/linux/fs.h        |  6 +++++
 include/linux/writeback.h |  1 +
 9 files changed, 143 insertions(+), 6 deletions(-)

-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-10 14:23 [RFC PATCH 0/5] implement containerized syncfs for overlayfs Chengguang Xu
@ 2020-10-10 14:23 ` Chengguang Xu
  2020-10-14 16:15   ` Jan Kara
  2020-10-15  3:25   ` Al Viro
  2020-10-10 14:23 ` [RFC PATCH 2/5] fs: export symbol of writeback_single_inode() Chengguang Xu
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 30+ messages in thread
From: Chengguang Xu @ 2020-10-10 14:23 UTC (permalink / raw)
  To: miklos, amir73il, jack; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu

Currently there is no notification api for kernel about modification
of vfs inode, in some use cases like overlayfs, this kind of notification
will be very helpful to implement containerized syncfs functionality.
As the first attempt, we introduce marking inode dirty notification so that
overlay's inode could mark itself dirty as well and then only sync dirty
overlay inode while syncfs.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/fs-writeback.c  | 4 ++++
 fs/inode.c         | 5 +++++
 include/linux/fs.h | 6 ++++++
 3 files changed, 15 insertions(+)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 1492271..657cceb 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2246,9 +2246,13 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 {
 	struct super_block *sb = inode->i_sb;
 	int dirtytime;
+	int copy_flags = flags;
 
 	trace_writeback_mark_inode_dirty(inode, flags);
 
+	blocking_notifier_call_chain(
+		&inode->notifier_lists[MARK_INODE_DIRTY_NOTIFIER],
+		0, &copy_flags);
 	/*
 	 * Don't do this for I_DIRTY_PAGES - that doesn't actually
 	 * dirty the inode itself
diff --git a/fs/inode.c b/fs/inode.c
index 72c4c34..84e82db 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -388,12 +388,17 @@ void address_space_init_once(struct address_space *mapping)
  */
 void inode_init_once(struct inode *inode)
 {
+	int i;
+
 	memset(inode, 0, sizeof(*inode));
 	INIT_HLIST_NODE(&inode->i_hash);
 	INIT_LIST_HEAD(&inode->i_devices);
 	INIT_LIST_HEAD(&inode->i_io_list);
 	INIT_LIST_HEAD(&inode->i_wb_list);
 	INIT_LIST_HEAD(&inode->i_lru);
+	for (i = 0; i < INODE_MAX_NOTIFIER; i++)
+		BLOCKING_INIT_NOTIFIER_HEAD(
+			&inode->notifier_lists[MARK_INODE_DIRTY_NOTIFIER]);
 	__address_space_init_once(&inode->i_data);
 	i_size_ordered_init(inode);
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7519ae0..42f0750 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -607,6 +607,11 @@ static inline void mapping_allow_writable(struct address_space *mapping)
 
 struct fsnotify_mark_connector;
 
+enum {
+	MARK_INODE_DIRTY_NOTIFIER,
+	INODE_MAX_NOTIFIER,
+};
+
 /*
  * Keep mostly read-only and often accessed (especially for
  * the RCU path lookup and 'stat' data) fields at the beginning
@@ -723,6 +728,7 @@ struct inode {
 #endif
 
 	void			*i_private; /* fs or device private pointer */
+	struct	blocking_notifier_head notifier_lists[INODE_MAX_NOTIFIER];
 } __randomize_layout;
 
 struct timespec64 timestamp_truncate(struct timespec64 t, struct inode *inode);
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH 2/5] fs: export symbol of writeback_single_inode()
  2020-10-10 14:23 [RFC PATCH 0/5] implement containerized syncfs for overlayfs Chengguang Xu
  2020-10-10 14:23 ` [RFC PATCH 1/5] fs: introduce notifier list for vfs inode Chengguang Xu
@ 2020-10-10 14:23 ` Chengguang Xu
  2020-10-10 14:23 ` [RFC PATCH 3/5] ovl: setup overlayfs' private bdi Chengguang Xu
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 30+ messages in thread
From: Chengguang Xu @ 2020-10-10 14:23 UTC (permalink / raw)
  To: miklos, amir73il, jack; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu

Export symbol of writeback_single_inode() in order to call it
from overlayfs' ->writepages when sync single inode.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/fs-writeback.c         | 3 ++-
 include/linux/writeback.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 657cceb..4fed058 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1529,7 +1529,7 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
  * we go e.g. from filesystem. Flusher thread uses __writeback_single_inode()
  * and does more profound writeback list handling in writeback_sb_inodes().
  */
-static int writeback_single_inode(struct inode *inode,
+int writeback_single_inode(struct inode *inode,
 				  struct writeback_control *wbc)
 {
 	struct bdi_writeback *wb;
@@ -1585,6 +1585,7 @@ static int writeback_single_inode(struct inode *inode,
 	spin_unlock(&inode->i_lock);
 	return ret;
 }
+EXPORT_SYMBOL(writeback_single_inode);
 
 static long writeback_chunk_size(struct bdi_writeback *wb,
 				 struct wb_writeback_work *work)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 8e5c5bb..78e3b10 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -403,4 +403,5 @@ void tag_pages_for_writeback(struct address_space *mapping,
 void sb_mark_inode_writeback(struct inode *inode);
 void sb_clear_inode_writeback(struct inode *inode);
 
+int writeback_single_inode(struct inode *inode, struct writeback_control *wbc);
 #endif		/* WRITEBACK_H */
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH 3/5] ovl: setup overlayfs' private bdi
  2020-10-10 14:23 [RFC PATCH 0/5] implement containerized syncfs for overlayfs Chengguang Xu
  2020-10-10 14:23 ` [RFC PATCH 1/5] fs: introduce notifier list for vfs inode Chengguang Xu
  2020-10-10 14:23 ` [RFC PATCH 2/5] fs: export symbol of writeback_single_inode() Chengguang Xu
@ 2020-10-10 14:23 ` Chengguang Xu
  2020-10-10 14:23 ` [RFC PATCH 4/5] ovl: monitor marking dirty activity of underlying upper inode Chengguang Xu
  2020-10-10 14:23 ` [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs Chengguang Xu
  4 siblings, 0 replies; 30+ messages in thread
From: Chengguang Xu @ 2020-10-10 14:23 UTC (permalink / raw)
  To: miklos, amir73il, jack; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu

Setup private bdi to collect overlayfs' dirty inodes.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/super.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 290983b..dc22725 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1863,6 +1863,9 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	int err;
 
 	sb->s_d_op = &ovl_dentry_operations;
+	err = super_setup_bdi(sb);
+	if (err)
+		goto out;
 
 	err = -ENOMEM;
 	ofs = kzalloc(sizeof(struct ovl_fs), GFP_KERNEL);
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH 4/5] ovl: monitor marking dirty activity of underlying upper inode
  2020-10-10 14:23 [RFC PATCH 0/5] implement containerized syncfs for overlayfs Chengguang Xu
                   ` (2 preceding siblings ...)
  2020-10-10 14:23 ` [RFC PATCH 3/5] ovl: setup overlayfs' private bdi Chengguang Xu
@ 2020-10-10 14:23 ` Chengguang Xu
  2020-10-11  1:16   ` kernel test robot
  2020-10-11  1:55   ` kernel test robot
  2020-10-10 14:23 ` [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs Chengguang Xu
  4 siblings, 2 replies; 30+ messages in thread
From: Chengguang Xu @ 2020-10-10 14:23 UTC (permalink / raw)
  To: miklos, amir73il, jack; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu

Monitor marking dirty activity of underlying upper inode
so that we have chance to mark overlayfs' own inode dirty.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/inode.c     |  4 +++-
 fs/overlayfs/overlayfs.h |  2 ++
 fs/overlayfs/ovl_entry.h |  2 ++
 fs/overlayfs/super.c     | 11 +++++++++++
 fs/overlayfs/util.c      | 34 ++++++++++++++++++++++++++++++++++
 5 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index b584dca..e75c7ec 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -632,8 +632,10 @@ void ovl_inode_init(struct inode *inode, struct ovl_inode_params *oip,
 {
 	struct inode *realinode;
 
-	if (oip->upperdentry)
+	if (oip->upperdentry) {
 		OVL_I(inode)->__upperdentry = oip->upperdentry;
+		ovl_register_mark_dirty_notify(OVL_I(inode));
+	}
 	if (oip->lowerpath && oip->lowerpath->dentry)
 		OVL_I(inode)->lower = igrab(d_inode(oip->lowerpath->dentry));
 	if (oip->lowerdata)
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 7bce246..04ef778 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -247,6 +247,8 @@ static inline bool ovl_open_flags_need_copy_up(int flags)
 }
 
 /* util.c */
+void ovl_register_mark_dirty_notify(struct ovl_inode *oi);
+void ovl_unregister_mark_dirty_notify(struct ovl_inode *oi);
 int ovl_want_write(struct dentry *dentry);
 void ovl_drop_write(struct dentry *dentry);
 struct dentry *ovl_workdir(struct dentry *dentry);
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 1b5a2094..fce5314 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -129,6 +129,8 @@ struct ovl_inode {
 
 	/* synchronize copy up and more */
 	struct mutex lock;
+	/* moniter marking dirty behavior of upper inode */
+	struct notifier_block mark_dirty_nb;
 };
 
 static inline struct ovl_inode *OVL_I(struct inode *inode)
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index dc22725..6d8f9da 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -390,11 +390,22 @@ static int ovl_remount(struct super_block *sb, int *flags, char *data)
 	return ret;
 }
 
+void ovl_evict_inode(struct inode *inode)
+{
+	struct inode *upper;
+
+	upper = ovl_inode_upper(inode);
+	if (upper)
+		ovl_unregister_mark_dirty_notify(OVL_I(inode));
+	clear_inode(inode);
+}
+
 static const struct super_operations ovl_super_operations = {
 	.alloc_inode	= ovl_alloc_inode,
 	.free_inode	= ovl_free_inode,
 	.destroy_inode	= ovl_destroy_inode,
 	.drop_inode	= generic_delete_inode,
+	.evict_inode	= ovl_evict_inode,
 	.put_super	= ovl_put_super,
 	.sync_fs	= ovl_sync_fs,
 	.statfs		= ovl_statfs,
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 23f4756..bdcfe55 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -417,6 +417,7 @@ void ovl_inode_update(struct inode *inode, struct dentry *upperdentry)
 		inode->i_private = upperinode;
 		__insert_inode_hash(inode, (unsigned long) upperinode);
 	}
+	ovl_register_mark_dirty_notify(OVL_I(inode));
 }
 
 static void ovl_dentry_version_inc(struct dentry *dentry, bool impurity)
@@ -950,3 +951,36 @@ char *ovl_get_redirect_xattr(struct ovl_fs *ofs, struct dentry *dentry,
 	kfree(buf);
 	return ERR_PTR(res);
 }
+
+int ovl_inode_dirty_notify(struct notifier_block *nb,
+			   unsigned long dummy, void *param)
+{
+	struct ovl_inode *oi = container_of(nb, struct ovl_inode, mark_dirty_nb);
+	int *flags = (int *)param;
+
+	// add later
+	//__mark_inode_dirty(&oi->vfs_inode, *flags);
+	return NOTIFY_OK;
+}
+
+void ovl_register_mark_dirty_notify(struct ovl_inode *oi)
+{
+	struct inode *upper;
+
+	upper = oi->__upperdentry->d_inode;
+	oi->mark_dirty_nb.notifier_call = ovl_inode_dirty_notify;
+
+	blocking_notifier_chain_register(
+		&upper->notifier_lists[MARK_INODE_DIRTY_NOTIFIER],
+		&oi->mark_dirty_nb);
+}
+
+void ovl_unregister_mark_dirty_notify(struct ovl_inode *oi)
+{
+	struct inode *upper;
+
+	upper = oi->__upperdentry->d_inode;
+	blocking_notifier_chain_unregister(
+		&upper->notifier_lists[MARK_INODE_DIRTY_NOTIFIER],
+		&oi->mark_dirty_nb);
+}
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs
  2020-10-10 14:23 [RFC PATCH 0/5] implement containerized syncfs for overlayfs Chengguang Xu
                   ` (3 preceding siblings ...)
  2020-10-10 14:23 ` [RFC PATCH 4/5] ovl: monitor marking dirty activity of underlying upper inode Chengguang Xu
@ 2020-10-10 14:23 ` Chengguang Xu
  2020-10-11  2:57   ` kernel test robot
                     ` (3 more replies)
  4 siblings, 4 replies; 30+ messages in thread
From: Chengguang Xu @ 2020-10-10 14:23 UTC (permalink / raw)
  To: miklos, amir73il, jack; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu

Mark overlayfs' inode dirty when underlying upper inode becomes dirty,
then only sync overlayfs' dirty inodes while syncfs.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/inode.c | 24 ++++++++++++++++++++++++
 fs/overlayfs/super.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++----
 fs/overlayfs/util.c  |  3 +--
 3 files changed, 72 insertions(+), 6 deletions(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index e75c7ec..c709aed 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -11,6 +11,7 @@
 #include <linux/posix_acl.h>
 #include <linux/ratelimit.h>
 #include <linux/fiemap.h>
+#include <linux/writeback.h>
 #include "overlayfs.h"
 
 
@@ -516,7 +517,30 @@ static int ovl_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 	.update_time	= ovl_update_time,
 };
 
+static int ovl_writepages(struct address_space *mapping, struct writeback_control *wbc)
+{
+	struct inode *inode = mapping->host;
+	struct inode *upper_inode = ovl_inode_upper(inode);
+	struct ovl_fs *ofs = inode->i_sb->s_fs_info;
+	struct writeback_control upper_wbc = {
+		.nr_to_write		= LONG_MAX,
+		.sync_mode		= WB_SYNC_ALL,
+		.for_kupdate		= wbc->for_kupdate,
+		.for_background		= wbc->for_background,
+		.for_sync		= 0,
+		.range_cyclic		= wbc->range_cyclic,
+		.range_start		= 0,
+		.range_end		= LLONG_MAX,
+	};
+
+	if (!ovl_should_sync(ofs))
+		return 0;
+
+	return writeback_single_inode(upper_inode, &upper_wbc);
+}
+
 static const struct address_space_operations ovl_aops = {
+	.writepages             = ovl_writepages,
 	/* For O_DIRECT dentry_open() checks f_mapping->a_ops->direct_IO */
 	.direct_IO		= noop_direct_IO,
 };
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 6d8f9da..da1b65a 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -15,6 +15,9 @@
 #include <linux/seq_file.h>
 #include <linux/posix_acl_xattr.h>
 #include <linux/exportfs.h>
+#include <linux/notifier.h>
+#include <linux/writeback.h>
+#include <linux/blkdev.h>
 #include "overlayfs.h"
 
 MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>");
@@ -279,9 +282,13 @@ static int ovl_sync_fs(struct super_block *sb, int wait)
 
 	upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
 
-	down_read(&upper_sb->s_umount);
-	ret = sync_filesystem(upper_sb);
-	up_read(&upper_sb->s_umount);
+	if (upper_sb->s_op->sync_fs) {
+		down_read(&upper_sb->s_umount);
+		ret = upper_sb->s_op->sync_fs(upper_sb, wait);
+		if (!ret)
+			ret = sync_blockdev(upper_sb->s_bdev);
+		up_read(&upper_sb->s_umount);
+	}
 
 	return ret;
 }
@@ -400,11 +407,47 @@ void ovl_evict_inode(struct inode *inode)
 	clear_inode(inode);
 }
 
+int ovl_write_inode(struct inode *inode, struct writeback_control *wbc)
+{
+	struct ovl_fs *ofs = inode->i_sb->s_fs_info;
+	struct super_block *upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
+	struct inode *upper_inode = ovl_inode_upper(inode);
+	struct writeback_control upper_wbc = {
+		.nr_to_write		= LONG_MAX,
+		.sync_mode              = WB_SYNC_ALL,
+		.tagged_writepages	= wbc->tagged_writepages,
+		.for_kupdate		= wbc->for_kupdate,
+		.for_background		= wbc->for_background,
+		.for_sync		= 0,
+		.range_cyclic		= wbc->range_cyclic,
+		.range_start		= wbc->range_start,
+		.range_end		= wbc->range_end,
+	};
+
+	if (!upper_sb->s_op->write_inode || !upper_inode)
+		return 0;
+	return upper_sb->s_op->write_inode(upper_inode, &upper_wbc);
+}
+
+int ovl_drop_inode(struct inode *inode)
+{
+	struct inode *upper_inode;
+
+	upper_inode = ovl_inode_upper(inode);
+	if (!upper_inode)
+		return 1;
+	if (!(upper_inode->i_state & I_DIRTY_ALL))
+		return 1;
+
+	return generic_drop_inode(inode);
+}
+
 static const struct super_operations ovl_super_operations = {
 	.alloc_inode	= ovl_alloc_inode,
 	.free_inode	= ovl_free_inode,
 	.destroy_inode	= ovl_destroy_inode,
-	.drop_inode	= generic_delete_inode,
+	.drop_inode	= ovl_drop_inode,
+	.write_inode	= ovl_write_inode,
 	.evict_inode	= ovl_evict_inode,
 	.put_super	= ovl_put_super,
 	.sync_fs	= ovl_sync_fs,
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index bdcfe55..93d0493 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -958,8 +958,7 @@ int ovl_inode_dirty_notify(struct notifier_block *nb,
 	struct ovl_inode *oi = container_of(nb, struct ovl_inode, mark_dirty_nb);
 	int *flags = (int *)param;
 
-	// add later
-	//__mark_inode_dirty(&oi->vfs_inode, *flags);
+	__mark_inode_dirty(&oi->vfs_inode, *flags);
 	return NOTIFY_OK;
 }
 
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 4/5] ovl: monitor marking dirty activity of underlying upper inode
  2020-10-10 14:23 ` [RFC PATCH 4/5] ovl: monitor marking dirty activity of underlying upper inode Chengguang Xu
@ 2020-10-11  1:16   ` kernel test robot
  2020-10-11  1:55   ` kernel test robot
  1 sibling, 0 replies; 30+ messages in thread
From: kernel test robot @ 2020-10-11  1:16 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2789 bytes --]

Hi Chengguang,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on miklos-vfs/overlayfs-next]
[also build test WARNING on linus/master v5.9-rc8 next-20201009]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Chengguang-Xu/implement-containerized-syncfs-for-overlayfs/20201011-071405
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next
config: xtensa-allyesconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/6a0553975f03bb63033f5c246273010a38a66ccf
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Chengguang-Xu/implement-containerized-syncfs-for-overlayfs/20201011-071405
        git checkout 6a0553975f03bb63033f5c246273010a38a66ccf
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=xtensa 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> fs/overlayfs/super.c:393:6: warning: no previous prototype for 'ovl_evict_inode' [-Wmissing-prototypes]
     393 | void ovl_evict_inode(struct inode *inode)
         |      ^~~~~~~~~~~~~~~
--
>> fs/overlayfs/util.c:955:5: warning: no previous prototype for 'ovl_inode_dirty_notify' [-Wmissing-prototypes]
     955 | int ovl_inode_dirty_notify(struct notifier_block *nb,
         |     ^~~~~~~~~~~~~~~~~~~~~~
   fs/overlayfs/util.c: In function 'ovl_inode_dirty_notify':
   fs/overlayfs/util.c:959:7: warning: unused variable 'flags' [-Wunused-variable]
     959 |  int *flags = (int *)param;
         |       ^~~~~
   fs/overlayfs/util.c:958:20: warning: unused variable 'oi' [-Wunused-variable]
     958 |  struct ovl_inode *oi = container_of(nb, struct ovl_inode, mark_dirty_nb);
         |                    ^~

vim +/ovl_evict_inode +393 fs/overlayfs/super.c

   392	
 > 393	void ovl_evict_inode(struct inode *inode)
   394	{
   395		struct inode *upper;
   396	
   397		upper = ovl_inode_upper(inode);
   398		if (upper)
   399			ovl_unregister_mark_dirty_notify(OVL_I(inode));
   400		clear_inode(inode);
   401	}
   402	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 65077 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 4/5] ovl: monitor marking dirty activity of underlying upper inode
  2020-10-10 14:23 ` [RFC PATCH 4/5] ovl: monitor marking dirty activity of underlying upper inode Chengguang Xu
  2020-10-11  1:16   ` kernel test robot
@ 2020-10-11  1:55   ` kernel test robot
  1 sibling, 0 replies; 30+ messages in thread
From: kernel test robot @ 2020-10-11  1:55 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 4077 bytes --]

Hi Chengguang,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on miklos-vfs/overlayfs-next]
[also build test WARNING on linus/master v5.9-rc8 next-20201009]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Chengguang-Xu/implement-containerized-syncfs-for-overlayfs/20201011-071405
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next
config: arm-randconfig-r034-20201011 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project a8682554c6662ce01853d0069afb6c5b4ef8ab55)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm cross compiling tool for clang build
        # apt-get install binutils-arm-linux-gnueabi
        # https://github.com/0day-ci/linux/commit/6a0553975f03bb63033f5c246273010a38a66ccf
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Chengguang-Xu/implement-containerized-syncfs-for-overlayfs/20201011-071405
        git checkout 6a0553975f03bb63033f5c246273010a38a66ccf
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> fs/overlayfs/super.c:393:6: warning: no previous prototype for function 'ovl_evict_inode' [-Wmissing-prototypes]
   void ovl_evict_inode(struct inode *inode)
        ^
   fs/overlayfs/super.c:393:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void ovl_evict_inode(struct inode *inode)
   ^
   static 
   1 warning generated.
--
   fs/overlayfs/util.c:959:7: warning: unused variable 'flags' [-Wunused-variable]
           int *flags = (int *)param;
                ^
   fs/overlayfs/util.c:958:20: warning: unused variable 'oi' [-Wunused-variable]
           struct ovl_inode *oi = container_of(nb, struct ovl_inode, mark_dirty_nb);
                             ^
>> fs/overlayfs/util.c:955:5: warning: no previous prototype for function 'ovl_inode_dirty_notify' [-Wmissing-prototypes]
   int ovl_inode_dirty_notify(struct notifier_block *nb,
       ^
   fs/overlayfs/util.c:955:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int ovl_inode_dirty_notify(struct notifier_block *nb,
   ^
   static 
   3 warnings generated.
--
   fs/overlayfs/util.c:958:20: warning: unused variable 'oi' [-Wunused-variable]
           struct ovl_inode *oi = container_of(nb, struct ovl_inode, mark_dirty_nb);
                             ^
   fs/overlayfs/util.c:959:7: warning: unused variable 'flags' [-Wunused-variable]
           int *flags = (int *)param;
                ^
>> fs/overlayfs/util.c:955:5: warning: no previous prototype for function 'ovl_inode_dirty_notify' [-Wmissing-prototypes]
   int ovl_inode_dirty_notify(struct notifier_block *nb,
       ^
   fs/overlayfs/util.c:955:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int ovl_inode_dirty_notify(struct notifier_block *nb,
   ^
   static 
   3 warnings generated.

vim +/ovl_evict_inode +393 fs/overlayfs/super.c

   392	
 > 393	void ovl_evict_inode(struct inode *inode)
   394	{
   395		struct inode *upper;
   396	
   397		upper = ovl_inode_upper(inode);
   398		if (upper)
   399			ovl_unregister_mark_dirty_notify(OVL_I(inode));
   400		clear_inode(inode);
   401	}
   402	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 33109 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs
  2020-10-10 14:23 ` [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs Chengguang Xu
@ 2020-10-11  2:57   ` kernel test robot
  2020-10-11  3:10   ` kernel test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2020-10-11  2:57 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 9960 bytes --]

Hi Chengguang,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on miklos-vfs/overlayfs-next]
[also build test WARNING on linus/master v5.9-rc8 next-20201009]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Chengguang-Xu/implement-containerized-syncfs-for-overlayfs/20201011-071405
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next
config: arm-randconfig-r034-20201011 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project a8682554c6662ce01853d0069afb6c5b4ef8ab55)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm cross compiling tool for clang build
        # apt-get install binutils-arm-linux-gnueabi
        # https://github.com/0day-ci/linux/commit/12d938a37e0a31d43b15e07aef1cd821bf11dc0e
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Chengguang-Xu/implement-containerized-syncfs-for-overlayfs/20201011-071405
        git checkout 12d938a37e0a31d43b15e07aef1cd821bf11dc0e
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> fs/overlayfs/super.c:285:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
           if (upper_sb->s_op->sync_fs) {
               ^~~~~~~~~~~~~~~~~~~~~~~
   fs/overlayfs/super.c:293:9: note: uninitialized use occurs here
           return ret;
                  ^~~
   fs/overlayfs/super.c:285:2: note: remove the 'if' if its condition is always true
           if (upper_sb->s_op->sync_fs) {
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/overlayfs/super.c:265:9: note: initialize the variable 'ret' to silence this warning
           int ret;
                  ^
                   = 0
   fs/overlayfs/super.c:400:6: warning: no previous prototype for function 'ovl_evict_inode' [-Wmissing-prototypes]
   void ovl_evict_inode(struct inode *inode)
        ^
   fs/overlayfs/super.c:400:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void ovl_evict_inode(struct inode *inode)
   ^
   static 
>> fs/overlayfs/super.c:410:5: warning: no previous prototype for function 'ovl_write_inode' [-Wmissing-prototypes]
   int ovl_write_inode(struct inode *inode, struct writeback_control *wbc)
       ^
   fs/overlayfs/super.c:410:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int ovl_write_inode(struct inode *inode, struct writeback_control *wbc)
   ^
   static 
>> fs/overlayfs/super.c:432:5: warning: no previous prototype for function 'ovl_drop_inode' [-Wmissing-prototypes]
   int ovl_drop_inode(struct inode *inode)
       ^
   fs/overlayfs/super.c:432:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int ovl_drop_inode(struct inode *inode)
   ^
   static 
   4 warnings generated.

vim +285 fs/overlayfs/super.c

   259	
   260	/* Sync real dirty inodes in upper filesystem (if it exists) */
   261	static int ovl_sync_fs(struct super_block *sb, int wait)
   262	{
   263		struct ovl_fs *ofs = sb->s_fs_info;
   264		struct super_block *upper_sb;
   265		int ret;
   266	
   267		if (!ovl_upper_mnt(ofs))
   268			return 0;
   269	
   270		if (!ovl_should_sync(ofs))
   271			return 0;
   272		/*
   273		 * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
   274		 * All the super blocks will be iterated, including upper_sb.
   275		 *
   276		 * If this is a syncfs(2) call, then we do need to call
   277		 * sync_filesystem() on upper_sb, but enough if we do it when being
   278		 * called with wait == 1.
   279		 */
   280		if (!wait)
   281			return 0;
   282	
   283		upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
   284	
 > 285		if (upper_sb->s_op->sync_fs) {
   286			down_read(&upper_sb->s_umount);
   287			ret = upper_sb->s_op->sync_fs(upper_sb, wait);
   288			if (!ret)
   289				ret = sync_blockdev(upper_sb->s_bdev);
   290			up_read(&upper_sb->s_umount);
   291		}
   292	
   293		return ret;
   294	}
   295	
   296	/**
   297	 * ovl_statfs
   298	 * @sb: The overlayfs super block
   299	 * @buf: The struct kstatfs to fill in with stats
   300	 *
   301	 * Get the filesystem statistics.  As writes always target the upper layer
   302	 * filesystem pass the statfs to the upper filesystem (if it exists)
   303	 */
   304	static int ovl_statfs(struct dentry *dentry, struct kstatfs *buf)
   305	{
   306		struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
   307		struct dentry *root_dentry = dentry->d_sb->s_root;
   308		struct path path;
   309		int err;
   310	
   311		ovl_path_real(root_dentry, &path);
   312	
   313		err = vfs_statfs(&path, buf);
   314		if (!err) {
   315			buf->f_namelen = ofs->namelen;
   316			buf->f_type = OVERLAYFS_SUPER_MAGIC;
   317		}
   318	
   319		return err;
   320	}
   321	
   322	/* Will this overlay be forced to mount/remount ro? */
   323	static bool ovl_force_readonly(struct ovl_fs *ofs)
   324	{
   325		return (!ovl_upper_mnt(ofs) || !ofs->workdir);
   326	}
   327	
   328	static const char *ovl_redirect_mode_def(void)
   329	{
   330		return ovl_redirect_dir_def ? "on" : "off";
   331	}
   332	
   333	static const char * const ovl_xino_str[] = {
   334		"off",
   335		"auto",
   336		"on",
   337	};
   338	
   339	static inline int ovl_xino_def(void)
   340	{
   341		return ovl_xino_auto_def ? OVL_XINO_AUTO : OVL_XINO_OFF;
   342	}
   343	
   344	/**
   345	 * ovl_show_options
   346	 *
   347	 * Prints the mount options for a given superblock.
   348	 * Returns zero; does not fail.
   349	 */
   350	static int ovl_show_options(struct seq_file *m, struct dentry *dentry)
   351	{
   352		struct super_block *sb = dentry->d_sb;
   353		struct ovl_fs *ofs = sb->s_fs_info;
   354	
   355		seq_show_option(m, "lowerdir", ofs->config.lowerdir);
   356		if (ofs->config.upperdir) {
   357			seq_show_option(m, "upperdir", ofs->config.upperdir);
   358			seq_show_option(m, "workdir", ofs->config.workdir);
   359		}
   360		if (ofs->config.default_permissions)
   361			seq_puts(m, ",default_permissions");
   362		if (strcmp(ofs->config.redirect_mode, ovl_redirect_mode_def()) != 0)
   363			seq_printf(m, ",redirect_dir=%s", ofs->config.redirect_mode);
   364		if (ofs->config.index != ovl_index_def)
   365			seq_printf(m, ",index=%s", ofs->config.index ? "on" : "off");
   366		if (ofs->config.nfs_export != ovl_nfs_export_def)
   367			seq_printf(m, ",nfs_export=%s", ofs->config.nfs_export ?
   368							"on" : "off");
   369		if (ofs->config.xino != ovl_xino_def() && !ovl_same_fs(sb))
   370			seq_printf(m, ",xino=%s", ovl_xino_str[ofs->config.xino]);
   371		if (ofs->config.metacopy != ovl_metacopy_def)
   372			seq_printf(m, ",metacopy=%s",
   373				   ofs->config.metacopy ? "on" : "off");
   374		if (ofs->config.ovl_volatile)
   375			seq_puts(m, ",volatile");
   376		return 0;
   377	}
   378	
   379	static int ovl_remount(struct super_block *sb, int *flags, char *data)
   380	{
   381		struct ovl_fs *ofs = sb->s_fs_info;
   382		struct super_block *upper_sb;
   383		int ret = 0;
   384	
   385		if (!(*flags & SB_RDONLY) && ovl_force_readonly(ofs))
   386			return -EROFS;
   387	
   388		if (*flags & SB_RDONLY && !sb_rdonly(sb)) {
   389			upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
   390			if (ovl_should_sync(ofs)) {
   391				down_read(&upper_sb->s_umount);
   392				ret = sync_filesystem(upper_sb);
   393				up_read(&upper_sb->s_umount);
   394			}
   395		}
   396	
   397		return ret;
   398	}
   399	
   400	void ovl_evict_inode(struct inode *inode)
   401	{
   402		struct inode *upper;
   403	
   404		upper = ovl_inode_upper(inode);
   405		if (upper)
   406			ovl_unregister_mark_dirty_notify(OVL_I(inode));
   407		clear_inode(inode);
   408	}
   409	
 > 410	int ovl_write_inode(struct inode *inode, struct writeback_control *wbc)
   411	{
   412		struct ovl_fs *ofs = inode->i_sb->s_fs_info;
   413		struct super_block *upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
   414		struct inode *upper_inode = ovl_inode_upper(inode);
   415		struct writeback_control upper_wbc = {
   416			.nr_to_write		= LONG_MAX,
   417			.sync_mode              = WB_SYNC_ALL,
   418			.tagged_writepages	= wbc->tagged_writepages,
   419			.for_kupdate		= wbc->for_kupdate,
   420			.for_background		= wbc->for_background,
   421			.for_sync		= 0,
   422			.range_cyclic		= wbc->range_cyclic,
   423			.range_start		= wbc->range_start,
   424			.range_end		= wbc->range_end,
   425		};
   426	
   427		if (!upper_sb->s_op->write_inode || !upper_inode)
   428			return 0;
   429		return upper_sb->s_op->write_inode(upper_inode, &upper_wbc);
   430	}
   431	
 > 432	int ovl_drop_inode(struct inode *inode)
   433	{
   434		struct inode *upper_inode;
   435	
   436		upper_inode = ovl_inode_upper(inode);
   437		if (!upper_inode)
   438			return 1;
   439		if (!(upper_inode->i_state & I_DIRTY_ALL))
   440			return 1;
   441	
   442		return generic_drop_inode(inode);
   443	}
   444	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 33109 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs
  2020-10-10 14:23 ` [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs Chengguang Xu
  2020-10-11  2:57   ` kernel test robot
@ 2020-10-11  3:10   ` kernel test robot
  2020-10-11 13:08   ` kernel test robot
  2020-10-12 12:31     ` Dan Carpenter
  3 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2020-10-11  3:10 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3806 bytes --]

Hi Chengguang,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on miklos-vfs/overlayfs-next]
[also build test WARNING on linus/master v5.9-rc8 next-20201009]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Chengguang-Xu/implement-containerized-syncfs-for-overlayfs/20201011-071405
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next
config: xtensa-allyesconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/12d938a37e0a31d43b15e07aef1cd821bf11dc0e
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Chengguang-Xu/implement-containerized-syncfs-for-overlayfs/20201011-071405
        git checkout 12d938a37e0a31d43b15e07aef1cd821bf11dc0e
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=xtensa 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   fs/overlayfs/super.c:400:6: warning: no previous prototype for 'ovl_evict_inode' [-Wmissing-prototypes]
     400 | void ovl_evict_inode(struct inode *inode)
         |      ^~~~~~~~~~~~~~~
>> fs/overlayfs/super.c:410:5: warning: no previous prototype for 'ovl_write_inode' [-Wmissing-prototypes]
     410 | int ovl_write_inode(struct inode *inode, struct writeback_control *wbc)
         |     ^~~~~~~~~~~~~~~
>> fs/overlayfs/super.c:432:5: warning: no previous prototype for 'ovl_drop_inode' [-Wmissing-prototypes]
     432 | int ovl_drop_inode(struct inode *inode)
         |     ^~~~~~~~~~~~~~

vim +/ovl_write_inode +410 fs/overlayfs/super.c

   399	
 > 400	void ovl_evict_inode(struct inode *inode)
   401	{
   402		struct inode *upper;
   403	
   404		upper = ovl_inode_upper(inode);
   405		if (upper)
   406			ovl_unregister_mark_dirty_notify(OVL_I(inode));
   407		clear_inode(inode);
   408	}
   409	
 > 410	int ovl_write_inode(struct inode *inode, struct writeback_control *wbc)
   411	{
   412		struct ovl_fs *ofs = inode->i_sb->s_fs_info;
   413		struct super_block *upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
   414		struct inode *upper_inode = ovl_inode_upper(inode);
   415		struct writeback_control upper_wbc = {
   416			.nr_to_write		= LONG_MAX,
   417			.sync_mode              = WB_SYNC_ALL,
   418			.tagged_writepages	= wbc->tagged_writepages,
   419			.for_kupdate		= wbc->for_kupdate,
   420			.for_background		= wbc->for_background,
   421			.for_sync		= 0,
   422			.range_cyclic		= wbc->range_cyclic,
   423			.range_start		= wbc->range_start,
   424			.range_end		= wbc->range_end,
   425		};
   426	
   427		if (!upper_sb->s_op->write_inode || !upper_inode)
   428			return 0;
   429		return upper_sb->s_op->write_inode(upper_inode, &upper_wbc);
   430	}
   431	
 > 432	int ovl_drop_inode(struct inode *inode)
   433	{
   434		struct inode *upper_inode;
   435	
   436		upper_inode = ovl_inode_upper(inode);
   437		if (!upper_inode)
   438			return 1;
   439		if (!(upper_inode->i_state & I_DIRTY_ALL))
   440			return 1;
   441	
   442		return generic_drop_inode(inode);
   443	}
   444	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 65077 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs
  2020-10-10 14:23 ` [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs Chengguang Xu
  2020-10-11  2:57   ` kernel test robot
  2020-10-11  3:10   ` kernel test robot
@ 2020-10-11 13:08   ` kernel test robot
  2020-10-12 12:31     ` Dan Carpenter
  3 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2020-10-11 13:08 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 4001 bytes --]

Hi Chengguang,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on miklos-vfs/overlayfs-next]
[also build test WARNING on next-20201009]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Chengguang-Xu/implement-containerized-syncfs-for-overlayfs/20201011-071405
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


"cppcheck warnings: (new ones prefixed by >>)"
        ^
>> fs/overlayfs/super.c:293:9: warning: Uninitialized variable: ret [uninitvar]
    return ret;
           ^

vim +293 fs/overlayfs/super.c

a9075cdb467dd3b Miklos Szeredi        2017-11-10  259  
e8d4bfe3a715372 Chengguang Xu         2017-11-29  260  /* Sync real dirty inodes in upper filesystem (if it exists) */
e593b2bf513dd4d Amir Goldstein        2017-01-23  261  static int ovl_sync_fs(struct super_block *sb, int wait)
e593b2bf513dd4d Amir Goldstein        2017-01-23  262  {
ad204488d3046b3 Miklos Szeredi        2017-11-10  263  	struct ovl_fs *ofs = sb->s_fs_info;
e593b2bf513dd4d Amir Goldstein        2017-01-23  264  	struct super_block *upper_sb;
e593b2bf513dd4d Amir Goldstein        2017-01-23  265  	int ret;
e593b2bf513dd4d Amir Goldstein        2017-01-23  266  
08f4c7c86d4cf12 Miklos Szeredi        2020-06-04  267  	if (!ovl_upper_mnt(ofs))
e593b2bf513dd4d Amir Goldstein        2017-01-23  268  		return 0;
e8d4bfe3a715372 Chengguang Xu         2017-11-29  269  
c86243b090bc25f Vivek Goyal           2020-08-31  270  	if (!ovl_should_sync(ofs))
c86243b090bc25f Vivek Goyal           2020-08-31  271  		return 0;
e8d4bfe3a715372 Chengguang Xu         2017-11-29  272  	/*
32b1924b210a70d Konstantin Khlebnikov 2020-04-09  273  	 * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
32b1924b210a70d Konstantin Khlebnikov 2020-04-09  274  	 * All the super blocks will be iterated, including upper_sb.
e8d4bfe3a715372 Chengguang Xu         2017-11-29  275  	 *
e8d4bfe3a715372 Chengguang Xu         2017-11-29  276  	 * If this is a syncfs(2) call, then we do need to call
e8d4bfe3a715372 Chengguang Xu         2017-11-29  277  	 * sync_filesystem() on upper_sb, but enough if we do it when being
e8d4bfe3a715372 Chengguang Xu         2017-11-29  278  	 * called with wait == 1.
e8d4bfe3a715372 Chengguang Xu         2017-11-29  279  	 */
e8d4bfe3a715372 Chengguang Xu         2017-11-29  280  	if (!wait)
e593b2bf513dd4d Amir Goldstein        2017-01-23  281  		return 0;
e593b2bf513dd4d Amir Goldstein        2017-01-23  282  
08f4c7c86d4cf12 Miklos Szeredi        2020-06-04  283  	upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
e8d4bfe3a715372 Chengguang Xu         2017-11-29  284  
12d938a37e0a31d Chengguang Xu         2020-10-10  285  	if (upper_sb->s_op->sync_fs) {
e593b2bf513dd4d Amir Goldstein        2017-01-23  286  		down_read(&upper_sb->s_umount);
12d938a37e0a31d Chengguang Xu         2020-10-10  287  		ret = upper_sb->s_op->sync_fs(upper_sb, wait);
12d938a37e0a31d Chengguang Xu         2020-10-10  288  		if (!ret)
12d938a37e0a31d Chengguang Xu         2020-10-10  289  			ret = sync_blockdev(upper_sb->s_bdev);
e593b2bf513dd4d Amir Goldstein        2017-01-23  290  		up_read(&upper_sb->s_umount);
12d938a37e0a31d Chengguang Xu         2020-10-10  291  	}
e8d4bfe3a715372 Chengguang Xu         2017-11-29  292  
e593b2bf513dd4d Amir Goldstein        2017-01-23 @293  	return ret;
e593b2bf513dd4d Amir Goldstein        2017-01-23  294  }
e593b2bf513dd4d Amir Goldstein        2017-01-23  295  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs
  2020-10-10 14:23 ` [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs Chengguang Xu
@ 2020-10-12 12:31     ` Dan Carpenter
  2020-10-11  3:10   ` kernel test robot
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 30+ messages in thread
From: Dan Carpenter @ 2020-10-12 12:31 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 3506 bytes --]

Hi Chengguang,

url:    https://github.com/0day-ci/linux/commits/Chengguang-Xu/implement-containerized-syncfs-for-overlayfs/20201011-071405
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next
config: i386-randconfig-m021-20201011 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
fs/overlayfs/super.c:293 ovl_sync_fs() error: uninitialized symbol 'ret'.

vim +/ret +293 fs/overlayfs/super.c

e593b2bf513dd4d Amir Goldstein        2017-01-23  261  static int ovl_sync_fs(struct super_block *sb, int wait)
e593b2bf513dd4d Amir Goldstein        2017-01-23  262  {
ad204488d3046b3 Miklos Szeredi        2017-11-10  263  	struct ovl_fs *ofs = sb->s_fs_info;
e593b2bf513dd4d Amir Goldstein        2017-01-23  264  	struct super_block *upper_sb;
e593b2bf513dd4d Amir Goldstein        2017-01-23  265  	int ret;
e593b2bf513dd4d Amir Goldstein        2017-01-23  266  
08f4c7c86d4cf12 Miklos Szeredi        2020-06-04  267  	if (!ovl_upper_mnt(ofs))
e593b2bf513dd4d Amir Goldstein        2017-01-23  268  		return 0;
e8d4bfe3a715372 Chengguang Xu         2017-11-29  269  
c86243b090bc25f Vivek Goyal           2020-08-31  270  	if (!ovl_should_sync(ofs))
c86243b090bc25f Vivek Goyal           2020-08-31  271  		return 0;
e8d4bfe3a715372 Chengguang Xu         2017-11-29  272  	/*
32b1924b210a70d Konstantin Khlebnikov 2020-04-09  273  	 * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
32b1924b210a70d Konstantin Khlebnikov 2020-04-09  274  	 * All the super blocks will be iterated, including upper_sb.
e8d4bfe3a715372 Chengguang Xu         2017-11-29  275  	 *
e8d4bfe3a715372 Chengguang Xu         2017-11-29  276  	 * If this is a syncfs(2) call, then we do need to call
e8d4bfe3a715372 Chengguang Xu         2017-11-29  277  	 * sync_filesystem() on upper_sb, but enough if we do it when being
e8d4bfe3a715372 Chengguang Xu         2017-11-29  278  	 * called with wait == 1.
e8d4bfe3a715372 Chengguang Xu         2017-11-29  279  	 */
e8d4bfe3a715372 Chengguang Xu         2017-11-29  280  	if (!wait)
e593b2bf513dd4d Amir Goldstein        2017-01-23  281  		return 0;
e593b2bf513dd4d Amir Goldstein        2017-01-23  282  
08f4c7c86d4cf12 Miklos Szeredi        2020-06-04  283  	upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
e8d4bfe3a715372 Chengguang Xu         2017-11-29  284  
12d938a37e0a31d Chengguang Xu         2020-10-10  285  	if (upper_sb->s_op->sync_fs) {
e593b2bf513dd4d Amir Goldstein        2017-01-23  286  		down_read(&upper_sb->s_umount);
12d938a37e0a31d Chengguang Xu         2020-10-10  287  		ret = upper_sb->s_op->sync_fs(upper_sb, wait);
12d938a37e0a31d Chengguang Xu         2020-10-10  288  		if (!ret)
12d938a37e0a31d Chengguang Xu         2020-10-10  289  			ret = sync_blockdev(upper_sb->s_bdev);
e593b2bf513dd4d Amir Goldstein        2017-01-23  290  		up_read(&upper_sb->s_umount);
12d938a37e0a31d Chengguang Xu         2020-10-10  291  	}

"ret" not initialized on else path.

e8d4bfe3a715372 Chengguang Xu         2017-11-29  292  
e593b2bf513dd4d Amir Goldstein        2017-01-23 @293  	return ret;
e593b2bf513dd4d Amir Goldstein        2017-01-23  294  }

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 33739 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs
@ 2020-10-12 12:31     ` Dan Carpenter
  0 siblings, 0 replies; 30+ messages in thread
From: Dan Carpenter @ 2020-10-12 12:31 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3506 bytes --]

Hi Chengguang,

url:    https://github.com/0day-ci/linux/commits/Chengguang-Xu/implement-containerized-syncfs-for-overlayfs/20201011-071405
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next
config: i386-randconfig-m021-20201011 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
fs/overlayfs/super.c:293 ovl_sync_fs() error: uninitialized symbol 'ret'.

vim +/ret +293 fs/overlayfs/super.c

e593b2bf513dd4d Amir Goldstein        2017-01-23  261  static int ovl_sync_fs(struct super_block *sb, int wait)
e593b2bf513dd4d Amir Goldstein        2017-01-23  262  {
ad204488d3046b3 Miklos Szeredi        2017-11-10  263  	struct ovl_fs *ofs = sb->s_fs_info;
e593b2bf513dd4d Amir Goldstein        2017-01-23  264  	struct super_block *upper_sb;
e593b2bf513dd4d Amir Goldstein        2017-01-23  265  	int ret;
e593b2bf513dd4d Amir Goldstein        2017-01-23  266  
08f4c7c86d4cf12 Miklos Szeredi        2020-06-04  267  	if (!ovl_upper_mnt(ofs))
e593b2bf513dd4d Amir Goldstein        2017-01-23  268  		return 0;
e8d4bfe3a715372 Chengguang Xu         2017-11-29  269  
c86243b090bc25f Vivek Goyal           2020-08-31  270  	if (!ovl_should_sync(ofs))
c86243b090bc25f Vivek Goyal           2020-08-31  271  		return 0;
e8d4bfe3a715372 Chengguang Xu         2017-11-29  272  	/*
32b1924b210a70d Konstantin Khlebnikov 2020-04-09  273  	 * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
32b1924b210a70d Konstantin Khlebnikov 2020-04-09  274  	 * All the super blocks will be iterated, including upper_sb.
e8d4bfe3a715372 Chengguang Xu         2017-11-29  275  	 *
e8d4bfe3a715372 Chengguang Xu         2017-11-29  276  	 * If this is a syncfs(2) call, then we do need to call
e8d4bfe3a715372 Chengguang Xu         2017-11-29  277  	 * sync_filesystem() on upper_sb, but enough if we do it when being
e8d4bfe3a715372 Chengguang Xu         2017-11-29  278  	 * called with wait == 1.
e8d4bfe3a715372 Chengguang Xu         2017-11-29  279  	 */
e8d4bfe3a715372 Chengguang Xu         2017-11-29  280  	if (!wait)
e593b2bf513dd4d Amir Goldstein        2017-01-23  281  		return 0;
e593b2bf513dd4d Amir Goldstein        2017-01-23  282  
08f4c7c86d4cf12 Miklos Szeredi        2020-06-04  283  	upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
e8d4bfe3a715372 Chengguang Xu         2017-11-29  284  
12d938a37e0a31d Chengguang Xu         2020-10-10  285  	if (upper_sb->s_op->sync_fs) {
e593b2bf513dd4d Amir Goldstein        2017-01-23  286  		down_read(&upper_sb->s_umount);
12d938a37e0a31d Chengguang Xu         2020-10-10  287  		ret = upper_sb->s_op->sync_fs(upper_sb, wait);
12d938a37e0a31d Chengguang Xu         2020-10-10  288  		if (!ret)
12d938a37e0a31d Chengguang Xu         2020-10-10  289  			ret = sync_blockdev(upper_sb->s_bdev);
e593b2bf513dd4d Amir Goldstein        2017-01-23  290  		up_read(&upper_sb->s_umount);
12d938a37e0a31d Chengguang Xu         2020-10-10  291  	}

"ret" not initialized on else path.

e8d4bfe3a715372 Chengguang Xu         2017-11-29  292  
e593b2bf513dd4d Amir Goldstein        2017-01-23 @293  	return ret;
e593b2bf513dd4d Amir Goldstein        2017-01-23  294  }

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 33739 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-10 14:23 ` [RFC PATCH 1/5] fs: introduce notifier list for vfs inode Chengguang Xu
@ 2020-10-14 16:15   ` Jan Kara
  2020-10-15  3:03     ` Chengguang Xu
  2020-10-15  3:25   ` Al Viro
  1 sibling, 1 reply; 30+ messages in thread
From: Jan Kara @ 2020-10-14 16:15 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: miklos, amir73il, jack, linux-unionfs, linux-fsdevel

On Sat 10-10-20 22:23:51, Chengguang Xu wrote:
> Currently there is no notification api for kernel about modification
> of vfs inode, in some use cases like overlayfs, this kind of notification
> will be very helpful to implement containerized syncfs functionality.
> As the first attempt, we introduce marking inode dirty notification so that
> overlay's inode could mark itself dirty as well and then only sync dirty
> overlay inode while syncfs.
> 
> Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>

So I like how the patch set is elegant however growing struct inode for
everybody by struct blocking_notifier_head (which is rwsem + pointer) is
rather harsh just for this overlayfs functionality... Ideally this should
induce no overhead on struct inode if the filesystem is not covered by
overlayfs. So I'd rather place some external structure into the superblock
that would get allocated on the first use that would track dirty notifications
for each inode. I would not generalize the code for more possible
notifications at this point.

Also now that I'm thinking about it can there be multiple overlayfs inodes
for one upper inode? If not, then the mechanism of dirtiness propagation
could be much simpler - it seems we could be able to just lookup
corresponding overlayfs inode based on upper inode and then mark it dirty
(but this would need to be verified by people more familiar with
overlayfs). So all we'd need to know for this is the superblock of the
overlayfs that's covering given upper filesystem...

								Honza
> ---
>  fs/fs-writeback.c  | 4 ++++
>  fs/inode.c         | 5 +++++
>  include/linux/fs.h | 6 ++++++
>  3 files changed, 15 insertions(+)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 1492271..657cceb 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -2246,9 +2246,13 @@ void __mark_inode_dirty(struct inode *inode, int flags)
>  {
>  	struct super_block *sb = inode->i_sb;
>  	int dirtytime;
> +	int copy_flags = flags;
>  
>  	trace_writeback_mark_inode_dirty(inode, flags);
>  
> +	blocking_notifier_call_chain(
> +		&inode->notifier_lists[MARK_INODE_DIRTY_NOTIFIER],
> +		0, &copy_flags);
>  	/*
>  	 * Don't do this for I_DIRTY_PAGES - that doesn't actually
>  	 * dirty the inode itself
> diff --git a/fs/inode.c b/fs/inode.c
> index 72c4c34..84e82db 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -388,12 +388,17 @@ void address_space_init_once(struct address_space *mapping)
>   */
>  void inode_init_once(struct inode *inode)
>  {
> +	int i;
> +
>  	memset(inode, 0, sizeof(*inode));
>  	INIT_HLIST_NODE(&inode->i_hash);
>  	INIT_LIST_HEAD(&inode->i_devices);
>  	INIT_LIST_HEAD(&inode->i_io_list);
>  	INIT_LIST_HEAD(&inode->i_wb_list);
>  	INIT_LIST_HEAD(&inode->i_lru);
> +	for (i = 0; i < INODE_MAX_NOTIFIER; i++)
> +		BLOCKING_INIT_NOTIFIER_HEAD(
> +			&inode->notifier_lists[MARK_INODE_DIRTY_NOTIFIER]);
>  	__address_space_init_once(&inode->i_data);
>  	i_size_ordered_init(inode);
>  }
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 7519ae0..42f0750 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -607,6 +607,11 @@ static inline void mapping_allow_writable(struct address_space *mapping)
>  
>  struct fsnotify_mark_connector;
>  
> +enum {
> +	MARK_INODE_DIRTY_NOTIFIER,
> +	INODE_MAX_NOTIFIER,
> +};
> +
>  /*
>   * Keep mostly read-only and often accessed (especially for
>   * the RCU path lookup and 'stat' data) fields at the beginning
> @@ -723,6 +728,7 @@ struct inode {
>  #endif
>  
>  	void			*i_private; /* fs or device private pointer */
> +	struct	blocking_notifier_head notifier_lists[INODE_MAX_NOTIFIER];
>  } __randomize_layout;
>  
>  struct timespec64 timestamp_truncate(struct timespec64 t, struct inode *inode);
> -- 
> 1.8.3.1
> 
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-14 16:15   ` Jan Kara
@ 2020-10-15  3:03     ` Chengguang Xu
  2020-10-15  6:11       ` Amir Goldstein
  0 siblings, 1 reply; 30+ messages in thread
From: Chengguang Xu @ 2020-10-15  3:03 UTC (permalink / raw)
  To: Jan Kara; +Cc: miklos, amir73il, linux-unionfs, linux-fsdevel, cgxu519

 ---- 在 星期四, 2020-10-15 00:15:38 Jan Kara <jack@suse.cz> 撰写 ----
 > On Sat 10-10-20 22:23:51, Chengguang Xu wrote:
 > > Currently there is no notification api for kernel about modification
 > > of vfs inode, in some use cases like overlayfs, this kind of notification
 > > will be very helpful to implement containerized syncfs functionality.
 > > As the first attempt, we introduce marking inode dirty notification so that
 > > overlay's inode could mark itself dirty as well and then only sync dirty
 > > overlay inode while syncfs.
 > > 
 > > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
 > 
 > So I like how the patch set is elegant however growing struct inode for
 > everybody by struct blocking_notifier_head (which is rwsem + pointer) is
 > rather harsh just for this overlayfs functionality... Ideally this should
 > induce no overhead on struct inode if the filesystem is not covered by
 > overlayfs. So I'd rather place some external structure into the superblock
 > that would get allocated on the first use that would track dirty notifications
 > for each inode. I would not generalize the code for more possible
 > notifications at this point.
 > 
 > Also now that I'm thinking about it can there be multiple overlayfs inodes
 > for one upper inode? If not, then the mechanism of dirtiness propagation

One upper inode only belongs to one overlayfs inode. However, in practice
one upper fs may contains hundreds or even thousands of overlayfs instances.

 > could be much simpler - it seems we could be able to just lookup
 > corresponding overlayfs inode based on upper inode and then mark it dirty
 > (but this would need to be verified by people more familiar with
 > overlayfs). So all we'd need to know for this is the superblock of the
 > overlayfs that's covering given upper filesystem...
 > 

So the difficulty is how we get overlayfs inode efficiently from upper inode,
it seems if we don't have additional info of upper inode to indicate which
overlayfs it belongs to,  then the lookup of corresponding overlayfs inode
will be quite expensive and probably impact write performance. 

Is that possible we extend inotify infrastructure slightly to notify both
user space and kernel component?


Thanks,
Chengguang

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-10 14:23 ` [RFC PATCH 1/5] fs: introduce notifier list for vfs inode Chengguang Xu
  2020-10-14 16:15   ` Jan Kara
@ 2020-10-15  3:25   ` Al Viro
  2020-10-15  3:42     ` Chengguang Xu
  1 sibling, 1 reply; 30+ messages in thread
From: Al Viro @ 2020-10-15  3:25 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: miklos, amir73il, jack, linux-unionfs, linux-fsdevel

On Sat, Oct 10, 2020 at 10:23:51PM +0800, Chengguang Xu wrote:
> Currently there is no notification api for kernel about modification
> of vfs inode, in some use cases like overlayfs, this kind of notification
> will be very helpful to implement containerized syncfs functionality.
> As the first attempt, we introduce marking inode dirty notification so that
> overlay's inode could mark itself dirty as well and then only sync dirty
> overlay inode while syncfs.

Who's responsible for removing the crap from notifier chain?  And how does
that affect the lifetime of inode?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-15  3:25   ` Al Viro
@ 2020-10-15  3:42     ` Chengguang Xu
  2020-10-15  4:57       ` Al Viro
  0 siblings, 1 reply; 30+ messages in thread
From: Chengguang Xu @ 2020-10-15  3:42 UTC (permalink / raw)
  To: Al Viro; +Cc: miklos, amir73il, jack, linux-unionfs, linux-fsdevel, cgxu519

 ---- 在 星期四, 2020-10-15 11:25:01 Al Viro <viro@zeniv.linux.org.uk> 撰写 ----
 > On Sat, Oct 10, 2020 at 10:23:51PM +0800, Chengguang Xu wrote:
 > > Currently there is no notification api for kernel about modification
 > > of vfs inode, in some use cases like overlayfs, this kind of notification
 > > will be very helpful to implement containerized syncfs functionality.
 > > As the first attempt, we introduce marking inode dirty notification so that
 > > overlay's inode could mark itself dirty as well and then only sync dirty
 > > overlay inode while syncfs.
 > 
 > Who's responsible for removing the crap from notifier chain?  And how does
 > that affect the lifetime of inode?
 
In this case, overlayfs unregisters call back from the notifier chain of upper inode
when evicting it's own  inode. It will not affect the lifetime of upper inode because
overlayfs inode holds a reference of upper inode that means upper inode will not be
evicted while overlayfs inode is still alive.


Thanks,
Chengguang

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-15  3:42     ` Chengguang Xu
@ 2020-10-15  4:57       ` Al Viro
  2020-10-15 10:56         ` Chengguang Xu
  2020-10-16  7:09         ` Chengguang Xu
  0 siblings, 2 replies; 30+ messages in thread
From: Al Viro @ 2020-10-15  4:57 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: miklos, amir73il, jack, linux-unionfs, linux-fsdevel

On Thu, Oct 15, 2020 at 11:42:51AM +0800, Chengguang Xu wrote:
>  ---- 在 星期四, 2020-10-15 11:25:01 Al Viro <viro@zeniv.linux.org.uk> 撰写 ----
>  > On Sat, Oct 10, 2020 at 10:23:51PM +0800, Chengguang Xu wrote:
>  > > Currently there is no notification api for kernel about modification
>  > > of vfs inode, in some use cases like overlayfs, this kind of notification
>  > > will be very helpful to implement containerized syncfs functionality.
>  > > As the first attempt, we introduce marking inode dirty notification so that
>  > > overlay's inode could mark itself dirty as well and then only sync dirty
>  > > overlay inode while syncfs.
>  > 
>  > Who's responsible for removing the crap from notifier chain?  And how does
>  > that affect the lifetime of inode?
>  
> In this case, overlayfs unregisters call back from the notifier chain of upper inode
> when evicting it's own  inode. It will not affect the lifetime of upper inode because
> overlayfs inode holds a reference of upper inode that means upper inode will not be
> evicted while overlayfs inode is still alive.

Let me see if I've got it right:
	* your chain contains 1 (for upper inodes) or 0 (everything else, i.e. the
vast majority of inodes) recepients
	* recepient pins the inode for as long as the recepient exists

That looks like a massive overkill, especially since all you are propagating is
dirtying the suckers.  All you really need is one bit in your inode + hash table
indexed by the address of struct inode (well, middle bits thereof, as usual).
With entries embedded into overlayfs-private part of overlayfs inode.  And callback
to be called stored in that entry...

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-15  3:03     ` Chengguang Xu
@ 2020-10-15  6:11       ` Amir Goldstein
  2020-10-15 11:29         ` Chengguang Xu
  0 siblings, 1 reply; 30+ messages in thread
From: Amir Goldstein @ 2020-10-15  6:11 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro

On Thu, Oct 15, 2020 at 6:03 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
>  ---- 在 星期四, 2020-10-15 00:15:38 Jan Kara <jack@suse.cz> 撰写 ----
>  > On Sat 10-10-20 22:23:51, Chengguang Xu wrote:
>  > > Currently there is no notification api for kernel about modification
>  > > of vfs inode, in some use cases like overlayfs, this kind of notification
>  > > will be very helpful to implement containerized syncfs functionality.
>  > > As the first attempt, we introduce marking inode dirty notification so that
>  > > overlay's inode could mark itself dirty as well and then only sync dirty
>  > > overlay inode while syncfs.
>  > >
>  > > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
>  >
>  > So I like how the patch set is elegant however growing struct inode for
>  > everybody by struct blocking_notifier_head (which is rwsem + pointer) is
>  > rather harsh just for this overlayfs functionality... Ideally this should
>  > induce no overhead on struct inode if the filesystem is not covered by
>  > overlayfs. So I'd rather place some external structure into the superblock
>  > that would get allocated on the first use that would track dirty notifications
>  > for each inode. I would not generalize the code for more possible
>  > notifications at this point.
>  >
>  > Also now that I'm thinking about it can there be multiple overlayfs inodes
>  > for one upper inode? If not, then the mechanism of dirtiness propagation
>
> One upper inode only belongs to one overlayfs inode. However, in practice
> one upper fs may contains hundreds or even thousands of overlayfs instances.
>
>  > could be much simpler - it seems we could be able to just lookup
>  > corresponding overlayfs inode based on upper inode and then mark it dirty
>  > (but this would need to be verified by people more familiar with
>  > overlayfs). So all we'd need to know for this is the superblock of the
>  > overlayfs that's covering given upper filesystem...
>  >
>
> So the difficulty is how we get overlayfs inode efficiently from upper inode,
> it seems if we don't have additional info of upper inode to indicate which
> overlayfs it belongs to,  then the lookup of corresponding overlayfs inode
> will be quite expensive and probably impact write performance.
>
> Is that possible we extend inotify infrastructure slightly to notify both
> user space and kernel component?
>
>

When I first saw your suggestion, that is what I was thinking.
Add event fsnotify_dirty_inode(), since the pub-sub infrastructure
in struct inode already exists.

But I have to admit this approach seems like a massive overkill to
what you need.

What you implemented, tracks upper inodes that could have
been dirtied under overlayfs, but what you really want is to
track is upper inodes that were dirtied *by* overlayfs.

And for that purpose, as Miklos said several times, it would be best
pursue the overlayfs aops approach, even though it may be much
harder..

Your earlier patches maintained a list of overlayfs to be synced inodes.
Remind me what was wrong with that approach?

Perhaps you can combine that with the shadow overlay sbi approach.
Instead of dirtying overlay inode when underlying is dirtied, you can
"pre-dirty" overlayfs inode in higher level file ops and add them to the
"maybe-dirty" list (e.g. after write).

ovl_sync_fs() can iterate the maybe-dirty list and re-dirty overlay inodes
if the underlying inode is still dirty on the (!wait) pass.

As for memory mapped inodes via overlayfs (which can be dirtied without
notifying overlayfs) I am not sure that is a big problem in practice.

When an inode is writably mapped via ovarlayfs, you can flag the
overlay inode with "maybe-writably-mapped" and then remove
it from the maybe dirty list when the underlying inode is not dirty
AND its i_writecount is 0 (checked on write_inode() and release()).

Actually, there is no reason to treat writably mapped inodes and
other dirty inodes differently - insert to suspect list on open for
write, remove from suspect list on last release() or write_inode()
when inode is no longer dirty and writable.

Did I miss anything?

Amir.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-15  4:57       ` Al Viro
@ 2020-10-15 10:56         ` Chengguang Xu
  2020-10-16  7:09         ` Chengguang Xu
  1 sibling, 0 replies; 30+ messages in thread
From: Chengguang Xu @ 2020-10-15 10:56 UTC (permalink / raw)
  To: Al Viro; +Cc: miklos, amir73il, jack, linux-unionfs, linux-fsdevel

 ---- 在 星期四, 2020-10-15 12:57:41 Al Viro <viro@zeniv.linux.org.uk> 撰写 ----
 > On Thu, Oct 15, 2020 at 11:42:51AM +0800, Chengguang Xu wrote:
 > >  ---- 在 星期四, 2020-10-15 11:25:01 Al Viro <viro@zeniv.linux.org.uk> 撰写 ----
 > >  > On Sat, Oct 10, 2020 at 10:23:51PM +0800, Chengguang Xu wrote:
 > >  > > Currently there is no notification api for kernel about modification
 > >  > > of vfs inode, in some use cases like overlayfs, this kind of notification
 > >  > > will be very helpful to implement containerized syncfs functionality.
 > >  > > As the first attempt, we introduce marking inode dirty notification so that
 > >  > > overlay's inode could mark itself dirty as well and then only sync dirty
 > >  > > overlay inode while syncfs.
 > >  > 
 > >  > Who's responsible for removing the crap from notifier chain?  And how does
 > >  > that affect the lifetime of inode?
 > >  
 > > In this case, overlayfs unregisters call back from the notifier chain of upper inode
 > > when evicting it's own  inode. It will not affect the lifetime of upper inode because
 > > overlayfs inode holds a reference of upper inode that means upper inode will not be
 > > evicted while overlayfs inode is still alive.
 > 
 > Let me see if I've got it right:
 >     * your chain contains 1 (for upper inodes) or 0 (everything else, i.e. the
 > vast majority of inodes) recepients
 >     * recepient pins the inode for as long as the recepient exists
 > 
 > That looks like a massive overkill, especially since all you are propagating is
 > dirtying the suckers.  All you really need is one bit in your inode + hash table
 > indexed by the address of struct inode (well, middle bits thereof, as usual).
 > With entries embedded into overlayfs-private part of overlayfs inode.  And callback
 > to be called stored in that entry...
 > 


Yeah, actually  that could implement the logic but I'm afraid of performance degradation
caused by lock contention of hash table in concurrent file write because in practice we
build up hundreds of overlayfs instances on same underlying filesystem.

Thanks,
Chengguang


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-15  6:11       ` Amir Goldstein
@ 2020-10-15 11:29         ` Chengguang Xu
  2020-10-15 12:32           ` Amir Goldstein
  0 siblings, 1 reply; 30+ messages in thread
From: Chengguang Xu @ 2020-10-15 11:29 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro, cgxu519

 ---- 在 星期四, 2020-10-15 14:11:04 Amir Goldstein <amir73il@gmail.com> 撰写 ----
 > On Thu, Oct 15, 2020 at 6:03 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >
 > >  ---- 在 星期四, 2020-10-15 00:15:38 Jan Kara <jack@suse.cz> 撰写 ----
 > >  > On Sat 10-10-20 22:23:51, Chengguang Xu wrote:
 > >  > > Currently there is no notification api for kernel about modification
 > >  > > of vfs inode, in some use cases like overlayfs, this kind of notification
 > >  > > will be very helpful to implement containerized syncfs functionality.
 > >  > > As the first attempt, we introduce marking inode dirty notification so that
 > >  > > overlay's inode could mark itself dirty as well and then only sync dirty
 > >  > > overlay inode while syncfs.
 > >  > >
 > >  > > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
 > >  >
 > >  > So I like how the patch set is elegant however growing struct inode for
 > >  > everybody by struct blocking_notifier_head (which is rwsem + pointer) is
 > >  > rather harsh just for this overlayfs functionality... Ideally this should
 > >  > induce no overhead on struct inode if the filesystem is not covered by
 > >  > overlayfs. So I'd rather place some external structure into the superblock
 > >  > that would get allocated on the first use that would track dirty notifications
 > >  > for each inode. I would not generalize the code for more possible
 > >  > notifications at this point.
 > >  >
 > >  > Also now that I'm thinking about it can there be multiple overlayfs inodes
 > >  > for one upper inode? If not, then the mechanism of dirtiness propagation
 > >
 > > One upper inode only belongs to one overlayfs inode. However, in practice
 > > one upper fs may contains hundreds or even thousands of overlayfs instances.
 > >
 > >  > could be much simpler - it seems we could be able to just lookup
 > >  > corresponding overlayfs inode based on upper inode and then mark it dirty
 > >  > (but this would need to be verified by people more familiar with
 > >  > overlayfs). So all we'd need to know for this is the superblock of the
 > >  > overlayfs that's covering given upper filesystem...
 > >  >
 > >
 > > So the difficulty is how we get overlayfs inode efficiently from upper inode,
 > > it seems if we don't have additional info of upper inode to indicate which
 > > overlayfs it belongs to,  then the lookup of corresponding overlayfs inode
 > > will be quite expensive and probably impact write performance.
 > >
 > > Is that possible we extend inotify infrastructure slightly to notify both
 > > user space and kernel component?
 > >
 > >
 > 
 > When I first saw your suggestion, that is what I was thinking.
 > Add event fsnotify_dirty_inode(), since the pub-sub infrastructure
 > in struct inode already exists.
 > 
 > But I have to admit this approach seems like a massive overkill to
 > what you need.
 > 
 > What you implemented, tracks upper inodes that could have
 > been dirtied under overlayfs, but what you really want is to
 > track is upper inodes that were dirtied *by* overlayfs.
 > 
 > And for that purpose, as Miklos said several times, it would be best
 > pursue the overlayfs aops approach, even though it may be much
 > harder..
 > 

IIUC, that solution was raised to solve mmap rw-ro issue.
That maybe is an ultimate goal and I'm wondering whether we must
implement that if we have easier approach to solve mmap and syncfs
issues.

 > Your earlier patches maintained a list of overlayfs to be synced inodes.
 > Remind me what was wrong with that approach?
 > 

I think the main concerns are the complexity and the timing of releasing
ovl_sync_entry  struct.


 > Perhaps you can combine that with the shadow overlay sbi approach.
 > Instead of dirtying overlay inode when underlying is dirtied, you can
 > "pre-dirty" overlayfs inode in higher level file ops and add them to the
 > "maybe-dirty" list (e.g. after write).
 > 

Main problem is we can't be notified by set_page_dirty from writable mmap.
Meanwhile, if we dirty overlay inode then writeback will pick up dirty overlay
inode and clear it after syncing, then overlay inode could be release at any time,
so in the end, maybe overlay inode is released but upper inode is still dirty and
there is no any pointer to find upper dirty inode out.


 > ovl_sync_fs() can iterate the maybe-dirty list and re-dirty overlay inodes
 > if the underlying inode is still dirty on the (!wait) pass.
 > 
 > As for memory mapped inodes via overlayfs (which can be dirtied without
 > notifying overlayfs) I am not sure that is a big problem in practice.
 > 

Yes, it's key problem here.

 > When an inode is writably mapped via ovarlayfs, you can flag the
 > overlay inode with "maybe-writably-mapped" and then remove
 > it from the maybe dirty list when the underlying inode is not dirty
 > AND its i_writecount is 0 (checked on write_inode() and release()).
 > 
 > Actually, there is no reason to treat writably mapped inodes and
 > other dirty inodes differently - insert to suspect list on open for
 > write, remove from suspect list on last release() or write_inode()
 > when inode is no longer dirty and writable.
 > 
 > Did I miss anything?
 > 

If we dirty overlay inode that means we have launched writeback mechanism,
so in this case, re-dirty overlay inode in time becomes important. 



Thanks,
Chengguang

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-15 11:29         ` Chengguang Xu
@ 2020-10-15 12:32           ` Amir Goldstein
  2020-10-15 13:13             ` Chengguang Xu
  0 siblings, 1 reply; 30+ messages in thread
From: Amir Goldstein @ 2020-10-15 12:32 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro

>  > Perhaps you can combine that with the shadow overlay sbi approach.
>  > Instead of dirtying overlay inode when underlying is dirtied, you can
>  > "pre-dirty" overlayfs inode in higher level file ops and add them to the
>  > "maybe-dirty" list (e.g. after write).
>  >
>
> Main problem is we can't be notified by set_page_dirty from writable mmap.
> Meanwhile, if we dirty overlay inode then writeback will pick up dirty overlay
> inode and clear it after syncing, then overlay inode could be release at any time,
> so in the end, maybe overlay inode is released but upper inode is still dirty and
> there is no any pointer to find upper dirty inode out.
>

But we can control whether overlay inode is release with ovl_drop_inode()
right? Can we prevent dropping overlay inode if upper inode is
inode_is_open_for_write()?

About re-dirty, see below...

>
>  > ovl_sync_fs() can iterate the maybe-dirty list and re-dirty overlay inodes
>  > if the underlying inode is still dirty on the (!wait) pass.
>  >
>  > As for memory mapped inodes via overlayfs (which can be dirtied without
>  > notifying overlayfs) I am not sure that is a big problem in practice.
>  >
>
> Yes, it's key problem here.
>
>  > When an inode is writably mapped via ovarlayfs, you can flag the
>  > overlay inode with "maybe-writably-mapped" and then remove
>  > it from the maybe dirty list when the underlying inode is not dirty
>  > AND its i_writecount is 0 (checked on write_inode() and release()).
>  >
>  > Actually, there is no reason to treat writably mapped inodes and
>  > other dirty inodes differently - insert to suspect list on open for
>  > write, remove from suspect list on last release() or write_inode()
>  > when inode is no longer dirty and writable.
>  >
>  > Did I miss anything?
>  >
>
> If we dirty overlay inode that means we have launched writeback mechanism,
> so in this case, re-dirty overlay inode in time becomes important.
>

My idea was to use the first call to ovl_sync_fs() with 'wait' false
to iterate the
maybe-dirty list and re-dirty overlay inodes whose upper is dirty.

Then in the second call to __sync_filesystem, sync_inodes_sb() will take
care of calling ovl_write_inode() for all the re-dirty inodes.

In current code we sync ALL dirty upper fs inodes and we do not overlay
inodes with no reference in cache.

The best code would sync only upper fs inodes dirtied by this overlay
and will be able to evict overlay inodes whose upper inodes are clean.

The compromise code would sync only upper fs inodes dirtied by this overlay,
and would not evict overlay inodes as long as upper inodes are "open for write".
I think its a fine compromise considering the alternatives.

Is this workable?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-15 12:32           ` Amir Goldstein
@ 2020-10-15 13:13             ` Chengguang Xu
  2020-10-15 16:02               ` Amir Goldstein
  0 siblings, 1 reply; 30+ messages in thread
From: Chengguang Xu @ 2020-10-15 13:13 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro, cgxu519

 ---- 在 星期四, 2020-10-15 20:32:24 Amir Goldstein <amir73il@gmail.com> 撰写 ----
 > >  > Perhaps you can combine that with the shadow overlay sbi approach.
 > >  > Instead of dirtying overlay inode when underlying is dirtied, you can
 > >  > "pre-dirty" overlayfs inode in higher level file ops and add them to the
 > >  > "maybe-dirty" list (e.g. after write).
 > >  >
 > >
 > > Main problem is we can't be notified by set_page_dirty from writable mmap.
 > > Meanwhile, if we dirty overlay inode then writeback will pick up dirty overlay
 > > inode and clear it after syncing, then overlay inode could be release at any time,
 > > so in the end, maybe overlay inode is released but upper inode is still dirty and
 > > there is no any pointer to find upper dirty inode out.
 > >
 > 
 > But we can control whether overlay inode is release with ovl_drop_inode()
 > right? Can we prevent dropping overlay inode if upper inode is
 > inode_is_open_for_write()?
 > 
 > About re-dirty, see below...
 > 
 > >
 > >  > ovl_sync_fs() can iterate the maybe-dirty list and re-dirty overlay inodes
 > >  > if the underlying inode is still dirty on the (!wait) pass.
 > >  >
 > >  > As for memory mapped inodes via overlayfs (which can be dirtied without
 > >  > notifying overlayfs) I am not sure that is a big problem in practice.
 > >  >
 > >
 > > Yes, it's key problem here.
 > >
 > >  > When an inode is writably mapped via ovarlayfs, you can flag the
 > >  > overlay inode with "maybe-writably-mapped" and then remove
 > >  > it from the maybe dirty list when the underlying inode is not dirty
 > >  > AND its i_writecount is 0 (checked on write_inode() and release()).
 > >  >
 > >  > Actually, there is no reason to treat writably mapped inodes and
 > >  > other dirty inodes differently - insert to suspect list on open for
 > >  > write, remove from suspect list on last release() or write_inode()
 > >  > when inode is no longer dirty and writable.

I have to say inserting to suspect list cannot prevent dropping,
thinking of the problem of previous approach that we write dirty upper
inode with current->flags & PF_MEMALLOC while evicting clean overlay inode.


 > >  >
 > >  > Did I miss anything?
 > >  >
 > >
 > > If we dirty overlay inode that means we have launched writeback mechanism,
 > > so in this case, re-dirty overlay inode in time becomes important.
 > >
 > 
 > My idea was to use the first call to ovl_sync_fs() with 'wait' false
 > to iterate the
 > maybe-dirty list and re-dirty overlay inodes whose upper is dirty.
 > 

I'm curious how we prevent dropping of clean overlay inode with dirty upper?
Hold another reference during iput_final operation? in the drop_inode() or something
else?


 > Then in the second call to __sync_filesystem, sync_inodes_sb() will take
 > care of calling ovl_write_inode() for all the re-dirty inodes.
 > 
 > In current code we sync ALL dirty upper fs inodes and we do not overlay
 > inodes with no reference in cache.
 > 
 > The best code would sync only upper fs inodes dirtied by this overlay
 > and will be able to evict overlay inodes whose upper inodes are clean.
 > 
 > The compromise code would sync only upper fs inodes dirtied by this overlay,
 > and would not evict overlay inodes as long as upper inodes are "open for write".
 > I think its a fine compromise considering the alternatives.
 > 
 > Is this workable?
 > 

In your approach, the key point is how to prevent dropping overlay inode that has
dirty upper and no reference but I don't understand well how to achieve it from
your descriptions.


Thanks,
Chengguang




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-15 13:13             ` Chengguang Xu
@ 2020-10-15 16:02               ` Amir Goldstein
  2020-10-15 16:06                 ` Amir Goldstein
  2020-10-16  1:56                 ` Chengguang Xu
  0 siblings, 2 replies; 30+ messages in thread
From: Amir Goldstein @ 2020-10-15 16:02 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro

>  > >  > When an inode is writably mapped via ovarlayfs, you can flag the
>  > >  > overlay inode with "maybe-writably-mapped" and then remove
>  > >  > it from the maybe dirty list when the underlying inode is not dirty
>  > >  > AND its i_writecount is 0 (checked on write_inode() and release()).
>  > >  >
>  > >  > Actually, there is no reason to treat writably mapped inodes and
>  > >  > other dirty inodes differently - insert to suspect list on open for
>  > >  > write, remove from suspect list on last release() or write_inode()
>  > >  > when inode is no longer dirty and writable.
>
> I have to say inserting to suspect list cannot prevent dropping,
> thinking of the problem of previous approach that we write dirty upper
> inode with current->flags & PF_MEMALLOC while evicting clean overlay inode.
>

Sorry, I don't understand what that means.

>
>  > >  >
>  > >  > Did I miss anything?
>  > >  >
>  > >
>  > > If we dirty overlay inode that means we have launched writeback mechanism,
>  > > so in this case, re-dirty overlay inode in time becomes important.
>  > >
>  >
>  > My idea was to use the first call to ovl_sync_fs() with 'wait' false
>  > to iterate the
>  > maybe-dirty list and re-dirty overlay inodes whose upper is dirty.
>  >
>
> I'm curious how we prevent dropping of clean overlay inode with dirty upper?
> Hold another reference during iput_final operation? in the drop_inode() or something
> else?

No, just return 0 from ovl_drop_inode() and iput_final() will not evict().

>
>
>  > Then in the second call to __sync_filesystem, sync_inodes_sb() will take
>  > care of calling ovl_write_inode() for all the re-dirty inodes.
>  >
>  > In current code we sync ALL dirty upper fs inodes and we do not overlay
>  > inodes with no reference in cache.
>  >
>  > The best code would sync only upper fs inodes dirtied by this overlay
>  > and will be able to evict overlay inodes whose upper inodes are clean.
>  >
>  > The compromise code would sync only upper fs inodes dirtied by this overlay,
>  > and would not evict overlay inodes as long as upper inodes are "open for write".
>  > I think its a fine compromise considering the alternatives.
>  >
>  > Is this workable?
>  >
>
> In your approach, the key point is how to prevent dropping overlay inode that has
> dirty upper and no reference but I don't understand well how to achieve it from
> your descriptions.
>
>

Very well, I will try to explain with code:

int ovl_inode_is_open_for_write(struct inode *inode)
{
       struct inode *upper_inode = ovl_inode_upper(inode);

       return upper_inode && inode_is_open_for_write(upper_inode);
}

void ovl_maybe_mark_inode_dirty(struct inode *inode)
{
       struct inode *upper_inode = ovl_inode_upper(inode);

       if (upper_inode && upper_inode->i_state & I_DIRTY_ALL)
                mark_inode_dirty(inode);
}

int ovl_open(struct inode *inode, struct file *file)
{
...
       if (ovl_inode_is_open_for_write(file_inode(file)))
               ovl_add_inode_to_suspect_list(inode);

        file->private_data = realfile;

        return 0;
}

int ovl_release(struct inode *inode, struct file *file)
{
       struct inode *inode = file_inode(file);

       if (ovl_inode_is_open_for_write(inode)) {
               ovl_maybe_mark_inode_dirty(inode);
               ovl_remove_inode_from_suspect_list(inode);
       }

        fput(file->private_data);

        return 0;
}

int ovl_drop_inode(struct inode *inode)
{
       struct inode *upper_inode = ovl_inode_upper(inode);

       if (!upper_inode)
               return 1;
       if (upper_inode->i_state & I_DIRTY_ALL)
               return 0;

       return !inode_is_open_for_write(upper_inode);
}

static int ovl_sync_fs(struct super_block *sb, int wait)
{
        struct ovl_fs *ofs = sb->s_fs_info;
        struct super_block *upper_sb;
        int ret;

        if (!ovl_upper_mnt(ofs))
                return 0;

        /*
         * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
         * All the super blocks will be iterated, including upper_sb.
         *
         * If this is a syncfs(2) call, then we do need to call
         * sync_filesystem() on upper_sb, but enough if we do it when being
         * called with wait == 1.
         */
        if (!wait) {
                /* mark inodes on the suspect list dirty if thier
upper inode is dirty */
                ovl_mark_suspect_list_inodes_dirty();
                return 0;
        }
...


The races are avoided because inode is added/removed from suspect
list while overlay inode has a reference (from file) and because upper inode
cannot be dirtied by overlayfs when overlay inode is not on the suspect list.

Unless I am missing something.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-15 16:02               ` Amir Goldstein
@ 2020-10-15 16:06                 ` Amir Goldstein
  2020-10-16  1:56                 ` Chengguang Xu
  1 sibling, 0 replies; 30+ messages in thread
From: Amir Goldstein @ 2020-10-15 16:06 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro

> Very well, I will try to explain with code:
>
> int ovl_inode_is_open_for_write(struct inode *inode)
> {
>        struct inode *upper_inode = ovl_inode_upper(inode);
>
>        return upper_inode && inode_is_open_for_write(upper_inode);
> }
>
> void ovl_maybe_mark_inode_dirty(struct inode *inode)
> {
>        struct inode *upper_inode = ovl_inode_upper(inode);
>
>        if (upper_inode && upper_inode->i_state & I_DIRTY_ALL)
>                 mark_inode_dirty(inode);
> }
>
> int ovl_open(struct inode *inode, struct file *file)
> {
> ...
>        if (ovl_inode_is_open_for_write(file_inode(file)))
>                ovl_add_inode_to_suspect_list(inode);
>
>         file->private_data = realfile;
>
>         return 0;
> }
>
> int ovl_release(struct inode *inode, struct file *file)
> {
>        struct inode *inode = file_inode(file);
>
>        if (ovl_inode_is_open_for_write(inode)) {

Darn! that should have been (!ovl_inode_is_open_for_write(inode))
and it should be done after fput() and grabbing inode reference with
ihold() before fput().

>                ovl_maybe_mark_inode_dirty(inode);
>                ovl_remove_inode_from_suspect_list(inode);
>        }
>
>         fput(file->private_data);
>
>         return 0;
> }
>
> int ovl_drop_inode(struct inode *inode)
> {
>        struct inode *upper_inode = ovl_inode_upper(inode);
>
>        if (!upper_inode)
>                return 1;
>        if (upper_inode->i_state & I_DIRTY_ALL)
>                return 0;
>
>        return !inode_is_open_for_write(upper_inode);
> }
>
> static int ovl_sync_fs(struct super_block *sb, int wait)
> {
>         struct ovl_fs *ofs = sb->s_fs_info;
>         struct super_block *upper_sb;
>         int ret;
>
>         if (!ovl_upper_mnt(ofs))
>                 return 0;
>
>         /*
>          * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
>          * All the super blocks will be iterated, including upper_sb.
>          *
>          * If this is a syncfs(2) call, then we do need to call
>          * sync_filesystem() on upper_sb, but enough if we do it when being
>          * called with wait == 1.
>          */
>         if (!wait) {
>                 /* mark inodes on the suspect list dirty if thier
> upper inode is dirty */
>                 ovl_mark_suspect_list_inodes_dirty();
>                 return 0;
>         }
> ...
>
>
> The races are avoided because inode is added/removed from suspect
> list while overlay inode has a reference (from file) and because upper inode
> cannot be dirtied by overlayfs when overlay inode is not on the suspect list.
>
> Unless I am missing something.
>
> Thanks,
> Amir.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-15 16:02               ` Amir Goldstein
  2020-10-15 16:06                 ` Amir Goldstein
@ 2020-10-16  1:56                 ` Chengguang Xu
  2020-10-16  4:39                   ` Amir Goldstein
  1 sibling, 1 reply; 30+ messages in thread
From: Chengguang Xu @ 2020-10-16  1:56 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro

 ---- 在 星期五, 2020-10-16 00:02:42 Amir Goldstein <amir73il@gmail.com> 撰写 ----
 > >  > >  > When an inode is writably mapped via ovarlayfs, you can flag the
 > >  > >  > overlay inode with "maybe-writably-mapped" and then remove
 > >  > >  > it from the maybe dirty list when the underlying inode is not dirty
 > >  > >  > AND its i_writecount is 0 (checked on write_inode() and release()).
 > >  > >  >
 > >  > >  > Actually, there is no reason to treat writably mapped inodes and
 > >  > >  > other dirty inodes differently - insert to suspect list on open for
 > >  > >  > write, remove from suspect list on last release() or write_inode()
 > >  > >  > when inode is no longer dirty and writable.
 > >
 > > I have to say inserting to suspect list cannot prevent dropping,
 > > thinking of the problem of previous approach that we write dirty upper
 > > inode with current->flags & PF_MEMALLOC while evicting clean overlay inode.
 > >
 > 
 > Sorry, I don't understand what that means.

This is the main problem of my previous patch set V10, evicting clean inode
expects no write behavior but in the case of dirty upper inode we have to
write out dirty data in this timing otherwise we will lose the connection with upper inode.


 > 
 > >
 > >  > >  >
 > >  > >  > Did I miss anything?
 > >  > >  >
 > >  > >
 > >  > > If we dirty overlay inode that means we have launched writeback mechanism,
 > >  > > so in this case, re-dirty overlay inode in time becomes important.
 > >  > >
 > >  >
 > >  > My idea was to use the first call to ovl_sync_fs() with 'wait' false
 > >  > to iterate the
 > >  > maybe-dirty list and re-dirty overlay inodes whose upper is dirty.
 > >  >
 > >
 > > I'm curious how we prevent dropping of clean overlay inode with dirty upper?
 > > Hold another reference during iput_final operation? in the drop_inode() or something
 > > else?
 > 
 > No, just return 0 from ovl_drop_inode() and iput_final() will not evict().

It's not good,  it  only temporarily  skips eviction, the inode in lru list
will be evicted in some cases like drop cache or memory reclaim. etc.

A solution for this is getting another reference in ->drop_inode so that
the inode can escape from adding to lru list but this looks awkward and tricky.

 > 
 > >
 > >
 > >  > Then in the second call to __sync_filesystem, sync_inodes_sb() will take
 > >  > care of calling ovl_write_inode() for all the re-dirty inodes.
 > >  >
 > >  > In current code we sync ALL dirty upper fs inodes and we do not overlay
 > >  > inodes with no reference in cache.
 > >  >
 > >  > The best code would sync only upper fs inodes dirtied by this overlay
 > >  > and will be able to evict overlay inodes whose upper inodes are clean.
 > >  >
 > >  > The compromise code would sync only upper fs inodes dirtied by this overlay,
 > >  > and would not evict overlay inodes as long as upper inodes are "open for write".
 > >  > I think its a fine compromise considering the alternatives.
 > >  >
 > >  > Is this workable?
 > >  >
 > >
 > > In your approach, the key point is how to prevent dropping overlay inode that has
 > > dirty upper and no reference but I don't understand well how to achieve it from
 > > your descriptions.
 > >
 > >
 > 
 > Very well, I will try to explain with code:
 > 
 > int ovl_inode_is_open_for_write(struct inode *inode)
 > {
 >        struct inode *upper_inode = ovl_inode_upper(inode);
 > 
 >        return upper_inode && inode_is_open_for_write(upper_inode);
 > }
 > 
 > void ovl_maybe_mark_inode_dirty(struct inode *inode)
 > {
 >        struct inode *upper_inode = ovl_inode_upper(inode);
 > 
 >        if (upper_inode && upper_inode->i_state & I_DIRTY_ALL)
 >                 mark_inode_dirty(inode);
 > }
 > 
 > int ovl_open(struct inode *inode, struct file *file)
 > {
 > ...
 >        if (ovl_inode_is_open_for_write(file_inode(file)))
 >                ovl_add_inode_to_suspect_list(inode);
 > 
 >         file->private_data = realfile;
 > 
 >         return 0;
 > }
 > 
 > int ovl_release(struct inode *inode, struct file *file)
 > {
 >        struct inode *inode = file_inode(file);
 > 
 >        if (ovl_inode_is_open_for_write(inode)) {
 >                ovl_maybe_mark_inode_dirty(inode);
 >                ovl_remove_inode_from_suspect_list(inode);

I think in some cases removing from suspect_list will cause losing
the connection with upper inode that has writable mmap.


 >        }
 > 
 >         fput(file->private_data);
 > 
 >         return 0;
 > }
 > 
 > int ovl_drop_inode(struct inode *inode)
 > {
 >        struct inode *upper_inode = ovl_inode_upper(inode);
 > 
 >        if (!upper_inode)
 >                return 1;
 >        if (upper_inode->i_state & I_DIRTY_ALL)
 >                return 0;
 > 
 >        return !inode_is_open_for_write(upper_inode);

Is this condition just for writable mmap?


 > }
 > 
 > static int ovl_sync_fs(struct super_block *sb, int wait)
 > {
 >         struct ovl_fs *ofs = sb->s_fs_info;
 >         struct super_block *upper_sb;
 >         int ret;
 > 
 >         if (!ovl_upper_mnt(ofs))
 >                 return 0;
 > 
 >         /*
 >          * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
 >          * All the super blocks will be iterated, including upper_sb.
 >          *
 >          * If this is a syncfs(2) call, then we do need to call
 >          * sync_filesystem() on upper_sb, but enough if we do it when being
 >          * called with wait == 1.
 >          */
 >         if (!wait) {
 >                 /* mark inodes on the suspect list dirty if thier
 > upper inode is dirty */
 >                 ovl_mark_suspect_list_inodes_dirty();
 >                 return 0;
 >         }
 > ...
 > 

Why 2 rounds?  it seems the simplest way is only syncing dirty upper inode
on wait==1 just like my previous patch.


 > 
 > The races are avoided because inode is added/removed from suspect
 > list while overlay inode has a reference (from file) and because upper inode
 > cannot be dirtied by overlayfs when overlay inode is not on the suspect list.
 > 
 > Unless I am missing something.
 > 

Thanks,
Chengguang

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-16  1:56                 ` Chengguang Xu
@ 2020-10-16  4:39                   ` Amir Goldstein
  2020-10-16  7:43                     ` Chengguang Xu
  0 siblings, 1 reply; 30+ messages in thread
From: Amir Goldstein @ 2020-10-16  4:39 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro

On Fri, Oct 16, 2020 at 4:56 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
>  ---- 在 星期五, 2020-10-16 00:02:42 Amir Goldstein <amir73il@gmail.com> 撰写 ----
>  > >  > >  > When an inode is writably mapped via ovarlayfs, you can flag the
>  > >  > >  > overlay inode with "maybe-writably-mapped" and then remove
>  > >  > >  > it from the maybe dirty list when the underlying inode is not dirty
>  > >  > >  > AND its i_writecount is 0 (checked on write_inode() and release()).
>  > >  > >  >
>  > >  > >  > Actually, there is no reason to treat writably mapped inodes and
>  > >  > >  > other dirty inodes differently - insert to suspect list on open for
>  > >  > >  > write, remove from suspect list on last release() or write_inode()
>  > >  > >  > when inode is no longer dirty and writable.
>  > >
>  > > I have to say inserting to suspect list cannot prevent dropping,
>  > > thinking of the problem of previous approach that we write dirty upper
>  > > inode with current->flags & PF_MEMALLOC while evicting clean overlay inode.
>  > >
>  >
>  > Sorry, I don't understand what that means.
>
> This is the main problem of my previous patch set V10, evicting clean inode
> expects no write behavior but in the case of dirty upper inode we have to
> write out dirty data in this timing otherwise we will lose the connection with upper inode.
>

My thinking was that the suspect list holds a reference to the overlay inode.
The question is can we always safely get rid of that reference and remove
from the suspect list when the inode is no longer "writable". Let's see...

>
>  >
>  > >
>  > >  > >  >
>  > >  > >  > Did I miss anything?
>  > >  > >  >
>  > >  > >
>  > >  > > If we dirty overlay inode that means we have launched writeback mechanism,
>  > >  > > so in this case, re-dirty overlay inode in time becomes important.
>  > >  > >
>  > >  >
>  > >  > My idea was to use the first call to ovl_sync_fs() with 'wait' false
>  > >  > to iterate the
>  > >  > maybe-dirty list and re-dirty overlay inodes whose upper is dirty.
>  > >  >
>  > >
>  > > I'm curious how we prevent dropping of clean overlay inode with dirty upper?
>  > > Hold another reference during iput_final operation? in the drop_inode() or something
>  > > else?
>  >
>  > No, just return 0 from ovl_drop_inode() and iput_final() will not evict().
>
> It's not good,  it  only temporarily  skips eviction, the inode in lru list
> will be evicted in some cases like drop cache or memory reclaim. etc.
>
> A solution for this is getting another reference in ->drop_inode so that
> the inode can escape from adding to lru list but this looks awkward and tricky.
>

Right, that was nonsense. We need to rely on the reference held by the
suspect list.

>  >
>  > >
>  > >
>  > >  > Then in the second call to __sync_filesystem, sync_inodes_sb() will take
>  > >  > care of calling ovl_write_inode() for all the re-dirty inodes.
>  > >  >
>  > >  > In current code we sync ALL dirty upper fs inodes and we do not overlay
>  > >  > inodes with no reference in cache.
>  > >  >
>  > >  > The best code would sync only upper fs inodes dirtied by this overlay
>  > >  > and will be able to evict overlay inodes whose upper inodes are clean.
>  > >  >
>  > >  > The compromise code would sync only upper fs inodes dirtied by this overlay,
>  > >  > and would not evict overlay inodes as long as upper inodes are "open for write".
>  > >  > I think its a fine compromise considering the alternatives.
>  > >  >
>  > >  > Is this workable?
>  > >  >
>  > >
>  > > In your approach, the key point is how to prevent dropping overlay inode that has
>  > > dirty upper and no reference but I don't understand well how to achieve it from
>  > > your descriptions.
>  > >
>  > >
>  >
>  > Very well, I will try to explain with code:
>  >
>  > int ovl_inode_is_open_for_write(struct inode *inode)
>  > {
>  >        struct inode *upper_inode = ovl_inode_upper(inode);
>  >
>  >        return upper_inode && inode_is_open_for_write(upper_inode);
>  > }
>  >
>  > void ovl_maybe_mark_inode_dirty(struct inode *inode)
>  > {
>  >        struct inode *upper_inode = ovl_inode_upper(inode);
>  >
>  >        if (upper_inode && upper_inode->i_state & I_DIRTY_ALL)
>  >                 mark_inode_dirty(inode);
>  > }
>  >
>  > int ovl_open(struct inode *inode, struct file *file)
>  > {
>  > ...
>  >        if (ovl_inode_is_open_for_write(file_inode(file)))
>  >                ovl_add_inode_to_suspect_list(inode);
>  >
>  >         file->private_data = realfile;
>  >
>  >         return 0;
>  > }
>  >
>  > int ovl_release(struct inode *inode, struct file *file)
>  > {
>  >        struct inode *inode = file_inode(file);
>  >
>  >        if (ovl_inode_is_open_for_write(inode)) {
>  >                ovl_maybe_mark_inode_dirty(inode);
>  >                ovl_remove_inode_from_suspect_list(inode);
>
> I think in some cases removing from suspect_list will cause losing
> the connection with upper inode that has writable mmap.
>

First of all I had a bug here.
Need to check for !ovl_inode_is_open_for_write(inode) after fput().

If the upper inode has a writable mmap, the upper inode would still
be "writable" (i_writecount held by the map realfile reference).

So when closing the last overlay file reference while upper inode
writable maps still exist, the remaining issue is when to remove
the overlay inode from the suspect list and allow its eviction and
I did not mention that.

I *think* that this condition should be safe in the regard that
after the condition is met, there is no way to dirty the upper inode
via overlayfs without going through ovl_open().
Obviously, the test should be done with some list lock held.

bool ovl_may_remove_from_suspect_list(struct inode *inode)
{
        struct inode *upper_inode = ovl_inode_upper(inode);

        if (upper_inode && upper_inode->i_state & I_DIRTY_ALL)
                return false;

        return !inode_is_open_for_write(upper_inode);
}

Now remains the question of WHEN to check for removal
from the suspect list.

The first place is in ovl_sync_fs() when iterating the suspect list,
inodes that meet the above criteria are "indefinitely clean" and
may be removed from the list at that timing.

For eviction during memory pressure, overlay can register a
shrinker to do this garbage collection. Is shrinker being called
on drop_caches? I'm not sure. But we can also do periodic garbage
collection.

In the end, it is not the common case and we just need the garbage
collection to mitigate DoS (or existing workload) that does many:
- open
- mmap(...PROT_WRITE, MAP_SHARED...)
- close
- munmap


>
>  >        }
>  >
>  >         fput(file->private_data);
>  >
>  >         return 0;
>  > }
>  >
>  > int ovl_drop_inode(struct inode *inode)
>  > {
>  >        struct inode *upper_inode = ovl_inode_upper(inode);
>  >
>  >        if (!upper_inode)
>  >                return 1;
>  >        if (upper_inode->i_state & I_DIRTY_ALL)
>  >                return 0;
>  >
>  >        return !inode_is_open_for_write(upper_inode);
>
> Is this condition just for writable mmap?
>

No, it's for all inodes that may be written from overlayfs (or
from maps created by overalyfs), but as you wrote, this
test is not needed in drop_inode().

>
>  > }
>  >
>  > static int ovl_sync_fs(struct super_block *sb, int wait)
>  > {
>  >         struct ovl_fs *ofs = sb->s_fs_info;
>  >         struct super_block *upper_sb;
>  >         int ret;
>  >
>  >         if (!ovl_upper_mnt(ofs))
>  >                 return 0;
>  >
>  >         /*
>  >          * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
>  >          * All the super blocks will be iterated, including upper_sb.
>  >          *
>  >          * If this is a syncfs(2) call, then we do need to call
>  >          * sync_filesystem() on upper_sb, but enough if we do it when being
>  >          * called with wait == 1.
>  >          */
>  >         if (!wait) {
>  >                 /* mark inodes on the suspect list dirty if thier
>  > upper inode is dirty */
>  >                 ovl_mark_suspect_list_inodes_dirty();
>  >                 return 0;
>  >         }
>  > ...
>  >
>
> Why 2 rounds?  it seems the simplest way is only syncing dirty upper inode
> on wait==1 just like my previous patch.
>

We don't have to do it in 2 rounds, but as long as we have a suspect list,
we can use it to start writeback on wait==0 on all dirty upper inodes from
our list just like the caller intended.

Do we need overlay sbi and mark_inode_dirty() and ovl_write_inode() at all?
I'm not sure. It feels like a good idea to use generic infrastructure as much as
possible. You should know better than me to answer this question.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-15  4:57       ` Al Viro
  2020-10-15 10:56         ` Chengguang Xu
@ 2020-10-16  7:09         ` Chengguang Xu
  2020-10-16  9:22           ` Jan Kara
  1 sibling, 1 reply; 30+ messages in thread
From: Chengguang Xu @ 2020-10-16  7:09 UTC (permalink / raw)
  To: Al Viro, amir73il, jack; +Cc: miklos, linux-unionfs, linux-fsdevel, cgxu519

 ---- 在 星期四, 2020-10-15 12:57:41 Al Viro <viro@zeniv.linux.org.uk> 撰写 ----
 > On Thu, Oct 15, 2020 at 11:42:51AM +0800, Chengguang Xu wrote:
 > >  ---- 在 星期四, 2020-10-15 11:25:01 Al Viro <viro@zeniv.linux.org.uk> 撰写 ----
 > >  > On Sat, Oct 10, 2020 at 10:23:51PM +0800, Chengguang Xu wrote:
 > >  > > Currently there is no notification api for kernel about modification
 > >  > > of vfs inode, in some use cases like overlayfs, this kind of notification
 > >  > > will be very helpful to implement containerized syncfs functionality.
 > >  > > As the first attempt, we introduce marking inode dirty notification so that
 > >  > > overlay's inode could mark itself dirty as well and then only sync dirty
 > >  > > overlay inode while syncfs.
 > >  > 
 > >  > Who's responsible for removing the crap from notifier chain?  And how does
 > >  > that affect the lifetime of inode?
 > >  
 > > In this case, overlayfs unregisters call back from the notifier chain of upper inode
 > > when evicting it's own  inode. It will not affect the lifetime of upper inode because
 > > overlayfs inode holds a reference of upper inode that means upper inode will not be
 > > evicted while overlayfs inode is still alive.
 > 
 > Let me see if I've got it right:
 >     * your chain contains 1 (for upper inodes) or 0 (everything else, i.e. the
 > vast majority of inodes) recepients
 >     * recepient pins the inode for as long as the recepient exists
 > 
 > That looks like a massive overkill, especially since all you are propagating is
 > dirtying the suckers.  All you really need is one bit in your inode + hash table
 > indexed by the address of struct inode (well, middle bits thereof, as usual).
 > With entries embedded into overlayfs-private part of overlayfs inode.  And callback
 > to be called stored in that entry...
 > 

Hi AI, Jack, Amir

Based on your feedback, I would to change the inode dirty notification
something like below, is it acceptable? 


diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 1492271..48473d9 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2249,6 +2249,14 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 
        trace_writeback_mark_inode_dirty(inode, flags);
 
+       if (inode->state & I_OVL_INUSE) {
+               struct inode *ovl_inode;
+
+               ovl_inode = ilookup5(NULL, (unsigned long)inode, ovl_inode_test, inode);
+               if (ovl_inode)
+                       __mark_inode_dirty(ovl_inode, flags);
+       }
+
        /*
         * Don't do this for I_DIRTY_PAGES - that doesn't actually
         * dirty the inode itself
diff --git a/fs/inode.c b/fs/inode.c
index 72c4c34..ed6c85e 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -820,7 +820,7 @@ static struct inode *find_inode(struct super_block *sb,
 
 repeat:
        hlist_for_each_entry(inode, head, i_hash) {
-               if (inode->i_sb != sb)
+               if (sb && inode->i_sb != sb)
                        continue;
                if (!test(inode, data))
                        continue;


Thanks,
Chengguang

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-16  4:39                   ` Amir Goldstein
@ 2020-10-16  7:43                     ` Chengguang Xu
  0 siblings, 0 replies; 30+ messages in thread
From: Chengguang Xu @ 2020-10-16  7:43 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro

 ---- 在 星期五, 2020-10-16 12:39:10 Amir Goldstein <amir73il@gmail.com> 撰写 ----
 > On Fri, Oct 16, 2020 at 4:56 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >
 > >  ---- 在 星期五, 2020-10-16 00:02:42 Amir Goldstein <amir73il@gmail.com> 撰写 ----
 > >  > >  > >  > When an inode is writably mapped via ovarlayfs, you can flag the
 > >  > >  > >  > overlay inode with "maybe-writably-mapped" and then remove
 > >  > >  > >  > it from the maybe dirty list when the underlying inode is not dirty
 > >  > >  > >  > AND its i_writecount is 0 (checked on write_inode() and release()).
 > >  > >  > >  >
 > >  > >  > >  > Actually, there is no reason to treat writably mapped inodes and
 > >  > >  > >  > other dirty inodes differently - insert to suspect list on open for
 > >  > >  > >  > write, remove from suspect list on last release() or write_inode()
 > >  > >  > >  > when inode is no longer dirty and writable.
 > >  > >
 > >  > > I have to say inserting to suspect list cannot prevent dropping,
 > >  > > thinking of the problem of previous approach that we write dirty upper
 > >  > > inode with current->flags & PF_MEMALLOC while evicting clean overlay inode.
 > >  > >
 > >  >
 > >  > Sorry, I don't understand what that means.
 > >
 > > This is the main problem of my previous patch set V10, evicting clean inode
 > > expects no write behavior but in the case of dirty upper inode we have to
 > > write out dirty data in this timing otherwise we will lose the connection with upper inode.
 > >
 > 
 > My thinking was that the suspect list holds a reference to the overlay inode.
 > The question is can we always safely get rid of that reference and remove
 > from the suspect list when the inode is no longer "writable". Let's see...
 > 
 > >
 > >  >
 > >  > >
 > >  > >  > >  >
 > >  > >  > >  > Did I miss anything?
 > >  > >  > >  >
 > >  > >  > >
 > >  > >  > > If we dirty overlay inode that means we have launched writeback mechanism,
 > >  > >  > > so in this case, re-dirty overlay inode in time becomes important.
 > >  > >  > >
 > >  > >  >
 > >  > >  > My idea was to use the first call to ovl_sync_fs() with 'wait' false
 > >  > >  > to iterate the
 > >  > >  > maybe-dirty list and re-dirty overlay inodes whose upper is dirty.
 > >  > >  >
 > >  > >
 > >  > > I'm curious how we prevent dropping of clean overlay inode with dirty upper?
 > >  > > Hold another reference during iput_final operation? in the drop_inode() or something
 > >  > > else?
 > >  >
 > >  > No, just return 0 from ovl_drop_inode() and iput_final() will not evict().
 > >
 > > It's not good,  it  only temporarily  skips eviction, the inode in lru list
 > > will be evicted in some cases like drop cache or memory reclaim. etc.
 > >
 > > A solution for this is getting another reference in ->drop_inode so that
 > > the inode can escape from adding to lru list but this looks awkward and tricky.
 > >
 > 
 > Right, that was nonsense. We need to rely on the reference held by the
 > suspect list.
 > 
 > >  >
 > >  > >
 > >  > >
 > >  > >  > Then in the second call to __sync_filesystem, sync_inodes_sb() will take
 > >  > >  > care of calling ovl_write_inode() for all the re-dirty inodes.
 > >  > >  >
 > >  > >  > In current code we sync ALL dirty upper fs inodes and we do not overlay
 > >  > >  > inodes with no reference in cache.
 > >  > >  >
 > >  > >  > The best code would sync only upper fs inodes dirtied by this overlay
 > >  > >  > and will be able to evict overlay inodes whose upper inodes are clean.
 > >  > >  >
 > >  > >  > The compromise code would sync only upper fs inodes dirtied by this overlay,
 > >  > >  > and would not evict overlay inodes as long as upper inodes are "open for write".
 > >  > >  > I think its a fine compromise considering the alternatives.
 > >  > >  >
 > >  > >  > Is this workable?
 > >  > >  >
 > >  > >
 > >  > > In your approach, the key point is how to prevent dropping overlay inode that has
 > >  > > dirty upper and no reference but I don't understand well how to achieve it from
 > >  > > your descriptions.
 > >  > >
 > >  > >
 > >  >
 > >  > Very well, I will try to explain with code:
 > >  >
 > >  > int ovl_inode_is_open_for_write(struct inode *inode)
 > >  > {
 > >  >        struct inode *upper_inode = ovl_inode_upper(inode);
 > >  >
 > >  >        return upper_inode && inode_is_open_for_write(upper_inode);
 > >  > }
 > >  >
 > >  > void ovl_maybe_mark_inode_dirty(struct inode *inode)
 > >  > {
 > >  >        struct inode *upper_inode = ovl_inode_upper(inode);
 > >  >
 > >  >        if (upper_inode && upper_inode->i_state & I_DIRTY_ALL)
 > >  >                 mark_inode_dirty(inode);
 > >  > }
 > >  >
 > >  > int ovl_open(struct inode *inode, struct file *file)
 > >  > {
 > >  > ...
 > >  >        if (ovl_inode_is_open_for_write(file_inode(file)))
 > >  >                ovl_add_inode_to_suspect_list(inode);
 > >  >
 > >  >         file->private_data = realfile;
 > >  >
 > >  >         return 0;
 > >  > }
 > >  >
 > >  > int ovl_release(struct inode *inode, struct file *file)
 > >  > {
 > >  >        struct inode *inode = file_inode(file);
 > >  >
 > >  >        if (ovl_inode_is_open_for_write(inode)) {
 > >  >                ovl_maybe_mark_inode_dirty(inode);
 > >  >                ovl_remove_inode_from_suspect_list(inode);
 > >
 > > I think in some cases removing from suspect_list will cause losing
 > > the connection with upper inode that has writable mmap.
 > >
 > 
 > First of all I had a bug here.
 > Need to check for !ovl_inode_is_open_for_write(inode) after fput().
 > 
 > If the upper inode has a writable mmap, the upper inode would still
 > be "writable" (i_writecount held by the map realfile reference).
 > 
 > So when closing the last overlay file reference while upper inode
 > writable maps still exist, the remaining issue is when to remove
 > the overlay inode from the suspect list and allow its eviction and
 > I did not mention that.
 > 
 > I *think* that this condition should be safe in the regard that
 > after the condition is met, there is no way to dirty the upper inode
 > via overlayfs without going through ovl_open().
 > Obviously, the test should be done with some list lock held.
 > 
 > bool ovl_may_remove_from_suspect_list(struct inode *inode)
 > {
 >         struct inode *upper_inode = ovl_inode_upper(inode);
 > 
 >         if (upper_inode && upper_inode->i_state & I_DIRTY_ALL)
 >                 return false;
 > 
 >         return !inode_is_open_for_write(upper_inode);
 > }
 > 
 > Now remains the question of WHEN to check for removal
 > from the suspect list.
 > 
 > The first place is in ovl_sync_fs() when iterating the suspect list,
 > inodes that meet the above criteria are "indefinitely clean" and
 > may be removed from the list at that timing.
 > 
 > For eviction during memory pressure, overlay can register a
 > shrinker to do this garbage collection. Is shrinker being called
 > on drop_caches? I'm not sure. But we can also do periodic garbage
 > collection.
 > 
 > In the end, it is not the common case and we just need the garbage
 > collection to mitigate DoS (or existing workload) that does many:
 > - open
 > - mmap(...PROT_WRITE, MAP_SHARED...)
 > - close
 > - munmap
 > 

That right, as you mentioned above releasing timing is important and
that also means in worst case, overlayfs hold too much memory resource
until umount. Maybe more proactive shrinker like dedicated workqueue
thread of overlayfs instead of global shrinker will be more helpful.



 > 
 > >
 > >  >        }
 > >  >
 > >  >         fput(file->private_data);
 > >  >
 > >  >         return 0;
 > >  > }
 > >  >
 > >  > int ovl_drop_inode(struct inode *inode)
 > >  > {
 > >  >        struct inode *upper_inode = ovl_inode_upper(inode);
 > >  >
 > >  >        if (!upper_inode)
 > >  >                return 1;
 > >  >        if (upper_inode->i_state & I_DIRTY_ALL)
 > >  >                return 0;
 > >  >
 > >  >        return !inode_is_open_for_write(upper_inode);
 > >
 > > Is this condition just for writable mmap?
 > >
 > 
 > No, it's for all inodes that may be written from overlayfs (or
 > from maps created by overalyfs), but as you wrote, this
 > test is not needed in drop_inode().
 > 
 > >
 > >  > }
 > >  >
 > >  > static int ovl_sync_fs(struct super_block *sb, int wait)
 > >  > {
 > >  >         struct ovl_fs *ofs = sb->s_fs_info;
 > >  >         struct super_block *upper_sb;
 > >  >         int ret;
 > >  >
 > >  >         if (!ovl_upper_mnt(ofs))
 > >  >                 return 0;
 > >  >
 > >  >         /*
 > >  >          * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
 > >  >          * All the super blocks will be iterated, including upper_sb.
 > >  >          *
 > >  >          * If this is a syncfs(2) call, then we do need to call
 > >  >          * sync_filesystem() on upper_sb, but enough if we do it when being
 > >  >          * called with wait == 1.
 > >  >          */
 > >  >         if (!wait) {
 > >  >                 /* mark inodes on the suspect list dirty if thier
 > >  > upper inode is dirty */
 > >  >                 ovl_mark_suspect_list_inodes_dirty();
 > >  >                 return 0;
 > >  >         }
 > >  > ...
 > >  >
 > >
 > > Why 2 rounds?  it seems the simplest way is only syncing dirty upper inode
 > > on wait==1 just like my previous patch.
 > >
 > 
 > We don't have to do it in 2 rounds, but as long as we have a suspect list,
 > we can use it to start writeback on wait==0 on all dirty upper inodes from
 > our list just like the caller intended.
 > 
 > Do we need overlay sbi and mark_inode_dirty() and ovl_write_inode() at all?
 > I'm not sure. It feels like a good idea to use generic infrastructure as much as
 > possible. You should know better than me to answer this question.
 > 

IMO, if we maintain a special list like suspect-list to organize target overlay inodes,
then we don't have to mark overlay inode dirty, marking overlay inode dirty will
bring unnecessary complexity in this case and I think there is no special benefit.

So this approach may achieve containerized syncfs for overlayfs and I've submitted
patch set V12 follow this thinking but I think the complexity is still big obstacle to
merge to upstream from feedback of Miklos. The new approach I posted the other day
is really simple to understand compare to the previous solution, if we can optimize
dirtiness  notification part a bit more then I think it will be more acceptable.

In the end, I need more feedback and  If you guys all think the previous solution  is the right way, 
then I can do more work based on our discussion above and post V13 later.


Thanks,
Chengguang








 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-16  7:09         ` Chengguang Xu
@ 2020-10-16  9:22           ` Jan Kara
  0 siblings, 0 replies; 30+ messages in thread
From: Jan Kara @ 2020-10-16  9:22 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: Al Viro, amir73il, jack, miklos, linux-unionfs, linux-fsdevel

On Fri 16-10-20 15:09:38, Chengguang Xu wrote:
>  ---- 在 星期四, 2020-10-15 12:57:41 Al Viro <viro@zeniv.linux.org.uk> 撰写 ----
>  > On Thu, Oct 15, 2020 at 11:42:51AM +0800, Chengguang Xu wrote:
>  > >  ---- 在 星期四, 2020-10-15 11:25:01 Al Viro <viro@zeniv.linux.org.uk> 撰写 ----
>  > >  > On Sat, Oct 10, 2020 at 10:23:51PM +0800, Chengguang Xu wrote:
>  > >  > > Currently there is no notification api for kernel about modification
>  > >  > > of vfs inode, in some use cases like overlayfs, this kind of notification
>  > >  > > will be very helpful to implement containerized syncfs functionality.
>  > >  > > As the first attempt, we introduce marking inode dirty notification so that
>  > >  > > overlay's inode could mark itself dirty as well and then only sync dirty
>  > >  > > overlay inode while syncfs.
>  > >  > 
>  > >  > Who's responsible for removing the crap from notifier chain?  And how does
>  > >  > that affect the lifetime of inode?
>  > >  
>  > > In this case, overlayfs unregisters call back from the notifier chain of upper inode
>  > > when evicting it's own  inode. It will not affect the lifetime of upper inode because
>  > > overlayfs inode holds a reference of upper inode that means upper inode will not be
>  > > evicted while overlayfs inode is still alive.
>  > 
>  > Let me see if I've got it right:
>  >     * your chain contains 1 (for upper inodes) or 0 (everything else, i.e. the
>  > vast majority of inodes) recepients
>  >     * recepient pins the inode for as long as the recepient exists
>  > 
>  > That looks like a massive overkill, especially since all you are propagating is
>  > dirtying the suckers.  All you really need is one bit in your inode + hash table
>  > indexed by the address of struct inode (well, middle bits thereof, as usual).
>  > With entries embedded into overlayfs-private part of overlayfs inode.  And callback
>  > to be called stored in that entry...
>  > 
> 
> Hi AI, Jack, Amir
> 
> Based on your feedback, I would to change the inode dirty notification
> something like below, is it acceptable? 
> 
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 1492271..48473d9 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -2249,6 +2249,14 @@ void __mark_inode_dirty(struct inode *inode, int flags)
>  
>         trace_writeback_mark_inode_dirty(inode, flags);
>  
> +       if (inode->state & I_OVL_INUSE) {
> +               struct inode *ovl_inode;
> +
> +               ovl_inode = ilookup5(NULL, (unsigned long)inode, ovl_inode_test, inode);

I don't think this will work - superblock pointer is part of the hash value
inode is hashed with so without proper sb pointer you won't find proper
hash chain.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2020-10-16  9:22 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-10 14:23 [RFC PATCH 0/5] implement containerized syncfs for overlayfs Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 1/5] fs: introduce notifier list for vfs inode Chengguang Xu
2020-10-14 16:15   ` Jan Kara
2020-10-15  3:03     ` Chengguang Xu
2020-10-15  6:11       ` Amir Goldstein
2020-10-15 11:29         ` Chengguang Xu
2020-10-15 12:32           ` Amir Goldstein
2020-10-15 13:13             ` Chengguang Xu
2020-10-15 16:02               ` Amir Goldstein
2020-10-15 16:06                 ` Amir Goldstein
2020-10-16  1:56                 ` Chengguang Xu
2020-10-16  4:39                   ` Amir Goldstein
2020-10-16  7:43                     ` Chengguang Xu
2020-10-15  3:25   ` Al Viro
2020-10-15  3:42     ` Chengguang Xu
2020-10-15  4:57       ` Al Viro
2020-10-15 10:56         ` Chengguang Xu
2020-10-16  7:09         ` Chengguang Xu
2020-10-16  9:22           ` Jan Kara
2020-10-10 14:23 ` [RFC PATCH 2/5] fs: export symbol of writeback_single_inode() Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 3/5] ovl: setup overlayfs' private bdi Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 4/5] ovl: monitor marking dirty activity of underlying upper inode Chengguang Xu
2020-10-11  1:16   ` kernel test robot
2020-10-11  1:55   ` kernel test robot
2020-10-10 14:23 ` [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs Chengguang Xu
2020-10-11  2:57   ` kernel test robot
2020-10-11  3:10   ` kernel test robot
2020-10-11 13:08   ` kernel test robot
2020-10-12 12:31   ` Dan Carpenter
2020-10-12 12:31     ` Dan Carpenter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.