linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/5] implement containerized syncfs for overlayfs
@ 2020-10-10 14:23 Chengguang Xu
  2020-10-10 14:23 ` [RFC PATCH 1/5] fs: introduce notifier list for vfs inode Chengguang Xu
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Chengguang Xu @ 2020-10-10 14:23 UTC (permalink / raw)
  To: miklos, amir73il, jack; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu

Current syncfs(2) syscall on overlayfs just calls sync_filesystem()
on upper_sb to synchronize whole dirty inodes in upper filesystem
regardless of the overlay ownership of the inode. In the use case of
container, when multiple containers using the same underlying upper
filesystem, it has some shortcomings as below.

(1) Performance
Synchronization is probably heavy because it actually syncs unnecessary
inodes for target overlayfs.

(2) Interference
Unplanned synchronization will probably impact IO performance of
unrelated container processes on the other overlayfs.

This series try to implement containerized syncfs for overlayfs so that
only sync target dirty upper inodes which are belong to specific overlayfs
instance. By doing this, it is able to reduce cost of synchronization and
will not seriously impact IO performance of unrelated processes.

Chengguang Xu (5):
  fs: introduce notifier list for vfs inode
  fs: export symbol of writeback_single_inode()
  ovl: setup overlayfs' private bdi
  ovl: monitor marking dirty activity of underlying upper inode
  ovl: impement containerized syncfs for overlayfs

 fs/fs-writeback.c         |  7 ++++-
 fs/inode.c                |  5 ++++
 fs/overlayfs/inode.c      | 28 +++++++++++++++++++-
 fs/overlayfs/overlayfs.h  |  2 ++
 fs/overlayfs/ovl_entry.h  |  2 ++
 fs/overlayfs/super.c      | 65 ++++++++++++++++++++++++++++++++++++++++++++---
 fs/overlayfs/util.c       | 33 ++++++++++++++++++++++++
 include/linux/fs.h        |  6 +++++
 include/linux/writeback.h |  1 +
 9 files changed, 143 insertions(+), 6 deletions(-)

-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-10 14:23 [RFC PATCH 0/5] implement containerized syncfs for overlayfs Chengguang Xu
@ 2020-10-10 14:23 ` Chengguang Xu
  2020-10-14 16:15   ` Jan Kara
  2020-10-15  3:25   ` Al Viro
  2020-10-10 14:23 ` [RFC PATCH 2/5] fs: export symbol of writeback_single_inode() Chengguang Xu
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 23+ messages in thread
From: Chengguang Xu @ 2020-10-10 14:23 UTC (permalink / raw)
  To: miklos, amir73il, jack; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu

Currently there is no notification api for kernel about modification
of vfs inode, in some use cases like overlayfs, this kind of notification
will be very helpful to implement containerized syncfs functionality.
As the first attempt, we introduce marking inode dirty notification so that
overlay's inode could mark itself dirty as well and then only sync dirty
overlay inode while syncfs.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/fs-writeback.c  | 4 ++++
 fs/inode.c         | 5 +++++
 include/linux/fs.h | 6 ++++++
 3 files changed, 15 insertions(+)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 1492271..657cceb 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2246,9 +2246,13 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 {
 	struct super_block *sb = inode->i_sb;
 	int dirtytime;
+	int copy_flags = flags;
 
 	trace_writeback_mark_inode_dirty(inode, flags);
 
+	blocking_notifier_call_chain(
+		&inode->notifier_lists[MARK_INODE_DIRTY_NOTIFIER],
+		0, &copy_flags);
 	/*
 	 * Don't do this for I_DIRTY_PAGES - that doesn't actually
 	 * dirty the inode itself
diff --git a/fs/inode.c b/fs/inode.c
index 72c4c34..84e82db 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -388,12 +388,17 @@ void address_space_init_once(struct address_space *mapping)
  */
 void inode_init_once(struct inode *inode)
 {
+	int i;
+
 	memset(inode, 0, sizeof(*inode));
 	INIT_HLIST_NODE(&inode->i_hash);
 	INIT_LIST_HEAD(&inode->i_devices);
 	INIT_LIST_HEAD(&inode->i_io_list);
 	INIT_LIST_HEAD(&inode->i_wb_list);
 	INIT_LIST_HEAD(&inode->i_lru);
+	for (i = 0; i < INODE_MAX_NOTIFIER; i++)
+		BLOCKING_INIT_NOTIFIER_HEAD(
+			&inode->notifier_lists[MARK_INODE_DIRTY_NOTIFIER]);
 	__address_space_init_once(&inode->i_data);
 	i_size_ordered_init(inode);
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7519ae0..42f0750 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -607,6 +607,11 @@ static inline void mapping_allow_writable(struct address_space *mapping)
 
 struct fsnotify_mark_connector;
 
+enum {
+	MARK_INODE_DIRTY_NOTIFIER,
+	INODE_MAX_NOTIFIER,
+};
+
 /*
  * Keep mostly read-only and often accessed (especially for
  * the RCU path lookup and 'stat' data) fields at the beginning
@@ -723,6 +728,7 @@ struct inode {
 #endif
 
 	void			*i_private; /* fs or device private pointer */
+	struct	blocking_notifier_head notifier_lists[INODE_MAX_NOTIFIER];
 } __randomize_layout;
 
 struct timespec64 timestamp_truncate(struct timespec64 t, struct inode *inode);
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH 2/5] fs: export symbol of writeback_single_inode()
  2020-10-10 14:23 [RFC PATCH 0/5] implement containerized syncfs for overlayfs Chengguang Xu
  2020-10-10 14:23 ` [RFC PATCH 1/5] fs: introduce notifier list for vfs inode Chengguang Xu
@ 2020-10-10 14:23 ` Chengguang Xu
  2020-10-10 14:23 ` [RFC PATCH 3/5] ovl: setup overlayfs' private bdi Chengguang Xu
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Chengguang Xu @ 2020-10-10 14:23 UTC (permalink / raw)
  To: miklos, amir73il, jack; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu

Export symbol of writeback_single_inode() in order to call it
from overlayfs' ->writepages when sync single inode.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/fs-writeback.c         | 3 ++-
 include/linux/writeback.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 657cceb..4fed058 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1529,7 +1529,7 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
  * we go e.g. from filesystem. Flusher thread uses __writeback_single_inode()
  * and does more profound writeback list handling in writeback_sb_inodes().
  */
-static int writeback_single_inode(struct inode *inode,
+int writeback_single_inode(struct inode *inode,
 				  struct writeback_control *wbc)
 {
 	struct bdi_writeback *wb;
@@ -1585,6 +1585,7 @@ static int writeback_single_inode(struct inode *inode,
 	spin_unlock(&inode->i_lock);
 	return ret;
 }
+EXPORT_SYMBOL(writeback_single_inode);
 
 static long writeback_chunk_size(struct bdi_writeback *wb,
 				 struct wb_writeback_work *work)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 8e5c5bb..78e3b10 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -403,4 +403,5 @@ void tag_pages_for_writeback(struct address_space *mapping,
 void sb_mark_inode_writeback(struct inode *inode);
 void sb_clear_inode_writeback(struct inode *inode);
 
+int writeback_single_inode(struct inode *inode, struct writeback_control *wbc);
 #endif		/* WRITEBACK_H */
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH 3/5] ovl: setup overlayfs' private bdi
  2020-10-10 14:23 [RFC PATCH 0/5] implement containerized syncfs for overlayfs Chengguang Xu
  2020-10-10 14:23 ` [RFC PATCH 1/5] fs: introduce notifier list for vfs inode Chengguang Xu
  2020-10-10 14:23 ` [RFC PATCH 2/5] fs: export symbol of writeback_single_inode() Chengguang Xu
@ 2020-10-10 14:23 ` Chengguang Xu
  2020-10-10 14:23 ` [RFC PATCH 4/5] ovl: monitor marking dirty activity of underlying upper inode Chengguang Xu
  2020-10-10 14:23 ` [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs Chengguang Xu
  4 siblings, 0 replies; 23+ messages in thread
From: Chengguang Xu @ 2020-10-10 14:23 UTC (permalink / raw)
  To: miklos, amir73il, jack; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu

Setup private bdi to collect overlayfs' dirty inodes.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/super.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 290983b..dc22725 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1863,6 +1863,9 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	int err;
 
 	sb->s_d_op = &ovl_dentry_operations;
+	err = super_setup_bdi(sb);
+	if (err)
+		goto out;
 
 	err = -ENOMEM;
 	ofs = kzalloc(sizeof(struct ovl_fs), GFP_KERNEL);
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH 4/5] ovl: monitor marking dirty activity of underlying upper inode
  2020-10-10 14:23 [RFC PATCH 0/5] implement containerized syncfs for overlayfs Chengguang Xu
                   ` (2 preceding siblings ...)
  2020-10-10 14:23 ` [RFC PATCH 3/5] ovl: setup overlayfs' private bdi Chengguang Xu
@ 2020-10-10 14:23 ` Chengguang Xu
  2020-10-10 14:23 ` [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs Chengguang Xu
  4 siblings, 0 replies; 23+ messages in thread
From: Chengguang Xu @ 2020-10-10 14:23 UTC (permalink / raw)
  To: miklos, amir73il, jack; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu

Monitor marking dirty activity of underlying upper inode
so that we have chance to mark overlayfs' own inode dirty.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/inode.c     |  4 +++-
 fs/overlayfs/overlayfs.h |  2 ++
 fs/overlayfs/ovl_entry.h |  2 ++
 fs/overlayfs/super.c     | 11 +++++++++++
 fs/overlayfs/util.c      | 34 ++++++++++++++++++++++++++++++++++
 5 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index b584dca..e75c7ec 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -632,8 +632,10 @@ void ovl_inode_init(struct inode *inode, struct ovl_inode_params *oip,
 {
 	struct inode *realinode;
 
-	if (oip->upperdentry)
+	if (oip->upperdentry) {
 		OVL_I(inode)->__upperdentry = oip->upperdentry;
+		ovl_register_mark_dirty_notify(OVL_I(inode));
+	}
 	if (oip->lowerpath && oip->lowerpath->dentry)
 		OVL_I(inode)->lower = igrab(d_inode(oip->lowerpath->dentry));
 	if (oip->lowerdata)
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 7bce246..04ef778 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -247,6 +247,8 @@ static inline bool ovl_open_flags_need_copy_up(int flags)
 }
 
 /* util.c */
+void ovl_register_mark_dirty_notify(struct ovl_inode *oi);
+void ovl_unregister_mark_dirty_notify(struct ovl_inode *oi);
 int ovl_want_write(struct dentry *dentry);
 void ovl_drop_write(struct dentry *dentry);
 struct dentry *ovl_workdir(struct dentry *dentry);
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 1b5a2094..fce5314 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -129,6 +129,8 @@ struct ovl_inode {
 
 	/* synchronize copy up and more */
 	struct mutex lock;
+	/* moniter marking dirty behavior of upper inode */
+	struct notifier_block mark_dirty_nb;
 };
 
 static inline struct ovl_inode *OVL_I(struct inode *inode)
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index dc22725..6d8f9da 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -390,11 +390,22 @@ static int ovl_remount(struct super_block *sb, int *flags, char *data)
 	return ret;
 }
 
+void ovl_evict_inode(struct inode *inode)
+{
+	struct inode *upper;
+
+	upper = ovl_inode_upper(inode);
+	if (upper)
+		ovl_unregister_mark_dirty_notify(OVL_I(inode));
+	clear_inode(inode);
+}
+
 static const struct super_operations ovl_super_operations = {
 	.alloc_inode	= ovl_alloc_inode,
 	.free_inode	= ovl_free_inode,
 	.destroy_inode	= ovl_destroy_inode,
 	.drop_inode	= generic_delete_inode,
+	.evict_inode	= ovl_evict_inode,
 	.put_super	= ovl_put_super,
 	.sync_fs	= ovl_sync_fs,
 	.statfs		= ovl_statfs,
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 23f4756..bdcfe55 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -417,6 +417,7 @@ void ovl_inode_update(struct inode *inode, struct dentry *upperdentry)
 		inode->i_private = upperinode;
 		__insert_inode_hash(inode, (unsigned long) upperinode);
 	}
+	ovl_register_mark_dirty_notify(OVL_I(inode));
 }
 
 static void ovl_dentry_version_inc(struct dentry *dentry, bool impurity)
@@ -950,3 +951,36 @@ char *ovl_get_redirect_xattr(struct ovl_fs *ofs, struct dentry *dentry,
 	kfree(buf);
 	return ERR_PTR(res);
 }
+
+int ovl_inode_dirty_notify(struct notifier_block *nb,
+			   unsigned long dummy, void *param)
+{
+	struct ovl_inode *oi = container_of(nb, struct ovl_inode, mark_dirty_nb);
+	int *flags = (int *)param;
+
+	// add later
+	//__mark_inode_dirty(&oi->vfs_inode, *flags);
+	return NOTIFY_OK;
+}
+
+void ovl_register_mark_dirty_notify(struct ovl_inode *oi)
+{
+	struct inode *upper;
+
+	upper = oi->__upperdentry->d_inode;
+	oi->mark_dirty_nb.notifier_call = ovl_inode_dirty_notify;
+
+	blocking_notifier_chain_register(
+		&upper->notifier_lists[MARK_INODE_DIRTY_NOTIFIER],
+		&oi->mark_dirty_nb);
+}
+
+void ovl_unregister_mark_dirty_notify(struct ovl_inode *oi)
+{
+	struct inode *upper;
+
+	upper = oi->__upperdentry->d_inode;
+	blocking_notifier_chain_unregister(
+		&upper->notifier_lists[MARK_INODE_DIRTY_NOTIFIER],
+		&oi->mark_dirty_nb);
+}
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs
  2020-10-10 14:23 [RFC PATCH 0/5] implement containerized syncfs for overlayfs Chengguang Xu
                   ` (3 preceding siblings ...)
  2020-10-10 14:23 ` [RFC PATCH 4/5] ovl: monitor marking dirty activity of underlying upper inode Chengguang Xu
@ 2020-10-10 14:23 ` Chengguang Xu
  4 siblings, 0 replies; 23+ messages in thread
From: Chengguang Xu @ 2020-10-10 14:23 UTC (permalink / raw)
  To: miklos, amir73il, jack; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu

Mark overlayfs' inode dirty when underlying upper inode becomes dirty,
then only sync overlayfs' dirty inodes while syncfs.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/overlayfs/inode.c | 24 ++++++++++++++++++++++++
 fs/overlayfs/super.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++----
 fs/overlayfs/util.c  |  3 +--
 3 files changed, 72 insertions(+), 6 deletions(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index e75c7ec..c709aed 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -11,6 +11,7 @@
 #include <linux/posix_acl.h>
 #include <linux/ratelimit.h>
 #include <linux/fiemap.h>
+#include <linux/writeback.h>
 #include "overlayfs.h"
 
 
@@ -516,7 +517,30 @@ static int ovl_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 	.update_time	= ovl_update_time,
 };
 
+static int ovl_writepages(struct address_space *mapping, struct writeback_control *wbc)
+{
+	struct inode *inode = mapping->host;
+	struct inode *upper_inode = ovl_inode_upper(inode);
+	struct ovl_fs *ofs = inode->i_sb->s_fs_info;
+	struct writeback_control upper_wbc = {
+		.nr_to_write		= LONG_MAX,
+		.sync_mode		= WB_SYNC_ALL,
+		.for_kupdate		= wbc->for_kupdate,
+		.for_background		= wbc->for_background,
+		.for_sync		= 0,
+		.range_cyclic		= wbc->range_cyclic,
+		.range_start		= 0,
+		.range_end		= LLONG_MAX,
+	};
+
+	if (!ovl_should_sync(ofs))
+		return 0;
+
+	return writeback_single_inode(upper_inode, &upper_wbc);
+}
+
 static const struct address_space_operations ovl_aops = {
+	.writepages             = ovl_writepages,
 	/* For O_DIRECT dentry_open() checks f_mapping->a_ops->direct_IO */
 	.direct_IO		= noop_direct_IO,
 };
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 6d8f9da..da1b65a 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -15,6 +15,9 @@
 #include <linux/seq_file.h>
 #include <linux/posix_acl_xattr.h>
 #include <linux/exportfs.h>
+#include <linux/notifier.h>
+#include <linux/writeback.h>
+#include <linux/blkdev.h>
 #include "overlayfs.h"
 
 MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>");
@@ -279,9 +282,13 @@ static int ovl_sync_fs(struct super_block *sb, int wait)
 
 	upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
 
-	down_read(&upper_sb->s_umount);
-	ret = sync_filesystem(upper_sb);
-	up_read(&upper_sb->s_umount);
+	if (upper_sb->s_op->sync_fs) {
+		down_read(&upper_sb->s_umount);
+		ret = upper_sb->s_op->sync_fs(upper_sb, wait);
+		if (!ret)
+			ret = sync_blockdev(upper_sb->s_bdev);
+		up_read(&upper_sb->s_umount);
+	}
 
 	return ret;
 }
@@ -400,11 +407,47 @@ void ovl_evict_inode(struct inode *inode)
 	clear_inode(inode);
 }
 
+int ovl_write_inode(struct inode *inode, struct writeback_control *wbc)
+{
+	struct ovl_fs *ofs = inode->i_sb->s_fs_info;
+	struct super_block *upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
+	struct inode *upper_inode = ovl_inode_upper(inode);
+	struct writeback_control upper_wbc = {
+		.nr_to_write		= LONG_MAX,
+		.sync_mode              = WB_SYNC_ALL,
+		.tagged_writepages	= wbc->tagged_writepages,
+		.for_kupdate		= wbc->for_kupdate,
+		.for_background		= wbc->for_background,
+		.for_sync		= 0,
+		.range_cyclic		= wbc->range_cyclic,
+		.range_start		= wbc->range_start,
+		.range_end		= wbc->range_end,
+	};
+
+	if (!upper_sb->s_op->write_inode || !upper_inode)
+		return 0;
+	return upper_sb->s_op->write_inode(upper_inode, &upper_wbc);
+}
+
+int ovl_drop_inode(struct inode *inode)
+{
+	struct inode *upper_inode;
+
+	upper_inode = ovl_inode_upper(inode);
+	if (!upper_inode)
+		return 1;
+	if (!(upper_inode->i_state & I_DIRTY_ALL))
+		return 1;
+
+	return generic_drop_inode(inode);
+}
+
 static const struct super_operations ovl_super_operations = {
 	.alloc_inode	= ovl_alloc_inode,
 	.free_inode	= ovl_free_inode,
 	.destroy_inode	= ovl_destroy_inode,
-	.drop_inode	= generic_delete_inode,
+	.drop_inode	= ovl_drop_inode,
+	.write_inode	= ovl_write_inode,
 	.evict_inode	= ovl_evict_inode,
 	.put_super	= ovl_put_super,
 	.sync_fs	= ovl_sync_fs,
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index bdcfe55..93d0493 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -958,8 +958,7 @@ int ovl_inode_dirty_notify(struct notifier_block *nb,
 	struct ovl_inode *oi = container_of(nb, struct ovl_inode, mark_dirty_nb);
 	int *flags = (int *)param;
 
-	// add later
-	//__mark_inode_dirty(&oi->vfs_inode, *flags);
+	__mark_inode_dirty(&oi->vfs_inode, *flags);
 	return NOTIFY_OK;
 }
 
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-10 14:23 ` [RFC PATCH 1/5] fs: introduce notifier list for vfs inode Chengguang Xu
@ 2020-10-14 16:15   ` Jan Kara
  2020-10-15  3:03     ` Chengguang Xu
  2020-10-15  3:25   ` Al Viro
  1 sibling, 1 reply; 23+ messages in thread
From: Jan Kara @ 2020-10-14 16:15 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: miklos, amir73il, jack, linux-unionfs, linux-fsdevel

On Sat 10-10-20 22:23:51, Chengguang Xu wrote:
> Currently there is no notification api for kernel about modification
> of vfs inode, in some use cases like overlayfs, this kind of notification
> will be very helpful to implement containerized syncfs functionality.
> As the first attempt, we introduce marking inode dirty notification so that
> overlay's inode could mark itself dirty as well and then only sync dirty
> overlay inode while syncfs.
> 
> Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>

So I like how the patch set is elegant however growing struct inode for
everybody by struct blocking_notifier_head (which is rwsem + pointer) is
rather harsh just for this overlayfs functionality... Ideally this should
induce no overhead on struct inode if the filesystem is not covered by
overlayfs. So I'd rather place some external structure into the superblock
that would get allocated on the first use that would track dirty notifications
for each inode. I would not generalize the code for more possible
notifications at this point.

Also now that I'm thinking about it can there be multiple overlayfs inodes
for one upper inode? If not, then the mechanism of dirtiness propagation
could be much simpler - it seems we could be able to just lookup
corresponding overlayfs inode based on upper inode and then mark it dirty
(but this would need to be verified by people more familiar with
overlayfs). So all we'd need to know for this is the superblock of the
overlayfs that's covering given upper filesystem...

								Honza
> ---
>  fs/fs-writeback.c  | 4 ++++
>  fs/inode.c         | 5 +++++
>  include/linux/fs.h | 6 ++++++
>  3 files changed, 15 insertions(+)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 1492271..657cceb 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -2246,9 +2246,13 @@ void __mark_inode_dirty(struct inode *inode, int flags)
>  {
>  	struct super_block *sb = inode->i_sb;
>  	int dirtytime;
> +	int copy_flags = flags;
>  
>  	trace_writeback_mark_inode_dirty(inode, flags);
>  
> +	blocking_notifier_call_chain(
> +		&inode->notifier_lists[MARK_INODE_DIRTY_NOTIFIER],
> +		0, &copy_flags);
>  	/*
>  	 * Don't do this for I_DIRTY_PAGES - that doesn't actually
>  	 * dirty the inode itself
> diff --git a/fs/inode.c b/fs/inode.c
> index 72c4c34..84e82db 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -388,12 +388,17 @@ void address_space_init_once(struct address_space *mapping)
>   */
>  void inode_init_once(struct inode *inode)
>  {
> +	int i;
> +
>  	memset(inode, 0, sizeof(*inode));
>  	INIT_HLIST_NODE(&inode->i_hash);
>  	INIT_LIST_HEAD(&inode->i_devices);
>  	INIT_LIST_HEAD(&inode->i_io_list);
>  	INIT_LIST_HEAD(&inode->i_wb_list);
>  	INIT_LIST_HEAD(&inode->i_lru);
> +	for (i = 0; i < INODE_MAX_NOTIFIER; i++)
> +		BLOCKING_INIT_NOTIFIER_HEAD(
> +			&inode->notifier_lists[MARK_INODE_DIRTY_NOTIFIER]);
>  	__address_space_init_once(&inode->i_data);
>  	i_size_ordered_init(inode);
>  }
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 7519ae0..42f0750 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -607,6 +607,11 @@ static inline void mapping_allow_writable(struct address_space *mapping)
>  
>  struct fsnotify_mark_connector;
>  
> +enum {
> +	MARK_INODE_DIRTY_NOTIFIER,
> +	INODE_MAX_NOTIFIER,
> +};
> +
>  /*
>   * Keep mostly read-only and often accessed (especially for
>   * the RCU path lookup and 'stat' data) fields at the beginning
> @@ -723,6 +728,7 @@ struct inode {
>  #endif
>  
>  	void			*i_private; /* fs or device private pointer */
> +	struct	blocking_notifier_head notifier_lists[INODE_MAX_NOTIFIER];
>  } __randomize_layout;
>  
>  struct timespec64 timestamp_truncate(struct timespec64 t, struct inode *inode);
> -- 
> 1.8.3.1
> 
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-14 16:15   ` Jan Kara
@ 2020-10-15  3:03     ` Chengguang Xu
  2020-10-15  6:11       ` Amir Goldstein
  0 siblings, 1 reply; 23+ messages in thread
From: Chengguang Xu @ 2020-10-15  3:03 UTC (permalink / raw)
  To: Jan Kara; +Cc: miklos, amir73il, linux-unionfs, linux-fsdevel, cgxu519

 ---- 在 星期四, 2020-10-15 00:15:38 Jan Kara <jack@suse.cz> 撰写 ----
 > On Sat 10-10-20 22:23:51, Chengguang Xu wrote:
 > > Currently there is no notification api for kernel about modification
 > > of vfs inode, in some use cases like overlayfs, this kind of notification
 > > will be very helpful to implement containerized syncfs functionality.
 > > As the first attempt, we introduce marking inode dirty notification so that
 > > overlay's inode could mark itself dirty as well and then only sync dirty
 > > overlay inode while syncfs.
 > > 
 > > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
 > 
 > So I like how the patch set is elegant however growing struct inode for
 > everybody by struct blocking_notifier_head (which is rwsem + pointer) is
 > rather harsh just for this overlayfs functionality... Ideally this should
 > induce no overhead on struct inode if the filesystem is not covered by
 > overlayfs. So I'd rather place some external structure into the superblock
 > that would get allocated on the first use that would track dirty notifications
 > for each inode. I would not generalize the code for more possible
 > notifications at this point.
 > 
 > Also now that I'm thinking about it can there be multiple overlayfs inodes
 > for one upper inode? If not, then the mechanism of dirtiness propagation

One upper inode only belongs to one overlayfs inode. However, in practice
one upper fs may contains hundreds or even thousands of overlayfs instances.

 > could be much simpler - it seems we could be able to just lookup
 > corresponding overlayfs inode based on upper inode and then mark it dirty
 > (but this would need to be verified by people more familiar with
 > overlayfs). So all we'd need to know for this is the superblock of the
 > overlayfs that's covering given upper filesystem...
 > 

So the difficulty is how we get overlayfs inode efficiently from upper inode,
it seems if we don't have additional info of upper inode to indicate which
overlayfs it belongs to,  then the lookup of corresponding overlayfs inode
will be quite expensive and probably impact write performance. 

Is that possible we extend inotify infrastructure slightly to notify both
user space and kernel component?


Thanks,
Chengguang

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-10 14:23 ` [RFC PATCH 1/5] fs: introduce notifier list for vfs inode Chengguang Xu
  2020-10-14 16:15   ` Jan Kara
@ 2020-10-15  3:25   ` Al Viro
  2020-10-15  3:42     ` Chengguang Xu
  1 sibling, 1 reply; 23+ messages in thread
From: Al Viro @ 2020-10-15  3:25 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: miklos, amir73il, jack, linux-unionfs, linux-fsdevel

On Sat, Oct 10, 2020 at 10:23:51PM +0800, Chengguang Xu wrote:
> Currently there is no notification api for kernel about modification
> of vfs inode, in some use cases like overlayfs, this kind of notification
> will be very helpful to implement containerized syncfs functionality.
> As the first attempt, we introduce marking inode dirty notification so that
> overlay's inode could mark itself dirty as well and then only sync dirty
> overlay inode while syncfs.

Who's responsible for removing the crap from notifier chain?  And how does
that affect the lifetime of inode?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-15  3:25   ` Al Viro
@ 2020-10-15  3:42     ` Chengguang Xu
  2020-10-15  4:57       ` Al Viro
  0 siblings, 1 reply; 23+ messages in thread
From: Chengguang Xu @ 2020-10-15  3:42 UTC (permalink / raw)
  To: Al Viro; +Cc: miklos, amir73il, jack, linux-unionfs, linux-fsdevel, cgxu519

 ---- 在 星期四, 2020-10-15 11:25:01 Al Viro <viro@zeniv.linux.org.uk> 撰写 ----
 > On Sat, Oct 10, 2020 at 10:23:51PM +0800, Chengguang Xu wrote:
 > > Currently there is no notification api for kernel about modification
 > > of vfs inode, in some use cases like overlayfs, this kind of notification
 > > will be very helpful to implement containerized syncfs functionality.
 > > As the first attempt, we introduce marking inode dirty notification so that
 > > overlay's inode could mark itself dirty as well and then only sync dirty
 > > overlay inode while syncfs.
 > 
 > Who's responsible for removing the crap from notifier chain?  And how does
 > that affect the lifetime of inode?
 
In this case, overlayfs unregisters call back from the notifier chain of upper inode
when evicting it's own  inode. It will not affect the lifetime of upper inode because
overlayfs inode holds a reference of upper inode that means upper inode will not be
evicted while overlayfs inode is still alive.


Thanks,
Chengguang

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-15  3:42     ` Chengguang Xu
@ 2020-10-15  4:57       ` Al Viro
  2020-10-15 10:56         ` Chengguang Xu
  2020-10-16  7:09         ` Chengguang Xu
  0 siblings, 2 replies; 23+ messages in thread
From: Al Viro @ 2020-10-15  4:57 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: miklos, amir73il, jack, linux-unionfs, linux-fsdevel

On Thu, Oct 15, 2020 at 11:42:51AM +0800, Chengguang Xu wrote:
>  ---- 在 星期四, 2020-10-15 11:25:01 Al Viro <viro@zeniv.linux.org.uk> 撰写 ----
>  > On Sat, Oct 10, 2020 at 10:23:51PM +0800, Chengguang Xu wrote:
>  > > Currently there is no notification api for kernel about modification
>  > > of vfs inode, in some use cases like overlayfs, this kind of notification
>  > > will be very helpful to implement containerized syncfs functionality.
>  > > As the first attempt, we introduce marking inode dirty notification so that
>  > > overlay's inode could mark itself dirty as well and then only sync dirty
>  > > overlay inode while syncfs.
>  > 
>  > Who's responsible for removing the crap from notifier chain?  And how does
>  > that affect the lifetime of inode?
>  
> In this case, overlayfs unregisters call back from the notifier chain of upper inode
> when evicting it's own  inode. It will not affect the lifetime of upper inode because
> overlayfs inode holds a reference of upper inode that means upper inode will not be
> evicted while overlayfs inode is still alive.

Let me see if I've got it right:
	* your chain contains 1 (for upper inodes) or 0 (everything else, i.e. the
vast majority of inodes) recepients
	* recepient pins the inode for as long as the recepient exists

That looks like a massive overkill, especially since all you are propagating is
dirtying the suckers.  All you really need is one bit in your inode + hash table
indexed by the address of struct inode (well, middle bits thereof, as usual).
With entries embedded into overlayfs-private part of overlayfs inode.  And callback
to be called stored in that entry...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-15  3:03     ` Chengguang Xu
@ 2020-10-15  6:11       ` Amir Goldstein
  2020-10-15 11:29         ` Chengguang Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Amir Goldstein @ 2020-10-15  6:11 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro

On Thu, Oct 15, 2020 at 6:03 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
>  ---- 在 星期四, 2020-10-15 00:15:38 Jan Kara <jack@suse.cz> 撰写 ----
>  > On Sat 10-10-20 22:23:51, Chengguang Xu wrote:
>  > > Currently there is no notification api for kernel about modification
>  > > of vfs inode, in some use cases like overlayfs, this kind of notification
>  > > will be very helpful to implement containerized syncfs functionality.
>  > > As the first attempt, we introduce marking inode dirty notification so that
>  > > overlay's inode could mark itself dirty as well and then only sync dirty
>  > > overlay inode while syncfs.
>  > >
>  > > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
>  >
>  > So I like how the patch set is elegant however growing struct inode for
>  > everybody by struct blocking_notifier_head (which is rwsem + pointer) is
>  > rather harsh just for this overlayfs functionality... Ideally this should
>  > induce no overhead on struct inode if the filesystem is not covered by
>  > overlayfs. So I'd rather place some external structure into the superblock
>  > that would get allocated on the first use that would track dirty notifications
>  > for each inode. I would not generalize the code for more possible
>  > notifications at this point.
>  >
>  > Also now that I'm thinking about it can there be multiple overlayfs inodes
>  > for one upper inode? If not, then the mechanism of dirtiness propagation
>
> One upper inode only belongs to one overlayfs inode. However, in practice
> one upper fs may contains hundreds or even thousands of overlayfs instances.
>
>  > could be much simpler - it seems we could be able to just lookup
>  > corresponding overlayfs inode based on upper inode and then mark it dirty
>  > (but this would need to be verified by people more familiar with
>  > overlayfs). So all we'd need to know for this is the superblock of the
>  > overlayfs that's covering given upper filesystem...
>  >
>
> So the difficulty is how we get overlayfs inode efficiently from upper inode,
> it seems if we don't have additional info of upper inode to indicate which
> overlayfs it belongs to,  then the lookup of corresponding overlayfs inode
> will be quite expensive and probably impact write performance.
>
> Is that possible we extend inotify infrastructure slightly to notify both
> user space and kernel component?
>
>

When I first saw your suggestion, that is what I was thinking.
Add event fsnotify_dirty_inode(), since the pub-sub infrastructure
in struct inode already exists.

But I have to admit this approach seems like a massive overkill to
what you need.

What you implemented, tracks upper inodes that could have
been dirtied under overlayfs, but what you really want is to
track is upper inodes that were dirtied *by* overlayfs.

And for that purpose, as Miklos said several times, it would be best
pursue the overlayfs aops approach, even though it may be much
harder..

Your earlier patches maintained a list of overlayfs to be synced inodes.
Remind me what was wrong with that approach?

Perhaps you can combine that with the shadow overlay sbi approach.
Instead of dirtying overlay inode when underlying is dirtied, you can
"pre-dirty" overlayfs inode in higher level file ops and add them to the
"maybe-dirty" list (e.g. after write).

ovl_sync_fs() can iterate the maybe-dirty list and re-dirty overlay inodes
if the underlying inode is still dirty on the (!wait) pass.

As for memory mapped inodes via overlayfs (which can be dirtied without
notifying overlayfs) I am not sure that is a big problem in practice.

When an inode is writably mapped via ovarlayfs, you can flag the
overlay inode with "maybe-writably-mapped" and then remove
it from the maybe dirty list when the underlying inode is not dirty
AND its i_writecount is 0 (checked on write_inode() and release()).

Actually, there is no reason to treat writably mapped inodes and
other dirty inodes differently - insert to suspect list on open for
write, remove from suspect list on last release() or write_inode()
when inode is no longer dirty and writable.

Did I miss anything?

Amir.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-15  4:57       ` Al Viro
@ 2020-10-15 10:56         ` Chengguang Xu
  2020-10-16  7:09         ` Chengguang Xu
  1 sibling, 0 replies; 23+ messages in thread
From: Chengguang Xu @ 2020-10-15 10:56 UTC (permalink / raw)
  To: Al Viro; +Cc: miklos, amir73il, jack, linux-unionfs, linux-fsdevel

 ---- 在 星期四, 2020-10-15 12:57:41 Al Viro <viro@zeniv.linux.org.uk> 撰写 ----
 > On Thu, Oct 15, 2020 at 11:42:51AM +0800, Chengguang Xu wrote:
 > >  ---- 在 星期四, 2020-10-15 11:25:01 Al Viro <viro@zeniv.linux.org.uk> 撰写 ----
 > >  > On Sat, Oct 10, 2020 at 10:23:51PM +0800, Chengguang Xu wrote:
 > >  > > Currently there is no notification api for kernel about modification
 > >  > > of vfs inode, in some use cases like overlayfs, this kind of notification
 > >  > > will be very helpful to implement containerized syncfs functionality.
 > >  > > As the first attempt, we introduce marking inode dirty notification so that
 > >  > > overlay's inode could mark itself dirty as well and then only sync dirty
 > >  > > overlay inode while syncfs.
 > >  > 
 > >  > Who's responsible for removing the crap from notifier chain?  And how does
 > >  > that affect the lifetime of inode?
 > >  
 > > In this case, overlayfs unregisters call back from the notifier chain of upper inode
 > > when evicting it's own  inode. It will not affect the lifetime of upper inode because
 > > overlayfs inode holds a reference of upper inode that means upper inode will not be
 > > evicted while overlayfs inode is still alive.
 > 
 > Let me see if I've got it right:
 >     * your chain contains 1 (for upper inodes) or 0 (everything else, i.e. the
 > vast majority of inodes) recepients
 >     * recepient pins the inode for as long as the recepient exists
 > 
 > That looks like a massive overkill, especially since all you are propagating is
 > dirtying the suckers.  All you really need is one bit in your inode + hash table
 > indexed by the address of struct inode (well, middle bits thereof, as usual).
 > With entries embedded into overlayfs-private part of overlayfs inode.  And callback
 > to be called stored in that entry...
 > 


Yeah, actually  that could implement the logic but I'm afraid of performance degradation
caused by lock contention of hash table in concurrent file write because in practice we
build up hundreds of overlayfs instances on same underlying filesystem.

Thanks,
Chengguang


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-15  6:11       ` Amir Goldstein
@ 2020-10-15 11:29         ` Chengguang Xu
  2020-10-15 12:32           ` Amir Goldstein
  0 siblings, 1 reply; 23+ messages in thread
From: Chengguang Xu @ 2020-10-15 11:29 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro, cgxu519

 ---- 在 星期四, 2020-10-15 14:11:04 Amir Goldstein <amir73il@gmail.com> 撰写 ----
 > On Thu, Oct 15, 2020 at 6:03 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >
 > >  ---- 在 星期四, 2020-10-15 00:15:38 Jan Kara <jack@suse.cz> 撰写 ----
 > >  > On Sat 10-10-20 22:23:51, Chengguang Xu wrote:
 > >  > > Currently there is no notification api for kernel about modification
 > >  > > of vfs inode, in some use cases like overlayfs, this kind of notification
 > >  > > will be very helpful to implement containerized syncfs functionality.
 > >  > > As the first attempt, we introduce marking inode dirty notification so that
 > >  > > overlay's inode could mark itself dirty as well and then only sync dirty
 > >  > > overlay inode while syncfs.
 > >  > >
 > >  > > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
 > >  >
 > >  > So I like how the patch set is elegant however growing struct inode for
 > >  > everybody by struct blocking_notifier_head (which is rwsem + pointer) is
 > >  > rather harsh just for this overlayfs functionality... Ideally this should
 > >  > induce no overhead on struct inode if the filesystem is not covered by
 > >  > overlayfs. So I'd rather place some external structure into the superblock
 > >  > that would get allocated on the first use that would track dirty notifications
 > >  > for each inode. I would not generalize the code for more possible
 > >  > notifications at this point.
 > >  >
 > >  > Also now that I'm thinking about it can there be multiple overlayfs inodes
 > >  > for one upper inode? If not, then the mechanism of dirtiness propagation
 > >
 > > One upper inode only belongs to one overlayfs inode. However, in practice
 > > one upper fs may contains hundreds or even thousands of overlayfs instances.
 > >
 > >  > could be much simpler - it seems we could be able to just lookup
 > >  > corresponding overlayfs inode based on upper inode and then mark it dirty
 > >  > (but this would need to be verified by people more familiar with
 > >  > overlayfs). So all we'd need to know for this is the superblock of the
 > >  > overlayfs that's covering given upper filesystem...
 > >  >
 > >
 > > So the difficulty is how we get overlayfs inode efficiently from upper inode,
 > > it seems if we don't have additional info of upper inode to indicate which
 > > overlayfs it belongs to,  then the lookup of corresponding overlayfs inode
 > > will be quite expensive and probably impact write performance.
 > >
 > > Is that possible we extend inotify infrastructure slightly to notify both
 > > user space and kernel component?
 > >
 > >
 > 
 > When I first saw your suggestion, that is what I was thinking.
 > Add event fsnotify_dirty_inode(), since the pub-sub infrastructure
 > in struct inode already exists.
 > 
 > But I have to admit this approach seems like a massive overkill to
 > what you need.
 > 
 > What you implemented, tracks upper inodes that could have
 > been dirtied under overlayfs, but what you really want is to
 > track is upper inodes that were dirtied *by* overlayfs.
 > 
 > And for that purpose, as Miklos said several times, it would be best
 > pursue the overlayfs aops approach, even though it may be much
 > harder..
 > 

IIUC, that solution was raised to solve mmap rw-ro issue.
That maybe is an ultimate goal and I'm wondering whether we must
implement that if we have easier approach to solve mmap and syncfs
issues.

 > Your earlier patches maintained a list of overlayfs to be synced inodes.
 > Remind me what was wrong with that approach?
 > 

I think the main concerns are the complexity and the timing of releasing
ovl_sync_entry  struct.


 > Perhaps you can combine that with the shadow overlay sbi approach.
 > Instead of dirtying overlay inode when underlying is dirtied, you can
 > "pre-dirty" overlayfs inode in higher level file ops and add them to the
 > "maybe-dirty" list (e.g. after write).
 > 

Main problem is we can't be notified by set_page_dirty from writable mmap.
Meanwhile, if we dirty overlay inode then writeback will pick up dirty overlay
inode and clear it after syncing, then overlay inode could be release at any time,
so in the end, maybe overlay inode is released but upper inode is still dirty and
there is no any pointer to find upper dirty inode out.


 > ovl_sync_fs() can iterate the maybe-dirty list and re-dirty overlay inodes
 > if the underlying inode is still dirty on the (!wait) pass.
 > 
 > As for memory mapped inodes via overlayfs (which can be dirtied without
 > notifying overlayfs) I am not sure that is a big problem in practice.
 > 

Yes, it's key problem here.

 > When an inode is writably mapped via ovarlayfs, you can flag the
 > overlay inode with "maybe-writably-mapped" and then remove
 > it from the maybe dirty list when the underlying inode is not dirty
 > AND its i_writecount is 0 (checked on write_inode() and release()).
 > 
 > Actually, there is no reason to treat writably mapped inodes and
 > other dirty inodes differently - insert to suspect list on open for
 > write, remove from suspect list on last release() or write_inode()
 > when inode is no longer dirty and writable.
 > 
 > Did I miss anything?
 > 

If we dirty overlay inode that means we have launched writeback mechanism,
so in this case, re-dirty overlay inode in time becomes important. 



Thanks,
Chengguang

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-15 11:29         ` Chengguang Xu
@ 2020-10-15 12:32           ` Amir Goldstein
  2020-10-15 13:13             ` Chengguang Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Amir Goldstein @ 2020-10-15 12:32 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro

>  > Perhaps you can combine that with the shadow overlay sbi approach.
>  > Instead of dirtying overlay inode when underlying is dirtied, you can
>  > "pre-dirty" overlayfs inode in higher level file ops and add them to the
>  > "maybe-dirty" list (e.g. after write).
>  >
>
> Main problem is we can't be notified by set_page_dirty from writable mmap.
> Meanwhile, if we dirty overlay inode then writeback will pick up dirty overlay
> inode and clear it after syncing, then overlay inode could be release at any time,
> so in the end, maybe overlay inode is released but upper inode is still dirty and
> there is no any pointer to find upper dirty inode out.
>

But we can control whether overlay inode is release with ovl_drop_inode()
right? Can we prevent dropping overlay inode if upper inode is
inode_is_open_for_write()?

About re-dirty, see below...

>
>  > ovl_sync_fs() can iterate the maybe-dirty list and re-dirty overlay inodes
>  > if the underlying inode is still dirty on the (!wait) pass.
>  >
>  > As for memory mapped inodes via overlayfs (which can be dirtied without
>  > notifying overlayfs) I am not sure that is a big problem in practice.
>  >
>
> Yes, it's key problem here.
>
>  > When an inode is writably mapped via ovarlayfs, you can flag the
>  > overlay inode with "maybe-writably-mapped" and then remove
>  > it from the maybe dirty list when the underlying inode is not dirty
>  > AND its i_writecount is 0 (checked on write_inode() and release()).
>  >
>  > Actually, there is no reason to treat writably mapped inodes and
>  > other dirty inodes differently - insert to suspect list on open for
>  > write, remove from suspect list on last release() or write_inode()
>  > when inode is no longer dirty and writable.
>  >
>  > Did I miss anything?
>  >
>
> If we dirty overlay inode that means we have launched writeback mechanism,
> so in this case, re-dirty overlay inode in time becomes important.
>

My idea was to use the first call to ovl_sync_fs() with 'wait' false
to iterate the
maybe-dirty list and re-dirty overlay inodes whose upper is dirty.

Then in the second call to __sync_filesystem, sync_inodes_sb() will take
care of calling ovl_write_inode() for all the re-dirty inodes.

In current code we sync ALL dirty upper fs inodes and we do not overlay
inodes with no reference in cache.

The best code would sync only upper fs inodes dirtied by this overlay
and will be able to evict overlay inodes whose upper inodes are clean.

The compromise code would sync only upper fs inodes dirtied by this overlay,
and would not evict overlay inodes as long as upper inodes are "open for write".
I think its a fine compromise considering the alternatives.

Is this workable?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-15 12:32           ` Amir Goldstein
@ 2020-10-15 13:13             ` Chengguang Xu
  2020-10-15 16:02               ` Amir Goldstein
  0 siblings, 1 reply; 23+ messages in thread
From: Chengguang Xu @ 2020-10-15 13:13 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro, cgxu519

 ---- 在 星期四, 2020-10-15 20:32:24 Amir Goldstein <amir73il@gmail.com> 撰写 ----
 > >  > Perhaps you can combine that with the shadow overlay sbi approach.
 > >  > Instead of dirtying overlay inode when underlying is dirtied, you can
 > >  > "pre-dirty" overlayfs inode in higher level file ops and add them to the
 > >  > "maybe-dirty" list (e.g. after write).
 > >  >
 > >
 > > Main problem is we can't be notified by set_page_dirty from writable mmap.
 > > Meanwhile, if we dirty overlay inode then writeback will pick up dirty overlay
 > > inode and clear it after syncing, then overlay inode could be release at any time,
 > > so in the end, maybe overlay inode is released but upper inode is still dirty and
 > > there is no any pointer to find upper dirty inode out.
 > >
 > 
 > But we can control whether overlay inode is release with ovl_drop_inode()
 > right? Can we prevent dropping overlay inode if upper inode is
 > inode_is_open_for_write()?
 > 
 > About re-dirty, see below...
 > 
 > >
 > >  > ovl_sync_fs() can iterate the maybe-dirty list and re-dirty overlay inodes
 > >  > if the underlying inode is still dirty on the (!wait) pass.
 > >  >
 > >  > As for memory mapped inodes via overlayfs (which can be dirtied without
 > >  > notifying overlayfs) I am not sure that is a big problem in practice.
 > >  >
 > >
 > > Yes, it's key problem here.
 > >
 > >  > When an inode is writably mapped via ovarlayfs, you can flag the
 > >  > overlay inode with "maybe-writably-mapped" and then remove
 > >  > it from the maybe dirty list when the underlying inode is not dirty
 > >  > AND its i_writecount is 0 (checked on write_inode() and release()).
 > >  >
 > >  > Actually, there is no reason to treat writably mapped inodes and
 > >  > other dirty inodes differently - insert to suspect list on open for
 > >  > write, remove from suspect list on last release() or write_inode()
 > >  > when inode is no longer dirty and writable.

I have to say inserting to suspect list cannot prevent dropping,
thinking of the problem of previous approach that we write dirty upper
inode with current->flags & PF_MEMALLOC while evicting clean overlay inode.


 > >  >
 > >  > Did I miss anything?
 > >  >
 > >
 > > If we dirty overlay inode that means we have launched writeback mechanism,
 > > so in this case, re-dirty overlay inode in time becomes important.
 > >
 > 
 > My idea was to use the first call to ovl_sync_fs() with 'wait' false
 > to iterate the
 > maybe-dirty list and re-dirty overlay inodes whose upper is dirty.
 > 

I'm curious how we prevent dropping of clean overlay inode with dirty upper?
Hold another reference during iput_final operation? in the drop_inode() or something
else?


 > Then in the second call to __sync_filesystem, sync_inodes_sb() will take
 > care of calling ovl_write_inode() for all the re-dirty inodes.
 > 
 > In current code we sync ALL dirty upper fs inodes and we do not overlay
 > inodes with no reference in cache.
 > 
 > The best code would sync only upper fs inodes dirtied by this overlay
 > and will be able to evict overlay inodes whose upper inodes are clean.
 > 
 > The compromise code would sync only upper fs inodes dirtied by this overlay,
 > and would not evict overlay inodes as long as upper inodes are "open for write".
 > I think its a fine compromise considering the alternatives.
 > 
 > Is this workable?
 > 

In your approach, the key point is how to prevent dropping overlay inode that has
dirty upper and no reference but I don't understand well how to achieve it from
your descriptions.


Thanks,
Chengguang




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-15 13:13             ` Chengguang Xu
@ 2020-10-15 16:02               ` Amir Goldstein
  2020-10-15 16:06                 ` Amir Goldstein
  2020-10-16  1:56                 ` Chengguang Xu
  0 siblings, 2 replies; 23+ messages in thread
From: Amir Goldstein @ 2020-10-15 16:02 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro

>  > >  > When an inode is writably mapped via ovarlayfs, you can flag the
>  > >  > overlay inode with "maybe-writably-mapped" and then remove
>  > >  > it from the maybe dirty list when the underlying inode is not dirty
>  > >  > AND its i_writecount is 0 (checked on write_inode() and release()).
>  > >  >
>  > >  > Actually, there is no reason to treat writably mapped inodes and
>  > >  > other dirty inodes differently - insert to suspect list on open for
>  > >  > write, remove from suspect list on last release() or write_inode()
>  > >  > when inode is no longer dirty and writable.
>
> I have to say inserting to suspect list cannot prevent dropping,
> thinking of the problem of previous approach that we write dirty upper
> inode with current->flags & PF_MEMALLOC while evicting clean overlay inode.
>

Sorry, I don't understand what that means.

>
>  > >  >
>  > >  > Did I miss anything?
>  > >  >
>  > >
>  > > If we dirty overlay inode that means we have launched writeback mechanism,
>  > > so in this case, re-dirty overlay inode in time becomes important.
>  > >
>  >
>  > My idea was to use the first call to ovl_sync_fs() with 'wait' false
>  > to iterate the
>  > maybe-dirty list and re-dirty overlay inodes whose upper is dirty.
>  >
>
> I'm curious how we prevent dropping of clean overlay inode with dirty upper?
> Hold another reference during iput_final operation? in the drop_inode() or something
> else?

No, just return 0 from ovl_drop_inode() and iput_final() will not evict().

>
>
>  > Then in the second call to __sync_filesystem, sync_inodes_sb() will take
>  > care of calling ovl_write_inode() for all the re-dirty inodes.
>  >
>  > In current code we sync ALL dirty upper fs inodes and we do not overlay
>  > inodes with no reference in cache.
>  >
>  > The best code would sync only upper fs inodes dirtied by this overlay
>  > and will be able to evict overlay inodes whose upper inodes are clean.
>  >
>  > The compromise code would sync only upper fs inodes dirtied by this overlay,
>  > and would not evict overlay inodes as long as upper inodes are "open for write".
>  > I think its a fine compromise considering the alternatives.
>  >
>  > Is this workable?
>  >
>
> In your approach, the key point is how to prevent dropping overlay inode that has
> dirty upper and no reference but I don't understand well how to achieve it from
> your descriptions.
>
>

Very well, I will try to explain with code:

int ovl_inode_is_open_for_write(struct inode *inode)
{
       struct inode *upper_inode = ovl_inode_upper(inode);

       return upper_inode && inode_is_open_for_write(upper_inode);
}

void ovl_maybe_mark_inode_dirty(struct inode *inode)
{
       struct inode *upper_inode = ovl_inode_upper(inode);

       if (upper_inode && upper_inode->i_state & I_DIRTY_ALL)
                mark_inode_dirty(inode);
}

int ovl_open(struct inode *inode, struct file *file)
{
...
       if (ovl_inode_is_open_for_write(file_inode(file)))
               ovl_add_inode_to_suspect_list(inode);

        file->private_data = realfile;

        return 0;
}

int ovl_release(struct inode *inode, struct file *file)
{
       struct inode *inode = file_inode(file);

       if (ovl_inode_is_open_for_write(inode)) {
               ovl_maybe_mark_inode_dirty(inode);
               ovl_remove_inode_from_suspect_list(inode);
       }

        fput(file->private_data);

        return 0;
}

int ovl_drop_inode(struct inode *inode)
{
       struct inode *upper_inode = ovl_inode_upper(inode);

       if (!upper_inode)
               return 1;
       if (upper_inode->i_state & I_DIRTY_ALL)
               return 0;

       return !inode_is_open_for_write(upper_inode);
}

static int ovl_sync_fs(struct super_block *sb, int wait)
{
        struct ovl_fs *ofs = sb->s_fs_info;
        struct super_block *upper_sb;
        int ret;

        if (!ovl_upper_mnt(ofs))
                return 0;

        /*
         * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
         * All the super blocks will be iterated, including upper_sb.
         *
         * If this is a syncfs(2) call, then we do need to call
         * sync_filesystem() on upper_sb, but enough if we do it when being
         * called with wait == 1.
         */
        if (!wait) {
                /* mark inodes on the suspect list dirty if thier
upper inode is dirty */
                ovl_mark_suspect_list_inodes_dirty();
                return 0;
        }
...


The races are avoided because inode is added/removed from suspect
list while overlay inode has a reference (from file) and because upper inode
cannot be dirtied by overlayfs when overlay inode is not on the suspect list.

Unless I am missing something.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-15 16:02               ` Amir Goldstein
@ 2020-10-15 16:06                 ` Amir Goldstein
  2020-10-16  1:56                 ` Chengguang Xu
  1 sibling, 0 replies; 23+ messages in thread
From: Amir Goldstein @ 2020-10-15 16:06 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro

> Very well, I will try to explain with code:
>
> int ovl_inode_is_open_for_write(struct inode *inode)
> {
>        struct inode *upper_inode = ovl_inode_upper(inode);
>
>        return upper_inode && inode_is_open_for_write(upper_inode);
> }
>
> void ovl_maybe_mark_inode_dirty(struct inode *inode)
> {
>        struct inode *upper_inode = ovl_inode_upper(inode);
>
>        if (upper_inode && upper_inode->i_state & I_DIRTY_ALL)
>                 mark_inode_dirty(inode);
> }
>
> int ovl_open(struct inode *inode, struct file *file)
> {
> ...
>        if (ovl_inode_is_open_for_write(file_inode(file)))
>                ovl_add_inode_to_suspect_list(inode);
>
>         file->private_data = realfile;
>
>         return 0;
> }
>
> int ovl_release(struct inode *inode, struct file *file)
> {
>        struct inode *inode = file_inode(file);
>
>        if (ovl_inode_is_open_for_write(inode)) {

Darn! that should have been (!ovl_inode_is_open_for_write(inode))
and it should be done after fput() and grabbing inode reference with
ihold() before fput().

>                ovl_maybe_mark_inode_dirty(inode);
>                ovl_remove_inode_from_suspect_list(inode);
>        }
>
>         fput(file->private_data);
>
>         return 0;
> }
>
> int ovl_drop_inode(struct inode *inode)
> {
>        struct inode *upper_inode = ovl_inode_upper(inode);
>
>        if (!upper_inode)
>                return 1;
>        if (upper_inode->i_state & I_DIRTY_ALL)
>                return 0;
>
>        return !inode_is_open_for_write(upper_inode);
> }
>
> static int ovl_sync_fs(struct super_block *sb, int wait)
> {
>         struct ovl_fs *ofs = sb->s_fs_info;
>         struct super_block *upper_sb;
>         int ret;
>
>         if (!ovl_upper_mnt(ofs))
>                 return 0;
>
>         /*
>          * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
>          * All the super blocks will be iterated, including upper_sb.
>          *
>          * If this is a syncfs(2) call, then we do need to call
>          * sync_filesystem() on upper_sb, but enough if we do it when being
>          * called with wait == 1.
>          */
>         if (!wait) {
>                 /* mark inodes on the suspect list dirty if thier
> upper inode is dirty */
>                 ovl_mark_suspect_list_inodes_dirty();
>                 return 0;
>         }
> ...
>
>
> The races are avoided because inode is added/removed from suspect
> list while overlay inode has a reference (from file) and because upper inode
> cannot be dirtied by overlayfs when overlay inode is not on the suspect list.
>
> Unless I am missing something.
>
> Thanks,
> Amir.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-15 16:02               ` Amir Goldstein
  2020-10-15 16:06                 ` Amir Goldstein
@ 2020-10-16  1:56                 ` Chengguang Xu
  2020-10-16  4:39                   ` Amir Goldstein
  1 sibling, 1 reply; 23+ messages in thread
From: Chengguang Xu @ 2020-10-16  1:56 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro

 ---- 在 星期五, 2020-10-16 00:02:42 Amir Goldstein <amir73il@gmail.com> 撰写 ----
 > >  > >  > When an inode is writably mapped via ovarlayfs, you can flag the
 > >  > >  > overlay inode with "maybe-writably-mapped" and then remove
 > >  > >  > it from the maybe dirty list when the underlying inode is not dirty
 > >  > >  > AND its i_writecount is 0 (checked on write_inode() and release()).
 > >  > >  >
 > >  > >  > Actually, there is no reason to treat writably mapped inodes and
 > >  > >  > other dirty inodes differently - insert to suspect list on open for
 > >  > >  > write, remove from suspect list on last release() or write_inode()
 > >  > >  > when inode is no longer dirty and writable.
 > >
 > > I have to say inserting to suspect list cannot prevent dropping,
 > > thinking of the problem of previous approach that we write dirty upper
 > > inode with current->flags & PF_MEMALLOC while evicting clean overlay inode.
 > >
 > 
 > Sorry, I don't understand what that means.

This is the main problem of my previous patch set V10, evicting clean inode
expects no write behavior but in the case of dirty upper inode we have to
write out dirty data in this timing otherwise we will lose the connection with upper inode.


 > 
 > >
 > >  > >  >
 > >  > >  > Did I miss anything?
 > >  > >  >
 > >  > >
 > >  > > If we dirty overlay inode that means we have launched writeback mechanism,
 > >  > > so in this case, re-dirty overlay inode in time becomes important.
 > >  > >
 > >  >
 > >  > My idea was to use the first call to ovl_sync_fs() with 'wait' false
 > >  > to iterate the
 > >  > maybe-dirty list and re-dirty overlay inodes whose upper is dirty.
 > >  >
 > >
 > > I'm curious how we prevent dropping of clean overlay inode with dirty upper?
 > > Hold another reference during iput_final operation? in the drop_inode() or something
 > > else?
 > 
 > No, just return 0 from ovl_drop_inode() and iput_final() will not evict().

It's not good,  it  only temporarily  skips eviction, the inode in lru list
will be evicted in some cases like drop cache or memory reclaim. etc.

A solution for this is getting another reference in ->drop_inode so that
the inode can escape from adding to lru list but this looks awkward and tricky.

 > 
 > >
 > >
 > >  > Then in the second call to __sync_filesystem, sync_inodes_sb() will take
 > >  > care of calling ovl_write_inode() for all the re-dirty inodes.
 > >  >
 > >  > In current code we sync ALL dirty upper fs inodes and we do not overlay
 > >  > inodes with no reference in cache.
 > >  >
 > >  > The best code would sync only upper fs inodes dirtied by this overlay
 > >  > and will be able to evict overlay inodes whose upper inodes are clean.
 > >  >
 > >  > The compromise code would sync only upper fs inodes dirtied by this overlay,
 > >  > and would not evict overlay inodes as long as upper inodes are "open for write".
 > >  > I think its a fine compromise considering the alternatives.
 > >  >
 > >  > Is this workable?
 > >  >
 > >
 > > In your approach, the key point is how to prevent dropping overlay inode that has
 > > dirty upper and no reference but I don't understand well how to achieve it from
 > > your descriptions.
 > >
 > >
 > 
 > Very well, I will try to explain with code:
 > 
 > int ovl_inode_is_open_for_write(struct inode *inode)
 > {
 >        struct inode *upper_inode = ovl_inode_upper(inode);
 > 
 >        return upper_inode && inode_is_open_for_write(upper_inode);
 > }
 > 
 > void ovl_maybe_mark_inode_dirty(struct inode *inode)
 > {
 >        struct inode *upper_inode = ovl_inode_upper(inode);
 > 
 >        if (upper_inode && upper_inode->i_state & I_DIRTY_ALL)
 >                 mark_inode_dirty(inode);
 > }
 > 
 > int ovl_open(struct inode *inode, struct file *file)
 > {
 > ...
 >        if (ovl_inode_is_open_for_write(file_inode(file)))
 >                ovl_add_inode_to_suspect_list(inode);
 > 
 >         file->private_data = realfile;
 > 
 >         return 0;
 > }
 > 
 > int ovl_release(struct inode *inode, struct file *file)
 > {
 >        struct inode *inode = file_inode(file);
 > 
 >        if (ovl_inode_is_open_for_write(inode)) {
 >                ovl_maybe_mark_inode_dirty(inode);
 >                ovl_remove_inode_from_suspect_list(inode);

I think in some cases removing from suspect_list will cause losing
the connection with upper inode that has writable mmap.


 >        }
 > 
 >         fput(file->private_data);
 > 
 >         return 0;
 > }
 > 
 > int ovl_drop_inode(struct inode *inode)
 > {
 >        struct inode *upper_inode = ovl_inode_upper(inode);
 > 
 >        if (!upper_inode)
 >                return 1;
 >        if (upper_inode->i_state & I_DIRTY_ALL)
 >                return 0;
 > 
 >        return !inode_is_open_for_write(upper_inode);

Is this condition just for writable mmap?


 > }
 > 
 > static int ovl_sync_fs(struct super_block *sb, int wait)
 > {
 >         struct ovl_fs *ofs = sb->s_fs_info;
 >         struct super_block *upper_sb;
 >         int ret;
 > 
 >         if (!ovl_upper_mnt(ofs))
 >                 return 0;
 > 
 >         /*
 >          * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
 >          * All the super blocks will be iterated, including upper_sb.
 >          *
 >          * If this is a syncfs(2) call, then we do need to call
 >          * sync_filesystem() on upper_sb, but enough if we do it when being
 >          * called with wait == 1.
 >          */
 >         if (!wait) {
 >                 /* mark inodes on the suspect list dirty if thier
 > upper inode is dirty */
 >                 ovl_mark_suspect_list_inodes_dirty();
 >                 return 0;
 >         }
 > ...
 > 

Why 2 rounds?  it seems the simplest way is only syncing dirty upper inode
on wait==1 just like my previous patch.


 > 
 > The races are avoided because inode is added/removed from suspect
 > list while overlay inode has a reference (from file) and because upper inode
 > cannot be dirtied by overlayfs when overlay inode is not on the suspect list.
 > 
 > Unless I am missing something.
 > 

Thanks,
Chengguang

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-16  1:56                 ` Chengguang Xu
@ 2020-10-16  4:39                   ` Amir Goldstein
  2020-10-16  7:43                     ` Chengguang Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Amir Goldstein @ 2020-10-16  4:39 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro

On Fri, Oct 16, 2020 at 4:56 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
>  ---- 在 星期五, 2020-10-16 00:02:42 Amir Goldstein <amir73il@gmail.com> 撰写 ----
>  > >  > >  > When an inode is writably mapped via ovarlayfs, you can flag the
>  > >  > >  > overlay inode with "maybe-writably-mapped" and then remove
>  > >  > >  > it from the maybe dirty list when the underlying inode is not dirty
>  > >  > >  > AND its i_writecount is 0 (checked on write_inode() and release()).
>  > >  > >  >
>  > >  > >  > Actually, there is no reason to treat writably mapped inodes and
>  > >  > >  > other dirty inodes differently - insert to suspect list on open for
>  > >  > >  > write, remove from suspect list on last release() or write_inode()
>  > >  > >  > when inode is no longer dirty and writable.
>  > >
>  > > I have to say inserting to suspect list cannot prevent dropping,
>  > > thinking of the problem of previous approach that we write dirty upper
>  > > inode with current->flags & PF_MEMALLOC while evicting clean overlay inode.
>  > >
>  >
>  > Sorry, I don't understand what that means.
>
> This is the main problem of my previous patch set V10, evicting clean inode
> expects no write behavior but in the case of dirty upper inode we have to
> write out dirty data in this timing otherwise we will lose the connection with upper inode.
>

My thinking was that the suspect list holds a reference to the overlay inode.
The question is can we always safely get rid of that reference and remove
from the suspect list when the inode is no longer "writable". Let's see...

>
>  >
>  > >
>  > >  > >  >
>  > >  > >  > Did I miss anything?
>  > >  > >  >
>  > >  > >
>  > >  > > If we dirty overlay inode that means we have launched writeback mechanism,
>  > >  > > so in this case, re-dirty overlay inode in time becomes important.
>  > >  > >
>  > >  >
>  > >  > My idea was to use the first call to ovl_sync_fs() with 'wait' false
>  > >  > to iterate the
>  > >  > maybe-dirty list and re-dirty overlay inodes whose upper is dirty.
>  > >  >
>  > >
>  > > I'm curious how we prevent dropping of clean overlay inode with dirty upper?
>  > > Hold another reference during iput_final operation? in the drop_inode() or something
>  > > else?
>  >
>  > No, just return 0 from ovl_drop_inode() and iput_final() will not evict().
>
> It's not good,  it  only temporarily  skips eviction, the inode in lru list
> will be evicted in some cases like drop cache or memory reclaim. etc.
>
> A solution for this is getting another reference in ->drop_inode so that
> the inode can escape from adding to lru list but this looks awkward and tricky.
>

Right, that was nonsense. We need to rely on the reference held by the
suspect list.

>  >
>  > >
>  > >
>  > >  > Then in the second call to __sync_filesystem, sync_inodes_sb() will take
>  > >  > care of calling ovl_write_inode() for all the re-dirty inodes.
>  > >  >
>  > >  > In current code we sync ALL dirty upper fs inodes and we do not overlay
>  > >  > inodes with no reference in cache.
>  > >  >
>  > >  > The best code would sync only upper fs inodes dirtied by this overlay
>  > >  > and will be able to evict overlay inodes whose upper inodes are clean.
>  > >  >
>  > >  > The compromise code would sync only upper fs inodes dirtied by this overlay,
>  > >  > and would not evict overlay inodes as long as upper inodes are "open for write".
>  > >  > I think its a fine compromise considering the alternatives.
>  > >  >
>  > >  > Is this workable?
>  > >  >
>  > >
>  > > In your approach, the key point is how to prevent dropping overlay inode that has
>  > > dirty upper and no reference but I don't understand well how to achieve it from
>  > > your descriptions.
>  > >
>  > >
>  >
>  > Very well, I will try to explain with code:
>  >
>  > int ovl_inode_is_open_for_write(struct inode *inode)
>  > {
>  >        struct inode *upper_inode = ovl_inode_upper(inode);
>  >
>  >        return upper_inode && inode_is_open_for_write(upper_inode);
>  > }
>  >
>  > void ovl_maybe_mark_inode_dirty(struct inode *inode)
>  > {
>  >        struct inode *upper_inode = ovl_inode_upper(inode);
>  >
>  >        if (upper_inode && upper_inode->i_state & I_DIRTY_ALL)
>  >                 mark_inode_dirty(inode);
>  > }
>  >
>  > int ovl_open(struct inode *inode, struct file *file)
>  > {
>  > ...
>  >        if (ovl_inode_is_open_for_write(file_inode(file)))
>  >                ovl_add_inode_to_suspect_list(inode);
>  >
>  >         file->private_data = realfile;
>  >
>  >         return 0;
>  > }
>  >
>  > int ovl_release(struct inode *inode, struct file *file)
>  > {
>  >        struct inode *inode = file_inode(file);
>  >
>  >        if (ovl_inode_is_open_for_write(inode)) {
>  >                ovl_maybe_mark_inode_dirty(inode);
>  >                ovl_remove_inode_from_suspect_list(inode);
>
> I think in some cases removing from suspect_list will cause losing
> the connection with upper inode that has writable mmap.
>

First of all I had a bug here.
Need to check for !ovl_inode_is_open_for_write(inode) after fput().

If the upper inode has a writable mmap, the upper inode would still
be "writable" (i_writecount held by the map realfile reference).

So when closing the last overlay file reference while upper inode
writable maps still exist, the remaining issue is when to remove
the overlay inode from the suspect list and allow its eviction and
I did not mention that.

I *think* that this condition should be safe in the regard that
after the condition is met, there is no way to dirty the upper inode
via overlayfs without going through ovl_open().
Obviously, the test should be done with some list lock held.

bool ovl_may_remove_from_suspect_list(struct inode *inode)
{
        struct inode *upper_inode = ovl_inode_upper(inode);

        if (upper_inode && upper_inode->i_state & I_DIRTY_ALL)
                return false;

        return !inode_is_open_for_write(upper_inode);
}

Now remains the question of WHEN to check for removal
from the suspect list.

The first place is in ovl_sync_fs() when iterating the suspect list,
inodes that meet the above criteria are "indefinitely clean" and
may be removed from the list at that timing.

For eviction during memory pressure, overlay can register a
shrinker to do this garbage collection. Is shrinker being called
on drop_caches? I'm not sure. But we can also do periodic garbage
collection.

In the end, it is not the common case and we just need the garbage
collection to mitigate DoS (or existing workload) that does many:
- open
- mmap(...PROT_WRITE, MAP_SHARED...)
- close
- munmap


>
>  >        }
>  >
>  >         fput(file->private_data);
>  >
>  >         return 0;
>  > }
>  >
>  > int ovl_drop_inode(struct inode *inode)
>  > {
>  >        struct inode *upper_inode = ovl_inode_upper(inode);
>  >
>  >        if (!upper_inode)
>  >                return 1;
>  >        if (upper_inode->i_state & I_DIRTY_ALL)
>  >                return 0;
>  >
>  >        return !inode_is_open_for_write(upper_inode);
>
> Is this condition just for writable mmap?
>

No, it's for all inodes that may be written from overlayfs (or
from maps created by overalyfs), but as you wrote, this
test is not needed in drop_inode().

>
>  > }
>  >
>  > static int ovl_sync_fs(struct super_block *sb, int wait)
>  > {
>  >         struct ovl_fs *ofs = sb->s_fs_info;
>  >         struct super_block *upper_sb;
>  >         int ret;
>  >
>  >         if (!ovl_upper_mnt(ofs))
>  >                 return 0;
>  >
>  >         /*
>  >          * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
>  >          * All the super blocks will be iterated, including upper_sb.
>  >          *
>  >          * If this is a syncfs(2) call, then we do need to call
>  >          * sync_filesystem() on upper_sb, but enough if we do it when being
>  >          * called with wait == 1.
>  >          */
>  >         if (!wait) {
>  >                 /* mark inodes on the suspect list dirty if thier
>  > upper inode is dirty */
>  >                 ovl_mark_suspect_list_inodes_dirty();
>  >                 return 0;
>  >         }
>  > ...
>  >
>
> Why 2 rounds?  it seems the simplest way is only syncing dirty upper inode
> on wait==1 just like my previous patch.
>

We don't have to do it in 2 rounds, but as long as we have a suspect list,
we can use it to start writeback on wait==0 on all dirty upper inodes from
our list just like the caller intended.

Do we need overlay sbi and mark_inode_dirty() and ovl_write_inode() at all?
I'm not sure. It feels like a good idea to use generic infrastructure as much as
possible. You should know better than me to answer this question.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-15  4:57       ` Al Viro
  2020-10-15 10:56         ` Chengguang Xu
@ 2020-10-16  7:09         ` Chengguang Xu
  2020-10-16  9:22           ` Jan Kara
  1 sibling, 1 reply; 23+ messages in thread
From: Chengguang Xu @ 2020-10-16  7:09 UTC (permalink / raw)
  To: Al Viro, amir73il, jack; +Cc: miklos, linux-unionfs, linux-fsdevel, cgxu519

 ---- 在 星期四, 2020-10-15 12:57:41 Al Viro <viro@zeniv.linux.org.uk> 撰写 ----
 > On Thu, Oct 15, 2020 at 11:42:51AM +0800, Chengguang Xu wrote:
 > >  ---- 在 星期四, 2020-10-15 11:25:01 Al Viro <viro@zeniv.linux.org.uk> 撰写 ----
 > >  > On Sat, Oct 10, 2020 at 10:23:51PM +0800, Chengguang Xu wrote:
 > >  > > Currently there is no notification api for kernel about modification
 > >  > > of vfs inode, in some use cases like overlayfs, this kind of notification
 > >  > > will be very helpful to implement containerized syncfs functionality.
 > >  > > As the first attempt, we introduce marking inode dirty notification so that
 > >  > > overlay's inode could mark itself dirty as well and then only sync dirty
 > >  > > overlay inode while syncfs.
 > >  > 
 > >  > Who's responsible for removing the crap from notifier chain?  And how does
 > >  > that affect the lifetime of inode?
 > >  
 > > In this case, overlayfs unregisters call back from the notifier chain of upper inode
 > > when evicting it's own  inode. It will not affect the lifetime of upper inode because
 > > overlayfs inode holds a reference of upper inode that means upper inode will not be
 > > evicted while overlayfs inode is still alive.
 > 
 > Let me see if I've got it right:
 >     * your chain contains 1 (for upper inodes) or 0 (everything else, i.e. the
 > vast majority of inodes) recepients
 >     * recepient pins the inode for as long as the recepient exists
 > 
 > That looks like a massive overkill, especially since all you are propagating is
 > dirtying the suckers.  All you really need is one bit in your inode + hash table
 > indexed by the address of struct inode (well, middle bits thereof, as usual).
 > With entries embedded into overlayfs-private part of overlayfs inode.  And callback
 > to be called stored in that entry...
 > 

Hi AI, Jack, Amir

Based on your feedback, I would to change the inode dirty notification
something like below, is it acceptable? 


diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 1492271..48473d9 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2249,6 +2249,14 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 
        trace_writeback_mark_inode_dirty(inode, flags);
 
+       if (inode->state & I_OVL_INUSE) {
+               struct inode *ovl_inode;
+
+               ovl_inode = ilookup5(NULL, (unsigned long)inode, ovl_inode_test, inode);
+               if (ovl_inode)
+                       __mark_inode_dirty(ovl_inode, flags);
+       }
+
        /*
         * Don't do this for I_DIRTY_PAGES - that doesn't actually
         * dirty the inode itself
diff --git a/fs/inode.c b/fs/inode.c
index 72c4c34..ed6c85e 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -820,7 +820,7 @@ static struct inode *find_inode(struct super_block *sb,
 
 repeat:
        hlist_for_each_entry(inode, head, i_hash) {
-               if (inode->i_sb != sb)
+               if (sb && inode->i_sb != sb)
                        continue;
                if (!test(inode, data))
                        continue;


Thanks,
Chengguang

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-16  4:39                   ` Amir Goldstein
@ 2020-10-16  7:43                     ` Chengguang Xu
  0 siblings, 0 replies; 23+ messages in thread
From: Chengguang Xu @ 2020-10-16  7:43 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro

 ---- 在 星期五, 2020-10-16 12:39:10 Amir Goldstein <amir73il@gmail.com> 撰写 ----
 > On Fri, Oct 16, 2020 at 4:56 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >
 > >  ---- 在 星期五, 2020-10-16 00:02:42 Amir Goldstein <amir73il@gmail.com> 撰写 ----
 > >  > >  > >  > When an inode is writably mapped via ovarlayfs, you can flag the
 > >  > >  > >  > overlay inode with "maybe-writably-mapped" and then remove
 > >  > >  > >  > it from the maybe dirty list when the underlying inode is not dirty
 > >  > >  > >  > AND its i_writecount is 0 (checked on write_inode() and release()).
 > >  > >  > >  >
 > >  > >  > >  > Actually, there is no reason to treat writably mapped inodes and
 > >  > >  > >  > other dirty inodes differently - insert to suspect list on open for
 > >  > >  > >  > write, remove from suspect list on last release() or write_inode()
 > >  > >  > >  > when inode is no longer dirty and writable.
 > >  > >
 > >  > > I have to say inserting to suspect list cannot prevent dropping,
 > >  > > thinking of the problem of previous approach that we write dirty upper
 > >  > > inode with current->flags & PF_MEMALLOC while evicting clean overlay inode.
 > >  > >
 > >  >
 > >  > Sorry, I don't understand what that means.
 > >
 > > This is the main problem of my previous patch set V10, evicting clean inode
 > > expects no write behavior but in the case of dirty upper inode we have to
 > > write out dirty data in this timing otherwise we will lose the connection with upper inode.
 > >
 > 
 > My thinking was that the suspect list holds a reference to the overlay inode.
 > The question is can we always safely get rid of that reference and remove
 > from the suspect list when the inode is no longer "writable". Let's see...
 > 
 > >
 > >  >
 > >  > >
 > >  > >  > >  >
 > >  > >  > >  > Did I miss anything?
 > >  > >  > >  >
 > >  > >  > >
 > >  > >  > > If we dirty overlay inode that means we have launched writeback mechanism,
 > >  > >  > > so in this case, re-dirty overlay inode in time becomes important.
 > >  > >  > >
 > >  > >  >
 > >  > >  > My idea was to use the first call to ovl_sync_fs() with 'wait' false
 > >  > >  > to iterate the
 > >  > >  > maybe-dirty list and re-dirty overlay inodes whose upper is dirty.
 > >  > >  >
 > >  > >
 > >  > > I'm curious how we prevent dropping of clean overlay inode with dirty upper?
 > >  > > Hold another reference during iput_final operation? in the drop_inode() or something
 > >  > > else?
 > >  >
 > >  > No, just return 0 from ovl_drop_inode() and iput_final() will not evict().
 > >
 > > It's not good,  it  only temporarily  skips eviction, the inode in lru list
 > > will be evicted in some cases like drop cache or memory reclaim. etc.
 > >
 > > A solution for this is getting another reference in ->drop_inode so that
 > > the inode can escape from adding to lru list but this looks awkward and tricky.
 > >
 > 
 > Right, that was nonsense. We need to rely on the reference held by the
 > suspect list.
 > 
 > >  >
 > >  > >
 > >  > >
 > >  > >  > Then in the second call to __sync_filesystem, sync_inodes_sb() will take
 > >  > >  > care of calling ovl_write_inode() for all the re-dirty inodes.
 > >  > >  >
 > >  > >  > In current code we sync ALL dirty upper fs inodes and we do not overlay
 > >  > >  > inodes with no reference in cache.
 > >  > >  >
 > >  > >  > The best code would sync only upper fs inodes dirtied by this overlay
 > >  > >  > and will be able to evict overlay inodes whose upper inodes are clean.
 > >  > >  >
 > >  > >  > The compromise code would sync only upper fs inodes dirtied by this overlay,
 > >  > >  > and would not evict overlay inodes as long as upper inodes are "open for write".
 > >  > >  > I think its a fine compromise considering the alternatives.
 > >  > >  >
 > >  > >  > Is this workable?
 > >  > >  >
 > >  > >
 > >  > > In your approach, the key point is how to prevent dropping overlay inode that has
 > >  > > dirty upper and no reference but I don't understand well how to achieve it from
 > >  > > your descriptions.
 > >  > >
 > >  > >
 > >  >
 > >  > Very well, I will try to explain with code:
 > >  >
 > >  > int ovl_inode_is_open_for_write(struct inode *inode)
 > >  > {
 > >  >        struct inode *upper_inode = ovl_inode_upper(inode);
 > >  >
 > >  >        return upper_inode && inode_is_open_for_write(upper_inode);
 > >  > }
 > >  >
 > >  > void ovl_maybe_mark_inode_dirty(struct inode *inode)
 > >  > {
 > >  >        struct inode *upper_inode = ovl_inode_upper(inode);
 > >  >
 > >  >        if (upper_inode && upper_inode->i_state & I_DIRTY_ALL)
 > >  >                 mark_inode_dirty(inode);
 > >  > }
 > >  >
 > >  > int ovl_open(struct inode *inode, struct file *file)
 > >  > {
 > >  > ...
 > >  >        if (ovl_inode_is_open_for_write(file_inode(file)))
 > >  >                ovl_add_inode_to_suspect_list(inode);
 > >  >
 > >  >         file->private_data = realfile;
 > >  >
 > >  >         return 0;
 > >  > }
 > >  >
 > >  > int ovl_release(struct inode *inode, struct file *file)
 > >  > {
 > >  >        struct inode *inode = file_inode(file);
 > >  >
 > >  >        if (ovl_inode_is_open_for_write(inode)) {
 > >  >                ovl_maybe_mark_inode_dirty(inode);
 > >  >                ovl_remove_inode_from_suspect_list(inode);
 > >
 > > I think in some cases removing from suspect_list will cause losing
 > > the connection with upper inode that has writable mmap.
 > >
 > 
 > First of all I had a bug here.
 > Need to check for !ovl_inode_is_open_for_write(inode) after fput().
 > 
 > If the upper inode has a writable mmap, the upper inode would still
 > be "writable" (i_writecount held by the map realfile reference).
 > 
 > So when closing the last overlay file reference while upper inode
 > writable maps still exist, the remaining issue is when to remove
 > the overlay inode from the suspect list and allow its eviction and
 > I did not mention that.
 > 
 > I *think* that this condition should be safe in the regard that
 > after the condition is met, there is no way to dirty the upper inode
 > via overlayfs without going through ovl_open().
 > Obviously, the test should be done with some list lock held.
 > 
 > bool ovl_may_remove_from_suspect_list(struct inode *inode)
 > {
 >         struct inode *upper_inode = ovl_inode_upper(inode);
 > 
 >         if (upper_inode && upper_inode->i_state & I_DIRTY_ALL)
 >                 return false;
 > 
 >         return !inode_is_open_for_write(upper_inode);
 > }
 > 
 > Now remains the question of WHEN to check for removal
 > from the suspect list.
 > 
 > The first place is in ovl_sync_fs() when iterating the suspect list,
 > inodes that meet the above criteria are "indefinitely clean" and
 > may be removed from the list at that timing.
 > 
 > For eviction during memory pressure, overlay can register a
 > shrinker to do this garbage collection. Is shrinker being called
 > on drop_caches? I'm not sure. But we can also do periodic garbage
 > collection.
 > 
 > In the end, it is not the common case and we just need the garbage
 > collection to mitigate DoS (or existing workload) that does many:
 > - open
 > - mmap(...PROT_WRITE, MAP_SHARED...)
 > - close
 > - munmap
 > 

That right, as you mentioned above releasing timing is important and
that also means in worst case, overlayfs hold too much memory resource
until umount. Maybe more proactive shrinker like dedicated workqueue
thread of overlayfs instead of global shrinker will be more helpful.



 > 
 > >
 > >  >        }
 > >  >
 > >  >         fput(file->private_data);
 > >  >
 > >  >         return 0;
 > >  > }
 > >  >
 > >  > int ovl_drop_inode(struct inode *inode)
 > >  > {
 > >  >        struct inode *upper_inode = ovl_inode_upper(inode);
 > >  >
 > >  >        if (!upper_inode)
 > >  >                return 1;
 > >  >        if (upper_inode->i_state & I_DIRTY_ALL)
 > >  >                return 0;
 > >  >
 > >  >        return !inode_is_open_for_write(upper_inode);
 > >
 > > Is this condition just for writable mmap?
 > >
 > 
 > No, it's for all inodes that may be written from overlayfs (or
 > from maps created by overalyfs), but as you wrote, this
 > test is not needed in drop_inode().
 > 
 > >
 > >  > }
 > >  >
 > >  > static int ovl_sync_fs(struct super_block *sb, int wait)
 > >  > {
 > >  >         struct ovl_fs *ofs = sb->s_fs_info;
 > >  >         struct super_block *upper_sb;
 > >  >         int ret;
 > >  >
 > >  >         if (!ovl_upper_mnt(ofs))
 > >  >                 return 0;
 > >  >
 > >  >         /*
 > >  >          * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
 > >  >          * All the super blocks will be iterated, including upper_sb.
 > >  >          *
 > >  >          * If this is a syncfs(2) call, then we do need to call
 > >  >          * sync_filesystem() on upper_sb, but enough if we do it when being
 > >  >          * called with wait == 1.
 > >  >          */
 > >  >         if (!wait) {
 > >  >                 /* mark inodes on the suspect list dirty if thier
 > >  > upper inode is dirty */
 > >  >                 ovl_mark_suspect_list_inodes_dirty();
 > >  >                 return 0;
 > >  >         }
 > >  > ...
 > >  >
 > >
 > > Why 2 rounds?  it seems the simplest way is only syncing dirty upper inode
 > > on wait==1 just like my previous patch.
 > >
 > 
 > We don't have to do it in 2 rounds, but as long as we have a suspect list,
 > we can use it to start writeback on wait==0 on all dirty upper inodes from
 > our list just like the caller intended.
 > 
 > Do we need overlay sbi and mark_inode_dirty() and ovl_write_inode() at all?
 > I'm not sure. It feels like a good idea to use generic infrastructure as much as
 > possible. You should know better than me to answer this question.
 > 

IMO, if we maintain a special list like suspect-list to organize target overlay inodes,
then we don't have to mark overlay inode dirty, marking overlay inode dirty will
bring unnecessary complexity in this case and I think there is no special benefit.

So this approach may achieve containerized syncfs for overlayfs and I've submitted
patch set V12 follow this thinking but I think the complexity is still big obstacle to
merge to upstream from feedback of Miklos. The new approach I posted the other day
is really simple to understand compare to the previous solution, if we can optimize
dirtiness  notification part a bit more then I think it will be more acceptable.

In the end, I need more feedback and  If you guys all think the previous solution  is the right way, 
then I can do more work based on our discussion above and post V13 later.


Thanks,
Chengguang








 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
  2020-10-16  7:09         ` Chengguang Xu
@ 2020-10-16  9:22           ` Jan Kara
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Kara @ 2020-10-16  9:22 UTC (permalink / raw)
  To: Chengguang Xu
  Cc: Al Viro, amir73il, jack, miklos, linux-unionfs, linux-fsdevel

On Fri 16-10-20 15:09:38, Chengguang Xu wrote:
>  ---- 在 星期四, 2020-10-15 12:57:41 Al Viro <viro@zeniv.linux.org.uk> 撰写 ----
>  > On Thu, Oct 15, 2020 at 11:42:51AM +0800, Chengguang Xu wrote:
>  > >  ---- 在 星期四, 2020-10-15 11:25:01 Al Viro <viro@zeniv.linux.org.uk> 撰写 ----
>  > >  > On Sat, Oct 10, 2020 at 10:23:51PM +0800, Chengguang Xu wrote:
>  > >  > > Currently there is no notification api for kernel about modification
>  > >  > > of vfs inode, in some use cases like overlayfs, this kind of notification
>  > >  > > will be very helpful to implement containerized syncfs functionality.
>  > >  > > As the first attempt, we introduce marking inode dirty notification so that
>  > >  > > overlay's inode could mark itself dirty as well and then only sync dirty
>  > >  > > overlay inode while syncfs.
>  > >  > 
>  > >  > Who's responsible for removing the crap from notifier chain?  And how does
>  > >  > that affect the lifetime of inode?
>  > >  
>  > > In this case, overlayfs unregisters call back from the notifier chain of upper inode
>  > > when evicting it's own  inode. It will not affect the lifetime of upper inode because
>  > > overlayfs inode holds a reference of upper inode that means upper inode will not be
>  > > evicted while overlayfs inode is still alive.
>  > 
>  > Let me see if I've got it right:
>  >     * your chain contains 1 (for upper inodes) or 0 (everything else, i.e. the
>  > vast majority of inodes) recepients
>  >     * recepient pins the inode for as long as the recepient exists
>  > 
>  > That looks like a massive overkill, especially since all you are propagating is
>  > dirtying the suckers.  All you really need is one bit in your inode + hash table
>  > indexed by the address of struct inode (well, middle bits thereof, as usual).
>  > With entries embedded into overlayfs-private part of overlayfs inode.  And callback
>  > to be called stored in that entry...
>  > 
> 
> Hi AI, Jack, Amir
> 
> Based on your feedback, I would to change the inode dirty notification
> something like below, is it acceptable? 
> 
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 1492271..48473d9 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -2249,6 +2249,14 @@ void __mark_inode_dirty(struct inode *inode, int flags)
>  
>         trace_writeback_mark_inode_dirty(inode, flags);
>  
> +       if (inode->state & I_OVL_INUSE) {
> +               struct inode *ovl_inode;
> +
> +               ovl_inode = ilookup5(NULL, (unsigned long)inode, ovl_inode_test, inode);

I don't think this will work - superblock pointer is part of the hash value
inode is hashed with so without proper sb pointer you won't find proper
hash chain.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2020-10-16  9:22 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-10 14:23 [RFC PATCH 0/5] implement containerized syncfs for overlayfs Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 1/5] fs: introduce notifier list for vfs inode Chengguang Xu
2020-10-14 16:15   ` Jan Kara
2020-10-15  3:03     ` Chengguang Xu
2020-10-15  6:11       ` Amir Goldstein
2020-10-15 11:29         ` Chengguang Xu
2020-10-15 12:32           ` Amir Goldstein
2020-10-15 13:13             ` Chengguang Xu
2020-10-15 16:02               ` Amir Goldstein
2020-10-15 16:06                 ` Amir Goldstein
2020-10-16  1:56                 ` Chengguang Xu
2020-10-16  4:39                   ` Amir Goldstein
2020-10-16  7:43                     ` Chengguang Xu
2020-10-15  3:25   ` Al Viro
2020-10-15  3:42     ` Chengguang Xu
2020-10-15  4:57       ` Al Viro
2020-10-15 10:56         ` Chengguang Xu
2020-10-16  7:09         ` Chengguang Xu
2020-10-16  9:22           ` Jan Kara
2020-10-10 14:23 ` [RFC PATCH 2/5] fs: export symbol of writeback_single_inode() Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 3/5] ovl: setup overlayfs' private bdi Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 4/5] ovl: monitor marking dirty activity of underlying upper inode Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs Chengguang Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).