* [RFC PATCH 0/5] implement containerized syncfs for overlayfs
@ 2020-10-10 14:23 Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 1/5] fs: introduce notifier list for vfs inode Chengguang Xu
` (4 more replies)
0 siblings, 5 replies; 30+ messages in thread
From: Chengguang Xu @ 2020-10-10 14:23 UTC (permalink / raw)
To: miklos, amir73il, jack; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu
Current syncfs(2) syscall on overlayfs just calls sync_filesystem()
on upper_sb to synchronize whole dirty inodes in upper filesystem
regardless of the overlay ownership of the inode. In the use case of
container, when multiple containers using the same underlying upper
filesystem, it has some shortcomings as below.
(1) Performance
Synchronization is probably heavy because it actually syncs unnecessary
inodes for target overlayfs.
(2) Interference
Unplanned synchronization will probably impact IO performance of
unrelated container processes on the other overlayfs.
This series try to implement containerized syncfs for overlayfs so that
only sync target dirty upper inodes which are belong to specific overlayfs
instance. By doing this, it is able to reduce cost of synchronization and
will not seriously impact IO performance of unrelated processes.
Chengguang Xu (5):
fs: introduce notifier list for vfs inode
fs: export symbol of writeback_single_inode()
ovl: setup overlayfs' private bdi
ovl: monitor marking dirty activity of underlying upper inode
ovl: impement containerized syncfs for overlayfs
fs/fs-writeback.c | 7 ++++-
fs/inode.c | 5 ++++
fs/overlayfs/inode.c | 28 +++++++++++++++++++-
fs/overlayfs/overlayfs.h | 2 ++
fs/overlayfs/ovl_entry.h | 2 ++
fs/overlayfs/super.c | 65 ++++++++++++++++++++++++++++++++++++++++++++---
fs/overlayfs/util.c | 33 ++++++++++++++++++++++++
include/linux/fs.h | 6 +++++
include/linux/writeback.h | 1 +
9 files changed, 143 insertions(+), 6 deletions(-)
--
1.8.3.1
^ permalink raw reply [flat|nested] 30+ messages in thread
* [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
2020-10-10 14:23 [RFC PATCH 0/5] implement containerized syncfs for overlayfs Chengguang Xu
@ 2020-10-10 14:23 ` Chengguang Xu
2020-10-14 16:15 ` Jan Kara
2020-10-15 3:25 ` Al Viro
2020-10-10 14:23 ` [RFC PATCH 2/5] fs: export symbol of writeback_single_inode() Chengguang Xu
` (3 subsequent siblings)
4 siblings, 2 replies; 30+ messages in thread
From: Chengguang Xu @ 2020-10-10 14:23 UTC (permalink / raw)
To: miklos, amir73il, jack; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu
Currently there is no notification api for kernel about modification
of vfs inode, in some use cases like overlayfs, this kind of notification
will be very helpful to implement containerized syncfs functionality.
As the first attempt, we introduce marking inode dirty notification so that
overlay's inode could mark itself dirty as well and then only sync dirty
overlay inode while syncfs.
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
fs/fs-writeback.c | 4 ++++
fs/inode.c | 5 +++++
include/linux/fs.h | 6 ++++++
3 files changed, 15 insertions(+)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 1492271..657cceb 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2246,9 +2246,13 @@ void __mark_inode_dirty(struct inode *inode, int flags)
{
struct super_block *sb = inode->i_sb;
int dirtytime;
+ int copy_flags = flags;
trace_writeback_mark_inode_dirty(inode, flags);
+ blocking_notifier_call_chain(
+ &inode->notifier_lists[MARK_INODE_DIRTY_NOTIFIER],
+ 0, ©_flags);
/*
* Don't do this for I_DIRTY_PAGES - that doesn't actually
* dirty the inode itself
diff --git a/fs/inode.c b/fs/inode.c
index 72c4c34..84e82db 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -388,12 +388,17 @@ void address_space_init_once(struct address_space *mapping)
*/
void inode_init_once(struct inode *inode)
{
+ int i;
+
memset(inode, 0, sizeof(*inode));
INIT_HLIST_NODE(&inode->i_hash);
INIT_LIST_HEAD(&inode->i_devices);
INIT_LIST_HEAD(&inode->i_io_list);
INIT_LIST_HEAD(&inode->i_wb_list);
INIT_LIST_HEAD(&inode->i_lru);
+ for (i = 0; i < INODE_MAX_NOTIFIER; i++)
+ BLOCKING_INIT_NOTIFIER_HEAD(
+ &inode->notifier_lists[MARK_INODE_DIRTY_NOTIFIER]);
__address_space_init_once(&inode->i_data);
i_size_ordered_init(inode);
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7519ae0..42f0750 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -607,6 +607,11 @@ static inline void mapping_allow_writable(struct address_space *mapping)
struct fsnotify_mark_connector;
+enum {
+ MARK_INODE_DIRTY_NOTIFIER,
+ INODE_MAX_NOTIFIER,
+};
+
/*
* Keep mostly read-only and often accessed (especially for
* the RCU path lookup and 'stat' data) fields at the beginning
@@ -723,6 +728,7 @@ struct inode {
#endif
void *i_private; /* fs or device private pointer */
+ struct blocking_notifier_head notifier_lists[INODE_MAX_NOTIFIER];
} __randomize_layout;
struct timespec64 timestamp_truncate(struct timespec64 t, struct inode *inode);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [RFC PATCH 2/5] fs: export symbol of writeback_single_inode()
2020-10-10 14:23 [RFC PATCH 0/5] implement containerized syncfs for overlayfs Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 1/5] fs: introduce notifier list for vfs inode Chengguang Xu
@ 2020-10-10 14:23 ` Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 3/5] ovl: setup overlayfs' private bdi Chengguang Xu
` (2 subsequent siblings)
4 siblings, 0 replies; 30+ messages in thread
From: Chengguang Xu @ 2020-10-10 14:23 UTC (permalink / raw)
To: miklos, amir73il, jack; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu
Export symbol of writeback_single_inode() in order to call it
from overlayfs' ->writepages when sync single inode.
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
fs/fs-writeback.c | 3 ++-
include/linux/writeback.h | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 657cceb..4fed058 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1529,7 +1529,7 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
* we go e.g. from filesystem. Flusher thread uses __writeback_single_inode()
* and does more profound writeback list handling in writeback_sb_inodes().
*/
-static int writeback_single_inode(struct inode *inode,
+int writeback_single_inode(struct inode *inode,
struct writeback_control *wbc)
{
struct bdi_writeback *wb;
@@ -1585,6 +1585,7 @@ static int writeback_single_inode(struct inode *inode,
spin_unlock(&inode->i_lock);
return ret;
}
+EXPORT_SYMBOL(writeback_single_inode);
static long writeback_chunk_size(struct bdi_writeback *wb,
struct wb_writeback_work *work)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 8e5c5bb..78e3b10 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -403,4 +403,5 @@ void tag_pages_for_writeback(struct address_space *mapping,
void sb_mark_inode_writeback(struct inode *inode);
void sb_clear_inode_writeback(struct inode *inode);
+int writeback_single_inode(struct inode *inode, struct writeback_control *wbc);
#endif /* WRITEBACK_H */
--
1.8.3.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [RFC PATCH 3/5] ovl: setup overlayfs' private bdi
2020-10-10 14:23 [RFC PATCH 0/5] implement containerized syncfs for overlayfs Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 1/5] fs: introduce notifier list for vfs inode Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 2/5] fs: export symbol of writeback_single_inode() Chengguang Xu
@ 2020-10-10 14:23 ` Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 4/5] ovl: monitor marking dirty activity of underlying upper inode Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs Chengguang Xu
4 siblings, 0 replies; 30+ messages in thread
From: Chengguang Xu @ 2020-10-10 14:23 UTC (permalink / raw)
To: miklos, amir73il, jack; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu
Setup private bdi to collect overlayfs' dirty inodes.
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
fs/overlayfs/super.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 290983b..dc22725 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1863,6 +1863,9 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
int err;
sb->s_d_op = &ovl_dentry_operations;
+ err = super_setup_bdi(sb);
+ if (err)
+ goto out;
err = -ENOMEM;
ofs = kzalloc(sizeof(struct ovl_fs), GFP_KERNEL);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [RFC PATCH 4/5] ovl: monitor marking dirty activity of underlying upper inode
2020-10-10 14:23 [RFC PATCH 0/5] implement containerized syncfs for overlayfs Chengguang Xu
` (2 preceding siblings ...)
2020-10-10 14:23 ` [RFC PATCH 3/5] ovl: setup overlayfs' private bdi Chengguang Xu
@ 2020-10-10 14:23 ` Chengguang Xu
2020-10-11 1:16 ` kernel test robot
2020-10-11 1:55 ` kernel test robot
2020-10-10 14:23 ` [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs Chengguang Xu
4 siblings, 2 replies; 30+ messages in thread
From: Chengguang Xu @ 2020-10-10 14:23 UTC (permalink / raw)
To: miklos, amir73il, jack; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu
Monitor marking dirty activity of underlying upper inode
so that we have chance to mark overlayfs' own inode dirty.
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
fs/overlayfs/inode.c | 4 +++-
fs/overlayfs/overlayfs.h | 2 ++
fs/overlayfs/ovl_entry.h | 2 ++
fs/overlayfs/super.c | 11 +++++++++++
fs/overlayfs/util.c | 34 ++++++++++++++++++++++++++++++++++
5 files changed, 52 insertions(+), 1 deletion(-)
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index b584dca..e75c7ec 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -632,8 +632,10 @@ void ovl_inode_init(struct inode *inode, struct ovl_inode_params *oip,
{
struct inode *realinode;
- if (oip->upperdentry)
+ if (oip->upperdentry) {
OVL_I(inode)->__upperdentry = oip->upperdentry;
+ ovl_register_mark_dirty_notify(OVL_I(inode));
+ }
if (oip->lowerpath && oip->lowerpath->dentry)
OVL_I(inode)->lower = igrab(d_inode(oip->lowerpath->dentry));
if (oip->lowerdata)
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 7bce246..04ef778 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -247,6 +247,8 @@ static inline bool ovl_open_flags_need_copy_up(int flags)
}
/* util.c */
+void ovl_register_mark_dirty_notify(struct ovl_inode *oi);
+void ovl_unregister_mark_dirty_notify(struct ovl_inode *oi);
int ovl_want_write(struct dentry *dentry);
void ovl_drop_write(struct dentry *dentry);
struct dentry *ovl_workdir(struct dentry *dentry);
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 1b5a2094..fce5314 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -129,6 +129,8 @@ struct ovl_inode {
/* synchronize copy up and more */
struct mutex lock;
+ /* moniter marking dirty behavior of upper inode */
+ struct notifier_block mark_dirty_nb;
};
static inline struct ovl_inode *OVL_I(struct inode *inode)
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index dc22725..6d8f9da 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -390,11 +390,22 @@ static int ovl_remount(struct super_block *sb, int *flags, char *data)
return ret;
}
+void ovl_evict_inode(struct inode *inode)
+{
+ struct inode *upper;
+
+ upper = ovl_inode_upper(inode);
+ if (upper)
+ ovl_unregister_mark_dirty_notify(OVL_I(inode));
+ clear_inode(inode);
+}
+
static const struct super_operations ovl_super_operations = {
.alloc_inode = ovl_alloc_inode,
.free_inode = ovl_free_inode,
.destroy_inode = ovl_destroy_inode,
.drop_inode = generic_delete_inode,
+ .evict_inode = ovl_evict_inode,
.put_super = ovl_put_super,
.sync_fs = ovl_sync_fs,
.statfs = ovl_statfs,
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 23f4756..bdcfe55 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -417,6 +417,7 @@ void ovl_inode_update(struct inode *inode, struct dentry *upperdentry)
inode->i_private = upperinode;
__insert_inode_hash(inode, (unsigned long) upperinode);
}
+ ovl_register_mark_dirty_notify(OVL_I(inode));
}
static void ovl_dentry_version_inc(struct dentry *dentry, bool impurity)
@@ -950,3 +951,36 @@ char *ovl_get_redirect_xattr(struct ovl_fs *ofs, struct dentry *dentry,
kfree(buf);
return ERR_PTR(res);
}
+
+int ovl_inode_dirty_notify(struct notifier_block *nb,
+ unsigned long dummy, void *param)
+{
+ struct ovl_inode *oi = container_of(nb, struct ovl_inode, mark_dirty_nb);
+ int *flags = (int *)param;
+
+ // add later
+ //__mark_inode_dirty(&oi->vfs_inode, *flags);
+ return NOTIFY_OK;
+}
+
+void ovl_register_mark_dirty_notify(struct ovl_inode *oi)
+{
+ struct inode *upper;
+
+ upper = oi->__upperdentry->d_inode;
+ oi->mark_dirty_nb.notifier_call = ovl_inode_dirty_notify;
+
+ blocking_notifier_chain_register(
+ &upper->notifier_lists[MARK_INODE_DIRTY_NOTIFIER],
+ &oi->mark_dirty_nb);
+}
+
+void ovl_unregister_mark_dirty_notify(struct ovl_inode *oi)
+{
+ struct inode *upper;
+
+ upper = oi->__upperdentry->d_inode;
+ blocking_notifier_chain_unregister(
+ &upper->notifier_lists[MARK_INODE_DIRTY_NOTIFIER],
+ &oi->mark_dirty_nb);
+}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs
2020-10-10 14:23 [RFC PATCH 0/5] implement containerized syncfs for overlayfs Chengguang Xu
` (3 preceding siblings ...)
2020-10-10 14:23 ` [RFC PATCH 4/5] ovl: monitor marking dirty activity of underlying upper inode Chengguang Xu
@ 2020-10-10 14:23 ` Chengguang Xu
2020-10-11 2:57 ` kernel test robot
` (3 more replies)
4 siblings, 4 replies; 30+ messages in thread
From: Chengguang Xu @ 2020-10-10 14:23 UTC (permalink / raw)
To: miklos, amir73il, jack; +Cc: linux-unionfs, linux-fsdevel, Chengguang Xu
Mark overlayfs' inode dirty when underlying upper inode becomes dirty,
then only sync overlayfs' dirty inodes while syncfs.
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
fs/overlayfs/inode.c | 24 ++++++++++++++++++++++++
fs/overlayfs/super.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++----
fs/overlayfs/util.c | 3 +--
3 files changed, 72 insertions(+), 6 deletions(-)
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index e75c7ec..c709aed 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -11,6 +11,7 @@
#include <linux/posix_acl.h>
#include <linux/ratelimit.h>
#include <linux/fiemap.h>
+#include <linux/writeback.h>
#include "overlayfs.h"
@@ -516,7 +517,30 @@ static int ovl_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
.update_time = ovl_update_time,
};
+static int ovl_writepages(struct address_space *mapping, struct writeback_control *wbc)
+{
+ struct inode *inode = mapping->host;
+ struct inode *upper_inode = ovl_inode_upper(inode);
+ struct ovl_fs *ofs = inode->i_sb->s_fs_info;
+ struct writeback_control upper_wbc = {
+ .nr_to_write = LONG_MAX,
+ .sync_mode = WB_SYNC_ALL,
+ .for_kupdate = wbc->for_kupdate,
+ .for_background = wbc->for_background,
+ .for_sync = 0,
+ .range_cyclic = wbc->range_cyclic,
+ .range_start = 0,
+ .range_end = LLONG_MAX,
+ };
+
+ if (!ovl_should_sync(ofs))
+ return 0;
+
+ return writeback_single_inode(upper_inode, &upper_wbc);
+}
+
static const struct address_space_operations ovl_aops = {
+ .writepages = ovl_writepages,
/* For O_DIRECT dentry_open() checks f_mapping->a_ops->direct_IO */
.direct_IO = noop_direct_IO,
};
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 6d8f9da..da1b65a 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -15,6 +15,9 @@
#include <linux/seq_file.h>
#include <linux/posix_acl_xattr.h>
#include <linux/exportfs.h>
+#include <linux/notifier.h>
+#include <linux/writeback.h>
+#include <linux/blkdev.h>
#include "overlayfs.h"
MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>");
@@ -279,9 +282,13 @@ static int ovl_sync_fs(struct super_block *sb, int wait)
upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
- down_read(&upper_sb->s_umount);
- ret = sync_filesystem(upper_sb);
- up_read(&upper_sb->s_umount);
+ if (upper_sb->s_op->sync_fs) {
+ down_read(&upper_sb->s_umount);
+ ret = upper_sb->s_op->sync_fs(upper_sb, wait);
+ if (!ret)
+ ret = sync_blockdev(upper_sb->s_bdev);
+ up_read(&upper_sb->s_umount);
+ }
return ret;
}
@@ -400,11 +407,47 @@ void ovl_evict_inode(struct inode *inode)
clear_inode(inode);
}
+int ovl_write_inode(struct inode *inode, struct writeback_control *wbc)
+{
+ struct ovl_fs *ofs = inode->i_sb->s_fs_info;
+ struct super_block *upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
+ struct inode *upper_inode = ovl_inode_upper(inode);
+ struct writeback_control upper_wbc = {
+ .nr_to_write = LONG_MAX,
+ .sync_mode = WB_SYNC_ALL,
+ .tagged_writepages = wbc->tagged_writepages,
+ .for_kupdate = wbc->for_kupdate,
+ .for_background = wbc->for_background,
+ .for_sync = 0,
+ .range_cyclic = wbc->range_cyclic,
+ .range_start = wbc->range_start,
+ .range_end = wbc->range_end,
+ };
+
+ if (!upper_sb->s_op->write_inode || !upper_inode)
+ return 0;
+ return upper_sb->s_op->write_inode(upper_inode, &upper_wbc);
+}
+
+int ovl_drop_inode(struct inode *inode)
+{
+ struct inode *upper_inode;
+
+ upper_inode = ovl_inode_upper(inode);
+ if (!upper_inode)
+ return 1;
+ if (!(upper_inode->i_state & I_DIRTY_ALL))
+ return 1;
+
+ return generic_drop_inode(inode);
+}
+
static const struct super_operations ovl_super_operations = {
.alloc_inode = ovl_alloc_inode,
.free_inode = ovl_free_inode,
.destroy_inode = ovl_destroy_inode,
- .drop_inode = generic_delete_inode,
+ .drop_inode = ovl_drop_inode,
+ .write_inode = ovl_write_inode,
.evict_inode = ovl_evict_inode,
.put_super = ovl_put_super,
.sync_fs = ovl_sync_fs,
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index bdcfe55..93d0493 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -958,8 +958,7 @@ int ovl_inode_dirty_notify(struct notifier_block *nb,
struct ovl_inode *oi = container_of(nb, struct ovl_inode, mark_dirty_nb);
int *flags = (int *)param;
- // add later
- //__mark_inode_dirty(&oi->vfs_inode, *flags);
+ __mark_inode_dirty(&oi->vfs_inode, *flags);
return NOTIFY_OK;
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 4/5] ovl: monitor marking dirty activity of underlying upper inode
2020-10-10 14:23 ` [RFC PATCH 4/5] ovl: monitor marking dirty activity of underlying upper inode Chengguang Xu
@ 2020-10-11 1:16 ` kernel test robot
2020-10-11 1:55 ` kernel test robot
1 sibling, 0 replies; 30+ messages in thread
From: kernel test robot @ 2020-10-11 1:16 UTC (permalink / raw)
To: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 2789 bytes --]
Hi Chengguang,
[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on miklos-vfs/overlayfs-next]
[also build test WARNING on linus/master v5.9-rc8 next-20201009]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Chengguang-Xu/implement-containerized-syncfs-for-overlayfs/20201011-071405
base: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next
config: xtensa-allyesconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/6a0553975f03bb63033f5c246273010a38a66ccf
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Chengguang-Xu/implement-containerized-syncfs-for-overlayfs/20201011-071405
git checkout 6a0553975f03bb63033f5c246273010a38a66ccf
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=xtensa
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
>> fs/overlayfs/super.c:393:6: warning: no previous prototype for 'ovl_evict_inode' [-Wmissing-prototypes]
393 | void ovl_evict_inode(struct inode *inode)
| ^~~~~~~~~~~~~~~
--
>> fs/overlayfs/util.c:955:5: warning: no previous prototype for 'ovl_inode_dirty_notify' [-Wmissing-prototypes]
955 | int ovl_inode_dirty_notify(struct notifier_block *nb,
| ^~~~~~~~~~~~~~~~~~~~~~
fs/overlayfs/util.c: In function 'ovl_inode_dirty_notify':
fs/overlayfs/util.c:959:7: warning: unused variable 'flags' [-Wunused-variable]
959 | int *flags = (int *)param;
| ^~~~~
fs/overlayfs/util.c:958:20: warning: unused variable 'oi' [-Wunused-variable]
958 | struct ovl_inode *oi = container_of(nb, struct ovl_inode, mark_dirty_nb);
| ^~
vim +/ovl_evict_inode +393 fs/overlayfs/super.c
392
> 393 void ovl_evict_inode(struct inode *inode)
394 {
395 struct inode *upper;
396
397 upper = ovl_inode_upper(inode);
398 if (upper)
399 ovl_unregister_mark_dirty_notify(OVL_I(inode));
400 clear_inode(inode);
401 }
402
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 65077 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 4/5] ovl: monitor marking dirty activity of underlying upper inode
2020-10-10 14:23 ` [RFC PATCH 4/5] ovl: monitor marking dirty activity of underlying upper inode Chengguang Xu
2020-10-11 1:16 ` kernel test robot
@ 2020-10-11 1:55 ` kernel test robot
1 sibling, 0 replies; 30+ messages in thread
From: kernel test robot @ 2020-10-11 1:55 UTC (permalink / raw)
To: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 4077 bytes --]
Hi Chengguang,
[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on miklos-vfs/overlayfs-next]
[also build test WARNING on linus/master v5.9-rc8 next-20201009]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Chengguang-Xu/implement-containerized-syncfs-for-overlayfs/20201011-071405
base: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next
config: arm-randconfig-r034-20201011 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project a8682554c6662ce01853d0069afb6c5b4ef8ab55)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm cross compiling tool for clang build
# apt-get install binutils-arm-linux-gnueabi
# https://github.com/0day-ci/linux/commit/6a0553975f03bb63033f5c246273010a38a66ccf
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Chengguang-Xu/implement-containerized-syncfs-for-overlayfs/20201011-071405
git checkout 6a0553975f03bb63033f5c246273010a38a66ccf
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
>> fs/overlayfs/super.c:393:6: warning: no previous prototype for function 'ovl_evict_inode' [-Wmissing-prototypes]
void ovl_evict_inode(struct inode *inode)
^
fs/overlayfs/super.c:393:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
void ovl_evict_inode(struct inode *inode)
^
static
1 warning generated.
--
fs/overlayfs/util.c:959:7: warning: unused variable 'flags' [-Wunused-variable]
int *flags = (int *)param;
^
fs/overlayfs/util.c:958:20: warning: unused variable 'oi' [-Wunused-variable]
struct ovl_inode *oi = container_of(nb, struct ovl_inode, mark_dirty_nb);
^
>> fs/overlayfs/util.c:955:5: warning: no previous prototype for function 'ovl_inode_dirty_notify' [-Wmissing-prototypes]
int ovl_inode_dirty_notify(struct notifier_block *nb,
^
fs/overlayfs/util.c:955:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int ovl_inode_dirty_notify(struct notifier_block *nb,
^
static
3 warnings generated.
--
fs/overlayfs/util.c:958:20: warning: unused variable 'oi' [-Wunused-variable]
struct ovl_inode *oi = container_of(nb, struct ovl_inode, mark_dirty_nb);
^
fs/overlayfs/util.c:959:7: warning: unused variable 'flags' [-Wunused-variable]
int *flags = (int *)param;
^
>> fs/overlayfs/util.c:955:5: warning: no previous prototype for function 'ovl_inode_dirty_notify' [-Wmissing-prototypes]
int ovl_inode_dirty_notify(struct notifier_block *nb,
^
fs/overlayfs/util.c:955:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int ovl_inode_dirty_notify(struct notifier_block *nb,
^
static
3 warnings generated.
vim +/ovl_evict_inode +393 fs/overlayfs/super.c
392
> 393 void ovl_evict_inode(struct inode *inode)
394 {
395 struct inode *upper;
396
397 upper = ovl_inode_upper(inode);
398 if (upper)
399 ovl_unregister_mark_dirty_notify(OVL_I(inode));
400 clear_inode(inode);
401 }
402
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 33109 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs
2020-10-10 14:23 ` [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs Chengguang Xu
@ 2020-10-11 2:57 ` kernel test robot
2020-10-11 3:10 ` kernel test robot
` (2 subsequent siblings)
3 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2020-10-11 2:57 UTC (permalink / raw)
To: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 9960 bytes --]
Hi Chengguang,
[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on miklos-vfs/overlayfs-next]
[also build test WARNING on linus/master v5.9-rc8 next-20201009]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Chengguang-Xu/implement-containerized-syncfs-for-overlayfs/20201011-071405
base: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next
config: arm-randconfig-r034-20201011 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project a8682554c6662ce01853d0069afb6c5b4ef8ab55)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm cross compiling tool for clang build
# apt-get install binutils-arm-linux-gnueabi
# https://github.com/0day-ci/linux/commit/12d938a37e0a31d43b15e07aef1cd821bf11dc0e
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Chengguang-Xu/implement-containerized-syncfs-for-overlayfs/20201011-071405
git checkout 12d938a37e0a31d43b15e07aef1cd821bf11dc0e
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
>> fs/overlayfs/super.c:285:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
if (upper_sb->s_op->sync_fs) {
^~~~~~~~~~~~~~~~~~~~~~~
fs/overlayfs/super.c:293:9: note: uninitialized use occurs here
return ret;
^~~
fs/overlayfs/super.c:285:2: note: remove the 'if' if its condition is always true
if (upper_sb->s_op->sync_fs) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/overlayfs/super.c:265:9: note: initialize the variable 'ret' to silence this warning
int ret;
^
= 0
fs/overlayfs/super.c:400:6: warning: no previous prototype for function 'ovl_evict_inode' [-Wmissing-prototypes]
void ovl_evict_inode(struct inode *inode)
^
fs/overlayfs/super.c:400:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
void ovl_evict_inode(struct inode *inode)
^
static
>> fs/overlayfs/super.c:410:5: warning: no previous prototype for function 'ovl_write_inode' [-Wmissing-prototypes]
int ovl_write_inode(struct inode *inode, struct writeback_control *wbc)
^
fs/overlayfs/super.c:410:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int ovl_write_inode(struct inode *inode, struct writeback_control *wbc)
^
static
>> fs/overlayfs/super.c:432:5: warning: no previous prototype for function 'ovl_drop_inode' [-Wmissing-prototypes]
int ovl_drop_inode(struct inode *inode)
^
fs/overlayfs/super.c:432:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int ovl_drop_inode(struct inode *inode)
^
static
4 warnings generated.
vim +285 fs/overlayfs/super.c
259
260 /* Sync real dirty inodes in upper filesystem (if it exists) */
261 static int ovl_sync_fs(struct super_block *sb, int wait)
262 {
263 struct ovl_fs *ofs = sb->s_fs_info;
264 struct super_block *upper_sb;
265 int ret;
266
267 if (!ovl_upper_mnt(ofs))
268 return 0;
269
270 if (!ovl_should_sync(ofs))
271 return 0;
272 /*
273 * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
274 * All the super blocks will be iterated, including upper_sb.
275 *
276 * If this is a syncfs(2) call, then we do need to call
277 * sync_filesystem() on upper_sb, but enough if we do it when being
278 * called with wait == 1.
279 */
280 if (!wait)
281 return 0;
282
283 upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
284
> 285 if (upper_sb->s_op->sync_fs) {
286 down_read(&upper_sb->s_umount);
287 ret = upper_sb->s_op->sync_fs(upper_sb, wait);
288 if (!ret)
289 ret = sync_blockdev(upper_sb->s_bdev);
290 up_read(&upper_sb->s_umount);
291 }
292
293 return ret;
294 }
295
296 /**
297 * ovl_statfs
298 * @sb: The overlayfs super block
299 * @buf: The struct kstatfs to fill in with stats
300 *
301 * Get the filesystem statistics. As writes always target the upper layer
302 * filesystem pass the statfs to the upper filesystem (if it exists)
303 */
304 static int ovl_statfs(struct dentry *dentry, struct kstatfs *buf)
305 {
306 struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
307 struct dentry *root_dentry = dentry->d_sb->s_root;
308 struct path path;
309 int err;
310
311 ovl_path_real(root_dentry, &path);
312
313 err = vfs_statfs(&path, buf);
314 if (!err) {
315 buf->f_namelen = ofs->namelen;
316 buf->f_type = OVERLAYFS_SUPER_MAGIC;
317 }
318
319 return err;
320 }
321
322 /* Will this overlay be forced to mount/remount ro? */
323 static bool ovl_force_readonly(struct ovl_fs *ofs)
324 {
325 return (!ovl_upper_mnt(ofs) || !ofs->workdir);
326 }
327
328 static const char *ovl_redirect_mode_def(void)
329 {
330 return ovl_redirect_dir_def ? "on" : "off";
331 }
332
333 static const char * const ovl_xino_str[] = {
334 "off",
335 "auto",
336 "on",
337 };
338
339 static inline int ovl_xino_def(void)
340 {
341 return ovl_xino_auto_def ? OVL_XINO_AUTO : OVL_XINO_OFF;
342 }
343
344 /**
345 * ovl_show_options
346 *
347 * Prints the mount options for a given superblock.
348 * Returns zero; does not fail.
349 */
350 static int ovl_show_options(struct seq_file *m, struct dentry *dentry)
351 {
352 struct super_block *sb = dentry->d_sb;
353 struct ovl_fs *ofs = sb->s_fs_info;
354
355 seq_show_option(m, "lowerdir", ofs->config.lowerdir);
356 if (ofs->config.upperdir) {
357 seq_show_option(m, "upperdir", ofs->config.upperdir);
358 seq_show_option(m, "workdir", ofs->config.workdir);
359 }
360 if (ofs->config.default_permissions)
361 seq_puts(m, ",default_permissions");
362 if (strcmp(ofs->config.redirect_mode, ovl_redirect_mode_def()) != 0)
363 seq_printf(m, ",redirect_dir=%s", ofs->config.redirect_mode);
364 if (ofs->config.index != ovl_index_def)
365 seq_printf(m, ",index=%s", ofs->config.index ? "on" : "off");
366 if (ofs->config.nfs_export != ovl_nfs_export_def)
367 seq_printf(m, ",nfs_export=%s", ofs->config.nfs_export ?
368 "on" : "off");
369 if (ofs->config.xino != ovl_xino_def() && !ovl_same_fs(sb))
370 seq_printf(m, ",xino=%s", ovl_xino_str[ofs->config.xino]);
371 if (ofs->config.metacopy != ovl_metacopy_def)
372 seq_printf(m, ",metacopy=%s",
373 ofs->config.metacopy ? "on" : "off");
374 if (ofs->config.ovl_volatile)
375 seq_puts(m, ",volatile");
376 return 0;
377 }
378
379 static int ovl_remount(struct super_block *sb, int *flags, char *data)
380 {
381 struct ovl_fs *ofs = sb->s_fs_info;
382 struct super_block *upper_sb;
383 int ret = 0;
384
385 if (!(*flags & SB_RDONLY) && ovl_force_readonly(ofs))
386 return -EROFS;
387
388 if (*flags & SB_RDONLY && !sb_rdonly(sb)) {
389 upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
390 if (ovl_should_sync(ofs)) {
391 down_read(&upper_sb->s_umount);
392 ret = sync_filesystem(upper_sb);
393 up_read(&upper_sb->s_umount);
394 }
395 }
396
397 return ret;
398 }
399
400 void ovl_evict_inode(struct inode *inode)
401 {
402 struct inode *upper;
403
404 upper = ovl_inode_upper(inode);
405 if (upper)
406 ovl_unregister_mark_dirty_notify(OVL_I(inode));
407 clear_inode(inode);
408 }
409
> 410 int ovl_write_inode(struct inode *inode, struct writeback_control *wbc)
411 {
412 struct ovl_fs *ofs = inode->i_sb->s_fs_info;
413 struct super_block *upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
414 struct inode *upper_inode = ovl_inode_upper(inode);
415 struct writeback_control upper_wbc = {
416 .nr_to_write = LONG_MAX,
417 .sync_mode = WB_SYNC_ALL,
418 .tagged_writepages = wbc->tagged_writepages,
419 .for_kupdate = wbc->for_kupdate,
420 .for_background = wbc->for_background,
421 .for_sync = 0,
422 .range_cyclic = wbc->range_cyclic,
423 .range_start = wbc->range_start,
424 .range_end = wbc->range_end,
425 };
426
427 if (!upper_sb->s_op->write_inode || !upper_inode)
428 return 0;
429 return upper_sb->s_op->write_inode(upper_inode, &upper_wbc);
430 }
431
> 432 int ovl_drop_inode(struct inode *inode)
433 {
434 struct inode *upper_inode;
435
436 upper_inode = ovl_inode_upper(inode);
437 if (!upper_inode)
438 return 1;
439 if (!(upper_inode->i_state & I_DIRTY_ALL))
440 return 1;
441
442 return generic_drop_inode(inode);
443 }
444
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 33109 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs
2020-10-10 14:23 ` [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs Chengguang Xu
2020-10-11 2:57 ` kernel test robot
@ 2020-10-11 3:10 ` kernel test robot
2020-10-11 13:08 ` kernel test robot
2020-10-12 12:31 ` Dan Carpenter
3 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2020-10-11 3:10 UTC (permalink / raw)
To: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 3806 bytes --]
Hi Chengguang,
[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on miklos-vfs/overlayfs-next]
[also build test WARNING on linus/master v5.9-rc8 next-20201009]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Chengguang-Xu/implement-containerized-syncfs-for-overlayfs/20201011-071405
base: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next
config: xtensa-allyesconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/12d938a37e0a31d43b15e07aef1cd821bf11dc0e
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Chengguang-Xu/implement-containerized-syncfs-for-overlayfs/20201011-071405
git checkout 12d938a37e0a31d43b15e07aef1cd821bf11dc0e
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=xtensa
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
fs/overlayfs/super.c:400:6: warning: no previous prototype for 'ovl_evict_inode' [-Wmissing-prototypes]
400 | void ovl_evict_inode(struct inode *inode)
| ^~~~~~~~~~~~~~~
>> fs/overlayfs/super.c:410:5: warning: no previous prototype for 'ovl_write_inode' [-Wmissing-prototypes]
410 | int ovl_write_inode(struct inode *inode, struct writeback_control *wbc)
| ^~~~~~~~~~~~~~~
>> fs/overlayfs/super.c:432:5: warning: no previous prototype for 'ovl_drop_inode' [-Wmissing-prototypes]
432 | int ovl_drop_inode(struct inode *inode)
| ^~~~~~~~~~~~~~
vim +/ovl_write_inode +410 fs/overlayfs/super.c
399
> 400 void ovl_evict_inode(struct inode *inode)
401 {
402 struct inode *upper;
403
404 upper = ovl_inode_upper(inode);
405 if (upper)
406 ovl_unregister_mark_dirty_notify(OVL_I(inode));
407 clear_inode(inode);
408 }
409
> 410 int ovl_write_inode(struct inode *inode, struct writeback_control *wbc)
411 {
412 struct ovl_fs *ofs = inode->i_sb->s_fs_info;
413 struct super_block *upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
414 struct inode *upper_inode = ovl_inode_upper(inode);
415 struct writeback_control upper_wbc = {
416 .nr_to_write = LONG_MAX,
417 .sync_mode = WB_SYNC_ALL,
418 .tagged_writepages = wbc->tagged_writepages,
419 .for_kupdate = wbc->for_kupdate,
420 .for_background = wbc->for_background,
421 .for_sync = 0,
422 .range_cyclic = wbc->range_cyclic,
423 .range_start = wbc->range_start,
424 .range_end = wbc->range_end,
425 };
426
427 if (!upper_sb->s_op->write_inode || !upper_inode)
428 return 0;
429 return upper_sb->s_op->write_inode(upper_inode, &upper_wbc);
430 }
431
> 432 int ovl_drop_inode(struct inode *inode)
433 {
434 struct inode *upper_inode;
435
436 upper_inode = ovl_inode_upper(inode);
437 if (!upper_inode)
438 return 1;
439 if (!(upper_inode->i_state & I_DIRTY_ALL))
440 return 1;
441
442 return generic_drop_inode(inode);
443 }
444
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 65077 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs
2020-10-10 14:23 ` [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs Chengguang Xu
2020-10-11 2:57 ` kernel test robot
2020-10-11 3:10 ` kernel test robot
@ 2020-10-11 13:08 ` kernel test robot
2020-10-12 12:31 ` Dan Carpenter
3 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2020-10-11 13:08 UTC (permalink / raw)
To: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 4001 bytes --]
Hi Chengguang,
[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on miklos-vfs/overlayfs-next]
[also build test WARNING on next-20201009]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Chengguang-Xu/implement-containerized-syncfs-for-overlayfs/20201011-071405
base: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
"cppcheck warnings: (new ones prefixed by >>)"
^
>> fs/overlayfs/super.c:293:9: warning: Uninitialized variable: ret [uninitvar]
return ret;
^
vim +293 fs/overlayfs/super.c
a9075cdb467dd3b Miklos Szeredi 2017-11-10 259
e8d4bfe3a715372 Chengguang Xu 2017-11-29 260 /* Sync real dirty inodes in upper filesystem (if it exists) */
e593b2bf513dd4d Amir Goldstein 2017-01-23 261 static int ovl_sync_fs(struct super_block *sb, int wait)
e593b2bf513dd4d Amir Goldstein 2017-01-23 262 {
ad204488d3046b3 Miklos Szeredi 2017-11-10 263 struct ovl_fs *ofs = sb->s_fs_info;
e593b2bf513dd4d Amir Goldstein 2017-01-23 264 struct super_block *upper_sb;
e593b2bf513dd4d Amir Goldstein 2017-01-23 265 int ret;
e593b2bf513dd4d Amir Goldstein 2017-01-23 266
08f4c7c86d4cf12 Miklos Szeredi 2020-06-04 267 if (!ovl_upper_mnt(ofs))
e593b2bf513dd4d Amir Goldstein 2017-01-23 268 return 0;
e8d4bfe3a715372 Chengguang Xu 2017-11-29 269
c86243b090bc25f Vivek Goyal 2020-08-31 270 if (!ovl_should_sync(ofs))
c86243b090bc25f Vivek Goyal 2020-08-31 271 return 0;
e8d4bfe3a715372 Chengguang Xu 2017-11-29 272 /*
32b1924b210a70d Konstantin Khlebnikov 2020-04-09 273 * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
32b1924b210a70d Konstantin Khlebnikov 2020-04-09 274 * All the super blocks will be iterated, including upper_sb.
e8d4bfe3a715372 Chengguang Xu 2017-11-29 275 *
e8d4bfe3a715372 Chengguang Xu 2017-11-29 276 * If this is a syncfs(2) call, then we do need to call
e8d4bfe3a715372 Chengguang Xu 2017-11-29 277 * sync_filesystem() on upper_sb, but enough if we do it when being
e8d4bfe3a715372 Chengguang Xu 2017-11-29 278 * called with wait == 1.
e8d4bfe3a715372 Chengguang Xu 2017-11-29 279 */
e8d4bfe3a715372 Chengguang Xu 2017-11-29 280 if (!wait)
e593b2bf513dd4d Amir Goldstein 2017-01-23 281 return 0;
e593b2bf513dd4d Amir Goldstein 2017-01-23 282
08f4c7c86d4cf12 Miklos Szeredi 2020-06-04 283 upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
e8d4bfe3a715372 Chengguang Xu 2017-11-29 284
12d938a37e0a31d Chengguang Xu 2020-10-10 285 if (upper_sb->s_op->sync_fs) {
e593b2bf513dd4d Amir Goldstein 2017-01-23 286 down_read(&upper_sb->s_umount);
12d938a37e0a31d Chengguang Xu 2020-10-10 287 ret = upper_sb->s_op->sync_fs(upper_sb, wait);
12d938a37e0a31d Chengguang Xu 2020-10-10 288 if (!ret)
12d938a37e0a31d Chengguang Xu 2020-10-10 289 ret = sync_blockdev(upper_sb->s_bdev);
e593b2bf513dd4d Amir Goldstein 2017-01-23 290 up_read(&upper_sb->s_umount);
12d938a37e0a31d Chengguang Xu 2020-10-10 291 }
e8d4bfe3a715372 Chengguang Xu 2017-11-29 292
e593b2bf513dd4d Amir Goldstein 2017-01-23 @293 return ret;
e593b2bf513dd4d Amir Goldstein 2017-01-23 294 }
e593b2bf513dd4d Amir Goldstein 2017-01-23 295
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs
2020-10-10 14:23 ` [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs Chengguang Xu
@ 2020-10-12 12:31 ` Dan Carpenter
2020-10-11 3:10 ` kernel test robot
` (2 subsequent siblings)
3 siblings, 0 replies; 30+ messages in thread
From: Dan Carpenter @ 2020-10-12 12:31 UTC (permalink / raw)
To: kbuild
[-- Attachment #1: Type: text/plain, Size: 3506 bytes --]
Hi Chengguang,
url: https://github.com/0day-ci/linux/commits/Chengguang-Xu/implement-containerized-syncfs-for-overlayfs/20201011-071405
base: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next
config: i386-randconfig-m021-20201011 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
smatch warnings:
fs/overlayfs/super.c:293 ovl_sync_fs() error: uninitialized symbol 'ret'.
vim +/ret +293 fs/overlayfs/super.c
e593b2bf513dd4d Amir Goldstein 2017-01-23 261 static int ovl_sync_fs(struct super_block *sb, int wait)
e593b2bf513dd4d Amir Goldstein 2017-01-23 262 {
ad204488d3046b3 Miklos Szeredi 2017-11-10 263 struct ovl_fs *ofs = sb->s_fs_info;
e593b2bf513dd4d Amir Goldstein 2017-01-23 264 struct super_block *upper_sb;
e593b2bf513dd4d Amir Goldstein 2017-01-23 265 int ret;
e593b2bf513dd4d Amir Goldstein 2017-01-23 266
08f4c7c86d4cf12 Miklos Szeredi 2020-06-04 267 if (!ovl_upper_mnt(ofs))
e593b2bf513dd4d Amir Goldstein 2017-01-23 268 return 0;
e8d4bfe3a715372 Chengguang Xu 2017-11-29 269
c86243b090bc25f Vivek Goyal 2020-08-31 270 if (!ovl_should_sync(ofs))
c86243b090bc25f Vivek Goyal 2020-08-31 271 return 0;
e8d4bfe3a715372 Chengguang Xu 2017-11-29 272 /*
32b1924b210a70d Konstantin Khlebnikov 2020-04-09 273 * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
32b1924b210a70d Konstantin Khlebnikov 2020-04-09 274 * All the super blocks will be iterated, including upper_sb.
e8d4bfe3a715372 Chengguang Xu 2017-11-29 275 *
e8d4bfe3a715372 Chengguang Xu 2017-11-29 276 * If this is a syncfs(2) call, then we do need to call
e8d4bfe3a715372 Chengguang Xu 2017-11-29 277 * sync_filesystem() on upper_sb, but enough if we do it when being
e8d4bfe3a715372 Chengguang Xu 2017-11-29 278 * called with wait == 1.
e8d4bfe3a715372 Chengguang Xu 2017-11-29 279 */
e8d4bfe3a715372 Chengguang Xu 2017-11-29 280 if (!wait)
e593b2bf513dd4d Amir Goldstein 2017-01-23 281 return 0;
e593b2bf513dd4d Amir Goldstein 2017-01-23 282
08f4c7c86d4cf12 Miklos Szeredi 2020-06-04 283 upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
e8d4bfe3a715372 Chengguang Xu 2017-11-29 284
12d938a37e0a31d Chengguang Xu 2020-10-10 285 if (upper_sb->s_op->sync_fs) {
e593b2bf513dd4d Amir Goldstein 2017-01-23 286 down_read(&upper_sb->s_umount);
12d938a37e0a31d Chengguang Xu 2020-10-10 287 ret = upper_sb->s_op->sync_fs(upper_sb, wait);
12d938a37e0a31d Chengguang Xu 2020-10-10 288 if (!ret)
12d938a37e0a31d Chengguang Xu 2020-10-10 289 ret = sync_blockdev(upper_sb->s_bdev);
e593b2bf513dd4d Amir Goldstein 2017-01-23 290 up_read(&upper_sb->s_umount);
12d938a37e0a31d Chengguang Xu 2020-10-10 291 }
"ret" not initialized on else path.
e8d4bfe3a715372 Chengguang Xu 2017-11-29 292
e593b2bf513dd4d Amir Goldstein 2017-01-23 @293 return ret;
e593b2bf513dd4d Amir Goldstein 2017-01-23 294 }
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 33739 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs
@ 2020-10-12 12:31 ` Dan Carpenter
0 siblings, 0 replies; 30+ messages in thread
From: Dan Carpenter @ 2020-10-12 12:31 UTC (permalink / raw)
To: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 3506 bytes --]
Hi Chengguang,
url: https://github.com/0day-ci/linux/commits/Chengguang-Xu/implement-containerized-syncfs-for-overlayfs/20201011-071405
base: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next
config: i386-randconfig-m021-20201011 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
smatch warnings:
fs/overlayfs/super.c:293 ovl_sync_fs() error: uninitialized symbol 'ret'.
vim +/ret +293 fs/overlayfs/super.c
e593b2bf513dd4d Amir Goldstein 2017-01-23 261 static int ovl_sync_fs(struct super_block *sb, int wait)
e593b2bf513dd4d Amir Goldstein 2017-01-23 262 {
ad204488d3046b3 Miklos Szeredi 2017-11-10 263 struct ovl_fs *ofs = sb->s_fs_info;
e593b2bf513dd4d Amir Goldstein 2017-01-23 264 struct super_block *upper_sb;
e593b2bf513dd4d Amir Goldstein 2017-01-23 265 int ret;
e593b2bf513dd4d Amir Goldstein 2017-01-23 266
08f4c7c86d4cf12 Miklos Szeredi 2020-06-04 267 if (!ovl_upper_mnt(ofs))
e593b2bf513dd4d Amir Goldstein 2017-01-23 268 return 0;
e8d4bfe3a715372 Chengguang Xu 2017-11-29 269
c86243b090bc25f Vivek Goyal 2020-08-31 270 if (!ovl_should_sync(ofs))
c86243b090bc25f Vivek Goyal 2020-08-31 271 return 0;
e8d4bfe3a715372 Chengguang Xu 2017-11-29 272 /*
32b1924b210a70d Konstantin Khlebnikov 2020-04-09 273 * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
32b1924b210a70d Konstantin Khlebnikov 2020-04-09 274 * All the super blocks will be iterated, including upper_sb.
e8d4bfe3a715372 Chengguang Xu 2017-11-29 275 *
e8d4bfe3a715372 Chengguang Xu 2017-11-29 276 * If this is a syncfs(2) call, then we do need to call
e8d4bfe3a715372 Chengguang Xu 2017-11-29 277 * sync_filesystem() on upper_sb, but enough if we do it when being
e8d4bfe3a715372 Chengguang Xu 2017-11-29 278 * called with wait == 1.
e8d4bfe3a715372 Chengguang Xu 2017-11-29 279 */
e8d4bfe3a715372 Chengguang Xu 2017-11-29 280 if (!wait)
e593b2bf513dd4d Amir Goldstein 2017-01-23 281 return 0;
e593b2bf513dd4d Amir Goldstein 2017-01-23 282
08f4c7c86d4cf12 Miklos Szeredi 2020-06-04 283 upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
e8d4bfe3a715372 Chengguang Xu 2017-11-29 284
12d938a37e0a31d Chengguang Xu 2020-10-10 285 if (upper_sb->s_op->sync_fs) {
e593b2bf513dd4d Amir Goldstein 2017-01-23 286 down_read(&upper_sb->s_umount);
12d938a37e0a31d Chengguang Xu 2020-10-10 287 ret = upper_sb->s_op->sync_fs(upper_sb, wait);
12d938a37e0a31d Chengguang Xu 2020-10-10 288 if (!ret)
12d938a37e0a31d Chengguang Xu 2020-10-10 289 ret = sync_blockdev(upper_sb->s_bdev);
e593b2bf513dd4d Amir Goldstein 2017-01-23 290 up_read(&upper_sb->s_umount);
12d938a37e0a31d Chengguang Xu 2020-10-10 291 }
"ret" not initialized on else path.
e8d4bfe3a715372 Chengguang Xu 2017-11-29 292
e593b2bf513dd4d Amir Goldstein 2017-01-23 @293 return ret;
e593b2bf513dd4d Amir Goldstein 2017-01-23 294 }
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 33739 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
2020-10-10 14:23 ` [RFC PATCH 1/5] fs: introduce notifier list for vfs inode Chengguang Xu
@ 2020-10-14 16:15 ` Jan Kara
2020-10-15 3:03 ` Chengguang Xu
2020-10-15 3:25 ` Al Viro
1 sibling, 1 reply; 30+ messages in thread
From: Jan Kara @ 2020-10-14 16:15 UTC (permalink / raw)
To: Chengguang Xu; +Cc: miklos, amir73il, jack, linux-unionfs, linux-fsdevel
On Sat 10-10-20 22:23:51, Chengguang Xu wrote:
> Currently there is no notification api for kernel about modification
> of vfs inode, in some use cases like overlayfs, this kind of notification
> will be very helpful to implement containerized syncfs functionality.
> As the first attempt, we introduce marking inode dirty notification so that
> overlay's inode could mark itself dirty as well and then only sync dirty
> overlay inode while syncfs.
>
> Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
So I like how the patch set is elegant however growing struct inode for
everybody by struct blocking_notifier_head (which is rwsem + pointer) is
rather harsh just for this overlayfs functionality... Ideally this should
induce no overhead on struct inode if the filesystem is not covered by
overlayfs. So I'd rather place some external structure into the superblock
that would get allocated on the first use that would track dirty notifications
for each inode. I would not generalize the code for more possible
notifications at this point.
Also now that I'm thinking about it can there be multiple overlayfs inodes
for one upper inode? If not, then the mechanism of dirtiness propagation
could be much simpler - it seems we could be able to just lookup
corresponding overlayfs inode based on upper inode and then mark it dirty
(but this would need to be verified by people more familiar with
overlayfs). So all we'd need to know for this is the superblock of the
overlayfs that's covering given upper filesystem...
Honza
> ---
> fs/fs-writeback.c | 4 ++++
> fs/inode.c | 5 +++++
> include/linux/fs.h | 6 ++++++
> 3 files changed, 15 insertions(+)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 1492271..657cceb 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -2246,9 +2246,13 @@ void __mark_inode_dirty(struct inode *inode, int flags)
> {
> struct super_block *sb = inode->i_sb;
> int dirtytime;
> + int copy_flags = flags;
>
> trace_writeback_mark_inode_dirty(inode, flags);
>
> + blocking_notifier_call_chain(
> + &inode->notifier_lists[MARK_INODE_DIRTY_NOTIFIER],
> + 0, ©_flags);
> /*
> * Don't do this for I_DIRTY_PAGES - that doesn't actually
> * dirty the inode itself
> diff --git a/fs/inode.c b/fs/inode.c
> index 72c4c34..84e82db 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -388,12 +388,17 @@ void address_space_init_once(struct address_space *mapping)
> */
> void inode_init_once(struct inode *inode)
> {
> + int i;
> +
> memset(inode, 0, sizeof(*inode));
> INIT_HLIST_NODE(&inode->i_hash);
> INIT_LIST_HEAD(&inode->i_devices);
> INIT_LIST_HEAD(&inode->i_io_list);
> INIT_LIST_HEAD(&inode->i_wb_list);
> INIT_LIST_HEAD(&inode->i_lru);
> + for (i = 0; i < INODE_MAX_NOTIFIER; i++)
> + BLOCKING_INIT_NOTIFIER_HEAD(
> + &inode->notifier_lists[MARK_INODE_DIRTY_NOTIFIER]);
> __address_space_init_once(&inode->i_data);
> i_size_ordered_init(inode);
> }
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 7519ae0..42f0750 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -607,6 +607,11 @@ static inline void mapping_allow_writable(struct address_space *mapping)
>
> struct fsnotify_mark_connector;
>
> +enum {
> + MARK_INODE_DIRTY_NOTIFIER,
> + INODE_MAX_NOTIFIER,
> +};
> +
> /*
> * Keep mostly read-only and often accessed (especially for
> * the RCU path lookup and 'stat' data) fields at the beginning
> @@ -723,6 +728,7 @@ struct inode {
> #endif
>
> void *i_private; /* fs or device private pointer */
> + struct blocking_notifier_head notifier_lists[INODE_MAX_NOTIFIER];
> } __randomize_layout;
>
> struct timespec64 timestamp_truncate(struct timespec64 t, struct inode *inode);
> --
> 1.8.3.1
>
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
2020-10-14 16:15 ` Jan Kara
@ 2020-10-15 3:03 ` Chengguang Xu
2020-10-15 6:11 ` Amir Goldstein
0 siblings, 1 reply; 30+ messages in thread
From: Chengguang Xu @ 2020-10-15 3:03 UTC (permalink / raw)
To: Jan Kara; +Cc: miklos, amir73il, linux-unionfs, linux-fsdevel, cgxu519
---- 在 星期四, 2020-10-15 00:15:38 Jan Kara <jack@suse.cz> 撰写 ----
> On Sat 10-10-20 22:23:51, Chengguang Xu wrote:
> > Currently there is no notification api for kernel about modification
> > of vfs inode, in some use cases like overlayfs, this kind of notification
> > will be very helpful to implement containerized syncfs functionality.
> > As the first attempt, we introduce marking inode dirty notification so that
> > overlay's inode could mark itself dirty as well and then only sync dirty
> > overlay inode while syncfs.
> >
> > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
>
> So I like how the patch set is elegant however growing struct inode for
> everybody by struct blocking_notifier_head (which is rwsem + pointer) is
> rather harsh just for this overlayfs functionality... Ideally this should
> induce no overhead on struct inode if the filesystem is not covered by
> overlayfs. So I'd rather place some external structure into the superblock
> that would get allocated on the first use that would track dirty notifications
> for each inode. I would not generalize the code for more possible
> notifications at this point.
>
> Also now that I'm thinking about it can there be multiple overlayfs inodes
> for one upper inode? If not, then the mechanism of dirtiness propagation
One upper inode only belongs to one overlayfs inode. However, in practice
one upper fs may contains hundreds or even thousands of overlayfs instances.
> could be much simpler - it seems we could be able to just lookup
> corresponding overlayfs inode based on upper inode and then mark it dirty
> (but this would need to be verified by people more familiar with
> overlayfs). So all we'd need to know for this is the superblock of the
> overlayfs that's covering given upper filesystem...
>
So the difficulty is how we get overlayfs inode efficiently from upper inode,
it seems if we don't have additional info of upper inode to indicate which
overlayfs it belongs to, then the lookup of corresponding overlayfs inode
will be quite expensive and probably impact write performance.
Is that possible we extend inotify infrastructure slightly to notify both
user space and kernel component?
Thanks,
Chengguang
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
2020-10-10 14:23 ` [RFC PATCH 1/5] fs: introduce notifier list for vfs inode Chengguang Xu
2020-10-14 16:15 ` Jan Kara
@ 2020-10-15 3:25 ` Al Viro
2020-10-15 3:42 ` Chengguang Xu
1 sibling, 1 reply; 30+ messages in thread
From: Al Viro @ 2020-10-15 3:25 UTC (permalink / raw)
To: Chengguang Xu; +Cc: miklos, amir73il, jack, linux-unionfs, linux-fsdevel
On Sat, Oct 10, 2020 at 10:23:51PM +0800, Chengguang Xu wrote:
> Currently there is no notification api for kernel about modification
> of vfs inode, in some use cases like overlayfs, this kind of notification
> will be very helpful to implement containerized syncfs functionality.
> As the first attempt, we introduce marking inode dirty notification so that
> overlay's inode could mark itself dirty as well and then only sync dirty
> overlay inode while syncfs.
Who's responsible for removing the crap from notifier chain? And how does
that affect the lifetime of inode?
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
2020-10-15 3:25 ` Al Viro
@ 2020-10-15 3:42 ` Chengguang Xu
2020-10-15 4:57 ` Al Viro
0 siblings, 1 reply; 30+ messages in thread
From: Chengguang Xu @ 2020-10-15 3:42 UTC (permalink / raw)
To: Al Viro; +Cc: miklos, amir73il, jack, linux-unionfs, linux-fsdevel, cgxu519
---- 在 星期四, 2020-10-15 11:25:01 Al Viro <viro@zeniv.linux.org.uk> 撰写 ----
> On Sat, Oct 10, 2020 at 10:23:51PM +0800, Chengguang Xu wrote:
> > Currently there is no notification api for kernel about modification
> > of vfs inode, in some use cases like overlayfs, this kind of notification
> > will be very helpful to implement containerized syncfs functionality.
> > As the first attempt, we introduce marking inode dirty notification so that
> > overlay's inode could mark itself dirty as well and then only sync dirty
> > overlay inode while syncfs.
>
> Who's responsible for removing the crap from notifier chain? And how does
> that affect the lifetime of inode?
In this case, overlayfs unregisters call back from the notifier chain of upper inode
when evicting it's own inode. It will not affect the lifetime of upper inode because
overlayfs inode holds a reference of upper inode that means upper inode will not be
evicted while overlayfs inode is still alive.
Thanks,
Chengguang
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
2020-10-15 3:42 ` Chengguang Xu
@ 2020-10-15 4:57 ` Al Viro
2020-10-15 10:56 ` Chengguang Xu
2020-10-16 7:09 ` Chengguang Xu
0 siblings, 2 replies; 30+ messages in thread
From: Al Viro @ 2020-10-15 4:57 UTC (permalink / raw)
To: Chengguang Xu; +Cc: miklos, amir73il, jack, linux-unionfs, linux-fsdevel
On Thu, Oct 15, 2020 at 11:42:51AM +0800, Chengguang Xu wrote:
> ---- 在 星期四, 2020-10-15 11:25:01 Al Viro <viro@zeniv.linux.org.uk> 撰写 ----
> > On Sat, Oct 10, 2020 at 10:23:51PM +0800, Chengguang Xu wrote:
> > > Currently there is no notification api for kernel about modification
> > > of vfs inode, in some use cases like overlayfs, this kind of notification
> > > will be very helpful to implement containerized syncfs functionality.
> > > As the first attempt, we introduce marking inode dirty notification so that
> > > overlay's inode could mark itself dirty as well and then only sync dirty
> > > overlay inode while syncfs.
> >
> > Who's responsible for removing the crap from notifier chain? And how does
> > that affect the lifetime of inode?
>
> In this case, overlayfs unregisters call back from the notifier chain of upper inode
> when evicting it's own inode. It will not affect the lifetime of upper inode because
> overlayfs inode holds a reference of upper inode that means upper inode will not be
> evicted while overlayfs inode is still alive.
Let me see if I've got it right:
* your chain contains 1 (for upper inodes) or 0 (everything else, i.e. the
vast majority of inodes) recepients
* recepient pins the inode for as long as the recepient exists
That looks like a massive overkill, especially since all you are propagating is
dirtying the suckers. All you really need is one bit in your inode + hash table
indexed by the address of struct inode (well, middle bits thereof, as usual).
With entries embedded into overlayfs-private part of overlayfs inode. And callback
to be called stored in that entry...
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
2020-10-15 3:03 ` Chengguang Xu
@ 2020-10-15 6:11 ` Amir Goldstein
2020-10-15 11:29 ` Chengguang Xu
0 siblings, 1 reply; 30+ messages in thread
From: Amir Goldstein @ 2020-10-15 6:11 UTC (permalink / raw)
To: Chengguang Xu; +Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro
On Thu, Oct 15, 2020 at 6:03 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
> ---- 在 星期四, 2020-10-15 00:15:38 Jan Kara <jack@suse.cz> 撰写 ----
> > On Sat 10-10-20 22:23:51, Chengguang Xu wrote:
> > > Currently there is no notification api for kernel about modification
> > > of vfs inode, in some use cases like overlayfs, this kind of notification
> > > will be very helpful to implement containerized syncfs functionality.
> > > As the first attempt, we introduce marking inode dirty notification so that
> > > overlay's inode could mark itself dirty as well and then only sync dirty
> > > overlay inode while syncfs.
> > >
> > > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
> >
> > So I like how the patch set is elegant however growing struct inode for
> > everybody by struct blocking_notifier_head (which is rwsem + pointer) is
> > rather harsh just for this overlayfs functionality... Ideally this should
> > induce no overhead on struct inode if the filesystem is not covered by
> > overlayfs. So I'd rather place some external structure into the superblock
> > that would get allocated on the first use that would track dirty notifications
> > for each inode. I would not generalize the code for more possible
> > notifications at this point.
> >
> > Also now that I'm thinking about it can there be multiple overlayfs inodes
> > for one upper inode? If not, then the mechanism of dirtiness propagation
>
> One upper inode only belongs to one overlayfs inode. However, in practice
> one upper fs may contains hundreds or even thousands of overlayfs instances.
>
> > could be much simpler - it seems we could be able to just lookup
> > corresponding overlayfs inode based on upper inode and then mark it dirty
> > (but this would need to be verified by people more familiar with
> > overlayfs). So all we'd need to know for this is the superblock of the
> > overlayfs that's covering given upper filesystem...
> >
>
> So the difficulty is how we get overlayfs inode efficiently from upper inode,
> it seems if we don't have additional info of upper inode to indicate which
> overlayfs it belongs to, then the lookup of corresponding overlayfs inode
> will be quite expensive and probably impact write performance.
>
> Is that possible we extend inotify infrastructure slightly to notify both
> user space and kernel component?
>
>
When I first saw your suggestion, that is what I was thinking.
Add event fsnotify_dirty_inode(), since the pub-sub infrastructure
in struct inode already exists.
But I have to admit this approach seems like a massive overkill to
what you need.
What you implemented, tracks upper inodes that could have
been dirtied under overlayfs, but what you really want is to
track is upper inodes that were dirtied *by* overlayfs.
And for that purpose, as Miklos said several times, it would be best
pursue the overlayfs aops approach, even though it may be much
harder..
Your earlier patches maintained a list of overlayfs to be synced inodes.
Remind me what was wrong with that approach?
Perhaps you can combine that with the shadow overlay sbi approach.
Instead of dirtying overlay inode when underlying is dirtied, you can
"pre-dirty" overlayfs inode in higher level file ops and add them to the
"maybe-dirty" list (e.g. after write).
ovl_sync_fs() can iterate the maybe-dirty list and re-dirty overlay inodes
if the underlying inode is still dirty on the (!wait) pass.
As for memory mapped inodes via overlayfs (which can be dirtied without
notifying overlayfs) I am not sure that is a big problem in practice.
When an inode is writably mapped via ovarlayfs, you can flag the
overlay inode with "maybe-writably-mapped" and then remove
it from the maybe dirty list when the underlying inode is not dirty
AND its i_writecount is 0 (checked on write_inode() and release()).
Actually, there is no reason to treat writably mapped inodes and
other dirty inodes differently - insert to suspect list on open for
write, remove from suspect list on last release() or write_inode()
when inode is no longer dirty and writable.
Did I miss anything?
Amir.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
2020-10-15 4:57 ` Al Viro
@ 2020-10-15 10:56 ` Chengguang Xu
2020-10-16 7:09 ` Chengguang Xu
1 sibling, 0 replies; 30+ messages in thread
From: Chengguang Xu @ 2020-10-15 10:56 UTC (permalink / raw)
To: Al Viro; +Cc: miklos, amir73il, jack, linux-unionfs, linux-fsdevel
---- 在 星期四, 2020-10-15 12:57:41 Al Viro <viro@zeniv.linux.org.uk> 撰写 ----
> On Thu, Oct 15, 2020 at 11:42:51AM +0800, Chengguang Xu wrote:
> > ---- 在 星期四, 2020-10-15 11:25:01 Al Viro <viro@zeniv.linux.org.uk> 撰写 ----
> > > On Sat, Oct 10, 2020 at 10:23:51PM +0800, Chengguang Xu wrote:
> > > > Currently there is no notification api for kernel about modification
> > > > of vfs inode, in some use cases like overlayfs, this kind of notification
> > > > will be very helpful to implement containerized syncfs functionality.
> > > > As the first attempt, we introduce marking inode dirty notification so that
> > > > overlay's inode could mark itself dirty as well and then only sync dirty
> > > > overlay inode while syncfs.
> > >
> > > Who's responsible for removing the crap from notifier chain? And how does
> > > that affect the lifetime of inode?
> >
> > In this case, overlayfs unregisters call back from the notifier chain of upper inode
> > when evicting it's own inode. It will not affect the lifetime of upper inode because
> > overlayfs inode holds a reference of upper inode that means upper inode will not be
> > evicted while overlayfs inode is still alive.
>
> Let me see if I've got it right:
> * your chain contains 1 (for upper inodes) or 0 (everything else, i.e. the
> vast majority of inodes) recepients
> * recepient pins the inode for as long as the recepient exists
>
> That looks like a massive overkill, especially since all you are propagating is
> dirtying the suckers. All you really need is one bit in your inode + hash table
> indexed by the address of struct inode (well, middle bits thereof, as usual).
> With entries embedded into overlayfs-private part of overlayfs inode. And callback
> to be called stored in that entry...
>
Yeah, actually that could implement the logic but I'm afraid of performance degradation
caused by lock contention of hash table in concurrent file write because in practice we
build up hundreds of overlayfs instances on same underlying filesystem.
Thanks,
Chengguang
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
2020-10-15 6:11 ` Amir Goldstein
@ 2020-10-15 11:29 ` Chengguang Xu
2020-10-15 12:32 ` Amir Goldstein
0 siblings, 1 reply; 30+ messages in thread
From: Chengguang Xu @ 2020-10-15 11:29 UTC (permalink / raw)
To: Amir Goldstein
Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro, cgxu519
---- 在 星期四, 2020-10-15 14:11:04 Amir Goldstein <amir73il@gmail.com> 撰写 ----
> On Thu, Oct 15, 2020 at 6:03 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
> >
> > ---- 在 星期四, 2020-10-15 00:15:38 Jan Kara <jack@suse.cz> 撰写 ----
> > > On Sat 10-10-20 22:23:51, Chengguang Xu wrote:
> > > > Currently there is no notification api for kernel about modification
> > > > of vfs inode, in some use cases like overlayfs, this kind of notification
> > > > will be very helpful to implement containerized syncfs functionality.
> > > > As the first attempt, we introduce marking inode dirty notification so that
> > > > overlay's inode could mark itself dirty as well and then only sync dirty
> > > > overlay inode while syncfs.
> > > >
> > > > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
> > >
> > > So I like how the patch set is elegant however growing struct inode for
> > > everybody by struct blocking_notifier_head (which is rwsem + pointer) is
> > > rather harsh just for this overlayfs functionality... Ideally this should
> > > induce no overhead on struct inode if the filesystem is not covered by
> > > overlayfs. So I'd rather place some external structure into the superblock
> > > that would get allocated on the first use that would track dirty notifications
> > > for each inode. I would not generalize the code for more possible
> > > notifications at this point.
> > >
> > > Also now that I'm thinking about it can there be multiple overlayfs inodes
> > > for one upper inode? If not, then the mechanism of dirtiness propagation
> >
> > One upper inode only belongs to one overlayfs inode. However, in practice
> > one upper fs may contains hundreds or even thousands of overlayfs instances.
> >
> > > could be much simpler - it seems we could be able to just lookup
> > > corresponding overlayfs inode based on upper inode and then mark it dirty
> > > (but this would need to be verified by people more familiar with
> > > overlayfs). So all we'd need to know for this is the superblock of the
> > > overlayfs that's covering given upper filesystem...
> > >
> >
> > So the difficulty is how we get overlayfs inode efficiently from upper inode,
> > it seems if we don't have additional info of upper inode to indicate which
> > overlayfs it belongs to, then the lookup of corresponding overlayfs inode
> > will be quite expensive and probably impact write performance.
> >
> > Is that possible we extend inotify infrastructure slightly to notify both
> > user space and kernel component?
> >
> >
>
> When I first saw your suggestion, that is what I was thinking.
> Add event fsnotify_dirty_inode(), since the pub-sub infrastructure
> in struct inode already exists.
>
> But I have to admit this approach seems like a massive overkill to
> what you need.
>
> What you implemented, tracks upper inodes that could have
> been dirtied under overlayfs, but what you really want is to
> track is upper inodes that were dirtied *by* overlayfs.
>
> And for that purpose, as Miklos said several times, it would be best
> pursue the overlayfs aops approach, even though it may be much
> harder..
>
IIUC, that solution was raised to solve mmap rw-ro issue.
That maybe is an ultimate goal and I'm wondering whether we must
implement that if we have easier approach to solve mmap and syncfs
issues.
> Your earlier patches maintained a list of overlayfs to be synced inodes.
> Remind me what was wrong with that approach?
>
I think the main concerns are the complexity and the timing of releasing
ovl_sync_entry struct.
> Perhaps you can combine that with the shadow overlay sbi approach.
> Instead of dirtying overlay inode when underlying is dirtied, you can
> "pre-dirty" overlayfs inode in higher level file ops and add them to the
> "maybe-dirty" list (e.g. after write).
>
Main problem is we can't be notified by set_page_dirty from writable mmap.
Meanwhile, if we dirty overlay inode then writeback will pick up dirty overlay
inode and clear it after syncing, then overlay inode could be release at any time,
so in the end, maybe overlay inode is released but upper inode is still dirty and
there is no any pointer to find upper dirty inode out.
> ovl_sync_fs() can iterate the maybe-dirty list and re-dirty overlay inodes
> if the underlying inode is still dirty on the (!wait) pass.
>
> As for memory mapped inodes via overlayfs (which can be dirtied without
> notifying overlayfs) I am not sure that is a big problem in practice.
>
Yes, it's key problem here.
> When an inode is writably mapped via ovarlayfs, you can flag the
> overlay inode with "maybe-writably-mapped" and then remove
> it from the maybe dirty list when the underlying inode is not dirty
> AND its i_writecount is 0 (checked on write_inode() and release()).
>
> Actually, there is no reason to treat writably mapped inodes and
> other dirty inodes differently - insert to suspect list on open for
> write, remove from suspect list on last release() or write_inode()
> when inode is no longer dirty and writable.
>
> Did I miss anything?
>
If we dirty overlay inode that means we have launched writeback mechanism,
so in this case, re-dirty overlay inode in time becomes important.
Thanks,
Chengguang
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
2020-10-15 11:29 ` Chengguang Xu
@ 2020-10-15 12:32 ` Amir Goldstein
2020-10-15 13:13 ` Chengguang Xu
0 siblings, 1 reply; 30+ messages in thread
From: Amir Goldstein @ 2020-10-15 12:32 UTC (permalink / raw)
To: Chengguang Xu; +Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro
> > Perhaps you can combine that with the shadow overlay sbi approach.
> > Instead of dirtying overlay inode when underlying is dirtied, you can
> > "pre-dirty" overlayfs inode in higher level file ops and add them to the
> > "maybe-dirty" list (e.g. after write).
> >
>
> Main problem is we can't be notified by set_page_dirty from writable mmap.
> Meanwhile, if we dirty overlay inode then writeback will pick up dirty overlay
> inode and clear it after syncing, then overlay inode could be release at any time,
> so in the end, maybe overlay inode is released but upper inode is still dirty and
> there is no any pointer to find upper dirty inode out.
>
But we can control whether overlay inode is release with ovl_drop_inode()
right? Can we prevent dropping overlay inode if upper inode is
inode_is_open_for_write()?
About re-dirty, see below...
>
> > ovl_sync_fs() can iterate the maybe-dirty list and re-dirty overlay inodes
> > if the underlying inode is still dirty on the (!wait) pass.
> >
> > As for memory mapped inodes via overlayfs (which can be dirtied without
> > notifying overlayfs) I am not sure that is a big problem in practice.
> >
>
> Yes, it's key problem here.
>
> > When an inode is writably mapped via ovarlayfs, you can flag the
> > overlay inode with "maybe-writably-mapped" and then remove
> > it from the maybe dirty list when the underlying inode is not dirty
> > AND its i_writecount is 0 (checked on write_inode() and release()).
> >
> > Actually, there is no reason to treat writably mapped inodes and
> > other dirty inodes differently - insert to suspect list on open for
> > write, remove from suspect list on last release() or write_inode()
> > when inode is no longer dirty and writable.
> >
> > Did I miss anything?
> >
>
> If we dirty overlay inode that means we have launched writeback mechanism,
> so in this case, re-dirty overlay inode in time becomes important.
>
My idea was to use the first call to ovl_sync_fs() with 'wait' false
to iterate the
maybe-dirty list and re-dirty overlay inodes whose upper is dirty.
Then in the second call to __sync_filesystem, sync_inodes_sb() will take
care of calling ovl_write_inode() for all the re-dirty inodes.
In current code we sync ALL dirty upper fs inodes and we do not overlay
inodes with no reference in cache.
The best code would sync only upper fs inodes dirtied by this overlay
and will be able to evict overlay inodes whose upper inodes are clean.
The compromise code would sync only upper fs inodes dirtied by this overlay,
and would not evict overlay inodes as long as upper inodes are "open for write".
I think its a fine compromise considering the alternatives.
Is this workable?
Thanks,
Amir.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
2020-10-15 12:32 ` Amir Goldstein
@ 2020-10-15 13:13 ` Chengguang Xu
2020-10-15 16:02 ` Amir Goldstein
0 siblings, 1 reply; 30+ messages in thread
From: Chengguang Xu @ 2020-10-15 13:13 UTC (permalink / raw)
To: Amir Goldstein
Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro, cgxu519
---- 在 星期四, 2020-10-15 20:32:24 Amir Goldstein <amir73il@gmail.com> 撰写 ----
> > > Perhaps you can combine that with the shadow overlay sbi approach.
> > > Instead of dirtying overlay inode when underlying is dirtied, you can
> > > "pre-dirty" overlayfs inode in higher level file ops and add them to the
> > > "maybe-dirty" list (e.g. after write).
> > >
> >
> > Main problem is we can't be notified by set_page_dirty from writable mmap.
> > Meanwhile, if we dirty overlay inode then writeback will pick up dirty overlay
> > inode and clear it after syncing, then overlay inode could be release at any time,
> > so in the end, maybe overlay inode is released but upper inode is still dirty and
> > there is no any pointer to find upper dirty inode out.
> >
>
> But we can control whether overlay inode is release with ovl_drop_inode()
> right? Can we prevent dropping overlay inode if upper inode is
> inode_is_open_for_write()?
>
> About re-dirty, see below...
>
> >
> > > ovl_sync_fs() can iterate the maybe-dirty list and re-dirty overlay inodes
> > > if the underlying inode is still dirty on the (!wait) pass.
> > >
> > > As for memory mapped inodes via overlayfs (which can be dirtied without
> > > notifying overlayfs) I am not sure that is a big problem in practice.
> > >
> >
> > Yes, it's key problem here.
> >
> > > When an inode is writably mapped via ovarlayfs, you can flag the
> > > overlay inode with "maybe-writably-mapped" and then remove
> > > it from the maybe dirty list when the underlying inode is not dirty
> > > AND its i_writecount is 0 (checked on write_inode() and release()).
> > >
> > > Actually, there is no reason to treat writably mapped inodes and
> > > other dirty inodes differently - insert to suspect list on open for
> > > write, remove from suspect list on last release() or write_inode()
> > > when inode is no longer dirty and writable.
I have to say inserting to suspect list cannot prevent dropping,
thinking of the problem of previous approach that we write dirty upper
inode with current->flags & PF_MEMALLOC while evicting clean overlay inode.
> > >
> > > Did I miss anything?
> > >
> >
> > If we dirty overlay inode that means we have launched writeback mechanism,
> > so in this case, re-dirty overlay inode in time becomes important.
> >
>
> My idea was to use the first call to ovl_sync_fs() with 'wait' false
> to iterate the
> maybe-dirty list and re-dirty overlay inodes whose upper is dirty.
>
I'm curious how we prevent dropping of clean overlay inode with dirty upper?
Hold another reference during iput_final operation? in the drop_inode() or something
else?
> Then in the second call to __sync_filesystem, sync_inodes_sb() will take
> care of calling ovl_write_inode() for all the re-dirty inodes.
>
> In current code we sync ALL dirty upper fs inodes and we do not overlay
> inodes with no reference in cache.
>
> The best code would sync only upper fs inodes dirtied by this overlay
> and will be able to evict overlay inodes whose upper inodes are clean.
>
> The compromise code would sync only upper fs inodes dirtied by this overlay,
> and would not evict overlay inodes as long as upper inodes are "open for write".
> I think its a fine compromise considering the alternatives.
>
> Is this workable?
>
In your approach, the key point is how to prevent dropping overlay inode that has
dirty upper and no reference but I don't understand well how to achieve it from
your descriptions.
Thanks,
Chengguang
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
2020-10-15 13:13 ` Chengguang Xu
@ 2020-10-15 16:02 ` Amir Goldstein
2020-10-15 16:06 ` Amir Goldstein
2020-10-16 1:56 ` Chengguang Xu
0 siblings, 2 replies; 30+ messages in thread
From: Amir Goldstein @ 2020-10-15 16:02 UTC (permalink / raw)
To: Chengguang Xu; +Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro
> > > > When an inode is writably mapped via ovarlayfs, you can flag the
> > > > overlay inode with "maybe-writably-mapped" and then remove
> > > > it from the maybe dirty list when the underlying inode is not dirty
> > > > AND its i_writecount is 0 (checked on write_inode() and release()).
> > > >
> > > > Actually, there is no reason to treat writably mapped inodes and
> > > > other dirty inodes differently - insert to suspect list on open for
> > > > write, remove from suspect list on last release() or write_inode()
> > > > when inode is no longer dirty and writable.
>
> I have to say inserting to suspect list cannot prevent dropping,
> thinking of the problem of previous approach that we write dirty upper
> inode with current->flags & PF_MEMALLOC while evicting clean overlay inode.
>
Sorry, I don't understand what that means.
>
> > > >
> > > > Did I miss anything?
> > > >
> > >
> > > If we dirty overlay inode that means we have launched writeback mechanism,
> > > so in this case, re-dirty overlay inode in time becomes important.
> > >
> >
> > My idea was to use the first call to ovl_sync_fs() with 'wait' false
> > to iterate the
> > maybe-dirty list and re-dirty overlay inodes whose upper is dirty.
> >
>
> I'm curious how we prevent dropping of clean overlay inode with dirty upper?
> Hold another reference during iput_final operation? in the drop_inode() or something
> else?
No, just return 0 from ovl_drop_inode() and iput_final() will not evict().
>
>
> > Then in the second call to __sync_filesystem, sync_inodes_sb() will take
> > care of calling ovl_write_inode() for all the re-dirty inodes.
> >
> > In current code we sync ALL dirty upper fs inodes and we do not overlay
> > inodes with no reference in cache.
> >
> > The best code would sync only upper fs inodes dirtied by this overlay
> > and will be able to evict overlay inodes whose upper inodes are clean.
> >
> > The compromise code would sync only upper fs inodes dirtied by this overlay,
> > and would not evict overlay inodes as long as upper inodes are "open for write".
> > I think its a fine compromise considering the alternatives.
> >
> > Is this workable?
> >
>
> In your approach, the key point is how to prevent dropping overlay inode that has
> dirty upper and no reference but I don't understand well how to achieve it from
> your descriptions.
>
>
Very well, I will try to explain with code:
int ovl_inode_is_open_for_write(struct inode *inode)
{
struct inode *upper_inode = ovl_inode_upper(inode);
return upper_inode && inode_is_open_for_write(upper_inode);
}
void ovl_maybe_mark_inode_dirty(struct inode *inode)
{
struct inode *upper_inode = ovl_inode_upper(inode);
if (upper_inode && upper_inode->i_state & I_DIRTY_ALL)
mark_inode_dirty(inode);
}
int ovl_open(struct inode *inode, struct file *file)
{
...
if (ovl_inode_is_open_for_write(file_inode(file)))
ovl_add_inode_to_suspect_list(inode);
file->private_data = realfile;
return 0;
}
int ovl_release(struct inode *inode, struct file *file)
{
struct inode *inode = file_inode(file);
if (ovl_inode_is_open_for_write(inode)) {
ovl_maybe_mark_inode_dirty(inode);
ovl_remove_inode_from_suspect_list(inode);
}
fput(file->private_data);
return 0;
}
int ovl_drop_inode(struct inode *inode)
{
struct inode *upper_inode = ovl_inode_upper(inode);
if (!upper_inode)
return 1;
if (upper_inode->i_state & I_DIRTY_ALL)
return 0;
return !inode_is_open_for_write(upper_inode);
}
static int ovl_sync_fs(struct super_block *sb, int wait)
{
struct ovl_fs *ofs = sb->s_fs_info;
struct super_block *upper_sb;
int ret;
if (!ovl_upper_mnt(ofs))
return 0;
/*
* Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
* All the super blocks will be iterated, including upper_sb.
*
* If this is a syncfs(2) call, then we do need to call
* sync_filesystem() on upper_sb, but enough if we do it when being
* called with wait == 1.
*/
if (!wait) {
/* mark inodes on the suspect list dirty if thier
upper inode is dirty */
ovl_mark_suspect_list_inodes_dirty();
return 0;
}
...
The races are avoided because inode is added/removed from suspect
list while overlay inode has a reference (from file) and because upper inode
cannot be dirtied by overlayfs when overlay inode is not on the suspect list.
Unless I am missing something.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
2020-10-15 16:02 ` Amir Goldstein
@ 2020-10-15 16:06 ` Amir Goldstein
2020-10-16 1:56 ` Chengguang Xu
1 sibling, 0 replies; 30+ messages in thread
From: Amir Goldstein @ 2020-10-15 16:06 UTC (permalink / raw)
To: Chengguang Xu; +Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro
> Very well, I will try to explain with code:
>
> int ovl_inode_is_open_for_write(struct inode *inode)
> {
> struct inode *upper_inode = ovl_inode_upper(inode);
>
> return upper_inode && inode_is_open_for_write(upper_inode);
> }
>
> void ovl_maybe_mark_inode_dirty(struct inode *inode)
> {
> struct inode *upper_inode = ovl_inode_upper(inode);
>
> if (upper_inode && upper_inode->i_state & I_DIRTY_ALL)
> mark_inode_dirty(inode);
> }
>
> int ovl_open(struct inode *inode, struct file *file)
> {
> ...
> if (ovl_inode_is_open_for_write(file_inode(file)))
> ovl_add_inode_to_suspect_list(inode);
>
> file->private_data = realfile;
>
> return 0;
> }
>
> int ovl_release(struct inode *inode, struct file *file)
> {
> struct inode *inode = file_inode(file);
>
> if (ovl_inode_is_open_for_write(inode)) {
Darn! that should have been (!ovl_inode_is_open_for_write(inode))
and it should be done after fput() and grabbing inode reference with
ihold() before fput().
> ovl_maybe_mark_inode_dirty(inode);
> ovl_remove_inode_from_suspect_list(inode);
> }
>
> fput(file->private_data);
>
> return 0;
> }
>
> int ovl_drop_inode(struct inode *inode)
> {
> struct inode *upper_inode = ovl_inode_upper(inode);
>
> if (!upper_inode)
> return 1;
> if (upper_inode->i_state & I_DIRTY_ALL)
> return 0;
>
> return !inode_is_open_for_write(upper_inode);
> }
>
> static int ovl_sync_fs(struct super_block *sb, int wait)
> {
> struct ovl_fs *ofs = sb->s_fs_info;
> struct super_block *upper_sb;
> int ret;
>
> if (!ovl_upper_mnt(ofs))
> return 0;
>
> /*
> * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
> * All the super blocks will be iterated, including upper_sb.
> *
> * If this is a syncfs(2) call, then we do need to call
> * sync_filesystem() on upper_sb, but enough if we do it when being
> * called with wait == 1.
> */
> if (!wait) {
> /* mark inodes on the suspect list dirty if thier
> upper inode is dirty */
> ovl_mark_suspect_list_inodes_dirty();
> return 0;
> }
> ...
>
>
> The races are avoided because inode is added/removed from suspect
> list while overlay inode has a reference (from file) and because upper inode
> cannot be dirtied by overlayfs when overlay inode is not on the suspect list.
>
> Unless I am missing something.
>
> Thanks,
> Amir.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
2020-10-15 16:02 ` Amir Goldstein
2020-10-15 16:06 ` Amir Goldstein
@ 2020-10-16 1:56 ` Chengguang Xu
2020-10-16 4:39 ` Amir Goldstein
1 sibling, 1 reply; 30+ messages in thread
From: Chengguang Xu @ 2020-10-16 1:56 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro
---- 在 星期五, 2020-10-16 00:02:42 Amir Goldstein <amir73il@gmail.com> 撰写 ----
> > > > > When an inode is writably mapped via ovarlayfs, you can flag the
> > > > > overlay inode with "maybe-writably-mapped" and then remove
> > > > > it from the maybe dirty list when the underlying inode is not dirty
> > > > > AND its i_writecount is 0 (checked on write_inode() and release()).
> > > > >
> > > > > Actually, there is no reason to treat writably mapped inodes and
> > > > > other dirty inodes differently - insert to suspect list on open for
> > > > > write, remove from suspect list on last release() or write_inode()
> > > > > when inode is no longer dirty and writable.
> >
> > I have to say inserting to suspect list cannot prevent dropping,
> > thinking of the problem of previous approach that we write dirty upper
> > inode with current->flags & PF_MEMALLOC while evicting clean overlay inode.
> >
>
> Sorry, I don't understand what that means.
This is the main problem of my previous patch set V10, evicting clean inode
expects no write behavior but in the case of dirty upper inode we have to
write out dirty data in this timing otherwise we will lose the connection with upper inode.
>
> >
> > > > >
> > > > > Did I miss anything?
> > > > >
> > > >
> > > > If we dirty overlay inode that means we have launched writeback mechanism,
> > > > so in this case, re-dirty overlay inode in time becomes important.
> > > >
> > >
> > > My idea was to use the first call to ovl_sync_fs() with 'wait' false
> > > to iterate the
> > > maybe-dirty list and re-dirty overlay inodes whose upper is dirty.
> > >
> >
> > I'm curious how we prevent dropping of clean overlay inode with dirty upper?
> > Hold another reference during iput_final operation? in the drop_inode() or something
> > else?
>
> No, just return 0 from ovl_drop_inode() and iput_final() will not evict().
It's not good, it only temporarily skips eviction, the inode in lru list
will be evicted in some cases like drop cache or memory reclaim. etc.
A solution for this is getting another reference in ->drop_inode so that
the inode can escape from adding to lru list but this looks awkward and tricky.
>
> >
> >
> > > Then in the second call to __sync_filesystem, sync_inodes_sb() will take
> > > care of calling ovl_write_inode() for all the re-dirty inodes.
> > >
> > > In current code we sync ALL dirty upper fs inodes and we do not overlay
> > > inodes with no reference in cache.
> > >
> > > The best code would sync only upper fs inodes dirtied by this overlay
> > > and will be able to evict overlay inodes whose upper inodes are clean.
> > >
> > > The compromise code would sync only upper fs inodes dirtied by this overlay,
> > > and would not evict overlay inodes as long as upper inodes are "open for write".
> > > I think its a fine compromise considering the alternatives.
> > >
> > > Is this workable?
> > >
> >
> > In your approach, the key point is how to prevent dropping overlay inode that has
> > dirty upper and no reference but I don't understand well how to achieve it from
> > your descriptions.
> >
> >
>
> Very well, I will try to explain with code:
>
> int ovl_inode_is_open_for_write(struct inode *inode)
> {
> struct inode *upper_inode = ovl_inode_upper(inode);
>
> return upper_inode && inode_is_open_for_write(upper_inode);
> }
>
> void ovl_maybe_mark_inode_dirty(struct inode *inode)
> {
> struct inode *upper_inode = ovl_inode_upper(inode);
>
> if (upper_inode && upper_inode->i_state & I_DIRTY_ALL)
> mark_inode_dirty(inode);
> }
>
> int ovl_open(struct inode *inode, struct file *file)
> {
> ...
> if (ovl_inode_is_open_for_write(file_inode(file)))
> ovl_add_inode_to_suspect_list(inode);
>
> file->private_data = realfile;
>
> return 0;
> }
>
> int ovl_release(struct inode *inode, struct file *file)
> {
> struct inode *inode = file_inode(file);
>
> if (ovl_inode_is_open_for_write(inode)) {
> ovl_maybe_mark_inode_dirty(inode);
> ovl_remove_inode_from_suspect_list(inode);
I think in some cases removing from suspect_list will cause losing
the connection with upper inode that has writable mmap.
> }
>
> fput(file->private_data);
>
> return 0;
> }
>
> int ovl_drop_inode(struct inode *inode)
> {
> struct inode *upper_inode = ovl_inode_upper(inode);
>
> if (!upper_inode)
> return 1;
> if (upper_inode->i_state & I_DIRTY_ALL)
> return 0;
>
> return !inode_is_open_for_write(upper_inode);
Is this condition just for writable mmap?
> }
>
> static int ovl_sync_fs(struct super_block *sb, int wait)
> {
> struct ovl_fs *ofs = sb->s_fs_info;
> struct super_block *upper_sb;
> int ret;
>
> if (!ovl_upper_mnt(ofs))
> return 0;
>
> /*
> * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
> * All the super blocks will be iterated, including upper_sb.
> *
> * If this is a syncfs(2) call, then we do need to call
> * sync_filesystem() on upper_sb, but enough if we do it when being
> * called with wait == 1.
> */
> if (!wait) {
> /* mark inodes on the suspect list dirty if thier
> upper inode is dirty */
> ovl_mark_suspect_list_inodes_dirty();
> return 0;
> }
> ...
>
Why 2 rounds? it seems the simplest way is only syncing dirty upper inode
on wait==1 just like my previous patch.
>
> The races are avoided because inode is added/removed from suspect
> list while overlay inode has a reference (from file) and because upper inode
> cannot be dirtied by overlayfs when overlay inode is not on the suspect list.
>
> Unless I am missing something.
>
Thanks,
Chengguang
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
2020-10-16 1:56 ` Chengguang Xu
@ 2020-10-16 4:39 ` Amir Goldstein
2020-10-16 7:43 ` Chengguang Xu
0 siblings, 1 reply; 30+ messages in thread
From: Amir Goldstein @ 2020-10-16 4:39 UTC (permalink / raw)
To: Chengguang Xu; +Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro
On Fri, Oct 16, 2020 at 4:56 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
> ---- 在 星期五, 2020-10-16 00:02:42 Amir Goldstein <amir73il@gmail.com> 撰写 ----
> > > > > > When an inode is writably mapped via ovarlayfs, you can flag the
> > > > > > overlay inode with "maybe-writably-mapped" and then remove
> > > > > > it from the maybe dirty list when the underlying inode is not dirty
> > > > > > AND its i_writecount is 0 (checked on write_inode() and release()).
> > > > > >
> > > > > > Actually, there is no reason to treat writably mapped inodes and
> > > > > > other dirty inodes differently - insert to suspect list on open for
> > > > > > write, remove from suspect list on last release() or write_inode()
> > > > > > when inode is no longer dirty and writable.
> > >
> > > I have to say inserting to suspect list cannot prevent dropping,
> > > thinking of the problem of previous approach that we write dirty upper
> > > inode with current->flags & PF_MEMALLOC while evicting clean overlay inode.
> > >
> >
> > Sorry, I don't understand what that means.
>
> This is the main problem of my previous patch set V10, evicting clean inode
> expects no write behavior but in the case of dirty upper inode we have to
> write out dirty data in this timing otherwise we will lose the connection with upper inode.
>
My thinking was that the suspect list holds a reference to the overlay inode.
The question is can we always safely get rid of that reference and remove
from the suspect list when the inode is no longer "writable". Let's see...
>
> >
> > >
> > > > > >
> > > > > > Did I miss anything?
> > > > > >
> > > > >
> > > > > If we dirty overlay inode that means we have launched writeback mechanism,
> > > > > so in this case, re-dirty overlay inode in time becomes important.
> > > > >
> > > >
> > > > My idea was to use the first call to ovl_sync_fs() with 'wait' false
> > > > to iterate the
> > > > maybe-dirty list and re-dirty overlay inodes whose upper is dirty.
> > > >
> > >
> > > I'm curious how we prevent dropping of clean overlay inode with dirty upper?
> > > Hold another reference during iput_final operation? in the drop_inode() or something
> > > else?
> >
> > No, just return 0 from ovl_drop_inode() and iput_final() will not evict().
>
> It's not good, it only temporarily skips eviction, the inode in lru list
> will be evicted in some cases like drop cache or memory reclaim. etc.
>
> A solution for this is getting another reference in ->drop_inode so that
> the inode can escape from adding to lru list but this looks awkward and tricky.
>
Right, that was nonsense. We need to rely on the reference held by the
suspect list.
> >
> > >
> > >
> > > > Then in the second call to __sync_filesystem, sync_inodes_sb() will take
> > > > care of calling ovl_write_inode() for all the re-dirty inodes.
> > > >
> > > > In current code we sync ALL dirty upper fs inodes and we do not overlay
> > > > inodes with no reference in cache.
> > > >
> > > > The best code would sync only upper fs inodes dirtied by this overlay
> > > > and will be able to evict overlay inodes whose upper inodes are clean.
> > > >
> > > > The compromise code would sync only upper fs inodes dirtied by this overlay,
> > > > and would not evict overlay inodes as long as upper inodes are "open for write".
> > > > I think its a fine compromise considering the alternatives.
> > > >
> > > > Is this workable?
> > > >
> > >
> > > In your approach, the key point is how to prevent dropping overlay inode that has
> > > dirty upper and no reference but I don't understand well how to achieve it from
> > > your descriptions.
> > >
> > >
> >
> > Very well, I will try to explain with code:
> >
> > int ovl_inode_is_open_for_write(struct inode *inode)
> > {
> > struct inode *upper_inode = ovl_inode_upper(inode);
> >
> > return upper_inode && inode_is_open_for_write(upper_inode);
> > }
> >
> > void ovl_maybe_mark_inode_dirty(struct inode *inode)
> > {
> > struct inode *upper_inode = ovl_inode_upper(inode);
> >
> > if (upper_inode && upper_inode->i_state & I_DIRTY_ALL)
> > mark_inode_dirty(inode);
> > }
> >
> > int ovl_open(struct inode *inode, struct file *file)
> > {
> > ...
> > if (ovl_inode_is_open_for_write(file_inode(file)))
> > ovl_add_inode_to_suspect_list(inode);
> >
> > file->private_data = realfile;
> >
> > return 0;
> > }
> >
> > int ovl_release(struct inode *inode, struct file *file)
> > {
> > struct inode *inode = file_inode(file);
> >
> > if (ovl_inode_is_open_for_write(inode)) {
> > ovl_maybe_mark_inode_dirty(inode);
> > ovl_remove_inode_from_suspect_list(inode);
>
> I think in some cases removing from suspect_list will cause losing
> the connection with upper inode that has writable mmap.
>
First of all I had a bug here.
Need to check for !ovl_inode_is_open_for_write(inode) after fput().
If the upper inode has a writable mmap, the upper inode would still
be "writable" (i_writecount held by the map realfile reference).
So when closing the last overlay file reference while upper inode
writable maps still exist, the remaining issue is when to remove
the overlay inode from the suspect list and allow its eviction and
I did not mention that.
I *think* that this condition should be safe in the regard that
after the condition is met, there is no way to dirty the upper inode
via overlayfs without going through ovl_open().
Obviously, the test should be done with some list lock held.
bool ovl_may_remove_from_suspect_list(struct inode *inode)
{
struct inode *upper_inode = ovl_inode_upper(inode);
if (upper_inode && upper_inode->i_state & I_DIRTY_ALL)
return false;
return !inode_is_open_for_write(upper_inode);
}
Now remains the question of WHEN to check for removal
from the suspect list.
The first place is in ovl_sync_fs() when iterating the suspect list,
inodes that meet the above criteria are "indefinitely clean" and
may be removed from the list at that timing.
For eviction during memory pressure, overlay can register a
shrinker to do this garbage collection. Is shrinker being called
on drop_caches? I'm not sure. But we can also do periodic garbage
collection.
In the end, it is not the common case and we just need the garbage
collection to mitigate DoS (or existing workload) that does many:
- open
- mmap(...PROT_WRITE, MAP_SHARED...)
- close
- munmap
>
> > }
> >
> > fput(file->private_data);
> >
> > return 0;
> > }
> >
> > int ovl_drop_inode(struct inode *inode)
> > {
> > struct inode *upper_inode = ovl_inode_upper(inode);
> >
> > if (!upper_inode)
> > return 1;
> > if (upper_inode->i_state & I_DIRTY_ALL)
> > return 0;
> >
> > return !inode_is_open_for_write(upper_inode);
>
> Is this condition just for writable mmap?
>
No, it's for all inodes that may be written from overlayfs (or
from maps created by overalyfs), but as you wrote, this
test is not needed in drop_inode().
>
> > }
> >
> > static int ovl_sync_fs(struct super_block *sb, int wait)
> > {
> > struct ovl_fs *ofs = sb->s_fs_info;
> > struct super_block *upper_sb;
> > int ret;
> >
> > if (!ovl_upper_mnt(ofs))
> > return 0;
> >
> > /*
> > * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
> > * All the super blocks will be iterated, including upper_sb.
> > *
> > * If this is a syncfs(2) call, then we do need to call
> > * sync_filesystem() on upper_sb, but enough if we do it when being
> > * called with wait == 1.
> > */
> > if (!wait) {
> > /* mark inodes on the suspect list dirty if thier
> > upper inode is dirty */
> > ovl_mark_suspect_list_inodes_dirty();
> > return 0;
> > }
> > ...
> >
>
> Why 2 rounds? it seems the simplest way is only syncing dirty upper inode
> on wait==1 just like my previous patch.
>
We don't have to do it in 2 rounds, but as long as we have a suspect list,
we can use it to start writeback on wait==0 on all dirty upper inodes from
our list just like the caller intended.
Do we need overlay sbi and mark_inode_dirty() and ovl_write_inode() at all?
I'm not sure. It feels like a good idea to use generic infrastructure as much as
possible. You should know better than me to answer this question.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
2020-10-15 4:57 ` Al Viro
2020-10-15 10:56 ` Chengguang Xu
@ 2020-10-16 7:09 ` Chengguang Xu
2020-10-16 9:22 ` Jan Kara
1 sibling, 1 reply; 30+ messages in thread
From: Chengguang Xu @ 2020-10-16 7:09 UTC (permalink / raw)
To: Al Viro, amir73il, jack; +Cc: miklos, linux-unionfs, linux-fsdevel, cgxu519
---- 在 星期四, 2020-10-15 12:57:41 Al Viro <viro@zeniv.linux.org.uk> 撰写 ----
> On Thu, Oct 15, 2020 at 11:42:51AM +0800, Chengguang Xu wrote:
> > ---- 在 星期四, 2020-10-15 11:25:01 Al Viro <viro@zeniv.linux.org.uk> 撰写 ----
> > > On Sat, Oct 10, 2020 at 10:23:51PM +0800, Chengguang Xu wrote:
> > > > Currently there is no notification api for kernel about modification
> > > > of vfs inode, in some use cases like overlayfs, this kind of notification
> > > > will be very helpful to implement containerized syncfs functionality.
> > > > As the first attempt, we introduce marking inode dirty notification so that
> > > > overlay's inode could mark itself dirty as well and then only sync dirty
> > > > overlay inode while syncfs.
> > >
> > > Who's responsible for removing the crap from notifier chain? And how does
> > > that affect the lifetime of inode?
> >
> > In this case, overlayfs unregisters call back from the notifier chain of upper inode
> > when evicting it's own inode. It will not affect the lifetime of upper inode because
> > overlayfs inode holds a reference of upper inode that means upper inode will not be
> > evicted while overlayfs inode is still alive.
>
> Let me see if I've got it right:
> * your chain contains 1 (for upper inodes) or 0 (everything else, i.e. the
> vast majority of inodes) recepients
> * recepient pins the inode for as long as the recepient exists
>
> That looks like a massive overkill, especially since all you are propagating is
> dirtying the suckers. All you really need is one bit in your inode + hash table
> indexed by the address of struct inode (well, middle bits thereof, as usual).
> With entries embedded into overlayfs-private part of overlayfs inode. And callback
> to be called stored in that entry...
>
Hi AI, Jack, Amir
Based on your feedback, I would to change the inode dirty notification
something like below, is it acceptable?
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 1492271..48473d9 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2249,6 +2249,14 @@ void __mark_inode_dirty(struct inode *inode, int flags)
trace_writeback_mark_inode_dirty(inode, flags);
+ if (inode->state & I_OVL_INUSE) {
+ struct inode *ovl_inode;
+
+ ovl_inode = ilookup5(NULL, (unsigned long)inode, ovl_inode_test, inode);
+ if (ovl_inode)
+ __mark_inode_dirty(ovl_inode, flags);
+ }
+
/*
* Don't do this for I_DIRTY_PAGES - that doesn't actually
* dirty the inode itself
diff --git a/fs/inode.c b/fs/inode.c
index 72c4c34..ed6c85e 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -820,7 +820,7 @@ static struct inode *find_inode(struct super_block *sb,
repeat:
hlist_for_each_entry(inode, head, i_hash) {
- if (inode->i_sb != sb)
+ if (sb && inode->i_sb != sb)
continue;
if (!test(inode, data))
continue;
Thanks,
Chengguang
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
2020-10-16 4:39 ` Amir Goldstein
@ 2020-10-16 7:43 ` Chengguang Xu
0 siblings, 0 replies; 30+ messages in thread
From: Chengguang Xu @ 2020-10-16 7:43 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Jan Kara, miklos, linux-unionfs, linux-fsdevel, Al Viro
---- 在 星期五, 2020-10-16 12:39:10 Amir Goldstein <amir73il@gmail.com> 撰写 ----
> On Fri, Oct 16, 2020 at 4:56 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
> >
> > ---- 在 星期五, 2020-10-16 00:02:42 Amir Goldstein <amir73il@gmail.com> 撰写 ----
> > > > > > > When an inode is writably mapped via ovarlayfs, you can flag the
> > > > > > > overlay inode with "maybe-writably-mapped" and then remove
> > > > > > > it from the maybe dirty list when the underlying inode is not dirty
> > > > > > > AND its i_writecount is 0 (checked on write_inode() and release()).
> > > > > > >
> > > > > > > Actually, there is no reason to treat writably mapped inodes and
> > > > > > > other dirty inodes differently - insert to suspect list on open for
> > > > > > > write, remove from suspect list on last release() or write_inode()
> > > > > > > when inode is no longer dirty and writable.
> > > >
> > > > I have to say inserting to suspect list cannot prevent dropping,
> > > > thinking of the problem of previous approach that we write dirty upper
> > > > inode with current->flags & PF_MEMALLOC while evicting clean overlay inode.
> > > >
> > >
> > > Sorry, I don't understand what that means.
> >
> > This is the main problem of my previous patch set V10, evicting clean inode
> > expects no write behavior but in the case of dirty upper inode we have to
> > write out dirty data in this timing otherwise we will lose the connection with upper inode.
> >
>
> My thinking was that the suspect list holds a reference to the overlay inode.
> The question is can we always safely get rid of that reference and remove
> from the suspect list when the inode is no longer "writable". Let's see...
>
> >
> > >
> > > >
> > > > > > >
> > > > > > > Did I miss anything?
> > > > > > >
> > > > > >
> > > > > > If we dirty overlay inode that means we have launched writeback mechanism,
> > > > > > so in this case, re-dirty overlay inode in time becomes important.
> > > > > >
> > > > >
> > > > > My idea was to use the first call to ovl_sync_fs() with 'wait' false
> > > > > to iterate the
> > > > > maybe-dirty list and re-dirty overlay inodes whose upper is dirty.
> > > > >
> > > >
> > > > I'm curious how we prevent dropping of clean overlay inode with dirty upper?
> > > > Hold another reference during iput_final operation? in the drop_inode() or something
> > > > else?
> > >
> > > No, just return 0 from ovl_drop_inode() and iput_final() will not evict().
> >
> > It's not good, it only temporarily skips eviction, the inode in lru list
> > will be evicted in some cases like drop cache or memory reclaim. etc.
> >
> > A solution for this is getting another reference in ->drop_inode so that
> > the inode can escape from adding to lru list but this looks awkward and tricky.
> >
>
> Right, that was nonsense. We need to rely on the reference held by the
> suspect list.
>
> > >
> > > >
> > > >
> > > > > Then in the second call to __sync_filesystem, sync_inodes_sb() will take
> > > > > care of calling ovl_write_inode() for all the re-dirty inodes.
> > > > >
> > > > > In current code we sync ALL dirty upper fs inodes and we do not overlay
> > > > > inodes with no reference in cache.
> > > > >
> > > > > The best code would sync only upper fs inodes dirtied by this overlay
> > > > > and will be able to evict overlay inodes whose upper inodes are clean.
> > > > >
> > > > > The compromise code would sync only upper fs inodes dirtied by this overlay,
> > > > > and would not evict overlay inodes as long as upper inodes are "open for write".
> > > > > I think its a fine compromise considering the alternatives.
> > > > >
> > > > > Is this workable?
> > > > >
> > > >
> > > > In your approach, the key point is how to prevent dropping overlay inode that has
> > > > dirty upper and no reference but I don't understand well how to achieve it from
> > > > your descriptions.
> > > >
> > > >
> > >
> > > Very well, I will try to explain with code:
> > >
> > > int ovl_inode_is_open_for_write(struct inode *inode)
> > > {
> > > struct inode *upper_inode = ovl_inode_upper(inode);
> > >
> > > return upper_inode && inode_is_open_for_write(upper_inode);
> > > }
> > >
> > > void ovl_maybe_mark_inode_dirty(struct inode *inode)
> > > {
> > > struct inode *upper_inode = ovl_inode_upper(inode);
> > >
> > > if (upper_inode && upper_inode->i_state & I_DIRTY_ALL)
> > > mark_inode_dirty(inode);
> > > }
> > >
> > > int ovl_open(struct inode *inode, struct file *file)
> > > {
> > > ...
> > > if (ovl_inode_is_open_for_write(file_inode(file)))
> > > ovl_add_inode_to_suspect_list(inode);
> > >
> > > file->private_data = realfile;
> > >
> > > return 0;
> > > }
> > >
> > > int ovl_release(struct inode *inode, struct file *file)
> > > {
> > > struct inode *inode = file_inode(file);
> > >
> > > if (ovl_inode_is_open_for_write(inode)) {
> > > ovl_maybe_mark_inode_dirty(inode);
> > > ovl_remove_inode_from_suspect_list(inode);
> >
> > I think in some cases removing from suspect_list will cause losing
> > the connection with upper inode that has writable mmap.
> >
>
> First of all I had a bug here.
> Need to check for !ovl_inode_is_open_for_write(inode) after fput().
>
> If the upper inode has a writable mmap, the upper inode would still
> be "writable" (i_writecount held by the map realfile reference).
>
> So when closing the last overlay file reference while upper inode
> writable maps still exist, the remaining issue is when to remove
> the overlay inode from the suspect list and allow its eviction and
> I did not mention that.
>
> I *think* that this condition should be safe in the regard that
> after the condition is met, there is no way to dirty the upper inode
> via overlayfs without going through ovl_open().
> Obviously, the test should be done with some list lock held.
>
> bool ovl_may_remove_from_suspect_list(struct inode *inode)
> {
> struct inode *upper_inode = ovl_inode_upper(inode);
>
> if (upper_inode && upper_inode->i_state & I_DIRTY_ALL)
> return false;
>
> return !inode_is_open_for_write(upper_inode);
> }
>
> Now remains the question of WHEN to check for removal
> from the suspect list.
>
> The first place is in ovl_sync_fs() when iterating the suspect list,
> inodes that meet the above criteria are "indefinitely clean" and
> may be removed from the list at that timing.
>
> For eviction during memory pressure, overlay can register a
> shrinker to do this garbage collection. Is shrinker being called
> on drop_caches? I'm not sure. But we can also do periodic garbage
> collection.
>
> In the end, it is not the common case and we just need the garbage
> collection to mitigate DoS (or existing workload) that does many:
> - open
> - mmap(...PROT_WRITE, MAP_SHARED...)
> - close
> - munmap
>
That right, as you mentioned above releasing timing is important and
that also means in worst case, overlayfs hold too much memory resource
until umount. Maybe more proactive shrinker like dedicated workqueue
thread of overlayfs instead of global shrinker will be more helpful.
>
> >
> > > }
> > >
> > > fput(file->private_data);
> > >
> > > return 0;
> > > }
> > >
> > > int ovl_drop_inode(struct inode *inode)
> > > {
> > > struct inode *upper_inode = ovl_inode_upper(inode);
> > >
> > > if (!upper_inode)
> > > return 1;
> > > if (upper_inode->i_state & I_DIRTY_ALL)
> > > return 0;
> > >
> > > return !inode_is_open_for_write(upper_inode);
> >
> > Is this condition just for writable mmap?
> >
>
> No, it's for all inodes that may be written from overlayfs (or
> from maps created by overalyfs), but as you wrote, this
> test is not needed in drop_inode().
>
> >
> > > }
> > >
> > > static int ovl_sync_fs(struct super_block *sb, int wait)
> > > {
> > > struct ovl_fs *ofs = sb->s_fs_info;
> > > struct super_block *upper_sb;
> > > int ret;
> > >
> > > if (!ovl_upper_mnt(ofs))
> > > return 0;
> > >
> > > /*
> > > * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
> > > * All the super blocks will be iterated, including upper_sb.
> > > *
> > > * If this is a syncfs(2) call, then we do need to call
> > > * sync_filesystem() on upper_sb, but enough if we do it when being
> > > * called with wait == 1.
> > > */
> > > if (!wait) {
> > > /* mark inodes on the suspect list dirty if thier
> > > upper inode is dirty */
> > > ovl_mark_suspect_list_inodes_dirty();
> > > return 0;
> > > }
> > > ...
> > >
> >
> > Why 2 rounds? it seems the simplest way is only syncing dirty upper inode
> > on wait==1 just like my previous patch.
> >
>
> We don't have to do it in 2 rounds, but as long as we have a suspect list,
> we can use it to start writeback on wait==0 on all dirty upper inodes from
> our list just like the caller intended.
>
> Do we need overlay sbi and mark_inode_dirty() and ovl_write_inode() at all?
> I'm not sure. It feels like a good idea to use generic infrastructure as much as
> possible. You should know better than me to answer this question.
>
IMO, if we maintain a special list like suspect-list to organize target overlay inodes,
then we don't have to mark overlay inode dirty, marking overlay inode dirty will
bring unnecessary complexity in this case and I think there is no special benefit.
So this approach may achieve containerized syncfs for overlayfs and I've submitted
patch set V12 follow this thinking but I think the complexity is still big obstacle to
merge to upstream from feedback of Miklos. The new approach I posted the other day
is really simple to understand compare to the previous solution, if we can optimize
dirtiness notification part a bit more then I think it will be more acceptable.
In the end, I need more feedback and If you guys all think the previous solution is the right way,
then I can do more work based on our discussion above and post V13 later.
Thanks,
Chengguang
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
2020-10-16 7:09 ` Chengguang Xu
@ 2020-10-16 9:22 ` Jan Kara
0 siblings, 0 replies; 30+ messages in thread
From: Jan Kara @ 2020-10-16 9:22 UTC (permalink / raw)
To: Chengguang Xu
Cc: Al Viro, amir73il, jack, miklos, linux-unionfs, linux-fsdevel
On Fri 16-10-20 15:09:38, Chengguang Xu wrote:
> ---- 在 星期四, 2020-10-15 12:57:41 Al Viro <viro@zeniv.linux.org.uk> 撰写 ----
> > On Thu, Oct 15, 2020 at 11:42:51AM +0800, Chengguang Xu wrote:
> > > ---- 在 星期四, 2020-10-15 11:25:01 Al Viro <viro@zeniv.linux.org.uk> 撰写 ----
> > > > On Sat, Oct 10, 2020 at 10:23:51PM +0800, Chengguang Xu wrote:
> > > > > Currently there is no notification api for kernel about modification
> > > > > of vfs inode, in some use cases like overlayfs, this kind of notification
> > > > > will be very helpful to implement containerized syncfs functionality.
> > > > > As the first attempt, we introduce marking inode dirty notification so that
> > > > > overlay's inode could mark itself dirty as well and then only sync dirty
> > > > > overlay inode while syncfs.
> > > >
> > > > Who's responsible for removing the crap from notifier chain? And how does
> > > > that affect the lifetime of inode?
> > >
> > > In this case, overlayfs unregisters call back from the notifier chain of upper inode
> > > when evicting it's own inode. It will not affect the lifetime of upper inode because
> > > overlayfs inode holds a reference of upper inode that means upper inode will not be
> > > evicted while overlayfs inode is still alive.
> >
> > Let me see if I've got it right:
> > * your chain contains 1 (for upper inodes) or 0 (everything else, i.e. the
> > vast majority of inodes) recepients
> > * recepient pins the inode for as long as the recepient exists
> >
> > That looks like a massive overkill, especially since all you are propagating is
> > dirtying the suckers. All you really need is one bit in your inode + hash table
> > indexed by the address of struct inode (well, middle bits thereof, as usual).
> > With entries embedded into overlayfs-private part of overlayfs inode. And callback
> > to be called stored in that entry...
> >
>
> Hi AI, Jack, Amir
>
> Based on your feedback, I would to change the inode dirty notification
> something like below, is it acceptable?
>
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 1492271..48473d9 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -2249,6 +2249,14 @@ void __mark_inode_dirty(struct inode *inode, int flags)
>
> trace_writeback_mark_inode_dirty(inode, flags);
>
> + if (inode->state & I_OVL_INUSE) {
> + struct inode *ovl_inode;
> +
> + ovl_inode = ilookup5(NULL, (unsigned long)inode, ovl_inode_test, inode);
I don't think this will work - superblock pointer is part of the hash value
inode is hashed with so without proper sb pointer you won't find proper
hash chain.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2020-10-16 9:22 UTC | newest]
Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-10 14:23 [RFC PATCH 0/5] implement containerized syncfs for overlayfs Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 1/5] fs: introduce notifier list for vfs inode Chengguang Xu
2020-10-14 16:15 ` Jan Kara
2020-10-15 3:03 ` Chengguang Xu
2020-10-15 6:11 ` Amir Goldstein
2020-10-15 11:29 ` Chengguang Xu
2020-10-15 12:32 ` Amir Goldstein
2020-10-15 13:13 ` Chengguang Xu
2020-10-15 16:02 ` Amir Goldstein
2020-10-15 16:06 ` Amir Goldstein
2020-10-16 1:56 ` Chengguang Xu
2020-10-16 4:39 ` Amir Goldstein
2020-10-16 7:43 ` Chengguang Xu
2020-10-15 3:25 ` Al Viro
2020-10-15 3:42 ` Chengguang Xu
2020-10-15 4:57 ` Al Viro
2020-10-15 10:56 ` Chengguang Xu
2020-10-16 7:09 ` Chengguang Xu
2020-10-16 9:22 ` Jan Kara
2020-10-10 14:23 ` [RFC PATCH 2/5] fs: export symbol of writeback_single_inode() Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 3/5] ovl: setup overlayfs' private bdi Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 4/5] ovl: monitor marking dirty activity of underlying upper inode Chengguang Xu
2020-10-11 1:16 ` kernel test robot
2020-10-11 1:55 ` kernel test robot
2020-10-10 14:23 ` [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs Chengguang Xu
2020-10-11 2:57 ` kernel test robot
2020-10-11 3:10 ` kernel test robot
2020-10-11 13:08 ` kernel test robot
2020-10-12 12:31 ` Dan Carpenter
2020-10-12 12:31 ` Dan Carpenter
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.