From: cgxu <cgxu519@mykernel.net>
To: jack@suse.cz, miklos@szeredi.hu, amir73il@gmail.com
Cc: linux-unionfs@vger.kernel.org
Subject: Re: [PATCH v12] ovl: improve syncfs efficiency
Date: Mon, 31 Aug 2020 22:22:26 +0800 [thread overview]
Message-ID: <9165be8f-f125-bd1f-498e-46004f5a845e@mykernel.net> (raw)
In-Reply-To: <20200506095307.23742-1-cgxu519@mykernel.net>
On 5/6/20 5:53 PM, Chengguang Xu wrote:
> Current syncfs(2) syscall on overlayfs just calls sync_filesystem()
> on upper_sb to synchronize whole dirty inodes in upper filesystem
> regardless of the overlay ownership of the inode. In the use case of
> container, when multiple containers using the same underlying upper
> filesystem, it has some shortcomings as below.
>
> (1) Performance
> Synchronization is probably heavy because it actually syncs unnecessary
> inodes for target overlayfs.
>
> (2) Interference
> Unplanned synchronization will probably impact IO performance of
> unrelated container processes on the other overlayfs.
Hi Miklos, Jack, Amir and folks
Recently I got another idea to mitigate the syncfs interferes between
instances, I would like to talk with you guys first before I post
full patch series and hope to get some comments about it.
The basic design as below:
1) Introduce a RB-Tree to orgnize upper inode number bitmap entry,
the entry struct is like below
/* Upper inode number bitmap entry */
struct ovl_uib_entry {
struct rb_node node; //link to rb-tree
struct list_head list;//list for destruction iterating
unsigned long start; //start inode number of bitmap
unsigned long bitmap; //bitmap for ino range (start ~ start+64 -1)
};
2) Record upper inode number to bitmap.
3) Iterating upper-sb's inode list during ->syncfs and only sync dirty
inode which number is in the bitmap.
Cons/Pros compare to previous approach
Cons:
This approach
1) May need to interate more inodes in upper sb list and need to search
the ino in rb-tree for dirty inode.
2) Cost some memory for saving inode number bitmap.
Pros:
This approach
1) Does not need to hold upper inode, so upper inode can be reclaim if
needed.
2) The logic looks simpler(at least for me).
Thanks,
cgxu
>
> This patch tries to only sync target dirty upper inodes which are belong
> to specific overlayfs instance and wait for completion. By doing this,
> it is able to reduce cost of synchronization and will not seriously impact
> IO performance of unrelated processes.
>
> Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
> ---
> Changes since v11:
> - Introduce new structure sync_entry to organize target dirty upper
> inode list.
> - Rebase on overlayfs-next tree.
>
> Changes since v10:
> - Add special handling in ovl_evict_inode() for inode eviction trigered
> by kswapd memory reclamation(with PF_MEMALLOC flag).
> - Rebase on current latest overlayfs-next tree.
> - Slightly modify license information.
>
> Changes since v9:
> - Rebase on current latest overlayfs-next tree.
> - Calling clear_inode() regardless of having upper inode.
>
> Changes since v8:
> - Remove unnecessary blk_start_plug() call in ovl_sync_inodes().
>
> Changes since v7:
> - Check OVL_EVICT_PENDING flag instead of I_FREEING to recognize
> inode which is in eviction process.
> - Do not move ovl_inode back to ofs->upper_inodes when inode is in
> eviction process.
> - Delete unnecessary memory barrier in ovl_evict_inode().
>
> Changes since v6:
> - Take/release sb->s_sync_lock bofore/after sync_fs to serialize
> concurrent syncfs.
> - Change list iterating method to improve code readability.
> - Fix typo in commit log and comments.
> - Add header comment to sync.c.
>
> Changes since v5:
> - Move sync related functions to new file sync.c.
> - Introduce OVL_EVICT_PENDING flag to avoid race conditions.
> - If ovl inode flag is I_WILL_FREE or I_FREEING then syncfs
> will wait until OVL_EVICT_PENDING to be cleared.
> - Move inode sync operation into evict_inode from drop_inode.
> - Call upper_sb->sync_fs with no-wait after sync dirty upper
> inodes.
> - Hold s_inode_wblist_lock until deleting item from list.
> - Remove write_inode fuction because no more used.
>
> Changes since v4:
> - Add syncing operation and deleting from upper_inodes list
> during ovl inode destruction.
> - Export symbol of inode_wait_for_writeback() for waiting
> writeback on ovl inode.
> - Reuse I_SYNC flag to avoid race conditions between syncfs
> and drop_inode.
>
> Changes since v3:
> - Introduce upper_inode list to organize ovl inode which has upper
> inode.
> - Avoid iput operation inside spin lock.
> - Change list iteration method for safety.
> - Reuse inode->i_wb_list and sb->s_inodes_wb to save memory.
> - Modify commit log and add more comments to the code.
>
> Changes since v2:
> - Decoupling with syncfs of upper fs by taking sb->s_sync_lock of
> overlayfs.
> - Factoring out sync operation to different helpers for easy
> understanding.
>
> Changes since v1:
> - If upper filesystem is readonly mode then skip synchronization.
> - Introduce global wait list to replace temporary wait list for
> concurrent synchronization.
>
> fs/overlayfs/Makefile | 2 +-
> fs/overlayfs/copy_up.c | 20 +++
> fs/overlayfs/overlayfs.h | 19 +++
> fs/overlayfs/ovl_entry.h | 21 +++
> fs/overlayfs/super.c | 63 +++-----
> fs/overlayfs/sync.c | 338 +++++++++++++++++++++++++++++++++++++++
> fs/overlayfs/util.c | 15 ++
> 7 files changed, 440 insertions(+), 38 deletions(-)
> create mode 100644 fs/overlayfs/sync.c
>
> diff --git a/fs/overlayfs/Makefile b/fs/overlayfs/Makefile
> index 9164c585eb2f..0c4e53fcf4ef 100644
> --- a/fs/overlayfs/Makefile
> +++ b/fs/overlayfs/Makefile
> @@ -6,4 +6,4 @@
> obj-$(CONFIG_OVERLAY_FS) += overlay.o
>
> overlay-objs := super.o namei.o util.o inode.o file.o dir.o readdir.o \
> - copy_up.o export.o
> + copy_up.o export.o sync.o
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index 66004534bd40..f760bd4f9bae 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -956,6 +956,10 @@ static bool ovl_open_need_copy_up(struct dentry *dentry, int flags)
> int ovl_maybe_copy_up(struct dentry *dentry, int flags)
> {
> int err = 0;
> + struct ovl_inode *oi = OVL_I(d_inode(dentry));
> + struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
> + struct ovl_sync_info *si = ofs->si;
> + struct ovl_sync_entry *entry;
>
> if (ovl_open_need_copy_up(dentry, flags)) {
> err = ovl_want_write(dentry);
> @@ -965,6 +969,22 @@ int ovl_maybe_copy_up(struct dentry *dentry, int flags)
> }
> }
>
> + if (!err && ovl_open_flags_need_copy_up(flags) && !oi->sync_entry) {
> + entry = kmem_cache_zalloc(ovl_sync_entry_cachep, GFP_KERNEL);
> + if (IS_ERR(entry))
> + return PTR_ERR(entry);
> +
> + spin_lock_init(&entry->lock);
> + INIT_LIST_HEAD(&entry->list);
> + entry->upper = ovl_inode_upper(d_inode(dentry));
> + ihold(entry->upper);
> + oi->sync_entry = entry;
> +
> + spin_lock(&si->list_lock);
> + list_add_tail(&entry->list, &si->pending_list);
> + spin_unlock(&si->list_lock);
> + }
> +
> return err;
> }
>
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index 76747f5b0517..d9cd2405c3a2 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -42,6 +42,11 @@ enum ovl_inode_flag {
> OVL_CONST_INO,
> };
>
> +enum ovl_sync_entry_flag {
> + /* Indicate the sync_entry should be reclaimed after syncing */
> + OVL_SYNC_RECLAIM_PENDING,
> +};
> +
> enum ovl_entry_flag {
> OVL_E_UPPER_ALIAS,
> OVL_E_OPAQUE,
> @@ -289,6 +294,9 @@ int ovl_set_impure(struct dentry *dentry, struct dentry *upperdentry);
> void ovl_set_flag(unsigned long flag, struct inode *inode);
> void ovl_clear_flag(unsigned long flag, struct inode *inode);
> bool ovl_test_flag(unsigned long flag, struct inode *inode);
> +void ovl_sync_set_flag(unsigned long flag, struct ovl_sync_entry *entry);
> +void ovl_sync_clear_flag(unsigned long flag, struct ovl_sync_entry *entry);
> +bool ovl_sync_test_flag(unsigned long flag, struct ovl_sync_entry *entry);
> bool ovl_inuse_trylock(struct dentry *dentry);
> void ovl_inuse_unlock(struct dentry *dentry);
> bool ovl_is_inuse(struct dentry *dentry);
> @@ -490,3 +498,14 @@ int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
>
> /* export.c */
> extern const struct export_operations ovl_export_operations;
> +
> +/* sync.c */
> +extern struct kmem_cache *ovl_sync_entry_cachep;
> +
> +void ovl_evict_inode(struct inode *inode);
> +int ovl_sync_fs(struct super_block *sb, int wait);
> +int ovl_sync_entry_create(struct ovl_fs *ofs, int bucket_bits);
> +int ovl_sync_info_init(struct ovl_fs *ofs);
> +void ovl_sync_info_destroy(struct ovl_fs *ofs);
> +int ovl_sync_entry_cache_init(void);
> +void ovl_sync_entry_cache_destroy(void);
> diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
> index a8f82fb7ffb4..22367b1ea83c 100644
> --- a/fs/overlayfs/ovl_entry.h
> +++ b/fs/overlayfs/ovl_entry.h
> @@ -44,6 +44,23 @@ struct ovl_path {
> struct dentry *dentry;
> };
>
> +struct ovl_sync_entry {
> + spinlock_t lock;
> + unsigned long flags;
> + struct list_head list;
> + struct inode *upper;
> +};
> +
> +struct ovl_sync_info {
> + /* odd means in syncfs process */
> + atomic_t in_syncing;
> + atomic_t reclaim_count;
> + spinlock_t list_lock;
> + struct list_head pending_list;
> + struct list_head reclaim_list;
> + struct list_head waiting_list;
> +};
> +
> /* private information held for overlayfs's superblock */
> struct ovl_fs {
> struct vfsmount *upper_mnt;
> @@ -80,6 +97,8 @@ struct ovl_fs {
> atomic_long_t last_ino;
> /* Whiteout dentry cache */
> struct dentry *whiteout;
> + /* For organizing ovl_sync_entries which are used in syncfs */
> + struct ovl_sync_info *si;
> };
>
> static inline struct ovl_fs *OVL_FS(struct super_block *sb)
> @@ -120,6 +139,8 @@ struct ovl_inode {
>
> /* synchronize copy up and more */
> struct mutex lock;
> + /* pointing to relevant sync_entry */
> + struct ovl_sync_entry *sync_entry;
> };
>
> static inline struct ovl_inode *OVL_I(struct inode *inode)
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index a88a7badf444..e1f010d0ebda 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -184,6 +184,7 @@ static struct inode *ovl_alloc_inode(struct super_block *sb)
> oi->lower = NULL;
> oi->lowerdata = NULL;
> mutex_init(&oi->lock);
> + oi->sync_entry = NULL;
>
> return &oi->vfs_inode;
> }
> @@ -241,6 +242,7 @@ static void ovl_free_fs(struct ovl_fs *ofs)
> kfree(ofs->config.redirect_mode);
> if (ofs->creator_cred)
> put_cred(ofs->creator_cred);
> + ovl_sync_info_destroy(ofs);
> kfree(ofs);
> }
>
> @@ -251,36 +253,6 @@ static void ovl_put_super(struct super_block *sb)
> ovl_free_fs(ofs);
> }
>
> -/* Sync real dirty inodes in upper filesystem (if it exists) */
> -static int ovl_sync_fs(struct super_block *sb, int wait)
> -{
> - struct ovl_fs *ofs = sb->s_fs_info;
> - struct super_block *upper_sb;
> - int ret;
> -
> - if (!ofs->upper_mnt)
> - return 0;
> -
> - /*
> - * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
> - * All the super blocks will be iterated, including upper_sb.
> - *
> - * If this is a syncfs(2) call, then we do need to call
> - * sync_filesystem() on upper_sb, but enough if we do it when being
> - * called with wait == 1.
> - */
> - if (!wait)
> - return 0;
> -
> - upper_sb = ofs->upper_mnt->mnt_sb;
> -
> - down_read(&upper_sb->s_umount);
> - ret = sync_filesystem(upper_sb);
> - up_read(&upper_sb->s_umount);
> -
> - return ret;
> -}
> -
> /**
> * ovl_statfs
> * @sb: The overlayfs super block
> @@ -377,6 +349,7 @@ static const struct super_operations ovl_super_operations = {
> .free_inode = ovl_free_inode,
> .destroy_inode = ovl_destroy_inode,
> .drop_inode = generic_delete_inode,
> + .evict_inode = ovl_evict_inode,
> .put_super = ovl_put_super,
> .sync_fs = ovl_sync_fs,
> .statfs = ovl_statfs,
> @@ -1773,6 +1746,11 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> if (!ofs)
> goto out;
>
> + err = ovl_sync_info_init(ofs);
> + if (err)
> + goto out_err;
> +
> + err = -ENOMEM;
> ofs->creator_cred = cred = prepare_creds();
> if (!cred)
> goto out_err;
> @@ -1943,15 +1921,25 @@ static int __init ovl_init(void)
> return -ENOMEM;
>
> err = ovl_aio_request_cache_init();
> - if (!err) {
> - err = register_filesystem(&ovl_fs_type);
> - if (!err)
> - return 0;
> + if (err)
> + goto out_aio_req_cache;
>
> - ovl_aio_request_cache_destroy();
> - }
> - kmem_cache_destroy(ovl_inode_cachep);
> + err = ovl_sync_entry_cache_init();
> + if (err)
> + goto out_sync_entry_cache;
> +
> + err = register_filesystem(&ovl_fs_type);
> + if (err)
> + goto out_register_fs;
>
> + return 0;
> +
> +out_register_fs:
> + ovl_sync_entry_cache_destroy();
> +out_sync_entry_cache:
> + ovl_aio_request_cache_destroy();
> +out_aio_req_cache:
> + kmem_cache_destroy(ovl_inode_cachep);
> return err;
> }
>
> @@ -1966,6 +1954,7 @@ static void __exit ovl_exit(void)
> rcu_barrier();
> kmem_cache_destroy(ovl_inode_cachep);
> ovl_aio_request_cache_destroy();
> + ovl_sync_entry_cache_destroy();
> }
>
> module_init(ovl_init);
> diff --git a/fs/overlayfs/sync.c b/fs/overlayfs/sync.c
> new file mode 100644
> index 000000000000..36d9e0b543d7
> --- /dev/null
> +++ b/fs/overlayfs/sync.c
> @@ -0,0 +1,338 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2020 All Rights Reserved.
> + * Author Chengguang Xu <cgxu519@mykernel.net>
> + */
> +
> +#include <linux/fs.h>
> +#include <linux/xattr.h>
> +#include <linux/mount.h>
> +#include <linux/writeback.h>
> +#include <linux/blkdev.h>
> +#include "overlayfs.h"
> +
> +/**
> + * ovl_sync_entry is used for making connection between ovl_inode and
> + * upper_inode, when opening a file with writable flags, a new ovl_sync_entry
> + * will be created and link to global pending list, when evicting ovl_inode,
> + * the relevant ovl_sync_entry will be moved to global reclaim list.
> + * In syncfs we iterate pending/reclaim list to sync every potential target
> + * to ensure all dirty data of overlayfs get synced.
> + *
> + * We reclaim ovl_sync_entry which is in OVL_SYNC_RECLAIM_PEDING state during
> + * ->syncfs() and ->evict() in case too much of resource comsumption.
> + *
> + * There are four kind of locks in in syncfs contexts:
> + *
> + * upper_sb->s_umount(semaphore) : avoiding r/o to r/w or vice versa
> + * sb->s_sync_lock(mutex) : avoiding concorrent syncfs running
> + * ofs->si->list_lock(spinlock) : protecting pending/reclaim/waiting lists
> + * sync_entry->lock(spinlock) : protecting sync_entry fields
> + *
> + * Lock dependencies in syncfs contexts:
> + *
> + * upper_sb->s_umount
> + * sb->s_sync_lock
> + * ofs->si->list_lock
> + * sync_entry->lock
> + */
> +
> +struct kmem_cache *ovl_sync_entry_cachep;
> +
> +int ovl_sync_entry_cache_init(void)
> +{
> + ovl_sync_entry_cachep = kmem_cache_create("ovl_sync_entry",
> + sizeof(struct ovl_sync_entry),
> + 0, SLAB_HWCACHE_ALIGN, NULL);
> + if (!ovl_sync_entry_cachep)
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +void ovl_sync_entry_cache_destroy(void)
> +{
> + kmem_cache_destroy(ovl_sync_entry_cachep);
> +}
> +
> +int ovl_sync_info_init(struct ovl_fs *ofs)
> +{
> + struct ovl_sync_info *si;
> +
> + si = kmalloc(sizeof(struct ovl_sync_info), GFP_KERNEL);
> + if (!si)
> + return -ENOMEM;
> +
> + atomic_set(&si->in_syncing, 0);
> + atomic_set(&si->reclaim_count, 0);
> + spin_lock_init(&si->list_lock);
> + INIT_LIST_HEAD(&si->pending_list);
> + INIT_LIST_HEAD(&si->reclaim_list);
> + INIT_LIST_HEAD(&si->waiting_list);
> + ofs->si = si;
> + return 0;
> +}
> +
> +void ovl_sync_info_destroy(struct ovl_fs *ofs)
> +{
> + struct ovl_sync_info *si = ofs->si;
> + struct ovl_sync_entry *entry;
> + LIST_HEAD(tmp_list);
> +
> + if (!si)
> + return;
> +
> + spin_lock(&si->list_lock);
> + list_splice_init(&si->pending_list, &tmp_list);
> + list_splice_init(&si->reclaim_list, &tmp_list);
> + list_splice_init(&si->waiting_list, &tmp_list);
> + spin_unlock(&si->list_lock);
> +
> + while (!list_empty(&tmp_list)) {
> + entry = list_first_entry(&tmp_list,
> + struct ovl_sync_entry, list);
> + list_del(&entry->list);
> + iput(entry->upper);
> + kmem_cache_free(ovl_sync_entry_cachep, entry);
> + }
> +
> + kfree(si);
> + ofs->si = NULL;
> +}
> +
> +#define OVL_SYNC_BATCH_RECLAIM 10
> +void ovl_evict_inode(struct inode *inode)
> +{
> + struct ovl_fs *ofs = OVL_FS(inode->i_sb);
> + struct ovl_inode *oi = OVL_I(inode);
> + struct ovl_sync_entry *entry = oi->sync_entry;
> + struct ovl_sync_info *si;
> + struct inode *upper_inode;
> + unsigned long scan_count = OVL_SYNC_BATCH_RECLAIM;
> +
> + if (!ofs)
> + goto out;
> +
> + si = ofs->si;
> + spin_lock(&si->list_lock);
> + if (entry) {
> + oi->sync_entry = NULL;
> + atomic_inc(&si->reclaim_count);
> + spin_lock(&entry->lock);
> + ovl_sync_set_flag(OVL_SYNC_RECLAIM_PENDING, entry);
> + spin_unlock(&entry->lock);
> +
> + /* syncfs is running, avoid manipulating list */
> + if (atomic_read(&si->in_syncing) % 2)
> + goto out2;
> +
> + list_move_tail(&entry->list, &si->reclaim_list);
> + while (scan_count) {
> + if (list_empty(&si->reclaim_list))
> + break;
> +
> + entry = list_first_entry(&si->reclaim_list,
> + struct ovl_sync_entry, list);
> + upper_inode = entry->upper;
> + scan_count--;
> + if (upper_inode->i_state & I_DIRTY_ALL ||
> + mapping_tagged(upper_inode->i_mapping,
> + PAGECACHE_TAG_WRITEBACK)) {
> + list_move_tail(&entry->list, &si->reclaim_list);
> + } else {
> + list_del_init(&entry->list);
> + spin_unlock(&si->list_lock);
> + atomic_dec(&si->reclaim_count);
> + iput(upper_inode);
> + kmem_cache_free(ovl_sync_entry_cachep, entry);
> + spin_lock(&si->list_lock);
> + if (atomic_read(&si->in_syncing) % 2)
> + goto out2;
> + }
> + }
> +
> + while (scan_count) {
> + if (list_empty(&si->pending_list))
> + break;
> +
> + entry = list_first_entry(&si->pending_list,
> + struct ovl_sync_entry, list);
> + scan_count--;
> + spin_lock(&entry->lock);
> + if (ovl_sync_test_flag(OVL_SYNC_RECLAIM_PENDING, entry))
> + list_move_tail(&entry->list, &si->reclaim_list);
> + spin_unlock(&entry->lock);
> + }
> + }
> +out2:
> + spin_unlock(&si->list_lock);
> +out:
> + clear_inode(inode);
> +}
> +
> +/**
> + * ovl_sync_info_inodes
> + * @sb: The overlayfs super block
> + *
> + * Looking for dirty upper inode by iterating pending/reclaim list.
> + */
> +static void ovl_sync_info_inodes(struct super_block *sb)
> +{
> + struct ovl_fs *ofs = sb->s_fs_info;
> + struct ovl_sync_info *si = ofs->si;
> + struct ovl_sync_entry *entry;
> + struct blk_plug plug;
> + LIST_HEAD(sync_pending_list);
> + LIST_HEAD(sync_waiting_list);
> +
> + struct writeback_control wbc_for_sync = {
> + .sync_mode = WB_SYNC_ALL,
> + .for_sync = 1,
> + .range_start = 0,
> + .range_end = LLONG_MAX,
> + .nr_to_write = LONG_MAX,
> + };
> +
> + blk_start_plug(&plug);
> + spin_lock(&si->list_lock);
> + atomic_inc(&si->in_syncing);
> + list_splice_init(&si->pending_list, &sync_pending_list);
> + list_splice_init(&si->reclaim_list, &sync_pending_list);
> + spin_unlock(&si->list_lock);
> +
> + while (!list_empty(&sync_pending_list)) {
> + entry = list_first_entry(&sync_pending_list,
> + struct ovl_sync_entry, list);
> + sync_inode(entry->upper, &wbc_for_sync);
> + list_move_tail(&entry->list, &sync_waiting_list);
> + cond_resched();
> + }
> +
> + blk_finish_plug(&plug);
> + spin_lock(&si->list_lock);
> + list_splice_init(&sync_waiting_list, &si->waiting_list);
> + spin_unlock(&si->list_lock);
> +}
> +
> +/**
> + * ovl_wait_inodes
> + * @sb: The overlayfs super block
> + *
> + * Waiting writeback inodes which are in waiting list,
> + * all the IO that has been issued up to the time this
> + * function is enter is guaranteed to be completed.
> + */
> +static void ovl_wait_inodes(struct super_block *sb)
> +{
> + struct ovl_fs *ofs = sb->s_fs_info;
> + struct ovl_sync_info *si = ofs->si;
> + struct ovl_sync_entry *entry;
> + struct inode *upper_inode;
> + bool should_free = false;
> + LIST_HEAD(sync_pending_list);
> + LIST_HEAD(sync_waiting_list);
> +
> + /*
> + * Splice the global waiting list onto a temporary list to avoid
> + * waiting on inodes that have started writeback after this point.
> + */
> + spin_lock(&si->list_lock);
> + list_splice_init(&si->waiting_list, &sync_waiting_list);
> + spin_unlock(&si->list_lock);
> +
> + while (!list_empty(&sync_waiting_list)) {
> + entry = list_first_entry(&sync_waiting_list,
> + struct ovl_sync_entry, list);
> + upper_inode = entry->upper;
> + if (mapping_tagged(upper_inode->i_mapping,
> + PAGECACHE_TAG_WRITEBACK)) {
> + filemap_fdatawait_keep_errors(upper_inode->i_mapping);
> + cond_resched();
> + }
> +
> + spin_lock(&entry->lock);
> + if (ovl_sync_test_flag(OVL_SYNC_RECLAIM_PENDING, entry))
> + should_free = true;
> + spin_unlock(&entry->lock);
> +
> + if (should_free) {
> + list_del(&entry->list);
> + atomic_dec(&si->reclaim_count);
> + iput(entry->upper);
> + kmem_cache_free(ovl_sync_entry_cachep, entry);
> + } else {
> + list_move_tail(&entry->list, &sync_pending_list);
> + }
> + }
> +
> + spin_lock(&si->list_lock);
> + list_splice_init(&sync_pending_list, &si->pending_list);
> + atomic_inc(&si->in_syncing);
> + spin_unlock(&si->list_lock);
> +}
> +
> +/**
> + * ovl_sync_info_filesystem
> + * @sb: The overlayfs super block
> + *
> + * Sync underlying dirty inodes in upper filesystem and
> + * wait for completion.
> + */
> +static int ovl_sync_filesystem(struct super_block *sb)
> +{
> + struct ovl_fs *ofs = sb->s_fs_info;
> + struct super_block *upper_sb = ofs->upper_mnt->mnt_sb;
> + int ret = 0;
> +
> + down_read(&upper_sb->s_umount);
> + if (!sb_rdonly(upper_sb)) {
> + /*
> + * s_sync_lock is used for serializing concurrent
> + * syncfs operations.
> + */
> + mutex_lock(&sb->s_sync_lock);
> + ovl_sync_info_inodes(sb);
> + /* Calling sync_fs with no-wait for better performance. */
> + if (upper_sb->s_op->sync_fs)
> + upper_sb->s_op->sync_fs(upper_sb, 0);
> +
> + ovl_wait_inodes(sb);
> + if (upper_sb->s_op->sync_fs)
> + upper_sb->s_op->sync_fs(upper_sb, 1);
> +
> + ret = sync_blockdev(upper_sb->s_bdev);
> + mutex_unlock(&sb->s_sync_lock);
> + }
> + up_read(&upper_sb->s_umount);
> + return ret;
> +}
> +
> +/**
> + * ovl_sync_fs
> + * @sb: The overlayfs super block
> + * @wait: Wait for I/O to complete
> + *
> + * Sync real dirty inodes in upper filesystem (if it exists)
> + */
> +int ovl_sync_fs(struct super_block *sb, int wait)
> +{
> + struct ovl_fs *ofs = sb->s_fs_info;
> + int ret;
> +
> + if (!ofs->upper_mnt)
> + return 0;
> +
> + /*
> + * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
> + * All the super blocks will be iterated, including upper_sb.
> + *
> + * If this is a syncfs(2) call, then we do need to call
> + * sync_filesystem() on upper_sb, but enough if we do it when being
> + * called with wait == 1.
> + */
> + if (!wait)
> + return 0;
> +
> + ret = ovl_sync_filesystem(sb);
> + return ret;
> +}
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index 01755bc186c9..e99634fab602 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -602,6 +602,21 @@ bool ovl_test_flag(unsigned long flag, struct inode *inode)
> return test_bit(flag, &OVL_I(inode)->flags);
> }
>
> +void ovl_sync_set_flag(unsigned long flag, struct ovl_sync_entry *entry)
> +{
> + set_bit(flag, &entry->flags);
> +}
> +
> +void ovl_sync_clear_flag(unsigned long flag, struct ovl_sync_entry *entry)
> +{
> + clear_bit(flag, &entry->flags);
> +}
> +
> +bool ovl_sync_test_flag(unsigned long flag, struct ovl_sync_entry *entry)
> +{
> + return test_bit(flag, &entry->flags);
> +}
> +
> /**
> * Caller must hold a reference to inode to prevent it from being freed while
> * it is marked inuse.
>
next prev parent reply other threads:[~2020-08-31 14:23 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-06 9:53 [PATCH v12] ovl: improve syncfs efficiency Chengguang Xu
2020-05-20 1:01 ` cgxu
2020-05-20 7:24 ` Amir Goldstein
2020-05-22 9:31 ` Miklos Szeredi
2020-05-22 13:44 ` Vivek Goyal
2020-05-22 14:40 ` Giuseppe Scrivano
2020-05-26 7:50 ` cgxu
2020-05-26 8:25 ` Amir Goldstein
2020-08-31 14:22 ` cgxu [this message]
2020-08-31 14:58 ` Miklos Szeredi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9165be8f-f125-bd1f-498e-46004f5a845e@mykernel.net \
--to=cgxu519@mykernel.net \
--cc=amir73il@gmail.com \
--cc=jack@suse.cz \
--cc=linux-unionfs@vger.kernel.org \
--cc=miklos@szeredi.hu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).