From: Chengguang Xu <cgxu519@mykernel.net>
To: "miklos" <miklos@szeredi.hu>
Cc: "linux-unionfs" <linux-unionfs@vger.kernel.org>,
"linux-fsdevel" <linux-fsdevel@vger.kernel.org>,
"linux-kernel" <linux-kernel@vger.kernel.org>,
"Chengguang Xu" <charliecgxu@tencent.com>,
"ronyjin" <ronyjin@tencent.com>, "amir73il" <amir73il@gmail.com>,
"jack" <jack@suse.cz>
Subject: Re: [RFC PATCH V6 0/7] implement containerized syncfs for overlayfs
Date: Sat, 27 Nov 2021 17:26:33 +0800
Message-ID: <17d60b7bbc2.caee608a13298.8366222634423039066@mykernel.net>
In-Reply-To: <20211122030038.1938875-1-cgxu519@mykernel.net>
---- On Monday, 2021-11-22 11:00:31, Chengguang Xu <cgxu519@mykernel.net> wrote ----
> From: Chengguang Xu <charliecgxu@tencent.com>
>
> Currently, the syncfs(2) syscall on overlayfs just calls sync_filesystem()
> on upper_sb, which synchronizes all dirty inodes in the upper filesystem
> regardless of which overlay instance owns them. In the container use case,
> where multiple containers share the same underlying upper filesystem,
> this has the following shortcomings.
>
> (1) Performance
> Synchronization can be heavy because it also syncs inodes that do not
> belong to the target overlayfs.
>
> (2) Interference
> Unplanned synchronization can impact the IO performance of unrelated
> container processes on other overlayfs instances.
>
> This series tries to implement containerized syncfs for overlayfs so that
> only the dirty upper inodes which belong to the specific overlayfs
> instance are synced. This reduces the cost of synchronization and avoids
> seriously impacting the IO performance of unrelated processes.
>
> v1->v2:
> - Mark overlayfs' inode dirty itself instead of adding notification
> mechanism to vfs inode.
>
> v2->v3:
> - Introduce overlayfs' extra syncfs wait list to wait for target upper
> inodes in ->sync_fs.
>
> v3->v4:
> - Use wait_sb_inodes() to wait for syncing upper inodes.
> - Mark overlay inode dirty only when it has an upper inode and the
> VM_SHARED flag is set in ovl_mmap().
> - Check upper i_state after checking upper mmap state
> in ovl_write_inode.
>
> v4->v5:
> - Add underlying inode dirtiness check after mnt_drop_write().
> - Handle both wait/no-wait mode of syncfs(2) in overlayfs' ->sync_fs().
>
> v5->v6:
> - Rebase to latest overlayfs-next tree.
> - Mark overlay inode dirty when it has upper instead of marking dirty on
> modification.
> - Trigger dirty page writeback in overlayfs' ->write_inode().
> - Mark overlay inode 'DONTCACHE' flag.
> - Delete overlayfs' ->writepages() and ->evict_inode() operations.
Hi Miklos,
Have you had time to look at this V6 series? I think this version fixes the
issues raised in your earlier feedback, and it passes fstests (generic/overlay cases).
I also ran some long-running stress tests (tar & syncfs & diff, with and without copy-up)
and found no obvious problems.
For syncfs with 1M clean upper inodes, an extra 1.3s was spent waiting on scheduling.
I guess this 1.3s will not have a significant impact on container instances in most cases.
I also agree with Jack that we can start with this approach and make improvements
afterwards if there are complaints from real users.
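
For anyone who wants to take a similar measurement, the sketch below shows one way to time a single syncfs(2) call from userspace. It is not the harness behind the 1.3s figure above. time_syncfs() is a hypothetical helper name, and it only assumes that the given path is a directory on the filesystem being synced.

```c
/* syncfs(2) is a GNU extension in glibc, so _GNU_SOURCE must come first. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch series): call syncfs(2) on
 * the filesystem containing the directory `path` and store the elapsed
 * wall-clock time, in seconds, in *elapsed.
 * Returns 0 on success, -1 on failure with errno set.
 */
int time_syncfs(const char *path, double *elapsed)
{
	struct timespec start, end;
	int fd, ret, saved_errno;

	fd = open(path, O_RDONLY | O_DIRECTORY);
	if (fd < 0)
		return -1;

	clock_gettime(CLOCK_MONOTONIC, &start);
	ret = syncfs(fd);
	saved_errno = errno;
	clock_gettime(CLOCK_MONOTONIC, &end);

	close(fd);
	if (ret < 0) {
		/* Report the syncfs() error, not one from close(). */
		errno = saved_errno;
		return -1;
	}

	*elapsed = (double)(end.tv_sec - start.tv_sec) +
		   (double)(end.tv_nsec - start.tv_nsec) / 1e9;
	return 0;
}
```

Calling this on an overlayfs mount point with and without the series applied, while another overlay instance on the same upper filesystem holds dirty data, would give a rough before/after comparison.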
Thanks,
Chengguang
>
> Chengguang Xu (7):
> ovl: setup overlayfs' private bdi
> ovl: mark overlayfs inode dirty when it has upper
> ovl: implement overlayfs' own ->write_inode operation
> ovl: set 'DONTCACHE' flag for overlayfs inode
> fs: export wait_sb_inodes()
> ovl: introduce ovl_sync_upper_blockdev()
> ovl: implement containerized syncfs for overlayfs
>
> fs/fs-writeback.c | 3 ++-
> fs/overlayfs/inode.c | 5 +++-
> fs/overlayfs/super.c | 49 ++++++++++++++++++++++++++++++++-------
> fs/overlayfs/util.c | 1 +
> include/linux/writeback.h | 1 +
> 5 files changed, 48 insertions(+), 11 deletions(-)
>
> --
> 2.27.0
>
>
Thread overview: 21+ messages
2021-11-22 3:00 [RFC PATCH V6 0/7] implement containerized syncfs for overlayfs Chengguang Xu
2021-11-22 3:00 ` [RFC PATCH V6 1/7] ovl: setup overlayfs' private bdi Chengguang Xu
2021-11-26 8:51 ` Jan Kara
2021-11-22 3:00 ` [RFC PATCH V6 2/7] ovl: mark overlayfs inode dirty when it has upper Chengguang Xu
2021-11-26 9:10 ` Jan Kara
2021-11-26 13:06 ` Chengguang Xu
2021-11-26 14:32 ` Jan Kara
2021-11-22 3:00 ` [RFC PATCH V6 3/7] ovl: implement overlayfs' own ->write_inode operation Chengguang Xu
2021-11-26 9:14 ` Jan Kara
2021-11-26 13:09 ` Chengguang Xu
2021-11-22 3:00 ` [RFC PATCH V6 4/7] ovl: set 'DONTCACHE' flag for overlayfs inode Chengguang Xu
2021-11-26 9:20 ` Jan Kara
2021-11-22 3:00 ` [RFC PATCH V6 5/7] fs: export wait_sb_inodes() Chengguang Xu
2021-11-26 9:20 ` Jan Kara
2021-11-22 3:00 ` [RFC PATCH V6 6/7] ovl: introduce ovl_sync_upper_blockdev() Chengguang Xu
2021-11-26 9:21 ` Jan Kara
2021-11-22 3:00 ` [RFC PATCH V6 7/7] ovl: implement containerized syncfs for overlayfs Chengguang Xu
2021-11-22 7:40 ` Amir Goldstein
2021-11-26 5:03 ` Chengguang Xu
2021-11-26 9:25 ` Jan Kara
2021-11-27 9:26 ` Chengguang Xu [this message]