linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Amir Goldstein <amir73il@gmail.com>
To: Chengguang Xu <cgxu519@mykernel.net>
Cc: Jan Kara <jack@suse.cz>, miklos <miklos@szeredi.hu>,
	linux-unionfs <linux-unionfs@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
Date: Thu, 15 Oct 2020 09:11:04 +0300	[thread overview]
Message-ID: <CAOQ4uxhOrPfJxhJ1g7eSSdO4=giFJabCCOvJL7dSo1R9VsZozA@mail.gmail.com> (raw)
In-Reply-To: <1752a360692.e4f6555543384.3080516622688985279@mykernel.net>

On Thu, Oct 15, 2020 at 6:03 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
>  ---- 在 星期四, 2020-10-15 00:15:38 Jan Kara <jack@suse.cz> 撰写 ----
>  > On Sat 10-10-20 22:23:51, Chengguang Xu wrote:
>  > > Currently there is no notification api for kernel about modification
>  > > of vfs inode, in some use cases like overlayfs, this kind of notification
>  > > will be very helpful to implement containerized syncfs functionality.
>  > > As the first attempt, we introduce marking inode dirty notification so that
>  > > overlay's inode could mark itself dirty as well and then only sync dirty
>  > > overlay inode while syncfs.
>  > >
>  > > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
>  >
>  > So I like how the patch set is elegant however growing struct inode for
>  > everybody by struct blocking_notifier_head (which is rwsem + pointer) is
>  > rather harsh just for this overlayfs functionality... Ideally this should
>  > induce no overhead on struct inode if the filesystem is not covered by
>  > overlayfs. So I'd rather place some external structure into the superblock
>  > that would get allocated on the first use that would track dirty notifications
>  > for each inode. I would not generalize the code for more possible
>  > notifications at this point.
>  >
>  > Also now that I'm thinking about it can there be multiple overlayfs inodes
>  > for one upper inode? If not, then the mechanism of dirtiness propagation
>
> One upper inode only belongs to one overlayfs inode. However, in practice
> one upper fs may contains hundreds or even thousands of overlayfs instances.
>
>  > could be much simpler - it seems we could be able to just lookup
>  > corresponding overlayfs inode based on upper inode and then mark it dirty
>  > (but this would need to be verified by people more familiar with
>  > overlayfs). So all we'd need to know for this is the superblock of the
>  > overlayfs that's covering given upper filesystem...
>  >
>
> So the difficulty is how we get overlayfs inode efficiently from upper inode,
> it seems if we don't have additional info of upper inode to indicate which
> overlayfs it belongs to,  then the lookup of corresponding overlayfs inode
> will be quite expensive and probably impact write performance.
>
> Is that possible we extend inotify infrastructure slightly to notify both
> user space and kernel component?
>
>

When I first saw your suggestion, that is what I was thinking.
Add event fsnotify_dirty_inode(), since the pub-sub infrastructure
in struct inode already exists.

But I have to admit this approach seems like a massive overkill to
what you need.

What you implemented, tracks upper inodes that could have
been dirtied under overlayfs, but what you really want is to
track is upper inodes that were dirtied *by* overlayfs.

And for that purpose, as Miklos said several times, it would be best
pursue the overlayfs aops approach, even though it may be much
harder..

Your earlier patches maintained a list of overlayfs to be synced inodes.
Remind me what was wrong with that approach?

Perhaps you can combine that with the shadow overlay sbi approach.
Instead of dirtying overlay inode when underlying is dirtied, you can
"pre-dirty" overlayfs inode in higher level file ops and add them to the
"maybe-dirty" list (e.g. after write).

ovl_sync_fs() can iterate the maybe-dirty list and re-dirty overlay inodes
if the underlying inode is still dirty on the (!wait) pass.

As for memory mapped inodes via overlayfs (which can be dirtied without
notifying overlayfs) I am not sure that is a big problem in practice.

When an inode is writably mapped via ovarlayfs, you can flag the
overlay inode with "maybe-writably-mapped" and then remove
it from the maybe dirty list when the underlying inode is not dirty
AND its i_writecount is 0 (checked on write_inode() and release()).

Actually, there is no reason to treat writably mapped inodes and
other dirty inodes differently - insert to suspect list on open for
write, remove from suspect list on last release() or write_inode()
when inode is no longer dirty and writable.

Did I miss anything?

Amir.

  reply	other threads:[~2020-10-15  6:11 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-10 14:23 [RFC PATCH 0/5] implement containerized syncfs for overlayfs Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 1/5] fs: introduce notifier list for vfs inode Chengguang Xu
2020-10-14 16:15   ` Jan Kara
2020-10-15  3:03     ` Chengguang Xu
2020-10-15  6:11       ` Amir Goldstein [this message]
2020-10-15 11:29         ` Chengguang Xu
2020-10-15 12:32           ` Amir Goldstein
2020-10-15 13:13             ` Chengguang Xu
2020-10-15 16:02               ` Amir Goldstein
2020-10-15 16:06                 ` Amir Goldstein
2020-10-16  1:56                 ` Chengguang Xu
2020-10-16  4:39                   ` Amir Goldstein
2020-10-16  7:43                     ` Chengguang Xu
2020-10-15  3:25   ` Al Viro
2020-10-15  3:42     ` Chengguang Xu
2020-10-15  4:57       ` Al Viro
2020-10-15 10:56         ` Chengguang Xu
2020-10-16  7:09         ` Chengguang Xu
2020-10-16  9:22           ` Jan Kara
2020-10-10 14:23 ` [RFC PATCH 2/5] fs: export symbol of writeback_single_inode() Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 3/5] ovl: setup overlayfs' private bdi Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 4/5] ovl: monitor marking dirty activity of underlying upper inode Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs Chengguang Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAOQ4uxhOrPfJxhJ1g7eSSdO4=giFJabCCOvJL7dSo1R9VsZozA@mail.gmail.com' \
    --to=amir73il@gmail.com \
    --cc=cgxu519@mykernel.net \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-unionfs@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).