From: Chengguang Xu <cgxu519@mykernel.net>
To: "Jan Kara" <jack@suse.cz>
Cc: "Amir Goldstein" <amir73il@gmail.com>,
"miklos" <miklos@szeredi.hu>,
"linux-unionfs" <linux-unionfs@vger.kernel.org>,
"linux-fsdevel" <linux-fsdevel@vger.kernel.org>,
"charliecgxu" <charliecgxu@tencent.com>
Subject: Re: [RFC PATCH v2 5/8] ovl: mark overlayfs' inode dirty on shared writable mmap
Date: Fri, 06 Nov 2020 17:47:55 +0800
Message-ID: <1759cf492c8.11cac446f12251.3388484787199140990@mykernel.net>
In-Reply-To: <20201106085023.GA25479@quack2.suse.cz>
---- On Fri, 2020-11-06 16:50:23 Jan Kara <jack@suse.cz> wrote ----
> On Fri 06-11-20 10:41:44, Chengguang Xu wrote:
> > ---- On Thu, 2020-11-05 23:54:34 Jan Kara <jack@suse.cz> wrote ----
> > > On Thu 05-11-20 16:21:27, Amir Goldstein wrote:
> > > > On Thu, Nov 5, 2020 at 4:03 PM Jan Kara <jack@suse.cz> wrote:
> > > > >
> > > > > On Wed 04-11-20 19:54:03, Chengguang Xu wrote:
> > > > > > ---- On Tue, 2020-11-03 01:30:52 Jan Kara <jack@suse.cz> wrote ----
> > > > > > > On Sun 25-10-20 11:41:14, Chengguang Xu wrote:
> > > > > > > > Overlayfs cannot be notified when mmapped area gets dirty,
> > > > > > > > so we need to proactively mark inode dirty in ->mmap operation.
> > > > > > > >
> > > > > > > > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
> > > > > > > > ---
> > > > > > > > fs/overlayfs/file.c | 4 ++++
> > > > > > > > 1 file changed, 4 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > > > > > > > index efccb7c1f9bc..cd6fcdfd81a9 100644
> > > > > > > > --- a/fs/overlayfs/file.c
> > > > > > > > +++ b/fs/overlayfs/file.c
> > > > > > > > @@ -486,6 +486,10 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
> > > > > > > > /* Drop reference count from new vm_file value */
> > > > > > > > fput(realfile);
> > > > > > > > } else {
> > > > > > > > + if (vma->vm_flags & (VM_SHARED|VM_MAYSHARE) &&
> > > > > > > > + vma->vm_flags & (VM_WRITE|VM_MAYWRITE))
> > > > > > > > + ovl_mark_inode_dirty(file_inode(file));
> > > > > > > > +
> > > > > > >
> > > > > > > But does this work reliably? I mean once writeback runs, your inode (as
> > > > > > > well as upper inode) is cleaned. Then a page fault comes so file has dirty
> > > > > > > pages again and would need flushing but overlayfs inode stays clean? Am I
> > > > > > > missing something?
> > > > > > >
> > > > > >
> > > > > > Yeah, this is key point of this approach, in order to fix the issue I
> > > > > > explicitly set I_DIRTY_SYNC flag in ovl_mark_inode_dirty(), so what i
> > > > > > mean is during writeback we will call into ->write_inode() by this
> > > > > > flag(I_DIRTY_SYNC) and at that place we get chance to check mapping and
> > > > > > re-dirty overlay's inode. The code logic like below in ovl_write_inode().
> > > > > >
> > > > > > if (mapping_writably_mapped(upper->i_mapping) ||
> > > > > > mapping_tagged(upper->i_mapping, PAGECACHE_TAG_WRITEBACK))
> > > > > > iflag |= I_DIRTY_PAGES;
> > > > >
> > > > > OK, but suppose the upper mapping is clean at this moment (upper inode has
> > > > > been fully written out for whatever reason, but it is still mapped) so your
> > > > > overlayfs inode becomes clean as well. Then I don't see a mechanism which
> > > > > would make your overlayfs inode dirty again when a write to mmap happens,
> > > > > set_page_dirty() will end up marking upper inode with I_DIRTY_PAGES flag.
> > > > >
> > > > > Note that ovl_mmap() gets called only at mmap(2) syscall time but then
> > > > > pages get faulted in, dirtied, cleaned fully at discretion of the mm
> > > > > / writeback subsystem.
> > > > >
> > > >
> > > > Perhaps I will add some background.
> > > >
> > > > What I suggested was to maintain a "suspect list" in addition to
> > > > the dirty ovl inodes.
> > > >
> > > > ovl inode is added to the suspect list on mmap (writable) and removed
> > > > from the suspect list on release() flush() or on sync_fs() if real inode is no
> > > > longer writably mapped.
> > > >
> > > > There was another variant where ovl inode is added to suspect list on open
> > > > for write and removed from suspect list on release() flush() or sync_fs()
> > > > if real inode is not inode_is_open_for_write().
> > > >
> > > > In both cases the list will have inodes whose real is not dirty, but
> > > > in both cases
> > > > the list shouldn't be terribly large to traverse on sync_fs().
> > > >
> > > > Chengguang tried to implement the idea without an actual list by
> > > > re-dirtying the "suspect" inodes on every write_inode(), but I personally have
> > > > no idea if his idea works.
> > > >
> > > > I think we can resort to using an actual suspect list if you say that it
> > > > cannot work like this?
> > >
> > > Yeah, the suspect list (i.e., additional list of inodes to check on sync)
> > > you describe should work fine.
> >
> > I think this solution still has the problem we have met in below thread[1]
> > The main problem is the state combination of clean overlayfs' inode && dirty upper inode.
>
> But I think the scheme Amir proposed and I detailed in my previous email
> should prevent that state. Because while the inode is mapped, it will be
> kept in the dirty list. So which scenario do you think would lead to clean
> overlayfs inode and dirty upper inode?
If keeping the inode in the dirty list means keeping the overlayfs inode dirty,
then I think we don't need an extra list for that: the VFS already has a
writeback list, and the solution would be exactly the same as mine (re-dirty). Right?
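
To make the state under discussion concrete, here is a toy userspace model of it. This is only a sketch and not kernel code: every type and function name below (toy_upper, toy_ovl, toy_ovl_write_inode, and so on) is hypothetical, and the flags are plain booleans standing in for the real inode state. It shows that if ->write_inode() cleans the overlayfs inode while the upper inode is still writably mapped, a later store through the mapping leaves a clean overlayfs inode over a dirty upper inode, whereas re-dirtying while mapped avoids that.

```c
#include <stdbool.h>

/* Toy stand-ins for kernel state; none of these are real kernel types. */
struct toy_upper {
	int writable_mappings;	/* models what mapping_writably_mapped() reports */
	bool dirty_pages;	/* models I_DIRTY_PAGES on the upper inode */
};

struct toy_ovl {
	struct toy_upper *upper;
	bool dirty;		/* models the overlayfs inode being on the writeback list */
};

/*
 * Models ->write_inode(): the upper inode is flushed, then the overlayfs
 * inode either becomes clean or is re-dirtied while a writable mapping exists.
 */
static void toy_ovl_write_inode(struct toy_ovl *ovl, bool redirty_while_mapped)
{
	ovl->upper->dirty_pages = false;
	ovl->dirty = redirty_while_mapped && ovl->upper->writable_mappings > 0;
}

/* Models a store through a shared writable mapping: only the upper inode is dirtied. */
static void toy_mmap_store(struct toy_ovl *ovl)
{
	ovl->upper->dirty_pages = true;
}

/* Returns true if we end up with a clean overlayfs inode and a dirty upper inode. */
bool toy_loses_dirty_state(bool redirty_while_mapped)
{
	struct toy_upper upper = { .writable_mappings = 1, .dirty_pages = false };
	struct toy_ovl ovl = { .upper = &upper, .dirty = true };	/* dirtied in ->mmap */

	toy_ovl_write_inode(&ovl, redirty_while_mapped);	/* writeback runs */
	toy_mmap_store(&ovl);					/* later fault + store */
	return !ovl.dirty && upper.dirty_pages;
}
```

Calling toy_loses_dirty_state(false) models cleaning the overlayfs inode unconditionally, and toy_loses_dirty_state(true) models the re-dirty approach.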
>
> > [1] https://www.spinics.net/lists/linux-unionfs/msg07448.html
> >
> > > Also the "keep suspect inode dirty" idea
> > > of Chengguang could work fine but we'd have to use something like
> > > inode_is_open_for_write() or inode_is_writeably_mapped() (which would need
> > > to be implemented but it should be easy vma_interval_tree_foreach() walk
> > > checking each found VMA for vma->vm_flags & VM_WRITE) for checking whether
> > > inode should be redirtied or not.
> > >
> >
> > I'm curious that isn't it enough to check i_mmap_writable by
> > mapping_writably_mapped() ? Am I missing something?
>
> What is i_mmap_writeable? I've grepped the tree and didn't find anything
> like that...
>
Maybe a spelling mistake? The reason I check this is that I'm afraid of the
permissions of a vma being changed by mprotect(2).
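
A small userspace sketch of the mprotect(2) case, for illustration only. The function name write_after_mprotect() and the temp file path are made up for this example. It maps a file MAP_SHARED with PROT_READ only, so the mapping is not writable at mmap(2) time, then makes it writable with mprotect(2) and stores through it. As far as I understand, this is allowed because the file was opened for writing, which is why checking only VM_WRITE when the mapping is created would not be enough.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map a temporary file read-only and shared, upgrade the mapping to
 * writable with mprotect(2), store one byte and read it back from the file.
 * Returns 0 if the store reached the file, -1 on any failure.
 */
int write_after_mprotect(void)
{
	char path[] = "/tmp/ovl-mprotect-XXXXXX";	/* hypothetical location */
	long pagesz = sysconf(_SC_PAGESIZE);
	int fd = mkstemp(path);				/* mkstemp opens the file read-write */
	char *p;
	char c = 0;
	int ret = -1;

	if (fd < 0 || pagesz <= 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, pagesz) < 0)
		goto out;

	/* Not writable at mmap(2) time. */
	p = mmap(NULL, pagesz, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		goto out;

	/* The permission change happens later, without another mmap(2) call. */
	if (mprotect(p, pagesz, PROT_READ | PROT_WRITE) == 0) {
		p[0] = 'x';
		if (msync(p, pagesz, MS_SYNC) == 0 &&
		    pread(fd, &c, 1, 0) == 1 && c == 'x')
			ret = 0;
	}
	munmap(p, pagesz);
out:
	close(fd);
	return ret;
}
```

The helper can be called from any test program; it returns 0 when the byte written after mprotect(2) is visible in the file.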
Thanks,
Chengguang