From: Chengguang Xu <cgxu519@mykernel.net>
To: "Jan Kara" <jack@suse.cz>
Cc: "Amir Goldstein" <amir73il@gmail.com>,
	"miklos" <miklos@szeredi.hu>,
	"linux-unionfs" <linux-unionfs@vger.kernel.org>,
	"linux-fsdevel" <linux-fsdevel@vger.kernel.org>,
	"charliecgxu" <charliecgxu@tencent.com>
Subject: Re: [RFC PATCH v2 5/8] ovl: mark overlayfs' inode dirty on shared writable mmap
Date: Fri, 06 Nov 2020 10:41:44 +0800
Message-ID: <1759b6e6328.fdde3abc11178.4917086206975298767@mykernel.net>
In-Reply-To: <20201105155434.GI32718@quack2.suse.cz>

 ---- On Thursday, 2020-11-05 23:54:34 Jan Kara <jack@suse.cz> wrote ----
 > On Thu 05-11-20 16:21:27, Amir Goldstein wrote:
 > > On Thu, Nov 5, 2020 at 4:03 PM Jan Kara <jack@suse.cz> wrote:
 > > >
 > > > On Wed 04-11-20 19:54:03, Chengguang Xu wrote:
 > > > >  ---- On Tuesday, 2020-11-03 01:30:52 Jan Kara <jack@suse.cz> wrote ----
 > > > >  > On Sun 25-10-20 11:41:14, Chengguang Xu wrote:
 > > > >  > > Overlayfs cannot be notified when mmapped area gets dirty,
 > > > >  > > so we need to proactively mark inode dirty in ->mmap operation.
 > > > >  > >
 > > > >  > > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
 > > > >  > > ---
 > > > >  > >  fs/overlayfs/file.c | 4 ++++
 > > > >  > >  1 file changed, 4 insertions(+)
 > > > >  > >
 > > > >  > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
 > > > >  > > index efccb7c1f9bc..cd6fcdfd81a9 100644
 > > > >  > > --- a/fs/overlayfs/file.c
 > > > >  > > +++ b/fs/overlayfs/file.c
 > > > >  > > @@ -486,6 +486,10 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 > > > >  > >          /* Drop reference count from new vm_file value */
 > > > >  > >          fput(realfile);
 > > > >  > >      } else {
 > > > >  > > +        if (vma->vm_flags & (VM_SHARED|VM_MAYSHARE) &&
 > > > >  > > +            vma->vm_flags & (VM_WRITE|VM_MAYWRITE))
 > > > >  > > +            ovl_mark_inode_dirty(file_inode(file));
 > > > >  > > +
 > > > >  >
 > > > >  > But does this work reliably? I mean once writeback runs, your inode (as
 > > > >  > well as upper inode) is cleaned. Then a page fault comes so file has dirty
 > > > >  > pages again and would need flushing but overlayfs inode stays clean? Am I
 > > > >  > missing something?
 > > > >  >
 > > > >
 > > > > Yeah, this is the key point of this approach. In order to fix the issue I
 > > > > explicitly set the I_DIRTY_SYNC flag in ovl_mark_inode_dirty(). What I
 > > > > mean is that during writeback this flag (I_DIRTY_SYNC) makes us call into
 > > > > ->write_inode(), and there we get a chance to check the mapping and
 > > > > re-dirty the overlay inode. The logic in ovl_write_inode() is like below.
 > > > >
 > > > >     if (mapping_writably_mapped(upper->i_mapping) ||
 > > > >          mapping_tagged(upper->i_mapping, PAGECACHE_TAG_WRITEBACK))
 > > > >                  iflag |= I_DIRTY_PAGES;
 > > >
 > > > OK, but suppose the upper mapping is clean at this moment (upper inode has
 > > > been fully written out for whatever reason, but it is still mapped) so your
 > > > overlayfs inode becomes clean as well. Then I don't see a mechanism which
 > > > would make your overlayfs inode dirty again when a write to mmap happens,
 > > > set_page_dirty() will end up marking upper inode with I_DIRTY_PAGES flag.
 > > >
 > > > Note that ovl_mmap() gets called only at mmap(2) syscall time but then
 > > > pages get faulted in, dirtied, cleaned fully at discretion of the mm
 > > > / writeback subsystem.
 > > >
 > > 
 > > Perhaps I will add some background.
 > > 
 > > What I suggested was to maintain a "suspect list" in addition to
 > > the dirty ovl inodes.
 > > 
 > > ovl inode is added to the suspect list on mmap (writable) and removed
 > > from the suspect list on release() flush() or on sync_fs() if real inode is no
 > > longer writably mapped.
 > > 
 > > There was another variant where ovl inode is added to suspect list on open
 > > for write and removed from suspect list on release() flush() or sync_fs()
 > > if real inode is not inode_is_open_for_write().
 > > 
 > > In both cases the list will have inodes whose real is not dirty, but in
 > > both cases the list shouldn't be terribly large to traverse on sync_fs().
 > > 
 > > Chengguang tried to implement the idea without an actual list by
 > > re-dirtying the "suspect" inodes on every write_inode(), but I personally have
 > > no idea if his idea works.
 > > 
 > > I think we can resort to using an actual suspect list if you say that it
 > > cannot work like this?
 > 
 > Yeah, the suspect list (i.e., additional list of inodes to check on sync)
 > you describe should work fine. 

I think this solution still has the problem we ran into in the thread below [1].
The main problem is the state combination of a clean overlayfs inode and a dirty upper inode.
 
[1] https://www.spinics.net/lists/linux-unionfs/msg07448.html
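
To make that state combination concrete, here is a toy user-space model of the
sequence discussed in this thread (mmap, writeback, a later write fault, then
sync). It is only a sketch: every toy_* name is hypothetical, plain booleans
stand in for the I_DIRTY_* state, and a sync that walks only dirty overlay
inodes is an assumption of the model, not a description of the actual patches.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model; none of these names are kernel APIs. */
struct toy_inode {
    bool dirty;                 /* stands in for the I_DIRTY_* state */
};

struct toy_ovl {
    struct toy_inode ovl;       /* overlayfs inode */
    struct toy_inode upper;     /* real (upper) inode */
    int writable_maps;          /* stands in for i_mmap_writable on upper */
};

/* mmap(2) time: the only point where overlayfs sees the mapping. */
static void toy_mmap_shared_writable(struct toy_ovl *o)
{
    o->writable_maps++;
    o->ovl.dirty = true;
}

/* A store through the mapping dirties only the upper inode. */
static void toy_write_fault(struct toy_ovl *o)
{
    assert(o->writable_maps > 0);
    o->upper.dirty = true;
}

/* Writeback cleans the upper inode; the overlay inode is optionally
 * re-dirtied while a writable mapping still exists. */
static void toy_writeback(struct toy_ovl *o, bool redirty_if_mapped)
{
    o->upper.dirty = false;
    o->ovl.dirty = redirty_if_mapped && o->writable_maps > 0;
}

/* A sync that only looks at dirty overlay inodes. Returns true when no
 * dirty upper data is left behind afterwards. */
static bool toy_syncfs_flushes_upper(struct toy_ovl *o)
{
    if (!o->ovl.dirty)
        return !o->upper.dirty; /* skipped: fine only if upper is clean */
    o->upper.dirty = false;     /* overlay inode was visited, upper flushed */
    return true;
}

/* mmap -> writeback -> write fault -> sync. */
static bool toy_scenario(bool redirty_if_mapped)
{
    struct toy_ovl o = { {false}, {false}, 0 };

    toy_mmap_shared_writable(&o);
    toy_writeback(&o, redirty_if_mapped);
    toy_write_fault(&o);
    return toy_syncfs_flushes_upper(&o);
}
```

In this model toy_scenario(false) ends with a clean overlay inode and a dirty
upper inode that sync never visits, while toy_scenario(true) keeps the overlay
inode dirty for as long as the writable mapping exists.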

 > Also the "keep suspect inode dirty" idea
 > of Chengguang could work fine but we'd have to use something like
 > inode_is_open_for_write() or inode_is_writeably_mapped() (which would need
 > to be implemented but it should be easy vma_interval_tree_foreach() walk
 > checking each found VMA for vma->vm_flags & VM_WRITE) for checking whether
 > inode should be redirtied or not.
 > 

I'm curious: isn't it enough to check i_mmap_writable via mapping_writably_mapped()?
Am I missing something?
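
For comparison, a toy user-space sketch of the two checks being discussed. It
is not kernel code: the toy_* names and TOY_VM_* flags are hypothetical, and it
assumes the counter is bumped once for every VM_SHARED mapping that gets
linked, which is my understanding of how i_mmap_writable is maintained. The
walk follows the description above literally and tests only the write flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for VM_WRITE / VM_SHARED. */
#define TOY_VM_WRITE  0x1u
#define TOY_VM_SHARED 0x2u
#define TOY_MAX_VMAS  8

struct toy_mapping {
    int i_mmap_writable;                   /* assumed: one per shared VMA */
    unsigned int vma_flags[TOY_MAX_VMAS];  /* stands in for the i_mmap tree */
    size_t nr_vmas;
};

/* Link a VMA into the mapping and update the counter. */
static void toy_link_vma(struct toy_mapping *m, unsigned int flags)
{
    assert(m->nr_vmas < TOY_MAX_VMAS);
    m->vma_flags[m->nr_vmas++] = flags;
    if (flags & TOY_VM_SHARED)
        m->i_mmap_writable++;
}

/* Counter-based check, modelled on mapping_writably_mapped(). */
static bool toy_mapping_writably_mapped(const struct toy_mapping *m)
{
    return m->i_mmap_writable > 0;
}

/* Walk-based check: look at every VMA and test the write flag. */
static bool toy_any_vma_has_vm_write(const struct toy_mapping *m)
{
    for (size_t i = 0; i < m->nr_vmas; i++)
        if (m->vma_flags[i] & TOY_VM_WRITE)
            return true;
    return false;
}
```

Under that assumption the counter is the cheaper check, and the two only
disagree for a shared mapping that currently has no write permission: the
counter still reports it, the walk does not.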


Thanks,
Chengguang
