Linux-Fsdevel Archive on lore.kernel.org
From: Chengguang Xu <cgxu519@mykernel.net>
To: "Jan Kara" <jack@suse.cz>
Cc: "Amir Goldstein" <amir73il@gmail.com>,
	"miklos" <miklos@szeredi.hu>,
	"linux-unionfs" <linux-unionfs@vger.kernel.org>,
	"linux-fsdevel" <linux-fsdevel@vger.kernel.org>,
	"charliecgxu" <charliecgxu@tencent.com>
Subject: Re: [RFC PATCH v2 5/8] ovl: mark overlayfs' inode dirty on shared writable mmap
Date: Fri, 06 Nov 2020 17:47:55 +0800
Message-ID: <1759cf492c8.11cac446f12251.3388484787199140990@mykernel.net> (raw)
In-Reply-To: <20201106085023.GA25479@quack2.suse.cz>

 ---- On Friday, 2020-11-06 16:50:23 Jan Kara <jack@suse.cz> wrote ----
 > On Fri 06-11-20 10:41:44, Chengguang Xu wrote:
 > >  ---- On Thursday, 2020-11-05 23:54:34 Jan Kara <jack@suse.cz> wrote ----
 > >  > On Thu 05-11-20 16:21:27, Amir Goldstein wrote:
 > >  > > On Thu, Nov 5, 2020 at 4:03 PM Jan Kara <jack@suse.cz> wrote:
 > >  > > >
 > >  > > > On Wed 04-11-20 19:54:03, Chengguang Xu wrote:
 > >  > > > >  ---- On Tuesday, 2020-11-03 01:30:52 Jan Kara <jack@suse.cz> wrote ----
 > >  > > > >  > On Sun 25-10-20 11:41:14, Chengguang Xu wrote:
 > >  > > > >  > > Overlayfs cannot be notified when an mmapped area gets dirty,
 > >  > > > >  > > so we need to proactively mark the inode dirty in the ->mmap operation.
 > >  > > > >  > >
 > >  > > > >  > > Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
 > >  > > > >  > > ---
 > >  > > > >  > >  fs/overlayfs/file.c | 4 ++++
 > >  > > > >  > >  1 file changed, 4 insertions(+)
 > >  > > > >  > >
 > >  > > > >  > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
 > >  > > > >  > > index efccb7c1f9bc..cd6fcdfd81a9 100644
 > >  > > > >  > > --- a/fs/overlayfs/file.c
 > >  > > > >  > > +++ b/fs/overlayfs/file.c
 > >  > > > >  > > @@ -486,6 +486,10 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 > >  > > > >  > >          /* Drop reference count from new vm_file value */
 > >  > > > >  > >          fput(realfile);
 > >  > > > >  > >      } else {
 > >  > > > >  > > +        if (vma->vm_flags & (VM_SHARED|VM_MAYSHARE) &&
 > >  > > > >  > > +            vma->vm_flags & (VM_WRITE|VM_MAYWRITE))
 > >  > > > >  > > +            ovl_mark_inode_dirty(file_inode(file));
 > >  > > > >  > > +
 > >  > > > >  >
 > >  > > > >  > But does this work reliably? I mean once writeback runs, your inode (as
 > >  > > > >  > well as upper inode) is cleaned. Then a page fault comes so file has dirty
 > >  > > > >  > pages again and would need flushing but overlayfs inode stays clean? Am I
 > >  > > > >  > missing something?
 > >  > > > >  >
 > >  > > > >
 > >  > > > > Yeah, this is the key point of this approach. To fix the issue I
 > >  > > > > explicitly set the I_DIRTY_SYNC flag in ovl_mark_inode_dirty(), so
 > >  > > > > during writeback we will call into ->write_inode() because of this
 > >  > > > > flag (I_DIRTY_SYNC), and at that point we get a chance to check the
 > >  > > > > mapping and re-dirty the overlay inode. The logic in ovl_write_inode()
 > >  > > > > is like below.
 > >  > > > >
 > >  > > > >     if (mapping_writably_mapped(upper->i_mapping) ||
 > >  > > > >         mapping_tagged(upper->i_mapping, PAGECACHE_TAG_WRITEBACK))
 > >  > > > >             iflag |= I_DIRTY_PAGES;
 > >  > > >
 > >  > > > OK, but suppose the upper mapping is clean at this moment (the upper inode has
 > >  > > > been fully written out for whatever reason, but it is still mapped), so your
 > >  > > > overlayfs inode becomes clean as well. Then I don't see a mechanism which
 > >  > > > would make your overlayfs inode dirty again when a write to the mmap happens:
 > >  > > > set_page_dirty() will end up marking the upper inode with the I_DIRTY_PAGES flag.
 > >  > > >
 > >  > > > Note that ovl_mmap() gets called only at mmap(2) syscall time, but then
 > >  > > > pages get faulted in, dirtied and cleaned entirely at the discretion of the mm
 > >  > > > / writeback subsystem.
 > >  > > >
 > >  > > 
 > >  > > Perhaps I should add some background.
 > >  > > 
 > >  > > What I suggested was to maintain a "suspect list" in addition to
 > >  > > the list of dirty ovl inodes.
 > >  > > 
 > >  > > An ovl inode is added to the suspect list on (writable) mmap and removed
 > >  > > from the suspect list on release(), flush() or sync_fs() if the real inode is no
 > >  > > longer writably mapped.
 > >  > > 
 > >  > > There was another variant where the ovl inode is added to the suspect list on open
 > >  > > for write and removed from the suspect list on release(), flush() or sync_fs()
 > >  > > if the real inode is not inode_is_open_for_write().
 > >  > > 
 > >  > > In both cases the list will contain inodes whose real inode is not dirty, but
 > >  > > in both cases the list shouldn't be terribly large to traverse on sync_fs().
 > >  > > 
 > >  > > Chengguang tried to implement the idea without an actual list by
 > >  > > re-dirtying the "suspect" inodes on every write_inode(), but I personally have
 > >  > > no idea if his idea works.
 > >  > > 
 > >  > > I think we can resort to using an actual suspect list if you say that it
 > >  > > cannot work like this?
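The suspect-list lifecycle Amir describes (first variant) can be sketched as a small userspace model. This is not overlayfs code: struct model_inode, struct suspect_list, suspect_add() and suspect_sync_fs() are hypothetical names, and the two booleans only stand in for the real inode's mapping and dirty state. It shows adding an inode on writable mmap, flushing every suspect on sync_fs(), and dropping a suspect once its real inode is no longer writably mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for an ovl inode; not kernel code. */
struct model_inode {
	bool real_writably_mapped; /* real inode still has a writable shared mmap */
	bool real_dirty;           /* real inode has dirty pages */
	bool on_suspect_list;
	struct model_inode *next;  /* link in the singly linked suspect list */
};

struct suspect_list {
	struct model_inode *head;
};

/* Writable shared mmap: remember the inode as a suspect (idempotent). */
static void suspect_add(struct suspect_list *list, struct model_inode *inode)
{
	assert(list != NULL && inode != NULL);
	if (inode->on_suspect_list)
		return;
	inode->on_suspect_list = true;
	inode->next = list->head;
	list->head = inode;
}

/*
 * sync_fs(): "write back" every suspect whose real inode is dirty, then
 * unlink suspects that are no longer writably mapped.
 * Returns how many suspects actually had dirty data.
 */
static int suspect_sync_fs(struct suspect_list *list)
{
	struct model_inode **pp = &list->head;
	int flushed = 0;

	while (*pp != NULL) {
		struct model_inode *inode = *pp;

		if (inode->real_dirty) {
			inode->real_dirty = false; /* models syncing the real inode */
			flushed++;
		}
		if (!inode->real_writably_mapped) {
			*pp = inode->next;         /* unlink; pp stays put */
			inode->next = NULL;
			inode->on_suspect_list = false;
		} else {
			pp = &inode->next;         /* keep it as a suspect */
		}
	}
	return flushed;
}
```

Because a still-mapped inode stays on the list after a sync, a write fault that dirties the real inode after one sync_fs() is still found by the next one, which is the case Jan's objection was about.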
 > >  > 
 > >  > Yeah, the suspect list (i.e., additional list of inodes to check on sync)
 > >  > you describe should work fine. 
 > > 
 > > I think this solution still has the problem we ran into in the thread below [1].
 > > The main problem is the state combination of a clean overlayfs inode and a dirty upper inode.
 > 
 > But I think the scheme Amir proposed and I detailed in my previous email
 > should prevent that state. Because while the inode is mapped, it will be
 > kept in the dirty list. So which scenario do you think would lead to clean
 > overlayfs inode and dirty upper inode?

If keeping it in the dirty list means keeping the overlayfs inode dirty, then
I don't think we need an extra list for that: the VFS itself already has a writeback
list, and the solution would be exactly the same as mine (re-dirtying). Right?


 > 
 > > [1] https://www.spinics.net/lists/linux-unionfs/msg07448.html
 > > 
 > >  > Also Chengguang's "keep suspect inode dirty" idea
 > >  > could work fine, but we'd have to use something like
 > >  > inode_is_open_for_write() or inode_is_writeably_mapped() (which would need
 > >  > to be implemented, but it should be an easy vma_interval_tree_foreach() walk
 > >  > checking each found VMA for vma->vm_flags & VM_WRITE) to check whether the
 > >  > inode should be redirtied or not.
 > >  > 
 > > 
 > > I'm curious: isn't it enough to check i_mmap_writable via
 > > mapping_writably_mapped()? Am I missing something?
 > 
 > What is i_mmap_writeable? I've grepped the tree and didn't find anything
 > like that...
 > 

Maybe a spelling mistake? The field is i_mmap_writable (without the 'e') in struct address_space.
The reason I check it is that I'm worried about VMA permissions being changed by mprotect(2).
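The difference between the two checks, and why mprotect(2) matters, can be illustrated with a toy model. It assumes, as in mm/mmap.c of roughly this period, that i_mmap_writable is bumped for every VM_SHARED mapping regardless of VM_WRITE, and that mprotect(2) may add write permission to a VMA only when VM_MAYWRITE is set. The M_VM_* values, struct model_mapping and the model_* functions are hypothetical; only the names i_mmap_writable and mapping_writably_mapped() come from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model only (not the kernel's VM_* bits). */
#define M_VM_WRITE    0x1ul
#define M_VM_SHARED   0x2ul
#define M_VM_MAYWRITE 0x4ul
#define MAX_VMAS      8

struct model_vma {
	unsigned long flags;
};

struct model_mapping {
	struct model_vma *vmas[MAX_VMAS]; /* stands in for the i_mmap interval tree */
	size_t nr_vmas;
	int i_mmap_writable;              /* assumed: counts every VM_SHARED mapping */
};

static void model_mmap(struct model_mapping *m, struct model_vma *vma)
{
	assert(m->nr_vmas < MAX_VMAS);
	m->vmas[m->nr_vmas++] = vma;
	if (vma->flags & M_VM_SHARED)
		m->i_mmap_writable++;
}

/* Counter-based check, in the spirit of mapping_writably_mapped(). */
static bool model_writably_mapped(const struct model_mapping *m)
{
	return m->i_mmap_writable > 0;
}

/* Per-VMA walk as Jan suggested: only VMAs with VM_WRITE right now count. */
static bool model_any_vma_writable(const struct model_mapping *m)
{
	for (size_t i = 0; i < m->nr_vmas; i++) {
		if (m->vmas[i]->flags & M_VM_WRITE)
			return true;
	}
	return false;
}

/* mprotect(PROT_READ | PROT_WRITE) on a VMA: only allowed with VM_MAYWRITE. */
static bool model_mprotect_add_write(struct model_vma *vma)
{
	if (!(vma->flags & M_VM_MAYWRITE))
		return false;
	vma->flags |= M_VM_WRITE;
	return true;
}
```

A VM_WRITE walk that runs before the mprotect() call reports no writable mapping, so an inode cleaned at that moment could miss later writes through the same VMA, whereas the counter-based check already treats the shared mapping as potentially writable.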


Thanks,
Chengguang


Thread overview: 23+ messages
2020-10-25  3:41 [RFC PATCH v2 0/8] implement containerized syncfs for overlayfs Chengguang Xu
2020-10-25  3:41 ` [RFC PATCH v2 1/8] ovl: setup overlayfs' private bdi Chengguang Xu
2020-10-25  3:41 ` [RFC PATCH v2 2/8] ovl: implement ->writepages operation Chengguang Xu
2020-11-02 17:17   ` Jan Kara
2020-11-04 12:18     ` Chengguang Xu
2020-11-05 13:55       ` Jan Kara
2020-11-06  5:57         ` Chengguang Xu
2020-10-25  3:41 ` [RFC PATCH v2 3/8] ovl: implement overlayfs' ->evict_inode operation Chengguang Xu
2020-10-25  3:41 ` [RFC PATCH v2 4/8] ovl: mark overlayfs' inode dirty on modification Chengguang Xu
2020-10-25  3:41 ` [RFC PATCH v2 5/8] ovl: mark overlayfs' inode dirty on shared writable mmap Chengguang Xu
2020-11-02 17:30   ` Jan Kara
2020-11-04 11:54     ` Chengguang Xu
2020-11-05 14:03       ` Jan Kara
2020-11-05 14:21         ` Amir Goldstein
2020-11-05 15:54           ` Jan Kara
2020-11-06  2:41             ` Chengguang Xu
2020-11-06  8:50               ` Jan Kara
2020-11-06  9:47                 ` Chengguang Xu [this message]
2020-10-25  3:41 ` [RFC PATCH v2 6/8] ovl: implement overlayfs' ->write_inode operation Chengguang Xu
2020-10-25  3:41 ` [RFC PATCH v2 7/8] ovl: cache dirty overlayfs' inode Chengguang Xu
2020-10-25  3:41 ` [RFC PATCH v2 8/8] ovl: implement containerized syncfs for overlayfs Chengguang Xu
2020-10-30 15:46 ` [RFC PATCH v2 0/8] " Miklos Szeredi
2020-10-31 12:22   ` Chengguang Xu
