linux-unionfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chengguang Xu <cgxu519@mykernel.net>
To: "Amir Goldstein" <amir73il@gmail.com>
Cc: "Jan Kara" <jack@suse.cz>, "miklos" <miklos@szeredi.hu>,
	"linux-unionfs" <linux-unionfs@vger.kernel.org>,
	"linux-fsdevel" <linux-fsdevel@vger.kernel.org>,
	"Al Viro" <viro@zeniv.linux.org.uk>
Subject: Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
Date: Fri, 16 Oct 2020 09:56:13 +0800	[thread overview]
Message-ID: <1752f1f2c1a.c646418645575.6542807445910962686@mykernel.net> (raw)
In-Reply-To: <CAOQ4uxhujP_pzguq+FJ87Mx4GBNzEWQs-izuXK1qhWu3EmLpJA@mail.gmail.com>

 ---- 在 星期五, 2020-10-16 00:02:42 Amir Goldstein <amir73il@gmail.com> 撰写 ----
 > >  > >  > When an inode is writably mapped via ovarlayfs, you can flag the
 > >  > >  > overlay inode with "maybe-writably-mapped" and then remove
 > >  > >  > it from the maybe dirty list when the underlying inode is not dirty
 > >  > >  > AND its i_writecount is 0 (checked on write_inode() and release()).
 > >  > >  >
 > >  > >  > Actually, there is no reason to treat writably mapped inodes and
 > >  > >  > other dirty inodes differently - insert to suspect list on open for
 > >  > >  > write, remove from suspect list on last release() or write_inode()
 > >  > >  > when inode is no longer dirty and writable.
 > >
 > > I have to say inserting to suspect list cannot prevent dropping,
 > > thinking of the problem of previous approach that we write dirty upper
 > > inode with current->flags & PF_MEMALLOC while evicting clean overlay inode.
 > >
 > 
 > Sorry, I don't understand what that means.

This is the main problem of my previous patch set V10, evicting clean inode
expects no write behavior but in the case of dirty upper inode we have to
write out dirty data in this timing otherwise we will lose the connection with upper inode.


 > 
 > >
 > >  > >  >
 > >  > >  > Did I miss anything?
 > >  > >  >
 > >  > >
 > >  > > If we dirty overlay inode that means we have launched writeback mechanism,
 > >  > > so in this case, re-dirty overlay inode in time becomes important.
 > >  > >
 > >  >
 > >  > My idea was to use the first call to ovl_sync_fs() with 'wait' false
 > >  > to iterate the
 > >  > maybe-dirty list and re-dirty overlay inodes whose upper is dirty.
 > >  >
 > >
 > > I'm curious how we prevent dropping of clean overlay inode with dirty upper?
 > > Hold another reference during iput_final operation? in the drop_inode() or something
 > > else?
 > 
 > No, just return 0 from ovl_drop_inode() and iput_final() will not evict().

It's not good,  it  only temporarily  skips eviction, the inode in lru list
will be evicted in some cases like drop cache or memory reclaim. etc.

A solution for this is getting another reference in ->drop_inode so that
the inode can escape from adding to lru list but this looks awkward and tricky.

 > 
 > >
 > >
 > >  > Then in the second call to __sync_filesystem, sync_inodes_sb() will take
 > >  > care of calling ovl_write_inode() for all the re-dirty inodes.
 > >  >
 > >  > In current code we sync ALL dirty upper fs inodes and we do not overlay
 > >  > inodes with no reference in cache.
 > >  >
 > >  > The best code would sync only upper fs inodes dirtied by this overlay
 > >  > and will be able to evict overlay inodes whose upper inodes are clean.
 > >  >
 > >  > The compromise code would sync only upper fs inodes dirtied by this overlay,
 > >  > and would not evict overlay inodes as long as upper inodes are "open for write".
 > >  > I think its a fine compromise considering the alternatives.
 > >  >
 > >  > Is this workable?
 > >  >
 > >
 > > In your approach, the key point is how to prevent dropping overlay inode that has
 > > dirty upper and no reference but I don't understand well how to achieve it from
 > > your descriptions.
 > >
 > >
 > 
 > Very well, I will try to explain with code:
 > 
 > int ovl_inode_is_open_for_write(struct inode *inode)
 > {
 >        struct inode *upper_inode = ovl_inode_upper(inode);
 > 
 >        return upper_inode && inode_is_open_for_write(upper_inode);
 > }
 > 
 > void ovl_maybe_mark_inode_dirty(struct inode *inode)
 > {
 >        struct inode *upper_inode = ovl_inode_upper(inode);
 > 
 >        if (upper_inode && upper_inode->i_state & I_DIRTY_ALL)
 >                 mark_inode_dirty(inode);
 > }
 > 
 > int ovl_open(struct inode *inode, struct file *file)
 > {
 > ...
 >        if (ovl_inode_is_open_for_write(file_inode(file)))
 >                ovl_add_inode_to_suspect_list(inode);
 > 
 >         file->private_data = realfile;
 > 
 >         return 0;
 > }
 > 
 > int ovl_release(struct inode *inode, struct file *file)
 > {
 >        struct inode *inode = file_inode(file);
 > 
 >        if (ovl_inode_is_open_for_write(inode)) {
 >                ovl_maybe_mark_inode_dirty(inode);
 >                ovl_remove_inode_from_suspect_list(inode);

I think in some cases removing from suspect_list will cause losing
the connection with upper inode that has writable mmap.


 >        }
 > 
 >         fput(file->private_data);
 > 
 >         return 0;
 > }
 > 
 > int ovl_drop_inode(struct inode *inode)
 > {
 >        struct inode *upper_inode = ovl_inode_upper(inode);
 > 
 >        if (!upper_inode)
 >                return 1;
 >        if (upper_inode->i_state & I_DIRTY_ALL)
 >                return 0;
 > 
 >        return !inode_is_open_for_write(upper_inode);

Is this condition just for writable mmap?


 > }
 > 
 > static int ovl_sync_fs(struct super_block *sb, int wait)
 > {
 >         struct ovl_fs *ofs = sb->s_fs_info;
 >         struct super_block *upper_sb;
 >         int ret;
 > 
 >         if (!ovl_upper_mnt(ofs))
 >                 return 0;
 > 
 >         /*
 >          * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
 >          * All the super blocks will be iterated, including upper_sb.
 >          *
 >          * If this is a syncfs(2) call, then we do need to call
 >          * sync_filesystem() on upper_sb, but enough if we do it when being
 >          * called with wait == 1.
 >          */
 >         if (!wait) {
 >                 /* mark inodes on the suspect list dirty if thier
 > upper inode is dirty */
 >                 ovl_mark_suspect_list_inodes_dirty();
 >                 return 0;
 >         }
 > ...
 > 

Why 2 rounds?  it seems the simplest way is only syncing dirty upper inode
on wait==1 just like my previous patch.


 > 
 > The races are avoided because inode is added/removed from suspect
 > list while overlay inode has a reference (from file) and because upper inode
 > cannot be dirtied by overlayfs when overlay inode is not on the suspect list.
 > 
 > Unless I am missing something.
 > 

Thanks,
Chengguang

  parent reply	other threads:[~2020-10-16  1:57 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-10 14:23 [RFC PATCH 0/5] implement containerized syncfs for overlayfs Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 1/5] fs: introduce notifier list for vfs inode Chengguang Xu
2020-10-14 16:15   ` Jan Kara
2020-10-15  3:03     ` Chengguang Xu
2020-10-15  6:11       ` Amir Goldstein
2020-10-15 11:29         ` Chengguang Xu
2020-10-15 12:32           ` Amir Goldstein
2020-10-15 13:13             ` Chengguang Xu
2020-10-15 16:02               ` Amir Goldstein
2020-10-15 16:06                 ` Amir Goldstein
2020-10-16  1:56                 ` Chengguang Xu [this message]
2020-10-16  4:39                   ` Amir Goldstein
2020-10-16  7:43                     ` Chengguang Xu
2020-10-15  3:25   ` Al Viro
2020-10-15  3:42     ` Chengguang Xu
2020-10-15  4:57       ` Al Viro
2020-10-15 10:56         ` Chengguang Xu
2020-10-16  7:09         ` Chengguang Xu
2020-10-16  9:22           ` Jan Kara
2020-10-10 14:23 ` [RFC PATCH 2/5] fs: export symbol of writeback_single_inode() Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 3/5] ovl: setup overlayfs' private bdi Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 4/5] ovl: monitor marking dirty activity of underlying upper inode Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs Chengguang Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1752f1f2c1a.c646418645575.6542807445910962686@mykernel.net \
    --to=cgxu519@mykernel.net \
    --cc=amir73il@gmail.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-unionfs@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).