linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Amir Goldstein <amir73il@gmail.com>
To: Chengguang Xu <cgxu519@mykernel.net>
Cc: Jan Kara <jack@suse.cz>, miklos <miklos@szeredi.hu>,
	linux-unionfs <linux-unionfs@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [RFC PATCH 1/5] fs: introduce notifier list for vfs inode
Date: Thu, 15 Oct 2020 19:02:42 +0300	[thread overview]
Message-ID: <CAOQ4uxhujP_pzguq+FJ87Mx4GBNzEWQs-izuXK1qhWu3EmLpJA@mail.gmail.com> (raw)
In-Reply-To: <1752c652963.113ee3fbc44343.6282793280578516240@mykernel.net>

>  > >  > When an inode is writably mapped via ovarlayfs, you can flag the
>  > >  > overlay inode with "maybe-writably-mapped" and then remove
>  > >  > it from the maybe dirty list when the underlying inode is not dirty
>  > >  > AND its i_writecount is 0 (checked on write_inode() and release()).
>  > >  >
>  > >  > Actually, there is no reason to treat writably mapped inodes and
>  > >  > other dirty inodes differently - insert to suspect list on open for
>  > >  > write, remove from suspect list on last release() or write_inode()
>  > >  > when inode is no longer dirty and writable.
>
> I have to say inserting to suspect list cannot prevent dropping,
> thinking of the problem of previous approach that we write dirty upper
> inode with current->flags & PF_MEMALLOC while evicting clean overlay inode.
>

Sorry, I don't understand what that means.

>
>  > >  >
>  > >  > Did I miss anything?
>  > >  >
>  > >
>  > > If we dirty overlay inode that means we have launched writeback mechanism,
>  > > so in this case, re-dirty overlay inode in time becomes important.
>  > >
>  >
>  > My idea was to use the first call to ovl_sync_fs() with 'wait' false
>  > to iterate the
>  > maybe-dirty list and re-dirty overlay inodes whose upper is dirty.
>  >
>
> I'm curious how we prevent dropping of clean overlay inode with dirty upper?
> Hold another reference during iput_final operation? in the drop_inode() or something
> else?

No, just return 0 from ovl_drop_inode() and iput_final() will not evict().

>
>
>  > Then in the second call to __sync_filesystem, sync_inodes_sb() will take
>  > care of calling ovl_write_inode() for all the re-dirty inodes.
>  >
>  > In current code we sync ALL dirty upper fs inodes and we do not overlay
>  > inodes with no reference in cache.
>  >
>  > The best code would sync only upper fs inodes dirtied by this overlay
>  > and will be able to evict overlay inodes whose upper inodes are clean.
>  >
>  > The compromise code would sync only upper fs inodes dirtied by this overlay,
>  > and would not evict overlay inodes as long as upper inodes are "open for write".
>  > I think its a fine compromise considering the alternatives.
>  >
>  > Is this workable?
>  >
>
> In your approach, the key point is how to prevent dropping overlay inode that has
> dirty upper and no reference but I don't understand well how to achieve it from
> your descriptions.
>
>

Very well, I will try to explain with code:

int ovl_inode_is_open_for_write(struct inode *inode)
{
       struct inode *upper_inode = ovl_inode_upper(inode);

       return upper_inode && inode_is_open_for_write(upper_inode);
}

void ovl_maybe_mark_inode_dirty(struct inode *inode)
{
       struct inode *upper_inode = ovl_inode_upper(inode);

       if (upper_inode && upper_inode->i_state & I_DIRTY_ALL)
                mark_inode_dirty(inode);
}

int ovl_open(struct inode *inode, struct file *file)
{
...
       if (ovl_inode_is_open_for_write(file_inode(file)))
               ovl_add_inode_to_suspect_list(inode);

        file->private_data = realfile;

        return 0;
}

int ovl_release(struct inode *inode, struct file *file)
{
       struct inode *inode = file_inode(file);

       if (ovl_inode_is_open_for_write(inode)) {
               ovl_maybe_mark_inode_dirty(inode);
               ovl_remove_inode_from_suspect_list(inode);
       }

        fput(file->private_data);

        return 0;
}

int ovl_drop_inode(struct inode *inode)
{
       struct inode *upper_inode = ovl_inode_upper(inode);

       if (!upper_inode)
               return 1;
       if (upper_inode->i_state & I_DIRTY_ALL)
               return 0;

       return !inode_is_open_for_write(upper_inode);
}

static int ovl_sync_fs(struct super_block *sb, int wait)
{
        struct ovl_fs *ofs = sb->s_fs_info;
        struct super_block *upper_sb;
        int ret;

        if (!ovl_upper_mnt(ofs))
                return 0;

        /*
         * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
         * All the super blocks will be iterated, including upper_sb.
         *
         * If this is a syncfs(2) call, then we do need to call
         * sync_filesystem() on upper_sb, but enough if we do it when being
         * called with wait == 1.
         */
        if (!wait) {
                /* mark inodes on the suspect list dirty if thier
upper inode is dirty */
                ovl_mark_suspect_list_inodes_dirty();
                return 0;
        }
...


The races are avoided because inode is added/removed from suspect
list while overlay inode has a reference (from file) and because upper inode
cannot be dirtied by overlayfs when overlay inode is not on the suspect list.

Unless I am missing something.

Thanks,
Amir.

  reply	other threads:[~2020-10-15 16:02 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-10 14:23 [RFC PATCH 0/5] implement containerized syncfs for overlayfs Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 1/5] fs: introduce notifier list for vfs inode Chengguang Xu
2020-10-14 16:15   ` Jan Kara
2020-10-15  3:03     ` Chengguang Xu
2020-10-15  6:11       ` Amir Goldstein
2020-10-15 11:29         ` Chengguang Xu
2020-10-15 12:32           ` Amir Goldstein
2020-10-15 13:13             ` Chengguang Xu
2020-10-15 16:02               ` Amir Goldstein [this message]
2020-10-15 16:06                 ` Amir Goldstein
2020-10-16  1:56                 ` Chengguang Xu
2020-10-16  4:39                   ` Amir Goldstein
2020-10-16  7:43                     ` Chengguang Xu
2020-10-15  3:25   ` Al Viro
2020-10-15  3:42     ` Chengguang Xu
2020-10-15  4:57       ` Al Viro
2020-10-15 10:56         ` Chengguang Xu
2020-10-16  7:09         ` Chengguang Xu
2020-10-16  9:22           ` Jan Kara
2020-10-10 14:23 ` [RFC PATCH 2/5] fs: export symbol of writeback_single_inode() Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 3/5] ovl: setup overlayfs' private bdi Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 4/5] ovl: monitor marking dirty activity of underlying upper inode Chengguang Xu
2020-10-10 14:23 ` [RFC PATCH 5/5] ovl: impement containerized syncfs for overlayfs Chengguang Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOQ4uxhujP_pzguq+FJ87Mx4GBNzEWQs-izuXK1qhWu3EmLpJA@mail.gmail.com \
    --to=amir73il@gmail.com \
    --cc=cgxu519@mykernel.net \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-unionfs@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).