linux-unionfs.vger.kernel.org archive mirror
From: Miklos Szeredi <miklos@szeredi.hu>
To: Amir Goldstein <amir73il@gmail.com>
Cc: harshad shirwadkar <harshadshirwadkar@gmail.com>,
	Ext4 <linux-ext4@vger.kernel.org>, Theodore Tso <tytso@mit.edu>,
	overlayfs <linux-unionfs@vger.kernel.org>
Subject: Re: [PATCH] ext4: add rename whiteout support for fast commit
Date: Fri, 19 Mar 2021 09:36:27 +0100
Message-ID: <CAJfpeguiFU5qv-L-jeXBhc+PqeMOUoVnPO3EN4xOB0nCH9Z2cA@mail.gmail.com>
In-Reply-To: <CAOQ4uxg+d2WoPEL2mC5H3d0uxh-_HGw3Bhyrun=z4O2nCg-yNQ@mail.gmail.com>

On Fri, Mar 19, 2021 at 6:52 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> [adding overlayfs list]
>
> On Fri, Mar 19, 2021 at 3:32 AM harshad shirwadkar
> <harshadshirwadkar@gmail.com> wrote:
> >
> > Thanks for the review Amir.
> >
> > Sure changing the subject makes sense.
> >
> > Also, after further discussion on the Ext4 conference call, we
> > thought that with this patch, overlayfs customers would not benefit
> > much from fast commits if they call rename often. So, in order to really make
> > rename whiteout a fast commit compatible operation, we probably would
> > need to add support in fast commit to replay a char device creation
> > event (since whiteout object is a char device). That would imply, we
> > would need to do careful versioning and would need to burn an on-disk
> > feature flag.
> >
> > An alternative to this would be to have a static whiteout object with
> > irrelevant nlink count and to have every rename point to that object
> > instead. Depending on how we decide to implement that, at most the
> > first rename operation would be fast commit incompatible, since that's
> > when this object would get created. All the further operations would
> > be fast commit compatible. The big benefit of this approach is that
> > this way we don't have to add support for char device creation in fast
> > commit replay code and thus we don't have to worry about versioning.
> >
>
> I'm glad to hear that, Harshad.
>
> Please note that creating a static whiteout object on-disk is one possible
> implementation option. Not creating any object on-disk may be even better.

I don't really get it.  What's the advantage of not having an object?

Readdir returning DT_WHT internally might be nice, but I'd be careful
with exporting that to userspace, since it's likely to cause more
problems than it solves.  And on the stat(2) interface, adding S_IFWHT
or, even worse, returning ENOENT is really out of the question due to
backward incompatibility with almost every application.
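
To make the userspace exposure concern concrete, here is a minimal sketch of how a directory-walking tool typically consumes d_type from readdir(3). The helper names dtype_name() and list_dir_types() are hypothetical, and the sketch assumes a libc whose <dirent.h> defines DT_WHT (glibc does under _DEFAULT_SOURCE). A tool written like this but without the DT_WHT case would lump a whiteout entry in with "other", which is the kind of surprise being discussed.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes the DT_* constants in glibc's <dirent.h> */
#endif
#include <dirent.h>
#include <stdio.h>

/* Map a d_type value to a short label. DT_WHT is the value a
 * BSD-style whiteout directory entry would report. */
static const char *dtype_name(unsigned char d_type)
{
    switch (d_type) {
    case DT_REG: return "file";
    case DT_DIR: return "dir";
    case DT_CHR: return "chardev";
#ifdef DT_WHT
    case DT_WHT: return "whiteout";
#endif
    default:     return "other";
    }
}

/* Print one line per directory entry with its d_type label.
 * Returns the number of entries seen, or -1 if the directory
 * cannot be opened. */
static int list_dir_types(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        printf("%-10s %s\n", dtype_name(de->d_type), de->d_name);
        count++;
    }
    closedir(dir);
    return count;
}
```

Calling list_dir_types() on an overlayfs upper directory today would label whiteouts "chardev", since they are reported as DT_CHR.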

> One other challenge is how to handle users trying to make operations
> on the upper layer directly (migrating images etc).
> As long as the tools still observe the whiteout as a chardev (with stat(2))
> then export and import should work fine (creating a real chardev on import).

Right.

Can't mkfs.ext4 just create the static object?  That sounds to me like
the simplest approach.

Thanks,
Miklos


>
> I had suggested the static object approach because it should be pretty
> simple to implement and add e2fsprogs support for.
>
> However, if we look at the requirements for RENAME_WHITEOUT,
> the resulting directory entry does not actually need to point to any
> object on-disk at all.
>
> An alternative implementation would be to create a directory entry
> with {d_ino = 0, d_type = DT_WHT}. Lookup of this entry will return
> a reference to a singleton read-only ext4 whiteout inode object, which
> does not reside on disk, so fast commit is irrelevant in that sense.
> i_nlink should be handled carefully, but that should be easier than
> doing that for a static on-disk object.
>
> I am not sure how userland tools handle DT_WHT, but I see that
> other filesystems can emit this value in theory and man rename(2)
> claims that BSD uses DT_WHT, so the common tools should be
> able to handle it.
>
> As far as overlayfs is concerned:
> 1. ovl_lookup() will find an IS_WHITEOUT() inode as usual
> 2. ovl_dir_read_merged() will need this small patch (below) and will
>     not access the inode object at all
> 3. At first, ovl_whiteout() -> vfs_whiteout() can still create a usual chardev
> 4. Later, we can initiate the overlayfs instance singleton whiteout
>     reference in ovl_check_rename_whiteout() and ovl_whiteout() will
>     never get -EMLINK when linking this whiteout object
>
> If there are tools that try to change inode permissions recursively on the
> upper layer (?), there may be a problem with those read-only whiteouts,
> although the permission of a whiteout is a moot concept.
>
> Thanks,
> Amir.
>
> --- a/fs/overlayfs/readdir.c
> +++ b/fs/overlayfs/readdir.c
> @@ -161,7 +161,7 @@ static struct ovl_cache_entry *ovl_cache_entry_new(struct ovl_readdir_data *rdd,
>         if (ovl_calc_d_ino(rdd, p))
>                 p->ino = 0;
>         p->is_upper = rdd->is_upper;
> -       p->is_whiteout = false;
> +       p->is_whiteout = (d_type == DT_WHT);
>
>         if (d_type == DT_CHR) {
>                 p->next_maybe_whiteout = rdd->first_maybe_whiteout;

