linux-unionfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Amir Goldstein <amir73il@gmail.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	overlayfs <linux-unionfs@vger.kernel.org>,
	Fabian <godi.beat@gmx.net>
Subject: Re: overlayfs: issue with a replaced lower squashfs with export-table
Date: Wed, 8 Jul 2020 09:55:32 +0300	[thread overview]
Message-ID: <CAOQ4uxhd+kYzaDmndCV5rgiswfHnyLjZokmUa+BVk9t31C=HWg@mail.gmail.com> (raw)
In-Reply-To: <20200707215309.GB48341@redhat.com>

On Wed, Jul 8, 2020 at 12:53 AM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Tue, Jul 07, 2020 at 08:41:20PM +0300, Amir Goldstein wrote:
> > > > Miklos,
> > > >
> > > > At first glance I did not understand how changing lower file handles causes
> > > > failure to ovl_verify_inode().
> > > > To complete the picture, here is the explanation.
> > > >
> > > > Upper file A was copied up from lower file with inode 10 in old squashfs
> > > > and the "origin" file handle composed of the inode number 10 is recorded
> > > > in upper file A.
> > > >
> > > > With newly formatted lower, lower A has inode 11 and lower B has inode 10.
> > > > Upper file B is copied from lower file B with inode 10 in new squashfs and
> > > > the "origin" file handle composed of the inode number 10 is recorded
> > > > in upper file B.
> > > > Now we have two upper files with the same "origin" that are not hardlinks.
> > > >
> > > > On lookup of both overlay files A and B, ovl_check_origin() decodes lower
> > > > file B (inode 10) as the lower inode.
> > > > This lower inode is used to get the overlay inode number (10) and as
> > > > the key to hash overlay inode in inode cache.
> > > >
> > > > Suppose A is looked up first and it's inode is hashed.
> > > > Then B is looked up and in ovl_get_inode() it finds the inode hashed
> > > > by the same lower inode in inode cache, but fails ovl_verify_inode()
> > > > because:
> > > > d_inode(upperdentry) /* B */ != ovl_inode_upper(inode) /* A */
> > > >
> > > > This can also happen when copying overlay layers to a new
> > > > fs tree and carrying over the old "origin" xattr.
> > > > In practice, the UUID part of the stored "origin" xattr is meant to
> > > > protect against decoding lower fh when migrating to another
> > > > filesystem, but layers could be migrated inside the same filesystem.
> > > > Since squashfs does not have a UUID, re-creating sqhashfs is similar
> > > > to migrating layers inside the same filesystem.
> > >
> > > Hi Amir,
> > >
> > > So we can't use "origin" if lower layers have been copied. If they
> > > have been copied to a different filesystem with uuid we seem to
> > > have a mechanism to detect it but otherwise not and we can run
> > > into these kind of issues.
> > >
> >
> > Correct.
> >
> > > My question is, why do we allow copying or updating lower layers
> > > with same upper when we know this will break stored origin in
> > > upper.
> >
> > I don't know if we "allow" this.
>
> So there seem to two cases. One is copying the lower layers to same
> filesystem or a different filesystem. And another case is recreating
> the lower layers and use previous upper with old upper. IIUC, we
> are currently facing problem with second scenario.
>
> "copying the lower layers" is probably fine as you said because old
> tree still keeps that inode number busy and newly created inode
> should not acquire that number.
>
> But in this case looks like we recreated lower and that led to
> some file B acquiring an inode number which was used by A. So
> this is a different use case. IIUC, simply copying the layer
> will not lead to this situation.
>
> Now question is do we support recreating the lower layer with existing
> upper. And I see following text in "Sharing and copying layer"
>
> "Mounting an overlay using an upper layer path, where the upper layer path
> was previously used by another mounted overlay in combination with a
> different lower layer path, is allowed, unless the "inodes index" feature
> or "metadata only copy up" feature is enabled."
>
> This seems to suggest that recreating lower layer is allowed as long
> as you are not using index or metacopy feature.
>
> If that's the case, probably "origin" should have been an opt-in
> feature or automatically be enabled by some other opt-in feature.
>
>

Yes. "should have been". That is my point.
We thought it was a victimless crime to enable "origin" without opt-in
and I think we were wrong.

> > We never considered the case expect
> > for nfs export and index, see overlayfs.rst:
> > "When the overlay NFS export feature is enabled, overlay filesystems
> > behavior on offline changes of the underlying lower layer is different
> > than the behavior when NFS export is disabled. ..."
> >
> > > Can't I do same thing with ext4/xfs, where I recreate
> > > lower layers when inode numbers get exchanged  and same problem
> > > will happen (despite uuid being same).
> > >
> >
> > Same problem.
>
> >
> > > IOW, how can we support copying layers (with same upper) while origin
> > > is in use. Rest of the features are built on top of the assumption
> > > that origin is valid. And in case of copying layers, we don't
> > > seem to have a sure way to find if origin is valid or not.
> > >
> >
> > With index/nfs_export enabled we at least do:
> > /* Verify lower root is upper root origin */
> > and if verification fails we disable the feature.
>
> So discrepancy is still possible if somebody modifies lower layers
> without changing lower root.
>
> - Copy up file A.
> - Unmount overlay
> - unlink A
> - create B (assume B gets same inode number as A).
> - mount overlay; B gets copied up.
>
> And now we have both upper A and B having same origin despite no
> hardlink. Am I understanding it right?
>

Yes.

> Is there a good use case for allowing modifying lower layers with
> same upper. Given we are adding more complex features to overlayfs,
> will it make sense to not allow modifying lower layer going forward.
> We might not be able to detect it but atleast it will be unsupported
> configuration. And then we can only focus to provide a work around
> for existing use cases.
>

Agreed.

> [..]
> > > >
> > > > We were aware of the "layer migration" case when designing the
> > > > index/nfs_export feature, which is one of the reasons they are
> > > > opt-in features.
> > > >
> > > > But we enabled the functionality of following non-dir origin
> > > > unconditionally because we *thought* it is harmless, as the comment
> > > > in ovl_lookup() says:
> > > >
> > > >          /*
> > > >          * Lookup copy up origin by decoding origin file handle.
> > > >          * We may get a disconnected dentry, which is fine,
> > > >          * because we only need to hold the origin inode in
> > > >          * cache and use its inode number.  We may even get a
> > > >          * connected dentry, that is not under any of the lower
> > > >          * layers root.  That is also fine for using it's inode
> > > >          * number - it's the same as if we held a reference
> > > >          * to a dentry in lower layer that was moved under us.
> > > >          */
> > > >
> > > > The patch I posted disabled decoding of non-dir origin for the special
> > > > case of lower null uuid.
> > > >
> > > > I think we can also warn and auto-disable decoding non-dir origin in
> > > > case index is disabled and we detect this upper inode conflict in
> > > > ovl_verify_inode().
> > > >
> > > > The problem is if A is not metacopy and looked up first, and B is
> > > > metacopy and looked up second, then conflict will be deleted after
> > > > the wrong inode has been hashed.
>
> We don't allow modifying lower layers if metacopy is enabled.
>
> "For "metadata only copy up" feature there is no verification mechanism at
> mount time. So if same upper is mounted with different set of lower, mount
> probably will succeed but expect the unexpected later on. So don't do it.
> "
>
> So if somebody is recreating lower or modifying lower with metacopy
> on, its an unsupported configuration.
>

Very good. So this case is settled.

Now, suggestions for work arounds:

1. Don't follow with lower null uuid (patch posted) - no caveats
2. Opt-out of following origin with explicit option e.g. "index=nofollow"
3. Don't follow origin unless one of the following opt-in features:
    metacopy,index,xino

If we go for option #3, the easiest recommendation for distros
would be to set:
CONFIG_OVERLAY_FS_XINO_AUTO=y

Apart from "compatibility with applications that expect 32bit inodes"
I am not aware of any caveats to enabling XINO_AUTO.

The purpose of xino feature is to make overlay st_ino/st_dev more
posix-like (currently only for non samefs), so it makes some sense
to tie the "origin" xattr feature that preserves the pre copy up
persistent st_ino to xino.

The xino documentation does not mention the samefs exemption:
"...If this feature is disabled or the underlying filesystem doesn't have
enough free bits in the inode number, then overlayfs will not be able to
guarantee that the values of st_ino and st_dev returned by stat(2) and the
value of d_ino returned by readdir(3) will act like on a normal filesystem."

and the 'xino' mount option is currently not displayed with samefs.

If we implement option #3, then with samefs user could enable xino
if default is off and disable xino if default is auto and display xino in
mount options.

Thanks,
Amir.

  reply	other threads:[~2020-07-08  6:55 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-06 13:27 overlayfs: issue with a replaced lower squashfs with export-table Fabian
2020-07-06 14:29 ` Amir Goldstein
2020-07-06 15:14   ` Fabian
2020-07-06 15:33     ` Amir Goldstein
2020-07-06 16:10       ` Fabian
2020-07-06 17:14         ` Amir Goldstein
2020-07-16 13:29           ` Fabian Godehardt
2020-07-07  5:51     ` Amir Goldstein
2020-07-07 15:51       ` Vivek Goyal
2020-07-07 17:41         ` Amir Goldstein
2020-07-07 21:53           ` Vivek Goyal
2020-07-08  6:55             ` Amir Goldstein [this message]
2020-07-08  8:00               ` Miklos Szeredi
2020-07-08  8:31                 ` Amir Goldstein
2020-07-08  8:37                   ` Miklos Szeredi
2020-07-08  8:50                     ` Amir Goldstein
2020-07-08 14:23                       ` Vivek Goyal
2020-07-08 14:26                         ` Vivek Goyal
2020-07-08 17:26                           ` Amir Goldstein
2020-07-08 17:36                             ` Vivek Goyal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAOQ4uxhd+kYzaDmndCV5rgiswfHnyLjZokmUa+BVk9t31C=HWg@mail.gmail.com' \
    --to=amir73il@gmail.com \
    --cc=godi.beat@gmx.net \
    --cc=linux-unionfs@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).