Re: overlayfs: issue with a replaced lower squashfs with export-table

From: Vivek Goyal <vgoyal@redhat.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	overlayfs <linux-unionfs@vger.kernel.org>,
	Fabian <godi.beat@gmx.net>
Subject: Re: overlayfs: issue with a replaced lower squashfs with export-table
Date: Tue, 7 Jul 2020 17:53:09 -0400	[thread overview]
Message-ID: <20200707215309.GB48341@redhat.com> (raw)
In-Reply-To: <CAOQ4uxhMq_8xwCU2t+WveTGgc9MAWE2RD66q5UjQ1r09EoLzHA@mail.gmail.com>

On Tue, Jul 07, 2020 at 08:41:20PM +0300, Amir Goldstein wrote:
> > > Miklos,
> > >
> > > At first glance I did not understand how changing lower file handles causes
> > > failure to ovl_verify_inode().
> > > To complete the picture, here is the explanation.
> > >
> > > Upper file A was copied up from lower file with inode 10 in old squashfs
> > > and the "origin" file handle composed of the inode number 10 is recorded
> > > in upper file A.
> > >
> > > With newly formatted lower, lower A has inode 11 and lower B has inode 10.
> > > Upper file B is copied from lower file B with inode 10 in new squashfs and
> > > the "origin" file handle composed of the inode number 10 is recorded
> > > in upper file B.
> > > Now we have two upper files with the same "origin" that are not hardlinks.
> > >
> > > On lookup of both overlay files A and B, ovl_check_origin() decodes lower
> > > file B (inode 10) as the lower inode.
> > > This lower inode is used to get the overlay inode number (10) and as
> > > the key to hash overlay inode in inode cache.
> > >
> > > Suppose A is looked up first and it's inode is hashed.
> > > Then B is looked up and in ovl_get_inode() it finds the inode hashed
> > > by the same lower inode in inode cache, but fails ovl_verify_inode()
> > > because:
> > > d_inode(upperdentry) /* B */ != ovl_inode_upper(inode) /* A */
> > >
> > > This can also happen when copying overlay layers to a new
> > > fs tree and carrying over the old "origin" xattr.
> > > In practice, the UUID part of the stored "origin" xattr is meant to
> > > protect against decoding lower fh when migrating to another
> > > filesystem, but layers could be migrated inside the same filesystem.
> > > Since squashfs does not have a UUID, re-creating sqhashfs is similar
> > > to migrating layers inside the same filesystem.
> >
> > Hi Amir,
> >
> > So we can't use "origin" if lower layers have been copied. If they
> > have been copied to a different filesystem with uuid we seem to
> > have a mechanism to detect it but otherwise not and we can run
> > into these kind of issues.
> >
> 
> Correct.
> 
> > My question is, why do we allow copying or updating lower layers
> > with same upper when we know this will break stored origin in
> > upper.
> 
> I don't know if we "allow" this.

So there seem to two cases. One is copying the lower layers to same
filesystem or a different filesystem. And another case is recreating
the lower layers and use previous upper with old upper. IIUC, we
are currently facing problem with second scenario.

"copying the lower layers" is probably fine as you said because old
tree still keeps that inode number busy and newly created inode
should not acquire that number. 

But in this case looks like we recreated lower and that led to 
some file B acquiring an inode number which was used by A. So
this is a different use case. IIUC, simply copying the layer
will not lead to this situation.

Now question is do we support recreating the lower layer with existing
upper. And I see following text in "Sharing and copying layer"

"Mounting an overlay using an upper layer path, where the upper layer path
was previously used by another mounted overlay in combination with a
different lower layer path, is allowed, unless the "inodes index" feature
or "metadata only copy up" feature is enabled."

This seems to suggest that recreating lower layer is allowed as long
as you are not using index or metacopy feature. 

If that's the case, probably "origin" should have been an opt-in
feature or automatically be enabled by some other opt-in feature.

> We never considered the case expect
> for nfs export and index, see overlayfs.rst:
> "When the overlay NFS export feature is enabled, overlay filesystems
> behavior on offline changes of the underlying lower layer is different
> than the behavior when NFS export is disabled. ..."
> 
> > Can't I do same thing with ext4/xfs, where I recreate
> > lower layers when inode numbers get exchanged  and same problem
> > will happen (despite uuid being same).
> >
> 
> Same problem.

> 
> > IOW, how can we support copying layers (with same upper) while origin
> > is in use. Rest of the features are built on top of the assumption
> > that origin is valid. And in case of copying layers, we don't
> > seem to have a sure way to find if origin is valid or not.
> >
> 
> With index/nfs_export enabled we at least do:
> /* Verify lower root is upper root origin */
> and if verification fails we disable the feature.

So discrepancy is still possible if somebody modifies lower layers
without changing lower root.

- Copy up file A.
- Unmount overlay
- unlink A
- create B (assume B gets same inode number as A).
- mount overlay; B gets copied up.

And now we have both upper A and B having same origin despite no
hardlink. Am I understanding it right?

Is there a good use case for allowing modifying lower layers with
same upper. Given we are adding more complex features to overlayfs,
will it make sense to not allow modifying lower layer going forward.
We might not be able to detect it but atleast it will be unsupported
configuration. And then we can only focus to provide a work around
for existing use cases.

[..]
> > >
> > > We were aware of the "layer migration" case when designing the
> > > index/nfs_export feature, which is one of the reasons they are
> > > opt-in features.
> > >
> > > But we enabled the functionality of following non-dir origin
> > > unconditionally because we *thought* it is harmless, as the comment
> > > in ovl_lookup() says:
> > >
> > >          /*
> > >          * Lookup copy up origin by decoding origin file handle.
> > >          * We may get a disconnected dentry, which is fine,
> > >          * because we only need to hold the origin inode in
> > >          * cache and use its inode number.  We may even get a
> > >          * connected dentry, that is not under any of the lower
> > >          * layers root.  That is also fine for using it's inode
> > >          * number - it's the same as if we held a reference
> > >          * to a dentry in lower layer that was moved under us.
> > >          */
> > >
> > > The patch I posted disabled decoding of non-dir origin for the special
> > > case of lower null uuid.
> > >
> > > I think we can also warn and auto-disable decoding non-dir origin in
> > > case index is disabled and we detect this upper inode conflict in
> > > ovl_verify_inode().
> > >
> > > The problem is if A is not metacopy and looked up first, and B is
> > > metacopy and looked up second, then conflict will be deleted after
> > > the wrong inode has been hashed.

We don't allow modifying lower layers if metacopy is enabled.

"For "metadata only copy up" feature there is no verification mechanism at
mount time. So if same upper is mounted with different set of lower, mount
probably will succeed but expect the unexpected later on. So don't do it.
"

So if somebody is recreating lower or modifying lower with metacopy
on, its an unsupported configuration.

Thanks
Vivek