linux-unionfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* overlayfs: issue with a replaced lower squashfs with export-table
@ 2020-07-06 13:27 Fabian
  2020-07-06 14:29 ` Amir Goldstein
  0 siblings, 1 reply; 20+ messages in thread
From: Fabian @ 2020-07-06 13:27 UTC (permalink / raw)
  To: linux-unionfs

Hope this is the right list for asking overlayfs <-> squashfs related issues.
Else please let me know where to ask.

We are seeing problems using an read-writeable overlayfs (upper) on a readonly
squashfs (lower). The squashfs gets an update from time to time while we keep
the upper overlayfs.

On replaced files we then see -ESTALE ("overlayfs: failed to get inode (-116)")
messages if the lower squashfs was created _without_ using the "-no-exports"
switch.
The -ESTALE comes from ovl_get_inode() which in turn calls ovl_verify_inode()
and returns on the line where the upperdentry inode gets compared
( if (upperdentry && ovl_inode_upper(inode) != d_inode(upperdentry)) ).

A little debugging shows, that the upper files dentry name does not fit to the
dentry name of the new lower dentry as it seems to look for the inode on the
squashfs "export"-lookup-table which has changed as we replaced the lower fs.

Building the lower squashfs with the "-no-exports"-mksquashfs option, so
without the export-lookup-table, seems to work, but it might be no longer
exportable using nfs (which is ok and we can keep with it).

As we didn't find any other information regarding this behaviour or anyone who
also had this problem before we just want to know if this is the right way to
use the rw overlayfs on a (replaceable) ro squashfs filesystem.

Is this a known issue? Is it really needed to disable the export feature when
using overlayfs on a squashfs if we later need to replace squashfs during an
update? Any hints we can have a look on if this should work and we might have
done wrong during squashfs or overlayfs creation?

Thanks in advance for any hints!

Fabian



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: overlayfs: issue with a replaced lower squashfs with export-table
  2020-07-06 13:27 overlayfs: issue with a replaced lower squashfs with export-table Fabian
@ 2020-07-06 14:29 ` Amir Goldstein
  2020-07-06 15:14   ` Fabian
  0 siblings, 1 reply; 20+ messages in thread
From: Amir Goldstein @ 2020-07-06 14:29 UTC (permalink / raw)
  To: Fabian; +Cc: overlayfs

[-- Attachment #1: Type: text/plain, Size: 2440 bytes --]

Hi Fabian,

On Mon, Jul 6, 2020 at 4:28 PM Fabian <godi.beat@gmx.net> wrote:
>
> Hope this is the right list for asking overlayfs <-> squashfs related issues.

Yes.

> Else please let me know where to ask.
>
> We are seeing problems using an read-writeable overlayfs (upper) on a readonly
> squashfs (lower). The squashfs gets an update from time to time while we keep
> the upper overlayfs.
>

It gets updated while the overlay is offline (not mounted) correct?

> On replaced files we then see -ESTALE ("overlayfs: failed to get inode (-116)")
> messages if the lower squashfs was created _without_ using the "-no-exports"
> switch.
> The -ESTALE comes from ovl_get_inode() which in turn calls ovl_verify_inode()
> and returns on the line where the upperdentry inode gets compared
> ( if (upperdentry && ovl_inode_upper(inode) != d_inode(upperdentry)) ).
>
> A little debugging shows, that the upper files dentry name does not fit to the
> dentry name of the new lower dentry as it seems to look for the inode on the
> squashfs "export"-lookup-table which has changed as we replaced the lower fs.
>
> Building the lower squashfs with the "-no-exports"-mksquashfs option, so
> without the export-lookup-table, seems to work, but it might be no longer
> exportable using nfs (which is ok and we can keep with it).
>
> As we didn't find any other information regarding this behaviour or anyone who
> also had this problem before we just want to know if this is the right way to
> use the rw overlayfs on a (replaceable) ro squashfs filesystem.
>
> Is this a known issue? Is it really needed to disable the export feature when
> using overlayfs on a squashfs if we later need to replace squashfs during an
> update? Any hints we can have a look on if this should work and we might have
> done wrong during squashfs or overlayfs creation?
>

This sounds like an unintentional outcome of:
9df085f3c9a2 ovl: relax requirement for non null uuid of lower fs

Which enabled nfs_export for overlay with lower squashfs.

If you do not need to export overlayfs to NFS, then you can check if the
attached patch solves your problem.

If you do need to export overlayfs to NFS or to export squashfs to NFS
for that matter, you will have a problem, because when re-creating
squashfs (I suppose) all file handles are re-assigned randomly to new
files, so they have no meaning in the context of NFS file handles exported
in the old squashfs.

Thanks,
Amir.

[-- Attachment #2: ovl-fix-regression-with-re-formatted-lower-squashfs.patch.txt --]
[-- Type: text/plain, Size: 1484 bytes --]

From 8e9f532ce419ae72ba47b49582342377780a32db Mon Sep 17 00:00:00 2001
From: Amir Goldstein <amir73il@gmail.com>
Date: Mon, 6 Jul 2020 17:21:21 +0300
Subject: [PATCH] ovl: fix regression with re-formatted lower squashfs

Relax the requirement for non null uuid only if user opts-in to
index=on or nfs_export=on to avoid regressions with overlay that
is migrated to a newly formatted lower squashfs.

Reported-by: Fabian <godi.beat@gmx.net>
Fixes: 9df085f3c9a2 ("ovl: relax requirement for non null uuid...")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/super.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 15939ab39c1c..af95f83e3a70 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1402,6 +1402,16 @@ static bool ovl_lower_uuid_ok(struct ovl_fs *ofs, const uuid_t *uuid)
 	if (!ofs->config.nfs_export && !ovl_upper_mnt(ofs))
 		return true;
 
+	/*
+	 * We allow using single lower with null uuid for index and nfs_export
+	 * for example to support those features with single lower squashfs.
+	 * To avoid regressions setups of overlay with re-formatted lower
+	 * squashfs do not allow decoding origin with lower null uuid unless
+	 * index or nfs_export are explicitly enabled.
+	 */
+	if (!ofs->config.index && uuid_is_null(uuid))
+		return false;
+
 	for (i = 0; i < ofs->numfs; i++) {
 		/*
 		 * We use uuid to associate an overlay lower file handle with a
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: overlayfs: issue with a replaced lower squashfs with export-table
  2020-07-06 14:29 ` Amir Goldstein
@ 2020-07-06 15:14   ` Fabian
  2020-07-06 15:33     ` Amir Goldstein
  2020-07-07  5:51     ` Amir Goldstein
  0 siblings, 2 replies; 20+ messages in thread
From: Fabian @ 2020-07-06 15:14 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs

Hi Amir,

thanks for your mail and the quick reply!

Am Montag, 6. Juli 2020, 16:29:51 CEST schrieb Amir Goldstein:
> > We are seeing problems using an read-writeable overlayfs (upper) on a
> > readonly squashfs (lower). The squashfs gets an update from time to time
> > while we keep the upper overlayfs.
>
> It gets updated while the overlay is offline (not mounted) correct?

Yes. We boot into a recovery system outside the rootfs and its overlayfs,
replace the lower squashfs, and then reboot into the new system.

> > On replaced files we then see -ESTALE ("overlayfs: failed to get inode
> > (-116)") messages if the lower squashfs was created _without_ using the
> > "-no-exports" switch.
> > The -ESTALE comes from ovl_get_inode() which in turn calls
> > ovl_verify_inode() and returns on the line where the upperdentry inode
> > gets compared
> > ( if (upperdentry && ovl_inode_upper(inode) != d_inode(upperdentry)) ).
> >
> > A little debugging shows, that the upper files dentry name does not fit to
> > the dentry name of the new lower dentry as it seems to look for the inode
> > on the squashfs "export"-lookup-table which has changed as we replaced
> > the lower fs.
> >
> > Building the lower squashfs with the "-no-exports"-mksquashfs option, so
> > without the export-lookup-table, seems to work, but it might be no longer
> > exportable using nfs (which is ok and we can keep with it).
> >
> > As we didn't find any other information regarding this behaviour or anyone
> > who also had this problem before we just want to know if this is the
> > right way to use the rw overlayfs on a (replaceable) ro squashfs
> > filesystem.
> >
> > Is this a known issue? Is it really needed to disable the export feature
> > when using overlayfs on a squashfs if we later need to replace squashfs
> > during an update? Any hints we can have a look on if this should work and
> > we might have done wrong during squashfs or overlayfs creation?
>
> This sounds like an unintentional outcome of:
> 9df085f3c9a2 ovl: relax requirement for non null uuid of lower fs
>
> Which enabled nfs_export for overlay with lower squashfs.
>
> If you do not need to export overlayfs to NFS, then you can check if the
> attached patch solves your problem.

With the attached patch i'm now getting to a point where the overlayfs tries
to handle the /run-directory (a symlink). There seems to be a -ESTALE at
ovl_check_origin_fh() after the for-loop where it checks if origin was not
found ( if (!origin) ). Maybe i should debug for more details here? Please let
me know.

> If you do need to export overlayfs to NFS or to export squashfs to NFS
> for that matter, you will have a problem, because when re-creating
> squashfs (I suppose) all file handles are re-assigned randomly to new
> files, so they have no meaning in the context of NFS file handles exported
> in the old squashfs.

No, i think we currently can live without NFS support. Currently it is more
important that we can safely replace the lower squashfs.

Thanks again!
Fabian



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: overlayfs: issue with a replaced lower squashfs with export-table
  2020-07-06 15:14   ` Fabian
@ 2020-07-06 15:33     ` Amir Goldstein
  2020-07-06 16:10       ` Fabian
  2020-07-07  5:51     ` Amir Goldstein
  1 sibling, 1 reply; 20+ messages in thread
From: Amir Goldstein @ 2020-07-06 15:33 UTC (permalink / raw)
  To: Fabian; +Cc: overlayfs

On Mon, Jul 6, 2020 at 6:14 PM Fabian <godi.beat@gmx.net> wrote:
>
> Hi Amir,
>
> thanks for your mail and the quick reply!
>
> Am Montag, 6. Juli 2020, 16:29:51 CEST schrieb Amir Goldstein:
> > > We are seeing problems using an read-writeable overlayfs (upper) on a
> > > readonly squashfs (lower). The squashfs gets an update from time to time
> > > while we keep the upper overlayfs.
> >
> > It gets updated while the overlay is offline (not mounted) correct?
>
> Yes. We boot into a recovery system outside the rootfs and its overlayfs,
> replace the lower squashfs, and then reboot into the new system.
>
> > > On replaced files we then see -ESTALE ("overlayfs: failed to get inode
> > > (-116)") messages if the lower squashfs was created _without_ using the
> > > "-no-exports" switch.
> > > The -ESTALE comes from ovl_get_inode() which in turn calls
> > > ovl_verify_inode() and returns on the line where the upperdentry inode
> > > gets compared
> > > ( if (upperdentry && ovl_inode_upper(inode) != d_inode(upperdentry)) ).
> > >
> > > A little debugging shows, that the upper files dentry name does not fit to
> > > the dentry name of the new lower dentry as it seems to look for the inode
> > > on the squashfs "export"-lookup-table which has changed as we replaced
> > > the lower fs.
> > >
> > > Building the lower squashfs with the "-no-exports"-mksquashfs option, so
> > > without the export-lookup-table, seems to work, but it might be no longer
> > > exportable using nfs (which is ok and we can keep with it).
> > >
> > > As we didn't find any other information regarding this behaviour or anyone
> > > who also had this problem before we just want to know if this is the
> > > right way to use the rw overlayfs on a (replaceable) ro squashfs
> > > filesystem.
> > >
> > > Is this a known issue? Is it really needed to disable the export feature
> > > when using overlayfs on a squashfs if we later need to replace squashfs
> > > during an update? Any hints we can have a look on if this should work and
> > > we might have done wrong during squashfs or overlayfs creation?
> >
> > This sounds like an unintentional outcome of:
> > 9df085f3c9a2 ovl: relax requirement for non null uuid of lower fs
> >
> > Which enabled nfs_export for overlay with lower squashfs.
> >
> > If you do not need to export overlayfs to NFS, then you can check if the
> > attached patch solves your problem.
>
> With the attached patch i'm now getting to a point where the overlayfs tries
> to handle the /run-directory (a symlink). There seems to be a -ESTALE at
> ovl_check_origin_fh() after the for-loop where it checks if origin was not
> found ( if (!origin) ). Maybe i should debug for more details here? Please let
> me know.
>

This is expected. Does it cause any problem?

The patch marks the lower squashfs as "bad_uuid", because:
        if (!ofs->config.index && uuid_is_null(uuid))
                return false;
...
        if (!ovl_lower_uuid_ok(ofs, &sb->s_uuid)) {
                bad_uuid = true;
...
        ofs->fs[ofs->numfs].bad_uuid = bad_uuid;

That's ofs->fs[1].bad_uuid = bad_uuid;


Then in ovl_lookup() => ovl_check_origin() => ovl_check_origin_fh()
will return ESALE because of:
                if (ofs->layers[i].fsid &&
                    ofs->layers[i].fs->bad_uuid)
                        continue;

And ovl_check_origin() will return 0 to ovl_lookup().

When problems are you observing?

Maybe I did not understand the problem.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: overlayfs: issue with a replaced lower squashfs with export-table
  2020-07-06 15:33     ` Amir Goldstein
@ 2020-07-06 16:10       ` Fabian
  2020-07-06 17:14         ` Amir Goldstein
  0 siblings, 1 reply; 20+ messages in thread
From: Fabian @ 2020-07-06 16:10 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs

Hi Amir,

Am Montag, 6. Juli 2020, 17:33:54 CEST schrieb Amir Goldstein:
> On Mon, Jul 6, 2020 at 6:14 PM Fabian <godi.beat@gmx.net> wrote:
> > Hi Amir,
> >
> > thanks for your mail and the quick reply!
> >
> > Am Montag, 6. Juli 2020, 16:29:51 CEST schrieb Amir Goldstein:
> > > > We are seeing problems using an read-writeable overlayfs (upper) on a
> > > > readonly squashfs (lower). The squashfs gets an update from time to
> > > > time
> > > > while we keep the upper overlayfs.
> > >
> > > It gets updated while the overlay is offline (not mounted) correct?
> >
> > Yes. We boot into a recovery system outside the rootfs and its overlayfs,
> > replace the lower squashfs, and then reboot into the new system.
> >
> > > > On replaced files we then see -ESTALE ("overlayfs: failed to get inode
> > > > (-116)") messages if the lower squashfs was created _without_ using
> > > > the
> > > > "-no-exports" switch.
> > > > The -ESTALE comes from ovl_get_inode() which in turn calls
> > > > ovl_verify_inode() and returns on the line where the upperdentry inode
> > > > gets compared
> > > > ( if (upperdentry && ovl_inode_upper(inode) != d_inode(upperdentry))
> > > > ).
> > > >
> > > > A little debugging shows, that the upper files dentry name does not
> > > > fit to
> > > > the dentry name of the new lower dentry as it seems to look for the
> > > > inode
> > > > on the squashfs "export"-lookup-table which has changed as we replaced
> > > > the lower fs.
> > > >
> > > > Building the lower squashfs with the "-no-exports"-mksquashfs option,
> > > > so
> > > > without the export-lookup-table, seems to work, but it might be no
> > > > longer
> > > > exportable using nfs (which is ok and we can keep with it).
> > > >
> > > > As we didn't find any other information regarding this behaviour or
> > > > anyone
> > > > who also had this problem before we just want to know if this is the
> > > > right way to use the rw overlayfs on a (replaceable) ro squashfs
> > > > filesystem.
> > > >
> > > > Is this a known issue? Is it really needed to disable the export
> > > > feature
> > > > when using overlayfs on a squashfs if we later need to replace
> > > > squashfs
> > > > during an update? Any hints we can have a look on if this should work
> > > > and
> > > > we might have done wrong during squashfs or overlayfs creation?
> > >
> > > This sounds like an unintentional outcome of:
> > > 9df085f3c9a2 ovl: relax requirement for non null uuid of lower fs
> > >
> > > Which enabled nfs_export for overlay with lower squashfs.
> > >
> > > If you do not need to export overlayfs to NFS, then you can check if the
> > > attached patch solves your problem.
> >
> > With the attached patch i'm now getting to a point where the overlayfs
> > tries to handle the /run-directory (a symlink). There seems to be a
> > -ESTALE at ovl_check_origin_fh() after the for-loop where it checks if
> > origin was not found ( if (!origin) ). Maybe i should debug for more
> > details here? Please let me know.
>
> This is expected. Does it cause any problem?
>
> The patch marks the lower squashfs as "bad_uuid", because:
>         if (!ofs->config.index && uuid_is_null(uuid))
>                 return false;
> ...
>         if (!ovl_lower_uuid_ok(ofs, &sb->s_uuid)) {
>                 bad_uuid = true;
> ...
>         ofs->fs[ofs->numfs].bad_uuid = bad_uuid;
>
> That's ofs->fs[1].bad_uuid = bad_uuid;
>
>
> Then in ovl_lookup() => ovl_check_origin() => ovl_check_origin_fh()
> will return ESALE because of:
>                 if (ofs->layers[i].fsid &&
>                     ofs->layers[i].fs->bad_uuid)
>                         continue;
>
> And ovl_check_origin() will return 0 to ovl_lookup().

I'm sorry. You are totaly right! RootFS now completely comes up - just missed
the console start in our latest inittab - so thought something still hangs.
The ESTALE was printed for me because i debugged the whole ESTALE positions in
the overlayfs code while studying the first problem. Time to remove my debug
code...

We will now continue with update tests. If we see something else i will let
you know.


Thanks again for your help and the quick reply!
Fabian



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: overlayfs: issue with a replaced lower squashfs with export-table
  2020-07-06 16:10       ` Fabian
@ 2020-07-06 17:14         ` Amir Goldstein
  2020-07-16 13:29           ` Fabian Godehardt
  0 siblings, 1 reply; 20+ messages in thread
From: Amir Goldstein @ 2020-07-06 17:14 UTC (permalink / raw)
  To: Fabian; +Cc: overlayfs

On Mon, Jul 6, 2020 at 7:10 PM Fabian <godi.beat@gmx.net> wrote:
>
> Hi Amir,
>
> Am Montag, 6. Juli 2020, 17:33:54 CEST schrieb Amir Goldstein:
> > On Mon, Jul 6, 2020 at 6:14 PM Fabian <godi.beat@gmx.net> wrote:
> > > Hi Amir,
> > >
> > > thanks for your mail and the quick reply!
> > >
> > > Am Montag, 6. Juli 2020, 16:29:51 CEST schrieb Amir Goldstein:
> > > > > We are seeing problems using an read-writeable overlayfs (upper) on a
> > > > > readonly squashfs (lower). The squashfs gets an update from time to
> > > > > time
> > > > > while we keep the upper overlayfs.
> > > >
> > > > It gets updated while the overlay is offline (not mounted) correct?
> > >
> > > Yes. We boot into a recovery system outside the rootfs and its overlayfs,
> > > replace the lower squashfs, and then reboot into the new system.
> > >
> > > > > On replaced files we then see -ESTALE ("overlayfs: failed to get inode
> > > > > (-116)") messages if the lower squashfs was created _without_ using
> > > > > the
> > > > > "-no-exports" switch.
> > > > > The -ESTALE comes from ovl_get_inode() which in turn calls
> > > > > ovl_verify_inode() and returns on the line where the upperdentry inode
> > > > > gets compared
> > > > > ( if (upperdentry && ovl_inode_upper(inode) != d_inode(upperdentry))
> > > > > ).
> > > > >
> > > > > A little debugging shows, that the upper files dentry name does not
> > > > > fit to
> > > > > the dentry name of the new lower dentry as it seems to look for the
> > > > > inode
> > > > > on the squashfs "export"-lookup-table which has changed as we replaced
> > > > > the lower fs.
> > > > >
> > > > > Building the lower squashfs with the "-no-exports"-mksquashfs option,
> > > > > so
> > > > > without the export-lookup-table, seems to work, but it might be no
> > > > > longer
> > > > > exportable using nfs (which is ok and we can keep with it).
> > > > >
> > > > > As we didn't find any other information regarding this behaviour or
> > > > > anyone
> > > > > who also had this problem before we just want to know if this is the
> > > > > right way to use the rw overlayfs on a (replaceable) ro squashfs
> > > > > filesystem.
> > > > >
> > > > > Is this a known issue? Is it really needed to disable the export
> > > > > feature
> > > > > when using overlayfs on a squashfs if we later need to replace
> > > > > squashfs
> > > > > during an update? Any hints we can have a look on if this should work
> > > > > and
> > > > > we might have done wrong during squashfs or overlayfs creation?
> > > >
> > > > This sounds like an unintentional outcome of:
> > > > 9df085f3c9a2 ovl: relax requirement for non null uuid of lower fs
> > > >
> > > > Which enabled nfs_export for overlay with lower squashfs.
> > > >
> > > > If you do not need to export overlayfs to NFS, then you can check if the
> > > > attached patch solves your problem.
> > >
> > > With the attached patch i'm now getting to a point where the overlayfs
> > > tries to handle the /run-directory (a symlink). There seems to be a
> > > -ESTALE at ovl_check_origin_fh() after the for-loop where it checks if
> > > origin was not found ( if (!origin) ). Maybe i should debug for more
> > > details here? Please let me know.
> >
> > This is expected. Does it cause any problem?
> >
> > The patch marks the lower squashfs as "bad_uuid", because:
> >         if (!ofs->config.index && uuid_is_null(uuid))
> >                 return false;
> > ...
> >         if (!ovl_lower_uuid_ok(ofs, &sb->s_uuid)) {
> >                 bad_uuid = true;
> > ...
> >         ofs->fs[ofs->numfs].bad_uuid = bad_uuid;
> >
> > That's ofs->fs[1].bad_uuid = bad_uuid;
> >
> >
> > Then in ovl_lookup() => ovl_check_origin() => ovl_check_origin_fh()
> > will return ESALE because of:
> >                 if (ofs->layers[i].fsid &&
> >                     ofs->layers[i].fs->bad_uuid)
> >                         continue;
> >
> > And ovl_check_origin() will return 0 to ovl_lookup().
>
> I'm sorry. You are totaly right! RootFS now completely comes up - just missed
> the console start in our latest inittab - so thought something still hangs.
> The ESTALE was printed for me because i debugged the whole ESTALE positions in
> the overlayfs code while studying the first problem. Time to remove my debug
> code...
>
> We will now continue with update tests. If we see something else i will let
> you know.
>
>

OK. please report back when done testing so I can add your tested-by

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: overlayfs: issue with a replaced lower squashfs with export-table
  2020-07-06 15:14   ` Fabian
  2020-07-06 15:33     ` Amir Goldstein
@ 2020-07-07  5:51     ` Amir Goldstein
  2020-07-07 15:51       ` Vivek Goyal
  1 sibling, 1 reply; 20+ messages in thread
From: Amir Goldstein @ 2020-07-07  5:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: overlayfs, Fabian, Vivek Goyal

On Mon, Jul 6, 2020 at 6:14 PM Fabian <godi.beat@gmx.net> wrote:
>
> Hi Amir,
>
> thanks for your mail and the quick reply!
>
> Am Montag, 6. Juli 2020, 16:29:51 CEST schrieb Amir Goldstein:
> > > We are seeing problems using an read-writeable overlayfs (upper) on a
> > > readonly squashfs (lower). The squashfs gets an update from time to time
> > > while we keep the upper overlayfs.
> >
> > It gets updated while the overlay is offline (not mounted) correct?
>
> Yes. We boot into a recovery system outside the rootfs and its overlayfs,
> replace the lower squashfs, and then reboot into the new system.
>
> > > On replaced files we then see -ESTALE ("overlayfs: failed to get inode
> > > (-116)") messages if the lower squashfs was created _without_ using the
> > > "-no-exports" switch.
> > > The -ESTALE comes from ovl_get_inode() which in turn calls
> > > ovl_verify_inode() and returns on the line where the upperdentry inode
> > > gets compared
> > > ( if (upperdentry && ovl_inode_upper(inode) != d_inode(upperdentry)) ).
> > >
> > > A little debugging shows, that the upper files dentry name does not fit to
> > > the dentry name of the new lower dentry as it seems to look for the inode
> > > on the squashfs "export"-lookup-table which has changed as we replaced
> > > the lower fs.
> > >
> > > Building the lower squashfs with the "-no-exports"-mksquashfs option, so
> > > without the export-lookup-table, seems to work, but it might be no longer
> > > exportable using nfs (which is ok and we can keep with it).

Miklos,

At first glance I did not understand how changing lower file handles causes
failure to ovl_verify_inode().
To complete the picture, here is the explanation.

Upper file A was copied up from lower file with inode 10 in old squashfs
and the "origin" file handle composed of the inode number 10 is recorded
in upper file A.

With newly formatted lower, lower A has inode 11 and lower B has inode 10.
Upper file B is copied from lower file B with inode 10 in new squashfs and
the "origin" file handle composed of the inode number 10 is recorded
in upper file B.
Now we have two upper files with the same "origin" that are not hardlinks.

On lookup of both overlay files A and B, ovl_check_origin() decodes lower
file B (inode 10) as the lower inode.
This lower inode is used to get the overlay inode number (10) and as
the key to hash overlay inode in inode cache.

Suppose A is looked up first and it's inode is hashed.
Then B is looked up and in ovl_get_inode() it finds the inode hashed
by the same lower inode in inode cache, but fails ovl_verify_inode()
because:
d_inode(upperdentry) /* B */ != ovl_inode_upper(inode) /* A */

This can also happen when copying overlay layers to a new
fs tree and carrying over the old "origin" xattr.
In practice, the UUID part of the stored "origin" xattr is meant to
protect against decoding lower fh when migrating to another
filesystem, but layers could be migrated inside the same filesystem.
Since squashfs does not have a UUID, re-creating sqhashfs is similar
to migrating layers inside the same filesystem.

We were aware of the "layer migration" case when designing the
index/nfs_export feature, which is one of the reasons they are
opt-in features.

But we enabled the functionality of following non-dir origin
unconditionally because we *thought* it is harmless, as the comment
in ovl_lookup() says:

         /*
         * Lookup copy up origin by decoding origin file handle.
         * We may get a disconnected dentry, which is fine,
         * because we only need to hold the origin inode in
         * cache and use its inode number.  We may even get a
         * connected dentry, that is not under any of the lower
         * layers root.  That is also fine for using it's inode
         * number - it's the same as if we held a reference
         * to a dentry in lower layer that was moved under us.
         */

The patch I posted disabled decoding of non-dir origin for the special
case of lower null uuid.

I think we can also warn and auto-disable decoding non-dir origin in
case index is disabled and we detect this upper inode conflict in
ovl_verify_inode().

The problem is if A is not metacopy and looked up first, and B is
metacopy and looked up second, then conflict will be deleted after
the wrong inode has been hashed.

Perhaps we should disable decoding non-dir origin with in case
metacopy=on,index=off?
Maybe also provide a user option to disable decoding non-dir origin
at the price of losing persistent inode number for copied up non-dir?
Something like 'index=nofollow'.

Thoughts?
Am I overthinking this?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: overlayfs: issue with a replaced lower squashfs with export-table
  2020-07-07  5:51     ` Amir Goldstein
@ 2020-07-07 15:51       ` Vivek Goyal
  2020-07-07 17:41         ` Amir Goldstein
  0 siblings, 1 reply; 20+ messages in thread
From: Vivek Goyal @ 2020-07-07 15:51 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, overlayfs, Fabian

On Tue, Jul 07, 2020 at 08:51:53AM +0300, Amir Goldstein wrote:
> On Mon, Jul 6, 2020 at 6:14 PM Fabian <godi.beat@gmx.net> wrote:
> >
> > Hi Amir,
> >
> > thanks for your mail and the quick reply!
> >
> > Am Montag, 6. Juli 2020, 16:29:51 CEST schrieb Amir Goldstein:
> > > > We are seeing problems using an read-writeable overlayfs (upper) on a
> > > > readonly squashfs (lower). The squashfs gets an update from time to time
> > > > while we keep the upper overlayfs.
> > >
> > > It gets updated while the overlay is offline (not mounted) correct?
> >
> > Yes. We boot into a recovery system outside the rootfs and its overlayfs,
> > replace the lower squashfs, and then reboot into the new system.
> >
> > > > On replaced files we then see -ESTALE ("overlayfs: failed to get inode
> > > > (-116)") messages if the lower squashfs was created _without_ using the
> > > > "-no-exports" switch.
> > > > The -ESTALE comes from ovl_get_inode() which in turn calls
> > > > ovl_verify_inode() and returns on the line where the upperdentry inode
> > > > gets compared
> > > > ( if (upperdentry && ovl_inode_upper(inode) != d_inode(upperdentry)) ).
> > > >
> > > > A little debugging shows, that the upper files dentry name does not fit to
> > > > the dentry name of the new lower dentry as it seems to look for the inode
> > > > on the squashfs "export"-lookup-table which has changed as we replaced
> > > > the lower fs.
> > > >
> > > > Building the lower squashfs with the "-no-exports"-mksquashfs option, so
> > > > without the export-lookup-table, seems to work, but it might be no longer
> > > > exportable using nfs (which is ok and we can keep with it).
> 
> Miklos,
> 
> At first glance I did not understand how changing lower file handles causes
> failure to ovl_verify_inode().
> To complete the picture, here is the explanation.
> 
> Upper file A was copied up from lower file with inode 10 in old squashfs
> and the "origin" file handle composed of the inode number 10 is recorded
> in upper file A.
> 
> With newly formatted lower, lower A has inode 11 and lower B has inode 10.
> Upper file B is copied from lower file B with inode 10 in new squashfs and
> the "origin" file handle composed of the inode number 10 is recorded
> in upper file B.
> Now we have two upper files with the same "origin" that are not hardlinks.
> 
> On lookup of both overlay files A and B, ovl_check_origin() decodes lower
> file B (inode 10) as the lower inode.
> This lower inode is used to get the overlay inode number (10) and as
> the key to hash overlay inode in inode cache.
> 
> Suppose A is looked up first and it's inode is hashed.
> Then B is looked up and in ovl_get_inode() it finds the inode hashed
> by the same lower inode in inode cache, but fails ovl_verify_inode()
> because:
> d_inode(upperdentry) /* B */ != ovl_inode_upper(inode) /* A */
> 
> This can also happen when copying overlay layers to a new
> fs tree and carrying over the old "origin" xattr.
> In practice, the UUID part of the stored "origin" xattr is meant to
> protect against decoding lower fh when migrating to another
> filesystem, but layers could be migrated inside the same filesystem.
> Since squashfs does not have a UUID, re-creating sqhashfs is similar
> to migrating layers inside the same filesystem.

Hi Amir,

So we can't use "origin" if lower layers have been copied. If they
have been copied to a different filesystem with uuid we seem to
have a mechanism to detect it but otherwise not and we can run
into these kind of issues.

My question is, why do we allow copying or updating lower layers
with same upper when we know this will break stored origin in
upper. Can't I do same thing with ext4/xfs, where I recreate
lower layers when inode numbers get exchanged  and same problem
will happen (despite uuid being same). 

IOW, how can we support copying layers (with same upper) while origin
is in use. Rest of the features are built on top of the assumption
that origin is valid. And in case of copying layers, we don't
seem to have a sure way to find if origin is valid or not.

Should we put a restrictions that if lower layers are updated or
moved then upper should be rotated as lower layer and a new
fresh upper should be used instead?

I might be missing something very fundamental, but before I try
to understand fine details of all the features built on top of
"origin", I fail to understand that how can we allow changing
lower layers and still expect "origin" to be valid and use it.

Thanks
Vivek
> 
> We were aware of the "layer migration" case when designing the
> index/nfs_export feature, which is one of the reasons they are
> opt-in features.
> 
> But we enabled the functionality of following non-dir origin
> unconditionally because we *thought* it is harmless, as the comment
> in ovl_lookup() says:
> 
>          /*
>          * Lookup copy up origin by decoding origin file handle.
>          * We may get a disconnected dentry, which is fine,
>          * because we only need to hold the origin inode in
>          * cache and use its inode number.  We may even get a
>          * connected dentry, that is not under any of the lower
>          * layers root.  That is also fine for using it's inode
>          * number - it's the same as if we held a reference
>          * to a dentry in lower layer that was moved under us.
>          */
> 
> The patch I posted disabled decoding of non-dir origin for the special
> case of lower null uuid.
> 
> I think we can also warn and auto-disable decoding non-dir origin in
> case index is disabled and we detect this upper inode conflict in
> ovl_verify_inode().
> 
> The problem is if A is not metacopy and looked up first, and B is
> metacopy and looked up second, then conflict will be deleted after
> the wrong inode has been hashed.
> 
> Perhaps we should disable decoding non-dir origin with in case
> metacopy=on,index=off?
> Maybe also provide a user option to disable decoding non-dir origin
> at the price of losing persistent inode number for copied up non-dir?
> Something like 'index=nofollow'.
> 
> Thoughts?
> Am I overthinking this?
> 
> Thanks,
> Amir.
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: overlayfs: issue with a replaced lower squashfs with export-table
  2020-07-07 15:51       ` Vivek Goyal
@ 2020-07-07 17:41         ` Amir Goldstein
  2020-07-07 21:53           ` Vivek Goyal
  0 siblings, 1 reply; 20+ messages in thread
From: Amir Goldstein @ 2020-07-07 17:41 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Miklos Szeredi, overlayfs, Fabian

> > Miklos,
> >
> > At first glance I did not understand how changing lower file handles causes
> > failure to ovl_verify_inode().
> > To complete the picture, here is the explanation.
> >
> > Upper file A was copied up from lower file with inode 10 in old squashfs
> > and the "origin" file handle composed of the inode number 10 is recorded
> > in upper file A.
> >
> > With newly formatted lower, lower A has inode 11 and lower B has inode 10.
> > Upper file B is copied from lower file B with inode 10 in new squashfs and
> > the "origin" file handle composed of the inode number 10 is recorded
> > in upper file B.
> > Now we have two upper files with the same "origin" that are not hardlinks.
> >
> > On lookup of both overlay files A and B, ovl_check_origin() decodes lower
> > file B (inode 10) as the lower inode.
> > This lower inode is used to get the overlay inode number (10) and as
> > the key to hash overlay inode in inode cache.
> >
> > Suppose A is looked up first and it's inode is hashed.
> > Then B is looked up and in ovl_get_inode() it finds the inode hashed
> > by the same lower inode in inode cache, but fails ovl_verify_inode()
> > because:
> > d_inode(upperdentry) /* B */ != ovl_inode_upper(inode) /* A */
> >
> > This can also happen when copying overlay layers to a new
> > fs tree and carrying over the old "origin" xattr.
> > In practice, the UUID part of the stored "origin" xattr is meant to
> > protect against decoding lower fh when migrating to another
> > filesystem, but layers could be migrated inside the same filesystem.
> > Since squashfs does not have a UUID, re-creating sqhashfs is similar
> > to migrating layers inside the same filesystem.
>
> Hi Amir,
>
> So we can't use "origin" if lower layers have been copied. If they
> have been copied to a different filesystem with uuid we seem to
> have a mechanism to detect it but otherwise not and we can run
> into these kind of issues.
>

Correct.

> My question is, why do we allow copying or updating lower layers
> with same upper when we know this will break stored origin in
> upper.

I don't know if we "allow" this. We never considered the case expect
for nfs export and index, see overlayfs.rst:
"When the overlay NFS export feature is enabled, overlay filesystems
behavior on offline changes of the underlying lower layer is different
than the behavior when NFS export is disabled. ..."

> Can't I do same thing with ext4/xfs, where I recreate
> lower layers when inode numbers get exchanged  and same problem
> will happen (despite uuid being same).
>

Same problem.

> IOW, how can we support copying layers (with same upper) while origin
> is in use. Rest of the features are built on top of the assumption
> that origin is valid. And in case of copying layers, we don't
> seem to have a sure way to find if origin is valid or not.
>

With index/nfs_export enabled we at least do:
/* Verify lower root is upper root origin */
and if verification fails we disable the feature.

> Should we put a restrictions that if lower layers are updated or
> moved then upper should be rotated as lower layer and a new
> fresh upper should be used instead?
>

Don't know and it doesn't matter.
Fabian's bug report is for an old setup that used to work.
He did not opt-in to any new feature.
In kernel v4.12 we added "origin" for persistent inode numbers on same-fs
and we didn't make it an opt-in feature.
This is when the regression of re-creating lower happened, but back then
squashfs (null uuid) did not suffer from the problem.

> I might be missing something very fundamental, but before I try
> to understand fine details of all the features built on top of
> "origin", I fail to understand that how can we allow changing
> lower layers and still expect "origin" to be valid and use it.
>

See the text below.
We *thought* that using "origin" for persistent st_ino was safe
even if lower layers were copied on the same fs, because we thought
If the file handle can be decoded then it guarantees a unique inode
number even if it is in the old lower tree and not the new copied tree.
For me the reported bug was an oversight, but maybe Miklos has
another way of looking at this.

Thanks,
Amir.

> >
> > We were aware of the "layer migration" case when designing the
> > index/nfs_export feature, which is one of the reasons they are
> > opt-in features.
> >
> > But we enabled the functionality of following non-dir origin
> > unconditionally because we *thought* it is harmless, as the comment
> > in ovl_lookup() says:
> >
> >          /*
> >          * Lookup copy up origin by decoding origin file handle.
> >          * We may get a disconnected dentry, which is fine,
> >          * because we only need to hold the origin inode in
> >          * cache and use its inode number.  We may even get a
> >          * connected dentry, that is not under any of the lower
> >          * layers root.  That is also fine for using it's inode
> >          * number - it's the same as if we held a reference
> >          * to a dentry in lower layer that was moved under us.
> >          */
> >
> > The patch I posted disabled decoding of non-dir origin for the special
> > case of lower null uuid.
> >
> > I think we can also warn and auto-disable decoding non-dir origin in
> > case index is disabled and we detect this upper inode conflict in
> > ovl_verify_inode().
> >
> > The problem is if A is not metacopy and looked up first, and B is
> > metacopy and looked up second, then conflict will be deleted after
> > the wrong inode has been hashed.
> >
> > Perhaps we should disable decoding non-dir origin with in case
> > metacopy=on,index=off?
> > Maybe also provide a user option to disable decoding non-dir origin
> > at the price of losing persistent inode number for copied up non-dir?
> > Something like 'index=nofollow'.
> >
> > Thoughts?
> > Am I overthinking this?
> >
> > Thanks,
> > Amir.
> >
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: overlayfs: issue with a replaced lower squashfs with export-table
  2020-07-07 17:41         ` Amir Goldstein
@ 2020-07-07 21:53           ` Vivek Goyal
  2020-07-08  6:55             ` Amir Goldstein
  0 siblings, 1 reply; 20+ messages in thread
From: Vivek Goyal @ 2020-07-07 21:53 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, overlayfs, Fabian

On Tue, Jul 07, 2020 at 08:41:20PM +0300, Amir Goldstein wrote:
> > > Miklos,
> > >
> > > At first glance I did not understand how changing lower file handles causes
> > > failure to ovl_verify_inode().
> > > To complete the picture, here is the explanation.
> > >
> > > Upper file A was copied up from lower file with inode 10 in old squashfs
> > > and the "origin" file handle composed of the inode number 10 is recorded
> > > in upper file A.
> > >
> > > With newly formatted lower, lower A has inode 11 and lower B has inode 10.
> > > Upper file B is copied from lower file B with inode 10 in new squashfs and
> > > the "origin" file handle composed of the inode number 10 is recorded
> > > in upper file B.
> > > Now we have two upper files with the same "origin" that are not hardlinks.
> > >
> > > On lookup of both overlay files A and B, ovl_check_origin() decodes lower
> > > file B (inode 10) as the lower inode.
> > > This lower inode is used to get the overlay inode number (10) and as
> > > the key to hash overlay inode in inode cache.
> > >
> > > Suppose A is looked up first and it's inode is hashed.
> > > Then B is looked up and in ovl_get_inode() it finds the inode hashed
> > > by the same lower inode in inode cache, but fails ovl_verify_inode()
> > > because:
> > > d_inode(upperdentry) /* B */ != ovl_inode_upper(inode) /* A */
> > >
> > > This can also happen when copying overlay layers to a new
> > > fs tree and carrying over the old "origin" xattr.
> > > In practice, the UUID part of the stored "origin" xattr is meant to
> > > protect against decoding lower fh when migrating to another
> > > filesystem, but layers could be migrated inside the same filesystem.
> > > Since squashfs does not have a UUID, re-creating sqhashfs is similar
> > > to migrating layers inside the same filesystem.
> >
> > Hi Amir,
> >
> > So we can't use "origin" if lower layers have been copied. If they
> > have been copied to a different filesystem with uuid we seem to
> > have a mechanism to detect it but otherwise not and we can run
> > into these kind of issues.
> >
> 
> Correct.
> 
> > My question is, why do we allow copying or updating lower layers
> > with same upper when we know this will break stored origin in
> > upper.
> 
> I don't know if we "allow" this.

So there seem to two cases. One is copying the lower layers to same
filesystem or a different filesystem. And another case is recreating
the lower layers and use previous upper with old upper. IIUC, we
are currently facing problem with second scenario.

"copying the lower layers" is probably fine as you said because old
tree still keeps that inode number busy and newly created inode
should not acquire that number. 

But in this case looks like we recreated lower and that led to 
some file B acquiring an inode number which was used by A. So
this is a different use case. IIUC, simply copying the layer
will not lead to this situation.

Now question is do we support recreating the lower layer with existing
upper. And I see following text in "Sharing and copying layer"

"Mounting an overlay using an upper layer path, where the upper layer path
was previously used by another mounted overlay in combination with a
different lower layer path, is allowed, unless the "inodes index" feature
or "metadata only copy up" feature is enabled."

This seems to suggest that recreating lower layer is allowed as long
as you are not using index or metacopy feature. 

If that's the case, probably "origin" should have been an opt-in
feature or automatically be enabled by some other opt-in feature.


> We never considered the case expect
> for nfs export and index, see overlayfs.rst:
> "When the overlay NFS export feature is enabled, overlay filesystems
> behavior on offline changes of the underlying lower layer is different
> than the behavior when NFS export is disabled. ..."
> 
> > Can't I do same thing with ext4/xfs, where I recreate
> > lower layers when inode numbers get exchanged  and same problem
> > will happen (despite uuid being same).
> >
> 
> Same problem.

> 
> > IOW, how can we support copying layers (with same upper) while origin
> > is in use. Rest of the features are built on top of the assumption
> > that origin is valid. And in case of copying layers, we don't
> > seem to have a sure way to find if origin is valid or not.
> >
> 
> With index/nfs_export enabled we at least do:
> /* Verify lower root is upper root origin */
> and if verification fails we disable the feature.

So discrepancy is still possible if somebody modifies lower layers
without changing lower root.

- Copy up file A.
- Unmount overlay
- unlink A
- create B (assume B gets same inode number as A).
- mount overlay; B gets copied up.

And now we have both upper A and B having same origin despite no
hardlink. Am I understanding it right?

Is there a good use case for allowing modifying lower layers with
same upper. Given we are adding more complex features to overlayfs,
will it make sense to not allow modifying lower layer going forward.
We might not be able to detect it but atleast it will be unsupported
configuration. And then we can only focus to provide a work around
for existing use cases.

[..]
> > >
> > > We were aware of the "layer migration" case when designing the
> > > index/nfs_export feature, which is one of the reasons they are
> > > opt-in features.
> > >
> > > But we enabled the functionality of following non-dir origin
> > > unconditionally because we *thought* it is harmless, as the comment
> > > in ovl_lookup() says:
> > >
> > >          /*
> > >          * Lookup copy up origin by decoding origin file handle.
> > >          * We may get a disconnected dentry, which is fine,
> > >          * because we only need to hold the origin inode in
> > >          * cache and use its inode number.  We may even get a
> > >          * connected dentry, that is not under any of the lower
> > >          * layers root.  That is also fine for using it's inode
> > >          * number - it's the same as if we held a reference
> > >          * to a dentry in lower layer that was moved under us.
> > >          */
> > >
> > > The patch I posted disabled decoding of non-dir origin for the special
> > > case of lower null uuid.
> > >
> > > I think we can also warn and auto-disable decoding non-dir origin in
> > > case index is disabled and we detect this upper inode conflict in
> > > ovl_verify_inode().
> > >
> > > The problem is if A is not metacopy and looked up first, and B is
> > > metacopy and looked up second, then conflict will be deleted after
> > > the wrong inode has been hashed.

We don't allow modifying lower layers if metacopy is enabled.

"For "metadata only copy up" feature there is no verification mechanism at
mount time. So if same upper is mounted with different set of lower, mount
probably will succeed but expect the unexpected later on. So don't do it.
"

So if somebody is recreating lower or modifying lower with metacopy
on, its an unsupported configuration.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: overlayfs: issue with a replaced lower squashfs with export-table
  2020-07-07 21:53           ` Vivek Goyal
@ 2020-07-08  6:55             ` Amir Goldstein
  2020-07-08  8:00               ` Miklos Szeredi
  0 siblings, 1 reply; 20+ messages in thread
From: Amir Goldstein @ 2020-07-08  6:55 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Miklos Szeredi, overlayfs, Fabian

On Wed, Jul 8, 2020 at 12:53 AM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Tue, Jul 07, 2020 at 08:41:20PM +0300, Amir Goldstein wrote:
> > > > Miklos,
> > > >
> > > > At first glance I did not understand how changing lower file handles causes
> > > > failure to ovl_verify_inode().
> > > > To complete the picture, here is the explanation.
> > > >
> > > > Upper file A was copied up from lower file with inode 10 in old squashfs
> > > > and the "origin" file handle composed of the inode number 10 is recorded
> > > > in upper file A.
> > > >
> > > > With newly formatted lower, lower A has inode 11 and lower B has inode 10.
> > > > Upper file B is copied from lower file B with inode 10 in new squashfs and
> > > > the "origin" file handle composed of the inode number 10 is recorded
> > > > in upper file B.
> > > > Now we have two upper files with the same "origin" that are not hardlinks.
> > > >
> > > > On lookup of both overlay files A and B, ovl_check_origin() decodes lower
> > > > file B (inode 10) as the lower inode.
> > > > This lower inode is used to get the overlay inode number (10) and as
> > > > the key to hash overlay inode in inode cache.
> > > >
> > > > Suppose A is looked up first and it's inode is hashed.
> > > > Then B is looked up and in ovl_get_inode() it finds the inode hashed
> > > > by the same lower inode in inode cache, but fails ovl_verify_inode()
> > > > because:
> > > > d_inode(upperdentry) /* B */ != ovl_inode_upper(inode) /* A */
> > > >
> > > > This can also happen when copying overlay layers to a new
> > > > fs tree and carrying over the old "origin" xattr.
> > > > In practice, the UUID part of the stored "origin" xattr is meant to
> > > > protect against decoding lower fh when migrating to another
> > > > filesystem, but layers could be migrated inside the same filesystem.
> > > > Since squashfs does not have a UUID, re-creating sqhashfs is similar
> > > > to migrating layers inside the same filesystem.
> > >
> > > Hi Amir,
> > >
> > > So we can't use "origin" if lower layers have been copied. If they
> > > have been copied to a different filesystem with uuid we seem to
> > > have a mechanism to detect it but otherwise not and we can run
> > > into these kind of issues.
> > >
> >
> > Correct.
> >
> > > My question is, why do we allow copying or updating lower layers
> > > with same upper when we know this will break stored origin in
> > > upper.
> >
> > I don't know if we "allow" this.
>
> So there seem to two cases. One is copying the lower layers to same
> filesystem or a different filesystem. And another case is recreating
> the lower layers and use previous upper with old upper. IIUC, we
> are currently facing problem with second scenario.
>
> "copying the lower layers" is probably fine as you said because old
> tree still keeps that inode number busy and newly created inode
> should not acquire that number.
>
> But in this case looks like we recreated lower and that led to
> some file B acquiring an inode number which was used by A. So
> this is a different use case. IIUC, simply copying the layer
> will not lead to this situation.
>
> Now question is do we support recreating the lower layer with existing
> upper. And I see following text in "Sharing and copying layer"
>
> "Mounting an overlay using an upper layer path, where the upper layer path
> was previously used by another mounted overlay in combination with a
> different lower layer path, is allowed, unless the "inodes index" feature
> or "metadata only copy up" feature is enabled."
>
> This seems to suggest that recreating lower layer is allowed as long
> as you are not using index or metacopy feature.
>
> If that's the case, probably "origin" should have been an opt-in
> feature or automatically be enabled by some other opt-in feature.
>
>

Yes. "should have been". That is my point.
We thought it was a victimless crime to enable "origin" without opt-in
and I think we were wrong.

> > We never considered the case expect
> > for nfs export and index, see overlayfs.rst:
> > "When the overlay NFS export feature is enabled, overlay filesystems
> > behavior on offline changes of the underlying lower layer is different
> > than the behavior when NFS export is disabled. ..."
> >
> > > Can't I do same thing with ext4/xfs, where I recreate
> > > lower layers when inode numbers get exchanged  and same problem
> > > will happen (despite uuid being same).
> > >
> >
> > Same problem.
>
> >
> > > IOW, how can we support copying layers (with same upper) while origin
> > > is in use. Rest of the features are built on top of the assumption
> > > that origin is valid. And in case of copying layers, we don't
> > > seem to have a sure way to find if origin is valid or not.
> > >
> >
> > With index/nfs_export enabled we at least do:
> > /* Verify lower root is upper root origin */
> > and if verification fails we disable the feature.
>
> So discrepancy is still possible if somebody modifies lower layers
> without changing lower root.
>
> - Copy up file A.
> - Unmount overlay
> - unlink A
> - create B (assume B gets same inode number as A).
> - mount overlay; B gets copied up.
>
> And now we have both upper A and B having same origin despite no
> hardlink. Am I understanding it right?
>

Yes.

> Is there a good use case for allowing modifying lower layers with
> same upper. Given we are adding more complex features to overlayfs,
> will it make sense to not allow modifying lower layer going forward.
> We might not be able to detect it but atleast it will be unsupported
> configuration. And then we can only focus to provide a work around
> for existing use cases.
>

Agreed.

> [..]
> > > >
> > > > We were aware of the "layer migration" case when designing the
> > > > index/nfs_export feature, which is one of the reasons they are
> > > > opt-in features.
> > > >
> > > > But we enabled the functionality of following non-dir origin
> > > > unconditionally because we *thought* it is harmless, as the comment
> > > > in ovl_lookup() says:
> > > >
> > > >          /*
> > > >          * Lookup copy up origin by decoding origin file handle.
> > > >          * We may get a disconnected dentry, which is fine,
> > > >          * because we only need to hold the origin inode in
> > > >          * cache and use its inode number.  We may even get a
> > > >          * connected dentry, that is not under any of the lower
> > > >          * layers root.  That is also fine for using it's inode
> > > >          * number - it's the same as if we held a reference
> > > >          * to a dentry in lower layer that was moved under us.
> > > >          */
> > > >
> > > > The patch I posted disabled decoding of non-dir origin for the special
> > > > case of lower null uuid.
> > > >
> > > > I think we can also warn and auto-disable decoding non-dir origin in
> > > > case index is disabled and we detect this upper inode conflict in
> > > > ovl_verify_inode().
> > > >
> > > > The problem is if A is not metacopy and looked up first, and B is
> > > > metacopy and looked up second, then conflict will be deleted after
> > > > the wrong inode has been hashed.
>
> We don't allow modifying lower layers if metacopy is enabled.
>
> "For "metadata only copy up" feature there is no verification mechanism at
> mount time. So if same upper is mounted with different set of lower, mount
> probably will succeed but expect the unexpected later on. So don't do it.
> "
>
> So if somebody is recreating lower or modifying lower with metacopy
> on, its an unsupported configuration.
>

Very good. So this case is settled.

Now, suggestions for work arounds:

1. Don't follow with lower null uuid (patch posted) - no caveats
2. Opt-out of following origin with explicit option e.g. "index=nofollow"
3. Don't follow origin unless one of the following opt-in features:
    metacopy,index,xino

If we go for option #3, the easiest recommendation for distros
would be to set:
CONFIG_OVERLAY_FS_XINO_AUTO=y

Apart from "compatibility with applications that expect 32bit inodes"
I am not aware of any caveats to enabling XINO_AUTO.

The purpose of xino feature is to make overlay st_ino/st_dev more
posix-like (currently only for non samefs), so it makes some sense
to tie the "origin" xattr feature that preserves the pre copy up
persistent st_ino to xino.

The xino documentation does not mention the samefs exemption:
"...If this feature is disabled or the underlying filesystem doesn't have
enough free bits in the inode number, then overlayfs will not be able to
guarantee that the values of st_ino and st_dev returned by stat(2) and the
value of d_ino returned by readdir(3) will act like on a normal filesystem."

and the 'xino' mount option is currently not displayed with samefs.

If we implement option #3, then with samefs user could enable xino
if default is off and disable xino if default is auto and display xino in
mount options.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: overlayfs: issue with a replaced lower squashfs with export-table
  2020-07-08  6:55             ` Amir Goldstein
@ 2020-07-08  8:00               ` Miklos Szeredi
  2020-07-08  8:31                 ` Amir Goldstein
  0 siblings, 1 reply; 20+ messages in thread
From: Miklos Szeredi @ 2020-07-08  8:00 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Vivek Goyal, overlayfs, Fabian

On Wed, Jul 8, 2020 at 8:55 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Wed, Jul 8, 2020 at 12:53 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Tue, Jul 07, 2020 at 08:41:20PM +0300, Amir Goldstein wrote:
> > > > > Miklos,
> > > > >
> > > > > At first glance I did not understand how changing lower file handles causes
> > > > > failure to ovl_verify_inode().
> > > > > To complete the picture, here is the explanation.
> > > > >
> > > > > Upper file A was copied up from lower file with inode 10 in old squashfs
> > > > > and the "origin" file handle composed of the inode number 10 is recorded
> > > > > in upper file A.
> > > > >
> > > > > With newly formatted lower, lower A has inode 11 and lower B has inode 10.
> > > > > Upper file B is copied from lower file B with inode 10 in new squashfs and
> > > > > the "origin" file handle composed of the inode number 10 is recorded
> > > > > in upper file B.
> > > > > Now we have two upper files with the same "origin" that are not hardlinks.
> > > > >
> > > > > On lookup of both overlay files A and B, ovl_check_origin() decodes lower
> > > > > file B (inode 10) as the lower inode.
> > > > > This lower inode is used to get the overlay inode number (10) and as
> > > > > the key to hash overlay inode in inode cache.
> > > > >
> > > > > Suppose A is looked up first and it's inode is hashed.
> > > > > Then B is looked up and in ovl_get_inode() it finds the inode hashed
> > > > > by the same lower inode in inode cache, but fails ovl_verify_inode()
> > > > > because:
> > > > > d_inode(upperdentry) /* B */ != ovl_inode_upper(inode) /* A */
> > > > >
> > > > > This can also happen when copying overlay layers to a new
> > > > > fs tree and carrying over the old "origin" xattr.
> > > > > In practice, the UUID part of the stored "origin" xattr is meant to
> > > > > protect against decoding lower fh when migrating to another
> > > > > filesystem, but layers could be migrated inside the same filesystem.
> > > > > Since squashfs does not have a UUID, re-creating sqhashfs is similar
> > > > > to migrating layers inside the same filesystem.

No, I think copying layers is a different issue: it's not possible to
get the same file handle for A and B since they are

 - either on different filesystems and uuid is different
 - on the same filesystem, hence fh must be unique for the lifetime of the fs

> Now, suggestions for work arounds:
>
> 1. Don't follow with lower null uuid (patch posted) - no caveats

We could add some sort of "overlay version"  to origin to be able to
trust null uuid, but only if created in *this* instance of overlayfs.
That way we retain some of the advantages without any of the
drawbacks.   This does not seem too complex theoretically, but not
sure if it's worth it.

> 2. Opt-out of following origin with explicit option e.g. "index=nofollow"
> 3. Don't follow origin unless one of the following opt-in features:
>     metacopy,index,xino

Maybe, if 1) is problematic for some reason.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: overlayfs: issue with a replaced lower squashfs with export-table
  2020-07-08  8:00               ` Miklos Szeredi
@ 2020-07-08  8:31                 ` Amir Goldstein
  2020-07-08  8:37                   ` Miklos Szeredi
  0 siblings, 1 reply; 20+ messages in thread
From: Amir Goldstein @ 2020-07-08  8:31 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, overlayfs, Fabian

On Wed, Jul 8, 2020 at 11:00 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Wed, Jul 8, 2020 at 8:55 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Wed, Jul 8, 2020 at 12:53 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > On Tue, Jul 07, 2020 at 08:41:20PM +0300, Amir Goldstein wrote:
> > > > > > Miklos,
> > > > > >
> > > > > > At first glance I did not understand how changing lower file handles causes
> > > > > > failure to ovl_verify_inode().
> > > > > > To complete the picture, here is the explanation.
> > > > > >
> > > > > > Upper file A was copied up from lower file with inode 10 in old squashfs
> > > > > > and the "origin" file handle composed of the inode number 10 is recorded
> > > > > > in upper file A.
> > > > > >
> > > > > > With newly formatted lower, lower A has inode 11 and lower B has inode 10.
> > > > > > Upper file B is copied from lower file B with inode 10 in new squashfs and
> > > > > > the "origin" file handle composed of the inode number 10 is recorded
> > > > > > in upper file B.
> > > > > > Now we have two upper files with the same "origin" that are not hardlinks.
> > > > > >
> > > > > > On lookup of both overlay files A and B, ovl_check_origin() decodes lower
> > > > > > file B (inode 10) as the lower inode.
> > > > > > This lower inode is used to get the overlay inode number (10) and as
> > > > > > the key to hash overlay inode in inode cache.
> > > > > >
> > > > > > Suppose A is looked up first and it's inode is hashed.
> > > > > > Then B is looked up and in ovl_get_inode() it finds the inode hashed
> > > > > > by the same lower inode in inode cache, but fails ovl_verify_inode()
> > > > > > because:
> > > > > > d_inode(upperdentry) /* B */ != ovl_inode_upper(inode) /* A */
> > > > > >
> > > > > > This can also happen when copying overlay layers to a new
> > > > > > fs tree and carrying over the old "origin" xattr.
> > > > > > In practice, the UUID part of the stored "origin" xattr is meant to
> > > > > > protect against decoding lower fh when migrating to another
> > > > > > filesystem, but layers could be migrated inside the same filesystem.
> > > > > > Since squashfs does not have a UUID, re-creating sqhashfs is similar
> > > > > > to migrating layers inside the same filesystem.
>
> No, I think copying layers is a different issue: it's not possible to
> get the same file handle for A and B since they are
>
>  - either on different filesystems and uuid is different
>  - on the same filesystem, hence fh must be unique for the lifetime of the fs
>
> > Now, suggestions for work arounds:
> >
> > 1. Don't follow with lower null uuid (patch posted) - no caveats
>
> We could add some sort of "overlay version"  to origin to be able to
> trust null uuid, but only if created in *this* instance of overlayfs.
> That way we retain some of the advantages without any of the
> drawbacks.   This does not seem too complex theoretically, but not
> sure if it's worth it.

Don't think it is worth it.
It is a corner case of a special case.

>
> > 2. Opt-out of following origin with explicit option e.g. "index=nofollow"
> > 3. Don't follow origin unless one of the following opt-in features:
> >     metacopy,index,xino
>
> Maybe, if 1) is problematic for some reason.
>

1) is not problematic IMO and the simple patch I posted may be applied
for fixing the reported issue, but it only solved the special case of null uuid.
The problem still exists with re-creating lower on xfs/ext4, e.g. by
rm -rf and unpacking image tar.

So either we solve only the special case and wait for someone to complain
on the non-null uuid case or we try to make the code more robust.

Suggesting improvement for auto detecting re-created lower:
- On mount, even with index disabled, do ovl_verify_origin() of lower root
- On lookup of dir, do ovl_verify_origin() regardless of ovl_verify_lower()
- When ovl_verify_origin() on some dir fails and index/metacopy are disabled,
  warn and set ofs->bad_origin to disable ovl_check_origin() from here on
- Maybe also set null fh in ovl_set_origin() from here on?
- In ovl_dentry_*revalidate(), if non-dir has lower and ofs->bad_origin,
  invalidate the dentry
- Can also set ofs->bad_origin on ovl_verify_inode() for conflicting
  upper inode as reported in the bug

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: overlayfs: issue with a replaced lower squashfs with export-table
  2020-07-08  8:31                 ` Amir Goldstein
@ 2020-07-08  8:37                   ` Miklos Szeredi
  2020-07-08  8:50                     ` Amir Goldstein
  0 siblings, 1 reply; 20+ messages in thread
From: Miklos Szeredi @ 2020-07-08  8:37 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Vivek Goyal, overlayfs, Fabian

On Wed, Jul 8, 2020 at 10:31 AM Amir Goldstein <amir73il@gmail.com> wrote:

>
> 1) is not problematic IMO and the simple patch I posted may be applied
> for fixing the reported issue, but it only solved the special case of null uuid.
> The problem still exists with re-creating lower on xfs/ext4, e.g. by
> rm -rf and unpacking image tar.

How so?  st_ino may be reused but the fh is guaranteed to be unique.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: overlayfs: issue with a replaced lower squashfs with export-table
  2020-07-08  8:37                   ` Miklos Szeredi
@ 2020-07-08  8:50                     ` Amir Goldstein
  2020-07-08 14:23                       ` Vivek Goyal
  0 siblings, 1 reply; 20+ messages in thread
From: Amir Goldstein @ 2020-07-08  8:50 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, overlayfs, Fabian

On Wed, Jul 8, 2020 at 11:37 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Wed, Jul 8, 2020 at 10:31 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> >
> > 1) is not problematic IMO and the simple patch I posted may be applied
> > for fixing the reported issue, but it only solved the special case of null uuid.
> > The problem still exists with re-creating lower on xfs/ext4, e.g. by
> > rm -rf and unpacking image tar.
>
> How so?  st_ino may be reused but the fh is guaranteed to be unique.
>

Doh! You are right. I was talking nonsense.
The only problem would be with re-creating an xfs/ext4 lower image
with the same uuid maybe because a basic image is cloned.

In any case, it's a corner of a corner of a corner.
I will post the patch to fix null uuid.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: overlayfs: issue with a replaced lower squashfs with export-table
  2020-07-08  8:50                     ` Amir Goldstein
@ 2020-07-08 14:23                       ` Vivek Goyal
  2020-07-08 14:26                         ` Vivek Goyal
  0 siblings, 1 reply; 20+ messages in thread
From: Vivek Goyal @ 2020-07-08 14:23 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, overlayfs, Fabian

On Wed, Jul 08, 2020 at 11:50:29AM +0300, Amir Goldstein wrote:
> On Wed, Jul 8, 2020 at 11:37 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Wed, Jul 8, 2020 at 10:31 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > >
> > > 1) is not problematic IMO and the simple patch I posted may be applied
> > > for fixing the reported issue, but it only solved the special case of null uuid.
> > > The problem still exists with re-creating lower on xfs/ext4, e.g. by
> > > rm -rf and unpacking image tar.
> >
> > How so?  st_ino may be reused but the fh is guaranteed to be unique.
> >
> 
> Doh! You are right. I was talking nonsense.
> The only problem would be with re-creating an xfs/ext4 lower image
> with the same uuid maybe because a basic image is cloned.
> 
> In any case, it's a corner of a corner of a corner.
> I will post the patch to fix null uuid.

It will also be good if we can bring some clarity to the documentation
for future references in section "Sharing and copying layers".

So if IIUC,

- sharing layers should work with all features of overlayfs.

- copying layers works only if index and nfs_export is not enabled. Even
  if index is not enabled, copying layers will change inode number
  reporting behavior (as origin verification will fail). We probably
  say something about this.

- Modifying/recreating lower layer only works when
  metacopy/index/nfs_export are not enabled at any point of time. This
  also will change inode number reporting behavior.

Is that a fair understanding? I am not sure how exactly inode number
reporting will change when lower layers are copied or recreated.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: overlayfs: issue with a replaced lower squashfs with export-table
  2020-07-08 14:23                       ` Vivek Goyal
@ 2020-07-08 14:26                         ` Vivek Goyal
  2020-07-08 17:26                           ` Amir Goldstein
  0 siblings, 1 reply; 20+ messages in thread
From: Vivek Goyal @ 2020-07-08 14:26 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, overlayfs, Fabian

On Wed, Jul 08, 2020 at 10:23:53AM -0400, Vivek Goyal wrote:
> On Wed, Jul 08, 2020 at 11:50:29AM +0300, Amir Goldstein wrote:
> > On Wed, Jul 8, 2020 at 11:37 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >
> > > On Wed, Jul 8, 2020 at 10:31 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > >
> > > > 1) is not problematic IMO and the simple patch I posted may be applied
> > > > for fixing the reported issue, but it only solved the special case of null uuid.
> > > > The problem still exists with re-creating lower on xfs/ext4, e.g. by
> > > > rm -rf and unpacking image tar.
> > >
> > > How so?  st_ino may be reused but the fh is guaranteed to be unique.
> > >
> > 
> > Doh! You are right. I was talking nonsense.
> > The only problem would be with re-creating an xfs/ext4 lower image
> > with the same uuid maybe because a basic image is cloned.
> > 
> > In any case, it's a corner of a corner of a corner.
> > I will post the patch to fix null uuid.
> 
> It will also be good if we can bring some clarity to the documentation
> for future references in section "Sharing and copying layers".
> 
> So if IIUC,
> 
> - sharing layers should work with all features of overlayfs.
> 
> - copying layers works only if index and nfs_export is not enabled. Even
>   if index is not enabled, copying layers will change inode number
>   reporting behavior (as origin verification will fail). We probably
>   say something about this.
> 
> - Modifying/recreating lower layer only works when
>   metacopy/index/nfs_export are not enabled at any point of time. This
>   also will change inode number reporting behavior.

Well, this is not entirely true. redirect might be broken if lower layers have
been modified/recreated and that will have issues with directories.

/me is again wondering what's the use case of modifying lower layer
with an existing upper. Is it fair to say, no don't recreate/modify
lower layers and use with existing upper.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: overlayfs: issue with a replaced lower squashfs with export-table
  2020-07-08 14:26                         ` Vivek Goyal
@ 2020-07-08 17:26                           ` Amir Goldstein
  2020-07-08 17:36                             ` Vivek Goyal
  0 siblings, 1 reply; 20+ messages in thread
From: Amir Goldstein @ 2020-07-08 17:26 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Miklos Szeredi, overlayfs, Fabian

On Wed, Jul 8, 2020 at 5:27 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Wed, Jul 08, 2020 at 10:23:53AM -0400, Vivek Goyal wrote:
> > On Wed, Jul 08, 2020 at 11:50:29AM +0300, Amir Goldstein wrote:
> > > On Wed, Jul 8, 2020 at 11:37 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > >
> > > > On Wed, Jul 8, 2020 at 10:31 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > >
> > > > >
> > > > > 1) is not problematic IMO and the simple patch I posted may be applied
> > > > > for fixing the reported issue, but it only solved the special case of null uuid.
> > > > > The problem still exists with re-creating lower on xfs/ext4, e.g. by
> > > > > rm -rf and unpacking image tar.
> > > >
> > > > How so?  st_ino may be reused but the fh is guaranteed to be unique.
> > > >
> > >
> > > Doh! You are right. I was talking nonsense.
> > > The only problem would be with re-creating an xfs/ext4 lower image
> > > with the same uuid maybe because a basic image is cloned.
> > >
> > > In any case, it's a corner of a corner of a corner.
> > > I will post the patch to fix null uuid.
> >
> > It will also be good if we can bring some clarity to the documentation
> > for future references in section "Sharing and copying layers".

I am very bad at documenting.
Feel free to post a patch to add clarity.

> >
> > So if IIUC,
> >
> > - sharing layers should work with all features of overlayfs.
> >
> > - copying layers works only if index and nfs_export is not enabled. Even
> >   if index is not enabled, copying layers will change inode number
> >   reporting behavior (as origin verification will fail). We probably
> >   say something about this.
> >
> > - Modifying/recreating lower layer only works when
> >   metacopy/index/nfs_export are not enabled at any point of time. This
> >   also will change inode number reporting behavior.
>
> Well, this is not entirely true. redirect might be broken if lower layers have
> been modified/recreated and that will have issues with directories.
>
> /me is again wondering what's the use case of modifying lower layer
> with an existing upper. Is it fair to say, no don't recreate/modify
> lower layers and use with existing upper.
>

It's fine by me to document that this is not supported.
Only thing is that we usually do not want to break existing setups that
used to work if we dont have to.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: overlayfs: issue with a replaced lower squashfs with export-table
  2020-07-08 17:26                           ` Amir Goldstein
@ 2020-07-08 17:36                             ` Vivek Goyal
  0 siblings, 0 replies; 20+ messages in thread
From: Vivek Goyal @ 2020-07-08 17:36 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, overlayfs, Fabian

On Wed, Jul 08, 2020 at 08:26:19PM +0300, Amir Goldstein wrote:
> On Wed, Jul 8, 2020 at 5:27 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Wed, Jul 08, 2020 at 10:23:53AM -0400, Vivek Goyal wrote:
> > > On Wed, Jul 08, 2020 at 11:50:29AM +0300, Amir Goldstein wrote:
> > > > On Wed, Jul 8, 2020 at 11:37 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > > >
> > > > > On Wed, Jul 8, 2020 at 10:31 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > >
> > > > > >
> > > > > > 1) is not problematic IMO and the simple patch I posted may be applied
> > > > > > for fixing the reported issue, but it only solved the special case of null uuid.
> > > > > > The problem still exists with re-creating lower on xfs/ext4, e.g. by
> > > > > > rm -rf and unpacking image tar.
> > > > >
> > > > > How so?  st_ino may be reused but the fh is guaranteed to be unique.
> > > > >
> > > >
> > > > Doh! You are right. I was talking nonsense.
> > > > The only problem would be with re-creating an xfs/ext4 lower image
> > > > with the same uuid maybe because a basic image is cloned.
> > > >
> > > > In any case, it's a corner of a corner of a corner.
> > > > I will post the patch to fix null uuid.
> > >
> > > It will also be good if we can bring some clarity to the documentation
> > > for future references in section "Sharing and copying layers".
> 
> I am very bad at documenting.
> Feel free to post a patch to add clarity.

Ok, I will send a patch and improve it based on feedback.

> 
> > >
> > > So if IIUC,
> > >
> > > - sharing layers should work with all features of overlayfs.
> > >
> > > - copying layers works only if index and nfs_export is not enabled. Even
> > >   if index is not enabled, copying layers will change inode number
> > >   reporting behavior (as origin verification will fail). We probably
> > >   say something about this.
> > >
> > > - Modifying/recreating lower layer only works when
> > >   metacopy/index/nfs_export are not enabled at any point of time. This
> > >   also will change inode number reporting behavior.
> >
> > Well, this is not entirely true. redirect might be broken if lower layers have
> > been modified/recreated and that will have issues with directories.
> >
> > /me is again wondering what's the use case of modifying lower layer
> > with an existing upper. Is it fair to say, no don't recreate/modify
> > lower layers and use with existing upper.
> >
> 
> It's fine by me to document that this is not supported.
> Only thing is that we usually do not want to break existing setups that
> used to work if we dont have to.

I will limit this case to "!redirect, !metacopy, !index, !export"
and that will make sure existing setups work and any of the new features
we added will not support this.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: overlayfs: issue with a replaced lower squashfs with export-table
  2020-07-06 17:14         ` Amir Goldstein
@ 2020-07-16 13:29           ` Fabian Godehardt
  0 siblings, 0 replies; 20+ messages in thread
From: Fabian Godehardt @ 2020-07-16 13:29 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: overlayfs

Hi Amir,

Am Montag, 6. Juli 2020, 19:14:11 CEST schrieb Amir Goldstein:
> On Mon, Jul 6, 2020 at 7:10 PM Fabian <godi.beat@gmx.net> wrote:
> > Hi Amir,
> >
> > Am Montag, 6. Juli 2020, 17:33:54 CEST schrieb Amir Goldstein:
> > > On Mon, Jul 6, 2020 at 6:14 PM Fabian <godi.beat@gmx.net> wrote:
> > > > Hi Amir,
> > > >
> > > > thanks for your mail and the quick reply!
> > > >
> > > > Am Montag, 6. Juli 2020, 16:29:51 CEST schrieb Amir Goldstein:
> > > > > > We are seeing problems using an read-writeable overlayfs (upper)
> > > > > > on a
> > > > > > readonly squashfs (lower). The squashfs gets an update from time
> > > > > > to
> > > > > > time
> > > > > > while we keep the upper overlayfs.
> > > > >
> > > > > It gets updated while the overlay is offline (not mounted) correct?
> > > >
> > > > Yes. We boot into a recovery system outside the rootfs and its
> > > > overlayfs,
> > > > replace the lower squashfs, and then reboot into the new system.
> > > >
> > > > > > On replaced files we then see -ESTALE ("overlayfs: failed to get
> > > > > > inode
> > > > > > (-116)") messages if the lower squashfs was created _without_
> > > > > > using
> > > > > > the
> > > > > > "-no-exports" switch.
> > > > > > The -ESTALE comes from ovl_get_inode() which in turn calls
> > > > > > ovl_verify_inode() and returns on the line where the upperdentry
> > > > > > inode
> > > > > > gets compared
> > > > > > ( if (upperdentry && ovl_inode_upper(inode) !=
> > > > > > d_inode(upperdentry))
> > > > > > ).
> > > > > >
> > > > > > A little debugging shows, that the upper files dentry name does
> > > > > > not
> > > > > > fit to
> > > > > > the dentry name of the new lower dentry as it seems to look for
> > > > > > the
> > > > > > inode
> > > > > > on the squashfs "export"-lookup-table which has changed as we
> > > > > > replaced
> > > > > > the lower fs.
> > > > > >
> > > > > > Building the lower squashfs with the "-no-exports"-mksquashfs
> > > > > > option,
> > > > > > so
> > > > > > without the export-lookup-table, seems to work, but it might be no
> > > > > > longer
> > > > > > exportable using nfs (which is ok and we can keep with it).
> > > > > >
> > > > > > As we didn't find any other information regarding this behaviour
> > > > > > or
> > > > > > anyone
> > > > > > who also had this problem before we just want to know if this is
> > > > > > the
> > > > > > right way to use the rw overlayfs on a (replaceable) ro squashfs
> > > > > > filesystem.
> > > > > >
> > > > > > Is this a known issue? Is it really needed to disable the export
> > > > > > feature
> > > > > > when using overlayfs on a squashfs if we later need to replace
> > > > > > squashfs
> > > > > > during an update? Any hints we can have a look on if this should
> > > > > > work
> > > > > > and
> > > > > > we might have done wrong during squashfs or overlayfs creation?
> > > > >
> > > > > This sounds like an unintentional outcome of:
> > > > > 9df085f3c9a2 ovl: relax requirement for non null uuid of lower fs
> > > > >
> > > > > Which enabled nfs_export for overlay with lower squashfs.
> > > > >
> > > > > If you do not need to export overlayfs to NFS, then you can check if
> > > > > the
> > > > > attached patch solves your problem.
> > > >
> > > > With the attached patch i'm now getting to a point where the overlayfs
> > > > tries to handle the /run-directory (a symlink). There seems to be a
> > > > -ESTALE at ovl_check_origin_fh() after the for-loop where it checks if
> > > > origin was not found ( if (!origin) ). Maybe i should debug for more
> > > > details here? Please let me know.
> > >
> > > This is expected. Does it cause any problem?
> > >
> > > The patch marks the lower squashfs as "bad_uuid", because:
> > >         if (!ofs->config.index && uuid_is_null(uuid))
> > >
> > >                 return false;
> > >
> > > ...
> > >
> > >         if (!ovl_lower_uuid_ok(ofs, &sb->s_uuid)) {
> > >
> > >                 bad_uuid = true;
> > >
> > > ...
> > >
> > >         ofs->fs[ofs->numfs].bad_uuid = bad_uuid;
> > >
> > > That's ofs->fs[1].bad_uuid = bad_uuid;
> > >
> > >
> > > Then in ovl_lookup() => ovl_check_origin() => ovl_check_origin_fh()
> > >
> > > will return ESALE because of:
> > >                 if (ofs->layers[i].fsid &&
> > >
> > >                     ofs->layers[i].fs->bad_uuid)
> > >
> > >                         continue;
> > >
> > > And ovl_check_origin() will return 0 to ovl_lookup().
> >
> > I'm sorry. You are totaly right! RootFS now completely comes up - just
> > missed the console start in our latest inittab - so thought something
> > still hangs. The ESTALE was printed for me because i debugged the whole
> > ESTALE positions in the overlayfs code while studying the first problem.
> > Time to remove my debug code...
> >
> > We will now continue with update tests. If we see something else i will
> > let
> > you know.
>
> OK. please report back when done testing so I can add your tested-by

A lot of tests are done without any problems so far. From our point of view
the patch works very well.


Thanks again!
Fabian



^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-07-16 13:29 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-06 13:27 overlayfs: issue with a replaced lower squashfs with export-table Fabian
2020-07-06 14:29 ` Amir Goldstein
2020-07-06 15:14   ` Fabian
2020-07-06 15:33     ` Amir Goldstein
2020-07-06 16:10       ` Fabian
2020-07-06 17:14         ` Amir Goldstein
2020-07-16 13:29           ` Fabian Godehardt
2020-07-07  5:51     ` Amir Goldstein
2020-07-07 15:51       ` Vivek Goyal
2020-07-07 17:41         ` Amir Goldstein
2020-07-07 21:53           ` Vivek Goyal
2020-07-08  6:55             ` Amir Goldstein
2020-07-08  8:00               ` Miklos Szeredi
2020-07-08  8:31                 ` Amir Goldstein
2020-07-08  8:37                   ` Miklos Szeredi
2020-07-08  8:50                     ` Amir Goldstein
2020-07-08 14:23                       ` Vivek Goyal
2020-07-08 14:26                         ` Vivek Goyal
2020-07-08 17:26                           ` Amir Goldstein
2020-07-08 17:36                             ` Vivek Goyal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).