linux-unionfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [QUESTION] Why overlayfs cannot mounted as nfs_export and metacopy?
@ 2021-08-07 10:17 Zhihao Cheng
  2021-08-07 11:05 ` Amir Goldstein
  0 siblings, 1 reply; 13+ messages in thread
From: Zhihao Cheng @ 2021-08-07 10:17 UTC (permalink / raw)
  To: miklos, amir73il, linux-unionfs

Hi, all.

As title said. I wonder to know the reason for overlayfs mount failure 
with '-o nfs_export=on,metacopy=on'.

I modified kernel to enable these two options 'on',  it looks like that 
overlayfs can still work fine under nfs_v4.

Besides, I can get no more information about the reason from source 
code, maybe I missed something.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [QUESTION] Why overlayfs cannot mounted as nfs_export and metacopy?
  2021-08-07 10:17 [QUESTION] Why overlayfs cannot mounted as nfs_export and metacopy? Zhihao Cheng
@ 2021-08-07 11:05 ` Amir Goldstein
  2021-08-07 16:37   ` Amir Goldstein
  2021-08-09 19:56   ` Vivek Goyal
  0 siblings, 2 replies; 13+ messages in thread
From: Amir Goldstein @ 2021-08-07 11:05 UTC (permalink / raw)
  To: Zhihao Cheng; +Cc: Miklos Szeredi, overlayfs, Vivek Goyal

On Sat, Aug 7, 2021 at 1:17 PM Zhihao Cheng <chengzhihao1@huawei.com> wrote:
>
> Hi, all.
>
> As title said. I wonder to know the reason for overlayfs mount failure
> with '-o nfs_export=on,metacopy=on'.
>
> I modified kernel to enable these two options 'on',  it looks like that
> overlayfs can still work fine under nfs_v4.
>
> Besides, I can get no more information about the reason from source
> code, maybe I missed something.
>

It's because ovl_obtain_alias() (decoding a disconnected non-dir file handle)
does not know how to construct a metacopy overlayfs inode.

Maybe Vivek will be able to point you to the discussion that lead to making
the features mutually exclusive.

I don't remember any other reason.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [QUESTION] Why overlayfs cannot mounted as nfs_export and metacopy?
  2021-08-07 11:05 ` Amir Goldstein
@ 2021-08-07 16:37   ` Amir Goldstein
  2021-08-09  9:21     ` Zhihao Cheng
  2021-08-09 21:35     ` Vivek Goyal
  2021-08-09 19:56   ` Vivek Goyal
  1 sibling, 2 replies; 13+ messages in thread
From: Amir Goldstein @ 2021-08-07 16:37 UTC (permalink / raw)
  To: Zhihao Cheng; +Cc: Miklos Szeredi, overlayfs, Vivek Goyal

On Sat, Aug 7, 2021 at 2:05 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Sat, Aug 7, 2021 at 1:17 PM Zhihao Cheng <chengzhihao1@huawei.com> wrote:
> >
> > Hi, all.
> >
> > As title said. I wonder to know the reason for overlayfs mount failure
> > with '-o nfs_export=on,metacopy=on'.
> >
> > I modified kernel to enable these two options 'on',  it looks like that
> > overlayfs can still work fine under nfs_v4.
> >
> > Besides, I can get no more information about the reason from source
> > code, maybe I missed something.
> >
>
> It's because ovl_obtain_alias() (decoding a disconnected non-dir file handle)
> does not know how to construct a metacopy overlayfs inode.
>
> Maybe Vivek will be able to point you to the discussion that lead to making
> the features mutually exclusive.
>
> I don't remember any other reason.
>

I remembered some more details...

I think the main complication discussed w.r.t decoding a metacopy
inode was for the case where ovl_inode_lowerdata() differs from
ovl_inode_lower().

If we had a weaker variant of metacopy (e.g. metacopy=upper) that
only allows creating and following metacopy inodes in the upper layer,
it would have been simpler to implement ovl_obtain_alias().

Specifically, when ofs->numlayer == 2 (single lower layer), there can
be no valid metacopy inodes in the lower layer, so that configuration
should also be rather easy to support.

Basically, for ovl_obtain_alias():
- 'lowerpath' must not have metadata xattr
- 'upper_alias' must not have metadata xattr
- If 'index' has metacopy xattr, OVL_UPPERDATA flag
  should not be set on ovl inode

But there are bigger complications w.r.t disconnected dentry.
Overlayfs knows how to decode, work with and copy up
disconnected dentries (parent is unknown), but in ovl_link(old, ...),
'old' dentry must not be a disconnected metacopy dentry when
calling ovl_set_link_redirect() => ... ovl_get_redirect(), so we will
also need to:
- On copy up of a disconnected lower, do not use metacopy
- Copy up data before ovl_link() when nfs_export is enabled
- In ovl_obtain_alias(), if 'index' has redirect:
-- Verify that it is an absolute path that it is resolved to the
'lowerpath's inode
-- oip.redirect needs to be passed to ovl_get_inode()
-- ovl_verify_inode() needs to verify that oip.redirect matches
   redirect that is found on existing ovl inode

And probably other things that I am forgetting...

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [QUESTION] Why overlayfs cannot mounted as nfs_export and metacopy?
  2021-08-07 16:37   ` Amir Goldstein
@ 2021-08-09  9:21     ` Zhihao Cheng
  2021-08-09 21:35     ` Vivek Goyal
  1 sibling, 0 replies; 13+ messages in thread
From: Zhihao Cheng @ 2021-08-09  9:21 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, overlayfs, Vivek Goyal

在 2021/8/8 0:37, Amir Goldstein 写道:
> I remembered some more details...
>
> I think the main complication discussed w.r.t decoding a metacopy
> inode was for the case where ovl_inode_lowerdata() differs from
> ovl_inode_lower().
>
> If we had a weaker variant of metacopy (e.g. metacopy=upper) that
> only allows creating and following metacopy inodes in the upper layer,
> it would have been simpler to implement ovl_obtain_alias().
>
> Specifically, when ofs->numlayer == 2 (single lower layer), there can
> be no valid metacopy inodes in the lower layer, so that configuration
> should also be rather easy to support.
>
> Basically, for ovl_obtain_alias():
> - 'lowerpath' must not have metadata xattr
> - 'upper_alias' must not have metadata xattr
> - If 'index' has metacopy xattr, OVL_UPPERDATA flag
>    should not be set on ovl inode
>
> But there are bigger complications w.r.t disconnected dentry.
> Overlayfs knows how to decode, work with and copy up
> disconnected dentries (parent is unknown), but in ovl_link(old, ...),

Can we get a disconnected non-dir dentry in normal process, or how to 
produce a disconnected dentry?

If I understand correctly, metacopy only inode will be processed in 
ovl_lower_fh_to_d():

706         /* First lookup overlay inode in inode cache by origin fh */
707         err = ovl_check_origin_fh(ofs, fh, false, NULL, &stack);
708         if (err)
709                 return ERR_PTR(err);
710
711         if (!d_is_dir(origin.dentry) ||
712             !(origin.dentry->d_flags & DCACHE_DISCONNECTED)) {
713                 inode = ovl_lookup_inode(sb, origin.dentry, false);
714                 err = PTR_ERR(inode);
715                 if (IS_ERR(inode))
716                         goto out_err;
717                 if (inode) {
718                         dentry = 
d_find_any_alias(inode);                        // A NULL dentry found 
here? How did it happen?
719 iput(inode);
720                         if (dentry)
721                                 goto out;
722 }
723         }

[...]

762         /* Get a connected non-upper dir or disconnected non-dir */
763         dentry = ovl_get_dentry(sb, NULL, &origin, index);      // 
Get disconnected non-dir.

> 'old' dentry must not be a disconnected metacopy dentry when
> calling ovl_set_link_redirect() => ... ovl_get_redirect(), so we will
> also need to:
> - On copy up of a disconnected lower, do not use metacopy
> - Copy up data before ovl_link() when nfs_export is enabled
> - In ovl_obtain_alias(), if 'index' has redirect:
> -- Verify that it is an absolute path that it is resolved to the
> 'lowerpath's inode
> -- oip.redirect needs to be passed to ovl_get_inode()
> -- ovl_verify_inode() needs to verify that oip.redirect matches
>     redirect that is found on existing ovl inode
>
> And probably other things that I am forgetting...
>
> Thanks,
> Amir.
> .



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [QUESTION] Why overlayfs cannot mounted as nfs_export and metacopy?
  2021-08-07 11:05 ` Amir Goldstein
  2021-08-07 16:37   ` Amir Goldstein
@ 2021-08-09 19:56   ` Vivek Goyal
  1 sibling, 0 replies; 13+ messages in thread
From: Vivek Goyal @ 2021-08-09 19:56 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Zhihao Cheng, Miklos Szeredi, overlayfs

On Sat, Aug 07, 2021 at 02:05:15PM +0300, Amir Goldstein wrote:
> On Sat, Aug 7, 2021 at 1:17 PM Zhihao Cheng <chengzhihao1@huawei.com> wrote:
> >
> > Hi, all.
> >
> > As title said. I wonder to know the reason for overlayfs mount failure
> > with '-o nfs_export=on,metacopy=on'.
> >
> > I modified kernel to enable these two options 'on',  it looks like that
> > overlayfs can still work fine under nfs_v4.
> >
> > Besides, I can get no more information about the reason from source
> > code, maybe I missed something.
> >
> 
> It's because ovl_obtain_alias() (decoding a disconnected non-dir file handle)
> does not know how to construct a metacopy overlayfs inode.
> 
> Maybe Vivek will be able to point you to the discussion that lead to making
> the features mutually exclusive.

I think I had not implemented metacopy with nfs export because it seemed
non-trivial and I did not need nfs export feature support with metacopy.
So I decided to narrow down the problem space and not support nfs
export and metacopy together.

It will be good if somebody can dive deeper and make it work (if it
is possible to make it work).

Vivek
> 
> I don't remember any other reason.
> 
> Thanks,
> Amir.
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [QUESTION] Why overlayfs cannot mounted as nfs_export and metacopy?
  2021-08-07 16:37   ` Amir Goldstein
  2021-08-09  9:21     ` Zhihao Cheng
@ 2021-08-09 21:35     ` Vivek Goyal
  2021-08-10  4:17       ` Amir Goldstein
  1 sibling, 1 reply; 13+ messages in thread
From: Vivek Goyal @ 2021-08-09 21:35 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Zhihao Cheng, Miklos Szeredi, overlayfs

On Sat, Aug 07, 2021 at 07:37:00PM +0300, Amir Goldstein wrote:
> On Sat, Aug 7, 2021 at 2:05 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Sat, Aug 7, 2021 at 1:17 PM Zhihao Cheng <chengzhihao1@huawei.com> wrote:
> > >
> > > Hi, all.
> > >
> > > As title said. I wonder to know the reason for overlayfs mount failure
> > > with '-o nfs_export=on,metacopy=on'.
> > >
> > > I modified kernel to enable these two options 'on',  it looks like that
> > > overlayfs can still work fine under nfs_v4.
> > >
> > > Besides, I can get no more information about the reason from source
> > > code, maybe I missed something.
> > >
> >
> > It's because ovl_obtain_alias() (decoding a disconnected non-dir file handle)
> > does not know how to construct a metacopy overlayfs inode.
> >
> > Maybe Vivek will be able to point you to the discussion that lead to making
> > the features mutually exclusive.
> >
> > I don't remember any other reason.
> >
> 
> I remembered some more details...
> 
> I think the main complication discussed w.r.t decoding a metacopy
> inode was for the case where ovl_inode_lowerdata() differs from
> ovl_inode_lower().
> 
> If we had a weaker variant of metacopy (e.g. metacopy=upper) that
> only allows creating and following metacopy inodes in the upper layer,
> it would have been simpler to implement ovl_obtain_alias().
> 
> Specifically, when ofs->numlayer == 2 (single lower layer), there can
> be no valid metacopy inodes in the lower layer, so that configuration
> should also be rather easy to support.

Hi Amir,

/me does not understand well the notion of disconnected dentries and
 how nfs export stuff works. So please bear with my stupid questions.

I am wondering why a lower inode can't be metacopy inode. For the
normal lookup case, we can lookup in all lower layers and figure out
which is actual data inode and which inodes are metacopy inodes. 

For the case of disconnected dentry, we probably can't do lookup. So
are calling underlying filesystem to decode. (Using origin?). If yes,
will intermediate lower not have origin xattr which we can use
to follow the complete lower chain and reconstruct all real lower
dentries and use lower data dentry and latest lower meatacopy dentry
(in the same way we do as for lookup).

Thanks
Vivek

> 
> Basically, for ovl_obtain_alias():
> - 'lowerpath' must not have metadata xattr
> - 'upper_alias' must not have metadata xattr
> - If 'index' has metacopy xattr, OVL_UPPERDATA flag
>   should not be set on ovl inode
> 
> But there are bigger complications w.r.t disconnected dentry.
> Overlayfs knows how to decode, work with and copy up
> disconnected dentries (parent is unknown), but in ovl_link(old, ...),
> 'old' dentry must not be a disconnected metacopy dentry when
> calling ovl_set_link_redirect() => ... ovl_get_redirect(), so we will
> also need to:
> - On copy up of a disconnected lower, do not use metacopy
> - Copy up data before ovl_link() when nfs_export is enabled
> - In ovl_obtain_alias(), if 'index' has redirect:
> -- Verify that it is an absolute path that it is resolved to the
> 'lowerpath's inode
> -- oip.redirect needs to be passed to ovl_get_inode()
> -- ovl_verify_inode() needs to verify that oip.redirect matches
>    redirect that is found on existing ovl inode
> 
> And probably other things that I am forgetting...
> 
> Thanks,
> Amir.
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [QUESTION] Why overlayfs cannot mounted as nfs_export and metacopy?
  2021-08-09 21:35     ` Vivek Goyal
@ 2021-08-10  4:17       ` Amir Goldstein
  2021-08-10 19:48         ` Vivek Goyal
  0 siblings, 1 reply; 13+ messages in thread
From: Amir Goldstein @ 2021-08-10  4:17 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Zhihao Cheng, Miklos Szeredi, overlayfs

On Tue, Aug 10, 2021 at 12:35 AM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Sat, Aug 07, 2021 at 07:37:00PM +0300, Amir Goldstein wrote:
> > On Sat, Aug 7, 2021 at 2:05 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > On Sat, Aug 7, 2021 at 1:17 PM Zhihao Cheng <chengzhihao1@huawei.com> wrote:
> > > >
> > > > Hi, all.
> > > >
> > > > As title said. I wonder to know the reason for overlayfs mount failure
> > > > with '-o nfs_export=on,metacopy=on'.
> > > >
> > > > I modified kernel to enable these two options 'on',  it looks like that
> > > > overlayfs can still work fine under nfs_v4.
> > > >
> > > > Besides, I can get no more information about the reason from source
> > > > code, maybe I missed something.
> > > >
> > >
> > > It's because ovl_obtain_alias() (decoding a disconnected non-dir file handle)
> > > does not know how to construct a metacopy overlayfs inode.
> > >
> > > Maybe Vivek will be able to point you to the discussion that lead to making
> > > the features mutually exclusive.
> > >
> > > I don't remember any other reason.
> > >
> >
> > I remembered some more details...
> >
> > I think the main complication discussed w.r.t decoding a metacopy
> > inode was for the case where ovl_inode_lowerdata() differs from
> > ovl_inode_lower().
> >
> > If we had a weaker variant of metacopy (e.g. metacopy=upper) that
> > only allows creating and following metacopy inodes in the upper layer,
> > it would have been simpler to implement ovl_obtain_alias().
> >
> > Specifically, when ofs->numlayer == 2 (single lower layer), there can
> > be no valid metacopy inodes in the lower layer, so that configuration
> > should also be rather easy to support.
>
> Hi Amir,
>
> /me does not understand well the notion of disconnected dentries and
>  how nfs export stuff works. So please bear with my stupid questions.

No stupid questions ;-)

Without getting into the hairy details of nfs export there are a few basic
things to consider:
- A file handle does not encode the path, only an inode identifier
- A non-directory inode may have multiple paths (hardlinks)
- Most filesystems do not store path information in inode on-disk for
  non-directory inode (the ".." entry stores the path for a directory)
- When filesystem is asked to decode a file handle and does not find the
  inode in question in inode cache nor a dentry in dcache, the only resort
  is to instantiate a "disconnected" dentry with unknown path
- Later "normal" lookup() by path that resolves to the same inode, does not
  make that "disconnected" dentry connected. Istead, lookup() instantiates
  another connected dentry "alias" to the same inode

All this has some implications when enabling nfs_export for overlayfs:
1. ovl_obtain_dentry() needs to be able to cope with a disconnected
    'real' dentry
2. Since ovl_obtain_dentry() cannot assume to know the path of the
    'real' dentry, it needs to know how to instantiate a disconnected
    overlayfs dentry
3. Other overlayfs code needs to be able to cope with a disconnected
    overlayfs dentry (for example, copy up only to index)

>
> I am wondering why a lower inode can't be metacopy inode. For the
> normal lookup case, we can lookup in all lower layers and figure out
> which is actual data inode and which inodes are metacopy inodes.
>
> For the case of disconnected dentry, we probably can't do lookup. So
> are calling underlying filesystem to decode. (Using origin?). If yes,
> will intermediate lower not have origin xattr which we can use
> to follow the complete lower chain and reconstruct all real lower
> dentries and use lower data dentry and latest lower meatacopy dentry
> (in the same way we do as for lookup).

We can do that. I did not say we cannot.
I just said it would be simpler if we can avoid this complication
and I listed the guidelines for the "simple" implementation.

But beyond the complexity, what is the benefit?
I was under the impression that container manager do not know how
to build images with metacopy, so what are the chances of actually
seeing metacopy in middle layers in the wild?

IOW, if we implemented metacopy=upper (only allow metacopy in
upper layer), would it be sufficient for the use cases that need to enable
nfs_export?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [QUESTION] Why overlayfs cannot mounted as nfs_export and metacopy?
  2021-08-10  4:17       ` Amir Goldstein
@ 2021-08-10 19:48         ` Vivek Goyal
  2021-08-10 20:00           ` Vivek Goyal
  2021-08-11  6:51           ` Amir Goldstein
  0 siblings, 2 replies; 13+ messages in thread
From: Vivek Goyal @ 2021-08-10 19:48 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Zhihao Cheng, Miklos Szeredi, overlayfs, Daniel J Walsh, Nalin Dahyabhai

On Tue, Aug 10, 2021 at 07:17:23AM +0300, Amir Goldstein wrote:
> On Tue, Aug 10, 2021 at 12:35 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Sat, Aug 07, 2021 at 07:37:00PM +0300, Amir Goldstein wrote:
> > > On Sat, Aug 7, 2021 at 2:05 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > >
> > > > On Sat, Aug 7, 2021 at 1:17 PM Zhihao Cheng <chengzhihao1@huawei.com> wrote:
> > > > >
> > > > > Hi, all.
> > > > >
> > > > > As title said. I wonder to know the reason for overlayfs mount failure
> > > > > with '-o nfs_export=on,metacopy=on'.
> > > > >
> > > > > I modified kernel to enable these two options 'on',  it looks like that
> > > > > overlayfs can still work fine under nfs_v4.
> > > > >
> > > > > Besides, I can get no more information about the reason from source
> > > > > code, maybe I missed something.
> > > > >
> > > >
> > > > It's because ovl_obtain_alias() (decoding a disconnected non-dir file handle)
> > > > does not know how to construct a metacopy overlayfs inode.
> > > >
> > > > Maybe Vivek will be able to point you to the discussion that lead to making
> > > > the features mutually exclusive.
> > > >
> > > > I don't remember any other reason.
> > > >
> > >
> > > I remembered some more details...
> > >
> > > I think the main complication discussed w.r.t decoding a metacopy
> > > inode was for the case where ovl_inode_lowerdata() differs from
> > > ovl_inode_lower().
> > >
> > > If we had a weaker variant of metacopy (e.g. metacopy=upper) that
> > > only allows creating and following metacopy inodes in the upper layer,
> > > it would have been simpler to implement ovl_obtain_alias().
> > >
> > > Specifically, when ofs->numlayer == 2 (single lower layer), there can
> > > be no valid metacopy inodes in the lower layer, so that configuration
> > > should also be rather easy to support.
> >
> > Hi Amir,
> >
> > /me does not understand well the notion of disconnected dentries and
> >  how nfs export stuff works. So please bear with my stupid questions.
> 
> No stupid questions ;-)
> 
> Without getting into the hairy details of nfs export there are a few basic
> things to consider:
> - A file handle does not encode the path, only an inode identifier
> - A non-directory inode may have multiple paths (hardlinks)
> - Most filesystems do not store path information in inode on-disk for
>   non-directory inode (the ".." entry stores the path for a directory)
> - When filesystem is asked to decode a file handle and does not find the
>   inode in question in inode cache nor a dentry in dcache, the only resort
>   is to instantiate a "disconnected" dentry with unknown path
> - Later "normal" lookup() by path that resolves to the same inode, does not
>   make that "disconnected" dentry connected. Istead, lookup() instantiates
>   another connected dentry "alias" to the same inode
> 
> All this has some implications when enabling nfs_export for overlayfs:
> 1. ovl_obtain_dentry() needs to be able to cope with a disconnected
>     'real' dentry
> 2. Since ovl_obtain_dentry() cannot assume to know the path of the
>     'real' dentry, it needs to know how to instantiate a disconnected
>     overlayfs dentry
> 3. Other overlayfs code needs to be able to cope with a disconnected
>     overlayfs dentry (for example, copy up only to index)

Hi Amir,

Thanks for giving a summary. It helps a lot.

> 
> >
> > I am wondering why a lower inode can't be metacopy inode. For the
> > normal lookup case, we can lookup in all lower layers and figure out
> > which is actual data inode and which inodes are metacopy inodes.
> >
> > For the case of disconnected dentry, we probably can't do lookup. So
> > are calling underlying filesystem to decode. (Using origin?). If yes,
> > will intermediate lower not have origin xattr which we can use
> > to follow the complete lower chain and reconstruct all real lower
> > dentries and use lower data dentry and latest lower meatacopy dentry
> > (in the same way we do as for lookup).
> 
> We can do that. I did not say we cannot.
> I just said it would be simpler if we can avoid this complication
> and I listed the guidelines for the "simple" implementation.
> 
> But beyond the complexity, what is the benefit?
> I was under the impression that container manager do not know how
> to build images with metacopy, so what are the chances of actually
> seeing metacopy in middle layers in the wild?

Sure, we don't put metacopy inodes into portable images. But I thougt
this could be part of a lower directory on same system. For example,
docker devicemapper driver used to take an image and explode that
on a thin volume. Then it will take a snaphost, modify some files
and prefix that intermediate state with "-init". And then containers
will use this "-init" as base for container rootfs and take snapshot
of this.

I am not sure if container managers are doing this for overlayfs
or not on same system. But I will not be surprised if somebody
decides to do that. That's is change some metadata in image
(which triggers metacopy) and then use upper layer as lower layer
for container rootfs.

[ CC Dan Walsh and Nalin Dahyabhai ]

Dan, Nalin, I think now metacopy feature is being used in podman and
container/storage. Do we ever create lower layers in such a way that
metacopy inodes can be present in lower layers?

> 
> IOW, if we implemented metacopy=upper (only allow metacopy in
> upper layer), would it be sufficient for the use cases that need to enable
> nfs_export?

IMHO, it will be good if we don't add one more variation to metacopy
option. Even if container managers will find it hard to ensure that
there are no metacopy inodes in any of the lower layers. And will
find it difficult to use this option.

IIRC, while adding metacopy option, you had mentioned that it is possible
that intermediate layers have metacopy inodes. So I changed implementation
to take care of it. :-) And now you are the one suggesting not allowing
metacopy inodes in lower layers with nfs_export. :-)

If given a choice, I would prefer that we support metacopy inodes
in lower layers as well with nfs_export and not create a new option
metacopy=upper.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [QUESTION] Why overlayfs cannot mounted as nfs_export and metacopy?
  2021-08-10 19:48         ` Vivek Goyal
@ 2021-08-10 20:00           ` Vivek Goyal
  2021-08-11  6:51           ` Amir Goldstein
  1 sibling, 0 replies; 13+ messages in thread
From: Vivek Goyal @ 2021-08-10 20:00 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Zhihao Cheng, Miklos Szeredi, overlayfs, Daniel J Walsh, Nalin Dahyabhai

On Tue, Aug 10, 2021 at 03:48:55PM -0400, Vivek Goyal wrote:

[..]
> > But beyond the complexity, what is the benefit?
> > I was under the impression that container manager do not know how
> > to build images with metacopy, so what are the chances of actually
> > seeing metacopy in middle layers in the wild?
> 
> Sure, we don't put metacopy inodes into portable images. But I thougt
> this could be part of a lower directory on same system. For example,
> docker devicemapper driver used to take an image and explode that
> on a thin volume. Then it will take a snaphost, modify some files
> and prefix that intermediate state with "-init". And then containers
> will use this "-init" as base for container rootfs and take snapshot
> of this.
> 
> I am not sure if container managers are doing this for overlayfs
> or not on same system. But I will not be surprised if somebody
> decides to do that. That's is change some metadata in image
> (which triggers metacopy) and then use upper layer as lower layer
> for container rootfs.
> 

In the past we were experimenting with chowning all files in image
according to user namespace of container using metacopy and use chowned
dir as lower layer for container rootfs. Ofcourse now, idmapped mounts
are available (not sure if it works with overlayfs yet or not) so chowning
images should not be required.

But point I am trying to make it is that I will not be surprised
that somebody comes up with use case of metacopy inode in lower
layer. So will be nice we take care of that configuration as well
while enabling metacopy with nfs_export (and deal with extra
complexity).

Vivek


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [QUESTION] Why overlayfs cannot mounted as nfs_export and metacopy?
  2021-08-10 19:48         ` Vivek Goyal
  2021-08-10 20:00           ` Vivek Goyal
@ 2021-08-11  6:51           ` Amir Goldstein
  2021-08-16 18:27             ` Vivek Goyal
  1 sibling, 1 reply; 13+ messages in thread
From: Amir Goldstein @ 2021-08-11  6:51 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Zhihao Cheng, Miklos Szeredi, overlayfs, Daniel J Walsh, Nalin Dahyabhai

On Tue, Aug 10, 2021 at 10:49 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Tue, Aug 10, 2021 at 07:17:23AM +0300, Amir Goldstein wrote:
> > On Tue, Aug 10, 2021 at 12:35 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > On Sat, Aug 07, 2021 at 07:37:00PM +0300, Amir Goldstein wrote:
> > > > On Sat, Aug 7, 2021 at 2:05 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > >
> > > > > On Sat, Aug 7, 2021 at 1:17 PM Zhihao Cheng <chengzhihao1@huawei.com> wrote:
> > > > > >
> > > > > > Hi, all.
> > > > > >
> > > > > > As title said. I wonder to know the reason for overlayfs mount failure
> > > > > > with '-o nfs_export=on,metacopy=on'.
> > > > > >
> > > > > > I modified kernel to enable these two options 'on',  it looks like that
> > > > > > overlayfs can still work fine under nfs_v4.
> > > > > >
> > > > > > Besides, I can get no more information about the reason from source
> > > > > > code, maybe I missed something.
> > > > > >
> > > > >
> > > > > It's because ovl_obtain_alias() (decoding a disconnected non-dir file handle)
> > > > > does not know how to construct a metacopy overlayfs inode.
> > > > >
> > > > > Maybe Vivek will be able to point you to the discussion that lead to making
> > > > > the features mutually exclusive.
> > > > >
> > > > > I don't remember any other reason.
> > > > >
> > > >
> > > > I remembered some more details...
> > > >
> > > > I think the main complication discussed w.r.t decoding a metacopy
> > > > inode was for the case where ovl_inode_lowerdata() differs from
> > > > ovl_inode_lower().
> > > >
> > > > If we had a weaker variant of metacopy (e.g. metacopy=upper) that
> > > > only allows creating and following metacopy inodes in the upper layer,
> > > > it would have been simpler to implement ovl_obtain_alias().
> > > >
> > > > Specifically, when ofs->numlayer == 2 (single lower layer), there can
> > > > be no valid metacopy inodes in the lower layer, so that configuration
> > > > should also be rather easy to support.
> > >
> > > Hi Amir,
> > >
> > > /me does not understand well the notion of disconnected dentries and
> > >  how nfs export stuff works. So please bear with my stupid questions.
> >
> > No stupid questions ;-)
> >
> > Without getting into the hairy details of nfs export there are a few basic
> > things to consider:
> > - A file handle does not encode the path, only an inode identifier
> > - A non-directory inode may have multiple paths (hardlinks)
> > - Most filesystems do not store path information in inode on-disk for
> >   non-directory inode (the ".." entry stores the path for a directory)
> > - When filesystem is asked to decode a file handle and does not find the
> >   inode in question in inode cache nor a dentry in dcache, the only resort
> >   is to instantiate a "disconnected" dentry with unknown path
> > - Later "normal" lookup() by path that resolves to the same inode, does not
> >   make that "disconnected" dentry connected. Istead, lookup() instantiates
> >   another connected dentry "alias" to the same inode
> >
> > All this has some implications when enabling nfs_export for overlayfs:
> > 1. ovl_obtain_dentry() needs to be able to cope with a disconnected
> >     'real' dentry
> > 2. Since ovl_obtain_dentry() cannot assume to know the path of the
> >     'real' dentry, it needs to know how to instantiate a disconnected
> >     overlayfs dentry
> > 3. Other overlayfs code needs to be able to cope with a disconnected
> >     overlayfs dentry (for example, copy up only to index)
>
> Hi Amir,
>
> Thanks for giving a summary. It helps a lot.
>
> >
> > >
> > > I am wondering why a lower inode can't be metacopy inode. For the
> > > normal lookup case, we can lookup in all lower layers and figure out
> > > which is actual data inode and which inodes are metacopy inodes.
> > >
> > > For the case of disconnected dentry, we probably can't do lookup. So
> > > are calling underlying filesystem to decode. (Using origin?). If yes,
> > > will intermediate lower not have origin xattr which we can use
> > > to follow the complete lower chain and reconstruct all real lower
> > > dentries and use lower data dentry and latest lower meatacopy dentry
> > > (in the same way we do as for lookup).
> >
> > We can do that. I did not say we cannot.
> > I just said it would be simpler if we can avoid this complication
> > and I listed the guidelines for the "simple" implementation.
> >
> > But beyond the complexity, what is the benefit?
> > I was under the impression that container manager do not know how
> > to build images with metacopy, so what are the chances of actually
> > seeing metacopy in middle layers in the wild?
>
> Sure, we don't put metacopy inodes into portable images. But I thougt
> this could be part of a lower directory on same system. For example,
> docker devicemapper driver used to take an image and explode that
> on a thin volume. Then it will take a snaphost, modify some files
> and prefix that intermediate state with "-init". And then containers
> will use this "-init" as base for container rootfs and take snapshot
> of this.
>
> I am not sure if container managers are doing this for overlayfs
> or not on same system. But I will not be surprised if somebody
> decides to do that. That's is change some metadata in image
> (which triggers metacopy) and then use upper layer as lower layer
> for container rootfs.
>
> [ CC Dan Walsh and Nalin Dahyabhai ]
>
> Dan, Nalin, I think now metacopy feature is being used in podman and
> container/storage. Do we ever create lower layers in such a way that
> metacopy inodes can be present in lower layers?
>
> >
> > IOW, if we implemented metacopy=upper (only allow metacopy in
> > upper layer), would it be sufficient for the use cases that need to enable
> > nfs_export?
>
> IMHO, it will be good if we don't add one more variation to metacopy
> option. Even if container managers will find it hard to ensure that
> there are no metacopy inodes in any of the lower layers. And will
> find it difficult to use this option.
>
> IIRC, while adding metacopy option, you had mentioned that it is possible
> that intermediate layers have metacopy inodes. So I changed implementation
> to take care of it. :-) And now you are the one suggesting not allowing
> metacopy inodes in lower layers with nfs_export. :-)
>

Maybe. It wouldn't be the first time ;-)
IIRC the idea to lookup metacopy by path+redirect was Miklos' not mine.
I was thinking in terms of following by origin back then.

The eventual code in ovl_lookup() follows upper by origin and all layers
by path+redirect and only checks for agreement of origin and redirect from
upper layer.

Implementing "follow metacopy in middle layers by origin" in ovl_obtain_alias()
is possible and even not too complicated, but it would be inconsistent
with ovl_lookup() and if somebody mangaled with lower layers (i.e. rename
lower file), decode file handle can result in a different inode than the encoded
one.

In that case we have several options:
1. Whatever - documentation already claims that modifying lower layer
    after using index/metacopy results in undefined behavior
2. Fortify ovl_lookup() - also follow origin from middle layers and check
    for agreement with follow by path+redirect

> If given a choice, I would prefer that we support metacopy inodes
> in lower layers as well with nfs_export and not create a new option
> metacopy=upper.
>

Certainly. I agree with that POV. We just need to understand the
consequences.

I personally have no problem with the "Whatever" option above.
And the "Fortify" option isn't that complex either.

I just thought that if Zhihao Cheng wanted to start with minimal
testable implementation, that limiting nfs_export+metacopy to a single
lower layer, may be an easier start.

Overlayfs nfs_export is quite likely being used in OpenWrt and IIUC,
OpenWrt uses a single (squashfs) lower layer.

Therefore, enabling metacopy for the single lower layer case may be
valuable to some actual users and not only as a stepping stone before a
complete solution.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [QUESTION] Why overlayfs cannot mounted as nfs_export and metacopy?
  2021-08-11  6:51           ` Amir Goldstein
@ 2021-08-16 18:27             ` Vivek Goyal
  2021-08-16 18:44               ` Amir Goldstein
  0 siblings, 1 reply; 13+ messages in thread
From: Vivek Goyal @ 2021-08-16 18:27 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Zhihao Cheng, Miklos Szeredi, overlayfs, Daniel J Walsh, Nalin Dahyabhai

On Wed, Aug 11, 2021 at 09:51:09AM +0300, Amir Goldstein wrote:
> On Tue, Aug 10, 2021 at 10:49 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Tue, Aug 10, 2021 at 07:17:23AM +0300, Amir Goldstein wrote:
> > > On Tue, Aug 10, 2021 at 12:35 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > >
> > > > On Sat, Aug 07, 2021 at 07:37:00PM +0300, Amir Goldstein wrote:
> > > > > On Sat, Aug 7, 2021 at 2:05 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > > >
> > > > > > On Sat, Aug 7, 2021 at 1:17 PM Zhihao Cheng <chengzhihao1@huawei.com> wrote:
> > > > > > >
> > > > > > > Hi, all.
> > > > > > >
> > > > > > > As title said. I wonder to know the reason for overlayfs mount failure
> > > > > > > with '-o nfs_export=on,metacopy=on'.
> > > > > > >
> > > > > > > I modified kernel to enable these two options 'on',  it looks like that
> > > > > > > overlayfs can still work fine under nfs_v4.
> > > > > > >
> > > > > > > Besides, I can get no more information about the reason from source
> > > > > > > code, maybe I missed something.
> > > > > > >
> > > > > >
> > > > > > It's because ovl_obtain_alias() (decoding a disconnected non-dir file handle)
> > > > > > does not know how to construct a metacopy overlayfs inode.
> > > > > >
> > > > > > Maybe Vivek will be able to point you to the discussion that lead to making
> > > > > > the features mutually exclusive.
> > > > > >
> > > > > > I don't remember any other reason.
> > > > > >
> > > > >
> > > > > I remembered some more details...
> > > > >
> > > > > I think the main complication discussed w.r.t decoding a metacopy
> > > > > inode was for the case where ovl_inode_lowerdata() differs from
> > > > > ovl_inode_lower().
> > > > >
> > > > > If we had a weaker variant of metacopy (e.g. metacopy=upper) that
> > > > > only allows creating and following metacopy inodes in the upper layer,
> > > > > it would have been simpler to implement ovl_obtain_alias().
> > > > >
> > > > > Specifically, when ofs->numlayer == 2 (single lower layer), there can
> > > > > be no valid metacopy inodes in the lower layer, so that configuration
> > > > > should also be rather easy to support.
> > > >
> > > > Hi Amir,
> > > >
> > > > /me does not understand well the notion of disconnected dentries and
> > > >  how nfs export stuff works. So please bear with my stupid questions.
> > >
> > > No stupid questions ;-)
> > >
> > > Without getting into the hairy details of nfs export there are a few basic
> > > things to consider:
> > > - A file handle does not encode the path, only an inode identifier
> > > - A non-directory inode may have multiple paths (hardlinks)
> > > - Most filesystems do not store path information in inode on-disk for
> > >   non-directory inode (the ".." entry stores the path for a directory)
> > > - When filesystem is asked to decode a file handle and does not find the
> > >   inode in question in inode cache nor a dentry in dcache, the only resort
> > >   is to instantiate a "disconnected" dentry with unknown path
> > > - Later "normal" lookup() by path that resolves to the same inode, does not
> > >   make that "disconnected" dentry connected. Istead, lookup() instantiates
> > >   another connected dentry "alias" to the same inode
> > >
> > > All this has some implications when enabling nfs_export for overlayfs:
> > > 1. ovl_obtain_dentry() needs to be able to cope with a disconnected
> > >     'real' dentry
> > > 2. Since ovl_obtain_dentry() cannot assume to know the path of the
> > >     'real' dentry, it needs to know how to instantiate a disconnected
> > >     overlayfs dentry
> > > 3. Other overlayfs code needs to be able to cope with a disconnected
> > >     overlayfs dentry (for example, copy up only to index)
> >
> > Hi Amir,
> >
> > Thanks for giving a summary. It helps a lot.
> >
> > >
> > > >
> > > > I am wondering why a lower inode can't be metacopy inode. For the
> > > > normal lookup case, we can lookup in all lower layers and figure out
> > > > which is actual data inode and which inodes are metacopy inodes.
> > > >
> > > > For the case of disconnected dentry, we probably can't do lookup. So
> > > > are calling underlying filesystem to decode. (Using origin?). If yes,
> > > > will intermediate lower not have origin xattr which we can use
> > > > to follow the complete lower chain and reconstruct all real lower
> > > > dentries and use lower data dentry and latest lower meatacopy dentry
> > > > (in the same way we do as for lookup).
> > >
> > > We can do that. I did not say we cannot.
> > > I just said it would be simpler if we can avoid this complication
> > > and I listed the guidelines for the "simple" implementation.
> > >
> > > But beyond the complexity, what is the benefit?
> > > I was under the impression that container manager do not know how
> > > to build images with metacopy, so what are the chances of actually
> > > seeing metacopy in middle layers in the wild?
> >
> > Sure, we don't put metacopy inodes into portable images. But I thougt
> > this could be part of a lower directory on same system. For example,
> > docker devicemapper driver used to take an image and explode that
> > on a thin volume. Then it will take a snaphost, modify some files
> > and prefix that intermediate state with "-init". And then containers
> > will use this "-init" as base for container rootfs and take snapshot
> > of this.
> >
> > I am not sure if container managers are doing this for overlayfs
> > or not on same system. But I will not be surprised if somebody
> > decides to do that. That's is change some metadata in image
> > (which triggers metacopy) and then use upper layer as lower layer
> > for container rootfs.
> >
> > [ CC Dan Walsh and Nalin Dahyabhai ]
> >
> > Dan, Nalin, I think now metacopy feature is being used in podman and
> > container/storage. Do we ever create lower layers in such a way that
> > metacopy inodes can be present in lower layers?
> >
> > >
> > > IOW, if we implemented metacopy=upper (only allow metacopy in
> > > upper layer), would it be sufficient for the use cases that need to enable
> > > nfs_export?
> >
> > IMHO, it will be good if we don't add one more variation to metacopy
> > option. Even if container managers will find it hard to ensure that
> > there are no metacopy inodes in any of the lower layers. And will
> > find it difficult to use this option.
> >
> > IIRC, while adding metacopy option, you had mentioned that it is possible
> > that intermediate layers have metacopy inodes. So I changed implementation
> > to take care of it. :-) And now you are the one suggesting not allowing
> > metacopy inodes in lower layers with nfs_export. :-)
> >
> 
> Maybe. It wouldn't be the first time ;-)
> IIRC the idea to lookup metacopy by path+redirect was Miklos' not mine.
> I was thinking in terms of following by origin back then.

Yes. I think we had lot of discussion on this. You wanted to use origin.
And I preferred not to use it primarily because I wanted to make use
of metacopy even without index enabled also because did not want 
to depend on lower layers supporting nfs_export. So lookup approach
seemed simpler and using origin did not seem necessary.

> 
> The eventual code in ovl_lookup() follows upper by origin and all layers
> by path+redirect and only checks for agreement of origin and redirect from
> upper layer.
> 
> Implementing "follow metacopy in middle layers by origin" in ovl_obtain_alias()
> is possible and even not too complicated, but it would be inconsistent
> with ovl_lookup() and if somebody mangaled with lower layers (i.e. rename
> lower file), decode file handle can result in a different inode than the encoded
> one.
> 
> In that case we have several options:
> 1. Whatever - documentation already claims that modifying lower layer
>     after using index/metacopy results in undefined behavior
> 2. Fortify ovl_lookup() - also follow origin from middle layers and check
>     for agreement with follow by path+redirect

The idea of fortifying that lower we found matches origin sounds
reasonable to me. I thought that's what verify_lower was doing. But may
be in limited cases.

We probably will have to again make it opt-in so that we don't break
some existing use cases (not opting in for origin created so many
issues on squashfs that I have lost track of all the issues and fixes :-))

> 
> > If given a choice, I would prefer that we support metacopy inodes
> > in lower layers as well with nfs_export and not create a new option
> > metacopy=upper.
> >
> 
> Certainly. I agree with that POV. We just need to understand the
> consequences.
> 
> I personally have no problem with the "Whatever" option above.
> And the "Fortify" option isn't that complex either.
> 
> I just thought that if Zhihao Cheng wanted to start with minimal
> testable implementation, that limiting nfs_export+metacopy to a single
> lower layer, may be an easier start.
> 
> Overlayfs nfs_export is quite likely being used in OpenWrt and IIUC,
> OpenWrt uses a single (squashfs) lower layer.
> 
> Therefore, enabling metacopy for the single lower layer case may be
> valuable to some actual users and not only as a stepping stone before a
> complete solution.

I agree that metacopy=upper as it is will be useful to some people
in some configurations. Do you really want to support two options.
More options lead to more confusion for users and more configurations can
be harder to support. So if it was me, I would rather target just making
metacopy=on/off work with nfs_export. Given I am not doing the actual
work, I can have only so much say.. :-)

Thanks
Vivek


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [QUESTION] Why overlayfs cannot mounted as nfs_export and metacopy?
  2021-08-16 18:27             ` Vivek Goyal
@ 2021-08-16 18:44               ` Amir Goldstein
  2021-08-16 19:00                 ` Vivek Goyal
  0 siblings, 1 reply; 13+ messages in thread
From: Amir Goldstein @ 2021-08-16 18:44 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Zhihao Cheng, Miklos Szeredi, overlayfs, Daniel J Walsh, Nalin Dahyabhai

On Mon, Aug 16, 2021 at 9:27 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Wed, Aug 11, 2021 at 09:51:09AM +0300, Amir Goldstein wrote:
> > On Tue, Aug 10, 2021 at 10:49 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > On Tue, Aug 10, 2021 at 07:17:23AM +0300, Amir Goldstein wrote:
> > > > On Tue, Aug 10, 2021 at 12:35 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > >
> > > > > On Sat, Aug 07, 2021 at 07:37:00PM +0300, Amir Goldstein wrote:
> > > > > > On Sat, Aug 7, 2021 at 2:05 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > > > >
> > > > > > > On Sat, Aug 7, 2021 at 1:17 PM Zhihao Cheng <chengzhihao1@huawei.com> wrote:
> > > > > > > >
> > > > > > > > Hi, all.
> > > > > > > >
> > > > > > > > As title said. I wonder to know the reason for overlayfs mount failure
> > > > > > > > with '-o nfs_export=on,metacopy=on'.
> > > > > > > >
> > > > > > > > I modified kernel to enable these two options 'on',  it looks like that
> > > > > > > > overlayfs can still work fine under nfs_v4.
> > > > > > > >
> > > > > > > > Besides, I can get no more information about the reason from source
> > > > > > > > code, maybe I missed something.
> > > > > > > >
> > > > > > >
> > > > > > > It's because ovl_obtain_alias() (decoding a disconnected non-dir file handle)
> > > > > > > does not know how to construct a metacopy overlayfs inode.
> > > > > > >
> > > > > > > Maybe Vivek will be able to point you to the discussion that lead to making
> > > > > > > the features mutually exclusive.
> > > > > > >
> > > > > > > I don't remember any other reason.
> > > > > > >
> > > > > >
> > > > > > I remembered some more details...
> > > > > >
> > > > > > I think the main complication discussed w.r.t decoding a metacopy
> > > > > > inode was for the case where ovl_inode_lowerdata() differs from
> > > > > > ovl_inode_lower().
> > > > > >
> > > > > > If we had a weaker variant of metacopy (e.g. metacopy=upper) that
> > > > > > only allows creating and following metacopy inodes in the upper layer,
> > > > > > it would have been simpler to implement ovl_obtain_alias().
> > > > > >
> > > > > > Specifically, when ofs->numlayer == 2 (single lower layer), there can
> > > > > > be no valid metacopy inodes in the lower layer, so that configuration
> > > > > > should also be rather easy to support.
> > > > >
> > > > > Hi Amir,
> > > > >
> > > > > /me does not understand well the notion of disconnected dentries and
> > > > >  how nfs export stuff works. So please bear with my stupid questions.
> > > >
> > > > No stupid questions ;-)
> > > >
> > > > Without getting into the hairy details of nfs export there are a few basic
> > > > things to consider:
> > > > - A file handle does not encode the path, only an inode identifier
> > > > - A non-directory inode may have multiple paths (hardlinks)
> > > > - Most filesystems do not store path information in inode on-disk for
> > > >   non-directory inode (the ".." entry stores the path for a directory)
> > > > - When filesystem is asked to decode a file handle and does not find the
> > > >   inode in question in inode cache nor a dentry in dcache, the only resort
> > > >   is to instantiate a "disconnected" dentry with unknown path
> > > > - Later "normal" lookup() by path that resolves to the same inode, does not
> > > >   make that "disconnected" dentry connected. Istead, lookup() instantiates
> > > >   another connected dentry "alias" to the same inode
> > > >
> > > > All this has some implications when enabling nfs_export for overlayfs:
> > > > 1. ovl_obtain_dentry() needs to be able to cope with a disconnected
> > > >     'real' dentry
> > > > 2. Since ovl_obtain_dentry() cannot assume to know the path of the
> > > >     'real' dentry, it needs to know how to instantiate a disconnected
> > > >     overlayfs dentry
> > > > 3. Other overlayfs code needs to be able to cope with a disconnected
> > > >     overlayfs dentry (for example, copy up only to index)
> > >
> > > Hi Amir,
> > >
> > > Thanks for giving a summary. It helps a lot.
> > >
> > > >
> > > > >
> > > > > I am wondering why a lower inode can't be metacopy inode. For the
> > > > > normal lookup case, we can lookup in all lower layers and figure out
> > > > > which is actual data inode and which inodes are metacopy inodes.
> > > > >
> > > > > For the case of disconnected dentry, we probably can't do lookup. So
> > > > > are calling underlying filesystem to decode. (Using origin?). If yes,
> > > > > will intermediate lower not have origin xattr which we can use
> > > > > to follow the complete lower chain and reconstruct all real lower
> > > > > dentries and use lower data dentry and latest lower meatacopy dentry
> > > > > (in the same way we do as for lookup).
> > > >
> > > > We can do that. I did not say we cannot.
> > > > I just said it would be simpler if we can avoid this complication
> > > > and I listed the guidelines for the "simple" implementation.
> > > >
> > > > But beyond the complexity, what is the benefit?
> > > > I was under the impression that container manager do not know how
> > > > to build images with metacopy, so what are the chances of actually
> > > > seeing metacopy in middle layers in the wild?
> > >
> > > Sure, we don't put metacopy inodes into portable images. But I thougt
> > > this could be part of a lower directory on same system. For example,
> > > docker devicemapper driver used to take an image and explode that
> > > on a thin volume. Then it will take a snaphost, modify some files
> > > and prefix that intermediate state with "-init". And then containers
> > > will use this "-init" as base for container rootfs and take snapshot
> > > of this.
> > >
> > > I am not sure if container managers are doing this for overlayfs
> > > or not on same system. But I will not be surprised if somebody
> > > decides to do that. That's is change some metadata in image
> > > (which triggers metacopy) and then use upper layer as lower layer
> > > for container rootfs.
> > >
> > > [ CC Dan Walsh and Nalin Dahyabhai ]
> > >
> > > Dan, Nalin, I think now metacopy feature is being used in podman and
> > > container/storage. Do we ever create lower layers in such a way that
> > > metacopy inodes can be present in lower layers?
> > >
> > > >
> > > > IOW, if we implemented metacopy=upper (only allow metacopy in
> > > > upper layer), would it be sufficient for the use cases that need to enable
> > > > nfs_export?
> > >
> > > IMHO, it will be good if we don't add one more variation to metacopy
> > > option. Even if container managers will find it hard to ensure that
> > > there are no metacopy inodes in any of the lower layers. And will
> > > find it difficult to use this option.
> > >
> > > IIRC, while adding metacopy option, you had mentioned that it is possible
> > > that intermediate layers have metacopy inodes. So I changed implementation
> > > to take care of it. :-) And now you are the one suggesting not allowing
> > > metacopy inodes in lower layers with nfs_export. :-)
> > >
> >
> > Maybe. It wouldn't be the first time ;-)
> > IIRC the idea to lookup metacopy by path+redirect was Miklos' not mine.
> > I was thinking in terms of following by origin back then.
>
> Yes. I think we had lot of discussion on this. You wanted to use origin.
> And I preferred not to use it primarily because I wanted to make use
> of metacopy even without index enabled also because did not want
> to depend on lower layers supporting nfs_export. So lookup approach
> seemed simpler and using origin did not seem necessary.
>
> >
> > The eventual code in ovl_lookup() follows upper by origin and all layers
> > by path+redirect and only checks for agreement of origin and redirect from
> > upper layer.
> >
> > Implementing "follow metacopy in middle layers by origin" in ovl_obtain_alias()
> > is possible and even not too complicated, but it would be inconsistent
> > with ovl_lookup() and if somebody mangaled with lower layers (i.e. rename
> > lower file), decode file handle can result in a different inode than the encoded
> > one.
> >
> > In that case we have several options:
> > 1. Whatever - documentation already claims that modifying lower layer
> >     after using index/metacopy results in undefined behavior
> > 2. Fortify ovl_lookup() - also follow origin from middle layers and check
> >     for agreement with follow by path+redirect
>
> The idea of fortifying that lower we found matches origin sounds
> reasonable to me. I thought that's what verify_lower was doing. But may
> be in limited cases.
>
> We probably will have to again make it opt-in so that we don't break
> some existing use cases (not opting in for origin created so many
> issues on squashfs that I have lost track of all the issues and fixes :-))
>
> >
> > > If given a choice, I would prefer that we support metacopy inodes
> > > in lower layers as well with nfs_export and not create a new option
> > > metacopy=upper.
> > >
> >
> > Certainly. I agree with that POV. We just need to understand the
> > consequences.
> >
> > I personally have no problem with the "Whatever" option above.
> > And the "Fortify" option isn't that complex either.
> >
> > I just thought that if Zhihao Cheng wanted to start with minimal
> > testable implementation, that limiting nfs_export+metacopy to a single
> > lower layer, may be an easier start.
> >
> > Overlayfs nfs_export is quite likely being used in OpenWrt and IIUC,
> > OpenWrt uses a single (squashfs) lower layer.
> >
> > Therefore, enabling metacopy for the single lower layer case may be
> > valuable to some actual users and not only as a stepping stone before a
> > complete solution.
>
> I agree that metacopy=upper as it is will be useful to some people
> in some configurations. Do you really want to support two options.
> More options lead to more confusion for users and more configurations can
> be harder to support. So if it was me, I would rather target just making
> metacopy=on/off work with nfs_export. Given I am not doing the actual
> work, I can have only so much say.. :-)
>

Maybe I was not clear, so let me try again.
The most simple thing to implement would be to allow
nfs_export=on along with metacopy=on when there is one single lower layer.

Assuming that configuration is useful for whoever intends to implement it.
From there we can gather some experience and move on to supporting
multi lower layers if needed.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [QUESTION] Why overlayfs cannot mounted as nfs_export and metacopy?
  2021-08-16 18:44               ` Amir Goldstein
@ 2021-08-16 19:00                 ` Vivek Goyal
  0 siblings, 0 replies; 13+ messages in thread
From: Vivek Goyal @ 2021-08-16 19:00 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Zhihao Cheng, Miklos Szeredi, overlayfs, Daniel J Walsh, Nalin Dahyabhai

On Mon, Aug 16, 2021 at 09:44:11PM +0300, Amir Goldstein wrote:
> On Mon, Aug 16, 2021 at 9:27 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Wed, Aug 11, 2021 at 09:51:09AM +0300, Amir Goldstein wrote:
> > > On Tue, Aug 10, 2021 at 10:49 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > >
> > > > On Tue, Aug 10, 2021 at 07:17:23AM +0300, Amir Goldstein wrote:
> > > > > On Tue, Aug 10, 2021 at 12:35 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > > >
> > > > > > On Sat, Aug 07, 2021 at 07:37:00PM +0300, Amir Goldstein wrote:
> > > > > > > On Sat, Aug 7, 2021 at 2:05 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Sat, Aug 7, 2021 at 1:17 PM Zhihao Cheng <chengzhihao1@huawei.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi, all.
> > > > > > > > >
> > > > > > > > > As title said. I wonder to know the reason for overlayfs mount failure
> > > > > > > > > with '-o nfs_export=on,metacopy=on'.
> > > > > > > > >
> > > > > > > > > I modified kernel to enable these two options 'on',  it looks like that
> > > > > > > > > overlayfs can still work fine under nfs_v4.
> > > > > > > > >
> > > > > > > > > Besides, I can get no more information about the reason from source
> > > > > > > > > code, maybe I missed something.
> > > > > > > > >
> > > > > > > >
> > > > > > > > It's because ovl_obtain_alias() (decoding a disconnected non-dir file handle)
> > > > > > > > does not know how to construct a metacopy overlayfs inode.
> > > > > > > >
> > > > > > > > Maybe Vivek will be able to point you to the discussion that lead to making
> > > > > > > > the features mutually exclusive.
> > > > > > > >
> > > > > > > > I don't remember any other reason.
> > > > > > > >
> > > > > > >
> > > > > > > I remembered some more details...
> > > > > > >
> > > > > > > I think the main complication discussed w.r.t decoding a metacopy
> > > > > > > inode was for the case where ovl_inode_lowerdata() differs from
> > > > > > > ovl_inode_lower().
> > > > > > >
> > > > > > > If we had a weaker variant of metacopy (e.g. metacopy=upper) that
> > > > > > > only allows creating and following metacopy inodes in the upper layer,
> > > > > > > it would have been simpler to implement ovl_obtain_alias().
> > > > > > >
> > > > > > > Specifically, when ofs->numlayer == 2 (single lower layer), there can
> > > > > > > be no valid metacopy inodes in the lower layer, so that configuration
> > > > > > > should also be rather easy to support.
> > > > > >
> > > > > > Hi Amir,
> > > > > >
> > > > > > /me does not understand well the notion of disconnected dentries and
> > > > > >  how nfs export stuff works. So please bear with my stupid questions.
> > > > >
> > > > > No stupid questions ;-)
> > > > >
> > > > > Without getting into the hairy details of nfs export there are a few basic
> > > > > things to consider:
> > > > > - A file handle does not encode the path, only an inode identifier
> > > > > - A non-directory inode may have multiple paths (hardlinks)
> > > > > - Most filesystems do not store path information in inode on-disk for
> > > > >   non-directory inode (the ".." entry stores the path for a directory)
> > > > > - When filesystem is asked to decode a file handle and does not find the
> > > > >   inode in question in inode cache nor a dentry in dcache, the only resort
> > > > >   is to instantiate a "disconnected" dentry with unknown path
> > > > > - Later "normal" lookup() by path that resolves to the same inode, does not
> > > > >   make that "disconnected" dentry connected. Istead, lookup() instantiates
> > > > >   another connected dentry "alias" to the same inode
> > > > >
> > > > > All this has some implications when enabling nfs_export for overlayfs:
> > > > > 1. ovl_obtain_dentry() needs to be able to cope with a disconnected
> > > > >     'real' dentry
> > > > > 2. Since ovl_obtain_dentry() cannot assume to know the path of the
> > > > >     'real' dentry, it needs to know how to instantiate a disconnected
> > > > >     overlayfs dentry
> > > > > 3. Other overlayfs code needs to be able to cope with a disconnected
> > > > >     overlayfs dentry (for example, copy up only to index)
> > > >
> > > > Hi Amir,
> > > >
> > > > Thanks for giving a summary. It helps a lot.
> > > >
> > > > >
> > > > > >
> > > > > > I am wondering why a lower inode can't be metacopy inode. For the
> > > > > > normal lookup case, we can lookup in all lower layers and figure out
> > > > > > which is actual data inode and which inodes are metacopy inodes.
> > > > > >
> > > > > > For the case of disconnected dentry, we probably can't do lookup. So
> > > > > > are calling underlying filesystem to decode. (Using origin?). If yes,
> > > > > > will intermediate lower not have origin xattr which we can use
> > > > > > to follow the complete lower chain and reconstruct all real lower
> > > > > > dentries and use lower data dentry and latest lower meatacopy dentry
> > > > > > (in the same way we do as for lookup).
> > > > >
> > > > > We can do that. I did not say we cannot.
> > > > > I just said it would be simpler if we can avoid this complication
> > > > > and I listed the guidelines for the "simple" implementation.
> > > > >
> > > > > But beyond the complexity, what is the benefit?
> > > > > I was under the impression that container manager do not know how
> > > > > to build images with metacopy, so what are the chances of actually
> > > > > seeing metacopy in middle layers in the wild?
> > > >
> > > > Sure, we don't put metacopy inodes into portable images. But I thougt
> > > > this could be part of a lower directory on same system. For example,
> > > > docker devicemapper driver used to take an image and explode that
> > > > on a thin volume. Then it will take a snaphost, modify some files
> > > > and prefix that intermediate state with "-init". And then containers
> > > > will use this "-init" as base for container rootfs and take snapshot
> > > > of this.
> > > >
> > > > I am not sure if container managers are doing this for overlayfs
> > > > or not on same system. But I will not be surprised if somebody
> > > > decides to do that. That's is change some metadata in image
> > > > (which triggers metacopy) and then use upper layer as lower layer
> > > > for container rootfs.
> > > >
> > > > [ CC Dan Walsh and Nalin Dahyabhai ]
> > > >
> > > > Dan, Nalin, I think now metacopy feature is being used in podman and
> > > > container/storage. Do we ever create lower layers in such a way that
> > > > metacopy inodes can be present in lower layers?
> > > >
> > > > >
> > > > > IOW, if we implemented metacopy=upper (only allow metacopy in
> > > > > upper layer), would it be sufficient for the use cases that need to enable
> > > > > nfs_export?
> > > >
> > > > IMHO, it will be good if we don't add one more variation to metacopy
> > > > option. Even if container managers will find it hard to ensure that
> > > > there are no metacopy inodes in any of the lower layers. And will
> > > > find it difficult to use this option.
> > > >
> > > > IIRC, while adding metacopy option, you had mentioned that it is possible
> > > > that intermediate layers have metacopy inodes. So I changed implementation
> > > > to take care of it. :-) And now you are the one suggesting not allowing
> > > > metacopy inodes in lower layers with nfs_export. :-)
> > > >
> > >
> > > Maybe. It wouldn't be the first time ;-)
> > > IIRC the idea to lookup metacopy by path+redirect was Miklos' not mine.
> > > I was thinking in terms of following by origin back then.
> >
> > Yes. I think we had lot of discussion on this. You wanted to use origin.
> > And I preferred not to use it primarily because I wanted to make use
> > of metacopy even without index enabled also because did not want
> > to depend on lower layers supporting nfs_export. So lookup approach
> > seemed simpler and using origin did not seem necessary.
> >
> > >
> > > The eventual code in ovl_lookup() follows upper by origin and all layers
> > > by path+redirect and only checks for agreement of origin and redirect from
> > > upper layer.
> > >
> > > Implementing "follow metacopy in middle layers by origin" in ovl_obtain_alias()
> > > is possible and even not too complicated, but it would be inconsistent
> > > with ovl_lookup() and if somebody mangaled with lower layers (i.e. rename
> > > lower file), decode file handle can result in a different inode than the encoded
> > > one.
> > >
> > > In that case we have several options:
> > > 1. Whatever - documentation already claims that modifying lower layer
> > >     after using index/metacopy results in undefined behavior
> > > 2. Fortify ovl_lookup() - also follow origin from middle layers and check
> > >     for agreement with follow by path+redirect
> >
> > The idea of fortifying that lower we found matches origin sounds
> > reasonable to me. I thought that's what verify_lower was doing. But may
> > be in limited cases.
> >
> > We probably will have to again make it opt-in so that we don't break
> > some existing use cases (not opting in for origin created so many
> > issues on squashfs that I have lost track of all the issues and fixes :-))
> >
> > >
> > > > If given a choice, I would prefer that we support metacopy inodes
> > > > in lower layers as well with nfs_export and not create a new option
> > > > metacopy=upper.
> > > >
> > >
> > > Certainly. I agree with that POV. We just need to understand the
> > > consequences.
> > >
> > > I personally have no problem with the "Whatever" option above.
> > > And the "Fortify" option isn't that complex either.
> > >
> > > I just thought that if Zhihao Cheng wanted to start with minimal
> > > testable implementation, that limiting nfs_export+metacopy to a single
> > > lower layer, may be an easier start.
> > >
> > > Overlayfs nfs_export is quite likely being used in OpenWrt and IIUC,
> > > OpenWrt uses a single (squashfs) lower layer.
> > >
> > > Therefore, enabling metacopy for the single lower layer case may be
> > > valuable to some actual users and not only as a stepping stone before a
> > > complete solution.
> >
> > I agree that metacopy=upper as it is will be useful to some people
> > in some configurations. Do you really want to support two options.
> > More options lead to more confusion for users and more configurations can
> > be harder to support. So if it was me, I would rather target just making
> > metacopy=on/off work with nfs_export. Given I am not doing the actual
> > work, I can have only so much say.. :-)
> >
> 
> Maybe I was not clear, so let me try again.
> The most simple thing to implement would be to allow
> nfs_export=on along with metacopy=on when there is one single lower layer.

Ok, got it. So if there are more than one lower layer, nfs_export +
metacopy mount will fail.

It is still one more configuration because without nfs_export we
do not have any such restriction. But I am not opposed to the idea
if you find this intermediate configuration valuable given the
complexity of work.

Thanks
Vivek

> 
> Assuming that configuration is useful for whoever intends to implement it.
> From there we can gather some experience and move on to supporting
> multi lower layers if needed.
> 
> Thanks,
> Amir.
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-08-16 19:00 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-07 10:17 [QUESTION] Why overlayfs cannot mounted as nfs_export and metacopy? Zhihao Cheng
2021-08-07 11:05 ` Amir Goldstein
2021-08-07 16:37   ` Amir Goldstein
2021-08-09  9:21     ` Zhihao Cheng
2021-08-09 21:35     ` Vivek Goyal
2021-08-10  4:17       ` Amir Goldstein
2021-08-10 19:48         ` Vivek Goyal
2021-08-10 20:00           ` Vivek Goyal
2021-08-11  6:51           ` Amir Goldstein
2021-08-16 18:27             ` Vivek Goyal
2021-08-16 18:44               ` Amir Goldstein
2021-08-16 19:00                 ` Vivek Goyal
2021-08-09 19:56   ` Vivek Goyal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).