linux-unionfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: Virtio-fs as upper layer for overlayfs
       [not found] <7904C889-F0AC-4473-8C02-887EF6593564@intel.com>
@ 2020-01-06 18:35 ` Vivek Goyal
  2020-01-06 19:58   ` Miklos Szeredi
  0 siblings, 1 reply; 7+ messages in thread
From: Vivek Goyal @ 2020-01-06 18:35 UTC (permalink / raw)
  To: Ernst, Eric
  Cc: mszeredi, kata-dev, Stefan Hajnoczi, Amir Goldstein, linux-unionfs

On Mon, Jan 06, 2020 at 05:27:05PM +0000, Ernst, Eric wrote:

[CC linux-unionfs@vger.kernel.org and amir]

> Hi Miklos,
> 
> One of the popular use cases for Kata Containers is running docker-in-docker.  That is, a container image is run which in turn will make use of a container runtime to do a container build.
> 
> When combined with virtio-fs, we end up with a configuration like:
> xfs/ext4 -> overlayfs -> virtio-fs -> overlayfs 
> 
> As discussed in [1], per overlayfs spec: 
> "The upper filesystem will normally be writable and if it is it must support the creation of trusted.* extended attributes, and must provide valid d_type in readdir responses, so NFS is not suitable."
> 

I don't know exaactly the reasons why NFS is not supported as upper. Are
above only two reasons? These might work with virtio-fs depending on
underlying filesystem. If yes, should we check for these properties
at mount time (instead of relying on dentry flags only,
ovl_dentry_remote()).

I feel there is more to it. Just that I don't know. Miklos and Amir
will probably have more thoughts on this.

Vivek

> At this point, with virtio-fs this, [2], check fails.  
> 
> Vivek mentioned that bypassing this check *may* be feasible, [3].  Can you help identify if this is feasible, and rationale for NFS not being available as an upper (though, more importantly, understanding what needs to be done to add proper support for virtio-fs as upper layer).
> 
> Thanks,
> Eric 
> 
> [1] - https://github.com/kata-containers/runtime/issues/1888
> [2] - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/overlayfs/super.c#n753
> [3] - https://github.com/kata-containers/runtime/issues/1888#issuecomment-518259095
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Virtio-fs as upper layer for overlayfs
  2020-01-06 18:35 ` Virtio-fs as upper layer for overlayfs Vivek Goyal
@ 2020-01-06 19:58   ` Miklos Szeredi
  2020-01-06 22:24     ` eric.ernst
  2020-01-07 16:09     ` Vivek Goyal
  0 siblings, 2 replies; 7+ messages in thread
From: Miklos Szeredi @ 2020-01-06 19:58 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Ernst, Eric, mszeredi, kata-dev, Stefan Hajnoczi, Amir Goldstein,
	overlayfs

On Mon, Jan 6, 2020 at 7:35 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Mon, Jan 06, 2020 at 05:27:05PM +0000, Ernst, Eric wrote:
>
> [CC linux-unionfs@vger.kernel.org and amir]
>
> > Hi Miklos,
> >
> > One of the popular use cases for Kata Containers is running docker-in-docker.  That is, a container image is run which in turn will make use of a container runtime to do a container build.
> >
> > When combined with virtio-fs, we end up with a configuration like:
> > xfs/ext4 -> overlayfs -> virtio-fs -> overlayfs
> >
> > As discussed in [1], per overlayfs spec:
> > "The upper filesystem will normally be writable and if it is it must support the creation of trusted.* extended attributes, and must provide valid d_type in readdir responses, so NFS is not suitable."
> >
>
> I don't know exaactly the reasons why NFS is not supported as upper. Are
> above only two reasons? These might work with virtio-fs depending on
> underlying filesystem. If yes, should we check for these properties
> at mount time (instead of relying on dentry flags only,
> ovl_dentry_remote()).
>
> I feel there is more to it.

NFS also has these automount points, that we currently can't cope with
in overlayfs.  And there's revalidation, which we reject on upper
simply because overlayfs currently doesn't call ->revalidate() on
upper.   Not that we would not be able to, it's just something that
probably needs some thought.

Virtio-fs does not yet have the magic automount thing (which would be
useful to resolve inode number conflicts), but it does have
revalidate. In the virtio-fs case, not calling ->revalidate() should
not be a problem, so it's safe to try out this configuration by adding
a hack to disable the remote check in case of a virtio-fs upper.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Virtio-fs as upper layer for overlayfs
  2020-01-06 19:58   ` Miklos Szeredi
@ 2020-01-06 22:24     ` eric.ernst
  2020-01-07  9:48       ` Miklos Szeredi
  2020-01-07 16:09     ` Vivek Goyal
  1 sibling, 1 reply; 7+ messages in thread
From: eric.ernst @ 2020-01-06 22:24 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Vivek Goyal, mszeredi, kata-dev, Stefan Hajnoczi, Amir Goldstein,
	overlayfs

On Mon, Jan 06, 2020 at 08:58:23PM +0100, Miklos Szeredi wrote:
> On Mon, Jan 6, 2020 at 7:35 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Mon, Jan 06, 2020 at 05:27:05PM +0000, Ernst, Eric wrote:
> >
> > [CC linux-unionfs@vger.kernel.org and amir]
> >
> > > Hi Miklos,
> > >
> > > One of the popular use cases for Kata Containers is running docker-in-docker.  That is, a container image is run which in turn will make use of a container runtime to do a container build.
> > >
> > > When combined with virtio-fs, we end up with a configuration like:
> > > xfs/ext4 -> overlayfs -> virtio-fs -> overlayfs
> > >
> > > As discussed in [1], per overlayfs spec:
> > > "The upper filesystem will normally be writable and if it is it must support the creation of trusted.* extended attributes, and must provide valid d_type in readdir responses, so NFS is not suitable."
> > >
> >
> > I don't know exaactly the reasons why NFS is not supported as upper. Are
> > above only two reasons? These might work with virtio-fs depending on
> > underlying filesystem. If yes, should we check for these properties
> > at mount time (instead of relying on dentry flags only,
> > ovl_dentry_remote()).
> >
> > I feel there is more to it.
> 
> NFS also has these automount points, that we currently can't cope with
> in overlayfs.  And there's revalidation, which we reject on upper
> simply because overlayfs currently doesn't call ->revalidate() on
> upper.   Not that we would not be able to, it's just something that
> probably needs some thought.
> 
> Virtio-fs does not yet have the magic automount thing (which would be
> useful to resolve inode number conflicts), but it does have
> revalidate. In the virtio-fs case, not calling ->revalidate() should
> not be a problem, so it's safe to try out this configuration by adding
> a hack to disable the remote check in case of a virtio-fs upper.
>

Miklos, I'm still learning a bit more about fs implementations, so my
apologies if this should be obvious. For virtio-fs, one of the use cases
that is described is sharing memory between two guests (not necessarily
the Kata use case). I was guessing the dcache would be within the guest,
and that in at least the shared memory case, there's potential that a 
revalidate may be neccesary, in case any changes are made by the second guest?
(I could be mixing up the intended use for revalidate, though).

Can you clarify that "not calling ->revalidate() should not be a
problem?"

Thanks for the help.

-Eric


> Thanks,
> Miklos

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Virtio-fs as upper layer for overlayfs
  2020-01-06 22:24     ` eric.ernst
@ 2020-01-07  9:48       ` Miklos Szeredi
  2020-01-07 13:37         ` Vivek Goyal
  0 siblings, 1 reply; 7+ messages in thread
From: Miklos Szeredi @ 2020-01-07  9:48 UTC (permalink / raw)
  To: Ernst, Eric
  Cc: Vivek Goyal, mszeredi, kata-dev, Stefan Hajnoczi, Amir Goldstein,
	overlayfs

On Mon, Jan 6, 2020 at 11:18 PM <eric.ernst@intel.com> wrote:

> Miklos, I'm still learning a bit more about fs implementations, so my
> apologies if this should be obvious. For virtio-fs, one of the use cases
> that is described is sharing memory between two guests (not necessarily
> the Kata use case). I was guessing the dcache would be within the guest,
> and that in at least the shared memory case, there's potential that a
> revalidate may be neccesary, in case any changes are made by the second guest?

Exactly.

> (I could be mixing up the intended use for revalidate, though).
>
> Can you clarify that "not calling ->revalidate() should not be a
> problem?"

I was referring specifically to the overlayfs case.  Overlayfs stacks
on top of some other filesystems, i.e. when ->d_revalidate() is called
on overlayfs it calls ->d_revalidate() on underlying fs.  This only
happens for the lower (read-only) layers, not the upper (read-write)
layer.  So if the underlying upper fs is modified from another guest,
than that modification is not going to be reflected on the overlayfs.
However, overlayfs documents any changes to underlying layers as
resulting in undefined behavior.  It would be strange if docker was
relying on undefined behavior of overlayfs, so not doing the
revalidation should not make a difference.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Virtio-fs as upper layer for overlayfs
  2020-01-07  9:48       ` Miklos Szeredi
@ 2020-01-07 13:37         ` Vivek Goyal
  0 siblings, 0 replies; 7+ messages in thread
From: Vivek Goyal @ 2020-01-07 13:37 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Ernst, Eric, mszeredi, kata-dev, Stefan Hajnoczi, Amir Goldstein,
	overlayfs

On Tue, Jan 07, 2020 at 10:48:37AM +0100, Miklos Szeredi wrote:
> On Mon, Jan 6, 2020 at 11:18 PM <eric.ernst@intel.com> wrote:
> 
> > Miklos, I'm still learning a bit more about fs implementations, so my
> > apologies if this should be obvious. For virtio-fs, one of the use cases
> > that is described is sharing memory between two guests (not necessarily
> > the Kata use case). I was guessing the dcache would be within the guest,
> > and that in at least the shared memory case, there's potential that a
> > revalidate may be neccesary, in case any changes are made by the second guest?
> 
> Exactly.
> 
> > (I could be mixing up the intended use for revalidate, though).
> >
> > Can you clarify that "not calling ->revalidate() should not be a
> > problem?"
> 
> I was referring specifically to the overlayfs case.  Overlayfs stacks
> on top of some other filesystems, i.e. when ->d_revalidate() is called
> on overlayfs it calls ->d_revalidate() on underlying fs.  This only
> happens for the lower (read-only) layers, not the upper (read-write)
> layer.  So if the underlying upper fs is modified from another guest,
> than that modification is not going to be reflected on the overlayfs.
> However, overlayfs documents any changes to underlying layers as
> resulting in undefined behavior.  It would be strange if docker was
> relying on undefined behavior of overlayfs, so not doing the
> revalidation should not make a difference.

Hi Miklos,

Thanks for the explanation. I had the same question as Eric. So we will
basically rely on assumption that overlayfs upper (virtio-fs in this
case) is not shared and will not be modified underneath. Which probably
is true in this specific case. In fact I think even overlayfs lower will
not be modified as well because once docker prepares a rootfs for a 
container, it is not shared by any other container. IOW, following
seems to be the setup.

xfs/ext4 --> overlayfs1 --> virtiofs --->overlayfs2

Docker on host will prepare container root overlayfs1 on host and export
that into container using virtiofs. IIUC, overlayfs1 mount point will
not be shared with any other container. Only overlayfs1/lower will be
shared with other overlayfs mount point to enable container image
sharing. 

Given overlayfs1 will not be shared with other containers, that means
virtiofs is not changing outside this container. And docker will prepare
overlayfs2 inside contiainer for nested container. There also upper will
not be shared by other nested containers so even if we don't call
revalidate on upper, it should be fine for this configuration.

This is now all dependent on whether user has done the configuration right
and kernel can't enforce correct configuration.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Virtio-fs as upper layer for overlayfs
  2020-01-06 19:58   ` Miklos Szeredi
  2020-01-06 22:24     ` eric.ernst
@ 2020-01-07 16:09     ` Vivek Goyal
  2020-01-13 19:56       ` Vivek Goyal
  1 sibling, 1 reply; 7+ messages in thread
From: Vivek Goyal @ 2020-01-07 16:09 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Ernst, Eric, mszeredi, kata-dev, Stefan Hajnoczi, Amir Goldstein,
	overlayfs

On Mon, Jan 06, 2020 at 08:58:23PM +0100, Miklos Szeredi wrote:
> On Mon, Jan 6, 2020 at 7:35 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Mon, Jan 06, 2020 at 05:27:05PM +0000, Ernst, Eric wrote:
> >
> > [CC linux-unionfs@vger.kernel.org and amir]
> >
> > > Hi Miklos,
> > >
> > > One of the popular use cases for Kata Containers is running docker-in-docker.  That is, a container image is run which in turn will make use of a container runtime to do a container build.
> > >
> > > When combined with virtio-fs, we end up with a configuration like:
> > > xfs/ext4 -> overlayfs -> virtio-fs -> overlayfs
> > >
> > > As discussed in [1], per overlayfs spec:
> > > "The upper filesystem will normally be writable and if it is it must support the creation of trusted.* extended attributes, and must provide valid d_type in readdir responses, so NFS is not suitable."
> > >
> >
> > I don't know exaactly the reasons why NFS is not supported as upper. Are
> > above only two reasons? These might work with virtio-fs depending on
> > underlying filesystem. If yes, should we check for these properties
> > at mount time (instead of relying on dentry flags only,
> > ovl_dentry_remote()).
> >
> > I feel there is more to it.
> 
> NFS also has these automount points, that we currently can't cope with
> in overlayfs.  And there's revalidation, which we reject on upper
> simply because overlayfs currently doesn't call ->revalidate() on
> upper.   Not that we would not be able to, it's just something that
> probably needs some thought.
> 
> Virtio-fs does not yet have the magic automount thing (which would be
> useful to resolve inode number conflicts), but it does have
> revalidate. In the virtio-fs case, not calling ->revalidate() should
> not be a problem, so it's safe to try out this configuration by adding
> a hack to disable the remote check in case of a virtio-fs upper.

Here is a hack patch to provide an exception to allow virtiofs as upper
filesystem for overlayfs. 

I can mount now but I get warning that upper does not support xattr, hence
disabling index and metaocopy. Still need to test why that's the case. I
thought xattr are supported on virtiofs.


Subject: overlayfs: Allow virtiofs as overlayfs upper

This is a hack patch to allow virtiofs as overlayfs upper filesystem. At
this point of time it is meant for testing and see what issues crop up.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/namei.c     |    3 ++-
 fs/overlayfs/overlayfs.h |    2 ++
 fs/overlayfs/super.c     |    4 ++--
 fs/overlayfs/util.c      |   24 ++++++++++++++++++++++--
 4 files changed, 28 insertions(+), 5 deletions(-)

Index: rhvgoyal-linux/fs/overlayfs/util.c
===================================================================
--- rhvgoyal-linux.orig/fs/overlayfs/util.c	2020-01-07 11:03:22.584732137 -0500
+++ rhvgoyal-linux/fs/overlayfs/util.c	2020-01-07 11:03:55.424732137 -0500
@@ -102,11 +102,31 @@ struct ovl_entry *ovl_alloc_entry(unsign
 	return oe;
 }
 
+bool ovl_dentry_union(struct dentry *dentry)
+{
+	return dentry->d_flags & DCACHE_OP_REAL;
+}
+
 bool ovl_dentry_remote(struct dentry *dentry)
 {
 	return dentry->d_flags &
-		(DCACHE_OP_REVALIDATE | DCACHE_OP_WEAK_REVALIDATE
-		 DCACHE_OP_REAL);
+		(DCACHE_OP_REVALIDATE | DCACHE_OP_WEAK_REVALIDATE);
+}
+
+bool ovl_dentry_valid_upper(struct dentry *dentry)
+{
+	struct file_system_type *fs_type;
+
+	if (ovl_dentry_union(dentry))
+		return false;
+
+	fs_type = dentry->d_sb->s_type;
+
+	/* Provide an exception for virtiofs */
+	if (ovl_dentry_remote(dentry) && strcmp(fs_type->name, "virtiofs"))
+		return false;
+
+	return true;
 }
 
 bool ovl_dentry_weird(struct dentry *dentry)
Index: rhvgoyal-linux/fs/overlayfs/super.c
===================================================================
--- rhvgoyal-linux.orig/fs/overlayfs/super.c	2020-01-07 11:03:22.584732137 -0500
+++ rhvgoyal-linux/fs/overlayfs/super.c	2020-01-07 11:03:55.427732137 -0500
@@ -751,7 +751,7 @@ static int ovl_mount_dir(const char *nam
 		err = ovl_mount_dir_noesc(tmp, path);
 
 		if (!err)
-			if (ovl_dentry_remote(path->dentry)) {
+			if (!ovl_dentry_valid_upper(path->dentry)) {
 				pr_err("overlayfs: filesystem on '%s' not supported as upperdir\n",
 				       tmp);
 				path_put_init(path);
@@ -792,7 +792,7 @@ static int ovl_lower_dir(const char *nam
 
 	*stack_depth = max(*stack_depth, path->mnt->mnt_sb->s_stack_depth);
 
-	if (ovl_dentry_remote(path->dentry))
+	if (ovl_dentry_remote(path->dentry) || ovl_dentry_union(path->dentry))
 		*remote = true;
 
 	/*
Index: rhvgoyal-linux/fs/overlayfs/overlayfs.h
===================================================================
--- rhvgoyal-linux.orig/fs/overlayfs/overlayfs.h	2020-01-07 11:03:22.584732137 -0500
+++ rhvgoyal-linux/fs/overlayfs/overlayfs.h	2020-01-07 11:03:55.426732137 -0500
@@ -228,6 +228,8 @@ bool ovl_index_all(struct super_block *s
 bool ovl_verify_lower(struct super_block *sb);
 struct ovl_entry *ovl_alloc_entry(unsigned int numlower);
 bool ovl_dentry_remote(struct dentry *dentry);
+bool ovl_dentry_union(struct dentry *dentry);
+bool ovl_dentry_valid_upper(struct dentry *dentry);
 bool ovl_dentry_weird(struct dentry *dentry);
 enum ovl_path_type ovl_path_type(struct dentry *dentry);
 void ovl_path_upper(struct dentry *dentry, struct path *path);
Index: rhvgoyal-linux/fs/overlayfs/namei.c
===================================================================
--- rhvgoyal-linux.orig/fs/overlayfs/namei.c	2020-01-07 11:03:22.584732137 -0500
+++ rhvgoyal-linux/fs/overlayfs/namei.c	2020-01-07 11:03:55.428732137 -0500
@@ -845,7 +845,8 @@ struct dentry *ovl_lookup(struct inode *
 		if (err)
 			goto out;
 
-		if (upperdentry && unlikely(ovl_dentry_remote(upperdentry))) {
+		if (upperdentry &&
+		    unlikely(!ovl_dentry_valid_upper(upperdentry))) {
 			dput(upperdentry);
 			err = -EREMOTE;
 			goto out;

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Virtio-fs as upper layer for overlayfs
  2020-01-07 16:09     ` Vivek Goyal
@ 2020-01-13 19:56       ` Vivek Goyal
  0 siblings, 0 replies; 7+ messages in thread
From: Vivek Goyal @ 2020-01-13 19:56 UTC (permalink / raw)
  To: Miklos Szeredi, Ernst, Eric
  Cc: mszeredi, kata-dev, Stefan Hajnoczi, Amir Goldstein, overlayfs

On Tue, Jan 07, 2020 at 11:09:18AM -0500, Vivek Goyal wrote:
> On Mon, Jan 06, 2020 at 08:58:23PM +0100, Miklos Szeredi wrote:
> > On Mon, Jan 6, 2020 at 7:35 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > On Mon, Jan 06, 2020 at 05:27:05PM +0000, Ernst, Eric wrote:
> > >
> > > [CC linux-unionfs@vger.kernel.org and amir]
> > >
> > > > Hi Miklos,
> > > >
> > > > One of the popular use cases for Kata Containers is running docker-in-docker.  That is, a container image is run which in turn will make use of a container runtime to do a container build.
> > > >
> > > > When combined with virtio-fs, we end up with a configuration like:
> > > > xfs/ext4 -> overlayfs -> virtio-fs -> overlayfs
> > > >
> > > > As discussed in [1], per overlayfs spec:
> > > > "The upper filesystem will normally be writable and if it is it must support the creation of trusted.* extended attributes, and must provide valid d_type in readdir responses, so NFS is not suitable."
> > > >
> > >
> > > I don't know exaactly the reasons why NFS is not supported as upper. Are
> > > above only two reasons? These might work with virtio-fs depending on
> > > underlying filesystem. If yes, should we check for these properties
> > > at mount time (instead of relying on dentry flags only,
> > > ovl_dentry_remote()).
> > >
> > > I feel there is more to it.
> > 
> > NFS also has these automount points, that we currently can't cope with
> > in overlayfs.  And there's revalidation, which we reject on upper
> > simply because overlayfs currently doesn't call ->revalidate() on
> > upper.   Not that we would not be able to, it's just something that
> > probably needs some thought.
> > 
> > Virtio-fs does not yet have the magic automount thing (which would be
> > useful to resolve inode number conflicts), but it does have
> > revalidate. In the virtio-fs case, not calling ->revalidate() should
> > not be a problem, so it's safe to try out this configuration by adding
> > a hack to disable the remote check in case of a virtio-fs upper.
> 
> Here is a hack patch to provide an exception to allow virtiofs as upper
> filesystem for overlayfs. 
> 
> I can mount now but I get warning that upper does not support xattr, hence
> disabling index and metaocopy. Still need to test why that's the case. I
> thought xattr are supported on virtiofs.

I have pushed this patch on a branch in my repo for testing.

https://github.com/rhvgoyal/linux/commit/0a0c0e2d9986ecf445e1fdff45b51f37b98ac1e6

I can now mount overlayfs on top of virtiofs with this patch. It needs to
run virtiofsd with option "-o xattr" and also needs following patch for it
to work.

https://www.redhat.com/archives/virtio-fs/2020-January/msg00047.html

Thanks
Vivek

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-01-13 19:58 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <7904C889-F0AC-4473-8C02-887EF6593564@intel.com>
2020-01-06 18:35 ` Virtio-fs as upper layer for overlayfs Vivek Goyal
2020-01-06 19:58   ` Miklos Szeredi
2020-01-06 22:24     ` eric.ernst
2020-01-07  9:48       ` Miklos Szeredi
2020-01-07 13:37         ` Vivek Goyal
2020-01-07 16:09     ` Vivek Goyal
2020-01-13 19:56       ` Vivek Goyal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).