linux-unionfs.vger.kernel.org archive mirror
* Re: OverlaysFS offline tools
@ 2020-01-08  7:27 Amir Goldstein
  2020-01-08 14:06 ` Vivek Goyal
  0 siblings, 1 reply; 12+ messages in thread
From: Amir Goldstein @ 2020-01-08  7:27 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: overlayfs, StuartIanNaylor, Linux Containers, kmxz, zhangyi (F),
	Miklos Szeredi

[-fsdevel,+containers]

> On Thu, Apr 18, 2019 at 1:58 PM StuartIanNaylor <rolyantrauts@gmail.com> wrote:
> >
> > Apols to ask here but are there any tools for overlayFS?
> >
> > https://github.com/kmxz/overlayfs-tools is just about the only thing I
> > can find.
>
> There is also https://github.com/hisilicon/overlayfs-progs which
> can check and fix overlay layers, but it hasn't been updated in a while.
>

Hi Vivek (and containers folks),

Stuart has pinged me on https://github.com/StuartIanNaylor/zram-config/issues/4
to ask about the status of overlayfs offline tools.

Quoting my answer here for visibility to more container developers:

I have been involved in implementing many overlayfs features in the kernel
over the past couple of years (redirect_dir, index, nfs_export, xino,
metacopy). All of these features bring benefits to end users, but AFAIK
they are all still disabled by default in container runtimes (?) because
of a lack of tool support (e.g. migration/import/export). I cannot force
anyone to use the new overlayfs features, nor to write offline tools
support for them.

So how can we improve this situation?

If the problem is development resources, then I've had great experience
in the past with OSS internship programs like Google Summer of Code (GSoC):
Organizations, such as Red Hat or mobyproject.org, can participate in the
program by posting proposals for open source projects.
Developers, such as myself, volunteer to mentor projects, and students apply
to work on them.

IIRC, the GSoC timeline for project proposals is around April. Applying as
an organization could be before that.

Vivek, since you are the only developer I know who is involved in container
runtime projects, I am asking you, but really it's a question for all
container developers out there.

Are you aware of missing features in containers that could be met by filling the
gaps with overlayfs offline tools?
Are you a part of an organization that could consider posting this sort of
project proposals to GSoC or other internship programs?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: OverlaysFS offline tools
  2020-01-08  7:27 OverlaysFS offline tools Amir Goldstein
@ 2020-01-08 14:06 ` Vivek Goyal
  2020-01-08 15:29   ` Tycho Andersen
                     ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Vivek Goyal @ 2020-01-08 14:06 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: overlayfs, StuartIanNaylor, Linux Containers, kmxz, zhangyi (F),
	Miklos Szeredi

On Wed, Jan 08, 2020 at 09:27:12AM +0200, Amir Goldstein wrote:
> Are you aware of missing features in containers that could be met by filling the
> gaps with overlayfs offline tools?

CCing Dan Walsh, as he takes care of podman and I often hear some of
the complaints from him about what he thinks is missing. This is
not necessarily related to overlayfs offline tools.

- Unprivileged mounting of overlayfs.

  He wants to launch containers unprivileged and hence wants to be able
  to mount overlayfs without being root in init_user_ns. I think Miklos
  posted some patches for that, but there has not been much progress since.

  https://patchwork.kernel.org/cover/11212091/

- shiftfs

  As of now they rely on chowning the image, but would really
  like to see the ability to shift uids/gids using shiftfs or a
  VFS layer solution.

- Overlayfs redirect_dir is not compatible with image building

  redirect_dir is not compatible with image building and I think that's
  one reason it's not used by default. And as metacopy is dependent
  on redirect_dir, it's not used by default either. It can be used for
  running containers though, but one needs to know that in advance.

  So it would be good if that were fixed for the redirect_dir and metacopy
  features; then there is a higher chance that these features get
  enabled by default.

  Miklos had some ideas on how to tackle the issue of getting the diff
  correctly with redirect_dir enabled.

  https://www.spinics.net/lists/linux-unionfs/msg06969.html

  Having said that, I think Dan Walsh has enabled metacopy by default
  in podman in certain configurations (for running containers and not
  for building images).
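
To make the constraint in the last item concrete, here is a minimal
sketch (not podman's or the kernel's code) of how a runtime might build
the overlayfs mount option string. redirect_dir= and metacopy= are real
overlayfs mount options; the function names and the "run"/"build" policy
are hypothetical and only restate what is described above: metacopy
depends on redirect_dir, and both are left off when building images.

```python
def overlay_mount_options(lowerdirs, upperdir, workdir,
                          redirect_dir=False, metacopy=False):
    """Return the -o option string for an overlayfs mount."""
    if metacopy and not redirect_dir:
        # As noted above, metacopy depends on redirect_dir.
        raise ValueError("metacopy=on needs redirect_dir=on")
    opts = [
        "lowerdir=" + ":".join(lowerdirs),  # topmost lower layer first
        "upperdir=" + upperdir,
        "workdir=" + workdir,
        "redirect_dir=" + ("on" if redirect_dir else "off"),
        "metacopy=" + ("on" if metacopy else "off"),
    ]
    return ",".join(opts)


def options_for(purpose, lowerdirs, upperdir, workdir):
    """Hypothetical policy: enable both features only when running."""
    running = purpose == "run"
    return overlay_mount_options(lowerdirs, upperdir, workdir,
                                 redirect_dir=running, metacopy=running)
```

The resulting string would be passed as the data argument of a
"mount -t overlay" call; the caller has to know in advance whether the
mount is for running a container or for building an image.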

Thanks
Vivek


> Are you a part of an organization that could consider posting this sort of
> project proposals to GSoC or other internship programs?
> 
> Thanks,
> Amir.
> 

* Re: OverlaysFS offline tools
  2020-01-08 14:06 ` Vivek Goyal
@ 2020-01-08 15:29   ` Tycho Andersen
  2020-01-13 15:28   ` Daniel Walsh
  2020-06-05  5:33   ` Amir Goldstein
  2 siblings, 0 replies; 12+ messages in thread
From: Tycho Andersen @ 2020-01-08 15:29 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Amir Goldstein, StuartIanNaylor, Miklos Szeredi,
	Linux Containers, zhangyi (F),
	overlayfs, kmxz, James.Bottomley

On Wed, Jan 08, 2020 at 09:06:11AM -0500, Vivek Goyal wrote:
> - shiftfs
> 
>   As of now they rely on chowning the image, but would really
>   like to see the ability to shift uids/gids using shiftfs or a
>   VFS layer solution.

I think James is working on this:

https://lore.kernel.org/linux-fsdevel/20200104203946.27914-1-James.Bottomley@HansenPartnership.com/

Tycho

* Re: OverlaysFS offline tools
  2020-01-08 14:06 ` Vivek Goyal
  2020-01-08 15:29   ` Tycho Andersen
@ 2020-01-13 15:28   ` Daniel Walsh
  2020-01-13 18:02     ` Amir Goldstein
  2020-01-13 20:07     ` Christian Brauner
  2020-06-05  5:33   ` Amir Goldstein
  2 siblings, 2 replies; 12+ messages in thread
From: Daniel Walsh @ 2020-01-13 15:28 UTC (permalink / raw)
  To: Vivek Goyal, Amir Goldstein
  Cc: overlayfs, StuartIanNaylor, Linux Containers, kmxz, zhangyi (F),
	Miklos Szeredi

On 1/8/20 9:06 AM, Vivek Goyal wrote:
> CCing Dan Walsh, as he takes care of podman and I often hear some of
> the complaints from him about what he thinks is missing. This is
> not necessarily related to overlayfs offline tools.
>
> - Unprivileged mounting of overlayfs.
>
>   He wants to launch containers unprivileged and hence wants to be able
>   to mount overlayfs without being root in init_user_ns. I think Miklos
>   posted some patches for that, but there has not been much progress since.
>
>   https://patchwork.kernel.org/cover/11212091/
>
> - shiftfs
>
>   As of now they rely on chowning the image, but would really
>   like to see the ability to shift uids/gids using shiftfs or a
>   VFS layer solution.
>
> - Overlayfs redirect_dir is not compatible with image building
>
>   redirect_dir is not compatible with image building and I think that's
>   one reason it's not used by default. And as metacopy is dependent
>   on redirect_dir, it's not used by default either. It can be used for
>   running containers though, but one needs to know that in advance.
>
>   So it would be good if that were fixed for the redirect_dir and metacopy
>   features; then there is a higher chance that these features get
>   enabled by default.
>
>   Miklos had some ideas on how to tackle the issue of getting the diff
>   correctly with redirect_dir enabled.
>
>   https://www.spinics.net/lists/linux-unionfs/msg06969.html
>
>   Having said that, I think Dan Walsh has enabled metacopy by default
>   in podman in certain configurations (for running containers and not
>   for building images).
>
> Thanks
> Vivek

Amir, Vivek did an excellent job of describing what we are attempting to
do with OverlayFS in container tools.  My work centers around
github.com/containers, specifically podman (libpod), Buildah, CRI-O,
Skopeo, containers/storage and containers/image.

Podman is our most popular tool and runs containers with
metacopy turned on by default, at least in Fedora and soon in RHEL 8.
I am not sure whether it is turned on by default in Debian and Ubuntu
releases, or in openSUSE and other distros.

One of the biggest features of these container engines (runtimes) is that
Podman and Buildah can run rootless, using the user namespace.  But sadly
we cannot use overlayfs for this, since mounting overlayfs requires
CAP_SYS_ADMIN.  As Vivek points out, Miklos is working to fix this.  For
now we use a FUSE implementation of overlay called fuse-overlayfs, which can
run rootless, but might not give us as good performance as kernel
overlayfs.

The biggest feature I want to push for in container technologies is
better support for user namespaces.  I want to use them for container
separation, i.e. each container would run in a different user
namespace.  This means that root in one container would be a different
UID than root in a different container.  Currently almost no one uses
user namespaces for this kind of separation.  The difficulty is that the
kernel does not support a shifting file system, so if I want to share
the same base image (lower directory) between multiple containers
in different user namespaces, the UIDs end up wrong.  We have hoped for
a shifting file system for many years, but overlayfs has never
developed one (fuse-overlayfs has some support for it).  There is an
effort in the kernel now to add a shifting file system, but I would bet
this will take a long time to get implemented.

The other option that we have built into our container engines is a
"chowning" image.  Basically, when a new container is started in a new
user namespace, the container engine chowns the lower level to match the
new user namespace and then sets up an overlay mount.  If the same image
is used a second time, the container engine is smart enough to reuse the
"chowned" image.  This chowning causes two problems on traditional
overlay systems.  First, it is slow, since it copies up all of the
lower files to a new upper.  Second, the kernel now sees
each executable/shared library as being different, so process/memory
sharing is broken in the kernel.  This means I get fewer containers
running on a system due to memory.  The metacopy feature of overlayfs
solves both of these issues.  This is why we turn it on by default in
Podman.  If I run Podman in a new user namespace, instead of taking
30 seconds to chown the file system, it now takes < 2 seconds.
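
A rough sketch of what such a chowning pass could look like, assuming
(this is an assumption, not a description of what Podman does) that
every uid/gid is shifted by a fixed per-namespace offset.  chown_tree and
its parameters are hypothetical names; the chown callable is injectable
so the walk can be tried without privileges.

```python
import os


def chown_tree(root, uid_offset, gid_offset, chown=os.lchown):
    """Shift the owner of root and everything below it; return the count.

    Each file keeps its relative identity: new id = old id + offset.
    lstat/lchown are used so symlinks themselves are changed, not
    their targets.
    """
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk does not descend into symlinked directories by default.
        paths.extend(os.path.join(dirpath, n) for n in dirnames + filenames)
    for path in paths:
        st = os.lstat(path)
        chown(path, st.st_uid + uid_offset, st.st_gid + gid_offset)
    return len(paths)
```

Run on an overlay mount without metacopy, every one of these chown calls
would copy the whole file up; with metacopy only the metadata is copied,
which is where the speedup quoted above comes from.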

Sadly, still almost no one is using user-namespace-separated containers,
because they are not on by default.  The issue is that users need to pick
unique ranges of UIDs for each container they create/launch, and almost
no one does.  I would propose that we fix this by making Podman do it by
default.  The idea would be to allocate 2 billion UIDs on a system and
then have Podman pick a range of 65K UIDs for each root-running
container that it creates.  containers/storage would keep track of the
selection.
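
As a sketch of the bookkeeping this would need (an illustration under
assumptions, not containers/storage code): a pool of 2 billion IDs
split into 65536-ID slots, with one slot handed out per container.  The
class name, the pool start of 100000 and the in-memory dict are all
hypothetical; a real implementation would have to persist the table.

```python
RANGE_SIZE = 65536  # "65K" IDs per container


class IdRangeAllocator:
    """Hand out non-overlapping (start, length) ID ranges from a pool."""

    def __init__(self, pool_start, pool_size, range_size=RANGE_SIZE):
        self.pool_start = pool_start
        self.range_size = range_size
        self.slots = pool_size // range_size  # whole ranges that fit
        self.used = {}                        # container id -> slot index

    def allocate(self, container_id):
        if container_id not in self.used:
            taken = set(self.used.values())
            slot = next((s for s in range(self.slots) if s not in taken), None)
            if slot is None:
                raise RuntimeError("no free ID range left")
            self.used[container_id] = slot
        start = self.pool_start + self.used[container_id] * self.range_size
        return start, self.range_size

    def release(self, container_id):
        self.used.pop(container_id, None)
```

With a 2 billion ID pool this gives 30517 usable ranges, so the pool
size is unlikely to be the limiting factor on a single host.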

This would cause the chowning to happen every time a container is
launched, so I would like to continue to focus on the speed of
chowning.  https://github.com/rhatdan/tools/chown.go is an effort to
create a better chowning tool that takes advantage of
multithreading.  I would like to get this functionality into
containers/storage to get container start times < 1 second, if possible.
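
The general idea of a multithreaded chown can be sketched as below.
This is not the code from the chown.go tool; the function names, the
worker count and the fixed-offset shift are assumptions made for
illustration.  The tree is walked once and the lstat/lchown calls are
spread over a thread pool.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def iter_paths(root):
    """Yield root and every entry below it (symlinked dirs not followed)."""
    yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)


def parallel_chown(root, uid_offset, gid_offset, workers=8, chown=os.lchown):
    """Shift ownership of a tree using a pool of worker threads."""
    def shift(path):
        st = os.lstat(path)
        chown(path, st.st_uid + uid_offset, st.st_gid + gid_offset)
        return path

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Iterating the results re-raises any error from a worker.
        return sum(1 for _ in pool.map(shift, iter_paths(root)))
```

Whether this helps in practice depends on how well the underlying
filesystem handles concurrent metadata updates, so the worker count
would need measuring rather than guessing.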

These features are currently on the back burner and could be a good
project for a GSoC student.

>
>> Are you a part of an organization that could consider posting this sort of
>> project proposals to GSoC or other internship programs?
>>
>> Thanks,
>> Amir.
>>

* Re: OverlaysFS offline tools
  2020-01-13 15:28   ` Daniel Walsh
@ 2020-01-13 18:02     ` Amir Goldstein
  2020-01-13 20:07     ` Christian Brauner
  1 sibling, 0 replies; 12+ messages in thread
From: Amir Goldstein @ 2020-01-13 18:02 UTC (permalink / raw)
  To: Daniel J Walsh
  Cc: Vivek Goyal, overlayfs, StuartIanNaylor, Linux Containers, kmxz,
	zhangyi (F),
	Miklos Szeredi, James Bottomley

> The other option that we have built into our container engines is a
> "chowning" image.  Basically, when a new container is started in a new
> user namespace, the container engine chowns the lower level to match the
> new user namespace and then sets up an overlay mount.  If the same image
> is used a second time, the container engine is smart enough to reuse the
> "chowned" image.  This chowning causes two problems on traditional
> overlay systems.  First, it is slow, since it copies up all of the
> lower files to a new upper.  Second, the kernel now sees
> each executable/shared library as being different, so process/memory
> sharing is broken in the kernel.  This means I get fewer containers
> running on a system due to memory.  The metacopy feature of overlayfs
> solves both of these issues.  This is why we turn it on by default in
> Podman.  If I run Podman in a new user namespace, instead of taking
> 30 seconds to chown the file system, it now takes < 2 seconds.
>
> Sadly, still almost no one is using user-namespace-separated containers,
> because they are not on by default.  The issue is that users need to pick
> unique ranges of UIDs for each container they create/launch, and almost
> no one does.  I would propose that we fix this by making Podman do it by
> default.  The idea would be to allocate 2 billion UIDs on a system and
> then have Podman pick a range of 65K UIDs for each root-running
> container that it creates.  containers/storage would keep track of the
> selection.
>
> This would cause the chowning to happen every time a container is
> launched, so I would like to continue to focus on the speed of
> chowning.  https://github.com/rhatdan/tools/chown.go is an effort to
> create a better chowning tool that takes advantage of
> multithreading.  I would like to get this functionality into
> containers/storage to get container start times < 1 second, if possible.
>

Just to be clear, is Podman chowning all the files to one specific
uid/gid? Or does it "shift" the values for each file on chown?

In any case, I imagine that integrating shiftfs logic into overlayfs should
not be that hard. If someone does the work and the demand from
users exists, I see nothing stopping that feature from getting upstream.
But it seems that James and the other shiftfs developers need shifting to work
not only with overlayfs, so the way upstream for shiftfs is not yet paved.

Thanks,
Amir.

* Re: OverlaysFS offline tools
  2020-01-13 15:28   ` Daniel Walsh
  2020-01-13 18:02     ` Amir Goldstein
@ 2020-01-13 20:07     ` Christian Brauner
  1 sibling, 0 replies; 12+ messages in thread
From: Christian Brauner @ 2020-01-13 20:07 UTC (permalink / raw)
  To: Daniel Walsh
  Cc: Vivek Goyal, Amir Goldstein, StuartIanNaylor, Miklos Szeredi,
	Linux Containers, zhangyi (F),
	overlayfs, kmxz

On Mon, Jan 13, 2020 at 10:28:24AM -0500, Daniel Walsh wrote:
> The biggest feature I want to push for in container technologies is
> better support for user namespaces.  I want to use them for container
> separation, i.e. each container would run in a different user
> namespace.  This means that root in one container would be a different
> UID than root in a different container.  Currently almost no one uses
> user namespaces for this kind of separation.  The difficulty is that the
Just to add a few more details here that seem to have fallen under the
table.
This is only true for the application container world. LXD has supported
this feature for years and we run millions of containers in production,
including all non-x86 workloads on Travis.
We've supported isolated idmaps since at least 2016 [1]. Here's a demo
of that feature too:
https://asciinema.org/a/293463

[1]: https://github.com/lxc/lxd/commit/bfe7296daa4f89fabf0c41c21a54009dfb05a709

> kernel does not support a shifting file system, so if I want to share
> the same base image (lower directory) between multiple containers
> in different user namespaces, the UIDs end up wrong.  We have hoped for
> a shifting file system for many years, but overlayfs has never
> developed one (fuse-overlayfs has some support for it).  There is an
> effort in the kernel now to add a shifting file system, but I would bet
> this will take a long time to get implemented.
>
> The other option that we have built into our container engines is a
> "chowning" image.  Basically, when a new container is started in a new
> user namespace, the container engine chowns the lower level to match the
> new user namespace and then sets up an overlay mount.  If the same image
> is used a second time, the container engine is smart enough to reuse the
> "chowned" image.  This chowning causes two problems on traditional
> overlay systems.  First, it is slow, since it copies up all of the
> lower files to a new upper.  Second, the kernel now sees
> each executable/shared library as being different, so process/memory
> sharing is broken in the kernel.  This means I get fewer containers
> running on a system due to memory.  The metacopy feature of overlayfs
> solves both of these issues.  This is why we turn it on by default in
> Podman.  If I run Podman in a new user namespace, instead of taking
> 30 seconds to chown the file system, it now takes < 2 seconds.
>
> Sadly, still almost no one is using user-namespace-separated containers,
> because they are not on by default.  The issue is that users need to pick
> unique ranges of UIDs for each container they create/launch, and almost
> no one does.  I would propose that we fix this by making Podman do it by

Again, this is only true for the application container world. LXD does
this automatically for you.

> default.  The idea would be to allocate 2 billion UIDs on a system and
> then have Podman pick a range of 65K UIDs for each root-running
> container that it creates.  containers/storage would keep track of the
> selection.

That's how we do it right now. Our range is 1 billion ids currently.

Christian

* Re: OverlaysFS offline tools
  2020-01-08 14:06 ` Vivek Goyal
  2020-01-08 15:29   ` Tycho Andersen
  2020-01-13 15:28   ` Daniel Walsh
@ 2020-06-05  5:33   ` Amir Goldstein
  2020-06-05 14:32     ` Vivek Goyal
                       ` (2 more replies)
  2 siblings, 3 replies; 12+ messages in thread
From: Amir Goldstein @ 2020-06-05  5:33 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: overlayfs, StuartIanNaylor, Linux Containers, kmxz, zhangyi (F),
	Miklos Szeredi, Christian Brauner

On Wed, Jan 8, 2020 at 4:06 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Wed, Jan 08, 2020 at 09:27:12AM +0200, Amir Goldstein wrote:
> > [-fsdevel,+containers]
> >
> > > On Thu, Apr 18, 2019 at 1:58 PM StuartIanNaylor <rolyantrauts@gmail.com> wrote:
> > > >
> > > > Apols to ask here but are there any tools for overlayFS?
> > > >
> > > > https://github.com/kmxz/overlayfs-tools is just about the only thing I
> > > > can find.
> > >
> > > There is also https://github.com/hisilicon/overlayfs-progs which
> > > can check and fix overlay layers, but it hasn't been updated in a while.
> > >
> >
> > Hi Vivek (and containers folks),
> >
> > Stuart has pinged me on https://github.com/StuartIanNaylor/zram-config/issues/4
> > to ask about the status of overlayfs offline tools.
> >
> > Quoting my answer here for visibility to more container developers:
> >
> > I have been involved with implementing many overlayfs features in the
> > kernel in the
> > past couple of years (redirect_dir,index,nfs_export,xino,metacopy).
> > All of these features bring benefits to end users, but AFAIK, they are
> > all still disabled
> > by default in containers runtimes (?) because lack of tools support
> > (e.g. migration
> > /import/export). I cannot force anyone to use the new overlayfs
> > features nor to write
> > offline tools support for them.
> >
> > So how can we improve this situation?
> >
> > If the problem is development resources then I've had great experience
> > in the past
> > with OSS internship programs like Google summer of code (GSoC):
> > Organizations, such as Redhat or mobyproject.org, can participate in the program
> > by posting proposals for open source projects.
> > Developers, such as myself, volunteer to mentors projects and students apply
> > to work on them.
> >
> > IIRC, the timeline for GSoC for project proposals in around April. Applying as
> > an organization could be before that.
> >
> > Vivek, since you are the only developer I know involved in containers runtime
> > projects I am asking you, but really its a question for all container developers
> > out there.
> >
> > Are you aware of missing features in containers that could be met by filling the
> > gaps with overlayfs offline tools?
>
> CCing Dan Walsh as he is taking care of podman and I often hear some of
> the complaints from him w.r.t. what he thinks is missing. This is
> not necessarily related to overlayfs offline tools.
>
> - Unprivileged mounting of overlayfs.
>
>   He wants to launch containers unprivileged and hence wants to be able
>   to mount overlayfs without being root in init_user_ns. I think Miklos
>   posted some patches for that but there has not been much progress since.
>
>   https://patchwork.kernel.org/cover/11212091/
>
> - shiftfs
>
>   As of now they are relying on doing a chown of the image but would really
>   like to see the ability to shift uid/gids using shiftfs or a
>   VFS layer solution.
>
> - Overlayfs redirect_dir is not compatible with image building
>
>   redirect_dir is not compatible with image building and I think that's
>   one reason that it's not used by default. And as metacopy is dependent
>   on redirect_dir, it's not used by default either. It can be used for
>   running containers though, but one needs to know that in advance.
>
>   So it will be good if that's fixed for the redirect_dir and metacopy
>   features, and then there is a higher chance that these features are
>   enabled by default.
>
>   Miklos had some ideas on how to tackle the issue of getting the diff
>   correctly with redirect_dir enabled.
>
>   https://www.spinics.net/lists/linux-unionfs/msg06969.html
>

FYI, I have been playing with kmxz's overlay (offline tools).
It's a nice little tool :)
Adding "awareness" to redirect and metacopy was easy [1].

It should be easy to add support for command "export"
that does what Miklos suggested in order to migrate an image with
metacopy/redirect.

As a first step, the command "vacuum" (or a new one) could be run
on layers to check whether they are already portable, in which case the
heavyweight "export" is not needed.
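
A portability check of that kind might look roughly like the sketch below.
It is only an illustration, not code from overlayfs-tools: the function name
find_nonportable and the injectable getxattr parameter are hypothetical.
It assumes that a layer is "portable" when none of its entries carry the
trusted.overlay.redirect or trusted.overlay.metacopy xattrs, and that the
caller has the privileges needed to read trusted.* xattrs on Linux.

```python
import errno
import os

# Overlayfs private xattrs written by the redirect_dir and metacopy features.
OVL_XATTRS = ("trusted.overlay.redirect", "trusted.overlay.metacopy")


def find_nonportable(layer_root, getxattr=os.getxattr):
    """Return sorted (relative path, xattr name) pairs found under layer_root.

    An empty result means no redirect/metacopy xattrs were seen, i.e. the
    layer looks portable under the assumption stated above.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(layer_root):
        # Check the directory itself plus every non-directory entry in it.
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            if os.path.islink(path):
                continue  # do not follow symlinks out of the layer
            for attr in OVL_XATTRS:
                try:
                    getxattr(path, attr)
                except OSError as exc:
                    # Attribute absent, or filesystem without xattr support.
                    if exc.errno in (errno.ENODATA, errno.ENOTSUP):
                        continue
                    raise
                hits.append((os.path.relpath(path, layer_root), attr))
    return sorted(hits)
```

Running find_nonportable("/path/to/upperdir") as root and getting an empty
list would suggest that the heavier migration step can be skipped for that
layer; any hit names the entry and the xattr that would need handling.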

>   Having said that, I think Dan Walsh has enabled metacopy by default
>   in podman in certain configurations (for running containers and not
>   for building images).
>

I submitted a talk proposal to plumbers containers track about
enabling overlayfs features in container runtimes [2].

The last time I brought up this topic [3] the discussion quickly shifted
to the hot shiftfs (pun intended) and the same thing happened with this
thread about offline tools [4].
Please resist the temptation to do that again!
I realize shiftfs and userns overlay are high priority features for containers.
I trust they will get their own talk on the containers track.

Vivek,

It would be great if you could co-author this talk with me, although
it is intended to be more of a round table discussion anyway.
The main idea is to exchange knowledge between overlayfs users
and overlayfs developers.

Thanks,
Amir.

[1] https://github.com/amir73il/overlayfs-tools/commits/metacopy
[2] https://linuxplumbersconf.org/event/7/abstracts/601/
[3] https://lore.kernel.org/linux-unionfs/CAOQ4uxiQEpofdS97kxnii8LtVW2QiKAGvjjaH0Px-Bj3eHVCFA@mail.gmail.com/
[4] https://lore.kernel.org/linux-unionfs/70a7e65d-40a5-7940-0d4d-14cdbfef39bd@redhat.com/


* Re: OverlaysFS offline tools
  2020-06-05  5:33   ` Amir Goldstein
@ 2020-06-05 14:32     ` Vivek Goyal
  2020-06-05 14:38       ` Amir Goldstein
  2020-06-05 15:13     ` Christian Brauner
  2020-08-11  9:57     ` Amir Goldstein
  2 siblings, 1 reply; 12+ messages in thread
From: Vivek Goyal @ 2020-06-05 14:32 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: overlayfs, StuartIanNaylor, Linux Containers, kmxz, zhangyi (F),
	Miklos Szeredi, Christian Brauner

On Fri, Jun 05, 2020 at 08:33:08AM +0300, Amir Goldstein wrote:
> I submitted a talk proposal to plumbers containers track about
> enabling overlayfs features in container runtimes [2].

Hi Amir,

I can't seem to access this abstract proposal (despite the fact that I
created a new login ID).

> 
> The last time I brought up this topic [3] the discussion quickly shifted
> to the hot shiftfs (pun intended) and the same thing happened with this
> thread about offline tools [4].
> Please resist the temptation to do that again!
> I realize shiftfs and userns overlay are high priority features for containers.
> I trust they will get their own talk on the containers track.

Miklos already posted patches once for allowing mounting overlayfs from
inside user namespace. He might have more to say on this.

> 
> Vivek,
> 
> It would be great if you could co-author this talk with me, although
> it is intended to be more of a round table discussion anyway.
> The main idea is to exchange knowledge between overlayfs users
> and overlayfs developers.

Sure. I can help write some sections of that talk (especially on the
metacopy feature) and anything else you want.

Thanks
Vivek


* Re: OverlaysFS offline tools
  2020-06-05 14:32     ` Vivek Goyal
@ 2020-06-05 14:38       ` Amir Goldstein
  2020-06-05 15:19         ` Christian Brauner
  0 siblings, 1 reply; 12+ messages in thread
From: Amir Goldstein @ 2020-06-05 14:38 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: overlayfs, StuartIanNaylor, Linux Containers, kmxz, zhangyi (F),
	Miklos Szeredi, Christian Brauner

> Hi Amir,
>
> I can't seem to access this abstract proposal (Despite the fact I
> created a new login id).
>

Maybe it needs to be accepted to become public, anyway:

Containers are by far the biggest use case for overlayfs.
Yet, there seems to be very little cross talk between overlayfs and
containers mailing lists.

This talk is going to present some opt-in overlayfs features that were
added in recent years (redirect_dir, index, nfs_export, xino,
metacopy).
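
For readers who have not used these features: they are selected per mount
through overlayfs mount options. The sketch below only builds such an
option string; the helper name overlay_mount_options and the example paths
are hypothetical, and it deliberately does no validation (some combinations
are rejected by the kernel, and paths containing ',' or ':' would need
escaping that is not handled here).

```python
def overlay_mount_options(lowerdirs, upperdir, workdir, features=None):
    """Build the -o option string for an overlay mount.

    lowerdirs is ordered top-most layer first, as overlayfs expects.
    features maps an opt-in option name (e.g. "metacopy") to its value.
    """
    opts = [
        "lowerdir=" + ":".join(lowerdirs),
        "upperdir=" + upperdir,
        "workdir=" + workdir,
    ]
    for name, value in (features or {}).items():
        opts.append(f"{name}={value}")
    return ",".join(opts)


# Example: opt in to redirect_dir and metacopy for one mount.
options = overlay_mount_options(
    ["/layers/l2", "/layers/l1"],
    "/layers/upper",
    "/layers/work",
    {"redirect_dir": "on", "metacopy": "on"},
)
```

The resulting string would be passed as
"mount -t overlay overlay -o <options> /merged".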

Most of those features have not been enabled by most container
runtimes, for various reasons:

* They require more development in userspace (image migration)
* Unrelated runtime bugs (mount leaks)
* Mismatch with container needs
* Lack of promotion

This talk is about giving the opportunity to container runtime
developers to better understand what they may get from overlayfs.

This talk is not about the containers wish list for overlayfs, because
userns overlayfs mount needs 45 minutes on its own...

Thanks,
Amir.


* Re: OverlaysFS offline tools
  2020-06-05  5:33   ` Amir Goldstein
  2020-06-05 14:32     ` Vivek Goyal
@ 2020-06-05 15:13     ` Christian Brauner
  2020-08-11  9:57     ` Amir Goldstein
  2 siblings, 0 replies; 12+ messages in thread
From: Christian Brauner @ 2020-06-05 15:13 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Vivek Goyal, overlayfs, StuartIanNaylor, Linux Containers, kmxz,
	zhangyi (F),
	Miklos Szeredi

On Fri, Jun 05, 2020 at 08:33:08AM +0300, Amir Goldstein wrote:
> I submitted a talk proposal to plumbers containers track about
> enabling overlayfs features in container runtimes [2].

Amir, excellent! Looking forward to this!
Christian


* Re: OverlaysFS offline tools
  2020-06-05 14:38       ` Amir Goldstein
@ 2020-06-05 15:19         ` Christian Brauner
  0 siblings, 0 replies; 12+ messages in thread
From: Christian Brauner @ 2020-06-05 15:19 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Vivek Goyal, StuartIanNaylor, Miklos Szeredi, Linux Containers,
	zhangyi (F),
	overlayfs, kmxz

On Fri, Jun 05, 2020 at 05:38:40PM +0300, Amir Goldstein wrote:
> > Hi Amir,
> >
> > I can't seem to access this abstract proposal (Despite the fact I
> > created a new login id).
> >
> 
> Maybe it needs to be accepted to become public, anyway:

Talks submitted via the Plumbers website are not visible until they are
accepted. If I'm not mistaken, you can add Vivek as a co-author by editing
your submission.

Christian


* Re: OverlaysFS offline tools
  2020-06-05  5:33   ` Amir Goldstein
  2020-06-05 14:32     ` Vivek Goyal
  2020-06-05 15:13     ` Christian Brauner
@ 2020-08-11  9:57     ` Amir Goldstein
  2 siblings, 0 replies; 12+ messages in thread
From: Amir Goldstein @ 2020-08-11  9:57 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: overlayfs, StuartIanNaylor, Linux Containers, kmxz, zhangyi (F),
	Miklos Szeredi, Christian Brauner

> > - Overlayfs redirect_dir is not compatible with image building
> >
> >   redirect_dir is not compatible with image building and I think that's
> >   one reason that it's not used by default. And as metacopy is dependent
> >   on redirect_dir, it's not used by default either. It can be used for
> >   running containers though, but one needs to know that in advance.
> >
> >   So it will be good if that's fixed for the redirect_dir and metacopy
> >   features, and then there is a higher chance that these features are
> >   enabled by default.
> >
> >   Miklos had some ideas on how to tackle the issue of getting the diff
> >   correctly with redirect_dir enabled.
> >
> >   https://www.spinics.net/lists/linux-unionfs/msg06969.html
> >
>
> FYI, I have been playing with kmxz's overlay (offline tools).
> It's a nice little tool :)
> Adding "awareness" to redirect and metacopy was easy [1].
>
> It should be easy to add support for command "export"
> that does what Miklos suggested in order to migrate an image with
> metacopy/redirect.
>

FYI, I took a swing at that and implemented the command "deref"
that implements Miklos' suggestion [1].
I am sure we could think of a better name, but whatever...

> I submitted a talk proposal to plumbers containers track about
> enabling overlayfs features in container runtimes.
>

FYI2, this is a draft of talking points for my talk at Plumbers [2].
My hope is that I can use 15 minutes to introduce the new overlayfs features
and the challenges of integrating them with container runtimes, and that we
will be able to spend the remaining 30 minutes on open discussion.

Thanks,
Amir.

[1] https://github.com/kmxz/overlayfs-tools/pull/11
[2] https://github.com/amir73il/overlayfs/wiki/Overlayfs-and-containers


Thread overview: 12+ messages
2020-01-08  7:27 OverlaysFS offline tools Amir Goldstein
2020-01-08 14:06 ` Vivek Goyal
2020-01-08 15:29   ` Tycho Andersen
2020-01-13 15:28   ` Daniel Walsh
2020-01-13 18:02     ` Amir Goldstein
2020-01-13 20:07     ` Christian Brauner
2020-06-05  5:33   ` Amir Goldstein
2020-06-05 14:32     ` Vivek Goyal
2020-06-05 14:38       ` Amir Goldstein
2020-06-05 15:19         ` Christian Brauner
2020-06-05 15:13     ` Christian Brauner
2020-08-11  9:57     ` Amir Goldstein
