* ovl: Ephemeral Mounts
@ 2018-10-11 15:28 Sargun Dhillon
  2018-10-11 18:20 ` Amir Goldstein
  2018-10-11 18:36 ` Miklos Szeredi
  0 siblings, 2 replies; 3+ messages in thread
From: Sargun Dhillon @ 2018-10-11 15:28 UTC (permalink / raw)
  To: linux-fsdevel, linux-unionfs; +Cc: amir73il, ghartmann, miklos

We recently upgraded our kernel from 4.9 to 4.18 and were surprised to
find a behaviour change in overlayfs. Overlayfs now calls sync on the
upper dir's superblock on shutdown. This causes all of our containers
to stall out for a little bit.

We run lots of ephemeral "containers" with overlayfs (Docker) on XFS.
A given XFS filesystem could be host to 50+ containers. We block our
users from calling syncfs on their overlayfs mount. Unfortunately, on
filesystem shutdown, syncfs gets called on the overlayfs, which calls
syncfs on the upperdir, causing a ton of I/O on the block device. This
is useless, because all of the data they wrote to the upperdir is
subsequently removed.

We believe that we're not going to be the only ones surprised by this behaviour.

Since we don't control shutdown of the mount namespace, and therefore
don't control shutdown of the mount, it's not easy to add an ioctl to
shut down the filesystem cleanly. Instead, we need something at mount
time that we can use to indicate that syncfs shouldn't happen.

I propose that we add a mount option "ephemeral" to the overlayfs
mount which tells overlayfs to not syncfs at shutdown time.

It might also be nice to extend this mount option to tell overlayfs to
drop all syncfs calls, or return EIO.

Does anyone else have any other suggestions?


* Re: ovl: Ephemeral Mounts
  2018-10-11 15:28 ovl: Ephemeral Mounts Sargun Dhillon
@ 2018-10-11 18:20 ` Amir Goldstein
  2018-10-11 18:36 ` Miklos Szeredi
  1 sibling, 0 replies; 3+ messages in thread
From: Amir Goldstein @ 2018-10-11 18:20 UTC (permalink / raw)
  To: Sargun Dhillon; +Cc: linux-fsdevel, overlayfs, ghartmann, Miklos Szeredi

On Thu, Oct 11, 2018 at 6:28 PM Sargun Dhillon <sargun@sargun.me> wrote:
>
> We recently upgraded our kernel from 4.9 to 4.18 and were surprised to
> find a behaviour change in overlayfs. Overlayfs now calls sync on the
> upper dir's superblock on shutdown. This causes all of our containers
> to stall out for a little bit.
>
> We run lots of ephemeral "containers" with overlayfs (Docker) on XFS.
> A given XFS filesystem could be host to 50+ containers. We block our
> users from calling syncfs on their overlayfs mount. Unfortunately, on
> filesystem shutdown, syncfs gets called on the overlayfs, which calls
> syncfs on the upperdir, causing a ton of I/O on the block device. This
> is useless, because all of the data they wrote to the upperdir is
> subsequently removed.
>
> We believe that we're not going to be the only ones surprised by this behaviour.

Yeh, like all the users unhappy with EBUSY mounts because
of leaked old mounts... Sigh!
We should have named this filesystem unionsfs, because the unions won't
let us change anything in the contract ;-)

Without escaping responsibility for legacy behavior, which is very simple
to do technically, I will try to suggest "proper" solutions to fit your needs
until other users come along with other needs.

>
> Since we don't control shutdown of the mount namespace, and therefore
> control shutdown of the mount, it's not easy to add an ioctl to
> shutdown the filesystem cleanly, instead we need something at mount
> time we can use to indicate that syncfs shouldn't happen.
>
> I propose that we add a mount option "ephemeral" to the overlayfs
> mount which tells overlayfs to not syncfs at shutdown time.
>
> It might also be nice to extend this mount option to tell overlayfs to
> drop all syncfs calls, or return EIO.
>

umount(2) and syncfs(2) end up calling the same fs sync_fs method,
so it won't be pleasant to have different behavior in the two cases.
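
For reference, a minimal userspace sketch of the syncfs(2) side of this:
it opens a directory on a mount and asks the kernel to flush the
filesystem that contains it. On an overlay mount this is the call that
ends up in the overlayfs sync_fs method. The helper name sync_mount is
made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush the filesystem that contains `path`.
 * Returns 0 on success or a negative errno value on failure.
 */
int sync_mount(const char *path)
{
    assert(path != NULL);

    /* Any open fd on the mount is enough for syncfs(2). */
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -errno;

    /* Capture errno before close() can overwrite it. */
    int ret = (syncfs(fd) == 0) ? 0 : -errno;
    close(fd);
    return ret;
}
```

Calling sync_mount() on the overlay mount point would exercise the same
path that the final umount takes, which is why giving the two cases
different behavior is awkward.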

> Does anyone else have any other suggestions?

I tried to find similar semantics elsewhere in the storage stack (e.g. noflush)
but couldn't find anything like that in the kernel. The closest thing is
the SHUTDOWN ioctl (xfs, ext4, f2fs), but you say that this
is not an option for your application.

Can you explain what prevents you from holding a reference (e.g via
master/slave mount propagation) on the overlay mount (that your
application created?) and when you need to tear down the container,
use that bind mount to issue the SHUTDOWN command before final
umount?

Implementing the "write-side" of the SHUTDOWN is quite simple
because the check for ofs->shutdown could go into ovl_want_write()
and that should actually be enough to provide the semantics you need.
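
To make the idea concrete, here is a rough userspace model of the
"write-side" check, not kernel code. The struct, the `shutdown` field
and the *_model function names are all hypothetical stand-ins; the
assumption is only that a one-way flag is tested before write access
is granted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: stands in for the overlayfs per-superblock data. */
struct ovl_fs_model {
    bool shutdown;  /* hypothetical flag set by a SHUTDOWN ioctl */
    int writers;    /* stand-in for the upper mount's write count */
};

/* Model of the check: refuse new write access once shut down. */
int ovl_want_write_model(struct ovl_fs_model *ofs)
{
    assert(ofs != NULL);
    if (ofs->shutdown)
        return -EIO;
    ofs->writers++;
    return 0;
}

/* Release write access taken by a successful ovl_want_write_model(). */
void ovl_drop_write_model(struct ovl_fs_model *ofs)
{
    assert(ofs != NULL && ofs->writers > 0);
    ofs->writers--;
}

/* Model of the ioctl's effect: a one-way transition to "shut down". */
void ovl_shutdown_model(struct ovl_fs_model *ofs)
{
    assert(ofs != NULL);
    ofs->shutdown = true;
}
```

In this model every modifying operation that brackets its work with
want_write/drop_write starts failing with EIO after the flag is set,
while reads are untouched.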

Problems with SHUTDOWN ioctl solution are:
- Maybe you really cannot implement a shutdown hook?
- Other users may come complaining about a regression without
  the ability or intention to change their application
- Implementing just the "write-side" of the SHUTDOWN does not
  conform to the existing convention, but I am not sure anybody has
  a need for the "full shutdown" (read and write return EIO).

Another "conventional" solution would be to fix ovl_sync_fs()
to sync only the inodes that belong to this overlay's upper layer and
not the entire upper fs. There are already patches for this solution:
https://lkml.org/lkml/2018/6/10/4

The problem with this solution is that it goes in the opposite
direction from the plan to move the upper inodes' page cache to overlay
inodes, so it is unlikely to happen, and you would have to
wait for the overlay inode page cache.

Would that sort of solution be adequate for your application,
meaning that each container tear-down flushes only its own upper inodes
and not the rest of the upper filesystem's dirty inodes?

Thanks,
Amir.


* Re: ovl: Ephemeral Mounts
  2018-10-11 15:28 ovl: Ephemeral Mounts Sargun Dhillon
  2018-10-11 18:20 ` Amir Goldstein
@ 2018-10-11 18:36 ` Miklos Szeredi
  1 sibling, 0 replies; 3+ messages in thread
From: Miklos Szeredi @ 2018-10-11 18:36 UTC (permalink / raw)
  To: Sargun Dhillon; +Cc: linux-fsdevel, overlayfs, Amir Goldstein, ghartmann

On Thu, Oct 11, 2018 at 5:28 PM, Sargun Dhillon <sargun@sargun.me> wrote:
> We recently upgraded our kernel from 4.9 to 4.18 and were surprised to
> find a behaviour change in overlayfs. Overlayfs now calls sync on the
> upper dir's superblock on shutdown. This causes all of our containers
> to stall out for a little bit.
>
> We run lots of ephemeral "containers" with overlayfs (Docker) on XFS.
> A given XFS filesystem could be host to 50+ containers. We block our
> users from calling syncfs on their overlayfs mount. Unfortunately, on
> filesystem shutdown, syncfs gets called on the overlayfs, which calls
> syncfs on the upperdir, causing a ton of I/O on the block device. This
> is useless, because all of the data they wrote to the upperdir is
> subsequently removed.
>
> We believe that we're not going to be the only ones surprised by this behaviour.
>
> Since we don't control shutdown of the mount namespace, and therefore
> control shutdown of the mount, it's not easy to add an ioctl to
> shutdown the filesystem cleanly, instead we need something at mount
> time we can use to indicate that syncfs shouldn't happen.
>
> I propose that we add a mount option "ephemeral" to the overlayfs
> mount which tells overlayfs to not syncfs at shutdown time.
>
> It might also be nice to extend this mount option to tell overlayfs to
> drop all syncfs calls, or return EIO.
>
> Does anyone else have any other suggestions?

I would guess tmpfs as upper layer might be a good fit when there's
lots of memory.  Tmpfs can also swap out data on memory pressure (not
dentries or inodes though).
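
As a rough sketch of that setup, assuming a scratch directory that is
reserved for the container: mount a tmpfs there, create the upper and
work directories inside it, then mount the overlay on top. The function
names and the directory layout are illustrative only; lowerdir,
upperdir and workdir are the usual overlayfs mount options, and the
calls need CAP_SYS_ADMIN.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/mount.h>
#include <sys/stat.h>

/*
 * Build "lowerdir=<lower>,upperdir=<scratch>/upper,workdir=<scratch>/work".
 * Returns 0 on success, -ENAMETOOLONG if the buffer is too small.
 */
int build_overlay_opts(char *buf, size_t len,
                       const char *lower, const char *scratch)
{
    int n = snprintf(buf, len,
                     "lowerdir=%s,upperdir=%s/upper,workdir=%s/work",
                     lower, scratch, scratch);
    return (n < 0 || (size_t)n >= len) ? -ENAMETOOLONG : 0;
}

/*
 * Mount a tmpfs at `scratch`, create upper/ and work/ in it, then mount
 * the overlay at `target`. Error paths do not unmount the tmpfs again;
 * that cleanup is left out to keep the sketch short.
 */
int mount_overlay_tmpfs_upper(const char *lower, const char *scratch,
                              const char *target)
{
    char opts[4096], dir[4096];

    if (build_overlay_opts(opts, sizeof(opts), lower, scratch) < 0)
        return -ENAMETOOLONG;

    /* upperdir and workdir must live on the same filesystem. */
    if (mount("tmpfs", scratch, "tmpfs", 0, NULL) < 0)
        return -errno;

    snprintf(dir, sizeof(dir), "%s/upper", scratch);
    if (mkdir(dir, 0755) < 0)
        return -errno;
    snprintf(dir, sizeof(dir), "%s/work", scratch);
    if (mkdir(dir, 0755) < 0)
        return -errno;

    if (mount("overlay", target, "overlay", 0, opts) < 0)
        return -errno;
    return 0;
}
```

With the upper layer in memory, a sync of the upper filesystem at
umount time has no block device to write to.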

Thanks,
Miklos
