All of lore.kernel.org
 help / color / mirror / Atom feed
From: Amir Goldstein <amir73il@gmail.com>
To: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>,
	Miklos Szeredi <miklos@szeredi.hu>,
	Andrey Vagin <avagin@virtuozzo.com>,
	Konstantin Khorenko <khorenko@virtuozzo.com>,
	Vasiliy Averin <vvs@virtuozzo.com>,
	Kirill Tkhai <ktkhai@virtuozzo.com>,
	overlayfs <linux-unionfs@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/2] overlayfs: C/R enhancements
Date: Fri, 5 Jun 2020 13:21:40 +0300	[thread overview]
Message-ID: <CAOQ4uxhhNx0VxJB=eLoPX+wt15tH3-KLjGuQem4h_R=0nfkAiA@mail.gmail.com> (raw)
In-Reply-To: <fb79be2c-4fc8-5a9d-9b07-e0464fca9c3f@virtuozzo.com>

On Fri, Jun 5, 2020 at 11:41 AM Pavel Tikhomirov
<ptikhomirov@virtuozzo.com> wrote:
>
>
>
> On 6/5/20 5:35 AM, Amir Goldstein wrote:
> > On Fri, Jun 5, 2020 at 12:34 AM Alexander Mikhalitsyn
> > <alexander.mikhalitsyn@virtuozzo.com> wrote:
> >>
> >> Hello,
> >>
> >>> But overlayfs won't accept these "output only" options as input args,
> >> which is a problem.
> >>
> >> Will it be problematic if we simply ignore "lowerdir_mnt_id" and "upperdir_mnt_id" options in ovl_parse_opt()?
> >>
> >
> > That would solve this small problem.
>
> This is not a big problem actually as these options shown in mountinfo
> for overlay had been "output only" forever,
> please see these two examples below:
>
> a) Imagine you've mounted overlay with relative paths and forgot (or you
> never known as you are another user) where your cwd was at the moment of
> mount syscall. - How would you use those options as "input" to create
> the same overlay mount somethere else (bind-mounting not involved)?
>
> b) Imagine you've mounted overlay with absolute paths and someone (other
> user) overmounted lower (upper/workdir) paths for you, all directory
> structure would be the same on overmount but yet files are different. -
> How would you use those options from mountinfo as "input" ones?
>
> We try to make them much closer to "input" ones.


That is not what I meant by "output only"
I meant invalid input options as in EINVAL not ENOENT

>
> Agreed, we should ignore *_mnt_id on mount because paths identify mounts
> at the time of mount call.
>
> >
> >>> Wouldn't it be better for C/R to implement mount options
> >> that overlayfs can parse and pass it mntid and fhandle instead
> >> of paths? >>
> >> Problem is that we need to know on C/R "dump stage" which mounts are used on lower layers and upper layer. Most likely I don't understand something but I can't catch how "mount-time" options will help us.
> >
> > As you already know from inotify/fanotify C/R fhandle is timeless, so
> > there would be no distinction between mount time and dump time.
>
> Pair of fhandle+mnt_id looks an equivalent to path+mnt_id pair, CRIU
> will just need to open fhandle+mnt_id with open_by_handle_at and
> readlink to get path on dump and continue to use path+mnt_id as before.
> (not too common with fhandles but it's my current understanding)
>
> But if you take a look on (a) and (b) again, the regular user does not
> see full information about overlay mount in /proc/pid/mountinfo, they
> can't just take a look on it and understand from there it comes from.
> Resolving fhandle looks like a too hard task for a user.
>

Right, and we need to provide regular user needs in parallel to
providing C/R needs. understood.

> > About mnt_id, your patches will cause the original mount-time mounts to be busy.
> > That is a problem as well.
>
> Children mounts lock parent, open files lock parent. Another analogy is
> a loop device which locks the backing file mount (AFAICS). Anyway one
> can lazy umount, can't they? But I'm not too sure for this one, maybe
> you can share more implications of this problem?
>

Overlayfs mounts are internal not children mounts in the namespace,
so no open files hold reference the mounts in the namespace (AFAICS).

This use case will break:

  mount /dev/vdf /vdf
  mkdir /vdf/{l,u,w} /tmp/m
  mount -t overlay overlay /tmp/m -o
lowerdir=/vdf/l,upperdir=/vdf/u,workdir=/vdf/w
  umount /vdf

Yes users can lazy unmount, the filesystem itself on /dev/vdf is not actually
unmounted, only the /vdf mount goes away from the namespace, but the use case
without lazy unmount will still break.

Maybe its fine since distro/admin needs to opt-in for this change of behavior.
I have to wonder though, why did you add two different config/module
options for this feature? Yes, its two different sub-functionalities, but
which real user (not imaginary one) will really turn on just half the feature?
While at it, you copy pasted the text:
          For more information, see Documentation/filesystems/overlayfs.txt
but there is no more information to be found.

> >
> > I think you should describe the use case is more details.
> > Is your goal to C/R any overlayfs mount that the process has open
> > files on? visible to process
> We wan't to dump a container, not a simple process, if the container
> process has access to some resource CRIU needs to restore this resource.
>
> Imagine the process in container mounts it's own overlay inside
> container, for instance to imulate write access to readonly mount or
> just to implement some snapshots, don't know exact use case. And we want
> to checkpoint/restore this container. (Currently CRIU only supports
> overlay as external mount, e.g. for docker continers docker engine
> pre-creates overlay for us and we just bind from it - it's a different
> case.) If the in-container process creates the in-container mount we
> need to recreate it on restore so that the in-container view of the
> filesystem persists.
>

Understood. but how do you *know* which mounts the container
created and need to be migrated?
Which loop devices the user has created?
As opposed to the ones that docker engine re-created?
It is the found from diff between mountinfo of process and host?

> > For NFS export, we use the persistent descriptor {uuid;fhandle}
> > (a.k.a. struct ovl_fh) to encode
> > an underlying layer object.
> >
> > CRIU can look for an existing mount to a filesystem with uuid as restore stage
> > (or even mount this filesystem) and use open_by_handle_at() to open a
> > path to layer.
>
> On restore we can be on another physical node, so I doubt we have same
> uuid's, sorry I don't fully understand here already.
>

I see, so what about inotify/fanotify?
fhandle and uuid can be looked up/resolved to mnt/path at "dump" time.
The difference between mnt_id/uuid is who keeps the reference on the
mount. If overlayfs provides uuid, then you rely on docker to keep the
reference on the mount and use the reference-less uuid to find the
mount that docker is holding for you.

> > After mounting overlay, that mount to underlying fs can even be discarded.
> >
> > And if this works for you, you don't have to export the layers ovl_fh in
> > /proc/mounts, you can export them in numerous other ways.
> > One way from the top of my head, getxattr on overlay root dir.
> > "trusted.overlay" xattr is anyway a reserved prefix, so "trusted.overlay.layers"
> > for example could work.
>
> Thanks xattr might be a good option, but still don't forget about (a)
> and (b), users like to know all information about mount from
> /proc/pid/mountinfo.
>

Let's stick to your use cases requirements. If you have other use cases
for this functionality lay them out explicitly.

I went to see what losetup does and I see that LOOP_SET_STATUS
ioctl stores a path string that LOOP_GET_STATUS gets back in return.
Does not seem C/R friendly either. Are you not handling loop devices?

It's strange because loop driver keeps an open backing file so
LOOP_GET_STATUS could have returned the uptodate path.

Thanks,
Amir.

  reply	other threads:[~2020-06-05 10:23 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-04 16:11 [PATCH 0/2] overlayfs: C/R enhancements Alexander Mikhalitsyn
2020-06-04 16:11 ` [PATCH 1/2] overlayfs: add dynamic path resolving in mount options Alexander Mikhalitsyn
2020-06-04 16:11 ` [PATCH 2/2] overlayfs: add mnt_id paths options Alexander Mikhalitsyn
2020-06-04 18:04 ` [PATCH 0/2] overlayfs: C/R enhancements Amir Goldstein
2020-06-04 21:34   ` Alexander Mikhalitsyn
2020-06-05  2:35     ` Amir Goldstein
2020-06-05  8:41       ` Pavel Tikhomirov
2020-06-05 10:21         ` Amir Goldstein [this message]
2020-06-05 12:44           ` Alexander Mikhalitsyn
2020-06-05 14:36             ` Amir Goldstein
2020-06-05 14:56               ` Alexander Mikhalitsyn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAOQ4uxhhNx0VxJB=eLoPX+wt15tH3-KLjGuQem4h_R=0nfkAiA@mail.gmail.com' \
    --to=amir73il@gmail.com \
    --cc=alexander.mikhalitsyn@virtuozzo.com \
    --cc=avagin@virtuozzo.com \
    --cc=khorenko@virtuozzo.com \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-unionfs@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=ptikhomirov@virtuozzo.com \
    --cc=vvs@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.