linux-unionfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Amir Goldstein <amir73il@gmail.com>
To: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>,
	Miklos Szeredi <miklos@szeredi.hu>,
	Andrey Vagin <avagin@virtuozzo.com>,
	Konstantin Khorenko <khorenko@virtuozzo.com>,
	Vasiliy Averin <vvs@virtuozzo.com>,
	Kirill Tkhai <ktkhai@virtuozzo.com>,
	overlayfs <linux-unionfs@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/2] overlayfs: C/R enhancements
Date: Fri, 5 Jun 2020 13:21:40 +0300	[thread overview]
Message-ID: <CAOQ4uxhhNx0VxJB=eLoPX+wt15tH3-KLjGuQem4h_R=0nfkAiA@mail.gmail.com> (raw)
In-Reply-To: <fb79be2c-4fc8-5a9d-9b07-e0464fca9c3f@virtuozzo.com>

On Fri, Jun 5, 2020 at 11:41 AM Pavel Tikhomirov
<ptikhomirov@virtuozzo.com> wrote:
>
>
>
> On 6/5/20 5:35 AM, Amir Goldstein wrote:
> > On Fri, Jun 5, 2020 at 12:34 AM Alexander Mikhalitsyn
> > <alexander.mikhalitsyn@virtuozzo.com> wrote:
> >>
> >> Hello,
> >>
> >>> But overlayfs won't accept these "output only" options as input args,
> >> which is a problem.
> >>
> >> Will it be problematic if we simply ignore "lowerdir_mnt_id" and "upperdir_mnt_id" options in ovl_parse_opt()?
> >>
> >
> > That would solve this small problem.
>
> This is not a big problem actually as these options shown in mountinfo
> for overlay had been "output only" forever,
> please see these two examples below:
>
> a) Imagine you've mounted overlay with relative paths and forgot (or you
> never known as you are another user) where your cwd was at the moment of
> mount syscall. - How would you use those options as "input" to create
> the same overlay mount somethere else (bind-mounting not involved)?
>
> b) Imagine you've mounted overlay with absolute paths and someone (other
> user) overmounted lower (upper/workdir) paths for you, all directory
> structure would be the same on overmount but yet files are different. -
> How would you use those options from mountinfo as "input" ones?
>
> We try to make them much closer to "input" ones.


That is not what I meant by "output only"
I meant invalid input options as in EINVAL not ENOENT

>
> Agreed, we should ignore *_mnt_id on mount because paths identify mounts
> at the time of mount call.
>
> >
> >>> Wouldn't it be better for C/R to implement mount options
> >> that overlayfs can parse and pass it mntid and fhandle instead
> >> of paths? >>
> >> Problem is that we need to know on C/R "dump stage" which mounts are used on lower layers and upper layer. Most likely I don't understand something but I can't catch how "mount-time" options will help us.
> >
> > As you already know from inotify/fanotify C/R fhandle is timeless, so
> > there would be no distinction between mount time and dump time.
>
> Pair of fhandle+mnt_id looks an equivalent to path+mnt_id pair, CRIU
> will just need to open fhandle+mnt_id with open_by_handle_at and
> readlink to get path on dump and continue to use path+mnt_id as before.
> (not too common with fhandles but it's my current understanding)
>
> But if you take a look on (a) and (b) again, the regular user does not
> see full information about overlay mount in /proc/pid/mountinfo, they
> can't just take a look on it and understand from there it comes from.
> Resolving fhandle looks like a too hard task for a user.
>

Right, and we need to provide regular user needs in parallel to
providing C/R needs. understood.

> > About mnt_id, your patches will cause the original mount-time mounts to be busy.
> > That is a problem as well.
>
> Children mounts lock parent, open files lock parent. Another analogy is
> a loop device which locks the backing file mount (AFAICS). Anyway one
> can lazy umount, can't they? But I'm not too sure for this one, maybe
> you can share more implications of this problem?
>

Overlayfs mounts are internal not children mounts in the namespace,
so no open files hold reference the mounts in the namespace (AFAICS).

This use case will break:

  mount /dev/vdf /vdf
  mkdir /vdf/{l,u,w} /tmp/m
  mount -t overlay overlay /tmp/m -o
lowerdir=/vdf/l,upperdir=/vdf/u,workdir=/vdf/w
  umount /vdf

Yes users can lazy unmount, the filesystem itself on /dev/vdf is not actually
unmounted, only the /vdf mount goes away from the namespace, but the use case
without lazy unmount will still break.

Maybe its fine since distro/admin needs to opt-in for this change of behavior.
I have to wonder though, why did you add two different config/module
options for this feature? Yes, its two different sub-functionalities, but
which real user (not imaginary one) will really turn on just half the feature?
While at it, you copy pasted the text:
          For more information, see Documentation/filesystems/overlayfs.txt
but there is no more information to be found.

> >
> > I think you should describe the use case is more details.
> > Is your goal to C/R any overlayfs mount that the process has open
> > files on? visible to process
> We wan't to dump a container, not a simple process, if the container
> process has access to some resource CRIU needs to restore this resource.
>
> Imagine the process in container mounts it's own overlay inside
> container, for instance to imulate write access to readonly mount or
> just to implement some snapshots, don't know exact use case. And we want
> to checkpoint/restore this container. (Currently CRIU only supports
> overlay as external mount, e.g. for docker continers docker engine
> pre-creates overlay for us and we just bind from it - it's a different
> case.) If the in-container process creates the in-container mount we
> need to recreate it on restore so that the in-container view of the
> filesystem persists.
>

Understood. but how do you *know* which mounts the container
created and need to be migrated?
Which loop devices the user has created?
As opposed to the ones that docker engine re-created?
It is the found from diff between mountinfo of process and host?

> > For NFS export, we use the persistent descriptor {uuid;fhandle}
> > (a.k.a. struct ovl_fh) to encode
> > an underlying layer object.
> >
> > CRIU can look for an existing mount to a filesystem with uuid as restore stage
> > (or even mount this filesystem) and use open_by_handle_at() to open a
> > path to layer.
>
> On restore we can be on another physical node, so I doubt we have same
> uuid's, sorry I don't fully understand here already.
>

I see, so what about inotify/fanotify?
fhandle and uuid can be looked up/resolved to mnt/path at "dump" time.
The difference between mnt_id/uuid is who keeps the reference on the
mount. If overlayfs provides uuid, then you rely on docker to keep the
reference on the mount and use the reference-less uuid to find the
mount that docker is holding for you.

> > After mounting overlay, that mount to underlying fs can even be discarded.
> >
> > And if this works for you, you don't have to export the layers ovl_fh in
> > /proc/mounts, you can export them in numerous other ways.
> > One way from the top of my head, getxattr on overlay root dir.
> > "trusted.overlay" xattr is anyway a reserved prefix, so "trusted.overlay.layers"
> > for example could work.
>
> Thanks xattr might be a good option, but still don't forget about (a)
> and (b), users like to know all information about mount from
> /proc/pid/mountinfo.
>

Let's stick to your use cases requirements. If you have other use cases
for this functionality lay them out explicitly.

I went to see what losetup does and I see that LOOP_SET_STATUS
ioctl stores a path string that LOOP_GET_STATUS gets back in return.
Does not seem C/R friendly either. Are you not handling loop devices?

It's strange because loop driver keeps an open backing file so
LOOP_GET_STATUS could have returned the uptodate path.

Thanks,
Amir.

  reply	other threads:[~2020-06-05 10:23 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-04 16:11 [PATCH 0/2] overlayfs: C/R enhancements Alexander Mikhalitsyn
2020-06-04 16:11 ` [PATCH 1/2] overlayfs: add dynamic path resolving in mount options Alexander Mikhalitsyn
2020-06-04 16:11 ` [PATCH 2/2] overlayfs: add mnt_id paths options Alexander Mikhalitsyn
2020-06-04 18:04 ` [PATCH 0/2] overlayfs: C/R enhancements Amir Goldstein
2020-06-04 21:34   ` Alexander Mikhalitsyn
2020-06-05  2:35     ` Amir Goldstein
2020-06-05  8:41       ` Pavel Tikhomirov
2020-06-05 10:21         ` Amir Goldstein [this message]
2020-06-05 12:44           ` Alexander Mikhalitsyn
2020-06-05 14:36             ` Amir Goldstein
2020-06-05 14:56               ` Alexander Mikhalitsyn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAOQ4uxhhNx0VxJB=eLoPX+wt15tH3-KLjGuQem4h_R=0nfkAiA@mail.gmail.com' \
    --to=amir73il@gmail.com \
    --cc=alexander.mikhalitsyn@virtuozzo.com \
    --cc=avagin@virtuozzo.com \
    --cc=khorenko@virtuozzo.com \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-unionfs@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=ptikhomirov@virtuozzo.com \
    --cc=vvs@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).