linux-unionfs.vger.kernel.org archive mirror
From: Amir Goldstein <amir73il@gmail.com>
To: Kevin Locke <kevin@kevinlocke.name>,
	Amir Goldstein <amir73il@gmail.com>,
	overlayfs <linux-unionfs@vger.kernel.org>,
	Vivek Goyal <vgoyal@redhat.com>
Cc: kmxz <kxzkxz7139@gmail.com>
Subject: Re: EIO for removed redirected files?
Date: Fri, 14 Aug 2020 09:20:08 +0300
Message-ID: <CAOQ4uxjA-+EsUz2LuEc2PYQzZkO+aBK4dBMsoSvBgNjaU9Nj_Q@mail.gmail.com>
In-Reply-To: <20200813172218.GA298313@kevinolos>

On Thu, Aug 13, 2020 at 8:22 PM Kevin Locke <kevin@kevinlocke.name> wrote:
>
> Thanks again Amir!  I'll work on patches for the docs and adding
> pr_warn_ratelimited() for invalid metacopy/redirect as soon as I get a
> chance.
>
> On Wed, 2020-08-12 at 20:06 +0300, Amir Goldstein wrote:
> > On Wed, Aug 12, 2020 at 7:05 PM Kevin Locke <kevin@kevinlocke.name> wrote:
> >> On Wed, 2020-08-12 at 18:21 +0300, Amir Goldstein wrote:
> >>> I guess the only thing we could document is that changes to underlying
> >>> layers with metacopy and redirects have undefined results.
> >>> Vivek was a proponent of making the statements about outcome of
> >>> changes to underlying layers sound more harsh.
> >>
> >> That sounds good to me.  My current use case involves offline changes to
> >> the lower layer on a routine basis, and I interpreted the current
> >
> > You are not the only one, I hear of many users that do that, but nobody ever
> > bothered to sit down and document the requirements - what exactly is the
> > use case and what is the expected outcome.
>
> I can elaborate a bit.  Keep in mind that it's a personal use case which
> is flexible, so it's probably not worth supporting specifically, but may
> be useful to discuss/consider:
>
> A few machines that I manage are dual-boot between Windows and Linux,
> with software that runs on both OSes (Steam).  This software installs a
> lot (>100GB) of semi-static data which is mostly (>90%) the same between
> OSes, but not partitioned by folder or designed to be shared between
> them.  The software includes mechanisms for validating the data files
> and automatically updating/repairing any files which do not match
> expectations.
>
> I currently mount an overlayfs of the Windows data directory on the
> Linux data directory to avoid storing multiple copies of common data.
> After any data changes in Windows, I re-run the data file validation in
> Linux to ensure the data is consistent.  I also occasionally run a
> deduplication script[1] to remove files which may have been updated on
> Linux and later updated to the same contents on Windows.
>

Nice use case.
It may be a niche use case the way you describe it, but the general concept
of "updatable software" at the lower layer is not unique to your use case.
See this recent example [1], which spawned the thread about updating the
documentation w.r.t. changing underlying layers.

[1] https://lore.kernel.org/linux-unionfs/32532923.JtPX5UtSzP@fgdesktop/

> To support this use, I'm looking for a way to configure overlayfs such
> that offline changes to the lower dir do not break things in a way that
> can't be recovered by naive file content validation.  Beyond that, any
> performance-enhancing and space-saving features are great.
>
> metacopy and redirection would be nice to have, but are not essential as
> the program does not frequently move data files or modify their
> metadata.

That's what I figured.

> If accessing an invalid metacopy behaved like a 0-length
> file, it would be ideal for my use case (since it would be deleted and
> re-created by file validation) but I can understand why this would be
> undesirable for other cases and problematic to implement.  (I'm

I wouldn't say it is "problematic" to implement. It is simple to convert the
EIO to a warning (with an opt-in option). What would be a challenge to
implement is the behavior where metadata access is allowed for a broken
metacopy, but data access results in EIO.

> experimenting with seccomp to prevent/ignore metadata changes, since the
> program should run on filesystems which do not support them.  An option
> to ignore/reject metadata changes would be handy, but may not be
> justified.)
>
> Does that explain?  Does it seem reasonable?  Is disabling metacopy and
> redirect_dir likely to be sufficient?

Yes, disabling metacopy and redirect_dir sounds like the right thing to do,
because I don't think they gain you too much anyway.
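
For anyone scripting such a mount, here is a minimal sketch of how the
option string might be assembled with both features disabled. The function
names and the example paths are hypothetical; only the option names
(lowerdir, upperdir, workdir, metacopy, redirect_dir) are overlayfs mount
options. It assumes the paths contain no commas, which would need escaping.

```python
def overlay_mount_options(lowerdir, upperdir, workdir):
    """Build an overlayfs option string with metacopy and redirect_dir off.

    Assumption: none of the paths contain a comma (not escaped here).
    """
    opts = {
        "lowerdir": lowerdir,
        "upperdir": upperdir,
        "workdir": workdir,
        "metacopy": "off",      # do not create metadata-only copy-ups
        "redirect_dir": "off",  # do not create directory redirects
    }
    return ",".join(f"{key}={value}" for key, value in opts.items())


def overlay_mount_command(lowerdir, upperdir, workdir, mountpoint):
    """Return the argv list for mount(8); running it requires root."""
    options = overlay_mount_options(lowerdir, upperdir, workdir)
    return ["mount", "-t", "overlay", "overlay", "-o", options, mountpoint]


if __name__ == "__main__":
    # Hypothetical paths, printed rather than executed.
    print(" ".join(overlay_mount_command(
        "/mnt/windows/data", "/srv/ovl/upper", "/srv/ovl/work", "/srv/data")))
```

Note that the mount options only control what is created from then on.
An upper layer that already contains metacopy or redirect entries is not
cleaned up by remounting with these options.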

>
> Best,
> Kevin
>
> [1]: Do you know of any overlayfs-aware deduplication programs?  If not,
> I may consider cleaning up and publishing mine at some point.

I know about overlayfs-tools's "merge" command.
I do not know if anyone is using this tool besides perhaps its author (?).
Incidentally, I recently implemented the "deref" command for overlayfs-tools [2],
which unfolds metacopy and redirect_dir and creates an upper layer without
them. The resulting layer can then be deduped with the lower layer using the
"merge" command.

[2] https://github.com/kmxz/overlayfs-tools/pull/11

I also implemented (in the same pull request) awareness of metacopy and
redirect_dir in the existing overlayfs-tools commands. The "merge" command
simply aborts when they are encountered, but the "vacuum" and "diff" commands
work correctly. I also added the "overlay diff -b" variant, which creates an
output equivalent to that of the standard diff tool (diffutils) just by
analyzing the layers.
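
To check up front whether an upper layer contains metacopy or redirect
entries at all, something like the following rough sketch could be used.
This is not how overlayfs-tools does it. It assumes the
trusted.overlay.metacopy and trusted.overlay.redirect xattr names that
overlayfs uses for a regular (non-userxattr) mount, and it must run with
CAP_SYS_ADMIN (typically as root), since trusted.* xattrs are otherwise
not listed. The function name and the injectable list_xattrs parameter are
hypothetical, the latter added only so the walk can be exercised without
root.

```python
import os

# Assumed xattr names for a regular (non-userxattr) overlayfs mount.
OVL_XATTRS = ("trusted.overlay.metacopy", "trusted.overlay.redirect")


def find_overlay_xattrs(upper_dir, list_xattrs=None):
    """Map each path under upper_dir to the overlay xattrs found on it.

    The top-level directory itself is not examined, only its descendants.
    """
    if list_xattrs is None:
        # Linux-only; do not follow symlinks so the entry itself is checked.
        def list_xattrs(path):
            return os.listxattr(path, follow_symlinks=False)

    hits = {}
    for root, dirs, files in os.walk(upper_dir):
        for name in dirs + files:
            path = os.path.join(root, name)
            try:
                names = set(list_xattrs(path))
            except OSError:
                continue  # unreadable entry or xattrs unsupported: skip it
            found = [x for x in OVL_XATTRS if x in names]
            if found:
                hits[path] = found
    return hits


if __name__ == "__main__":
    import sys
    for path, xattrs in sorted(find_overlay_xattrs(sys.argv[1]).items()):
        print(path, " ".join(xattrs))
```

An empty result from a run without the needed privileges does not prove
that the layer is clean, because the trusted.* names are hidden in that case.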

Thanks,
Amir.
