From mboxrd@z Thu Jan  1 00:00:00 1970
From: Amir Goldstein <amir73il@gmail.com>
Subject: Re: [RFC PATCH 0/7] overlayfs: Delayed copy up of data
Date: Tue, 3 Oct 2017 08:36:07 +0300
Message-ID: <CAOQ4uxjQtgARYCkG=OKZCc9+PRZy_NJX6zqUoOTRecgUNOkM2g@mail.gmail.com>
References: <1506951605-31440-1-git-send-email-vgoyal@redhat.com>
 <CAOQ4uxgmvukQPbAWxA9Mfs_9-qBJkOV+oEwjsJBr0bARzyki9g@mail.gmail.com> <20171002194852.GB18630@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Return-path: <linux-unionfs-owner@vger.kernel.org>
Received: from mail-qt0-f194.google.com ([209.85.216.194]:33125 "EHLO
        mail-qt0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1750787AbdJCFgJ (ORCPT
        <rfc822;linux-unionfs@vger.kernel.org>);
        Tue, 3 Oct 2017 01:36:09 -0400
Received: by mail-qt0-f194.google.com with SMTP id x54so1124592qth.0
        for <linux-unionfs@vger.kernel.org>; Mon, 02 Oct 2017 22:36:08 -0700 (PDT)
In-Reply-To: <20171002194852.GB18630@redhat.com>
Sender: linux-unionfs-owner@vger.kernel.org
List-Id: linux-unionfs@vger.kernel.org
To: Vivek Goyal <vgoyal@redhat.com>
Cc: overlayfs <linux-unionfs@vger.kernel.org>, Miklos Szeredi <miklos@szeredi.hu>

On Mon, Oct 2, 2017 at 10:48 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Mon, Oct 02, 2017 at 10:03:13PM +0300, Amir Goldstein wrote:
>> On Mon, Oct 2, 2017 at 4:39 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > Hi,
>> >
>> > In one of the recent converstions, people mentioned that chown/chmod
>> > lead to copy up files as well as data. We could optimize it so that
>> > only metadata is copied up during chown/chmod and data is copied up when
>> > file is opened for WRITE.
>> >
>> > This optimization potentially could be useful with containers and user
>> > namespaces. In popular scenario, people end up doing chown() on whole
>> > image directory tree based on container mappings. And this chown copies
>> > up everything, breaking sharing of page cache between containers.
>> >
>> > With these patches, only metadat is copied up during chown() and if file
>> > is opened for READ, d_real() returns lower dentry/inode. That way,
>> > different containers can still continue to use page cache. That's the
>> > use case I have in mind. I have not tested it though.
>> >
>> > So here are very crude RFC patch. I have done bare minimal testing on
>> > these and there are many unaddressed issues I can see.
>> >
>> > Before I go any further, I wanted to send these out for some feedback
>> > and see if I am moving in right direction or this appraoch is completely
>> > broken.
>> >
>>
>> I like the direction this is going :)
>> Beyond the important use case you listed, this could be useful also for:
>> 1. copy up of lower hardlink in ovl_nlink_start(), just to have a place holder
>>     inode for OVL_XATTR_NLINK
>> 2. similar case as above needed for NFS export of lower hardlink
>> 3. possible starting point for consistent ro/rw file descriptor, see POC:
>>     https://github.com/amir73il/linux/commits/ovl-rocopyup
>>     your patches actually take off where my patches stop
>>
>> > Basically, I am relying on storing OVL_XATTR_ORIGIN in upper inode
>> > during copy up. I use that information to get to lower inode later and
>> > do data copy up when necessary.
>>
>> Your feature is relying on OVL_XATTR_ORIGIN, and so does index feature.
>> There are several places in your patches were you wonder what happens
>> in cases there is no index or there is an index.
>> Why not make life simpler and make METACOPY depend on index?
>
> Hi Amir,
>
> I am not sure. Both index and METACOPY rely on OVL_XATTR_ORIGIN.
> But conceptually metacopy does not seem to depend on index feature. We
> could very well have index disabled while still having metacopy enabled.

*conceptually* you may be right, but as I pointed out in on some of the
METACOPY patches, supporting metacopy=on,index=off gets you into
an implementation corner that I think would be best to avoid.

We need to ask ourselves why someone would *want* to have the feature
combination metacopy=on,index=off. It only makes sense if index=on
brings extra constrains that metacopy=on doesn't have, but does it?
index=on requires:
1. upper xattr support
2. lower file handle support
3. upper file handle support
4. no reuse of upperdir with new lowerdir
5. no reuse of workdir with new upperdir

METACOPY requires most of the above, except 3 and 5,
but these extra constrains are so easy to meet, so compared to the
tradeoff of simplifying the implementation corners of metacopy=on,index=off
relying on index=on seems like a win to me.

I can't imagine that feature wise someone would *want* to break hardlinks
with metacopy=on is an argument.

In theory, METACOPY could be made safe to migrate to new filesystem
and then won't require prereq 2 and 4 above if OVL_XATTR_REDIRECT
would be set on regular files as well and then it would make sense to decouple
it from index=on, but I am not sure if this is a direction worth pursuing.

My plan for the future is to have a userland migration tool for layer+index that
sets   OVL_XATTR_REDIRECT from OVL_XATTR_ORIGIN before export to
tarball and restores uptodate  OVL_XATTR_ORIGIN from
OVL_XATTR_REDIRECT after import layers+index from tarball.

>
>> METACOPY is not backward compat, not even readonly backward compat.
>
> Do you mean forward compatible? IIUC, I can take existing overlay
> directories and mount them with newer kernel with metadata copy up
> enabled. Just that metadata copy up will apply to only copy ups which
> happen with new kernels. Files which have already been copied up
> with old kernels will remain unaffected. So this way it is backward
> compatible.
>
> But once a metadata copy up has taken up, I can't go back to old kernel
> and expect it to work. It will show an upper file of zero size. So this
> looks like a forward compatibility issue. Metadata created by newer
> version of software is not expected to be handled by older version of
> software.
>
> Am I misunderstanding the issue?
>

Not misunderstanding just mixing up the terms as they are used in file systems:
Backward compat = new on-disk format understood by old kernels
Forward compat = old on-disk format understood by new kernel

>
>> It may be easy for you to base on my index=all patches:
>> https://github.com/amir73il/linux/commits/ovl-index-all
>> and make the life cycle of copy up go through the following stages:
>> - create metadata copy index
>
> So this is same index which you are creating for hardlink with index=on?
> With my pathces, now when copy up happens, only metadata will be copied
> up.
>

Correct.

>> - copy data to index
>
> As of now, this will happen when file is opened for WRITE.
>
>> - link index to upper
>
> So this step happens after first one. We create index with metadata copy
> up and then index is hard linked to upper (alias).
>
>>
>> AFAICT there is never any reason to actually have an upper alias as
>> a result of metadata copy up.
>
> I am not able to understand this point. So if a file foo.txt is hard
> linked and I do a "chown vivek foo.txt", then I would like to have.
>
> - Index created with metadata copy up only.
> - Create upper alias.
>
> Isn't it?
>
> IOW why not create upper alias with metadata copy up.
>

Because chown/chmod is a property of the inode, not the directory entry,
so if you have
ln foo.txt bar.txt
chown vivek foo.txt

The optimal result is that both foo.txt and bar.txt are now owned by vivek

Specifically, with index=on and hardlinks not broken on copy up
ovl_dentry_upper(dentry) of bar.txt will return the index denrty
with !ovl_has_upper_alias(dentry) and overlay inode as well as
ovl_inode_real() will have the new ownership for all union aliases
(bar.txt and foo.txt) whether they have upper alias or not.

>>
>> >
>> > I also store OVL_XATTR_METACOPY in upper inode to mark that only
>> > metadata has been copied up and data copy up still might be required.
>> >
>>
>> And that is not backward compat so need a new opt-in config option.
>> I don't like it so much that we keep adding config options and complicate
>> the compatibility matrix, that is why I prefer if we bundle several new
>> functionalities into a single new opt-in config option if possible, but
>> Miklos seems to feel differently about this.
>
> Making user opt-in for this feature is fine. It especially makes sense
> because user can't downgrade its kernel now to older version. So those
> who are used to downgrading (because upgraded kernel had issues), will
> complain, saying we did not ask for this optimzation and don't break
> our downgrade.
>
> So a mount option "-o metacopy=on/off" sounds reasonable?
>

Sounds reasonable to me, but I campaign for have mount reject
metacopy=on or fallback to metacopy=off if neither index=on or
CONFIG_OVERLAY_FS_INDEX=y are set.

Amir.