All of lore.kernel.org
 help / color / mirror / Atom feed
* [WIP][RFC] questions about overlay index dir design
@ 2017-05-17 23:08 Amir Goldstein
  2017-05-18  7:11 ` J. R. Okajima
  2017-05-18 14:15 ` Amir Goldstein
  0 siblings, 2 replies; 6+ messages in thread
From: Amir Goldstein @ 2017-05-17 23:08 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: open list:OVERLAY FILESYSTEM

Miklos,

I'm trying to hold back posting review of WIP, especially during rc2 time,
but I would appreciate intermediate feedback on the design of this WIP,
so you won't need to 'simplify' it too much later...

WIP is at https://github.com/amir73il/linux/commits/ovl-index-dir.
I could post patches, but I rather not spam you now unless asked.

So far it does the job of creating the non-dir upper hardlinks in index
dir and unbreaking hardlinks on copy up (ignoring concurrent copy up
of 2 lower hardlinks for now).

One thing I would appreciate feedback on is my 'simplification' of
ovl_copy_up_one/locked() using actors.
Those functions got very convoluted with the tmpfile copy up method
and now I needed to add another copy up method (indexdir), so that
seemed like the right way to go.
What do you think?

The other thing I wanted to ask and it doesn't require you looking at
any code is how to handle backward compat of mount behavior when
workdir is reused for mount with a different upperdir.

The use case may seem odd, but I already ran into it with unionmount
testsuite ./run --ov=1 and with xfstest overlay/014.
Maybe those are the only examples in the world, but maybe not...

The problem is that if workdir/index contains hardlinks to a past upper
which has been rotated to lower, bad things can happen on copy up.
What I did so far is to store origin fh of upper root in index dir.
Mount forces read-only if index dir origin does not match upper root dir.

So with current WIP the behavior is that ./run --ov=1 fails after
rotating upper:

overlayfs: failed to verify origin dir (ino=19025, ret=-116) - were
layers copied?
overlayfs: failed to create directory /upper/work/index (errno: 17);
mounting read-only
 ./run --rename /mnt/a/no_foo104 /mnt/a/no_foo105
/mnt/a/no_foo104: Unexpected error: Read-only file system

It's easy to fix the testsuite, but it is doing something that was legal before
(provide the same workdir option with different upperdir options) and now
it fails.

The alternative is to blow away the index dir and recreate it when mounting
with an unverified upperdir. This will work fine for the testsuite, but not for
the obscure users out there using the same workdir to mount with few different
upperdir (not at the same time) - do we care?

I guess the best is to blow away index dir unless user explicitly opted-in
to not blow it away, e.g. by using mount option indexdir= in place of workdir=
(mutually excl.).

Thoughts?

Amir.

------- Sneak Preview of index dir layout ---------------------

root@kvm-xfstests:~/unionmount-testsuite# ls -li /upper/0/a/
total 60
18957 -rw-r--r-- 3 daemon daemon 12 May 17 22:27 foo100
18967 -rw-r--r-- 3 daemon daemon 12 May 17 22:27 foo101
18977 -rw-r--r-- 3 daemon daemon 12 May 17 22:27 foo102
18985 -rw-r--r-- 3 daemon daemon 12 May 17 22:27 foo103
18993 -rw-r--r-- 3 daemon daemon 12 May 17 22:27 foo104
18998 -rw-r--r-- 3 daemon daemon 12 May 17 22:27 foo105
19008 -rw-r--r-- 2 daemon daemon 12 May 17 22:27 foo106
19013 -rw-r--r-- 3 daemon daemon 12 May 17 22:27 foo107
18957 -rw-r--r-- 3 daemon daemon 12 May 17 22:27 no_foo100
18967 -rw-r--r-- 3 daemon daemon 12 May 17 22:27 no_foo101
18977 -rw-r--r-- 3 daemon daemon 12 May 17 22:27 no_foo102
18985 -rw-r--r-- 3 daemon daemon 12 May 17 22:27 no_foo103
18993 -rw-r--r-- 3 daemon daemon 12 May 17 22:27 no_foo104
18998 -rw-r--r-- 3 daemon daemon 12 May 17 22:27 no_foo106
19013 -rw-r--r-- 3 daemon daemon 12 May 17 22:27 no_foo107
root@kvm-xfstests:~/unionmount-testsuite# ls -li /upper/work/index/
total 32
18957 -rw-r--r-- 3 daemon daemon 12 May 17 22:27
00fb210001d42610c5ec7847a48b42cc54df7fb72658ce1c597a48000000000000
18967 -rw-r--r-- 3 daemon daemon 12 May 17 22:27
00fb210001d42610c5ec7847a48b42cc54df7fb72658ce1c598748000000000000
18977 -rw-r--r-- 3 daemon daemon 12 May 17 22:27
00fb210001d42610c5ec7847a48b42cc54df7fb72658ce1c599448000000000000
18985 -rw-r--r-- 3 daemon daemon 12 May 17 22:27
00fb210001d42610c5ec7847a48b42cc54df7fb72658ce1c59a148000000000000
18993 -rw-r--r-- 3 daemon daemon 12 May 17 22:27
00fb210001d42610c5ec7847a48b42cc54df7fb72658ce1c59ae48000000000000
18998 -rw-r--r-- 3 daemon daemon 12 May 17 22:27
00fb210001d42610c5ec7847a48b42cc54df7fb72658ce1c59bb48000000000000
19008 -rw-r--r-- 2 daemon daemon 12 May 17 22:27
00fb210001d42610c5ec7847a48b42cc54df7fb72658ce1c59c848000000000000
19013 -rw-r--r-- 3 daemon daemon 12 May 17 22:27
00fb210001d42610c5ec7847a48b42cc54df7fb72658ce1c59d548000000000000
root@kvm-xfstests:~/unionmount-testsuite# getfattr -m overlay -d -e
hex /upper/work/index/*
getfattr: Removing leading '/' from absolute path names
# file: upper/work/index/00fb210001d42610c5ec7847a48b42cc54df7fb72658ce1c597a48000000000000
trusted.overlay.origin=0x00fb210001d42610c5ec7847a48b42cc54df7fb72658ce1c597a48000000000000

# file: upper/work/index/00fb210001d42610c5ec7847a48b42cc54df7fb72658ce1c598748000000000000
trusted.overlay.origin=0x00fb210001d42610c5ec7847a48b42cc54df7fb72658ce1c598748000000000000

# file: upper/work/index/00fb210001d42610c5ec7847a48b42cc54df7fb72658ce1c599448000000000000
trusted.overlay.origin=0x00fb210001d42610c5ec7847a48b42cc54df7fb72658ce1c599448000000000000

# file: upper/work/index/00fb210001d42610c5ec7847a48b42cc54df7fb72658ce1c59a148000000000000
trusted.overlay.origin=0x00fb210001d42610c5ec7847a48b42cc54df7fb72658ce1c59a148000000000000

# file: upper/work/index/00fb210001d42610c5ec7847a48b42cc54df7fb72658ce1c59ae48000000000000
trusted.overlay.origin=0x00fb210001d42610c5ec7847a48b42cc54df7fb72658ce1c59ae48000000000000

# file: upper/work/index/00fb210001d42610c5ec7847a48b42cc54df7fb72658ce1c59bb48000000000000
trusted.overlay.origin=0x00fb210001d42610c5ec7847a48b42cc54df7fb72658ce1c59bb48000000000000

# file: upper/work/index/00fb210001d42610c5ec7847a48b42cc54df7fb72658ce1c59c848000000000000
trusted.overlay.origin=0x00fb210001d42610c5ec7847a48b42cc54df7fb72658ce1c59c848000000000000

# file: upper/work/index/00fb210001d42610c5ec7847a48b42cc54df7fb72658ce1c59d548000000000000
trusted.overlay.origin=0x00fb210001d42610c5ec7847a48b42cc54df7fb72658ce1c59d548000000000000

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [WIP][RFC] questions about overlay index dir design
  2017-05-17 23:08 [WIP][RFC] questions about overlay index dir design Amir Goldstein
@ 2017-05-18  7:11 ` J. R. Okajima
  2017-05-18  7:47   ` Amir Goldstein
  2017-05-18 14:15 ` Amir Goldstein
  1 sibling, 1 reply; 6+ messages in thread
From: J. R. Okajima @ 2017-05-18  7:11 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, open list:OVERLAY FILESYSTEM

Amir Goldstein:
> WIP is at https://github.com/amir73il/linux/commits/ovl-index-dir.
> I could post patches, but I rather not spam you now unless asked.
>
> So far it does the job of creating the non-dir upper hardlinks in index
> dir and unbreaking hardlinks on copy up (ignoring concurrent copy up
> of 2 lower hardlinks for now).

The very basic design (as a first step) is close to aufs' "pseudo-link".

(from linux/Documentation/filesystems/aufs/design/0strcut.txt)
----------------------------------------
Pseudo-link
----------------------------------------------------------------------
Assume "fileA" exists on the lower readonly branch only and it is
hardlinked to "fileB" on the branch. When you write something to fileA,
aufs copies-up it to the upper writable branch. Additionally aufs
creates a hardlink under the Pseudo-link Directory of the writable
branch. The inode of a pseudo-link is kept in aufs super_block as a
simple list. If fileB is read after unlinking fileA, aufs returns
filedata from the pseudo-link instead of the lower readonly
branch. Because the pseudo-link is based upon the inode, to keep the
inode number by xino (see above) is essentially necessary.

All the hardlinks under the Pseudo-link Directory of the writable branch
should be restored in a proper location later. Aufs provides a utility
to do this. The userspace helpers executed at remounting and unmounting
aufs by default.
During this utility is running, it puts aufs into the pseudo-link
maintenance mode. In this mode, only the process which began the
maintenance mode (and its child processes) is allowed to operate in
aufs. Some other processes which are not related to the pseudo-link will
be allowed to run too, but the rest have to return an error or wait
until the maintenance mode ends. If a process already acquires an inode
mutex (in VFS), it has to return an error.
----------------------------------------

How do you think about the lifetime of the entries under ovl-index-dir?
Aufs has a user-space utility to restore/reproduce all hardlinks on the
upper writable layer, and it removes the entries under Pseudo-link
Directory.


J. R. Okajima

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [WIP][RFC] questions about overlay index dir design
  2017-05-18  7:11 ` J. R. Okajima
@ 2017-05-18  7:47   ` Amir Goldstein
  0 siblings, 0 replies; 6+ messages in thread
From: Amir Goldstein @ 2017-05-18  7:47 UTC (permalink / raw)
  To: J. R. Okajima; +Cc: Miklos Szeredi, open list:OVERLAY FILESYSTEM

On Thu, May 18, 2017 at 10:11 AM, J. R. Okajima <hooanon05g@gmail.com> wrote:
> Amir Goldstein:
>> WIP is at https://github.com/amir73il/linux/commits/ovl-index-dir.
>> I could post patches, but I rather not spam you now unless asked.
>>
>> So far it does the job of creating the non-dir upper hardlinks in index
>> dir and unbreaking hardlinks on copy up (ignoring concurrent copy up
>> of 2 lower hardlinks for now).
>
> The very basic design (as a first step) is close to aufs' "pseudo-link".

Thanks for taking the time to review the design!

I am half ashamed that I did not study the prior art, half proud
that we reached a similar design independently. This is a strong
indication for getting something right ;)

>
> (from linux/Documentation/filesystems/aufs/design/0strcut.txt)
> ----------------------------------------
> Pseudo-link
> ----------------------------------------------------------------------
> Assume "fileA" exists on the lower readonly branch only and it is
> hardlinked to "fileB" on the branch. When you write something to fileA,
> aufs copies-up it to the upper writable branch. Additionally aufs
> creates a hardlink under the Pseudo-link Directory of the writable
> branch. The inode of a pseudo-link is kept in aufs super_block as a
> simple list. If fileB is read after unlinking fileA, aufs returns
> filedata from the pseudo-link instead of the lower readonly
> branch. Because the pseudo-link is based upon the inode, to keep the
> inode number by xino (see above) is essentially necessary.
>
> All the hardlinks under the Pseudo-link Directory of the writable branch
> should be restored in a proper location later. Aufs provides a utility
> to do this. The userspace helpers executed at remounting and unmounting
> aufs by default.
> During this utility is running, it puts aufs into the pseudo-link
> maintenance mode. In this mode, only the process which began the
> maintenance mode (and its child processes) is allowed to operate in
> aufs. Some other processes which are not related to the pseudo-link will
> be allowed to run too, but the rest have to return an error or wait
> until the maintenance mode ends. If a process already acquires an inode
> mutex (in VFS), it has to return an error.
> ----------------------------------------
>
> How do you think about the lifetime of the entries under ovl-index-dir?
> Aufs has a user-space utility to restore/reproduce all hardlinks on the
> upper writable layer, and it removes the entries under Pseudo-link
> Directory.
>

Good question.
At first glance, my plan was to keep the indexed entries forever (*).
We need those entries, as well as similar entries for lower directories,
for mapping lower file handle to overlay inode in order to implement
NFS export of overlayfs.

I don't see an immediate need to restore all upper hardlinks.
Instead I plan to find the index entry lazily at overlay dentry lookup
time and store it in the overlay dentry for lower hardlinks that have not
been copied up yet (i.e. as __roupperdentry) as link the upper hardlink
on attempt to copy it up. Something along the lines of this WIP:
https://github.com/amir73il/linux/commits/ovl-rocopyup

(*) The only problem I have with this design is knowing when the index
entry has become 'orphan', meaning that all lower hardlinks have been
whited out, no new upper hardlinks remain and no open file descriptors
to upper inode remain that may still be linked.
It is easy to handle some of the simpler cases (lower nlink=1 or isdir),
but for the lower hardlink case, I guess we will have to keep negative
nlink count in the upper inode, increment it on whiteout of lower hardlinks
and check when negative nlink reaches lower nlink.
This can also help sort out the value of nlink returned by stat(2).

Let me know if the above sounds sane to you.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [WIP][RFC] questions about overlay index dir design
  2017-05-17 23:08 [WIP][RFC] questions about overlay index dir design Amir Goldstein
  2017-05-18  7:11 ` J. R. Okajima
@ 2017-05-18 14:15 ` Amir Goldstein
  2017-05-19 14:16   ` Miklos Szeredi
  1 sibling, 1 reply; 6+ messages in thread
From: Amir Goldstein @ 2017-05-18 14:15 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: open list:OVERLAY FILESYSTEM

On Thu, May 18, 2017 at 2:08 AM, Amir Goldstein <amir73il@gmail.com> wrote:

>
> The other thing I wanted to ask and it doesn't require you looking at
> any code is how to handle backward compat of mount behavior when
> workdir is reused for mount with a different upperdir.
>
> The use case may seem odd, but I already ran into it with unionmount
> testsuite ./run --ov=1 and with xfstest overlay/014.
> Maybe those are the only examples in the world, but maybe not...
>
> The problem is that if workdir/index contains hardlinks to a past upper
> which has been rotated to lower, bad things can happen on copy up.
> What I did so far is to store origin fh of upper root in index dir.
> Mount forces read-only if index dir origin does not match upper root dir.
>
> So with current WIP the behavior is that ./run --ov=1 fails after
> rotating upper:
>
> overlayfs: failed to verify origin dir (ino=19025, ret=-116) - were
> layers copied?
> overlayfs: failed to create directory /upper/work/index (errno: 17);
> mounting read-only
>  ./run --rename /mnt/a/no_foo104 /mnt/a/no_foo105
> /mnt/a/no_foo104: Unexpected error: Read-only file system
>
> It's easy to fix the testsuite, but it is doing something that was legal before
> (provide the same workdir option with different upperdir options) and now
> it fails.
>
> The alternative is to blow away the index dir and recreate it when mounting
> with an unverified upperdir. This will work fine for the testsuite, but not for
> the obscure users out there using the same workdir to mount with few different
> upperdir (not at the same time) - do we care?
>
> I guess the best is to blow away index dir unless user explicitly opted-in
> to not blow it away, e.g. by using mount option indexdir= in place of workdir=
> (mutually excl.).
>

Actually, it may make sense to have -o verify_lower control this behavior.
Specifying -o verify_lower clearly states the intention of the user to mount
overlay layers that were not copied, therefore it makes sense to fail rw mount.

OTHO, with this wider meaning, I no longer like the name 'verify_lower',
for 2 reasons:
1. It is only used to verify directories
2. It is now also used to verify upper root dir

So how about 'verify_dir'/'verify_fh'/'verify_origin'?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [WIP][RFC] questions about overlay index dir design
  2017-05-18 14:15 ` Amir Goldstein
@ 2017-05-19 14:16   ` Miklos Szeredi
  2017-05-23 10:04     ` Amir Goldstein
  0 siblings, 1 reply; 6+ messages in thread
From: Miklos Szeredi @ 2017-05-19 14:16 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: open list:OVERLAY FILESYSTEM

On Thu, May 18, 2017 at 4:15 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Thu, May 18, 2017 at 2:08 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>
>>
>> The other thing I wanted to ask and it doesn't require you looking at
>> any code is how to handle backward compat of mount behavior when
>> workdir is reused for mount with a different upperdir.
>>
>> The use case may seem odd, but I already ran into it with unionmount
>> testsuite ./run --ov=1 and with xfstest overlay/014.
>> Maybe those are the only examples in the world, but maybe not...
>>
>> The problem is that if workdir/index contains hardlinks to a past upper
>> which has been rotated to lower, bad things can happen on copy up.
>> What I did so far is to store origin fh of upper root in index dir.
>> Mount forces read-only if index dir origin does not match upper root dir.
>>
>> So with current WIP the behavior is that ./run --ov=1 fails after
>> rotating upper:
>>
>> overlayfs: failed to verify origin dir (ino=19025, ret=-116) - were
>> layers copied?
>> overlayfs: failed to create directory /upper/work/index (errno: 17);
>> mounting read-only
>>  ./run --rename /mnt/a/no_foo104 /mnt/a/no_foo105
>> /mnt/a/no_foo104: Unexpected error: Read-only file system
>>
>> It's easy to fix the testsuite, but it is doing something that was legal before
>> (provide the same workdir option with different upperdir options) and now
>> it fails.
>>
>> The alternative is to blow away the index dir and recreate it when mounting
>> with an unverified upperdir. This will work fine for the testsuite, but not for
>> the obscure users out there using the same workdir to mount with few different
>> upperdir (not at the same time) - do we care?

Actually, the "at the same time" case is interesting as well.  We
should probably error out at mount time in that case, instead we
destroy the first user's work directory.  Not good.

>>
>> I guess the best is to blow away index dir unless user explicitly opted-in
>> to not blow it away, e.g. by using mount option indexdir= in place of workdir=
>> (mutually excl.).
>>
>
> Actually, it may make sense to have -o verify_lower control this behavior.
> Specifying -o verify_lower clearly states the intention of the user to mount
> overlay layers that were not copied, therefore it makes sense to fail rw mount.
>
> OTHO, with this wider meaning, I no longer like the name 'verify_lower',
> for 2 reasons:
> 1. It is only used to verify directories
> 2. It is now also used to verify upper root dir
>
> So how about 'verify_dir'/'verify_fh'/'verify_origin'?

Head spins from all these separate functions (snapshot, NFS export,
hard link unbreaking) and what they need.  There's probably no "one
size fits all" behavior.  Will need to think about this a bit...

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [WIP][RFC] questions about overlay index dir design
  2017-05-19 14:16   ` Miklos Szeredi
@ 2017-05-23 10:04     ` Amir Goldstein
  0 siblings, 0 replies; 6+ messages in thread
From: Amir Goldstein @ 2017-05-23 10:04 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: open list:OVERLAY FILESYSTEM

On Fri, May 19, 2017 at 5:16 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, May 18, 2017 at 4:15 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Thu, May 18, 2017 at 2:08 AM, Amir Goldstein <amir73il@gmail.com> wrote:

>>> The alternative is to blow away the index dir and recreate it when mounting
>>> with an unverified upperdir. This will work fine for the testsuite, but not for
>>> the obscure users out there using the same workdir to mount with few different
>>> upperdir (not at the same time) - do we care?
>
> Actually, the "at the same time" case is interesting as well.  We
> should probably error out at mount time in that case, instead we
> destroy the first user's work directory.  Not good.
>

You are right. Posted a suggested fix for that:
https://github.com/amir73il/linux/commits/ovl-dir-lock

>>>
>>> I guess the best is to blow away index dir unless user explicitly opted-in
>>> to not blow it away, e.g. by using mount option indexdir= in place of workdir=
>>> (mutually excl.).
>>>
>>
>> Actually, it may make sense to have -o verify_lower control this behavior.
>> Specifying -o verify_lower clearly states the intention of the user to mount
>> overlay layers that were not copied, therefore it makes sense to fail rw mount.
>>
>> OTHO, with this wider meaning, I no longer like the name 'verify_lower',
>> for 2 reasons:
>> 1. It is only used to verify directories
>> 2. It is now also used to verify upper root dir
>>
>> So how about 'verify_dir'/'verify_fh'/'verify_origin'?
>
> Head spins from all these separate functions (snapshot, NFS export,
> hard link unbreaking) and what they need.  There's probably no "one
> size fits all" behavior.  Will need to think about this a bit...
>

Right. So in an effort to make the head spin a bit slower,
I implemented the different VERIFY tests internally with a bitmask -
Currently -o verify_lower is a preset that is equivalent to
-o verify_dir=3 (verify lower merge dir on lookup and verify lower
root dir on mount).

The snapshot functionality of following the decoded merge dir file handle
can be enabled with mount option -o verify_dir=7, but is currently not
in the default -o verify_lower preset.

See: https://github.com/amir73il/linux/commits/ovl-verify-dir

Bigger headache?
In the end, it will give you the flexibility to modify the presets before
applying my patches, let them be configurable or blow  away the internal
bitmask altogether.

Thoughts?
Amir.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-05-23 10:04 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-17 23:08 [WIP][RFC] questions about overlay index dir design Amir Goldstein
2017-05-18  7:11 ` J. R. Okajima
2017-05-18  7:47   ` Amir Goldstein
2017-05-18 14:15 ` Amir Goldstein
2017-05-19 14:16   ` Miklos Szeredi
2017-05-23 10:04     ` Amir Goldstein

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.