* [PATCH v2 01/11] ovl: store path type in dentry
2017-04-24 9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
@ 2017-04-24 9:14 ` Amir Goldstein
2017-04-24 12:59 ` Vivek Goyal
2017-04-24 9:14 ` [PATCH v2 02/11] ovl: cram opaque boolean into type flags Amir Goldstein
` (13 subsequent siblings)
14 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24 9:14 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
We would like to add more state info to ovl_entry soon (for const ino)
and this state info would be added as type flags.
Store the type value in ovl_entry and update the UPPER and MERGE type
flags when needed, so ovl_path_type() just returns the stored value.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/overlayfs/namei.c | 1 +
fs/overlayfs/overlayfs.h | 1 +
fs/overlayfs/ovl_entry.h | 3 +++
fs/overlayfs/super.c | 1 +
fs/overlayfs/util.c | 30 ++++++++++++++++++++++++------
5 files changed, 30 insertions(+), 6 deletions(-)
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index b8b0778..8788fd7 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -338,6 +338,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
kfree(stack);
kfree(d.redirect);
dentry->d_fsdata = oe;
+ ovl_update_type(dentry, d.is_dir);
d_add(dentry, inode);
return NULL;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 741dc0b..e90a548 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -155,6 +155,7 @@ struct ovl_entry *ovl_alloc_entry(unsigned int numlower);
bool ovl_dentry_remote(struct dentry *dentry);
bool ovl_dentry_weird(struct dentry *dentry);
enum ovl_path_type ovl_path_type(struct dentry *dentry);
+enum ovl_path_type ovl_update_type(struct dentry *dentry, bool is_dir);
void ovl_path_upper(struct dentry *dentry, struct path *path);
void ovl_path_lower(struct dentry *dentry, struct path *path);
enum ovl_path_type ovl_path_real(struct dentry *dentry, struct path *path);
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 59614fa..293be5f 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -31,6 +31,8 @@ struct ovl_fs {
wait_queue_head_t copyup_wq;
};
+enum ovl_path_type;
+
/* private information held for every overlayfs dentry */
struct ovl_entry {
struct dentry *__upperdentry;
@@ -44,6 +46,7 @@ struct ovl_entry {
};
struct rcu_head rcu;
};
+ enum ovl_path_type __type;
unsigned numlower;
struct path lowerstack[];
};
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index c072a0c..671bac0 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -961,6 +961,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
kfree(stack);
root_dentry->d_fsdata = oe;
+ ovl_update_type(root_dentry, true);
realinode = d_inode(ovl_dentry_real(root_dentry));
ovl_inode_init(d_inode(root_dentry), realinode, !!upperpath.dentry);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 1953986..6a857fb 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -70,21 +70,38 @@ bool ovl_dentry_weird(struct dentry *dentry)
enum ovl_path_type ovl_path_type(struct dentry *dentry)
{
struct ovl_entry *oe = dentry->d_fsdata;
- enum ovl_path_type type = 0;
+ enum ovl_path_type type = oe->__type;
- if (oe->__upperdentry) {
- type = __OVL_PATH_UPPER;
+ /* Matches smp_wmb() in ovl_update_type() */
+ smp_rmb();
+ return type;
+}
+
+enum ovl_path_type ovl_update_type(struct dentry *dentry, bool is_dir)
+{
+ struct ovl_entry *oe = dentry->d_fsdata;
+ enum ovl_path_type type = oe->__type;
+ /* Update UPPER/MERGE flags and preserve the rest */
+ type &= ~(__OVL_PATH_UPPER | __OVL_PATH_MERGE);
+ if (oe->__upperdentry) {
+ type |= __OVL_PATH_UPPER;
/*
- * Non-dir dentry can hold lower dentry from previous
- * location.
+ * Non-dir dentry can hold lower dentry from before
+ * copy-up.
*/
- if (oe->numlower && d_is_dir(dentry))
+ if (oe->numlower && is_dir)
type |= __OVL_PATH_MERGE;
} else {
if (oe->numlower > 1)
type |= __OVL_PATH_MERGE;
}
+ /*
+ * Make sure type is consistent with __upperdentry before making it
+ * visible to ovl_path_type().
+ */
+ smp_wmb();
+ oe->__type = type;
return type;
}
@@ -220,6 +237,7 @@ void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry)
*/
smp_wmb();
oe->__upperdentry = upperdentry;
+ ovl_update_type(dentry, d_is_dir(dentry));
}
void ovl_inode_init(struct inode *inode, struct inode *realinode, bool is_upper)
--
2.7.4
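
The flag update in ovl_update_type() can be modelled outside the kernel. The sketch below is a userspace approximation, not the kernel code: struct entry_model and its fields are hypothetical stand-ins for the ovl_entry members the patch reads, and only the bit arithmetic is meant to match.

```c
#include <stdbool.h>

/* Flag values mirror __OVL_PATH_UPPER and __OVL_PATH_MERGE from the patch. */
enum path_type {
	PATH_UPPER = 1 << 0,
	PATH_MERGE = 1 << 1,
};

/* Hypothetical stand-in for the ovl_entry fields used by the update. */
struct entry_model {
	bool has_upper;     /* models oe->__upperdentry != NULL */
	unsigned numlower;  /* models oe->numlower */
	unsigned type;      /* models oe->__type */
};

/* Recompute UPPER/MERGE, keep every other bit, and store the result. */
unsigned update_type(struct entry_model *e, bool is_dir)
{
	unsigned type = e->type & ~(unsigned)(PATH_UPPER | PATH_MERGE);

	if (e->has_upper) {
		type |= PATH_UPPER;
		/* A non-dir may still hold its lower dentry from before copy-up. */
		if (e->numlower && is_dir)
			type |= PATH_MERGE;
	} else if (e->numlower > 1) {
		type |= PATH_MERGE;
	}
	e->type = type;
	return type;
}
```

Because the two recomputed bits are masked out first, any bit outside UPPER/MERGE survives the update, which is what later patches that add more type flags would rely on.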
* Re: [PATCH v2 01/11] ovl: store path type in dentry
2017-04-24 9:14 ` [PATCH v2 01/11] ovl: store path type in dentry Amir Goldstein
@ 2017-04-24 12:59 ` Vivek Goyal
2017-04-24 13:10 ` Amir Goldstein
0 siblings, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2017-04-24 12:59 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel
On Mon, Apr 24, 2017 at 12:14:06PM +0300, Amir Goldstein wrote:
[..]
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index c072a0c..671bac0 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -961,6 +961,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> kfree(stack);
>
> root_dentry->d_fsdata = oe;
> + ovl_update_type(root_dentry, true);
>
> realinode = d_inode(ovl_dentry_real(root_dentry));
> ovl_inode_init(d_inode(root_dentry), realinode, !!upperpath.dentry);
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index 1953986..6a857fb 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -70,21 +70,38 @@ bool ovl_dentry_weird(struct dentry *dentry)
> enum ovl_path_type ovl_path_type(struct dentry *dentry)
> {
> struct ovl_entry *oe = dentry->d_fsdata;
> - enum ovl_path_type type = 0;
> + enum ovl_path_type type = oe->__type;
>
> - if (oe->__upperdentry) {
> - type = __OVL_PATH_UPPER;
> + /* Matches smp_wmb() in ovl_update_type() */
> + smp_rmb();
> + return type;
Hi Amir,
I never manage to understand barriers, so I will ask: why is this barrier
required, and what can go wrong if we don't use it?
Vivek
* Re: [PATCH v2 01/11] ovl: store path type in dentry
2017-04-24 12:59 ` Vivek Goyal
@ 2017-04-24 13:10 ` Amir Goldstein
2017-04-24 13:36 ` Vivek Goyal
0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24 13:10 UTC (permalink / raw)
To: Vivek Goyal; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel
On Mon, Apr 24, 2017 at 3:59 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Mon, Apr 24, 2017 at 12:14:06PM +0300, Amir Goldstein wrote:
>
> [..]
> > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > index c072a0c..671bac0 100644
> > --- a/fs/overlayfs/super.c
> > +++ b/fs/overlayfs/super.c
> > @@ -961,6 +961,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> > kfree(stack);
> >
> > root_dentry->d_fsdata = oe;
> > + ovl_update_type(root_dentry, true);
> >
> > realinode = d_inode(ovl_dentry_real(root_dentry));
> > ovl_inode_init(d_inode(root_dentry), realinode, !!upperpath.dentry);
> > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> > index 1953986..6a857fb 100644
> > --- a/fs/overlayfs/util.c
> > +++ b/fs/overlayfs/util.c
> > @@ -70,21 +70,38 @@ bool ovl_dentry_weird(struct dentry *dentry)
> > enum ovl_path_type ovl_path_type(struct dentry *dentry)
> > {
> > struct ovl_entry *oe = dentry->d_fsdata;
> > - enum ovl_path_type type = 0;
> > + enum ovl_path_type type = oe->__type;
> >
> > - if (oe->__upperdentry) {
> > - type = __OVL_PATH_UPPER;
> > + /* Matches smp_wmb() in ovl_update_type() */
> > + smp_rmb();
> > + return type;
>
> Hi Amir,
>
> I never manage to understand barriers so I will ask. Why this barrier is
> required and what can go wrong if we don't use this barrier.
>
Hi Vivek,
Miklos was kind enough to answer that question for me when he
commented on the missing memory barrier in v1 of the patch:
http://www.spinics.net/lists/linux-unionfs/msg01687.html
Whether or not I got it right, we shall see shortly ;-)
Amir.
* Re: [PATCH v2 01/11] ovl: store path type in dentry
2017-04-24 13:10 ` Amir Goldstein
@ 2017-04-24 13:36 ` Vivek Goyal
2017-04-24 13:41 ` Amir Goldstein
0 siblings, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2017-04-24 13:36 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel
On Mon, Apr 24, 2017 at 04:10:30PM +0300, Amir Goldstein wrote:
> On Mon, Apr 24, 2017 at 3:59 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Mon, Apr 24, 2017 at 12:14:06PM +0300, Amir Goldstein wrote:
> >
> > [..]
> > > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > > index c072a0c..671bac0 100644
> > > --- a/fs/overlayfs/super.c
> > > +++ b/fs/overlayfs/super.c
> > > @@ -961,6 +961,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> > > kfree(stack);
> > >
> > > root_dentry->d_fsdata = oe;
> > > + ovl_update_type(root_dentry, true);
> > >
> > > realinode = d_inode(ovl_dentry_real(root_dentry));
> > > ovl_inode_init(d_inode(root_dentry), realinode, !!upperpath.dentry);
> > > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> > > index 1953986..6a857fb 100644
> > > --- a/fs/overlayfs/util.c
> > > +++ b/fs/overlayfs/util.c
> > > @@ -70,21 +70,38 @@ bool ovl_dentry_weird(struct dentry *dentry)
> > > enum ovl_path_type ovl_path_type(struct dentry *dentry)
> > > {
> > > struct ovl_entry *oe = dentry->d_fsdata;
> > > - enum ovl_path_type type = 0;
> > > + enum ovl_path_type type = oe->__type;
> > >
> > > - if (oe->__upperdentry) {
> > > - type = __OVL_PATH_UPPER;
> > > + /* Matches smp_wmb() in ovl_update_type() */
> > > + smp_rmb();
> > > + return type;
> >
> > Hi Amir,
> >
> > I never manage to understand barriers so I will ask. Why this barrier is
> > required and what can go wrong if we don't use this barrier.
> >
>
> Hi Vivek,
>
> Miklos was kind enough to answer that question for me when he made
> the comment about missing memmory barrier on v1 of the patch:
> http://www.spinics.net/lists/linux-unionfs/msg01687.html
>
> Whether or not I got it right, we shall see shortly ;-)
Hi Amir,
Thanks. OK, so we are making sure that if another CPU sees the updated
->type, it is guaranteed to see the updated ->upperdentry as well.
I feel it is worth putting a shortened version of Miklos's explanation
in the comments. It will help us recall why the barrier is there. But
maybe it is just me, and it is obvious to others.
Vivek
* Re: [PATCH v2 01/11] ovl: store path type in dentry
2017-04-24 13:36 ` Vivek Goyal
@ 2017-04-24 13:41 ` Amir Goldstein
0 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24 13:41 UTC (permalink / raw)
To: Vivek Goyal; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel
On Mon, Apr 24, 2017 at 4:36 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Mon, Apr 24, 2017 at 04:10:30PM +0300, Amir Goldstein wrote:
>> On Mon, Apr 24, 2017 at 3:59 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> >
>> > On Mon, Apr 24, 2017 at 12:14:06PM +0300, Amir Goldstein wrote:
>> >
>> > [..]
>> > > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
>> > > index c072a0c..671bac0 100644
>> > > --- a/fs/overlayfs/super.c
>> > > +++ b/fs/overlayfs/super.c
>> > > @@ -961,6 +961,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>> > > kfree(stack);
>> > >
>> > > root_dentry->d_fsdata = oe;
>> > > + ovl_update_type(root_dentry, true);
>> > >
>> > > realinode = d_inode(ovl_dentry_real(root_dentry));
>> > > ovl_inode_init(d_inode(root_dentry), realinode, !!upperpath.dentry);
>> > > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
>> > > index 1953986..6a857fb 100644
>> > > --- a/fs/overlayfs/util.c
>> > > +++ b/fs/overlayfs/util.c
>> > > @@ -70,21 +70,38 @@ bool ovl_dentry_weird(struct dentry *dentry)
>> > > enum ovl_path_type ovl_path_type(struct dentry *dentry)
>> > > {
>> > > struct ovl_entry *oe = dentry->d_fsdata;
>> > > - enum ovl_path_type type = 0;
>> > > + enum ovl_path_type type = oe->__type;
>> > >
>> > > - if (oe->__upperdentry) {
>> > > - type = __OVL_PATH_UPPER;
>> > > + /* Matches smp_wmb() in ovl_update_type() */
>> > > + smp_rmb();
>> > > + return type;
>> >
>> > Hi Amir,
>> >
>> > I never manage to understand barriers so I will ask. Why this barrier is
>> > required and what can go wrong if we don't use this barrier.
>> >
>>
>> Hi Vivek,
>>
>> Miklos was kind enough to answer that question for me when he made
>> the comment about missing memmory barrier on v1 of the patch:
>> http://www.spinics.net/lists/linux-unionfs/msg01687.html
>>
>> Whether or not I got it right, we shall see shortly ;-)
>
> Hi Amir,
>
> Thanks. Ok, so we are making sure if other cpu sees updated ->type, then
> it is guaranteed that it isses updated ->upperdentry as well.
>
> I feel it is worth to put a shortened version of explanation from miklos
> in the comments. It will help to recall why did we put it. But it is just
> me. May be it is obvious to others.
>
I reckon memory barriers are obvious to few.
However, I think the documentation I left is quite standard practice:
- smp_rmb() has a comment to point to the matching smp_wmb()
- smp_wmb() has a comment to explain what the barrier protects:
* Make sure type is consistent with __upperdentry before making it
* visible to ovl_path_type()
(i.e. to lockless readers of oe->__type)
Amir.
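
The pairing discussed in this sub-thread can be sketched in portable C11. This is an analogue under stated assumptions, not the kernel code: struct entry_model, publish_upper() and read_upper_if_flagged() are hypothetical names, and release/acquire fences are used as the closest standard counterpart to smp_wmb()/smp_rmb() (they are somewhat stronger than the kernel primitives).

```c
#include <stdatomic.h>
#include <stddef.h>

#define TYPE_UPPER 1u

/* Hypothetical stand-in for the two ovl_entry fields being ordered. */
struct entry_model {
	_Atomic(void *) upper;  /* models oe->__upperdentry */
	atomic_uint type;       /* models oe->__type */
};

/* Writer: store the pointer first, the flag derived from it second. */
void publish_upper(struct entry_model *e, void *upper)
{
	atomic_store_explicit(&e->upper, upper, memory_order_relaxed);
	/* Plays the role of smp_wmb() in ovl_update_type(). */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&e->type, TYPE_UPPER, memory_order_relaxed);
}

/* Reader: load the flag first, then the pointer it describes. */
void *read_upper_if_flagged(struct entry_model *e)
{
	unsigned type = atomic_load_explicit(&e->type, memory_order_relaxed);

	/* Plays the role of smp_rmb() in ovl_path_type(). */
	atomic_thread_fence(memory_order_acquire);
	if (!(type & TYPE_UPPER))
		return NULL;
	/* Having observed the flag, this load cannot see the stale NULL. */
	return atomic_load_explicit(&e->upper, memory_order_relaxed);
}
```

Without the two fences, a lockless reader could observe the UPPER flag and still read a NULL upper pointer, which is the inconsistency the comment on smp_wmb() describes.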
* [PATCH v2 02/11] ovl: cram opaque boolean into type flags
2017-04-24 9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
2017-04-24 9:14 ` [PATCH v2 01/11] ovl: store path type in dentry Amir Goldstein
@ 2017-04-24 9:14 ` Amir Goldstein
2017-04-24 9:14 ` [PATCH v2 03/11] ovl: check if all layers are on the same fs Amir Goldstein
` (12 subsequent siblings)
14 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24 9:14 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
We are going to add more state info to ovl_entry soon (for const ino)
and this state info would be added as type flags.
It makes sense to treat 'opaque' in a similar way, so instead of using
a boolean member in ovl_entry use a type bit to represent opaqueness.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/overlayfs/namei.c | 9 +++++----
fs/overlayfs/overlayfs.h | 2 ++
fs/overlayfs/ovl_entry.h | 1 -
fs/overlayfs/util.c | 5 +++--
4 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 8788fd7..d660177 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -224,7 +224,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
struct dentry *upperdir, *upperdentry = NULL;
unsigned int ctr = 0;
struct inode *inode = NULL;
- bool upperopaque = false;
+ enum ovl_path_type type = 0;
char *upperredirect = NULL;
struct dentry *this;
unsigned int i;
@@ -261,7 +261,8 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
if (d.redirect[0] == '/')
poe = dentry->d_sb->s_root->d_fsdata;
}
- upperopaque = d.opaque;
+ if (d.opaque)
+ type |= __OVL_PATH_OPAQUE;
}
if (!d.stop && poe->numlower) {
@@ -331,7 +332,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
}
revert_creds(old_cred);
- oe->opaque = upperopaque;
+ oe->__type = type;
oe->redirect = upperredirect;
oe->__upperdentry = upperdentry;
memcpy(oe->lowerstack, stack, sizeof(struct path) * ctr);
@@ -372,7 +373,7 @@ bool ovl_lower_positive(struct dentry *dentry)
* whiteout.
*/
if (!dentry->d_inode)
- return oe->opaque;
+ return OVL_TYPE_OPAQUE(oe->__type);
/* Negative upper -> positive lower */
if (!oe->__upperdentry)
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index e90a548..9420101 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -12,10 +12,12 @@
enum ovl_path_type {
__OVL_PATH_UPPER = (1 << 0),
__OVL_PATH_MERGE = (1 << 1),
+ __OVL_PATH_OPAQUE = (1 << 2),
};
#define OVL_TYPE_UPPER(type) ((type) & __OVL_PATH_UPPER)
#define OVL_TYPE_MERGE(type) ((type) & __OVL_PATH_MERGE)
+#define OVL_TYPE_OPAQUE(type) ((type) & __OVL_PATH_OPAQUE)
#define OVL_XATTR_PREFIX XATTR_TRUSTED_PREFIX "overlay."
#define OVL_XATTR_OPAQUE OVL_XATTR_PREFIX "opaque"
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 293be5f..12c4922 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -41,7 +41,6 @@ struct ovl_entry {
struct {
u64 version;
const char *redirect;
- bool opaque;
bool copying;
};
struct rcu_head rcu;
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 6a857fb..dce4141 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -181,7 +181,8 @@ void ovl_set_dir_cache(struct dentry *dentry, struct ovl_dir_cache *cache)
bool ovl_dentry_is_opaque(struct dentry *dentry)
{
struct ovl_entry *oe = dentry->d_fsdata;
- return oe->opaque;
+
+ return OVL_TYPE_OPAQUE(oe->__type);
}
bool ovl_dentry_is_whiteout(struct dentry *dentry)
@@ -193,7 +194,7 @@ void ovl_dentry_set_opaque(struct dentry *dentry)
{
struct ovl_entry *oe = dentry->d_fsdata;
- oe->opaque = true;
+ oe->__type |= __OVL_PATH_OPAQUE;
}
bool ovl_redirect_dir(struct super_block *sb)
--
2.7.4
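
One detail of replacing the boolean with a type bit: OVL_TYPE_OPAQUE() evaluates to the raw bit value (4), not to 0 or 1. Returning it from a function declared bool normalises the value, as the userspace sketch below illustrates. The names are hypothetical stand-ins that mirror the patch's enum and macro.

```c
#include <stdbool.h>

enum path_type {
	PATH_UPPER  = 1 << 0,
	PATH_MERGE  = 1 << 1,
	PATH_OPAQUE = 1 << 2,
};

/* Same shape as OVL_TYPE_OPAQUE(): yields 0 or the raw bit, here 4. */
#define TYPE_OPAQUE(type) ((type) & PATH_OPAQUE)

/* Conversion to bool turns any non-zero value into true. */
bool is_opaque(unsigned type)
{
	return TYPE_OPAQUE(type);
}

/* Setting the bit leaves UPPER and MERGE untouched. */
unsigned set_opaque(unsigned type)
{
	return type | PATH_OPAQUE;
}
```

A caller that stored the macro result in an int and compared it with 1 would get the wrong answer, so going through a bool-returning helper such as ovl_dentry_is_opaque() is the safer pattern.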
* [PATCH v2 03/11] ovl: check if all layers are on the same fs
2017-04-24 9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
2017-04-24 9:14 ` [PATCH v2 01/11] ovl: store path type in dentry Amir Goldstein
2017-04-24 9:14 ` [PATCH v2 02/11] ovl: cram opaque boolean into type flags Amir Goldstein
@ 2017-04-24 9:14 ` Amir Goldstein
2017-04-24 9:14 ` [PATCH v2 04/11] ovl: store file handle of lower inode on copy up Amir Goldstein
` (11 subsequent siblings)
14 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24 9:14 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
Some features can only work when all lower layers are on the same fs
and some features require that upper layer is also on the same fs.
Test those conditions during mount time, so features can check them later.
Add helper ovl_same_lower_sb() to return the common super block in case
all lower layers are on the same fs and helper ovl_same_sb() to return
the super block common to all layers.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/overlayfs/overlayfs.h | 2 ++
fs/overlayfs/ovl_entry.h | 3 +++
fs/overlayfs/super.c | 9 +++++++++
fs/overlayfs/util.c | 14 ++++++++++++++
4 files changed, 28 insertions(+)
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 9420101..48d0dae 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -153,6 +153,8 @@ int ovl_want_write(struct dentry *dentry);
void ovl_drop_write(struct dentry *dentry);
struct dentry *ovl_workdir(struct dentry *dentry);
const struct cred *ovl_override_creds(struct super_block *sb);
+struct super_block *ovl_same_lower_sb(struct super_block *sb);
+struct super_block *ovl_same_sb(struct super_block *sb);
struct ovl_entry *ovl_alloc_entry(unsigned int numlower);
bool ovl_dentry_remote(struct dentry *dentry);
bool ovl_dentry_weird(struct dentry *dentry);
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 12c4922..41708bf 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -29,6 +29,9 @@ struct ovl_fs {
const struct cred *creator_cred;
bool tmpfile;
wait_queue_head_t copyup_wq;
+ /* sb common to all (or all lower) layers */
+ struct super_block *same_lower_sb;
+ struct super_block *same_sb;
};
enum ovl_path_type;
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 671bac0..b8830ee 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -898,6 +898,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
ufs->lower_mnt = kcalloc(numlower, sizeof(struct vfsmount *), GFP_KERNEL);
if (ufs->lower_mnt == NULL)
goto out_put_workdir;
+
for (i = 0; i < numlower; i++) {
struct vfsmount *mnt = clone_private_mount(&stack[i]);
@@ -914,11 +915,19 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
ufs->lower_mnt[ufs->numlower] = mnt;
ufs->numlower++;
+
+ /* Check if all lower layers are on same sb */
+ if (i == 0)
+ ufs->same_lower_sb = mnt->mnt_sb;
+ else if (ufs->same_lower_sb != mnt->mnt_sb)
+ ufs->same_lower_sb = NULL;
}
/* If the upper fs is nonexistent, we mark overlayfs r/o too */
if (!ufs->upper_mnt)
sb->s_flags |= MS_RDONLY;
+ else if (ufs->upper_mnt->mnt_sb == ufs->same_lower_sb)
+ ufs->same_sb = ufs->same_lower_sb;
if (remote)
sb->s_d_op = &ovl_reval_dentry_operations;
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index dce4141..43dcdf5 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -41,6 +41,20 @@ const struct cred *ovl_override_creds(struct super_block *sb)
return override_creds(ofs->creator_cred);
}
+struct super_block *ovl_same_lower_sb(struct super_block *sb)
+{
+ struct ovl_fs *ofs = sb->s_fs_info;
+
+ return ofs->same_lower_sb;
+}
+
+struct super_block *ovl_same_sb(struct super_block *sb)
+{
+ struct ovl_fs *ofs = sb->s_fs_info;
+
+ return ofs->same_sb;
+}
+
struct ovl_entry *ovl_alloc_entry(unsigned int numlower)
{
size_t size = offsetof(struct ovl_entry, lowerstack[numlower]);
--
2.7.4
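
The mount-time check reduces to a small pointer-identity loop. The following userspace sketch uses a hypothetical struct sb_model in place of struct super_block and assumes, as in the patch, that a layer's super block pointer is never NULL.

```c
#include <stddef.h>

/* Opaque stand-in for struct super_block; only identity matters here. */
struct sb_model { int id; };

/* Return the sb shared by all lower layers, or NULL if they differ. */
const struct sb_model *same_lower_sb(const struct sb_model *const *lower,
				     size_t numlower)
{
	const struct sb_model *same = NULL;

	for (size_t i = 0; i < numlower; i++) {
		if (i == 0)
			same = lower[i];
		else if (same != lower[i])
			same = NULL;  /* sticks: no layer has a NULL sb */
	}
	return same;
}

/* Return the sb shared by the upper and all lower layers, or NULL. */
const struct sb_model *same_sb(const struct sb_model *upper,
			       const struct sb_model *same_lower)
{
	if (upper && upper == same_lower)
		return same_lower;
	return NULL;
}
```

Once a mismatch resets the candidate to NULL it can never match a later layer, so a single pass is enough even when the first and last layers happen to share a super block.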
* [PATCH v2 04/11] ovl: store file handle of lower inode on copy up
2017-04-24 9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
` (2 preceding siblings ...)
2017-04-24 9:14 ` [PATCH v2 03/11] ovl: check if all layers are on the same fs Amir Goldstein
@ 2017-04-24 9:14 ` Amir Goldstein
2017-04-24 13:32 ` kbuild test robot
` (2 more replies)
2017-04-24 9:14 ` [PATCH v2 05/11] ovl: lookup redirect by file handle Amir Goldstein
` (10 subsequent siblings)
14 siblings, 3 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24 9:14 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
Sometimes it is interesting to know if an upper file is pure
upper or a copy up target, and if it is a copy up target, it
may be interesting to find the copy up origin.
This will be used to preserve lower inode numbers across copy up.
Store the lower inode file handle in upper inode xattr overlay.fh
on copy up to use it later for these cases.
On failure to encode lower file handle, store an invalid 'null'
handle, so we can always use the overlay.fh xattr to distinguish
between a copy up and a pure upper inode.
If lower fs does not support NFS export ops or if not all lower
layers are on the same fs, don't try to encode a lower file handle
and use the 'null' handle instead.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/overlayfs/copy_up.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++++
fs/overlayfs/overlayfs.h | 15 ++++++++
fs/overlayfs/ovl_entry.h | 2 +
fs/overlayfs/super.c | 11 ++++++
fs/overlayfs/util.c | 14 +++++++
5 files changed, 140 insertions(+)
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 906ea6c..1a967b9 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -20,6 +20,7 @@
#include <linux/namei.h>
#include <linux/fdtable.h>
#include <linux/ratelimit.h>
+#include <linux/exportfs.h>
#include "overlayfs.h"
#include "ovl_entry.h"
@@ -232,6 +233,95 @@ int ovl_set_attr(struct dentry *upperdentry, struct kstat *stat)
return err;
}
+static struct ovl_fh *ovl_get_fh(struct dentry *lower)
+{
+ const struct export_operations *nop = lower->d_sb->s_export_op;
+ struct ovl_fh *fh;
+ int fh_type, fh_len, dwords;
+ void *buf = NULL;
+ void *ret = NULL;
+ int buflen = MAX_HANDLE_SZ;
+ int err;
+
+ /* Do not encode file handle if we cannot decode it later */
+ err = -EOPNOTSUPP;
+ if (!nop || !nop->fh_to_dentry)
+ goto out_err;
+
+ err = -ENOMEM;
+ buf = kmalloc(buflen, GFP_TEMPORARY);
+ if (!buf)
+ goto out_err;
+
+ fh = buf;
+ dwords = (buflen - offsetof(struct ovl_fh, fid)) >> 2;
+ fh_type = exportfs_encode_fh(lower,
+ (struct fid *)fh->fid,
+ &dwords, 0);
+ fh_len = (dwords << 2) + offsetof(struct ovl_fh, fid);
+
+ err = -EOVERFLOW;
+ if (fh_len > buflen || fh_type <= 0 || fh_type == FILEID_INVALID)
+ goto out_err;
+
+ fh->version = OVL_FH_VERSION;
+ fh->magic = OVL_FH_MAGIC;
+ fh->type = fh_type;
+ fh->len = fh_len;
+
+ err = -ENOMEM;
+ ret = kmalloc(fh_len, GFP_KERNEL);
+ if (!ret)
+ goto out_err;
+
+ memcpy(ret, buf, fh_len);
+
+ kfree(buf);
+ return ret;
+
+out_err:
+ pr_warn_ratelimited("overlay: failed to get redirect fh (%i)\n", err);
+ kfree(buf);
+ kfree(ret);
+ return ERR_PTR(err);
+}
+
+static struct ovl_fh null_fh = {
+ .version = OVL_FH_VERSION,
+ .magic = OVL_FH_MAGIC,
+ .type = FILEID_INVALID,
+ .len = sizeof(struct ovl_fh),
+};
+
+static int ovl_set_lower_fh(struct dentry *dentry, struct dentry *upper)
+{
+ int err;
+ const struct ovl_fh *fh = NULL;
+
+ if (ovl_redirect_fh(dentry->d_sb))
+ fh = ovl_get_fh(ovl_dentry_lower(dentry));
+ /*
+ * On failure to encode lower fh, store an invalid 'null' fh, so
+ * we can always use the overlay.fh xattr to distinguish between
+ * a copy up and a pure upper inode. If lower fs does not support
+ * encoding fh, don't try to encode again.
+ */
+ err = PTR_ERR(fh);
+ if (IS_ERR_OR_NULL(fh)) {
+ if (err == -EOPNOTSUPP) {
+ pr_warn("overlay: file handle not supported by lower - turning off redirect_fh\n");
+ ovl_clear_redirect_fh(dentry->d_sb);
+ }
+ fh = &null_fh;
+ }
+
+ err = ovl_do_setxattr(upper, OVL_XATTR_FH, fh, fh->len, 0);
+
+ if (fh != &null_fh)
+ kfree(fh);
+ return err;
+}
+
static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir,
struct dentry *dentry, struct path *lowerpath,
struct kstat *stat, const char *link,
@@ -316,6 +406,14 @@ static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir,
if (err)
goto out_cleanup;
+ /*
+ * Store file handle of lower inode in upper inode xattr to
+ * allow lookup of the copy up origin inode.
+ */
+ err = ovl_set_lower_fh(dentry, temp);
+ if (err)
+ goto out_cleanup;
+
if (tmpfile)
err = ovl_do_link(temp, udir, upper, true);
else
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 48d0dae..c3cfbc5 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -22,6 +22,7 @@ enum ovl_path_type {
#define OVL_XATTR_PREFIX XATTR_TRUSTED_PREFIX "overlay."
#define OVL_XATTR_OPAQUE OVL_XATTR_PREFIX "opaque"
#define OVL_XATTR_REDIRECT OVL_XATTR_PREFIX "redirect"
+#define OVL_XATTR_FH OVL_XATTR_PREFIX "fh"
#define OVL_ISUPPER_MASK 1UL
@@ -148,6 +149,18 @@ static inline struct inode *ovl_inode_real(struct inode *inode, bool *is_upper)
return (struct inode *) (x & ~OVL_ISUPPER_MASK);
}
+/* redirect data format for redirect by file handle */
+struct ovl_fh {
+ unsigned char version; /* 0 */
+ unsigned char magic; /* 0xfb */
+ unsigned char len; /* size of this header + size of fid */
+ unsigned char type; /* fid_type of fid */
+ unsigned char fid[0]; /* file identifier */
+} __packed;
+
+#define OVL_FH_VERSION 0
+#define OVL_FH_MAGIC 0xfb
+
/* util.c */
int ovl_want_write(struct dentry *dentry);
void ovl_drop_write(struct dentry *dentry);
@@ -175,6 +188,8 @@ bool ovl_redirect_dir(struct super_block *sb);
void ovl_clear_redirect_dir(struct super_block *sb);
const char *ovl_dentry_get_redirect(struct dentry *dentry);
void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect);
+bool ovl_redirect_fh(struct super_block *sb);
+void ovl_clear_redirect_fh(struct super_block *sb);
void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry);
void ovl_inode_init(struct inode *inode, struct inode *realinode,
bool is_upper);
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 41708bf..2172dc5 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -32,6 +32,8 @@ struct ovl_fs {
/* sb common to all (or all lower) layers */
struct super_block *same_lower_sb;
struct super_block *same_sb;
+ /* redirect by file handle */
+ bool redirect_fh;
};
enum ovl_path_type;
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index b8830ee..34632ec 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -17,6 +17,7 @@
#include <linux/statfs.h>
#include <linux/seq_file.h>
#include <linux/posix_acl_xattr.h>
+#include <linux/exportfs.h>
#include "overlayfs.h"
#include "ovl_entry.h"
@@ -929,6 +930,16 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
else if (ufs->upper_mnt->mnt_sb == ufs->same_lower_sb)
ufs->same_sb = ufs->same_lower_sb;
+ /*
+ * Redirect by file handle is used to find a lower entry in one of the
+ * lower layers, so the handle must be unique across all lower layers.
+ * Therefore, enable redirect by file handle, only if all lower layers
+ * are on the same sb which supports lookup by file handles.
+ */
+ if (ufs->same_lower_sb && ufs->same_lower_sb->s_export_op &&
+ ufs->same_lower_sb->s_export_op->fh_to_dentry)
+ ufs->redirect_fh = true;
+
if (remote)
sb->s_d_op = &ovl_reval_dentry_operations;
else
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 43dcdf5..b3bc117 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -240,6 +240,20 @@ void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect)
oe->redirect = redirect;
}
+bool ovl_redirect_fh(struct super_block *sb)
+{
+ struct ovl_fs *ofs = sb->s_fs_info;
+
+ return ofs->redirect_fh;
+}
+
+void ovl_clear_redirect_fh(struct super_block *sb)
+{
+ struct ovl_fs *ofs = sb->s_fs_info;
+
+ ofs->redirect_fh = false;
+}
+
void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry)
{
struct ovl_entry *oe = dentry->d_fsdata;
--
2.7.4
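
The length arithmetic in ovl_get_fh() is easy to check in isolation. The sketch below is a userspace model: struct fh_model mirrors the header layout from the patch, BUF_SIZE assumes MAX_HANDLE_SZ is 128, and the two helper names are hypothetical.

```c
#include <stddef.h>

#define BUF_SIZE 128  /* assumed value of MAX_HANDLE_SZ */

/* Mirrors the header from the patch (GCC/Clang packed attribute). */
struct fh_model {
	unsigned char version;
	unsigned char magic;
	unsigned char len;    /* header size + fid size, in bytes */
	unsigned char type;   /* fid type */
	unsigned char fid[];  /* file identifier */
} __attribute__((packed));

/* 32-bit words available for the fid in a buffer of buflen bytes. */
int max_fid_dwords(int buflen)
{
	return (buflen - (int)offsetof(struct fh_model, fid)) >> 2;
}

/* Total handle length in bytes for a fid of the given size in words. */
int fh_total_len(int dwords)
{
	return (dwords << 2) + (int)offsetof(struct fh_model, fid);
}
```

With a 4-byte header and a 128-byte buffer there is room for 31 words of fid, and the largest total length is 128, which still fits the one-byte len field. The 'null' handle is the header alone, so its len equals sizeof(struct fh_model), which is 4 in this model.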
* Re: [PATCH v2 04/11] ovl: store file handle of lower inode on copy up
2017-04-24 9:14 ` [PATCH v2 04/11] ovl: store file handle of lower inode on copy up Amir Goldstein
@ 2017-04-24 13:32 ` kbuild test robot
2017-04-24 13:57 ` Amir Goldstein
2017-04-25 14:53 ` Miklos Szeredi
2017-04-26 9:39 ` Miklos Szeredi
2 siblings, 1 reply; 69+ messages in thread
From: kbuild test robot @ 2017-04-24 13:32 UTC (permalink / raw)
To: Amir Goldstein
Cc: kbuild-all, Miklos Szeredi, Vivek Goyal, Al Viro, linux-unionfs,
linux-fsdevel
Hi Amir,
[auto build test WARNING on miklos-vfs/overlayfs-next]
[also build test WARNING on v4.11-rc8 next-20170424]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Amir-Goldstein/overlayfs-constant-inode-numbers/20170424-175555
base: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next
coccinelle warnings: (new ones prefixed by >>)
>> fs/overlayfs/copy_up.c:309:7-14: ERROR: PTR_ERR applied after initialization to constant on line 299
vim +309 fs/overlayfs/copy_up.c
293 .len = sizeof(struct ovl_fh),
294 };
295
296 static int ovl_set_lower_fh(struct dentry *dentry, struct dentry *upper)
297 {
298 int err;
> 299 const struct ovl_fh *fh = NULL;
300
301 if (ovl_redirect_fh(dentry->d_sb))
302 fh = ovl_get_fh(ovl_dentry_lower(dentry));
303 /*
304 * On failure to encode lower fh, store an invalid 'null' fh, so
305 * we can always use the overlay.fh xattr to distignuish between
306 * a copy up and a pure upper inode. If lower fs does not support
307 * encoding fh, don't try to encode again.
308 */
> 309 err = PTR_ERR(fh);
310 if (IS_ERR_OR_NULL(fh)) {
311 if (err == -EOPNOTSUPP) {
312 pr_warn("overlay: file handle not supported by lower - turning off redirect_fh\n");
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
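
The pattern flagged above is PTR_ERR() applied to a pointer that may still hold its initial NULL. The sketch below re-creates the error-pointer helpers in userspace to show what that evaluates to. The lower-case helper names and should_disable_redirect() are hypothetical, and MAX_ERRNO is taken to be 4095 as in the kernel.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace re-creations of the kernel's error-pointer helpers. */
void *err_ptr(long error) { return (void *)(intptr_t)error; }
long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }

bool is_err(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

bool is_err_or_null(const void *ptr)
{
	return !ptr || is_err(ptr);
}

/* True only when the handle lookup reported "not supported". */
bool should_disable_redirect(const void *fh)
{
	long err = ptr_err(fh);  /* 0 when fh is still the initial NULL */

	return is_err_or_null(fh) && err == -EOPNOTSUPP;
}
```

In this model a NULL handle yields err == 0, so the -EOPNOTSUPP branch is skipped and the code falls through to the 'null' handle. That suggests the flagged sequence behaves as intended, although taking PTR_ERR() only inside the IS_ERR() branch would avoid the warning.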
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v2 04/11] ovl: store file handle of lower inode on copy up
2017-04-24 13:32 ` kbuild test robot
@ 2017-04-24 13:57 ` Amir Goldstein
0 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24 13:57 UTC (permalink / raw)
To: kbuild test robot
Cc: kbuild-all, Miklos Szeredi, Vivek Goyal, Al Viro, linux-unionfs,
linux-fsdevel
On Mon, Apr 24, 2017 at 4:32 PM, kbuild test robot <lkp@intel.com> wrote:
> Hi Amir,
>
> [auto build test WARNING on miklos-vfs/overlayfs-next]
> [also build test WARNING on v4.11-rc8 next-20170424]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url: https://github.com/0day-ci/linux/commits/Amir-Goldstein/overlayfs-constant-inode-numbers/20170424-175555
> base: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next
>
>
> coccinelle warnings: (new ones prefixed by >>)
>
>>> fs/overlayfs/copy_up.c:309:7-14: ERROR: PTR_ERR applied after initialization to constant on line 299
Why is this wrong?
The pointer tested is not const - its referenced data is.
Anyway, this pointed out another thing worth a warning:
The static variable null_fh was meant to be const.
I wonder if coccinelle has a way to figure out this pattern?
I guess not.
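
One possible reading of the warning is that "constant" refers to the NULL initializer on line 299 rather than to the const qualifier. The sketch below is a user-space model, not kernel code: the model_* helpers are hypothetical stand-ins written for illustration, assuming the usual convention that the top 4095 address values encode errno codes. It shows that reading the error from a pointer that is still NULL yields 0, so the -EOPNOTSUPP branch is only reachable when the encode call actually ran.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's error-pointer helpers (assumed). */
#define MODEL_MAX_ERRNO 4095

static inline void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool model_is_err(const void *ptr)
{
	/* Error codes live in the last MODEL_MAX_ERRNO address values. */
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

static inline bool model_is_err_or_null(const void *ptr)
{
	return !ptr || model_is_err(ptr);
}

/*
 * Hypothetical helper mirroring lines 309-310 of the quoted listing:
 * read the error value first, then decide whether the handle is usable.
 * Returns true when the 'null' handle would have to be stored instead.
 */
static bool model_needs_null_fh(const void *fh, long *err)
{
	assert(err != NULL);
	*err = model_ptr_err(fh);
	return model_is_err_or_null(fh);
}
```

With a NULL handle the helper reports that the 'null' handle is needed and leaves the error at 0; with model_err_ptr(-EOPNOTSUPP) it reports the same and the error is -EOPNOTSUPP.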
> > 289 static struct ovl_fh null_fh = {
> 290 .version = OVL_FH_VERSION,
> 291 .magic = OVL_FH_MAGIC,
> 292 .type = FILEID_INVALID,
>
> vim +309 fs/overlayfs/copy_up.c
>
> 293 .len = sizeof(struct ovl_fh),
> 294 };
> 295
> 296 static int ovl_set_lower_fh(struct dentry *dentry, struct dentry *upper)
> 297 {
> 298 int err;
> > 299 const struct ovl_fh *fh = NULL;
> 300
> 301 if (ovl_redirect_fh(dentry->d_sb))
> 302 fh = ovl_get_fh(ovl_dentry_lower(dentry));
> 303 /*
> 304 * On failure to encode lower fh, store an invalid 'null' fh, so
> 305 * we can always use the overlay.fh xattr to distignuish between
> 306 * a copy up and a pure upper inode. If lower fs does not support
> 307 * encoding fh, don't try to encode again.
> 308 */
> > 309 err = PTR_ERR(fh);
> 310 if (IS_ERR_OR_NULL(fh)) {
> 311 if (err == -EOPNOTSUPP) {
> 312 pr_warn("overlay: file handle not supported by lower - turning off redirect_fh\n");
>
> ---
> 0-DAY kernel test infrastructure Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all Intel Corporation
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v2 04/11] ovl: store file handle of lower inode on copy up
2017-04-24 9:14 ` [PATCH v2 04/11] ovl: store file handle of lower inode on copy up Amir Goldstein
2017-04-24 13:32 ` kbuild test robot
@ 2017-04-25 14:53 ` Miklos Szeredi
2017-04-26 5:47 ` Amir Goldstein
2017-04-26 9:39 ` Miklos Szeredi
2 siblings, 1 reply; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-25 14:53 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Mon, Apr 24, 2017 at 11:14 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> Sometimes it is interesting to know if an upper file is pure
> upper or a copy up target, and if it is a copy up target, it
> may be interesting to find the copy up origin.
>
> This will be used to preserve lower inode numbers across copy up.
>
> Store the lower inode file handle in upper inode xattr overlay.fh
> on copy up to use it later for these cases.
>
> On failure to encode lower file handle, store an invalid 'null'
> handle, so we can always use the overlay.fh xattr to distinguish
> between a copy up and a pure upper inode.
>
> If lower fs does not support NFS export ops or if not all lower
> layers are on the same fs, don't try to encode a lower file handle
> and use the 'null' handle instead.
Decoding fh on wrong fs is going to result in "interesting"
possibilities, so I think we should be storing some kind of identifier
about the layer from the very start.
The trivial way to do that would be to encode the filesystem's UUID
into the stored fh. Problem seems to be that only ext4 is setting
sb->s_uuid. Probably not too hard to fix the others.
When decoding, trivial to check in the samefs case, but we'd need a
table for the uuid->layer lookup for the non-samefs case. But that can
wait, I'd be content with just having the infrastructure there and
just using it to verify the handle for now.
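
The verification step described above could look roughly like the following user-space sketch. Everything here is an assumption made for illustration: struct model_fh, its field layout, and the model_* function names are hypothetical and not part of the posted patch. The sketch also assumes that an all-zero UUID means the filesystem never set one, in which case the handle is not trusted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_UUID_LEN 16

/* Hypothetical stored handle header; the encoded fid bytes would follow it. */
struct model_fh {
	uint8_t version;
	uint8_t magic;
	uint8_t len;
	uint8_t type;
	uint8_t uuid[MODEL_UUID_LEN];	/* UUID of the fs that encoded the fid */
};

/* Assumption: an all-zero UUID means the filesystem did not set one. */
static bool model_uuid_is_set(const uint8_t uuid[MODEL_UUID_LEN])
{
	static const uint8_t zero[MODEL_UUID_LEN];

	return memcmp(uuid, zero, MODEL_UUID_LEN) != 0;
}

/* Samefs case: decode only if the handle was encoded on this filesystem. */
static bool model_fh_matches_fs(const struct model_fh *fh,
				const uint8_t fs_uuid[MODEL_UUID_LEN])
{
	assert(fh != NULL && fs_uuid != NULL);

	if (!model_uuid_is_set(fs_uuid))
		return false;

	return memcmp(fh->uuid, fs_uuid, MODEL_UUID_LEN) == 0;
}
```

The non-samefs case would replace the single comparison with a lookup in a uuid->layer table, which this sketch leaves out.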
Thanks,
Miklos
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v2 04/11] ovl: store file handle of lower inode on copy up
2017-04-25 14:53 ` Miklos Szeredi
@ 2017-04-26 5:47 ` Amir Goldstein
2017-04-26 9:21 ` Miklos Szeredi
0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-26 5:47 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Tue, Apr 25, 2017 at 5:53 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Mon, Apr 24, 2017 at 11:14 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> Sometimes it is interesting to know if an upper file is pure
>> upper or a copy up target, and if it is a copy up target, it
>> may be interesting to find the copy up origin.
>>
>> This will be used to preserve lower inode numbers across copy up.
>>
>> Store the lower inode file handle in upper inode xattr overlay.fh
>> on copy up to use it later for these cases.
>>
>> On failure to encode lower file handle, store an invalid 'null'
>> handle, so we can always use the overlay.fh xattr to distinguish
>> between a copy up and a pure upper inode.
>>
>> If lower fs does not support NFS export ops or if not all lower
>> layers are on the same fs, don't try to encode a lower file handle
>> and use the 'null' handle instead.
>
> Decoding fh on wrong fs is going to result in "interesting"
> possibilities, so I think we should be storing some kind of identifier
> about the layer from the very start.
>
> The trivial way to do that would be to encode the filesystem's UUID
> into the stored fh. Problem seems to be that only ext4 is setting
> sb->s_uuid. Probably not too hard to fix the others.
>
xfs supports sb->s_export_op->get_uuid() (and seems to be the only
fs that supports exportfs block ops). It may be more appropriate
for our use case (universal unique file handle) to use this API
and add support for it in other fs.
We can also use the existence of sb->s_export_op->get_uuid
as a promise for a persistent/exportable sb uuid instead of assuming
that sb->s_uuid has such properties.
> When decoding, trivial to check in the samefs case, but we'd need a
> table for the uuid->layer lookup for the non-samefs case. But that can
> wait, I'd be content with just having the infrastructure there and
> just using it to verify the handle for now.
>
Sounds good.
I'll do the same_lower_sb implementation for v3.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v2 04/11] ovl: store file handle of lower inode on copy up
2017-04-26 5:47 ` Amir Goldstein
@ 2017-04-26 9:21 ` Miklos Szeredi
2017-04-26 9:27 ` Amir Goldstein
0 siblings, 1 reply; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-26 9:21 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Wed, Apr 26, 2017 at 7:47 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Tue, Apr 25, 2017 at 5:53 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Mon, Apr 24, 2017 at 11:14 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> Sometimes it is interesting to know if an upper file is pure
>>> upper or a copy up target, and if it is a copy up target, it
>>> may be interesting to find the copy up origin.
>>>
>>> This will be used to preserve lower inode numbers across copy up.
>>>
>>> Store the lower inode file handle in upper inode xattr overlay.fh
>>> on copy up to use it later for these cases.
>>>
>>> On failure to encode lower file handle, store an invalid 'null'
>>> handle, so we can always use the overlay.fh xattr to distinguish
>>> between a copy up and a pure upper inode.
>>>
>>> If lower fs does not support NFS export ops or if not all lower
>>> layers are on the same fs, don't try to encode a lower file handle
>>> and use the 'null' handle instead.
>>
>> Decoding fh on wrong fs is going to result in "interesting"
>> possibilities, so I think we should be storing some kind of identifier
>> about the layer from the very start.
>>
>> The trivial way to do that would be to encode the filesystem's UUID
>> into the stored fh. Problem seems to be that only ext4 is setting
>> sb->s_uuid. Probably not too hard to fix the others.
>>
>
> xfs supports sb->s_export_op->get_uuid() (and seems to be the only
> fs that supports exportfs block ops). It may be more appropriate
> for our use case (universal unique file handle) to use this API
> and add support for it in other fs.
> We can also use the existence of sb->s_export_op->get_uuid
> as a promise for a persistent/exportable sb uuid instead of assuming
> that sb->s_uuid has such properties.
Right, if ->get_uuid() could be made to work on all exportable fs,
then that would be good.
The "offset" argument worries me a little. And we'd need to get rid
of the printk in the xfs code (or move it to pnfsd, which is where it
belongs).
Thanks,
Miklos
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v2 04/11] ovl: store file handle of lower inode on copy up
2017-04-26 9:21 ` Miklos Szeredi
@ 2017-04-26 9:27 ` Amir Goldstein
2017-04-26 9:35 ` Miklos Szeredi
0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-26 9:27 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Wed, Apr 26, 2017 at 12:21 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Wed, Apr 26, 2017 at 7:47 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Tue, Apr 25, 2017 at 5:53 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> On Mon, Apr 24, 2017 at 11:14 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>> Sometimes it is interesting to know if an upper file is pure
>>>> upper or a copy up target, and if it is a copy up target, it
>>>> may be interesting to find the copy up origin.
>>>>
>>>> This will be used to preserve lower inode numbers across copy up.
>>>>
>>>> Store the lower inode file handle in upper inode xattr overlay.fh
>>>> on copy up to use it later for these cases.
>>>>
>>>> On failure to encode lower file handle, store an invalid 'null'
>>>> handle, so we can always use the overlay.fh xattr to distinguish
>>>> between a copy up and a pure upper inode.
>>>>
>>>> If lower fs does not support NFS export ops or if not all lower
>>>> layers are on the same fs, don't try to encode a lower file handle
>>>> and use the 'null' handle instead.
>>>
>>> Decoding fh on wrong fs is going to result in "interesting"
>>> possibilities, so I think we should be storing some kind of identifier
>>> about the layer from the very start.
>>>
>>> The trivial way to do that would be to encode the filesystem's UUID
>>> into the stored fh. Problem seems to be that only ext4 is setting
>>> sb->s_uuid. Probably not too hard to fix the others.
>>>
>>
>> xfs supports sb->s_export_op->get_uuid() (and seems to be the only
>> fs that supports exportfs block ops). It may be more appropriate
>> for our use case (universal unique file handle) to use this API
>> and add support for it in other fs.
>> We can also use the existence of sb->s_export_op->get_uuid
>> as a promise for a persistent/exportable sb uuid instead of assuming
>> that sb->s_uuid has such properties.
>
> Right, if ->get_uuid() could be made to work on all exportable fs,
> then that would be good.
>
> The "offset" argument worries me a little. And we'd need to get rid
> of the printk in the xfs code (or move it to pnfsd, which is where it
> belongs).
>
The offset argument is discard-able, it gives you more information
than we need.
Another problem is that ->get_uuid for xfs is compiled out by default
without CONFIG_PNFSD, although this could be changed.
Anyway, I have a very simple patch for xfs to set sb->s_uuid.
btrfs has several uuid's (i.e. subvolumes) on the same sb struct IIUC,
so need to see how to handle this.
Amir.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v2 04/11] ovl: store file handle of lower inode on copy up
2017-04-26 9:27 ` Amir Goldstein
@ 2017-04-26 9:35 ` Miklos Szeredi
0 siblings, 0 replies; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-26 9:35 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Wed, Apr 26, 2017 at 11:27 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>
> The offset argument is discard-able, it gives you more information
> than we need.
Sure, the problem with that is what should a filesystem put there
which cannot provide such an offset? Is it optional? What value
indicates invalid offset?
> Another problem is that ->get_uuid for xfs is compiled out by default
> without CONFIG_PNFSD, although this could be changed.
>
> Anyway, I have a very simple patch for xfs to set sb->s_uuid.
> btrfs has several uuid's (i.e. subvolumes) on the same sb struct IIUC,
> so need to see how to handle this.
Yes, actually btrfs wants some sort of lightweight superblock for
subvolumes with just s_dev and s_uuid.
Not sure what we can do about that for now...
Thanks,
Miklos
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v2 04/11] ovl: store file handle of lower inode on copy up
2017-04-24 9:14 ` [PATCH v2 04/11] ovl: store file handle of lower inode on copy up Amir Goldstein
2017-04-24 13:32 ` kbuild test robot
2017-04-25 14:53 ` Miklos Szeredi
@ 2017-04-26 9:39 ` Miklos Szeredi
2017-04-26 9:53 ` Amir Goldstein
2 siblings, 1 reply; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-26 9:39 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Mon, Apr 24, 2017 at 11:14 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> Sometimes it is interesting to know if an upper file is pure
> upper or a copy up target, and if it is a copy up target, it
> may be interesting to find the copy up origin.
>
> This will be used to preserve lower inode numbers across copy up.
>
> Store the lower inode file handle in upper inode xattr overlay.fh
> on copy up to use it later for these cases.
>
> On failure to encode lower file handle, store an invalid 'null'
> handle, so we can always use the overlay.fh xattr to distinguish
> between a copy up and a pure upper inode.
>
> If lower fs does not support NFS export ops or if not all lower
> layers are on the same fs, don't try to encode a lower file handle
> and use the 'null' handle instead.
One other question regarding this: do we want to store the handle of
the next file in the copy up chain or the handle of the original file?
This patch seems to do the "next file" thing. For directories,
obviously that's what we want, but for files...
Thanks,
Miklos
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v2 04/11] ovl: store file handle of lower inode on copy up
2017-04-26 9:39 ` Miklos Szeredi
@ 2017-04-26 9:53 ` Amir Goldstein
2017-04-26 9:57 ` Miklos Szeredi
0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-26 9:53 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Wed, Apr 26, 2017 at 12:39 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Mon, Apr 24, 2017 at 11:14 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> Sometimes it is interesting to know if an upper file is pure
>> upper or a copy up target, and if it is a copy up target, it
>> may be interesting to find the copy up origin.
>>
>> This will be used to preserve lower inode numbers across copy up.
>>
>> Store the lower inode file handle in upper inode xattr overlay.fh
>> on copy up to use it later for these cases.
>>
>> On failure to encode lower file handle, store an invalid 'null'
>> handle, so we can always use the overlay.fh xattr to distinguish
>> between a copy up and a pure upper inode.
>>
>> If lower fs does not support NFS export ops or if not all lower
>> layers are on the same fs, don't try to encode a lower file handle
>> and use the 'null' handle instead.
>
> One other question regarding this: do we want to store the handle of
> the next file in the copy up chain or the handle of the original file?
>
> This patch seems to do the "next file" thing. For directories,
> obviously that's what we want, but for files...
>
What I found when working on this is that any file below the uppermost
lower is of zero interest to us.
So I defined 'stable inode' and we only need to lookup stable inode:
Stable := uppermost lower (or upper if numlower == 0)
For NFS export, Stable fh is unique enough, because
when rotating upper layer or any change of layer stack configuration,
NFS handles may become stale and this is fine.
inode numbers are guaranteed to remain constant and persistent
as long as upper is not rotated.
Rotating upper will change stable inode numbers and this is fine
(regard it as cpio/tar of the filesystem).
Hardlinks will be preserved as long as lower stack configuration
doesn't change.
When upper is rotated the copy up hardlink bunch will be broken
from the non-copy-up hardlink bunch, which is quite a minor
concern IMO (cpio/tar don't always preserve hardlinks).
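
As a rough illustration of the "Stable := uppermost lower (or upper if numlower == 0)" rule above, here is a user-space sketch. The struct model_entry type and model_stable_ino() are hypothetical names invented for this example, and the sketch assumes that index 0 of the lower array is the uppermost lower layer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an overlay entry; not the kernel's struct ovl_entry. */
struct model_entry {
	bool has_upper;
	unsigned long upper_ino;
	size_t numlower;
	const unsigned long *lower_ino;	/* lower_ino[0] is the uppermost lower */
};

/* Stable := uppermost lower (or upper if numlower == 0) */
static unsigned long model_stable_ino(const struct model_entry *e)
{
	assert(e != NULL);
	assert(e->numlower > 0 || e->has_upper);

	if (e->numlower > 0)
		return e->lower_ino[0];

	return e->upper_ino;
}
```

In this model a copied-up file keeps reporting the inode number of its uppermost lower, while a pure upper file reports its own; anything deeper in the lower stack is never consulted.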
Amir.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v2 04/11] ovl: store file handle of lower inode on copy up
2017-04-26 9:53 ` Amir Goldstein
@ 2017-04-26 9:57 ` Miklos Szeredi
0 siblings, 0 replies; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-26 9:57 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Wed, Apr 26, 2017 at 11:53 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Apr 26, 2017 at 12:39 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Mon, Apr 24, 2017 at 11:14 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> Sometimes it is interesting to know if an upper file is pure
>>> upper or a copy up target, and if it is a copy up target, it
>>> may be interesting to find the copy up origin.
>>>
>>> This will be used to preserve lower inode numbers across copy up.
>>>
>>> Store the lower inode file handle in upper inode xattr overlay.fh
>>> on copy up to use it later for these cases.
>>>
>>> On failure to encode lower file handle, store an invalid 'null'
>>> handle, so we can always use the overlay.fh xattr to distinguish
>>> between a copy up and a pure upper inode.
>>>
>>> If lower fs does not support NFS export ops or if not all lower
>>> layers are on the same fs, don't try to encode a lower file handle
>>> and use the 'null' handle instead.
>>
>> One other question regarding this: do we want to store the handle of
>> the next file in the copy up chain or the handle of the original file?
>>
>> This patch seems to do the "next file" thing. For directories,
>> obviously that's what we want, but for files...
>>
>
> What I found when working on this is that any file below the uppermost
> lower is of zero interest to us.
>
> So I defined 'stable inode' and we only need to lookup stable inode:
> Stable := uppermost lower (or upper if numlower == 0)
>
> For NFS export, Stable fh is unique enough, because
> when rotating upper layer or any change of layer stack configuration,
> NFS handles may become stale and this is fine.
>
> inode numbers are guaranteed to remain constant and persistent
> as long as upper is not rotated.
> Rotating upper will change stable inode numbers and this is fine
> (regard it as cpio/tar of the filesystem).
>
> Hardlinks will be preserved as long as lower stack configuration
> doesn't change.
> When upper is rotated the copy up hardlink bunch will be broken
> from the non-copy-up hardlink bunch, which is quite a minor
> concern IMO (cpio/tar don't always preserve hardlinks).
Okay, makes sense.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH v2 05/11] ovl: lookup redirect by file handle
2017-04-24 9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
` (3 preceding siblings ...)
2017-04-24 9:14 ` [PATCH v2 04/11] ovl: store file handle of lower inode on copy up Amir Goldstein
@ 2017-04-24 9:14 ` Amir Goldstein
2017-04-25 8:10 ` Amir Goldstein
2017-04-25 15:13 ` Miklos Szeredi
2017-04-24 9:14 ` [PATCH v2 06/11] ovl: lookup non-dir inode copy up origin Amir Goldstein
` (9 subsequent siblings)
14 siblings, 2 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24 9:14 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
When overlay.fh xattr is found in a directory inode, instead of lookup
of the dentry in next lower layer by name, first try to get it by calling
exportfs_decode_fh().
On failure to lookup by file handle to lower layer, fall back to lookup
by name with or without path redirect.
For now we only support following by file handle from upper if there is a
single lower layer, because fallback from lookup by file handle to lookup
by path in mid layers is not yet implemented.
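
The lookup order described above (file handle first, then name) can be modelled in user space as follows. This is only a sketch under stated assumptions: struct model_dirent, the integer "handle" values, and the model_* functions are hypothetical, and a handle of 0 stands for "no overlay.fh xattr stored". The real patch decodes the handle with exportfs_decode_fh() instead of scanning a table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lower-layer entry: an opaque handle id plus its current name. */
struct model_dirent {
	unsigned int handle;
	const char *name;
};

static int model_find_by_handle(const struct model_dirent *ents, size_t n,
				unsigned int handle)
{
	for (size_t i = 0; i < n; i++)
		if (ents[i].handle == handle)
			return (int)i;
	return -1;	/* plays the role of -ESTALE */
}

static int model_find_by_name(const struct model_dirent *ents, size_t n,
			      const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(ents[i].name, name) == 0)
			return (int)i;
	return -1;	/* plays the role of -ENOENT */
}

/* Try the stored handle first; on failure fall back to lookup by name. */
static int model_lookup_lower(const struct model_dirent *ents, size_t n,
			      unsigned int handle, const char *name)
{
	assert(ents != NULL && name != NULL);

	if (handle != 0) {
		int idx = model_find_by_handle(ents, n, handle);

		if (idx >= 0)
			return idx;
	}
	return model_find_by_name(ents, n, name);
}
```

A handle that still resolves finds the lower entry even if it was renamed underneath the overlay; a stale handle silently degrades to the existing by-name behaviour.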
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/overlayfs/namei.c | 185 +++++++++++++++++++++++++++++++++++++++++++----
fs/overlayfs/overlayfs.h | 1 +
fs/overlayfs/util.c | 14 ++++
3 files changed, 186 insertions(+), 14 deletions(-)
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index d660177..0d1cc8f 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -9,9 +9,11 @@
#include <linux/fs.h>
#include <linux/cred.h>
+#include <linux/mount.h>
#include <linux/namei.h>
#include <linux/xattr.h>
#include <linux/ratelimit.h>
+#include <linux/exportfs.h>
#include "overlayfs.h"
#include "ovl_entry.h"
@@ -21,7 +23,10 @@ struct ovl_lookup_data {
bool opaque;
bool stop;
bool last;
- char *redirect;
+ bool by_path; /* redirect by path */
+ bool by_fh; /* redirect by file handle */
+ char *redirect; /* path to follow */
+ struct ovl_fh *fh; /* file handle to follow */
};
static int ovl_check_redirect(struct dentry *dentry, struct ovl_lookup_data *d,
@@ -81,6 +86,42 @@ static int ovl_check_redirect(struct dentry *dentry, struct ovl_lookup_data *d,
goto err_free;
}
+static int ovl_check_redirect_fh(struct dentry *dentry,
+ struct ovl_lookup_data *d)
+{
+ int res;
+ void *buf = NULL;
+
+ res = vfs_getxattr(dentry, OVL_XATTR_FH, NULL, 0);
+ if (res < 0) {
+ if (res == -ENODATA || res == -EOPNOTSUPP)
+ return 0;
+ goto fail;
+ }
+ buf = kzalloc(res, GFP_TEMPORARY);
+ if (!buf)
+ return -ENOMEM;
+
+ if (res == 0)
+ goto fail;
+
+ res = vfs_getxattr(dentry, OVL_XATTR_FH, buf, res);
+ if (res < 0 || !ovl_redirect_fh_ok(buf, res))
+ goto fail;
+
+ kfree(d->fh);
+ d->fh = buf;
+
+ return 0;
+
+err_free:
+ kfree(buf);
+ return 0;
+fail:
+ pr_warn_ratelimited("overlayfs: failed to get file handle (%i)\n", res);
+ goto err_free;
+}
+
static bool ovl_is_opaquedir(struct dentry *dentry)
{
int res;
@@ -96,22 +137,81 @@ static bool ovl_is_opaquedir(struct dentry *dentry)
return false;
}
+/* Check if p1 is connected with a chain of hashed dentries to p2 */
+static bool ovl_is_lookable(struct dentry *p1, struct dentry *p2)
+{
+ struct dentry *p;
+
+ for (p = p2; !IS_ROOT(p); p = p->d_parent) {
+ if (d_unhashed(p))
+ return false;
+ if (p->d_parent == p1)
+ return true;
+ }
+ return false;
+}
+
+/* Check if dentry is reachable from mnt via path lookup */
+static int ovl_dentry_under_mnt(void *ctx, struct dentry *dentry)
+{
+ struct vfsmount *mnt = ctx;
+
+ return ovl_is_lookable(mnt->mnt_root, dentry);
+}
+
+static struct dentry *ovl_lookup_fh(struct vfsmount *mnt,
+ const struct ovl_fh *fh)
+{
+ int bytes = (fh->len - offsetof(struct ovl_fh, fid));
+
+ /*
+ * When redirect_fh is disabled, 'invalid' file handles are stored
+ * to indicate that this entry has been copied up.
+ */
+ if (!bytes || (int)fh->type == FILEID_INVALID)
+ return ERR_PTR(-ESTALE);
+
+ /*
+ * Several layers can be on the same fs and decoded dentry may be in
+ * either one of those layers. We are looking for a match of dentry
+ * and mnt to find out to which layer the decoded dentry belongs to.
+ */
+ return exportfs_decode_fh(mnt, (struct fid *)fh->fid,
+ bytes >> 2, (int)fh->type,
+ ovl_dentry_under_mnt, mnt);
+}
+
static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
const char *name, unsigned int namelen,
size_t prelen, const char *post,
- struct dentry **ret)
+ struct vfsmount *mnt, struct dentry **ret)
{
struct dentry *this;
int err;
- this = lookup_one_len_unlocked(name, base, namelen);
+ /*
+ * Lookup of upper is with null d->fh.
+ * Lookup of lower is either by_fh with non-null d->fh
+ * or by_path with null d->fh.
+ */
+ if (d->fh)
+ this = ovl_lookup_fh(mnt, d->fh);
+ else
+ this = lookup_one_len_unlocked(name, base, namelen);
if (IS_ERR(this)) {
err = PTR_ERR(this);
this = NULL;
if (err == -ENOENT || err == -ENAMETOOLONG)
goto out;
+ if (d->fh && err == -ESTALE)
+ goto out;
goto out_err;
}
+
+ /* If found by file handle - don't follow that handle again */
+ kfree(d->fh);
+ d->fh = NULL;
+
if (!this->d_inode)
goto put_and_out;
@@ -135,9 +235,18 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
d->stop = d->opaque = true;
goto out;
}
- err = ovl_check_redirect(this, d, prelen, post);
- if (err)
- goto out_err;
+ if (d->last)
+ goto out;
+ if (d->by_path) {
+ err = ovl_check_redirect(this, d, prelen, post);
+ if (err)
+ goto out_err;
+ }
+ if (d->by_fh) {
+ err = ovl_check_redirect_fh(this, d);
+ if (err)
+ goto out_err;
+ }
out:
*ret = this;
return 0;
@@ -152,6 +261,12 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
return err;
}
+static int ovl_lookup_layer_fh(struct path *path, struct ovl_lookup_data *d,
+ struct dentry **ret)
+{
+ return ovl_lookup_single(path->dentry, d, "", 0, 0, "", path->mnt, ret);
+}
+
static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
struct dentry **ret)
{
@@ -162,7 +277,7 @@ static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
if (d->name.name[0] != '/')
return ovl_lookup_single(base, d, d->name.name, d->name.len,
- 0, "", ret);
+ 0, "", NULL, ret);
while (!IS_ERR_OR_NULL(base) && d_can_lookup(base)) {
const char *s = d->name.name + d->name.len - rem;
@@ -175,7 +290,7 @@ static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
return -EIO;
err = ovl_lookup_single(base, d, s, thislen,
- d->name.len - rem, next, &base);
+ d->name.len - rem, next, NULL, &base);
dput(dentry);
if (err)
return err;
@@ -220,6 +335,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
const struct cred *old_cred;
struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
struct ovl_entry *poe = dentry->d_parent->d_fsdata;
+ struct ovl_entry *roe = dentry->d_sb->s_root->d_fsdata;
struct path *stack = NULL;
struct dentry *upperdir, *upperdentry = NULL;
unsigned int ctr = 0;
@@ -235,7 +351,10 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
.opaque = false,
.stop = false,
.last = !poe->numlower,
+ .by_path = true,
.redirect = NULL,
+ .by_fh = true,
+ .fh = NULL,
};
if (dentry->d_name.len > ofs->namelen)
@@ -259,13 +378,23 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
if (!upperredirect)
goto out_put_upper;
if (d.redirect[0] == '/')
- poe = dentry->d_sb->s_root->d_fsdata;
+ poe = roe;
}
if (d.opaque)
type |= __OVL_PATH_OPAQUE;
}
- if (!d.stop && poe->numlower) {
+ /*
+ * For now we only support lower by fh in single layer, because
+ * fallback from lookup by fh to lookup by path in mid layers for
+ * merge directory is not yet implemented.
+ */
+ if (!ofs->redirect_fh || ofs->numlower > 1) {
+ kfree(d.fh);
+ d.fh = NULL;
+ }
+
+ if (!d.stop && (poe->numlower || d.fh)) {
err = -ENOMEM;
stack = kcalloc(ofs->numlower, sizeof(struct path),
GFP_TEMPORARY);
@@ -273,6 +402,35 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
goto out_put_upper;
}
+ /* Try to lookup lower layers by file handle */
+ d.by_path = false;
+ for (i = 0; !d.stop && d.fh && i < roe->numlower; i++) {
+ struct path lowerpath = poe->lowerstack[i];
+
+ d.last = i == poe->numlower - 1;
+ err = ovl_lookup_layer_fh(&lowerpath, &d, &this);
+ if (err)
+ goto out_put;
+
+ if (!this)
+ continue;
+
+ stack[ctr].dentry = this;
+ stack[ctr].mnt = lowerpath.mnt;
+ ctr++;
+ /*
+ * Found by fh - won't lookup by path.
+ * TODO: set d.redirect to dentry_path(this),
+ * so lookup can continue by path.
+ */
+ d.stop = true;
+ }
+
+ /* Fallback to lookup lower layers by path */
+ d.by_path = true;
+ d.by_fh = false;
+ kfree(d.fh);
+ d.fh = NULL;
for (i = 0; !d.stop && i < poe->numlower; i++) {
struct path lowerpath = poe->lowerstack[i];
@@ -291,10 +449,8 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
if (d.stop)
break;
- if (d.redirect &&
- d.redirect[0] == '/' &&
- poe != dentry->d_sb->s_root->d_fsdata) {
- poe = dentry->d_sb->s_root->d_fsdata;
+ if (d.redirect && d.redirect[0] == '/' && poe != roe) {
+ poe = roe;
/* Find the current layer on the root dentry */
for (i = 0; i < poe->numlower; i++)
@@ -354,6 +510,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
dput(upperdentry);
kfree(upperredirect);
out:
+ kfree(d.fh);
kfree(d.redirect);
revert_creds(old_cred);
return ERR_PTR(err);
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index c3cfbc5..08002ce 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -190,6 +190,7 @@ const char *ovl_dentry_get_redirect(struct dentry *dentry);
void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect);
bool ovl_redirect_fh(struct super_block *sb);
void ovl_clear_redirect_fh(struct super_block *sb);
+bool ovl_redirect_fh_ok(const char *redirect, size_t size);
void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry);
void ovl_inode_init(struct inode *inode, struct inode *realinode,
bool is_upper);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index b3bc117..dba9753 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -254,6 +254,20 @@ void ovl_clear_redirect_fh(struct super_block *sb)
ofs->redirect_fh = false;
}
+bool ovl_redirect_fh_ok(const char *redirect, size_t size)
+{
+ struct ovl_fh *fh = (void *)redirect;
+
+ if (size < sizeof(struct ovl_fh) || size < fh->len)
+ return false;
+
+ if (fh->version > OVL_FH_VERSION ||
+ fh->magic != OVL_FH_MAGIC)
+ return false;
+
+ return true;
+}
+
void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry)
{
struct ovl_entry *oe = dentry->d_fsdata;
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
2017-04-24 9:14 ` [PATCH v2 05/11] ovl: lookup redirect by file handle Amir Goldstein
@ 2017-04-25 8:10 ` Amir Goldstein
2017-04-25 15:13 ` Miklos Szeredi
1 sibling, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-25 8:10 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Mon, Apr 24, 2017 at 12:14 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> When overlay.fh xattr is found in a directory inode, instead of lookup
> of the dentry in next lower layer by name, first try to get it by calling
> exportfs_decode_fh().
>
> On failure to lookup by file handle to lower layer, fall back to lookup
> by name with or without path redirect.
>
> For now we only support following by file handle from upper if there is a
> single lower layer, because fallback from lookup by file handle to lookup
> by path in mid layers is not yet implemented.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
> fs/overlayfs/namei.c | 185 +++++++++++++++++++++++++++++++++++++++++++----
> fs/overlayfs/overlayfs.h | 1 +
> fs/overlayfs/util.c | 14 ++++
> 3 files changed, 186 insertions(+), 14 deletions(-)
>
> diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
> index d660177..0d1cc8f 100644
> --- a/fs/overlayfs/namei.c
> +++ b/fs/overlayfs/namei.c
> @@ -9,9 +9,11 @@
>
> #include <linux/fs.h>
> #include <linux/cred.h>
> +#include <linux/mount.h>
> #include <linux/namei.h>
> #include <linux/xattr.h>
> #include <linux/ratelimit.h>
> +#include <linux/exportfs.h>
> #include "overlayfs.h"
> #include "ovl_entry.h"
>
> @@ -21,7 +23,10 @@ struct ovl_lookup_data {
> bool opaque;
> bool stop;
> bool last;
> - char *redirect;
> + bool by_path; /* redirect by path */
> + bool by_fh; /* redirect by file handle */
> + char *redirect; /* path to follow */
> + struct ovl_fh *fh; /* file handle to follow */
> };
>
> static int ovl_check_redirect(struct dentry *dentry, struct ovl_lookup_data *d,
> @@ -81,6 +86,42 @@ static int ovl_check_redirect(struct dentry *dentry, struct ovl_lookup_data *d,
> goto err_free;
> }
>
> +static int ovl_check_redirect_fh(struct dentry *dentry,
> + struct ovl_lookup_data *d)
> +{
> + int res;
> + void *buf = NULL;
> +
> + res = vfs_getxattr(dentry, OVL_XATTR_FH, NULL, 0);
> + if (res < 0) {
> + if (res == -ENODATA || res == -EOPNOTSUPP)
> + return 0;
> + goto fail;
> + }
> + buf = kzalloc(res, GFP_TEMPORARY);
> + if (!buf)
> + return -ENOMEM;
> +
> + if (res == 0)
> + goto fail;
> +
> + res = vfs_getxattr(dentry, OVL_XATTR_FH, buf, res);
> + if (res < 0 || !ovl_redirect_fh_ok(buf, res))
> + goto fail;
> +
> + kfree(d->fh);
> + d->fh = buf;
> +
> + return 0;
> +
> +err_free:
> + kfree(buf);
> + return 0;
> +fail:
> + pr_warn_ratelimited("overlayfs: failed to get file handle (%i)\n", res);
> + goto err_free;
> +}
> +
> static bool ovl_is_opaquedir(struct dentry *dentry)
> {
> int res;
> @@ -96,22 +137,81 @@ static bool ovl_is_opaquedir(struct dentry *dentry)
> return false;
> }
>
> +/* Check if p1 is connected with a chain of hashed dentries to p2 */
> +static bool ovl_is_lookable(struct dentry *p1, struct dentry *p2)
> +{
> + struct dentry *p;
> +
> + for (p = p2; !IS_ROOT(p); p = p->d_parent) {
> + if (d_unhashed(p))
> + return false;
> + if (p->d_parent == p1)
> + return true;
> + }
> + return false;
> +}
> +
> +/* Check if dentry is reachable from mnt via path lookup */
> +static int ovl_dentry_under_mnt(void *ctx, struct dentry *dentry)
> +{
> + struct vfsmount *mnt = ctx;
> +
> + return ovl_is_lookable(mnt->mnt_root, dentry);
> +}
> +
> +static struct dentry *ovl_lookup_fh(struct vfsmount *mnt,
> + const struct ovl_fh *fh)
> +{
> + int bytes = (fh->len - offsetof(struct ovl_fh, fid));
> +
> + /*
> + * When redirect_fh is disabled, 'invalid' file handles are stored
> + * to indicate that this entry has been copied up.
> + */
> + if (!bytes || (int)fh->type == FILEID_INVALID)
> + return ERR_PTR(-ESTALE);
> +
> + /*
> + * Several layers can be on the same fs and decoded dentry may be in
> + * either one of those layers. We are looking for a match of dentry
> + * and mnt to find out to which layer the decoded dentry belongs to.
> + */
> + return exportfs_decode_fh(mnt, (struct fid *)fh->fid,
> + bytes >> 2, (int)fh->type,
> + ovl_dentry_under_mnt, mnt);
> +}
> +
> static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
> const char *name, unsigned int namelen,
> size_t prelen, const char *post,
> - struct dentry **ret)
> + struct vfsmount *mnt, struct dentry **ret)
> {
> struct dentry *this;
> int err;
>
> - this = lookup_one_len_unlocked(name, base, namelen);
> + /*
> + * Lookup of upper is with null d->fh.
> + * Lookup of lower is either by_fh with non-null d->fh
> + * or by_path with null d->fh.
> + */
> + if (d->fh)
> + this = ovl_lookup_fh(mnt, d->fh);
> + else
> + this = lookup_one_len_unlocked(name, base, namelen);
> if (IS_ERR(this)) {
> err = PTR_ERR(this);
> this = NULL;
> if (err == -ENOENT || err == -ENAMETOOLONG)
> goto out;
> + if (d->fh && err == -ESTALE)
> + goto out;
> goto out_err;
> }
> +
> + /* If found by file handle - don't follow that handle again */
> + kfree(d->fh);
> + d->fh = NULL;
> +
> if (!this->d_inode)
> goto put_and_out;
>
> @@ -135,9 +235,18 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
> d->stop = d->opaque = true;
> goto out;
> }
> - err = ovl_check_redirect(this, d, prelen, post);
> - if (err)
> - goto out_err;
> + if (d->last)
> + goto out;
> + if (d->by_path) {
> + err = ovl_check_redirect(this, d, prelen, post);
> + if (err)
> + goto out_err;
> + }
> + if (d->by_fh) {
> + err = ovl_check_redirect_fh(this, d);
> + if (err)
> + goto out_err;
> + }
> out:
> *ret = this;
> return 0;
> @@ -152,6 +261,12 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
> return err;
> }
>
> +static int ovl_lookup_layer_fh(struct path *path, struct ovl_lookup_data *d,
> + struct dentry **ret)
> +{
> + return ovl_lookup_single(path->dentry, d, "", 0, 0, "", path->mnt, ret);
> +}
> +
> static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
> struct dentry **ret)
> {
> @@ -162,7 +277,7 @@ static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
>
> if (d->name.name[0] != '/')
> return ovl_lookup_single(base, d, d->name.name, d->name.len,
> - 0, "", ret);
> + 0, "", NULL, ret);
>
> while (!IS_ERR_OR_NULL(base) && d_can_lookup(base)) {
> const char *s = d->name.name + d->name.len - rem;
> @@ -175,7 +290,7 @@ static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
> return -EIO;
>
> err = ovl_lookup_single(base, d, s, thislen,
> - d->name.len - rem, next, &base);
> + d->name.len - rem, next, NULL, &base);
> dput(dentry);
> if (err)
> return err;
> @@ -220,6 +335,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> const struct cred *old_cred;
> struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
> struct ovl_entry *poe = dentry->d_parent->d_fsdata;
> + struct ovl_entry *roe = dentry->d_sb->s_root->d_fsdata;
> struct path *stack = NULL;
> struct dentry *upperdir, *upperdentry = NULL;
> unsigned int ctr = 0;
> @@ -235,7 +351,10 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> .opaque = false,
> .stop = false,
> .last = !poe->numlower,
> + .by_path = true,
> .redirect = NULL,
> + .by_fh = true,
> + .fh = NULL,
> };
>
> if (dentry->d_name.len > ofs->namelen)
> @@ -259,13 +378,23 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> if (!upperredirect)
> goto out_put_upper;
> if (d.redirect[0] == '/')
> - poe = dentry->d_sb->s_root->d_fsdata;
> + poe = roe;
> }
> if (d.opaque)
> type |= __OVL_PATH_OPAQUE;
> }
>
> - if (!d.stop && poe->numlower) {
> + /*
> + * For now we only support lower by fh in single layer, because
> + * fallback from lookup by fh to lookup by path in mid layers for
> + * merge directory is not yet implemented.
> + */
> + if (!ofs->redirect_fh || ofs->numlower > 1) {
> + kfree(d.fh);
> + d.fh = NULL;
> + }
> +
> + if (!d.stop && (poe->numlower || d.fh)) {
> err = -ENOMEM;
> stack = kcalloc(ofs->numlower, sizeof(struct path),
> GFP_TEMPORARY);
> @@ -273,6 +402,35 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> goto out_put_upper;
> }
>
> + /* Try to lookup lower layers by file handle */
> + d.by_path = false;
> + for (i = 0; !d.stop && d.fh && i < roe->numlower; i++) {
> + struct path lowerpath = poe->lowerstack[i];
> +
> + d.last = i == poe->numlower - 1;
Copy-and-paste bug: poe should be roe in the two lines above
(lowerpath and d.last). It matters especially when lower files are
moved into an opaque dir. I am improving xfstest overlay/017 to cover
this case.
> + err = ovl_lookup_layer_fh(&lowerpath, &d, &this);
> + if (err)
> + goto out_put;
> +
> + if (!this)
> + continue;
> +
> + stack[ctr].dentry = this;
> + stack[ctr].mnt = lowerpath.mnt;
> + ctr++;
> + /*
> + * Found by fh - won't lookup by path.
> + * TODO: set d.redirect to dentry_path(this),
> + * so lookup can continue by path.
> + */
> + d.stop = true;
> + }
> +
> + /* Fallback to lookup lower layers by path */
> + d.by_path = true;
> + d.by_fh = false;
> + kfree(d.fh);
> + d.fh = NULL;
> for (i = 0; !d.stop && i < poe->numlower; i++) {
> struct path lowerpath = poe->lowerstack[i];
>
> @@ -291,10 +449,8 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> if (d.stop)
> break;
>
> - if (d.redirect &&
> - d.redirect[0] == '/' &&
> - poe != dentry->d_sb->s_root->d_fsdata) {
> - poe = dentry->d_sb->s_root->d_fsdata;
> + if (d.redirect && d.redirect[0] == '/' && poe != roe) {
> + poe = roe;
>
> /* Find the current layer on the root dentry */
> for (i = 0; i < poe->numlower; i++)
> @@ -354,6 +510,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> dput(upperdentry);
> kfree(upperredirect);
> out:
> + kfree(d.fh);
> kfree(d.redirect);
> revert_creds(old_cred);
> return ERR_PTR(err);
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index c3cfbc5..08002ce 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -190,6 +190,7 @@ const char *ovl_dentry_get_redirect(struct dentry *dentry);
> void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect);
> bool ovl_redirect_fh(struct super_block *sb);
> void ovl_clear_redirect_fh(struct super_block *sb);
> +bool ovl_redirect_fh_ok(const char *redirect, size_t size);
> void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry);
> void ovl_inode_init(struct inode *inode, struct inode *realinode,
> bool is_upper);
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index b3bc117..dba9753 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -254,6 +254,20 @@ void ovl_clear_redirect_fh(struct super_block *sb)
> ofs->redirect_fh = false;
> }
>
> +bool ovl_redirect_fh_ok(const char *redirect, size_t size)
> +{
> + struct ovl_fh *fh = (void *)redirect;
> +
> + if (size < sizeof(struct ovl_fh) || size < fh->len)
> + return false;
> +
> + if (fh->version > OVL_FH_VERSION ||
> + fh->magic != OVL_FH_MAGIC)
> + return false;
> +
> + return true;
> +}
> +
> void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry)
> {
> struct ovl_entry *oe = dentry->d_fsdata;
> --
> 2.7.4
>
* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
2017-04-24 9:14 ` [PATCH v2 05/11] ovl: lookup redirect by file handle Amir Goldstein
2017-04-25 8:10 ` Amir Goldstein
@ 2017-04-25 15:13 ` Miklos Szeredi
2017-04-25 17:41 ` Amir Goldstein
1 sibling, 1 reply; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-25 15:13 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Mon, Apr 24, 2017 at 11:14 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> When overlay.fh xattr is found in a directory inode, instead of lookup
> of the dentry in next lower layer by name, first try to get it by calling
> exportfs_decode_fh().
>
> On failure to lookup by file handle to lower layer, fall back to lookup
> by name with or without path redirect.
>
> For now we only support following by file handle from upper if there is a
> single lower layer, because fallback from lookup by file handle to lookup
> by path in mid layers is not yet implemented.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
> fs/overlayfs/namei.c | 185 +++++++++++++++++++++++++++++++++++++++++++----
> fs/overlayfs/overlayfs.h | 1 +
> fs/overlayfs/util.c | 14 ++++
> 3 files changed, 186 insertions(+), 14 deletions(-)
>
> diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
> index d660177..0d1cc8f 100644
> --- a/fs/overlayfs/namei.c
> +++ b/fs/overlayfs/namei.c
> @@ -9,9 +9,11 @@
>
> #include <linux/fs.h>
> #include <linux/cred.h>
> +#include <linux/mount.h>
> #include <linux/namei.h>
> #include <linux/xattr.h>
> #include <linux/ratelimit.h>
> +#include <linux/exportfs.h>
> #include "overlayfs.h"
> #include "ovl_entry.h"
>
> @@ -21,7 +23,10 @@ struct ovl_lookup_data {
> bool opaque;
> bool stop;
> bool last;
> - char *redirect;
> + bool by_path; /* redirect by path */
> + bool by_fh; /* redirect by file handle */
> + char *redirect; /* path to follow */
> + struct ovl_fh *fh; /* file handle to follow */
> };
>
> static int ovl_check_redirect(struct dentry *dentry, struct ovl_lookup_data *d,
> @@ -81,6 +86,42 @@ static int ovl_check_redirect(struct dentry *dentry, struct ovl_lookup_data *d,
> goto err_free;
> }
>
> +static int ovl_check_redirect_fh(struct dentry *dentry,
> + struct ovl_lookup_data *d)
> +{
> + int res;
> + void *buf = NULL;
> +
> + res = vfs_getxattr(dentry, OVL_XATTR_FH, NULL, 0);
> + if (res < 0) {
> + if (res == -ENODATA || res == -EOPNOTSUPP)
> + return 0;
> + goto fail;
> + }
> + buf = kzalloc(res, GFP_TEMPORARY);
> + if (!buf)
> + return -ENOMEM;
> +
> + if (res == 0)
> + goto fail;
> +
> + res = vfs_getxattr(dentry, OVL_XATTR_FH, buf, res);
> + if (res < 0 || !ovl_redirect_fh_ok(buf, res))
> + goto fail;
> +
> + kfree(d->fh);
> + d->fh = buf;
> +
> + return 0;
> +
> +err_free:
> + kfree(buf);
> + return 0;
> +fail:
> + pr_warn_ratelimited("overlayfs: failed to get file handle (%i)\n", res);
> + goto err_free;
> +}
> +
> static bool ovl_is_opaquedir(struct dentry *dentry)
> {
> int res;
> @@ -96,22 +137,81 @@ static bool ovl_is_opaquedir(struct dentry *dentry)
> return false;
> }
>
> +/* Check if p1 is connected with a chain of hashed dentries to p2 */
> +static bool ovl_is_lookable(struct dentry *p1, struct dentry *p2)
> +{
> + struct dentry *p;
> +
> + for (p = p2; !IS_ROOT(p); p = p->d_parent) {
> + if (d_unhashed(p))
> + return false;
> + if (p->d_parent == p1)
> + return true;
> + }
> + return false;
> +}
Walking the dentry tree without RCU protection is dangerous and broken.
I'm also wondering if there's a better way to find the layer (e.g.
store the layer index in the handle as well).
> +
> +/* Check if dentry is reachable from mnt via path lookup */
> +static int ovl_dentry_under_mnt(void *ctx, struct dentry *dentry)
> +{
> + struct vfsmount *mnt = ctx;
> +
> + return ovl_is_lookable(mnt->mnt_root, dentry);
> +}
> +
> +static struct dentry *ovl_lookup_fh(struct vfsmount *mnt,
> + const struct ovl_fh *fh)
> +{
> + int bytes = (fh->len - offsetof(struct ovl_fh, fid));
> +
> + /*
> + * When redirect_fh is disabled, 'invalid' file handles are stored
> + * to indicate that this entry has been copied up.
> + */
> + if (!bytes || (int)fh->type == FILEID_INVALID)
> + return ERR_PTR(-ESTALE);
> +
> + /*
> + * Several layers can be on the same fs and decoded dentry may be in
> + * either one of those layers. We are looking for a match of dentry
> + * and mnt to find out to which layer the decoded dentry belongs to.
> + */
> + return exportfs_decode_fh(mnt, (struct fid *)fh->fid,
> + bytes >> 2, (int)fh->type,
> + ovl_dentry_under_mnt, mnt);
> +}
> +
> static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
> const char *name, unsigned int namelen,
> size_t prelen, const char *post,
> - struct dentry **ret)
> + struct vfsmount *mnt, struct dentry **ret)
I think it would be better to split this function into path and fh
variants and extract the common parts into helper(s).
> {
> struct dentry *this;
> int err;
>
> - this = lookup_one_len_unlocked(name, base, namelen);
> + /*
> + * Lookup of upper is with null d->fh.
> + * Lookup of lower is either by_fh with non-null d->fh
> + * or by_path with null d->fh.
> + */
> + if (d->fh)
> + this = ovl_lookup_fh(mnt, d->fh);
> + else
> + this = lookup_one_len_unlocked(name, base, namelen);
> if (IS_ERR(this)) {
> err = PTR_ERR(this);
> this = NULL;
> if (err == -ENOENT || err == -ENAMETOOLONG)
> goto out;
> + if (d->fh && err == -ESTALE)
> + goto out;
> goto out_err;
> }
> +
> + /* If found by file handle - don't follow that handle again */
> + kfree(d->fh);
> + d->fh = NULL;
> +
> if (!this->d_inode)
> goto put_and_out;
>
> @@ -135,9 +235,18 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
> d->stop = d->opaque = true;
> goto out;
> }
> - err = ovl_check_redirect(this, d, prelen, post);
> - if (err)
> - goto out_err;
> + if (d->last)
> + goto out;
> + if (d->by_path) {
> + err = ovl_check_redirect(this, d, prelen, post);
> + if (err)
> + goto out_err;
> + }
> + if (d->by_fh) {
> + err = ovl_check_redirect_fh(this, d);
> + if (err)
> + goto out_err;
> + }
> out:
> *ret = this;
> return 0;
> @@ -152,6 +261,12 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
> return err;
> }
>
> +static int ovl_lookup_layer_fh(struct path *path, struct ovl_lookup_data *d,
> + struct dentry **ret)
> +{
> + return ovl_lookup_single(path->dentry, d, "", 0, 0, "", path->mnt, ret);
> +}
> +
> static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
> struct dentry **ret)
> {
> @@ -162,7 +277,7 @@ static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
>
> if (d->name.name[0] != '/')
> return ovl_lookup_single(base, d, d->name.name, d->name.len,
> - 0, "", ret);
> + 0, "", NULL, ret);
>
> while (!IS_ERR_OR_NULL(base) && d_can_lookup(base)) {
> const char *s = d->name.name + d->name.len - rem;
> @@ -175,7 +290,7 @@ static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
> return -EIO;
>
> err = ovl_lookup_single(base, d, s, thislen,
> - d->name.len - rem, next, &base);
> + d->name.len - rem, next, NULL, &base);
> dput(dentry);
> if (err)
> return err;
> @@ -220,6 +335,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> const struct cred *old_cred;
> struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
> struct ovl_entry *poe = dentry->d_parent->d_fsdata;
> + struct ovl_entry *roe = dentry->d_sb->s_root->d_fsdata;
> struct path *stack = NULL;
> struct dentry *upperdir, *upperdentry = NULL;
> unsigned int ctr = 0;
> @@ -235,7 +351,10 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> .opaque = false,
> .stop = false,
> .last = !poe->numlower,
> + .by_path = true,
> .redirect = NULL,
> + .by_fh = true,
> + .fh = NULL,
> };
>
> if (dentry->d_name.len > ofs->namelen)
> @@ -259,13 +378,23 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> if (!upperredirect)
> goto out_put_upper;
> if (d.redirect[0] == '/')
> - poe = dentry->d_sb->s_root->d_fsdata;
> + poe = roe;
> }
> if (d.opaque)
> type |= __OVL_PATH_OPAQUE;
> }
>
> - if (!d.stop && poe->numlower) {
> + /*
> + * For now we only support lower by fh in single layer, because
> + * fallback from lookup by fh to lookup by path in mid layers for
> + * merge directory is not yet implemented.
> + */
> + if (!ofs->redirect_fh || ofs->numlower > 1) {
> + kfree(d.fh);
> + d.fh = NULL;
> + }
> +
> + if (!d.stop && (poe->numlower || d.fh)) {
> err = -ENOMEM;
> stack = kcalloc(ofs->numlower, sizeof(struct path),
> GFP_TEMPORARY);
> @@ -273,6 +402,35 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> goto out_put_upper;
> }
>
> + /* Try to lookup lower layers by file handle */
> + d.by_path = false;
> + for (i = 0; !d.stop && d.fh && i < roe->numlower; i++) {
> + struct path lowerpath = poe->lowerstack[i];
> +
> + d.last = i == poe->numlower - 1;
> + err = ovl_lookup_layer_fh(&lowerpath, &d, &this);
> + if (err)
> + goto out_put;
> +
> + if (!this)
> + continue;
> +
> + stack[ctr].dentry = this;
> + stack[ctr].mnt = lowerpath.mnt;
> + ctr++;
> + /*
> + * Found by fh - won't lookup by path.
> + * TODO: set d.redirect to dentry_path(this),
> + * so lookup can continue by path.
> + */
> + d.stop = true;
> + }
> +
> + /* Fallback to lookup lower layers by path */
> + d.by_path = true;
> + d.by_fh = false;
> + kfree(d.fh);
> + d.fh = NULL;
> for (i = 0; !d.stop && i < poe->numlower; i++) {
> struct path lowerpath = poe->lowerstack[i];
>
> @@ -291,10 +449,8 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> if (d.stop)
> break;
>
> - if (d.redirect &&
> - d.redirect[0] == '/' &&
> - poe != dentry->d_sb->s_root->d_fsdata) {
> - poe = dentry->d_sb->s_root->d_fsdata;
> + if (d.redirect && d.redirect[0] == '/' && poe != roe) {
> + poe = roe;
>
> /* Find the current layer on the root dentry */
> for (i = 0; i < poe->numlower; i++)
> @@ -354,6 +510,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> dput(upperdentry);
> kfree(upperredirect);
> out:
> + kfree(d.fh);
> kfree(d.redirect);
> revert_creds(old_cred);
> return ERR_PTR(err);
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index c3cfbc5..08002ce 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -190,6 +190,7 @@ const char *ovl_dentry_get_redirect(struct dentry *dentry);
> void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect);
> bool ovl_redirect_fh(struct super_block *sb);
> void ovl_clear_redirect_fh(struct super_block *sb);
> +bool ovl_redirect_fh_ok(const char *redirect, size_t size);
> void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry);
> void ovl_inode_init(struct inode *inode, struct inode *realinode,
> bool is_upper);
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index b3bc117..dba9753 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -254,6 +254,20 @@ void ovl_clear_redirect_fh(struct super_block *sb)
> ofs->redirect_fh = false;
> }
>
> +bool ovl_redirect_fh_ok(const char *redirect, size_t size)
> +{
> + struct ovl_fh *fh = (void *)redirect;
> +
> + if (size < sizeof(struct ovl_fh) || size < fh->len)
> + return false;
> +
> + if (fh->version > OVL_FH_VERSION ||
> + fh->magic != OVL_FH_MAGIC)
> + return false;
> +
> + return true;
> +}
> +
> void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry)
> {
> struct ovl_entry *oe = dentry->d_fsdata;
> --
> 2.7.4
>
* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
2017-04-25 15:13 ` Miklos Szeredi
@ 2017-04-25 17:41 ` Amir Goldstein
2017-04-25 19:11 ` Amir Goldstein
0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-25 17:41 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Tue, Apr 25, 2017 at 6:13 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Mon, Apr 24, 2017 at 11:14 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> When overlay.fh xattr is found in a directory inode, instead of lookup
>> of the dentry in next lower layer by name, first try to get it by calling
>> exportfs_decode_fh().
>>
>> On failure to lookup by file handle to lower layer, fall back to lookup
>> by name with or without path redirect.
>>
>> For now we only support following by file handle from upper if there is a
>> single lower layer, because fallback from lookup by file handle to lookup
>> by path in mid layers is not yet implemented.
>>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>> fs/overlayfs/namei.c | 185 +++++++++++++++++++++++++++++++++++++++++++----
>> fs/overlayfs/overlayfs.h | 1 +
>> fs/overlayfs/util.c | 14 ++++
>> 3 files changed, 186 insertions(+), 14 deletions(-)
>>
[...]
>>
>> +/* Check if p1 is connected with a chain of hashed dentries to p2 */
>> +static bool ovl_is_lookable(struct dentry *p1, struct dentry *p2)
>> +{
>> + struct dentry *p;
>> +
>> + for (p = p2; !IS_ROOT(p); p = p->d_parent) {
>> + if (d_unhashed(p))
>> + return false;
>> + if (p->d_parent == p1)
>> + return true;
>> + }
>> + return false;
>> +}
>
> Walking the dentry tree without RCU protection is dangerous and broken.
>
I wonder if is_subdir() would be correct here?
Or I could just follow its lead to implement the parent walk correctly.
I did want to verify that the found dentry is not only 'connected' to
root, but also 'lookable', because I don't want to find a deleted file
when looking in lower layers.
Maybe that was too much and in any case, I could just verify that
the decoded dentry itself is hashed.
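
To make the simpler check concrete, here is a minimal userspace sketch. It
is not kernel code and takes no locks or RCU: struct toy_dentry,
toy_is_subdir() and toy_fh_acceptable() are made-up names that only model
the idea of "the decoded dentry is under the layer root, and the dentry
itself is still hashed". The ancestor test is assumed to behave roughly
like is_subdir(), including treating a dentry as a subdir of itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a dentry: only the parent link and a "hashed" bit. */
struct toy_dentry {
	struct toy_dentry *parent;	/* a root points to itself */
	bool hashed;			/* false once the entry is unlinked */
};

/* True if @d is @root or has @root among its ancestors. */
static bool toy_is_subdir(const struct toy_dentry *d,
			  const struct toy_dentry *root)
{
	assert(d && root);
	for (;;) {
		if (d == root)
			return true;
		if (d->parent == d)	/* reached some other (disconnected) root */
			return false;
		d = d->parent;
	}
}

/* Accept a decoded dentry only if it is under @root and still hashed. */
static bool toy_fh_acceptable(const struct toy_dentry *d,
			      const struct toy_dentry *root)
{
	return d->hashed && toy_is_subdir(d, root);
}
```

Only the decoded dentry's own hashed bit is tested here, not every
ancestor on the way up, which is the relaxation suggested above.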
> I'm also wondering if there's a better way to find the layer
The purpose of this test is not only to find the layer, but
also to verify that the found inode is linked under the layer root.
I think that decode_fh() will always be able to create a disconnected
dentry if decoding an inode that is on the same sb as the layer where
fh was encoded. I'm pretty sure this was what I found in my initial
tests which made me write the broken ovl_is_lookable().
> (e.g. store the layer index in the handle as well).
>
But the layer index is a volatile number that can change.
I would like to be able to find by fh also when more layers are added
to the stack.
The only thing I can think of is to store sb_uuid+layer_root_fh+lower_fh.
At mount, we build a hash of the lower sb_uuid (save same_lower_uuid
for now).
At lookup, we first find lower_sb by uuid (verify same_lower_uuid for now),
then decode lower_root by root_fh, then find lower_mnt by lower_root,
then decode lower_fh with lower_mnt.
Sound reasonable?
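
A rough userspace sketch of the layer-matching step of this scheme, under
loud assumptions: struct toy_layer, struct toy_redirect and
toy_find_layer() are hypothetical, the handle sizes are arbitrary, a linear
scan stands in for the uuid hash, and root handles are compared byte for
byte instead of being decoded. It only illustrates "find the layer whose sb
uuid and root handle match what was stored".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_UUID_LEN 16
#define TOY_FH_MAX   32

/* Hypothetical identity of one lower layer, recorded at mount time. */
struct toy_layer {
	unsigned char sb_uuid[TOY_UUID_LEN];	/* uuid of the layer's sb */
	unsigned char root_fh[TOY_FH_MAX];	/* handle of the layer root */
	size_t root_fh_len;
};

/* Hypothetical stored redirect prefix: sb_uuid + layer_root_fh. */
struct toy_redirect {
	unsigned char sb_uuid[TOY_UUID_LEN];
	unsigned char root_fh[TOY_FH_MAX];
	size_t root_fh_len;
};

/*
 * Return the index of the layer whose sb uuid and root handle both match
 * the stored redirect, or -1 if none does (the caller would then fall
 * back to lookup by path).
 */
static int toy_find_layer(const struct toy_layer *layers, size_t numlower,
			  const struct toy_redirect *r)
{
	size_t i;

	assert(r->root_fh_len <= TOY_FH_MAX);
	for (i = 0; i < numlower; i++) {
		const struct toy_layer *l = &layers[i];

		if (memcmp(l->sb_uuid, r->sb_uuid, TOY_UUID_LEN) != 0)
			continue;
		if (l->root_fh_len != r->root_fh_len ||
		    memcmp(l->root_fh, r->root_fh, r->root_fh_len) != 0)
			continue;
		return (int)i;
	}
	return -1;
}
```

Because the match is on uuid plus root handle rather than on a stored
index, the result stays valid when layers are added to the stack.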
* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
2017-04-25 17:41 ` Amir Goldstein
@ 2017-04-25 19:11 ` Amir Goldstein
2017-04-26 9:06 ` Miklos Szeredi
0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-25 19:11 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Tue, Apr 25, 2017 at 8:41 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Tue, Apr 25, 2017 at 6:13 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Mon, Apr 24, 2017 at 11:14 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> When overlay.fh xattr is found in a directory inode, instead of lookup
>>> of the dentry in next lower layer by name, first try to get it by calling
>>> exportfs_decode_fh().
>>>
>>> On failure to lookup by file handle to lower layer, fall back to lookup
>>> by name with or without path redirect.
>>>
>>> For now we only support following by file handle from upper if there is a
>>> single lower layer, because fallback from lookup by file handle to lookup
>>> by path in mid layers is not yet implemented.
>>>
>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>> ---
>>> fs/overlayfs/namei.c | 185 +++++++++++++++++++++++++++++++++++++++++++----
>>> fs/overlayfs/overlayfs.h | 1 +
>>> fs/overlayfs/util.c | 14 ++++
>>> 3 files changed, 186 insertions(+), 14 deletions(-)
>>>
> [...]
>>>
>>> +/* Check if p1 is connected with a chain of hashed dentries to p2 */
>>> +static bool ovl_is_lookable(struct dentry *p1, struct dentry *p2)
>>> +{
>>> + struct dentry *p;
>>> +
>>> + for (p = p2; !IS_ROOT(p); p = p->d_parent) {
>>> + if (d_unhashed(p))
>>> + return false;
>>> + if (p->d_parent == p1)
>>> + return true;
>>> + }
>>> + return false;
>>> +}
>>
>> Walking the dentry tree without RCU protection is dangerous and broken.
>>
>
> I wonder if is_subdir() would be correct here?
> Or I could just follow its lead to implement the parent walk correctly.
> I did want to verify that the found dentry is not only 'connected' to
> root, but also 'lookable', because I don't want to find a deleted file
> when looking in lower layers.
> Maybe that was too much and in any case, I could just verify that
> the decoded dentry itself is hashed.
>
>> I'm also wondering if there's a better way to find the layer
>
> The purpose of this test is not only to find the layer, but
> also to verify that the found inode is linked under the layer root.
> I think that decode_fh() will always be able to create a disconnected
> dentry if decoding an inode that is on the same sb as the layer where
> fh was encoded. I'm pretty sure this was what I found in my initial
> tests which made me write the broken ovl_is_lookable().
>
>> (e.g. store the layer index in the handle as well).
>>
>
> But the layer index is a volatile number that can change.
> I would like to be able to find by fh also when more layers are added
> to the stack.
>
> The only thing I can think of is to store sb_uuid+layer_root_fh+lower_fh.
> At mount, we build a hash of the lower sb_uuid (save same_lower_uuid
> for now).
> At lookup, we first find lower_sb by uuid (verify same_lower_uuid for now),
> then decode lower_root by root_fh, then find lower_mnt by lower_root,
> then decode lower_fh with lower_mnt.
>
> Sound reasonable?
Or maybe like this:
At mount time either set or verify the xattr in upper layer root inode:
overlay.root.$i [i:=0..numlower-1] - ovl_root_id of lower layer i
ovl_root_id includes for each layer:
- sb uuid
- fh of root inode
If mount was able to set or verify that all ovl_root_id[i] match their
respective lower layer sb and root inode, then redirect_fh can be enabled,
otherwise it is disabled.
With redirect_fh enabled, it is safe to lookup by the lower layer index,
root fh and lower inode fh.
With redirect_fh enabled, it is safe to store handles on copy up along
with lower layer index and root fh.
A lower layer can be used and reused by any number of overlay mounts
at different layer index.
An upper layer can be reused in an overlay mount with either copied lower
layers or with different lower stack and will have redirect_fh disabled.
An upper layer can be rotated as lower layer, because file handles are
never followed from a lower layer. Constant inode numbers code does
not need to follow by fh from lower layers.
With this scheme, there is no need to store or match sb_uuid for
every single copy up and every single lookup by fh.
There is no need to 'lookup' the layer, just use the index and compare
the root_fh.
It is quite safe from following handles to wrong fs, except if user copies
parts of an upper layer (without the layer root), but doing something like
that is equivalent to a user that takes down an NFS server, brings up
a server with the same network address and exports the same share
name from a different filesystem.
Maybe the chances are more slim, but the same interesting things could
happen.
Amir.
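
The mount-time "set or verify" step described above could look roughly like
the following userspace sketch. Everything in it is an assumption made for
illustration: struct toy_root_id, struct toy_slot and toy_check_roots() are
invented names, the id is an opaque byte array of arbitrary size, and an
in-memory array stands in for the overlay.root.$i xattrs on the upper root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_ID_LEN 24	/* arbitrary: sb uuid plus root handle bytes */

/* Hypothetical ovl_root_id: sb uuid followed by the root inode handle. */
struct toy_root_id {
	unsigned char bytes[TOY_ID_LEN];
};

/* One overlay.root.$i slot on the upper root; !present means no xattr. */
struct toy_slot {
	bool present;
	struct toy_root_id id;
};

/*
 * Set-or-verify pass run at mount time. Existing slots must match the
 * current lower stack exactly; missing slots are then filled in.
 * Returns whether redirect_fh may be enabled for this mount.
 */
static bool toy_check_roots(struct toy_slot *stored,
			    const struct toy_root_id *lower, size_t numlower)
{
	size_t i;

	assert(stored && lower);
	/* Verify first, so a mismatch leaves the stored ids untouched. */
	for (i = 0; i < numlower; i++) {
		if (stored[i].present &&
		    memcmp(&stored[i].id, &lower[i], sizeof(lower[i])) != 0)
			return false;
	}
	for (i = 0; i < numlower; i++) {
		if (!stored[i].present) {
			stored[i].id = lower[i];
			stored[i].present = true;
		}
	}
	return true;
}
```

A first mount records the ids, a later mount with the same lower stack
verifies them, and a mount with a different lower stack gets false, which
corresponds to redirect_fh being disabled.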
* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
2017-04-25 19:11 ` Amir Goldstein
@ 2017-04-26 9:06 ` Miklos Szeredi
2017-04-26 9:40 ` Amir Goldstein
0 siblings, 1 reply; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-26 9:06 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Tue, Apr 25, 2017 at 9:11 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> Or maybe like this:
>
> At mount time either set or verify the xattr in upper layer root inode:
> overlay.root.$i [i:=0..numlower-1] - ovl_root_id of lower layer i
> ovl_root_id includes for each layer:
> - sb uuid
> - fh of root inode
>
> If mount was able to set or verify that all ovl_root_id[i] match their
> respective lower layer sb and root inode, then redirect_fh can be enabled,
> otherwise it is disabled.
>
> With redirect_fh enabled, it is safe to lookup by the lower layer index,
> root fh and lower inode fh.
> With redirect_fh enabled, it is safe to store handles on copy up along
> with lower layer index and root fh.
>
> A lower layer can be used and reused by any number of overlay mounts
> at different layer index.
>
> An upper layer can be reused in an overlay mount with either copied lower
> layers or with different lower stack and will have redirect_fh disabled.
>
> An upper layer can be rotated as lower layer, because file handles are
> never followed from a lower layer. Constant inode numbers code does
> not need to follow by fh from lower layers.
>
> With this scheme, there is no need to store or match sb_uuid for
> every single copy up and every single lookup by fh.
> There is no need to 'lookup' the layer, just use the index and compare
> the root_fh.
>
> It is quite safe from following handles to wrong fs, except if user copies
> parts of an upper layer (without the layer root), but doing something like
> that is equivalent to a user that takes down an NFS server, brings up
> a server with the same network address and exports the same share
> name from a different filesystem.
>
> Maybe the chances are more slim, but the same interesting things could
> happen.
Checking UUID would be O(1) and very fast, so I wouldn't worry about
that. Using is_subdir() to verify the layer is O(depth) but still
very fast. I don't think that's an issue either.
Using is_subdir() to find the layer would be O(depth*numlower). But
we can optimize that if we really want to: have a function that
returns the first ancestor of a dentry that is a layer root (marked
with a flag). Then we just need to map that dentry to the layer,
which can be done with a hash table or whatever.
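As a rough illustration of the ancestor walk described above, here is a
userspace sketch, not kernel code. The names struct model_dentry,
is_layer_root and layer_index are hypothetical, and the layer_index field
simply stands in for the "hash table or whatever" mapping:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for a dentry; all names here are hypothetical. */
struct model_dentry {
	struct model_dentry *parent;	/* NULL at the filesystem root */
	int is_layer_root;		/* models the "marked with a flag" part */
	int layer_index;		/* stands in for the dentry-to-layer map */
};

/*
 * Walk up from d and return the first ancestor (or d itself) that is
 * marked as a layer root, or NULL if there is none. The cost is
 * O(depth) and does not depend on the number of lower layers.
 */
struct model_dentry *first_layer_root(struct model_dentry *d)
{
	while (d) {
		if (d->is_layer_root)
			return d;
		d = d->parent;
	}
	return NULL;
}

/* Map a dentry to its layer index, or -1 if it is under no layer root. */
int find_layer(struct model_dentry *d)
{
	struct model_dentry *root = first_layer_root(d);

	if (!root)
		return -1;
	/* A marked layer root is expected to carry a valid index. */
	assert(root->layer_index >= 0);
	return root->layer_index;
}
```

With such a walk the per-lookup cost stays at O(depth) plus one map lookup,
instead of one is_subdir() pass per lower layer.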
And anyway uncached lookup will be slow, and we are only doing this
for copied up files and directories. So I don't think we need to
worry too much about optimizing this.
So for now let's just go with the original patch but replace
ovl_is_lookable() with is_subdir().
Thanks,
Miklos
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
2017-04-26 9:06 ` Miklos Szeredi
@ 2017-04-26 9:40 ` Amir Goldstein
2017-04-26 9:55 ` Miklos Szeredi
0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-26 9:40 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Wed, Apr 26, 2017 at 12:06 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Tue, Apr 25, 2017 at 9:11 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> Or maybe like this:
>>
>> At mount time either set or verify the xattr in upper layer root inode:
>> overlay.root.$i [i:=0..numlower-1] - ovl_root_id of lower layer i
>> ovl_root_id includes for each layer:
>> - sb uuid
>> - fh of root inode
>>
>> If mount was able to set or verify that all ovl_root_id[i] match their
>> respective lower layer sb and root inode, then redirect_fh can be enabled,
>> otherwise it is disabled.
>>
>> With redirect_fh enabled, it is safe to lookup by the lower layer index,
>> root fh and lower inode fh.
>> With redirect_fh enabled, it is safe to store handles on copy up along
>> with lower layer index and root fh.
>>
>> A lower layer can be used and reused by any number of overlay mounts
>> at different layer index.
>>
>> An upper layer can be reused in an overlay mount with either copied lower
>> layers or with different lower stack and will have redirect_fh disabled.
>>
>> An upper layer can be rotated as lower layer, because file handles are
>> never followed from a lower layer. Constant inode numbers code does
>> not need to follow by fh from lower layers.
>>
>> With this scheme, there is no need to store or match sb_uuid for
>> every single copy up and every single lookup by fh.
>> There is no need to 'lookup' the layer, just use the index and compare
>> the root_fh.
>>
>> It is quite safe from following handles to wrong fs, except if user copies
>> parts of an upper layer (without the layer root), but doing something like
>> that is equivalent to a user that takes down an NFS server, brings up
>> a server with the same network address and exports the same share
>> name from a different filesystem.
>>
>> Maybe the chances are more slim, but the same interesting things could
>> happen.
>
> Checking UUID would be O(1) and very fast, so I wouldn't worry about
> that. Using is_subdir() to verify the layer is O(depth) but still
> very fast. I don't think that's an issue either.
>
> Using is_subdir() to find the layer would be O(depth*numlower). But
> we can optimize that if we really want to: have a function that
> returns the first ancestor of a dentry that is a layer root (marked
> with a flag). Then we just need to map that dentry to the layer,
> which can be done with a hash table or whatever.
>
> And anyway uncached lookup will be slow, and we are only doing this
> for copied up files and directories. So I don't think we need to
> worry too much about optimizing this.
>
> So for now lets just go with the original patch but replace
> ovl_is_lookable() with is_subdir().
>
Just to see that I understand you correctly.
I am now working on storing the following:
/*
* The tuple origin.{fh,root,uuid} is a universal unique identifier
* for a copy up origin, where:
* origin.fh - exported file handle of the lower file
* origin.root - exported file handle of the lower layer root
* origin.uuid - uuid of the lower filesystem
*
* origin.{fh,root} are stored in format of a variable length binary blob
* with struct ovl_fh header (total blob size up to 20 bytes).
* uuid is stored in raw format (16 bytes) as published by sb->s_uuid.
*/
I intend to implement lookup as follows:
- compare(origin.uuid, same_lower_sb->s_uuid)
# layer root dentries cannot be DCACHE_DISCONNECTED, so
# exportfs_decode_fh ignores mnt arg and returns the cached dentry
- root = exportfs_decode_fh(lowerstack[0].mnt, origin.root)
- find layer where lowerstack[layer].dentry == root
- this = exportfs_decode_fh(lowerstack[layer].mnt, origin.fh)
is_subdir() is NOT needed for decoding the layer root
is_subdir() is optional for decoding the lower file, because
it is not needed to identify the layer
The lookup is O(numlower)+O(depth), where the O(depth) part is
just a precaution.
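The first two steps of that plan can be modelled in userspace roughly as
follows. This is only a sketch under stated assumptions: struct model_layer,
struct model_origin and model_find_layer() are hypothetical names, and a
plain root inode number stands in for decoding origin.root with
exportfs_decode_fh() and comparing dentry pointers:

```c
#include <assert.h>
#include <string.h>

#define MODEL_UUID_LEN 16

/* Hypothetical stand-ins; none of these are overlayfs structures. */
struct model_layer {
	unsigned long root_ino;	/* identifies the layer root directory */
};

struct model_origin {
	unsigned char uuid[MODEL_UUID_LEN];	/* models origin.uuid */
	unsigned long root_ino;			/* models origin.root */
};

/*
 * Return the index of the lower layer the origin refers to, or -1.
 * Step 1 rejects a foreign filesystem in O(1); step 2 scans the
 * lower stack in O(numlower).
 */
int model_find_layer(const unsigned char sb_uuid[MODEL_UUID_LEN],
		     const struct model_layer *lowerstack, int numlower,
		     const struct model_origin *origin)
{
	int i;

	assert(numlower >= 0);
	if (memcmp(sb_uuid, origin->uuid, MODEL_UUID_LEN) != 0)
		return -1;
	for (i = 0; i < numlower; i++) {
		if (lowerstack[i].root_ino == origin->root_ino)
			return i;
	}
	return -1;
}
```

Only after a layer index has been found would the lower file handle itself
be decoded against that layer's mount.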
Amir.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
2017-04-26 9:40 ` Amir Goldstein
@ 2017-04-26 9:55 ` Miklos Szeredi
2017-04-26 10:17 ` Amir Goldstein
0 siblings, 1 reply; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-26 9:55 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Wed, Apr 26, 2017 at 11:40 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> Just to see that I understand you correctly.
>
> I am now working on storing the following:
>
> /*
> * The tuple origin.{fh,layer,uuid} is a universal unique identifier
> * for a copy up origin, where:
> * origin.fh - exported file handle of the lower file
> * origin.root - exported file handle of the lower layer root
> * origin.uuid - uuid of the lower filesystem
I wouldn't even store origin.root.
> *
> * origin.{fh,root} are stored in format of a variable length binary blob
> * with struct ovl_fh header (total blob size up to 20 bytes).
> * uuid is stored in raw format (16 bytes) as published by sb->s_uuid.
> */
>
> I intend to implement lookup as follows:
> - compare(origin.uuid, same_lower_sb->s_uuid)
> # layer root dentries cannot be DCACHE_DISCONNECTED, so
> # exportfs_decode_fh ignores mnt arg and returns the cached dentry
> - root = exportfs_decode_fh(lowerstack[0].mnt, origin.root)
> - find layer where lowerstack[layer].dentry == root
> - this = exportfs_decode_fh(lowerstack[layer].mnt, origin.fh)
>
> is_subdir() is NOT needed for decoding the layer root
> is_subdir() is optional for decoding the lower file, because
> it is not needed to identify the layer
Hmm, we can just force exportfs_decode_fh() to return a connected
dentry (return false from *acceptable() if the dentry is disconnected)
before going on to iterate the layers to see which one contains it.
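A minimal userspace sketch of such a callback, assuming a hypothetical
struct fake_dentry and a hypothetical MODEL_DISCONNECTED flag that models
DCACHE_DISCONNECTED (this is not the kernel callback itself, only its
general shape):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag value; it only models DCACHE_DISCONNECTED. */
#define MODEL_DISCONNECTED 0x1u

/* Hypothetical stand-in for a dentry. */
struct fake_dentry {
	unsigned int flags;
};

/*
 * Same general shape as an acceptable() callback: a non-zero return
 * means "accept this dentry". Disconnected dentries are rejected so
 * that the decoder has to come back with a connected one.
 */
int model_acceptable(void *context, struct fake_dentry *dentry)
{
	(void)context;	/* unused in this sketch */
	assert(dentry != NULL);
	return !(dentry->flags & MODEL_DISCONNECTED);
}
```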
Thanks,
Miklos
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
2017-04-26 9:55 ` Miklos Szeredi
@ 2017-04-26 10:17 ` Amir Goldstein
2017-04-26 12:15 ` Miklos Szeredi
0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-26 10:17 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Wed, Apr 26, 2017 at 12:55 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Wed, Apr 26, 2017 at 11:40 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>
>> Just to see that I understand you correctly.
>>
>> I am now working on storing the following:
>>
>> /*
>> * The tuple origin.{fh,layer,uuid} is a universal unique identifier
>> * for a copy up origin, where:
>> * origin.fh - exported file handle of the lower file
>> * origin.root - exported file handle of the lower layer root
>> * origin.uuid - uuid of the lower filesystem
>
> I wouldn't even store origin.root.
>
>> *
>> * origin.{fh,root} are stored in format of a variable length binary blob
>> * with struct ovl_fh header (total blob size up to 20 bytes).
>> * uuid is stored in raw format (16 bytes) as published by sb->s_uuid.
>> */
>>
>> I intend to implement lookup as follows:
>> - compare(origin.uuid, same_lower_sb->s_uuid)
>> # layer root dentries cannot be DCACHE_DISCONNECTED, so
>> # exportfs_decode_fh ignores mnt arg and returns the cached dentry
>> - root = exportfs_decode_fh(lowerstack[0].mnt, origin.root)
>> - find layer where lowerstack[layer].dentry == root
>> - this = exportfs_decode_fh(lowerstack[layer].mnt, origin.fh)
>>
>> is_subdir() is NOT needed for decoding the layer root
>> is_subdir() is optional for decoding the lower file, because
>> it is not needed to identify the layer
>
> Hmm, we can just force exportfs_decode_fh() to return a connected
> dentry (return false from *acceptable() if the dentry is disconnected)
> before going on to iterate the layers to see which one contains it.
>
Hmm, this might work, but to quote from exportfs_decode_fh():
"It's not a directory. Life is a little more complicated."
IIUC, 'connected' means 'connected to sb root', and not
'connected to mnt root', so in the optimal case where
all lower dentries are cached, exportfs_decode_fh() will return
a connected dentry for every fh we give it regardless of the
mnt argument, so we will have to use is_subdir() to find the
right layer, which brings us back to O(numlower*depth)
With the extra cost of storing the deducible information origin.root,
we will have less complex and more efficient lookup code.
Let me try and implement it and see if I am right.
We can always discard origin.root from v4 if it turns
out to be unhelpful.
Amir.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
2017-04-26 10:17 ` Amir Goldstein
@ 2017-04-26 12:15 ` Miklos Szeredi
2017-04-26 14:51 ` Amir Goldstein
0 siblings, 1 reply; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-26 12:15 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Wed, Apr 26, 2017 at 12:17 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Apr 26, 2017 at 12:55 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Wed, Apr 26, 2017 at 11:40 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>
>>> Just to see that I understand you correctly.
>>>
>>> I am now working on storing the following:
>>>
>>> /*
>>> * The tuple origin.{fh,layer,uuid} is a universal unique identifier
>>> * for a copy up origin, where:
>>> * origin.fh - exported file handle of the lower file
>>> * origin.root - exported file handle of the lower layer root
>>> * origin.uuid - uuid of the lower filesystem
>>
>> I wouldn't even store origin.root.
>>
>>> *
>>> * origin.{fh,root} are stored in format of a variable length binary blob
>>> * with struct ovl_fh header (total blob size up to 20 bytes).
>>> * uuid is stored in raw format (16 bytes) as published by sb->s_uuid.
>>> */
>>>
>>> I intend to implement lookup as follows:
>>> - compare(origin.uuid, same_lower_sb->s_uuid)
>>> # layer root dentries cannot be DCACHE_DISCONNECTED, so
>>> # exportfs_decode_fh ignores mnt arg and returns the cached dentry
>>> - root = exportfs_decode_fh(lowerstack[0].mnt, origin.root)
>>> - find layer where lowerstack[layer].dentry == root
>>> - this = exportfs_decode_fh(lowerstack[layer].mnt, origin.fh)
>>>
>>> is_subdir() is NOT needed for decoding the layer root
>>> is_subdir() is optional for decoding the lower file, because
>>> it is not needed to identify the layer
>>
>> Hmm, we can just force exportfs_decode_fh() to return a connected
>> dentry (return false from *acceptable() if the dentry is disconnected)
>> before going on to iterate the layers to see which one contains it.
>>
>
> Hmm, this might work, but to quote from exportfs_decode_fh():
> "It's not a directory. Life is a little more complicated."
>
> IIUC, 'connected' means 'connected to sb root', and not
> 'connected to mnt root', so in the optimal case where
> all lower dentries are cached, exportfs_decode_fh() will return
> a connected dentry for every fh we give it regardless of the
> mnt argument, so we will have to use is_subdir() to find the
> right layer, which brings us back to O(numlower*depth)
It just means that we might have to make up an artificial mount which
has its root at the sb root to be able to decode the handle into a
connected one.
>
> With the extra cost of storing the deducible information origin.root,
> we will have less complex and more efficient lookup code.
>
> Let me try and implement it and see if I am right.
> We can always discard origin.root from v4 if it turns
> out to be unhelpful.
I don't have good feelings about storing the root fh just because we
don't special case the layer root anywhere yet, and I wouldn't want to
do that unless there's a good reason.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
2017-04-26 12:15 ` Miklos Szeredi
@ 2017-04-26 14:51 ` Amir Goldstein
2017-04-27 6:27 ` Amir Goldstein
2017-04-27 7:40 ` Miklos Szeredi
0 siblings, 2 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-26 14:51 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Wed, Apr 26, 2017 at 3:15 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Wed, Apr 26, 2017 at 12:17 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Wed, Apr 26, 2017 at 12:55 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> On Wed, Apr 26, 2017 at 11:40 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>
>>>> Just to see that I understand you correctly.
>>>>
>>>> I am now working on storing the following:
>>>>
>>>> /*
>>>> * The tuple origin.{fh,layer,uuid} is a universal unique identifier
>>>> * for a copy up origin, where:
>>>> * origin.fh - exported file handle of the lower file
>>>> * origin.root - exported file handle of the lower layer root
>>>> * origin.uuid - uuid of the lower filesystem
>>>
>>> I wouldn't even store origin.root.
>>>
>>>> *
>>>> * origin.{fh,root} are stored in format of a variable length binary blob
>>>> * with struct ovl_fh header (total blob size up to 20 bytes).
>>>> * uuid is stored in raw format (16 bytes) as published by sb->s_uuid.
>>>> */
>>>>
>>>> I intend to implement lookup as follows:
>>>> - compare(origin.uuid, same_lower_sb->s_uuid)
>>>> # layer root dentries cannot be DCACHE_DISCONNECTED, so
>>>> # exportfs_decode_fh ignores mnt arg and returns the cached dentry
>>>> - root = exportfs_decode_fh(lowerstack[0].mnt, origin.root)
>>>> - find layer where lowerstack[layer].dentry == root
>>>> - this = exportfs_decode_fh(lowerstack[layer].mnt, origin.fh)
>>>>
>>>> is_subdir() is NOT needed for decoding the layer root
>>>> is_subdir() is optional for decoding the lower file, because
>>>> it is not needed to identify the layer
>>>
>>> Hmm, we can just force exportfs_decode_fh() to return a connected
>>> dentry (return false from *acceptable() if the dentry is disconnected)
>>> before going on to iterate the layers to see which one contains it.
>>>
>>
>> Hmm, this might work, but to quote from exportfs_decode_fh():
>> "It's not a directory. Life is a little more complicated."
>>
>> IIUC, 'connected' means 'connected to sb root', and not
>> 'connected to mnt root', so in the optimal case where
>> all lower dentries are cached, exportfs_decode_fh() will return
>> a connected dentry for every fh we give it regardless of the
>> mnt argument, so we will have to use is_subdir() to find the
>> right layer, which brings us back to O(numlower*depth)
>
> It just means that we might have to make up an artificial mount which
> has its root at the sb root to be able to decode the handle into a
> connected one.
>
I'm not sure I understand what this artificial mount buys us.
>>
>> With the extra cost of storing the deducible information origin.root,
>> we will have less complex and more efficient lookup code.
>>
>> Let me try and implement it and see if I am right.
>> We can always discard origin.root from v4 if it turns
>> out to be unhelpful.
>
> I don't have good feelings about storing the root fh just because we
> don't special case the layer root anywhere yet, and I wouldn't want to
> do that unless there's a good reason.
>
There are a few reasons for origin.root, not sure if they are good:
1. lookup is O(numlower+depth) instead of O(numlower*depth)
2. origin.uuid validates that we are still on the same sb
origin.root validates that we are still using the same lower dirs
and that files from old lower were not moved around to find themselves
inside a different lower dir
3. hardlinks between layers (!!!) will still get to the right layer
I personally think that reason #1 is the important one, but I think we
disagree on the technical details of exportfs_decode_fh() and we
need to sort this out.
Here is my untested implementation of find layer by uuid/rootfh
with the relevant comments. Maybe it helps you point out what
I am missing or what you are missing:
/* Find lower layer index by layer root file handle and uuid */
static int ovl_find_layer_by_fh(struct dentry *dentry,
				struct ovl_lookup_data *d)
{
	struct ovl_entry *roe = dentry->d_sb->s_root->d_fsdata;
	struct super_block *lower_sb = ovl_same_lower_sb(dentry->d_sb);
	struct dentry *this;
	int i;

	/*
	 * For now, we only support lookup by fh for all lower layers on the
	 * same sb. Not all filesystems set sb->s_uuid. For those who don't
	 * this code will compare zeros, which at least ensures us that the
	 * file handles are not crossing from filesystem with sb->s_uuid to
	 * a filesystem without sb->s_uuid and vice versa.
	 */
	if (!lower_sb || memcmp(lower_sb->s_uuid, &d->uuid, sizeof(d->uuid)))
		return -1;

	/*
	 * Layer root dentries are pinned, there are no aliases for dirs, and
	 * all lower layers are on the same sb. If rootfh is correct,
	 * exportfs_decode_fh() will find it in dcache and return the only
	 * instance, regardless of the mnt argument and we can compare the
	 * returned pointer with the pointers in lowerstack.
	 */
	this = ovl_decode_fh(roe->lowerstack[0].mnt, d->rootfh, ovl_is_dir);
	if (IS_ERR(this))
		return -1;

	for (i = 0; i < roe->numlower; i++) {
		if (this == roe->lowerstack[i].dentry)
			break;
	}

	dput(this);
	return i < roe->numlower ? i : -1;
}
Amir.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
2017-04-26 14:51 ` Amir Goldstein
@ 2017-04-27 6:27 ` Amir Goldstein
2017-04-27 7:48 ` Miklos Szeredi
2017-04-27 7:40 ` Miklos Szeredi
1 sibling, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-27 6:27 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Wed, Apr 26, 2017 at 5:51 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Apr 26, 2017 at 3:15 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Wed, Apr 26, 2017 at 12:17 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> On Wed, Apr 26, 2017 at 12:55 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>> On Wed, Apr 26, 2017 at 11:40 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>>
>>>>> Just to see that I understand you correctly.
>>>>>
>>>>> I am now working on storing the following:
>>>>>
>>>>> /*
>>>>> * The tuple origin.{fh,layer,uuid} is a universal unique identifier
>>>>> * for a copy up origin, where:
>>>>> * origin.fh - exported file handle of the lower file
>>>>> * origin.root - exported file handle of the lower layer root
>>>>> * origin.uuid - uuid of the lower filesystem
>>>>
>>>> I wouldn't even store origin.root.
>>>>
>>>>> *
>>>>> * origin.{fh,root} are stored in format of a variable length binary blob
>>>>> * with struct ovl_fh header (total blob size up to 20 bytes).
>>>>> * uuid is stored in raw format (16 bytes) as published by sb->s_uuid.
>>>>> */
>>>>>
>>>>> I intend to implement lookup as follows:
>>>>> - compare(origin.uuid, same_lower_sb->s_uuid)
>>>>> # layer root dentries cannot be DCACHE_DISCONNECTED, so
>>>>> # exportfs_decode_fh ignores mnt arg and returns the cached dentry
>>>>> - root = exportfs_decode_fh(lowerstack[0].mnt, origin.root)
>>>>> - find layer where lowerstack[layer].dentry == root
>>>>> - this = exportfs_decode_fh(lowerstack[layer].mnt, origin.fh)
>>>>>
>>>>> is_subdir() is NOT needed for decoding the layer root
>>>>> is_subdir() is optional for decoding the lower file, because
>>>>> it is not needed to identify the layer
>>>>
>>>> Hmm, we can just force exportfs_decode_fh() to return a connected
>>>> dentry (return false from *acceptable() if the dentry is disconnected)
>>>> before going on to iterate the layers to see which one contains it.
>>>>
>>>
>>> Hmm, this might work, but to quote from exportfs_decode_fh():
>>> "It's not a directory. Life is a little more complicated."
>>>
>>> IIUC, 'connected' means 'connected to sb root', and not
>>> 'connected to mnt root', so in the optimal case where
>>> all lower dentries are cached, exportfs_decode_fh() will return
>>> a connected dentry for every fh we give it regardless of the
>>> mnt argument, so we will have to use is_subdir() to find the
>>> right layer, which brings us back to O(numlower*depth)
>>
>> It just means that we might have to make up an artificial mount which
>> has its root at the sb root to be able to decode the handle into a
>> connected one.
>>
>
> I'm not sure I understand what this artificial mount buys us.
Let me try to explain the problem with a worst-case, but not
improbable, example:
Suppose I have an overlay with deep file at /a/b/c/.../z
Suppose the layers are at /old/{lower,upper} I copy them
over to /new/{lower,upper} and mount the overlay at new path.
Suppose that dcache is fully populated under /new and fully
evicted under /old.
When trying to decode the file handle for z, exportfs_decode_fh()
will call the file system to actually read all directories a..z from disk
in order to reconnect the dentry of old z all the way up to /old
and it will do that *before* calling the acceptable() callback.
Alternatively, if we first try to decode the file handle for /old/lower,
decoding will be very fast (most likely already in cache) and we will
not have to continue to decoding z and reading all directories a..z
from disk.
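The cost argument in this example can be put into a toy model. The sketch
below is an assumption-laden illustration, not a measurement: struct
model_fh, its fields and model_lookup_cost() are hypothetical, and "cost"
just counts directories that would have to be read from disk to reconnect
a dentry:

```c
#include <assert.h>

/* Hypothetical model: cost is the number of directory reads from disk. */
struct model_fh {
	int layer_id;	/* which layer root the handle belongs to */
	int cold_depth;	/* uncached ancestors to read when reconnecting */
};

/*
 * Decode the layer root handle first and bail out if it is not the
 * expected layer root. Only on a match is the cost of reconnecting the
 * file itself paid. Returns the number of directory reads performed.
 */
int model_lookup_cost(const struct model_fh *root, const struct model_fh *file,
		      int expected_layer_id)
{
	int reads = root->cold_depth;

	assert(root->cold_depth >= 0 && file->cold_depth >= 0);
	if (root->layer_id != expected_layer_id)
		return reads;	/* stale origin: the file fh is never decoded */
	return reads + file->cold_depth;
}
```

In the copied-layers scenario the root handle points at /old/lower, so the
mismatch is detected before any of the directories a..z are read.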
This is why and how I implemented lookup by origin.{root+fh}
in v3 patch set.
>
>>>
>>> With the extra cost of storing the deducible information origin.root,
>>> we will have less complex and more efficient lookup code.
>>>
>>> Let me try and implement it and see if I am right.
>>> We can always discard origin.root from v4 if it turns
>>> out to be unhelpful.
>>
>> I don't have good feelings about storing the root fh just because we
>> don't special case the layer root anywhere yet, and I wouldn't want to
>> do that unless there's a good reason.
>>
Wait, what do you mean by "we don't special case the layer root?"
Do you mean that we could mount an overlay at a subdir path?
i.e. in the example below, we could mount an overlay with
upperdir=/new/upper/a/b/c,lowerdir=/new/lower/a/b/c?
If this is what you mean then it is not true that we don't special case
layer root. We do it with path redirect relative to layer root.
If anything, we should be storing origin.root along with overlay.redirect
in order to verify that we are not redirecting into the wrong relative
path.
>
> There are a few reasons for origin.root, not sure if they are good:
> 1. lookup is O(numlower+depth) instead of O(numlower*depth)
> 2. origin.uuid validates that we are still on the same sb
> origin.root validates that we are still using the same lower dirs
> and that files from old lower were not moved around to find themselves
> inside a different lower dir
> 3. hardlinks between layers (!!!) will still get to the right layer
>
> I personally think that reason #1 is the important one, but I think we
> disagree on the technical details of exportfs_decode_fh() and we
> need to sort this out.
>
> Here is my untested implementation of find layer by uuid/rootfh
> with the relevant comments. Maybe it helps you point out what
> I am missing or what you are missing:
>
> /* Find lower layer index by layer root file handle and uuid */
> static int ovl_find_layer_by_fh(struct dentry *dentry, struct
> ovl_lookup_data *d)
> {
> struct ovl_entry *roe = dentry->d_sb->s_root->d_fsdata;
> struct super_block *lower_sb = ovl_same_lower_sb(dentry->d_sb);
> struct dentry *this;
> int i;
>
> /*
> * For now, we only support lookup by fh for all lower layers on the
> * same sb. Not all filesystems set sb->s_uuid. For those who don't
> * this code will compare zeros, which at least ensures us that the
> * file handles are not crossing from filesystem with sb->s_uuid to
> * a filesystem without sb->s_uuid and vice versa.
> */
> if (!lower_sb || memcmp(lower_sb->s_uuid, &d->uuid, sizeof(d->uuid)))
> return -1;
>
> /*
> * Layer root dentries are pinned, there are no aliases for dirs, and
> * all lower layers are on the same sb. If rootfh is correct,
> * exportfs_decode_fh() will find it in dcache and return the only
> * instance, regardless of the mnt argument and we can compare the
> * returned pointer with the pointers in lowerstack.
> */
> this = ovl_decode_fh(roe->lowerstack[0].mnt, d->rootfh, ovl_is_dir);
> if (IS_ERR(this))
> return -1;
>
> for (i = 0; i < roe->numlower; i++) {
> if (this == roe->lowerstack[i].dentry)
> break;
> }
>
> dput(this);
> return i < roe->numlower ? i : -1;
> }
>
> Amir.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
2017-04-27 6:27 ` Amir Goldstein
@ 2017-04-27 7:48 ` Miklos Szeredi
2017-04-27 9:22 ` Amir Goldstein
2017-04-27 9:26 ` Miklos Szeredi
0 siblings, 2 replies; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-27 7:48 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Thu, Apr 27, 2017 at 8:27 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> Let me try to explain the problem with a worse case, but not
> improbable example:
>
> Suppose I have an overlay with deep file at /a/b/c/.../z
> Suppose the layers are at /old/{lower,upper} I copy them
> over to /new/{lower,upper} and mount the overlay at new path.
>
> Suppose that dcache is fully populated under /new and fully
> evicted under /old.
>
> When trying to decode the file handle for z, exportfs_decode_fh()
> will call the file system to actually read all directories a..z from disk
> in order to reconnect the dentry of old z all the way up to /old
> and it will do that *before* calling the acceptable() callback.
>
> Alternatively, if we first try to decode the file handle for /old/lower,
> decoding will be very fast (most likely already in cache) and we will
> not have to continue to decoding z and reading all directories a..z
> from disk.
To answer my own question in the prev mail: we need to decode the fh
and not just blindly use the inum to prevent issues with
copied/mutilated/etc lower layers.
And yes, in the copied case decoding origin.root first would be a good
optimization that couldn't be done without it.
> Wait, what do you mean by "we don't special case the layer root?"
> Do you mean that we could mount an overlay at a subdir path?
> i.e. in the example below, we could mount an overlay with
> upperdir=/new/upper/a/b/c,lowerdir=/new/lower/a/b/c?
>
> If this is what you mean then it is not true that we don't special case
> layer root. We do it with path redirect relative to layer root.
> If anything, we should be storing origin.root along with overlay.redirect
> in order to verify that we are not redirecting into the wrong relative
> path.
Yeah, you're right, we are special casing layer root.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
2017-04-27 7:48 ` Miklos Szeredi
@ 2017-04-27 9:22 ` Amir Goldstein
2017-04-27 9:26 ` Miklos Szeredi
1 sibling, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-27 9:22 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Thu, Apr 27, 2017 at 10:48 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Apr 27, 2017 at 8:27 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>
>> Let me try to explain the problem with a worse case, but not
>> improbable example:
>>
>> Suppose I have an overlay with deep file at /a/b/c/.../z
>> Suppose the layers are at /old/{lower,upper} I copy them
>> over to /new/{lower,upper} and mount the overlay at new path.
>>
>> Suppose that dcache is fully populated under /new and fully
>> evicted under /old.
>>
>> When trying to decode the file handle for z, exportfs_decode_fh()
>> will call the file system to actually read all directories a..z from disk
>> in order to reconnect the dentry of old z all the way up to /old
>> and it will do that *before* calling the acceptable() callback.
>>
>> Alternatively, if we first try to decode the file handle for /old/lower,
>> decoding will be very fast (most likely already in cache) and we will
>> not have to continue to decoding z and reading all directories a..z
>> from disk.
>
> To answer my own question in the prev mail: we need to decode the fh
> and not just blindly use the inum to prevent issues with
> copied/mutilited/etc lower layers.
>
I was going to refer you to this example when reading your question
in prev email. That's what we get for no read/write barriers in emails ;-)
> And yes, in the copied case decoding origin.root first would be a good
> optimization that couldn't be done without it.
>
Good, so we seem to have an agreement w.r.t. the lookup fh patch.
I've already applied a change to disable redirect_fh if lower s_uuid is
zeros and I verified that it works as expected by running the hard-link
constant inode test that relies on redirect_fh over xfs mounted with
-o nouuid.
I will be posting the enhanced xfstest for constant inodes later today.
Let me know when you are done reviewing the series, so I can rework it
with the binary blob change you requested.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
2017-04-27 7:48 ` Miklos Szeredi
2017-04-27 9:22 ` Amir Goldstein
@ 2017-04-27 9:26 ` Miklos Szeredi
[not found] ` <CAOQ4uxiweaqzR3eT-StgtDFAHBuYhGRvAJE6v=XpH33MevpmoA@mail.gmail.com>
1 sibling, 1 reply; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-27 9:26 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Thu, Apr 27, 2017 at 9:48 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Apr 27, 2017 at 8:27 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>
>> Let me try to explain the problem with a worse case, but not
>> improbable example:
>>
>> Suppose I have an overlay with deep file at /a/b/c/.../z
>> Suppose the layers are at /old/{lower,upper} I copy them
>> over to /new/{lower,upper} and mount the overlay at new path.
>>
>> Suppose that dcache is fully populated under /new and fully
>> evicted under /old.
>>
>> When trying to decode the file handle for z, exportfs_decode_fh()
>> will call the file system to actually read all directories a..z from disk
>> in order to reconnect the dentry of old z all the way up to /old
>> and it will do that *before* calling the acceptable() callback.
>>
>> Alternatively, if we first try to decode the file handle for /old/lower,
>> decoding will be very fast (most likely already in cache) and we will
>> not have to continue to decoding z and reading all directories a..z
>> from disk.
>
> To answer my own question in the prev mail: we need to decode the fh
> and not just blindly use the inum to prevent issues with
> copied/mutilited/etc lower layers.
Hmm, this is absurd. Why are we going to all this trouble to find the
origin inode through decoding the file handle when this thing was meant
to be an *optimization*? Without redirect, we can look up origin just
like we do for merge dirs. Way faster than decoding a connected
dentry, which is going to result in a readdir of the parent directory
and whatnot. The only thing we need is a bool "was this copied" flag.
For moved files, decoding the fh might be an optimization over walking
the redirect, but that depends on various factors, and it might also
be a lot slower... But it's needed for the snapshot case, right?
Am I missing something?
Thanks,
Miklos
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v2 05/11] ovl: lookup redirect by file handle
2017-04-26 14:51 ` Amir Goldstein
2017-04-27 6:27 ` Amir Goldstein
@ 2017-04-27 7:40 ` Miklos Szeredi
1 sibling, 0 replies; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-27 7:40 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Wed, Apr 26, 2017 at 4:51 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Apr 26, 2017 at 3:15 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> I don't have good feelings about storing the root fh just because we
>> don't special case the layer root anywhere yet, and I wouldn't want to
>> do that unless there's a good reason.
>>
>
> There are a few reasons for origin.root, not sure if they are good:
> 1. lookup is O(numlower+depth) instead of O(numlower*depth)
We can optimize to O(numlower+depth) even without origin.root.
> 2. origin.uuid validates that we are still on the same sb
> origin.root validates that we are still using the same lower dirs
> and that files from old lower were not moved around to find themselves
> inside a different lower dir
Parent is encoded in the fh, so that makes it resistant to moving. See
the exportfs_get_name() trickery to get a non-dir connected. It's
needed whether we have origin.root or not. And yes, it's pretty
heavyweight. Wondering if it's worth the trouble, since we are not
actually going to use the lower inode for anything else than getting
the inode number. And then we could just store the inode number
instead of the fh, and be rid of this mess.
If file is moved to another layer by moving an ancestor directory then
we won't detect that. Question is: do we care? It's definitely in
the "you messed with lower dirs, you keep the pieces" territory.
> 3. hardlinks between layers (!!!) will still get to the right layer
Even without origin.root it should get the right layer, since we are
encoding the parent in the fh.
> I personally think that reason #1 is the important one, but I think we
> disagree on the technical details of exportfs_decode_fh() and we
> need to sort this out.
>
> Here is my untested implementation of find layer by uuid/rootfh
> with the relevant comments. Maybe it helps you point out what
> I am missing or what you are missing:
Yeah, it simplifies the implementation. But implementation is
secondary to interface...
Thanks,
Miklos
^ permalink raw reply [flat|nested] 69+ messages in thread
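The complexity claim in reason 1 of the message above can be made concrete with a rough cost model. This is only a sketch under stated assumptions, not overlayfs code: the function names are hypothetical, and the model assumes that reconnecting a dentry costs one directory lookup per path component, that doing so in every lower layer costs numlower * depth, and that identifying the right layer first (one cheap check per layer) and then walking once costs numlower + depth.

```c
#include <assert.h>

/* Hypothetical cost model, not overlayfs code. */

/* Walk all 'depth' path components in each of the 'numlower' layers. */
unsigned long lookups_walk_every_layer(unsigned int numlower, unsigned int depth)
{
	assert(numlower > 0);	/* an overlay has at least one lower layer */
	return (unsigned long)numlower * depth;
}

/* One check per layer to pick the layer, then a single walk. */
unsigned long lookups_pick_layer_first(unsigned int numlower, unsigned int depth)
{
	assert(numlower > 0);
	return (unsigned long)numlower + depth;
}
```

With 4 lower layers and the 26-component /a/b/c/.../z example from earlier in the thread, this model gives 104 lookups versus 30.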
* [PATCH v2 06/11] ovl: lookup non-dir inode copy up origin
2017-04-24 9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
` (4 preceding siblings ...)
2017-04-24 9:14 ` [PATCH v2 05/11] ovl: lookup redirect by file handle Amir Goldstein
@ 2017-04-24 9:14 ` Amir Goldstein
2017-04-24 9:14 ` [PATCH v2 07/11] ovl: set the COPYUP type flag for non-dirs Amir Goldstein
` (8 subsequent siblings)
14 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24 9:14 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
When non directory upper has overlay.fh xattr, lookup in lower layers
by file handle or by path to find the copy up origin inode.
Until this change a non-dir dentry could have had oe->numlower == 1
with oe->lowerstack[0] pointing at the copy up origin path, right
after copy up, but not when a non-dir dentry was created by ovl_lookup().
After this change, a non-dir dentry could be pointing at the copy up
origin after ovl_lookup(), as long as the copy up was done by overlayfs
that had redirect_fh support.
Non-dir entries that were copied up by overlayfs without redirect_fh
support will look the same as pure upper non-dir entries.
This is going to be used for persistent inode numbers across copy up.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/overlayfs/namei.c | 32 ++++++++++++++++++--------------
1 file changed, 18 insertions(+), 14 deletions(-)
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 0d1cc8f..318092a 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -225,15 +225,16 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
goto put_and_out;
}
if (!d_can_lookup(this)) {
- d->stop = true;
- if (d->is_dir)
+ if (d->is_dir) {
+ d->stop = true;
goto put_and_out;
- goto out;
- }
- d->is_dir = true;
- if (!d->last && ovl_is_opaquedir(this)) {
- d->stop = d->opaque = true;
- goto out;
+ }
+ } else {
+ d->is_dir = true;
+ if (!d->last && ovl_is_opaquedir(this)) {
+ d->stop = d->opaque = true;
+ goto out;
+ }
}
if (d->last)
goto out;
@@ -247,6 +248,9 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
if (err)
goto out_err;
}
+ /* No redirect for non-dir means pure upper */
+ if (!d->is_dir)
+ d->stop = !d->fh && !d->redirect;
out:
*ret = this;
return 0;
@@ -385,11 +389,11 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
}
/*
- * For now we only support lower by fh in single layer, because
- * fallback from lookup by fh to lookup by path in mid layers for
- * merge directory is not yet implemented.
+ * For now we only support lookup by fh in single layer for directory,
+ * because fallback from lookup by fh to lookup by path in mid layers
+ * for merge directory is not yet implemented.
*/
- if (!ofs->redirect_fh || ofs->numlower > 1) {
+ if (!ofs->redirect_fh || (d.is_dir && ofs->numlower > 1)) {
kfree(d.fh);
d.fh = NULL;
}
@@ -402,7 +406,6 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
goto out_put_upper;
}
- /* Try to lookup lower layers by file handle */
d.by_path = false;
for (i = 0; !d.stop && d.fh && i < roe->numlower; i++) {
struct path lowerpath = poe->lowerstack[i];
@@ -446,7 +449,8 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
stack[ctr].mnt = lowerpath.mnt;
ctr++;
- if (d.stop)
+ /* Do not follow non-dir copy up origin more than once */
+ if (d.stop || !d.is_dir)
break;
if (d.redirect && d.redirect[0] == '/' && poe != roe) {
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH v2 07/11] ovl: set the COPYUP type flag for non-dirs
2017-04-24 9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
` (5 preceding siblings ...)
2017-04-24 9:14 ` [PATCH v2 06/11] ovl: lookup non-dir inode copy up origin Amir Goldstein
@ 2017-04-24 9:14 ` Amir Goldstein
2017-04-26 14:40 ` Miklos Szeredi
2017-04-24 9:14 ` [PATCH v2 08/11] ovl: redirect non-dir by path on rename Amir Goldstein
` (7 subsequent siblings)
14 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24 9:14 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
For directory entries, non zero oe->numlower implies OVL_TYPE_MERGE.
Define a new type flag OVL_TYPE_COPYUP to indicate that an entry is
a target of a copy up.
For directory entries COPYUP = MERGE && UPPER. For non-dir entries
non zero oe->numlower implies COPYUP, but COPYUP does not imply
non zero oe->numlower. COPYUP can also be set on lookup when detecting
an overlay.fh xattr on a non-dir, even if that fh cannot be followed.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/overlayfs/namei.c | 3 +++
fs/overlayfs/overlayfs.h | 2 ++
fs/overlayfs/util.c | 12 ++++++++----
3 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 318092a..73a8879 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -386,6 +386,9 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
}
if (d.opaque)
type |= __OVL_PATH_OPAQUE;
+ /* overlay.fh xattr implies this is a copy up */
+ if (d.fh)
+ type |= __OVL_PATH_COPYUP;
}
/*
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 08002ce..d0bb538 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -13,11 +13,13 @@ enum ovl_path_type {
__OVL_PATH_UPPER = (1 << 0),
__OVL_PATH_MERGE = (1 << 1),
__OVL_PATH_OPAQUE = (1 << 2),
+ __OVL_PATH_COPYUP = (1 << 3),
};
#define OVL_TYPE_UPPER(type) ((type) & __OVL_PATH_UPPER)
#define OVL_TYPE_MERGE(type) ((type) & __OVL_PATH_MERGE)
#define OVL_TYPE_OPAQUE(type) ((type) & __OVL_PATH_OPAQUE)
+#define OVL_TYPE_COPYUP(type) ((type) & __OVL_PATH_COPYUP)
#define OVL_XATTR_PREFIX XATTR_TRUSTED_PREFIX "overlay."
#define OVL_XATTR_OPAQUE OVL_XATTR_PREFIX "opaque"
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index dba9753..89789bc 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -101,11 +101,15 @@ enum ovl_path_type ovl_update_type(struct dentry *dentry, bool is_dir)
if (oe->__upperdentry) {
type |= __OVL_PATH_UPPER;
/*
- * Non-dir dentry can hold lower dentry from before
- * copy-up.
+ * oe->numlower implies a copy up, but copy up does not imply
+ * oe->numlower. It can also be set on lookup when detecting
+ * an overlay.fh xattr on a non-dir that cannot be followed.
*/
- if (oe->numlower && is_dir)
- type |= __OVL_PATH_MERGE;
+ if (oe->numlower) {
+ type |= __OVL_PATH_COPYUP;
+ if (is_dir)
+ type |= __OVL_PATH_MERGE;
+ }
} else {
if (oe->numlower > 1)
type |= __OVL_PATH_MERGE;
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
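The flag rules described in the commit message above can be modelled in userspace roughly as follows. This is a sketch, not the kernel function: model_path_type() and its parameters are hypothetical stand-ins for oe->__upperdentry, oe->numlower and the d.fh check in ovl_lookup(), the two places where the patch sets COPYUP are folded into one function for illustration, and the OPAQUE flag is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in the patch. */
enum ovl_path_type {
	__OVL_PATH_UPPER  = (1 << 0),
	__OVL_PATH_MERGE  = (1 << 1),
	__OVL_PATH_OPAQUE = (1 << 2),
	__OVL_PATH_COPYUP = (1 << 3),
};

/* Hypothetical model: combines the util.c and namei.c hunks. */
int model_path_type(bool has_upper, unsigned int numlower, bool is_dir,
		    bool has_fh)
{
	int type = 0;

	/* Assumption: overlay.fh is read from the upper, so it needs one. */
	assert(has_upper || !has_fh);

	if (has_upper) {
		type |= __OVL_PATH_UPPER;
		if (numlower) {
			/* A known lower under an upper means copy up... */
			type |= __OVL_PATH_COPYUP;
			/* ...and, for directories only, a merge. */
			if (is_dir)
				type |= __OVL_PATH_MERGE;
		}
		/* overlay.fh implies copy up even if no lower was found. */
		if (has_fh)
			type |= __OVL_PATH_COPYUP;
	} else if (numlower > 1) {
		type |= __OVL_PATH_MERGE;
	}
	return type;
}
```

In this model a non-dir with an upper, no lower and an overlay.fh xattr gets UPPER | COPYUP, which is the "COPYUP does not imply non zero oe->numlower" case.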
* Re: [PATCH v2 07/11] ovl: set the COPYUP type flag for non-dirs
2017-04-24 9:14 ` [PATCH v2 07/11] ovl: set the COPYUP type flag for non-dirs Amir Goldstein
@ 2017-04-26 14:40 ` Miklos Szeredi
2017-04-26 14:53 ` Miklos Szeredi
2017-04-26 14:57 ` Amir Goldstein
0 siblings, 2 replies; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-26 14:40 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Mon, Apr 24, 2017 at 11:14 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> For directory entries, non zero oe->numlower implies OVL_TYPE_MERGE.
> Define a new type flag OVL_TYPE_COPYUP to indicate that an entry is
> a target of a copy up.
>
> For directory entries COPYUP = MERGE && UPPER. For non-dir entries
> non zero oe->numlower implies COPYUP, but COPYUP does not imply
> non zero oe->numlower. COPYUP can also be set on lookup when detecting
> an overlay.fh xattr on a non-dir, even if that fh cannot be followed.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
> fs/overlayfs/namei.c | 3 +++
> fs/overlayfs/overlayfs.h | 2 ++
> fs/overlayfs/util.c | 12 ++++++++----
> 3 files changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
> index 318092a..73a8879 100644
> --- a/fs/overlayfs/namei.c
> +++ b/fs/overlayfs/namei.c
> @@ -386,6 +386,9 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
> }
> if (d.opaque)
> type |= __OVL_PATH_OPAQUE;
> + /* overlay.fh xattr implies this is a copy up */
> + if (d.fh)
> + type |= __OVL_PATH_COPYUP;
> }
>
> /*
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index 08002ce..d0bb538 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -13,11 +13,13 @@ enum ovl_path_type {
> __OVL_PATH_UPPER = (1 << 0),
> __OVL_PATH_MERGE = (1 << 1),
> __OVL_PATH_OPAQUE = (1 << 2),
> + __OVL_PATH_COPYUP = (1 << 3),
> };
>
> #define OVL_TYPE_UPPER(type) ((type) & __OVL_PATH_UPPER)
> #define OVL_TYPE_MERGE(type) ((type) & __OVL_PATH_MERGE)
> #define OVL_TYPE_OPAQUE(type) ((type) & __OVL_PATH_OPAQUE)
> +#define OVL_TYPE_COPYUP(type) ((type) & __OVL_PATH_COPYUP)
>
> #define OVL_XATTR_PREFIX XATTR_TRUSTED_PREFIX "overlay."
> #define OVL_XATTR_OPAQUE OVL_XATTR_PREFIX "opaque"
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index dba9753..89789bc 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -101,11 +101,15 @@ enum ovl_path_type ovl_update_type(struct dentry *dentry, bool is_dir)
> if (oe->__upperdentry) {
> type |= __OVL_PATH_UPPER;
> /*
> - * Non-dir dentry can hold lower dentry from before
> - * copy-up.
> + * oe->numlower implies a copy up, but copy up does not imply
> + * oe->numlower. It can also be set on lookup when detecting
> + * an overlay.fh xattr on a non-dir that cannot be followed.
The code looks fine, but I don't understand the comment. Why would we
set COPYUP flag when the fh cannot be followed?
The reason I think the COPYUP vs. MERGE distinction is needed is the
ovl_check_empty_and_clear() thing. It starts with a merged directory
with some whiteouts in it and exchanges it with an empty and opaque
directory. Normally the empty directory will be deleted immediately,
but if something fails during the deletion, then it will remain there.
The overlay is left in a consistent state, but the association with
the original inode should still remain, so it will have COPYUP but not
MERGE.
Now the current code is actually broken, because we leave the old,
replaced directory in __upperdentry as well as the rest of the lower
stack. So should the deletion fail after the replacement things won't
work properly.
I think we can fix that by replacing __upperdentry. Luckily we are
under inode lock, so protected against concurrent readdir or creation
inside the directory. Then we have lifetime problems. Until now a
positive __upperdentry was assumed to have a lifetime equal to that of
the overlay dentry. We'd need an old_upperdentry to save it. I think
that's it, but maybe there are other issues.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v2 07/11] ovl: set the COPYUP type flag for non-dirs
2017-04-26 14:40 ` Miklos Szeredi
@ 2017-04-26 14:53 ` Miklos Szeredi
2017-04-26 15:02 ` Amir Goldstein
2017-04-26 14:57 ` Amir Goldstein
1 sibling, 1 reply; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-26 14:53 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Wed, Apr 26, 2017 at 4:40 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> The reason I think the COPYUP vs. MERGE distinction is needed is the
> ovl_check_empty_and_clear() thing. It starts with a merged directory
> with some whiteouts in it and exchanges it with an empty and opaque
> directory. Normally the empty directory will be deleted immediately,
> but if something fails during the deletion, then it will remain there.
> The overlay is left in a consistent state, but the association with
> the original inode should still remain, so it will have COPYUP but not
> MERGE.
One more thought: we could introduce a separate "overlay.merge"
attribute that is the exact opposite of "overlay.opaque".
"overlay.merge" would imply "overlay.fh" but "overlay.fh" would not
imply "overlay.merge".
It would allow us to optionally get rid of "overlay.opaque" when back
compatibility is not needed.
It would also allow a new feature: on metadata only updates of regular
files we wouldn't need to copy up the data.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v2 07/11] ovl: set the COPYUP type flag for non-dirs
2017-04-26 14:53 ` Miklos Szeredi
@ 2017-04-26 15:02 ` Amir Goldstein
2017-04-26 18:51 ` Amir Goldstein
2017-04-27 9:32 ` Miklos Szeredi
0 siblings, 2 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-26 15:02 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Wed, Apr 26, 2017 at 5:53 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Wed, Apr 26, 2017 at 4:40 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>
>> The reason I think the COPYUP vs. MERGE distinction is needed is the
>> ovl_check_empty_and_clear() thing. It starts with a merged directory
>> with some whiteouts in it and exchanges it with an empty and opaque
>> directory. Normally the empty directory will be deleted immediately,
>> but if something fails during the deletion, then it will remain there.
>> The overlay is left in a consistent state, but the association with
>> the original inode should still remain, so it will have COPYUP but not
>> MERGE.
>
> One more thought: we could introduce a separate "overlay.merge"
> attribute that is the exact opposite of "overlay.opaque".
> "overlay.merge" would imply "overlay.fh" but "overlay.fh" would not
> imply "overlay.merge".
>
> It would allow us to optionally get rid of "overlay.opaque" when back
> compatibility is not needed.
>
> It would also allow a new feature: on metadata only updates of regular
> files we wouldn't need to copy up the data.
>
So you intend to set overlay.merge for non-dir?
How is it different from overlay.fh then?
With its new name, overlay.origin.fh indicates that there is a copy
up origin below us. Either directly below us, or at overlay.redirect.
We can also try to follow to origin by fh, but that is only an optimization -
an important optimization IMO, because file renames are more common
than dir renames and looking up the stable inode by fh in a deep directory
with many layers will be much more efficient.
Are we understanding each other w.r.t. overlay.merge vs overlay.fh?
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v2 07/11] ovl: set the COPYUP type flag for non-dirs
2017-04-26 15:02 ` Amir Goldstein
@ 2017-04-26 18:51 ` Amir Goldstein
2017-04-27 9:32 ` Miklos Szeredi
1 sibling, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-26 18:51 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Wed, Apr 26, 2017 at 6:02 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Apr 26, 2017 at 5:53 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Wed, Apr 26, 2017 at 4:40 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
>>> The reason I think the COPYUP vs. MERGE distinction is needed is the
>>> ovl_check_empty_and_clear() thing. It starts with a merged directory
>>> with some whiteouts in it and exchanges it with an empty and opaque
>>> directory. Normally the empty directory will be deleted immediately,
>>> but if something fails during the deletion, then it will remain there.
>>> The overlay is left in a consistent state, but the association with
>>> the original inode should still remain, so it will have COPYUP but not
>>> MERGE.
>>
>> One more thought: we could introduce a separate "overlay.merge"
>> attribute that is the exact opposite of "overlay.opaque".
>> "overlay.merge" would imply "overlay.fh" but "overlay.fh" would not
>> imply "overlay.merge".
>>
>> It would allow us to optionally get rid of "overlay.opaque" when back
>> compatibility is not needed.
>>
>> It would also allow a new feature: on metadata only updates of regular
>> files we wouldn't need to copy up the data.
>>
>
> So you intend to set overlay.merge for non-dir?
> How is it different from overlay.fh then?
> With its new name, overlay.origin.fh indicates that there is a copy
> up origin below us. Either directly below us, or at overlay.redirect.
> We can also try to follow to origin by fh, but that is only an optimization -
I misspoke - redirect_fh to origin is not only an optimization.
Although renames do not depend on redirect_fh, hardlinks do.
As I learned from improved unionmount-testsuite:
./run --ov=1 hard-link
...
./run --link /mnt/a/no_foo110 /mnt/a/foo110
mount -t overlay overlay /mnt -olowerdir=/upper/0:/lower,upperdir=/upper/1,workdir=/upper/work
sh (8035): drop_caches: 3
/mnt/a/foo110: inode number wrong (got 68908, want 68898)
This error happens in non-samefs case when there is more than 1 lower layer
and redirect_fh is disabled.
It happens after link and mount cycle because the linked upper file does not
know how to look up the lower origin.
The error does not happen with samefs and with single lower fs, i.e.:
./run --ov=0 hard-link
./run --ov=1 --samefs hard-link
Because in those cases, all the upper hardlinks follow to origin by fh
and report the same inode number.
I think this calls for setting overlay.redirect also on the target of
ovl_link()??
Amir.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v2 07/11] ovl: set the COPYUP type flag for non-dirs
2017-04-26 15:02 ` Amir Goldstein
2017-04-26 18:51 ` Amir Goldstein
@ 2017-04-27 9:32 ` Miklos Szeredi
1 sibling, 0 replies; 69+ messages in thread
From: Miklos Szeredi @ 2017-04-27 9:32 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Wed, Apr 26, 2017 at 5:02 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Apr 26, 2017 at 5:53 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Wed, Apr 26, 2017 at 4:40 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
>>> The reason I think the COPYUP vs. MERGE distinction is needed is the
>>> ovl_check_empty_and_clear() thing. It starts with a merged directory
>>> with some whiteouts in it and exchanges it with an empty and opaque
>>> directory. Normally the empty directory will be deleted immediately,
>>> but if something fails during the deletion, then it will remain there.
>>> The overlay is left in a consistent state, but the association with
>>> the original inode should still remain, so it will have COPYUP but not
>>> MERGE.
>>
>> One more thought: we could introduce a separate "overlay.merge"
>> attribute that is the exact opposite of "overlay.opaque".
>> "overlay.merge" would imply "overlay.fh" but "overlay.fh" would not
>> imply "overlay.merge".
>>
>> It would allow us to optionally get rid of "overlay.opaque" when back
>> compatibility is not needed.
>>
>> It would also allow a new feature: on metadata only updates of regular
>> files we wouldn't need to copy up the data.
>>
>
> So you intend to set overlay.merge for non-dir?
Nope, not by default.
> How is it different from overlay.fh then?
It would make sense for regular files for the non-samefs or non-clone
fs cases if only metadata (attr, xattr) are modified but data is not.
We'd create an empty file with the copied up metadata and
"overlay.merge" set indicating that the data I/O should still be
redirected to the origin, while metadata is kept in the copied up
file. This can be upgraded to a fully copied-up file later.
Not something for this series, obviously.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v2 07/11] ovl: set the COPYUP type flag for non-dirs
2017-04-26 14:40 ` Miklos Szeredi
2017-04-26 14:53 ` Miklos Szeredi
@ 2017-04-26 14:57 ` Amir Goldstein
1 sibling, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-26 14:57 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Wed, Apr 26, 2017 at 5:40 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Mon, Apr 24, 2017 at 11:14 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> For directory entries, non zero oe->numlower implies OVL_TYPE_MERGE.
>> Define a new type flag OVL_TYPE_COPYUP to indicate that an entry is
>> a target of a copy up.
>>
>> For directory entries COPYUP = MERGE && UPPER. For non-dir entries
>> non zero oe->numlower implies COPYUP, but COPYUP does not imply
>> non zero oe->numlower. COPYUP can also be set on lookup when detecting
>> an overlay.fh xattr on a non-dir, even if that fh cannot be followed.
>>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>> fs/overlayfs/namei.c | 3 +++
>> fs/overlayfs/overlayfs.h | 2 ++
>> fs/overlayfs/util.c | 12 ++++++++----
>> 3 files changed, 13 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
>> index 318092a..73a8879 100644
>> --- a/fs/overlayfs/namei.c
>> +++ b/fs/overlayfs/namei.c
>> @@ -386,6 +386,9 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>> }
>> if (d.opaque)
>> type |= __OVL_PATH_OPAQUE;
>> + /* overlay.fh xattr implies this is a copy up */
>> + if (d.fh)
>> + type |= __OVL_PATH_COPYUP;
>> }
>>
>> /*
>> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
>> index 08002ce..d0bb538 100644
>> --- a/fs/overlayfs/overlayfs.h
>> +++ b/fs/overlayfs/overlayfs.h
>> @@ -13,11 +13,13 @@ enum ovl_path_type {
>> __OVL_PATH_UPPER = (1 << 0),
>> __OVL_PATH_MERGE = (1 << 1),
>> __OVL_PATH_OPAQUE = (1 << 2),
>> + __OVL_PATH_COPYUP = (1 << 3),
>> };
>>
>> #define OVL_TYPE_UPPER(type) ((type) & __OVL_PATH_UPPER)
>> #define OVL_TYPE_MERGE(type) ((type) & __OVL_PATH_MERGE)
>> #define OVL_TYPE_OPAQUE(type) ((type) & __OVL_PATH_OPAQUE)
>> +#define OVL_TYPE_COPYUP(type) ((type) & __OVL_PATH_COPYUP)
>>
>> #define OVL_XATTR_PREFIX XATTR_TRUSTED_PREFIX "overlay."
>> #define OVL_XATTR_OPAQUE OVL_XATTR_PREFIX "opaque"
>> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
>> index dba9753..89789bc 100644
>> --- a/fs/overlayfs/util.c
>> +++ b/fs/overlayfs/util.c
>> @@ -101,11 +101,15 @@ enum ovl_path_type ovl_update_type(struct dentry *dentry, bool is_dir)
>> if (oe->__upperdentry) {
>> type |= __OVL_PATH_UPPER;
>> /*
>> - * Non-dir dentry can hold lower dentry from before
>> - * copy-up.
>> + * oe->numlower implies a copy up, but copy up does not imply
>> + * oe->numlower. It can also be set on lookup when detecting
>> + * an overlay.fh xattr on a non-dir that cannot be followed.
>
> The code looks fine, but I don't understand the comment. Why would we
> set COPYUP flag when the fh cannot be followed?
>
See patch #8 ovl: redirect non-dir by path on rename
overlay.fh *is* the indication of a copy up and non-dir copy ups
should be redirected on rename.
With this, copying the layers will not break the constant inode property.
Upper files will continue to follow by path to origin and report the
new (post copy) stable inode number.
^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH v2 08/11] ovl: redirect non-dir by path on rename
2017-04-24 9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
` (6 preceding siblings ...)
2017-04-24 9:14 ` [PATCH v2 07/11] ovl: set the COPYUP type flag for non-dirs Amir Goldstein
@ 2017-04-24 9:14 ` Amir Goldstein
2017-04-24 9:14 ` [PATCH v2 09/11] ovl: constant st_ino/st_dev across copy up Amir Goldstein
` (6 subsequent siblings)
14 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24 9:14 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
When a non-dir COPYUP type entry is being renamed, set its
overlay.redirect xattr, just the same as when renaming a lower
or merge directory.
This will be used to find the copy up original of non-dir inodes
in case the lower layers do not support lookup by file handle.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/overlayfs/dir.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 6515796..edfe3df 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -795,6 +795,13 @@ static bool ovl_type_merge_or_lower(struct dentry *dentry)
return OVL_TYPE_MERGE(type) || !OVL_TYPE_UPPER(type);
}
+static bool ovl_type_copyup(struct dentry *dentry)
+{
+ enum ovl_path_type type = ovl_path_type(dentry);
+
+ return OVL_TYPE_COPYUP(type);
+}
+
static bool ovl_can_move(struct dentry *dentry)
{
return ovl_redirect_dir(dentry->d_sb) ||
@@ -1022,6 +1029,8 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
err = ovl_set_opaque(old, olddentry);
if (err)
goto out_dput;
+ } else if (ovl_type_copyup(old)) {
+ err = ovl_set_redirect(old, samedir);
}
if (!overwrite && new_is_dir) {
if (ovl_type_merge_or_lower(new))
@@ -1030,6 +1039,8 @@ static int ovl_rename(struct inode *olddir, struct dentry *old,
err = ovl_set_opaque(new, newdentry);
if (err)
goto out_dput;
+ } else if (!overwrite && ovl_type_copyup(new)) {
+ err = ovl_set_redirect(new, samedir);
}
err = ovl_do_rename(old_upperdir->d_inode, olddentry,
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH v2 09/11] ovl: constant st_ino/st_dev across copy up
2017-04-24 9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
` (7 preceding siblings ...)
2017-04-24 9:14 ` [PATCH v2 08/11] ovl: redirect non-dir by path on rename Amir Goldstein
@ 2017-04-24 9:14 ` Amir Goldstein
2017-04-24 9:14 ` [PATCH v2 10/11] ovl: persistent and constant inode number for directories Amir Goldstein
` (5 subsequent siblings)
14 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24 9:14 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
When getting attributes for overlay inode of path type COPYUP,
get the inode and dev numbers from the copy up origin inode.
This results in constant and persistent st_ino/st_dev representation
of files in overlay mount before and after copy up as well as after
mount cycle.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/overlayfs/inode.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 17b8418..3615a52 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -60,15 +60,25 @@ int ovl_setattr(struct dentry *dentry, struct iattr *attr)
static int ovl_getattr(const struct path *path, struct kstat *stat,
u32 request_mask, unsigned int flags)
{
- struct dentry *dentry = path->dentry;
+ struct dentry *lower, *dentry = path->dentry;
struct path realpath;
const struct cred *old_cred;
+ enum ovl_path_type type;
int err;
- ovl_path_real(dentry, &realpath);
+ type = ovl_path_real(dentry, &realpath);
old_cred = ovl_override_creds(dentry->d_sb);
err = vfs_getattr(&realpath, stat, request_mask, flags);
revert_creds(old_cred);
+ if (err)
+ return err;
+
+ lower = ovl_dentry_lower(dentry);
+ if (OVL_TYPE_COPYUP(type) && lower) {
+ stat->dev = lower->d_sb->s_dev;
+ stat->ino = lower->d_inode->i_ino;
+ }
+
return err;
}
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
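The effect of the getattr change above can be illustrated with a small userspace model. It is a sketch only: struct model_id and model_getattr() are hypothetical names, standing in for the (st_dev, st_ino) pair of an inode and for the override step added to ovl_getattr().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the (st_dev, st_ino) pair of a real inode. */
struct model_id {
	unsigned long dev;
	unsigned long ino;
};

/*
 * Model of the rule in the patch: start from the attributes of the real
 * inode (the upper one after copy up), then override dev/ino with those
 * of the copy up origin when the entry is of COPYUP type and a lower
 * dentry is known.
 */
struct model_id model_getattr(struct model_id real, bool copyup,
			      const struct model_id *lower)
{
	struct model_id st = real;

	if (copyup && lower) {
		st.dev = lower->dev;
		st.ino = lower->ino;
	}
	return st;
}
```

Before copy up the real inode is the lower one, so the pair reported before and after copy up is the same in this model. A COPYUP entry whose origin could not be found (lower == NULL) falls back to the upper inode's own numbers.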
* [PATCH v2 10/11] ovl: persistent and constant inode number for directories
2017-04-24 9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
` (8 preceding siblings ...)
2017-04-24 9:14 ` [PATCH v2 09/11] ovl: constant st_ino/st_dev across copy up Amir Goldstein
@ 2017-04-24 9:14 ` Amir Goldstein
2017-04-24 9:14 ` [PATCH v2 11/11] ovl: fix du --one-file-system on overlay mount Amir Goldstein
` (4 subsequent siblings)
14 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24 9:14 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
stat(2) on overlay directories reports the overlay temp inode
number, which is constant across copy up, but is not persistent.
When all layers are on the same fs, report the uppermost lower inode
(a.k.a. stable inode) number for directories.
This inode number is persistent, unique across the overlay mount and
constant across copy up.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/overlayfs/dir.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index edfe3df..6106649 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -154,8 +154,23 @@ static int ovl_dir_getattr(const struct path *path, struct kstat *stat,
if (err)
return err;
+ /*
+ * Always use the overlay bdev for directories, so 'find -xdev' will
+ * scan the entire overlay mount and won't cross the overlay mount
+ * boundaries.
+ */
stat->dev = dentry->d_sb->s_dev;
- stat->ino = dentry->d_inode->i_ino;
+ /*
+ * When all layers are not on the same fs, the pair real inode numbers
+ * and overlay bdev is not unique, so use the non persistent overlay
+ * inode number.
+ * When all layers are on the same fs, use the stable inode number,
+ * which is persistent, unique and constant across copy up.
+ */
+ if (!ovl_same_sb(dentry->d_sb))
+ stat->ino = dentry->d_inode->i_ino;
+ else if (OVL_TYPE_UPPER(type) && OVL_TYPE_MERGE(type))
+ stat->ino = ovl_dentry_lower(dentry)->d_inode->i_ino;
/*
* It's probably not worth it to count subdirs to get the
--
2.7.4
* [PATCH v2 11/11] ovl: fix du --one-file-system on overlay mount
2017-04-24 9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
` (9 preceding siblings ...)
2017-04-24 9:14 ` [PATCH v2 10/11] ovl: persistent and constant inode number for directories Amir Goldstein
@ 2017-04-24 9:14 ` Amir Goldstein
2017-04-24 18:40 ` [PATCH v2 12/12] ovl: persistent inode numbers for hardlinks Amir Goldstein
` (3 subsequent siblings)
14 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24 9:14 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
Overlay directory inodes report overlay bdev to stat(2).
Overlay non-dir inodes report real bdev and real ino to stat(2).
Due to the different bdev values for dir and non-dir inodes, when executing
the command du -x on an overlay mount, the result is wrong because non-dirs
are not accounted for in the overlay bdev usage.
The reasons for this bdev inconsistency are:
1. The overlay ino is not persistent, so real ino is used for non-dirs
2. The tuple of overlay bdev and real ino is not unique, so real bdev is
used for non-dirs
In case all overlay layers are on the same underlying fs, the tuple
from reason 2 above is unique, so use this tuple for non-dirs to get
the correct result from du -x.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/overlayfs/inode.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 3615a52..39c3bb0 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -78,6 +78,13 @@ static int ovl_getattr(const struct path *path, struct kstat *stat,
stat->dev = lower->d_sb->s_dev;
stat->ino = lower->d_inode->i_ino;
}
+ /*
+ * When all layers are on the same fs, the tuple of overlay bdev
+ * and real ino is unique, so it is preferred to expose
+ * overlay bdev for overlay inodes for things like du -x.
+ */
+ if (ovl_same_sb(dentry->d_sb))
+ stat->dev = dentry->d_sb->s_dev;
return err;
}
--
2.7.4
* [PATCH v2 12/12] ovl: persistent inode numbers for hardlinks
2017-04-24 9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
` (10 preceding siblings ...)
2017-04-24 9:14 ` [PATCH v2 11/11] ovl: fix du --one-file-system on overlay mount Amir Goldstein
@ 2017-04-24 18:40 ` Amir Goldstein
2017-04-24 18:51 ` [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
` (2 subsequent siblings)
14 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24 18:40 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
An upper type non directory dentry that is a copy up target
should have a reference to its lower copy up origin.
There are three ways for an upper type dentry to be instantiated:
1. A lower type dentry that is being copied up
2. An entry that is found in upper dir by ovl_lookup()
3. A negative dentry is hardlinked to an upper type dentry
In the first case, the lower reference is set before copy up.
In the second case, the lower reference is found by ovl_lookup().
In the last case of hardlinked upper dentry, it is not easy to
update the lower reference of the negative dentry. Instead,
drop the newly hardlinked negative dentry from dcache and let
the next access call ovl_lookup() to find its lower reference.
This makes sure that the inode number reported by stat(2) after
the hardlink is created is the same inode number that will be
reported by stat(2) after mount cycle, which is the inode number
of the lower copy up origin of the hardlink source.
NOTE that this does not fix breaking of lower hardlinks on copy
up, but it will result in stat(2) reporting the same inode number
for all the upper broken hardlinks.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/overlayfs/dir.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 03854abf7..6ef35e8 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -197,6 +197,9 @@ static void ovl_instantiate(struct dentry *dentry, struct inode *inode,
inc_nlink(inode);
}
d_instantiate(dentry, inode);
+ /* Force lookup of new upper hardlink to find its lower */
+ if (hardlink)
+ d_drop(dentry);
}
static bool ovl_type_merge(struct dentry *dentry)
--
2.7.4
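
The behavior this patch targets can be observed from userspace by hardlinking a file and comparing st_ino of the two names. The sketch below is an assumption-laden illustration, not part of the patch or of any test suite: both helper names are hypothetical, and touch_file() exists only so the sketch can be tried on a scratch path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create an empty file (or open an existing one). */
static int touch_file(const char *path)
{
	int fd;

	assert(path != NULL);
	fd = open(path, O_CREAT | O_WRONLY, 0600);
	if (fd < 0)
		return -1;
	return close(fd);
}

/*
 * Hypothetical helper: hardlink src to dst, then report whether both names
 * show the same st_ino. Returns 1 if equal, 0 if not, -1 on any error.
 */
static int link_and_compare_ino(const char *src, const char *dst)
{
	struct stat ss, ds;

	assert(src != NULL && dst != NULL);
	if (link(src, dst) != 0)
		return -1;
	if (lstat(src, &ss) != 0 || lstat(dst, &ds) != 0)
		return -1;
	return ss.st_ino == ds.st_ino;
}
```

To exercise the case from the commit message, src would be a lower file on an overlay mount that gets copied up by the link operation. The check is only half of the story, since the commit message also cares about the value staying the same after a mount cycle.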
* Re: [PATCH v2 00/11] overlayfs constant inode numbers
2017-04-24 9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
` (11 preceding siblings ...)
2017-04-24 18:40 ` [PATCH v2 12/12] ovl: persistent inode numbers for hardlinks Amir Goldstein
@ 2017-04-24 18:51 ` Amir Goldstein
2017-04-25 11:52 ` Vivek Goyal
2017-04-25 12:16 ` Vivek Goyal
14 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-24 18:51 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Vivek Goyal, Al Viro, linux-unionfs, linux-fsdevel
On Mon, Apr 24, 2017 at 12:14 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> Miklos,
>
> Following your comments on the 'stable inodes' series from last week,
> this series fixes constant inode numbers for stat(2) with any layer
> configuration.
>
> For the case of all *lower* layers on same fs that supports NFS export,
> redirect by file handle will be used to optimize the lookup of the copy
> up origin of non-dir inode.
>
> For the case of *all* layers on same fs, overlayfs also gains:
> - Persistent inode numbers for directories
> - Correct results for du -x
>
> Consistency of stat(2) st_ino with readdir(3) d_ino is NOT addressed by
> this series. It will be addressed for the 'samefs' configuration by the
> follow up 'stable inode' work, which is also going to address preserving
> hardlinks on copy up.
>
> This series is available for testing on [1].
> unionmount-testsuite needs a small fix patch for layers_check() [2].
> Tested the following layer configurations:
> ./run --ov{,=0,=10} {,--samefs}
Miklos,
I instrumented unionmount-testsuite to mount cycle and compare ino
with pre copy up ino after rename and link operations [2].
It found one bug w.r.t. inode number of hardlinks, i.e.:
./run --ov hard-link
/mnt/a/no_foo100: inode number wrong (got 18833, want 17406)
I posted a 12th patch ("ovl: persistent inode numbers for hardlinks")
that fixes this issue, but I am not sure about 2 things:
1. The fix may not be so elegant (d_drop after d_instantiate)
2. Should broken hardlinks report the same inode number?
Without patch 12, broken hardlinks do report the same (lower) ino,
but only after ovl_lookup(). After ovl_link() the target reports the upper ino.
Patch 12 fixes the ovl_link() case, but I'm not sure if that is the
desired outcome.
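
The check described above (record an inode number, then compare it later, for example after a mount cycle) can be sketched in C as follows. This is not the unionmount-testsuite code, which is not shown in this thread; check_ino is a hypothetical name, and only the message format is borrowed from the failure output quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: compare the current st_ino of path with a previously
 * recorded value. Returns 0 on match, 1 on mismatch (msg is filled in),
 * -1 if stat(2) fails.
 */
static int check_ino(const char *path, unsigned long long want,
		     char *msg, size_t len)
{
	struct stat st;

	assert(path != NULL && msg != NULL && len > 0);
	msg[0] = '\0';
	if (stat(path, &st) != 0)
		return -1;
	if ((unsigned long long)st.st_ino == want)
		return 0;
	snprintf(msg, len, "%s: inode number wrong (got %llu, want %llu)",
		 path, (unsigned long long)st.st_ino, want);
	return 1;
}
```

The caller would record st_ino before the copy up, perform the rename or link, cycle the mount, and then call check_ino() with the recorded value.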
>
> Tested constant inode numbers with xfstest overlay/017 and added a check
> for persistent directory inode numbers across mount cycle [3].
>
> Most of the patches in this series you already reviewed at one time or another
> and have your comments already addressed. Some other patches are trivial.
> Probably the only patches you need to take a closer look at are the 2 lookup
> patches (5-6).
>
> The implementation of lookup of a merged dir with a combination of redirect
> by fh from upper and redirect by name in mid layer is more complicated.
> Because this case is not strictly needed for this series, I simplified
> things a bit and restricted lookup by fh to those cases:
> 1. Non directory (lookup of copy up origin)
> 2. Merge directory when ofs->numlower == 1
>
> This restriction may be relaxed later on if we want to handle lookup by fh
> with fallback to lookup by path for merge dirs.
>
> What do you say? ... Too late for v4.12?
>
> Amir.
>
> [1] https://github.com/amir73il/linux/commits/ovl-constino
> [2] https://github.com/amir73il/unionmount-testsuite/commits/overlayfs-devel
> [3] https://github.com/amir73il/xfstests/commits/overlayfs-devel
>
> Amir Goldstein (11):
> ovl: store path type in dentry
> ovl: cram opaque boolean into type flags
> ovl: check if all layers are on the same fs
> ovl: store file handle of lower inode on copy up
> ovl: lookup redirect by file handle
> ovl: lookup non-dir inode copy up origin
> ovl: set the COPYUP type flag for non-dirs
> ovl: redirect non-dir by path on rename
> ovl: constant st_ino/st_dev across copy up
> ovl: persistent inode number for directories
> ovl: fix du --one-file-system on overlay mount
>
> fs/overlayfs/copy_up.c | 98 +++++++++++++++++++++
> fs/overlayfs/dir.c | 28 +++++-
> fs/overlayfs/inode.c | 21 ++++-
> fs/overlayfs/namei.c | 216 +++++++++++++++++++++++++++++++++++++++++------
> fs/overlayfs/overlayfs.h | 23 +++++
> fs/overlayfs/ovl_entry.h | 9 +-
> fs/overlayfs/super.c | 21 +++++
> fs/overlayfs/util.c | 83 ++++++++++++++++--
> 8 files changed, 461 insertions(+), 38 deletions(-)
>
> --
> 2.7.4
>
* Re: [PATCH v2 00/11] overlayfs constant inode numbers
2017-04-24 9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
` (12 preceding siblings ...)
2017-04-24 18:51 ` [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
@ 2017-04-25 11:52 ` Vivek Goyal
2017-04-25 12:05 ` Amir Goldstein
2017-04-25 12:16 ` Vivek Goyal
14 siblings, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2017-04-25 11:52 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel
On Mon, Apr 24, 2017 at 12:14:05PM +0300, Amir Goldstein wrote:
> Miklos,
>
> Following your comments on the 'stable inodes' series from last week,
> this series fixes constant inode numbers for stat(2) with any layer
> configuration.
>
> For the case of all *lower* layers on same fs that supports NFS export,
> redirect by file handle will be used to optimize the lookup of the copy
> up origin of non-dir inode.
>
> For the case of *all* layers on same fs, overlayfs also gains:
> - Persistent inode numbers for directories
> - Correct results for du -x
>
> Consistency of stat(2) st_ino with readdir(3) d_ino is NOT addressed by
> this series. It will be addressed for the 'samefs' configuration by the
> follow up 'stable inode' work, which is also going to address preserving
> hardlinks on copy up.
Hi Amir,
Don't we need to update Documentation/filesystems/overlayfs.txt as well, to
reflect the new semantics of reporting st_dev and st_ino?
Vivek
* Re: [PATCH v2 00/11] overlayfs constant inode numbers
2017-04-25 11:52 ` Vivek Goyal
@ 2017-04-25 12:05 ` Amir Goldstein
0 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-25 12:05 UTC (permalink / raw)
To: Vivek Goyal; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel
On Tue, Apr 25, 2017 at 2:52 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Mon, Apr 24, 2017 at 12:14:05PM +0300, Amir Goldstein wrote:
>> Miklos,
>>
>> Following your comments on the 'stable inodes' series from last week,
>> this series fixes constant inode numbers for stat(2) with any layer
>> configuration.
>>
>> For the case of all *lower* layers on same fs that supports NFS export,
>> redirect by file handle will be used to optimize the lookup of the copy
>> up origin of non-dir inode.
>>
>> For the case of *all* layers on same fs, overlayfs also gains:
>> - Persistent inode numbers for directories
>> - Correct results for du -x
>>
>> Consistency of stat(2) st_ino with readdir(3) d_ino is NOT addressed by
>> this series. It will be addressed for the 'samefs' configuration by the
>> follow up 'stable inode' work, which is also going to address preserving
>> hardlinks on copy up.
>
> Hi Amir,
>
> We need to update Documentation/filesystems/overlayfs.txt as well to
> reflect new semantics of reporting st_dev and st_ino?
>
Of course we need to! I'll do that.
Thanks!
* Re: [PATCH v2 00/11] overlayfs constant inode numbers
2017-04-24 9:14 [PATCH v2 00/11] overlayfs constant inode numbers Amir Goldstein
` (13 preceding siblings ...)
2017-04-25 11:52 ` Vivek Goyal
@ 2017-04-25 12:16 ` Vivek Goyal
2017-04-25 12:41 ` Amir Goldstein
14 siblings, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2017-04-25 12:16 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel
On Mon, Apr 24, 2017 at 12:14:05PM +0300, Amir Goldstein wrote:
> Miklos,
>
> Following your comments on the 'stable inodes' series from last week,
> this series fixes constant inode numbers for stat(2) with any layer
> configuration.
>
> For the case of all *lower* layers on same fs that supports NFS export,
> redirect by file handle will be used to optimize the lookup of the copy
> up origin of non-dir inode.
I was trying to run unionmount-testsuite (original from dhowells) and I
disabled layer check. Looks like empty directory rename test fails.
***
*** ./run --ov --ts=0 rename-empty-dir
***
TEST rename-empty-dir.py:10: Rename empty dir and rename back
./run --rename /mnt/a/empty100 /mnt/a/no_dir100
/mnt/a/empty100: Unexpected error: Invalid cross-device link
Vivek
* Re: [PATCH v2 00/11] overlayfs constant inode numbers
2017-04-25 12:16 ` Vivek Goyal
@ 2017-04-25 12:41 ` Amir Goldstein
2017-04-25 12:52 ` Vivek Goyal
0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-25 12:41 UTC (permalink / raw)
To: Vivek Goyal; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel
On Tue, Apr 25, 2017 at 3:16 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Mon, Apr 24, 2017 at 12:14:05PM +0300, Amir Goldstein wrote:
>> Miklos,
>>
>> Following your comments on the 'stable inodes' series from last week,
>> this series fixes constant inode numbers for stat(2) with any layer
>> configuration.
>>
>> For the case of all *lower* layers on same fs that supports NFS export,
>> redirect by file handle will be used to optimize the lookup of the copy
>> up origin of non-dir inode.
>
> I was trying to run unionmount-testsuite (original from dhowells) and I
> disabled layer check. Looks like empty directory rename test fails.
>
> ***
> *** ./run --ov --ts=0 rename-empty-dir
> ***
> TEST rename-empty-dir.py:10: Rename empty dir and rename back
> ./run --rename /mnt/a/empty100 /mnt/a/no_dir100
> /mnt/a/empty100: Unexpected error: Invalid cross-device link
>
Strange... I can't find code in recent times when this used to work.
It certainly doesn't look like it should work with kernel v4.10
and redirect_dir=off.
I couldn't find the point of regression by looking at the change log.
You'd need to bisect to find the regression patch.
Are you not compiling kernel with redirect_dir?
CONFIG_OVERLAY_FS_REDIRECT_DIR=y
I guess not. If you do compile or mount with -o redirect_dir=on,
you will need some minimal patches to unionmount-testsuite
that set the expectations correctly for directory rename.
The last stable branch I have from testing v4.10 is this:
https://github.com/amir73il/unionmount-testsuite/commits/ovl_rename_dir
But you may as well take my most recent branch for testing const ino:
https://github.com/amir73il/unionmount-testsuite/commits/overlayfs-devel
Amir.
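
For context on the failure above: "Invalid cross-device link" is the glibc message for EXDEV, and as far as I understand the overlayfs documentation, rename(2) on a lower or merged directory returns EXDEV when redirect_dir is off, leaving the caller to fall back to copy and delete. A small sketch of how a caller can tell that case apart (both function names are hypothetical, and this is not how the test suite does it):

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical wrapper: returns 0 on success, otherwise the errno value
 * set by rename(2), so the caller can distinguish EXDEV from other errors.
 */
static int try_rename(const char *from, const char *to)
{
	assert(from != NULL && to != NULL);
	if (rename(from, to) == 0)
		return 0;
	return errno;
}

/* Returns 1 if the error means the caller must copy and delete instead. */
static int rename_needs_copy(int err)
{
	return err == EXDEV;
}
```

A test harness could use rename_needs_copy() to decide whether EXDEV is the expected result for a given redirect_dir setting, rather than treating it as an unexpected error.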
* Re: [PATCH v2 00/11] overlayfs constant inode numbers
2017-04-25 12:41 ` Amir Goldstein
@ 2017-04-25 12:52 ` Vivek Goyal
2017-04-25 13:23 ` Amir Goldstein
0 siblings, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2017-04-25 12:52 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel
On Tue, Apr 25, 2017 at 03:41:56PM +0300, Amir Goldstein wrote:
> On Tue, Apr 25, 2017 at 3:16 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Mon, Apr 24, 2017 at 12:14:05PM +0300, Amir Goldstein wrote:
> >> Miklos,
> >>
> >> Following your comments on the 'stable inodes' series from last week,
> >> this series fixes constant inode numbers for stat(2) with any layer
> >> configuration.
> >>
> >> For the case of all *lower* layers on same fs that supports NFS export,
> >> redirect by file handle will be used to optimize the lookup of the copy
> >> up origin of non-dir inode.
> >
> > I was trying to run unionmount-testsuite (original from dhowells) and I
> > disabled layer check. Looks like empty directory rename test fails.
> >
> > ***
> > *** ./run --ov --ts=0 rename-empty-dir
> > ***
> > TEST rename-empty-dir.py:10: Rename empty dir and rename back
> > ./run --rename /mnt/a/empty100 /mnt/a/no_dir100
> > /mnt/a/empty100: Unexpected error: Invalid cross-device link
> >
>
> Strange... I can't find code in recent times when this used to work
> It certainly doesn't look like it should work with kernel v4.10
> and redirect_dir=off.
> I couldn't find the point of regression by looking at the change log.
> You'd need to bisect to find the regression patch.
>
> Are you not compiling kernel with redirect_dir?
> CONFIG_OVERLAY_FS_REDIRECT_DIR=y
I noticed that I am running with REDIRECT_DIR=n.
I also re-ran the tests without your patches and test is still broken. So
it is not due to your current patch series.
It has been a long time since I ran these tests. I suspect that we might
have changed this behavior during the redirect directory patches.
So the question is: is this a regression or expected behavior? That is, with
REDIRECT_DIR=n, will renames of empty directories be denied too?
>
> I guess not. If you do compile or mount with -o redirect_dir=on,
> you will need some minimal patches to unionmount-testsuite
> that set the expectations correctly for directory rename.
>
> The last stable branch I have from testing v4.10 is this:
> https://github.com/amir73il/unionmount-testsuite/commits/ovl_rename_dir
>
> But you may as well take my most recent branch for testing const ino:
> https://github.com/amir73il/unionmount-testsuite/commits/overlayfs-devel
I guess I should start using your copy of unionmount-testsuite.
Vivek
* Re: [PATCH v2 00/11] overlayfs constant inode numbers
2017-04-25 12:52 ` Vivek Goyal
@ 2017-04-25 13:23 ` Amir Goldstein
2017-04-25 13:29 ` Vivek Goyal
0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-25 13:23 UTC (permalink / raw)
To: Vivek Goyal; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel
On Tue, Apr 25, 2017 at 3:52 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, Apr 25, 2017 at 03:41:56PM +0300, Amir Goldstein wrote:
>> On Tue, Apr 25, 2017 at 3:16 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > On Mon, Apr 24, 2017 at 12:14:05PM +0300, Amir Goldstein wrote:
>> >> Miklos,
>> >>
>> >> Following your comments on the 'stable inodes' series from last week,
>> >> this series fixes constant inode numbers for stat(2) with any layer
>> >> configuration.
>> >>
>> >> For the case of all *lower* layers on same fs that supports NFS export,
>> >> redirect by file handle will be used to optimize the lookup of the copy
>> >> up origin of non-dir inode.
>> >
>> > I was trying to run unionmount-testsuite (original from dhowells) and I
>> > disabled layer check. Looks like empty directory rename test fails.
>> >
>> > ***
>> > *** ./run --ov --ts=0 rename-empty-dir
>> > ***
>> > TEST rename-empty-dir.py:10: Rename empty dir and rename back
>> > ./run --rename /mnt/a/empty100 /mnt/a/no_dir100
>> > /mnt/a/empty100: Unexpected error: Invalid cross-device link
>> >
>>
>> Strange... I can't find code in recent times when this used to work
>> It certainly doesn't look like it should work with kernel v4.10
>> and redirect_dir=off.
>> I couldn't find the point of regression by looking at the change log.
>> You'd need to bisect to find the regression patch.
>>
>> Are you not compiling kernel with redirect_dir?
>> CONFIG_OVERLAY_FS_REDIRECT_DIR=y
>
> I noticed that I am running with REDIRECT_DIR=n.
>
> I also re-ran the tests without your patches and test is still broken. So
> it is not due to your current patch series.
>
> It has been long time since I ran these tests. I suspect that we might
> have changed this behavior during redirect directory patches.
>
> So question is, is this a regression or expected behavior. That is with
> REDIRECT_DIR=n, renames of empty directory will be denied too.
>
It must be a regression, although I can't think why anyone would care.
If one really cares about renaming lower empty directories, why not enable
REDIRECT_DIR?
>>
>> I guess not. If you do compile or mount with -o redirect_dir=on,
>> you will need some minimal patches to unionmount-testsuite
>> that set the expectations correctly for directory rename.
>>
>> The last stable branch I have from testing v4.10 is this:
>> https://github.com/amir73il/unionmount-testsuite/commits/ovl_rename_dir
>>
>> But you may as well take my most recent branch for testing const ino:
>> https://github.com/amir73il/unionmount-testsuite/commits/overlayfs-devel
>
> I guess I should start using your copy of unionmount-testsuite.
>
> Vivek
* Re: [PATCH v2 00/11] overlayfs constant inode numbers
2017-04-25 13:23 ` Amir Goldstein
@ 2017-04-25 13:29 ` Vivek Goyal
2017-04-25 13:49 ` Amir Goldstein
0 siblings, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2017-04-25 13:29 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel
On Tue, Apr 25, 2017 at 04:23:28PM +0300, Amir Goldstein wrote:
> On Tue, Apr 25, 2017 at 3:52 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Tue, Apr 25, 2017 at 03:41:56PM +0300, Amir Goldstein wrote:
> >> On Tue, Apr 25, 2017 at 3:16 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> > On Mon, Apr 24, 2017 at 12:14:05PM +0300, Amir Goldstein wrote:
> >> >> Miklos,
> >> >>
> >> >> Following your comments on the 'stable inodes' series from last week,
> >> >> this series fixes constant inode numbers for stat(2) with any layer
> >> >> configuration.
> >> >>
> >> >> For the case of all *lower* layers on same fs that supports NFS export,
> >> >> redirect by file handle will be used to optimize the lookup of the copy
> >> >> up origin of non-dir inode.
> >> >
> >> > I was trying to run unionmount-testsuite (original from dhowells) and I
> >> > disabled layer check. Looks like empty directory rename test fails.
> >> >
> >> > ***
> >> > *** ./run --ov --ts=0 rename-empty-dir
> >> > ***
> >> > TEST rename-empty-dir.py:10: Rename empty dir and rename back
> >> > ./run --rename /mnt/a/empty100 /mnt/a/no_dir100
> >> > /mnt/a/empty100: Unexpected error: Invalid cross-device link
> >> >
> >>
> >> Strange... I can't find code in recent times when this used to work
> >> It certainly doesn't look like it should work with kernel v4.10
> >> and redirect_dir=off.
> >> I couldn't find the point of regression by looking at the change log.
> >> You'd need to bisect to find the regression patch.
> >>
> >> Are you not compiling kernel with redirect_dir?
> >> CONFIG_OVERLAY_FS_REDIRECT_DIR=y
> >
> > I noticed that I am running with REDIRECT_DIR=n.
> >
> > I also re-ran the tests without your patches and test is still broken. So
> > it is not due to your current patch series.
> >
> > It has been long time since I ran these tests. I suspect that we might
> > have changed this behavior during redirect directory patches.
> >
> > So question is, is this a regression or expected behavior. That is with
> > REDIRECT_DIR=n, renames of empty directory will be denied too.
> >
>
> It must be a regression, although I can't think why anyone would care.
> If one really cares about renaming lower empty directories, why not enable
> REDIRECT_DIR?
I will enable it now. I just had an old config and ran into this.
But this does raise the point that unionmount-testsuite needs to be
maintained somewhere so that it acts as a baseline to figure out if
new patches broke some existing tests.
I can go by the tree you are maintaining but currently that's broken too
with REDIRECT_DIR=n.
Vivek
>
> >>
> >> I guess not. If you do compile or mount with -o redirect_dir=on,
> >> you will need some minimal patches to unionmount-testsuite
> >> that set the expectations correctly for directory rename.
> >>
> >> The last stable branch I have from testing v4.10 is this:
> >> https://github.com/amir73il/unionmount-testsuite/commits/ovl_rename_dir
> >>
> >> But you may as well take my most recent branch for testing const ino:
> >> https://github.com/amir73il/unionmount-testsuite/commits/overlayfs-devel
> >
> > I guess I should start using your copy of unionmount-testsuite.
> >
> > Vivek
* Re: [PATCH v2 00/11] overlayfs constant inode numbers
2017-04-25 13:29 ` Vivek Goyal
@ 2017-04-25 13:49 ` Amir Goldstein
2017-04-25 13:53 ` Vivek Goyal
0 siblings, 1 reply; 69+ messages in thread
From: Amir Goldstein @ 2017-04-25 13:49 UTC (permalink / raw)
To: Vivek Goyal; +Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel
On Tue, Apr 25, 2017 at 4:29 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, Apr 25, 2017 at 04:23:28PM +0300, Amir Goldstein wrote:
>> On Tue, Apr 25, 2017 at 3:52 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > On Tue, Apr 25, 2017 at 03:41:56PM +0300, Amir Goldstein wrote:
>> >> On Tue, Apr 25, 2017 at 3:16 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> >> > On Mon, Apr 24, 2017 at 12:14:05PM +0300, Amir Goldstein wrote:
>> >> >> Miklos,
>> >> >>
>> >> >> Following your comments on the 'stable inodes' series from last week,
>> >> >> this series fixes constant inode numbers for stat(2) with any layer
>> >> >> configuration.
>> >> >>
>> >> >> For the case of all *lower* layers on same fs that supports NFS export,
>> >> >> redirect by file handle will be used to optimize the lookup of the copy
>> >> >> up origin of non-dir inode.
>> >> >
>> >> > I was trying to run unionmount-testsuite (original from dhowells) and I
>> >> > disabled layer check. Looks like empty directory rename test fails.
>> >> >
>> >> > ***
>> >> > *** ./run --ov --ts=0 rename-empty-dir
>> >> > ***
>> >> > TEST rename-empty-dir.py:10: Rename empty dir and rename back
>> >> > ./run --rename /mnt/a/empty100 /mnt/a/no_dir100
>> >> > /mnt/a/empty100: Unexpected error: Invalid cross-device link
>> >> >
>> >>
>> >> Strange... I can't find code in recent times when this used to work
>> >> It certainly doesn't look like it should work with kernel v4.10
>> >> and redirect_dir=off.
>> >> I couldn't find the point of regression by looking at the change log.
>> >> You'd need to bisect to find the regression patch.
>> >>
>> >> Are you not compiling kernel with redirect_dir?
>> >> CONFIG_OVERLAY_FS_REDIRECT_DIR=y
>> >
>> > I noticed that I am running with REDIRECT_DIR=n.
>> >
>> > I also re-ran the tests without your patches and test is still broken. So
>> > it is not due to your current patch series.
>> >
>> > It has been long time since I ran these tests. I suspect that we might
>> > have changed this behavior during redirect directory patches.
>> >
>> > So question is, is this a regression or expected behavior. That is with
>> > REDIRECT_DIR=n, renames of empty directory will be denied too.
>> >
>>
>> It must be a regression, although I can't think why anyone would care.
>> If one really cares about renaming lower empty directories, why not enable
>> REDIRECT_DIR?
>
> I will enable it now. I just had an old config and ran into this.
>
> But this does raise the question unionmount-testsuite need to be
> maintained somewhere so that it acts as a baseline to figure out if
> new patches broke some existing tests.
>
> I can go by the tree you are maintaining but currently that's broken too
> with REDIRECT_DIR=n.
>
Right.
I have given some thought to what's the best way to handle this.
Probably need a test flag --noredirect. I'll add this to my TODO...
BTW, I try to keep the branch overlayfs-devel up to date for testing
latest features. It could be rebased, but I'll make an effort not to.
If there is a need for a more stable non-rewindable branch, let me know.
>
>>
>> >>
>> >> I guess not. If you do compile or mount with -o redirect_dir=on,
>> >> you will need some minimal patches to unionmount-testsuite
>> >> that set the expectations correctly for directory rename.
>> >>
>> >> The last stable branch I have from testing v4.10 is this:
>> >> https://github.com/amir73il/unionmount-testsuite/commits/ovl_rename_dir
>> >>
>> >> But you may as well take my most recent branch for testing const ino:
>> >> https://github.com/amir73il/unionmount-testsuite/commits/overlayfs-devel
>> >
>> > I guess I should start using your copy of unionmount-testsuite.
>> >
>> > Vivek
* Re: [PATCH v2 00/11] overlayfs constant inode numbers
2017-04-25 13:49 ` Amir Goldstein
@ 2017-04-25 13:53 ` Vivek Goyal
2017-04-25 14:20 ` Amir Goldstein
0 siblings, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2017-04-25 13:53 UTC (permalink / raw)
To: Amir Goldstein
Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel, David Howells
On Tue, Apr 25, 2017 at 04:49:00PM +0300, Amir Goldstein wrote:
> On Tue, Apr 25, 2017 at 4:29 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Tue, Apr 25, 2017 at 04:23:28PM +0300, Amir Goldstein wrote:
> >> On Tue, Apr 25, 2017 at 3:52 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> > On Tue, Apr 25, 2017 at 03:41:56PM +0300, Amir Goldstein wrote:
> >> >> On Tue, Apr 25, 2017 at 3:16 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> >> > On Mon, Apr 24, 2017 at 12:14:05PM +0300, Amir Goldstein wrote:
> >> >> >> Miklos,
> >> >> >>
> >> >> >> Following your comments on the 'stable inodes' series from last week,
> >> >> >> this series fixes constant inode numbers for stat(2) with any layer
> >> >> >> configuration.
> >> >> >>
> >> >> >> For the case of all *lower* layers on same fs that supports NFS export,
> >> >> >> redirect by file handle will be used to optimize the lookup of the copy
> >> >> >> up origin of non-dir inode.
> >> >> >
> >> >> > I was trying to run unionmount-testsuite (original from dhowells) and I
> >> >> > disabled layer check. Looks like empty directory rename test fails.
> >> >> >
> >> >> > ***
> >> >> > *** ./run --ov --ts=0 rename-empty-dir
> >> >> > ***
> >> >> > TEST rename-empty-dir.py:10: Rename empty dir and rename back
> >> >> > ./run --rename /mnt/a/empty100 /mnt/a/no_dir100
> >> >> > /mnt/a/empty100: Unexpected error: Invalid cross-device link
> >> >> >
> >> >>
> >> >> Strange... I can't find code in recent times when this used to work
> >> >> It certainly doesn't look like it should work with kernel v4.10
> >> >> and redirect_dir=off.
> >> >> I couldn't find the point of regression by looking at the change log.
> >> >> You'd need to bisect to find the regression patch.
> >> >>
> >> >> Are you not compiling kernel with redirect_dir?
> >> >> CONFIG_OVERLAY_FS_REDIRECT_DIR=y
> >> >
> >> > I noticed that I am running with REDIRECT_DIR=n.
> >> >
> >> > I also re-ran the tests without your patches and test is still broken. So
> >> > it is not due to your current patch series.
> >> >
> >> > It has been long time since I ran these tests. I suspect that we might
> >> > have changed this behavior during redirect directory patches.
> >> >
> >> > So question is, is this a regression or expected behavior. That is with
> >> > REDIRECT_DIR=n, renames of empty directory will be denied too.
> >> >
> >>
> >> It must be a regression, although I can't think why anyone would care.
> >> If one really cares about renaming lower empty directories, why not enable
> >> REDIRECT_DIR?
> >
> > I will enable it now. I just had an old config and ran into this.
> >
> > But this does raise the question unionmount-testsuite need to be
> > maintained somewhere so that it acts as a baseline to figure out if
> > new patches broke some existing tests.
> >
> > I can go by the tree you are maintaining but currently that's broken too
> > with REDIRECT_DIR=n.
> >
>
> Right.
> I have given some thought about what's the best way to handle this.
> Probably need a test flag --noredirect. I'll add this to my TODO...
>
> BTW, I try to keep the branch overlayfs-devel up to date for testing
> latest features. It could be rebased, but I'll make an effort not to.
> If there is a need for a more stable non-rewindable branch, let me know.
I think it would be good if you kept the "master" branch of your tree up
to date and, hopefully, stable, so that a later git pull does not run
into conflicts. We can then use your tree for setting a baseline and
detecting regressions.
CCing Dave Howells, in case he is interested in continuing to update his
tree as overlayfs kernel development takes place.
Vivek
* Re: [PATCH v2 00/11] overlayfs constant inode numbers
2017-04-25 13:53 ` Vivek Goyal
@ 2017-04-25 14:20 ` Amir Goldstein
0 siblings, 0 replies; 69+ messages in thread
From: Amir Goldstein @ 2017-04-25 14:20 UTC (permalink / raw)
To: Vivek Goyal
Cc: Miklos Szeredi, Al Viro, linux-unionfs, linux-fsdevel, David Howells
On Tue, Apr 25, 2017 at 4:53 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, Apr 25, 2017 at 04:49:00PM +0300, Amir Goldstein wrote:
>> On Tue, Apr 25, 2017 at 4:29 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > On Tue, Apr 25, 2017 at 04:23:28PM +0300, Amir Goldstein wrote:
>> >> On Tue, Apr 25, 2017 at 3:52 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> >> > On Tue, Apr 25, 2017 at 03:41:56PM +0300, Amir Goldstein wrote:
>> >> >> On Tue, Apr 25, 2017 at 3:16 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> >> >> > On Mon, Apr 24, 2017 at 12:14:05PM +0300, Amir Goldstein wrote:
>> >> >> >> Miklos,
>> >> >> >>
>> >> >> >> Following your comments on the 'stable inodes' series from last week,
>> >> >> >> this series fixes constant inode numbers for stat(2) with any layer
>> >> >> >> configuration.
>> >> >> >>
>> >> >> >> For the case of all *lower* layers on same fs that supports NFS export,
>> >> >> >> redirect by file handle will be used to optimize the lookup of the copy
>> >> >> >> up origin of non-dir inode.
>> >> >> >
>> >> >> > I was trying to run unionmount-testsuite (original from dhowells) and I
>> >> >> > disabled layer check. Looks like empty directory rename test fails.
>> >> >> >
>> >> >> > ***
>> >> >> > *** ./run --ov --ts=0 rename-empty-dir
>> >> >> > ***
>> >> >> > TEST rename-empty-dir.py:10: Rename empty dir and rename back
>> >> >> > ./run --rename /mnt/a/empty100 /mnt/a/no_dir100
>> >> >> > /mnt/a/empty100: Unexpected error: Invalid cross-device link
>> >> >> >
>> >> >>
>> >> >> Strange... I can't find code in recent times when this used to work
>> >> >> It certainly doesn't look like it should work with kernel v4.10
>> >> >> and redirect_dir=off.
>> >> >> I couldn't find the point of regression by looking at the change log.
>> >> >> You'd need to bisect to find the regression patch.
>> >> >>
>> >> >> Are you not compiling kernel with redirect_dir?
>> >> >> CONFIG_OVERLAY_FS_REDIRECT_DIR=y
>> >> >
>> >> > I noticed that I am running with REDIRECT_DIR=n.
>> >> >
>> >> > I also re-ran the tests without your patches and test is still broken. So
>> >> > it is not due to your current patch series.
>> >> >
>> >> > It has been long time since I ran these tests. I suspect that we might
>> >> > have changed this behavior during redirect directory patches.
>> >> >
>> >> > So question is, is this a regression or expected behavior. That is with
>> >> > REDIRECT_DIR=n, renames of empty directory will be denied too.
>> >> >
>> >>
>> >> It must be a regression, although I can't think why anyone would care.
>> >> If one really cares about renaming lower empty directories, why not enable
>> >> REDIRECT_DIR?
>> >
>> > I will enable it now. I just had an old config and ran into this.
>> >
>> > But this does raise the question unionmount-testsuite need to be
>> > maintained somewhere so that it acts as a baseline to figure out if
>> > new patches broke some existing tests.
>> >
>> > I can go by the tree you are maintaining but currently that's broken too
>> > with REDIRECT_DIR=n.
>> >
>>
>> Right.
>> I have given some thought about what's the best way to handle this.
>> Probably need a test flag --noredirect. I'll add this to my TODO...
>>
>> BTW, I try to keep the branch overlayfs-devel up to date for testing
>> latest features. It could be rebased, but I'll make an effort not to.
>> If there is a need for a more stable non-rewindable branch, let me know.
>
> I think would be good if you maintain "master" branch of your tree up
> to date and hopefully that's stable so that later git pull does not talk
> about conflicts. We can then use your tree for setting a baseline and
> detecting regressions.
>
> CCing Dave Howells, in case he is interested in continuing to update his
> tree as overlayfs kernel development takes place.
>
OK. Declaring branch master on my tree 'ff-only':
https://github.com/amir73il/unionmount-testsuite/tree/master
Last commit is set to:
060af33 run --ov --samefs uses lower/upper on same fs
This commit also contains instructions on how to set up unionmount-testsuite
on non-tmpfs, which is very useful for staying in touch with reality.
It is recommended to test at least with the following flag combinations:
./run --ov             # tmpfs, not same for lower/upper
./run --ov=0           # same as above with cycle mount after mkdir/rename
./run --ov --samefs    # tmpfs or configured base fs, same for lower and upper
./run --ov=0 --samefs  # same as above with cycle mount after mkdir/rename
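A minimal sketch of running the four combinations above in one go and
reporting which ones failed. The function name run_all_combos and the RUN
variable are made up for illustration and are not part of
unionmount-testsuite; only the ./run flags are taken from the list above.

```shell
# RUN is an assumption: the path to the testsuite's run script.
RUN="${RUN:-./run}"

# Run each recommended flag combination; keep going after a failure
# and return non-zero if any combination failed.
run_all_combos() {
    failed=0
    for flags in "--ov" "--ov=0" "--ov --samefs" "--ov=0 --samefs"; do
        echo "*** $RUN $flags"
        # $flags is left unquoted on purpose so it splits into separate args.
        if ! "$RUN" $flags; then
            echo "FAILED: $RUN $flags" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

Call run_all_combos from the top of the testsuite checkout; it relies on
POSIX sh word splitting, so it is meant for sh or bash rather than zsh.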
Mind you, testing the constant inode work still requires branch overlayfs-devel,
which has the fix to check_layers() and more goodies.
Amir.