* [PATCH 0/3] overlayfs: allow moving directory trees @ 2016-10-25 7:34 Miklos Szeredi 2016-10-25 7:34 ` Miklos Szeredi ` (4 more replies) 0 siblings, 5 replies; 52+ messages in thread From: Miklos Szeredi @ 2016-10-25 7:34 UTC (permalink / raw) To: linux-unionfs; +Cc: Guillem Jover, Raphael Hertzog, linux-fsdevel, linux-kernel This allows overlayfs to move directory trees (residing on lower layer) without having to recursively copy up the whole tree first. This series is available in git at: git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git#redirect And is on top of the overlayfs-next branch. --- Miklos Szeredi (3): ovl: check fs features vfs: export vfs_path_lookup() ovl: redirect on rename-dir Documentation/filesystems/overlayfs.txt | 33 ++++++++++- fs/internal.h | 2 - fs/overlayfs/copy_up.c | 20 ++----- fs/overlayfs/dir.c | 86 +++++++++++++++++++--------- fs/overlayfs/namei.c | 99 ++++++++++++++++++++++++++++++--- fs/overlayfs/overlayfs.h | 5 ++ fs/overlayfs/ovl_entry.h | 4 ++ fs/overlayfs/super.c | 56 +++++++++++++++++-- fs/overlayfs/util.c | 19 +++++++ include/linux/namei.h | 2 + 10 files changed, 268 insertions(+), 58 deletions(-) -- 2.5.5 ^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH 1/3] ovl: check fs features 2016-10-25 7:34 [PATCH 0/3] overlayfs: allow moving directory trees Miklos Szeredi @ 2016-10-25 7:34 ` Miklos Szeredi 2016-10-25 7:34 ` [PATCH 2/3] vfs: export vfs_path_lookup() Miklos Szeredi ` (3 subsequent siblings) 4 siblings, 0 replies; 52+ messages in thread From: Miklos Szeredi @ 2016-10-25 7:34 UTC (permalink / raw) To: linux-unionfs Cc: Guillem Jover, Raphael Hertzog, linux-fsdevel, linux-kernel, stable To allow adding new, backward incompatible features to overlayfs, we need a way to store the list of features in the overlay. This is done via "trusted.overlay.features" xattr on the root of the upper layer (or one of the lower layers, that previously acted as an upper layer). It's a comma separated list of case sensitive strings. If an overlay has an unknown feature, mount shall return an error. So mechanism should only be used for backward incompatible features. This patch doesn't add any features. If the "trusted.overlay.features" xattr contains a non-empty list, then return EINVAL error for the mount. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> --- Documentation/filesystems/overlayfs.txt | 12 ++++++++++ fs/overlayfs/overlayfs.h | 1 + fs/overlayfs/super.c | 41 +++++++++++++++++++++++++++++++++ 3 files changed, 54 insertions(+) diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt index 7aeb8e8d80cf..5108425157ac 100644 --- a/Documentation/filesystems/overlayfs.txt +++ b/Documentation/filesystems/overlayfs.txt @@ -175,6 +175,18 @@ The specified lower directories will be stacked beginning from the rightmost one and going left. In the above example lower1 will be the top, lower2 the middle and lower3 the bottom layer. +Filesystem features +------------------- + +Features are enabled via "trusted.overlay.features" xattr on the root of the +upper layer. E.g. the following command can be used to enable features "foo" +and "bar" on the overlay: + + setfattr -n "trusted.overlay.features" -v "foo,bar" /upper + mount -t overlay overlay -olowerdir=/lower,upperdir=/upper,\ +workdir=/work /merged + +If an overlay has an unknown feature, mount shall return an error. Non-standard behavior --------------------- diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h index f6e4d3539a25..d61d5b9d0d91 100644 --- a/fs/overlayfs/overlayfs.h +++ b/fs/overlayfs/overlayfs.h @@ -19,6 +19,7 @@ enum ovl_path_type { #define OVL_XATTR_PREFIX XATTR_TRUSTED_PREFIX "overlay." #define OVL_XATTR_OPAQUE OVL_XATTR_PREFIX "opaque" +#define OVL_XATTR_FEATURES OVL_XATTR_PREFIX "features" #define OVL_ISUPPER_MASK 1UL diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c index 30263a541fd5..d6dc8d905d00 100644 --- a/fs/overlayfs/super.c +++ b/fs/overlayfs/super.c @@ -397,6 +397,39 @@ static struct dentry *ovl_workdir_create(struct vfsmount *mnt, goto out_unlock; } +static int ovl_check_features(struct dentry *root) +{ + int res; + char *buf, *tmp, *p; + + res = vfs_getxattr(root, OVL_XATTR_FEATURES, NULL, 0); + if (res <= 0) { + if (res == -EOPNOTSUPP || res == -ENODATA) + res = 0; + return res; + } + + buf = kmalloc(res + 1, GFP_TEMPORARY); + if (!buf) + return -ENOMEM; + + res = vfs_getxattr(root, OVL_XATTR_FEATURES, buf, res); + if (res <= 0) + goto out_free; + + buf[res] = '\0'; + res = 0; + tmp = buf; + while ((p = strsep(&tmp, ",")) != NULL) { + res = -EINVAL; + pr_err("overlayfs: feature '%s' not supported\n", p); + } +out_free: + kfree(buf); + + return res; +} + static void ovl_unescape(char *s) { char *d = s; @@ -471,6 +504,10 @@ static int ovl_lower_dir(const char *name, struct path *path, long *namelen, if (err) goto out; + err = ovl_check_features(path->dentry); + if (err) + goto out_put; + err = vfs_statfs(path, &statfs); if (err) { pr_err("overlayfs: statfs failed on '%s'\n", name); @@ -693,6 +730,10 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent) goto out_put_upperpath; } + err = ovl_check_features(upperpath.dentry); + if (err) + goto out_put_upperpath; + err = ovl_mount_dir(ufs->config.workdir, &workpath); if (err) goto out_put_upperpath; -- 2.5.5 ^ permalink raw reply related [flat|nested] 52+ messages in thread
* [PATCH 1/3] ovl: check fs features @ 2016-10-25 7:34 ` Miklos Szeredi 0 siblings, 0 replies; 52+ messages in thread From: Miklos Szeredi @ 2016-10-25 7:34 UTC (permalink / raw) To: linux-unionfs Cc: Guillem Jover, Raphael Hertzog, linux-fsdevel, linux-kernel, stable To allow adding new, backward incompatible features to overlayfs, we need a way to store the list of features in the overlay. This is done via "trusted.overlay.features" xattr on the root of the upper layer (or one of the lower layers, that previously acted as an upper layer). It's a comma separated list of case sensitive strings. If an overlay has an unknown feature, mount shall return an error. So mechanism should only be used for backward incompatible features. This patch doesn't add any features. If the "trusted.overlay.features" xattr contains a non-empty list, then return EINVAL error for the mount. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> --- Documentation/filesystems/overlayfs.txt | 12 ++++++++++ fs/overlayfs/overlayfs.h | 1 + fs/overlayfs/super.c | 41 +++++++++++++++++++++++++++++++++ 3 files changed, 54 insertions(+) diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt index 7aeb8e8d80cf..5108425157ac 100644 --- a/Documentation/filesystems/overlayfs.txt +++ b/Documentation/filesystems/overlayfs.txt @@ -175,6 +175,18 @@ The specified lower directories will be stacked beginning from the rightmost one and going left. In the above example lower1 will be the top, lower2 the middle and lower3 the bottom layer. +Filesystem features +------------------- + +Features are enabled via "trusted.overlay.features" xattr on the root of the +upper layer. E.g. the following command can be used to enable features "foo" +and "bar" on the overlay: + + setfattr -n "trusted.overlay.features" -v "foo,bar" /upper + mount -t overlay overlay -olowerdir=/lower,upperdir=/upper,\ +workdir=/work /merged + +If an overlay has an unknown feature, mount shall return an error. Non-standard behavior --------------------- diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h index f6e4d3539a25..d61d5b9d0d91 100644 --- a/fs/overlayfs/overlayfs.h +++ b/fs/overlayfs/overlayfs.h @@ -19,6 +19,7 @@ enum ovl_path_type { #define OVL_XATTR_PREFIX XATTR_TRUSTED_PREFIX "overlay." #define OVL_XATTR_OPAQUE OVL_XATTR_PREFIX "opaque" +#define OVL_XATTR_FEATURES OVL_XATTR_PREFIX "features" #define OVL_ISUPPER_MASK 1UL diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c index 30263a541fd5..d6dc8d905d00 100644 --- a/fs/overlayfs/super.c +++ b/fs/overlayfs/super.c @@ -397,6 +397,39 @@ static struct dentry *ovl_workdir_create(struct vfsmount *mnt, goto out_unlock; } +static int ovl_check_features(struct dentry *root) +{ + int res; + char *buf, *tmp, *p; + + res = vfs_getxattr(root, OVL_XATTR_FEATURES, NULL, 0); + if (res <= 0) { + if (res == -EOPNOTSUPP || res == -ENODATA) + res = 0; + return res; + } + + buf = kmalloc(res + 1, GFP_TEMPORARY); + if (!buf) + return -ENOMEM; + + res = vfs_getxattr(root, OVL_XATTR_FEATURES, buf, res); + if (res <= 0) + goto out_free; + + buf[res] = '\0'; + res = 0; + tmp = buf; + while ((p = strsep(&tmp, ",")) != NULL) { + res = -EINVAL; + pr_err("overlayfs: feature '%s' not supported\n", p); + } +out_free: + kfree(buf); + + return res; +} + static void ovl_unescape(char *s) { char *d = s; @@ -471,6 +504,10 @@ static int ovl_lower_dir(const char *name, struct path *path, long *namelen, if (err) goto out; + err = ovl_check_features(path->dentry); + if (err) + goto out_put; + err = vfs_statfs(path, &statfs); if (err) { pr_err("overlayfs: statfs failed on '%s'\n", name); @@ -693,6 +730,10 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent) goto out_put_upperpath; } + err = ovl_check_features(upperpath.dentry); + if (err) + goto out_put_upperpath; + err = ovl_mount_dir(ufs->config.workdir, &workpath); if (err) goto out_put_upperpath; -- 2.5.5 ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH 1/3] ovl: check fs features 2016-10-25 7:34 ` Miklos Szeredi (?) @ 2016-10-25 11:24 ` Amir Goldstein 2016-11-05 20:40 ` Amir Goldstein -1 siblings, 1 reply; 52+ messages in thread From: Amir Goldstein @ 2016-10-25 11:24 UTC (permalink / raw) To: Miklos Szeredi Cc: linux-unionfs, Guillem Jover, Raphael Hertzog, linux-fsdevel, linux-kernel, stable On Tue, Oct 25, 2016 at 10:34 AM, Miklos Szeredi <mszeredi@redhat.com> wrote: > To allow adding new, backward incompatible features to overlayfs, we need a > way to store the list of features in the overlay. This is done via > "trusted.overlay.features" xattr on the root of the upper layer (or one of > the lower layers, that previously acted as an upper layer). It's a comma > separated list of case sensitive strings. > > If an overlay has an unknown feature, mount shall return an error. So > mechanism should only be used for backward incompatible features. So maybe be explicit and call the attribute trusted.overlay.incompat_features, to allow future addition of compat and rocompat feature sets? > > This patch doesn't add any features. If the "trusted.overlay.features" > xattr contains a non-empty list, then return EINVAL error for the mount. > > Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> > Cc: <stable@vger.kernel.org> > --- > Documentation/filesystems/overlayfs.txt | 12 ++++++++++ > fs/overlayfs/overlayfs.h | 1 + > fs/overlayfs/super.c | 41 +++++++++++++++++++++++++++++++++ > 3 files changed, 54 insertions(+) > > diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt > index 7aeb8e8d80cf..5108425157ac 100644 > --- a/Documentation/filesystems/overlayfs.txt > +++ b/Documentation/filesystems/overlayfs.txt > @@ -175,6 +175,18 @@ The specified lower directories will be stacked beginning from the > rightmost one and going left. In the above example lower1 will be the > top, lower2 the middle and lower3 the bottom layer. > > +Filesystem features > +------------------- > + > +Features are enabled via "trusted.overlay.features" xattr on the root of the > +upper layer. E.g. the following command can be used to enable features "foo" > +and "bar" on the overlay: > + > + setfattr -n "trusted.overlay.features" -v "foo,bar" /upper > + mount -t overlay overlay -olowerdir=/lower,upperdir=/upper,\ > +workdir=/work /merged > + > +If an overlay has an unknown feature, mount shall return an error. > > Non-standard behavior > --------------------- > diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h > index f6e4d3539a25..d61d5b9d0d91 100644 > --- a/fs/overlayfs/overlayfs.h > +++ b/fs/overlayfs/overlayfs.h > @@ -19,6 +19,7 @@ enum ovl_path_type { > > #define OVL_XATTR_PREFIX XATTR_TRUSTED_PREFIX "overlay." > #define OVL_XATTR_OPAQUE OVL_XATTR_PREFIX "opaque" > +#define OVL_XATTR_FEATURES OVL_XATTR_PREFIX "features" > > #define OVL_ISUPPER_MASK 1UL > > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c > index 30263a541fd5..d6dc8d905d00 100644 > --- a/fs/overlayfs/super.c > +++ b/fs/overlayfs/super.c > @@ -397,6 +397,39 @@ static struct dentry *ovl_workdir_create(struct vfsmount *mnt, > goto out_unlock; > } > > +static int ovl_check_features(struct dentry *root) > +{ > + int res; > + char *buf, *tmp, *p; > + > + res = vfs_getxattr(root, OVL_XATTR_FEATURES, NULL, 0); > + if (res <= 0) { > + if (res == -EOPNOTSUPP || res == -ENODATA) > + res = 0; > + return res; > + } > + > + buf = kmalloc(res + 1, GFP_TEMPORARY); > + if (!buf) > + return -ENOMEM; > + > + res = vfs_getxattr(root, OVL_XATTR_FEATURES, buf, res); > + if (res <= 0) > + goto out_free; > + > + buf[res] = '\0'; > + res = 0; > + tmp = buf; > + while ((p = strsep(&tmp, ",")) != NULL) { > + res = -EINVAL; > + pr_err("overlayfs: feature '%s' not supported\n", p); > + } > +out_free: > + kfree(buf); > + > + return res; > +} > + > static void ovl_unescape(char *s) > { > char *d = s; > @@ -471,6 +504,10 @@ static int ovl_lower_dir(const char *name, struct path *path, long *namelen, > if (err) > goto out; > > + err = ovl_check_features(path->dentry); > + if (err) > + goto out_put; > + > err = vfs_statfs(path, &statfs); > if (err) { > pr_err("overlayfs: statfs failed on '%s'\n", name); > @@ -693,6 +730,10 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent) > goto out_put_upperpath; > } > > + err = ovl_check_features(upperpath.dentry); > + if (err) > + goto out_put_upperpath; > + > err = ovl_mount_dir(ufs->config.workdir, &workpath); > if (err) > goto out_put_upperpath; > -- > 2.5.5 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/3] ovl: check fs features 2016-10-25 11:24 ` Amir Goldstein @ 2016-11-05 20:40 ` Amir Goldstein 0 siblings, 0 replies; 52+ messages in thread From: Amir Goldstein @ 2016-11-05 20:40 UTC (permalink / raw) To: Miklos Szeredi Cc: linux-unionfs, Guillem Jover, Raphael Hertzog, linux-fsdevel, linux-kernel, stable On Tue, Oct 25, 2016 at 2:24 PM, Amir Goldstein <amir73il@gmail.com> wrote: > On Tue, Oct 25, 2016 at 10:34 AM, Miklos Szeredi <mszeredi@redhat.com> wrote: >> To allow adding new, backward incompatible features to overlayfs, we need a >> way to store the list of features in the overlay. This is done via >> "trusted.overlay.features" xattr on the root of the upper layer (or one of >> the lower layers, that previously acted as an upper layer). It's a comma >> separated list of case sensitive strings. >> >> If an overlay has an unknown feature, mount shall return an error. So >> mechanism should only be used for backward incompatible features. > > So maybe be explicit and call the attribute trusted.overlay.incompat_features, > to allow future addition of compat and rocompat feature sets? > On top of the proposed features xattr, for the sake of being backward compatible with old kernels, how about creating a file 3 levels down from work dir (e.g. /work/work/a/b/c) This would cause old kernels to mount overlay read-only, which is sufficient to keep them from corrupting the redirect structure. And once again, I suggest learning from the elders fs, who have successfully gone through many on-disk format upgrades, and copy the design of the feature trio (compat/incompat/rocompat) Amir. ^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH 2/3] vfs: export vfs_path_lookup() 2016-10-25 7:34 [PATCH 0/3] overlayfs: allow moving directory trees Miklos Szeredi 2016-10-25 7:34 ` Miklos Szeredi @ 2016-10-25 7:34 ` Miklos Szeredi 2016-10-25 7:34 ` [PATCH 3/3] ovl: redirect on rename-dir Miklos Szeredi ` (2 subsequent siblings) 4 siblings, 0 replies; 52+ messages in thread From: Miklos Szeredi @ 2016-10-25 7:34 UTC (permalink / raw) To: linux-unionfs; +Cc: Guillem Jover, Raphael Hertzog, linux-fsdevel, linux-kernel Exported to modules, but currently residing in fs/internal.h due to commit 197df04c749a ("rename user_path_umountat() to user_path_mountpoint_at()"). Move back to <linux/namei.h> so that overlayfs can make use of it. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> --- fs/internal.h | 2 -- include/linux/namei.h | 2 ++ 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/internal.h b/fs/internal.h index f4da3341b4a3..800319d11291 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -53,8 +53,6 @@ extern void __init chrdev_init(void); * namei.c */ extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *); -extern int vfs_path_lookup(struct dentry *, struct vfsmount *, - const char *, unsigned int, struct path *); /* * namespace.c diff --git a/include/linux/namei.h b/include/linux/namei.h index f29abda31e6d..2375bc50a9d3 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -78,6 +78,8 @@ extern struct dentry *user_path_create(int, const char __user *, struct path *, extern void done_path_create(struct path *, struct dentry *); extern struct dentry *kern_path_locked(const char *, struct path *); extern int kern_path_mountpoint(int, const char *, struct path *, unsigned int); +extern int vfs_path_lookup(struct dentry *, struct vfsmount *, + const char *, unsigned int, struct path *); extern struct dentry *lookup_one_len(const char *, struct dentry *, int); extern struct dentry *lookup_one_len_unlocked(const char *, struct dentry *, int); -- 2.5.5 ^ permalink raw reply related [flat|nested] 52+ messages in thread
* [PATCH 3/3] ovl: redirect on rename-dir 2016-10-25 7:34 [PATCH 0/3] overlayfs: allow moving directory trees Miklos Szeredi 2016-10-25 7:34 ` Miklos Szeredi 2016-10-25 7:34 ` [PATCH 2/3] vfs: export vfs_path_lookup() Miklos Szeredi @ 2016-10-25 7:34 ` Miklos Szeredi 2016-10-25 11:57 ` Raphael Hertzog ` (2 more replies) 2016-10-25 20:25 ` [PATCH 0/3] overlayfs: allow moving directory trees Amir Goldstein 2016-10-26 9:34 ` [PATCH] ovl: check for emptiness of redirect dir Amir Goldstein 4 siblings, 3 replies; 52+ messages in thread From: Miklos Szeredi @ 2016-10-25 7:34 UTC (permalink / raw) To: linux-unionfs; +Cc: Guillem Jover, Raphael Hertzog, linux-fsdevel, linux-kernel Current code returns EXDEV when a directory would need to be copied up to move. We could copy up the directory tree in this case, but there's another solution: point to old lower directory from moved upper directory. This is achieved with a "trusted.overlay.redirect" xattr storing the path relative to the root of the overlay. After such attribute has been set, the directory can be moved without further actions required. This is a backward incompatible feature, old kernels won't be able to correctly mount an overlay containing redirected directories. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> --- Documentation/filesystems/overlayfs.txt | 21 ++++++- fs/overlayfs/copy_up.c | 20 ++----- fs/overlayfs/dir.c | 86 +++++++++++++++++++--------- fs/overlayfs/namei.c | 99 ++++++++++++++++++++++++++++++--- fs/overlayfs/overlayfs.h | 4 ++ fs/overlayfs/ovl_entry.h | 4 ++ fs/overlayfs/super.c | 25 +++++---- fs/overlayfs/util.c | 19 +++++++ 8 files changed, 217 insertions(+), 61 deletions(-) diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt index 5108425157ac..fae48ab3b36b 100644 --- a/Documentation/filesystems/overlayfs.txt +++ b/Documentation/filesystems/overlayfs.txt @@ -130,6 +130,23 @@ directory. Readdir on directories that are not merged is simply handled by the underlying directory (upper or lower). +renaming directories +-------------------- + +When renaming a directory that is on the lower layer or merged (i.e. the +directory was not created on the upper layer to start with) overlayfs can +handle it in two different ways: + +1) return EXDEV error: this error is returned by rename(2) when trying to + move a file or directory across filesystem boundaries. Hence + applications are usually prepared to hande this error (mv(1) for example + recursively copies the directory tree). This is the default behavior. + +2) If the "redirect_dir" feature is enabled, then the directory will be + copied up (but not the contents). Then the "trusted.overlay.redirect" + extended attribute is set to the path of the original location from the + root of the overlay. Finally the directory is moved to the new + location. Non-directories --------------- @@ -201,8 +218,8 @@ If a file with multiple hard links is copied up, then this will "break" the link. Changes will not be propagated to other names referring to the same inode. -Directory trees are not copied up. If rename(2) is performed on a directory -which is on the lower layer or is merged, then -EXDEV will be returned. +Unless "redirect_dir" feature is enabled, rename(2) on a lower or merged +directory will fail with EXDEV. Changes to underlying filesystems --------------------------------- diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c index 49f6158bb04c..0d7075208099 100644 --- a/fs/overlayfs/copy_up.c +++ b/fs/overlayfs/copy_up.c @@ -323,17 +323,11 @@ static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir, /* * Copy up a single dentry * - * Directory renames only allowed on "pure upper" (already created on - * upper filesystem, never copied up). Directories which are on lower or - * are merged may not be renamed. For these -EXDEV is returned and - * userspace has to deal with it. This means, when copying up a - * directory we can rely on it and ancestors being stable. - * - * Non-directory renames start with copy up of source if necessary. The - * actual rename will only proceed once the copy up was successful. Copy - * up uses upper parent i_mutex for exclusion. Since rename can change - * d_parent it is possible that the copy up will lock the old parent. At - * that point the file will have already been copied up anyway. + * All renames start with copy up of source if necessary. The actual + * rename will only proceed once the copy up was successful. Copy up uses + * upper parent i_mutex for exclusion. Since rename can change d_parent it + * is possible that the copy up will lock the old parent. At that point + * the file will have already been copied up anyway. */ int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry, struct path *lowerpath, struct kstat *stat) @@ -345,7 +339,6 @@ int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry, struct path parentpath; struct dentry *lowerdentry = lowerpath->dentry; struct dentry *upperdir; - struct dentry *upperdentry; const char *link = NULL; if (WARN_ON(!workdir)) @@ -371,8 +364,7 @@ int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry, pr_err("overlayfs: failed to lock workdir+upperdir\n"); goto out_unlock; } - upperdentry = ovl_dentry_upper(dentry); - if (upperdentry) { + if (ovl_dentry_upper(dentry)) { /* Raced with another copy-up? Nothing to do, then... */ err = 0; goto out_unlock; diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c index 2c1057d747cb..065e0211f9b6 100644 --- a/fs/overlayfs/dir.c +++ b/fs/overlayfs/dir.c @@ -757,6 +757,40 @@ static bool ovl_type_merge_or_lower(struct dentry *dentry) return OVL_TYPE_MERGE(type) || !OVL_TYPE_UPPER(type); } +static bool ovl_can_move(struct dentry *dentry) +{ + return ovl_redirect_dir(dentry->d_sb) || + !d_is_dir(dentry) || !ovl_type_merge_or_lower(dentry); +} + +static int ovl_set_redirect(struct dentry *dentry) +{ + char *buf; + char *path; + int err; + + if (ovl_dentry_is_redirect(dentry)) + return 0; + + buf = __getname(); + if (!buf) + return -ENOMEM; + + path = dentry_path_raw(dentry, buf, PATH_MAX); + err = PTR_ERR(path); + if (IS_ERR(path)) + goto putname; + + err = ovl_do_setxattr(ovl_dentry_upper(dentry), OVL_XATTR_REDIRECT, + path, strlen(path), 0); +putname: + __putname(buf); + if (!err) + ovl_dentry_set_redirect(dentry); + + return err; +} + static int ovl_rename(struct inode *olddir, struct dentry *old, struct inode *newdir, struct dentry *new, unsigned int flags) @@ -784,9 +818,9 @@ static int ovl_rename(struct inode *olddir, struct dentry *old, /* Don't copy up directory trees */ err = -EXDEV; - if (is_dir && ovl_type_merge_or_lower(old)) + if (!ovl_can_move(old)) goto out; - if (!overwrite && new_is_dir && ovl_type_merge_or_lower(new)) + if (!overwrite && !ovl_can_move(new)) goto out; err = ovl_want_write(old); @@ -837,7 +871,6 @@ static int ovl_rename(struct inode *olddir, struct dentry *old, trap = lock_rename(new_upperdir, old_upperdir); - olddentry = lookup_one_len(old->d_name.name, old_upperdir, old->d_name.len); err = PTR_ERR(olddentry); @@ -880,31 +913,34 @@ static int ovl_rename(struct inode *olddir, struct dentry *old, if (WARN_ON(olddentry->d_inode == newdentry->d_inode)) goto out_dput; - if (is_dir && !old_opaque && ovl_lower_positive(new)) { - err = ovl_set_opaque(olddentry); - if (err) - goto out_dput; - ovl_dentry_set_opaque(old, true); + if (is_dir) { + if (ovl_type_merge_or_lower(old)) { + err = ovl_set_redirect(old); + if (err) + goto out_dput; + } else if (!old_opaque && ovl_lower_positive(new)) { + err = ovl_set_opaque(olddentry); + if (err) + goto out_dput; + ovl_dentry_set_opaque(old, true); + } } - if (!overwrite && - new_is_dir && !new_opaque && ovl_lower_positive(old)) { - err = ovl_set_opaque(newdentry); - if (err) - goto out_dput; - ovl_dentry_set_opaque(new, true); + if (!overwrite && new_is_dir) { + if (ovl_type_merge_or_lower(new)) { + err = ovl_set_redirect(new); + if (err) + goto out_dput; + } else if (!new_opaque && ovl_lower_positive(old)) { + err = ovl_set_opaque(newdentry); + if (err) + goto out_dput; + ovl_dentry_set_opaque(new, true); + } } - if (old_opaque || new_opaque) { - err = ovl_do_rename(old_upperdir->d_inode, olddentry, - new_upperdir->d_inode, newdentry, - flags); - } else { - /* No debug for the plain case */ - BUG_ON(flags & ~RENAME_EXCHANGE); - err = vfs_rename(old_upperdir->d_inode, olddentry, - new_upperdir->d_inode, newdentry, - NULL, flags); - } + err = ovl_do_rename(old_upperdir->d_inode, olddentry, + new_upperdir->d_inode, newdentry, + flags); if (err) goto out_dput; diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c index f4057fcb0246..c7cacbb8ce09 100644 --- a/fs/overlayfs/namei.c +++ b/fs/overlayfs/namei.c @@ -8,6 +8,7 @@ */ #include <linux/fs.h> +#include <linux/mount.h> #include <linux/namei.h> #include <linux/xattr.h> #include "overlayfs.h" @@ -48,6 +49,28 @@ static bool ovl_is_opaquedir(struct dentry *dentry) return false; } +static int ovl_check_redirect(struct dentry *dentry, char **bufp) +{ + int res; + + res = vfs_getxattr(dentry, OVL_XATTR_REDIRECT, NULL, 0); + if (res > 0) { + char *buf = kzalloc(res + 1, GFP_KERNEL); + + if (!buf) + return -ENOMEM; + + res = vfs_getxattr(dentry, OVL_XATTR_REDIRECT, buf, res); + if (res < 0) + return res; + + kfree(*bufp); + *bufp = buf; + } + + return 0; +} + /* * Returns next layer in stack starting from top. * Returns -1 if this is the last layer. @@ -80,11 +103,13 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, unsigned int ctr = 0; struct inode *inode = NULL; bool upperopaque = false; + bool upperredirect = false; bool stop = false; bool isdir = false; struct dentry *this; unsigned int i; int err; + char *redirect = NULL; old_cred = ovl_override_creds(dentry->d_sb); upperdir = ovl_upperdentry_dereference(poe); @@ -110,11 +135,23 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, isdir = true; if (poe->numlower && ovl_is_opaquedir(this)) stop = upperopaque = true; + else if (ovl_redirect_dir(dentry->d_sb)) { + err = ovl_check_redirect(this, + &redirect); + if (err) + goto out; + + if (redirect) + upperredirect = true; + } } } upperdentry = this; } + if (redirect) + poe = dentry->d_sb->s_root->d_fsdata; + if (!stop && poe->numlower) { err = -ENOMEM; stack = kcalloc(poe->numlower, sizeof(struct path), GFP_KERNEL); @@ -125,15 +162,39 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, for (i = 0; !stop && i < poe->numlower; i++) { struct path lowerpath = poe->lowerstack[i]; - this = ovl_lookup_real(lowerpath.dentry, &dentry->d_name); - err = PTR_ERR(this); - if (IS_ERR(this)) { - /* - * If it's positive, then treat ENAMETOOLONG as ENOENT. - */ - if (err == -ENAMETOOLONG && (upperdentry || ctr)) - continue; - goto out_put; + if (!redirect) { + this = ovl_lookup_real(lowerpath.dentry, &dentry->d_name); + err = PTR_ERR(this); + if (IS_ERR(this)) { + /* + * If it's positive, then treat ENAMETOOLONG as ENOENT. + */ + if (err == -ENAMETOOLONG && (upperdentry || ctr)) + continue; + goto out_put; + } + } else { + struct path thispath; + + err = vfs_path_lookup(lowerpath.dentry, lowerpath.mnt, + redirect, 0, &thispath); + + if (err) { + if (err == -ENOENT || err == -ENAMETOOLONG) + this = NULL; + } else { + this = thispath.dentry; + mntput(thispath.mnt); + if (!this->d_inode) { + dput(this); + this = NULL; + } else if (ovl_dentry_weird(this)) { + dput(this); + err = -EREMOTE; + } + } + if (err) + goto out_put; } if (!this) continue; @@ -162,6 +223,23 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, stack[ctr].dentry = this; stack[ctr].mnt = lowerpath.mnt; ctr++; + + if (!stop && i != poe->numlower - 1 && + d_is_dir(this) && ovl_redirect_dir(dentry->d_sb)) { + err = ovl_check_redirect(this, &redirect); + if (err) + goto out_put; + + if (redirect && poe != dentry->d_sb->s_root->d_fsdata) { + poe = dentry->d_sb->s_root->d_fsdata; + + for (i = 0; i < poe->numlower; i++) + if (poe->lowerstack[i].mnt == lowerpath.mnt) + break; + if (WARN_ON(i == poe->numlower)) + break; + } + } } oe = ovl_alloc_entry(ctr); @@ -192,9 +270,11 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, revert_creds(old_cred); oe->opaque = upperopaque; + oe->redirect = upperredirect; oe->__upperdentry = upperdentry; memcpy(oe->lowerstack, stack, sizeof(struct path) * ctr); kfree(stack); + kfree(redirect); dentry->d_fsdata = oe; d_add(dentry, inode); @@ -209,6 +289,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, out_put_upper: dput(upperdentry); out: + kfree(redirect); revert_creds(old_cred); return ERR_PTR(err); } diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h index d61d5b9d0d91..9d80ce367ad8 100644 --- a/fs/overlayfs/overlayfs.h +++ b/fs/overlayfs/overlayfs.h @@ -20,6 +20,7 @@ enum ovl_path_type { #define OVL_XATTR_PREFIX XATTR_TRUSTED_PREFIX "overlay." #define OVL_XATTR_OPAQUE OVL_XATTR_PREFIX "opaque" #define OVL_XATTR_FEATURES OVL_XATTR_PREFIX "features" +#define OVL_XATTR_REDIRECT OVL_XATTR_PREFIX "redirect" #define OVL_ISUPPER_MASK 1UL @@ -157,6 +158,9 @@ void ovl_set_dir_cache(struct dentry *dentry, struct ovl_dir_cache *cache); bool ovl_dentry_is_opaque(struct dentry *dentry); bool ovl_dentry_is_whiteout(struct dentry *dentry); void ovl_dentry_set_opaque(struct dentry *dentry, bool opaque); +bool ovl_redirect_dir(struct super_block *sb); +bool ovl_dentry_is_redirect(struct dentry *dentry); +void ovl_dentry_set_redirect(struct dentry *dentry); void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry); void ovl_inode_init(struct inode *inode, struct inode *realinode, bool is_upper); diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h index 3b7ba59ad27e..2b22645535ff 100644 --- a/fs/overlayfs/ovl_entry.h +++ b/fs/overlayfs/ovl_entry.h @@ -26,6 +26,9 @@ struct ovl_fs { struct ovl_config config; /* creds of process who forced instantiation of super block */ const struct cred *creator_cred; + + /* Check for "redirect" on directories */ + bool redirect_dir; }; /* private information held for every overlayfs dentry */ @@ -36,6 +39,7 @@ struct ovl_entry { struct { u64 version; bool opaque; + bool redirect; }; struct rcu_head rcu; }; diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c index d6dc8d905d00..fc22a8a2e0d0 100644 --- a/fs/overlayfs/super.c +++ b/fs/overlayfs/super.c @@ -397,7 +397,7 @@ static struct dentry *ovl_workdir_create(struct vfsmount *mnt, goto out_unlock; } -static int ovl_check_features(struct dentry *root) +static int ovl_check_features(struct ovl_fs *ufs, struct dentry *root) { int res; char *buf, *tmp, *p; @@ -421,8 +421,12 @@ static int ovl_check_features(struct dentry *root) res = 0; tmp = buf; while ((p = strsep(&tmp, ",")) != NULL) { - res = -EINVAL; - pr_err("overlayfs: feature '%s' not supported\n", p); + if (strcmp(p, "redirect_dir") == 0) { + ufs->redirect_dir = true; + } else { + res = -EINVAL; + pr_err("overlayfs: feature '%s' not supported\n", p); + } } out_free: kfree(buf); @@ -494,8 +498,8 @@ static int ovl_mount_dir(const char *name, struct path *path) return err; } -static int ovl_lower_dir(const char *name, struct path *path, long *namelen, - int *stack_depth, bool *remote) +static int ovl_lower_dir(const char *name, struct path *path, + struct ovl_fs *ufs, int *stack_depth, bool *remote) { int err; struct kstatfs statfs; @@ -504,7 +508,7 @@ static int ovl_lower_dir(const char *name, struct path *path, long *namelen, if (err) goto out; - err = ovl_check_features(path->dentry); + err = ovl_check_features(ufs, path->dentry); if (err) goto out_put; @@ -513,7 +517,7 @@ static int ovl_lower_dir(const char *name, struct path *path, long *namelen, pr_err("overlayfs: statfs failed on '%s'\n", name); goto out_put; } - *namelen = max(*namelen, statfs.f_namelen); + ufs->lower_namelen = max(ufs->lower_namelen, statfs.f_namelen); *stack_depth = max(*stack_depth, path->mnt->mnt_sb->s_stack_depth); if (ovl_dentry_remote(path->dentry)) @@ -730,7 +734,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent) goto out_put_upperpath; } - err = ovl_check_features(upperpath.dentry); + err = ovl_check_features(ufs, upperpath.dentry); if (err) goto out_put_upperpath; @@ -771,9 +775,8 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent) lower = lowertmp; for (numlower = 0; numlower < stacklen; numlower++) { - err = ovl_lower_dir(lower, &stack[numlower], - &ufs->lower_namelen, &sb->s_stack_depth, - &remote); + err = ovl_lower_dir(lower, &stack[numlower], ufs, + &sb->s_stack_depth, &remote); if (err) goto out_put_lowerpath; diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c index 0d45a84468d2..06dae4cabd53 100644 --- a/fs/overlayfs/util.c +++ b/fs/overlayfs/util.c @@ -176,6 +176,25 @@ void ovl_dentry_set_opaque(struct dentry *dentry, bool opaque) oe->opaque = opaque; } +bool ovl_redirect_dir(struct super_block *sb) +{ + struct ovl_fs *ofs = sb->s_fs_info; + + return ofs->redirect_dir; +} + +bool ovl_dentry_is_redirect(struct dentry *dentry) +{ + struct ovl_entry *oe = dentry->d_fsdata; + return oe->redirect; +} + +void ovl_dentry_set_redirect(struct dentry *dentry) +{ + struct ovl_entry *oe = dentry->d_fsdata; + oe->redirect = true; +} + void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry) { struct ovl_entry *oe = dentry->d_fsdata; -- 2.5.5 ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-10-25 7:34 ` [PATCH 3/3] ovl: redirect on rename-dir Miklos Szeredi @ 2016-10-25 11:57 ` Raphael Hertzog 2016-10-26 11:12 ` Miklos Szeredi 2016-10-25 12:49 ` Amir Goldstein 2016-10-28 16:15 ` Al Viro 2 siblings, 1 reply; 52+ messages in thread From: Raphael Hertzog @ 2016-10-25 11:57 UTC (permalink / raw) To: Miklos Szeredi; +Cc: linux-unionfs, Guillem Jover, linux-fsdevel, linux-kernel Hello Miklos, thanks for your work on this patch set! On Tue, 25 Oct 2016, Miklos Szeredi wrote: > +renaming directories > +-------------------- > + > +When renaming a directory that is on the lower layer or merged (i.e. the > +directory was not created on the upper layer to start with) overlayfs can > +handle it in two different ways: > + > +1) return EXDEV error: this error is returned by rename(2) when trying to > + move a file or directory across filesystem boundaries. Hence > + applications are usually prepared to hande this error (mv(1) for example > + recursively copies the directory tree). This is the default behavior. > + > +2) If the "redirect_dir" feature is enabled, then the directory will be > + copied up (but not the contents). Then the "trusted.overlay.redirect" > + extended attribute is set to the path of the original location from the > + root of the overlay. Finally the directory is moved to the new > + location. >From this, I understand that we will have to add "redirect_dir" to the mount options for this feature to work. Is there any reason why you put this as an opt-in feature? Do you plan to make it the default in the future when it has been available for a while? Barring any regression introduced by your patch, it seems that the feature is best available by default since it allows legitimate operations to succeed that are otherwise refused. I understand that it makes it impossible to mount the overlay filesystem with an older kernel but is that problem more widespread than the one we're fixing here? On my side, overlayfs is only used in scenarios where the kernel is always the same (or newer compared to what created the initial filesystem). Cheers, -- Raphaël Hertzog ◈ Debian Developer Support Debian LTS: http://www.freexian.com/services/debian-lts.html Learn to master Debian: http://debian-handbook.info/get/ ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-10-25 11:57 ` Raphael Hertzog @ 2016-10-26 11:12 ` Miklos Szeredi 2016-10-28 12:56 ` Raphael Hertzog 2016-11-06 19:14 ` Konstantin Khlebnikov 0 siblings, 2 replies; 52+ messages in thread From: Miklos Szeredi @ 2016-10-26 11:12 UTC (permalink / raw) To: Raphael Hertzog Cc: Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, linux-kernel On Tue, Oct 25, 2016 at 1:57 PM, Raphael Hertzog <hertzog@debian.org> wrote: > Do you plan to make it the default in the future when it has been > available for a while? > > Barring any regression introduced by your patch, it seems that the feature > is best available by default since it allows legitimate operations to > succeed that are otherwise refused. I understand that it makes it > impossible to mount the overlay filesystem with an older kernel but is > that problem more widespread than the one we're fixing here? On my side, > overlayfs is only used in scenarios where the kernel is always the same > (or newer compared to what created the initial filesystem). I think it would be safe to make it the default if upperdir is empty. Nonempty implies that it was created with old kernel (or it was crafted by hand). But there should be a way to explicitly turn it off; either because of the need for backward compatibility or because the old format is simply easier to work with for humans. How about: - If upper is nonempty, then leave redirect feature alone except when mount option "-oredirect=on" is used to force enabling it. - If upper is empty, then enable redirect feature except when mount option "-oredirect=off" is used to force disabling it. Thanks, Miklos ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-10-26 11:12 ` Miklos Szeredi @ 2016-10-28 12:56 ` Raphael Hertzog 2016-10-28 12:59 ` Miklos Szeredi 2016-11-06 19:14 ` Konstantin Khlebnikov 1 sibling, 1 reply; 52+ messages in thread From: Raphael Hertzog @ 2016-10-28 12:56 UTC (permalink / raw) To: Miklos Szeredi Cc: Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, linux-kernel Hi, On Wed, 26 Oct 2016, Miklos Szeredi wrote: > On Tue, Oct 25, 2016 at 1:57 PM, Raphael Hertzog <hertzog@debian.org> wrote: [ redirect feature enabled by default ] > I think it would be safe to make it the default if upperdir is empty. > Nonempty implies that it was created with old kernel (or it was > crafted by hand). But there should be a way to explicitly turn it > off; either because of the need for backward compatibility or because > the old format is simply easier to work with for humans. > > How about: > > - If upper is nonempty, then leave redirect feature alone except when > mount option "-oredirect=on" is used to force enabling it. > - If upper is empty, then enable redirect feature except when mount > option "-oredirect=off" is used to force disabling it. Looks good, but is there some stickyness of the setting? My use case is schroot and it always starts with an empty upperdir when I create a temporary/throw-away chroot but if I reboot the machine, then it's possible that schroot will remount the overlayfs with a non-empty upperdir (the temporary chroot is preserved until I explicitly end the chroot). Cheers, -- Raphaël Hertzog ◈ Debian Developer Support Debian LTS: http://www.freexian.com/services/debian-lts.html Learn to master Debian: http://debian-handbook.info/get/ ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-10-28 12:56 ` Raphael Hertzog @ 2016-10-28 12:59 ` Miklos Szeredi 0 siblings, 0 replies; 52+ messages in thread From: Miklos Szeredi @ 2016-10-28 12:59 UTC (permalink / raw) To: Raphael Hertzog Cc: Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, lkml On Fri, Oct 28, 2016 at 2:56 PM, Raphael Hertzog <hertzog@debian.org> wrote: > Hi, > > On Wed, 26 Oct 2016, Miklos Szeredi wrote: >> On Tue, Oct 25, 2016 at 1:57 PM, Raphael Hertzog <hertzog@debian.org> wrote: > > [ redirect feature enabled by default ] > >> I think it would be safe to make it the default if upperdir is empty. >> Nonempty implies that it was created with old kernel (or it was >> crafted by hand). But there should be a way to explicitly turn it >> off; either because of the need for backward compatibility or because >> the old format is simply easier to work with for humans. >> >> How about: >> >> - If upper is nonempty, then leave redirect feature alone except when >> mount option "-oredirect=on" is used to force enabling it. >> - If upper is empty, then enable redirect feature except when mount >> option "-oredirect=off" is used to force disabling it. > > Looks good, but is there some stickyness of the setting? My use case is > schroot and it always starts with an empty upperdir when I create a > temporary/throw-away chroot but if I reboot the machine, then it's possible > that schroot will remount the overlayfs with a non-empty upperdir > (the temporary chroot is preserved until I explicitly end the chroot). Yes, there is stickiness (it stores the "redirect_dir" feature in an extended attribute on upperdir). Thanks, Miklos ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-10-26 11:12 ` Miklos Szeredi 2016-10-28 12:56 ` Raphael Hertzog @ 2016-11-06 19:14 ` Konstantin Khlebnikov 2016-11-07 8:07 ` Miklos Szeredi 2016-11-07 11:03 ` Raphael Hertzog 1 sibling, 2 replies; 52+ messages in thread From: Konstantin Khlebnikov @ 2016-11-06 19:14 UTC (permalink / raw) To: Miklos Szeredi Cc: Raphael Hertzog, Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List On Wed, Oct 26, 2016 at 2:12 PM, Miklos Szeredi <miklos@szeredi.hu> wrote: > On Tue, Oct 25, 2016 at 1:57 PM, Raphael Hertzog <hertzog@debian.org> wrote: > >> Do you plan to make it the default in the future when it has been >> available for a while? >> >> Barring any regression introduced by your patch, it seems that the feature >> is best available by default since it allows legitimate operations to >> succeed that are otherwise refused. I understand that it makes it >> impossible to mount the overlay filesystem with an older kernel but is >> that problem more widespread than the one we're fixing here? On my side, >> overlayfs is only used in scenarios where the kernel is always the same >> (or newer compared to what created the initial filesystem). > > I think it would be safe to make it the default if upperdir is empty. > Nonempty implies that it was created with old kernel (or it was > crafted by hand). But there should be a way to explicitly turn it > off; either because of the need for backward compatibility or because > the old format is simply easier to work with for humans. > > How about: > > - If upper is nonempty, then leave redirect feature alone except when > mount option "-oredirect=on" is used to force enabling it. > - If upper is empty, then enable redirect feature except when mount > option "-oredirect=off" is used to force disabling it. I don't like this empty-nonempty upper logic. I think this feature should be off by default and be enabled explicitly in mount option. Available features could be listed in sysfs /sys/fs/overlay/..., like ext4 does. Overlayfs mounting anyway is complicated operation. User must know a lot about it and provide persistent state for each mount: list layers in correct order, work and uppder directory on the same disk, etc. Enabled features is a part of this state. Probably this could be solved in userspace tool "mount.overlay" - it could load features and layers from config file or xattr and set required mount options automatically. -- Konstantin ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-06 19:14 ` Konstantin Khlebnikov @ 2016-11-07 8:07 ` Miklos Szeredi 2016-11-07 9:58 ` Konstantin Khlebnikov 2016-11-07 11:03 ` Raphael Hertzog 1 sibling, 1 reply; 52+ messages in thread From: Miklos Szeredi @ 2016-11-07 8:07 UTC (permalink / raw) To: Konstantin Khlebnikov Cc: Raphael Hertzog, Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List On Sun, Nov 6, 2016 at 8:14 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote: > On Wed, Oct 26, 2016 at 2:12 PM, Miklos Szeredi <miklos@szeredi.hu> wrote: >> On Tue, Oct 25, 2016 at 1:57 PM, Raphael Hertzog <hertzog@debian.org> wrote: >> >>> Do you plan to make it the default in the future when it has been >>> available for a while? >>> >>> Barring any regression introduced by your patch, it seems that the feature >>> is best available by default since it allows legitimate operations to >>> succeed that are otherwise refused. I understand that it makes it >>> impossible to mount the overlay filesystem with an older kernel but is >>> that problem more widespread than the one we're fixing here? On my side, >>> overlayfs is only used in scenarios where the kernel is always the same >>> (or newer compared to what created the initial filesystem). >> >> I think it would be safe to make it the default if upperdir is empty. >> Nonempty implies that it was created with old kernel (or it was >> crafted by hand). But there should be a way to explicitly turn it >> off; either because of the need for backward compatibility or because >> the old format is simply easier to work with for humans. >> >> How about: >> >> - If upper is nonempty, then leave redirect feature alone except when >> mount option "-oredirect=on" is used to force enabling it. >> - If upper is empty, then enable redirect feature except when mount >> option "-oredirect=off" is used to force disabling it. > > I don't like this empty-nonempty upper logic. > > I think this feature should be off by default and be enabled > explicitly in mount option. > Available features could be listed in sysfs /sys/fs/overlay/..., like ext4 does. > > Overlayfs mounting anyway is complicated operation. > User must know a lot about it and provide persistent state for each mount: > list layers in correct order, work and uppder directory on the same disk, etc. > Enabled features is a part of this state. > > Probably this could be solved in userspace tool "mount.overlay" - it could load > features and layers from config file or xattr and set required mount > options automatically. It seems there are some conflicting opinions here. I have mine too, but I'll let this simmer and concentrate on actual features for now. Thanks, Miklos ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-07 8:07 ` Miklos Szeredi @ 2016-11-07 9:58 ` Konstantin Khlebnikov 2016-11-07 10:04 ` Miklos Szeredi 0 siblings, 1 reply; 52+ messages in thread From: Konstantin Khlebnikov @ 2016-11-07 9:58 UTC (permalink / raw) To: Miklos Szeredi Cc: Raphael Hertzog, Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List On Mon, Nov 7, 2016 at 11:07 AM, Miklos Szeredi <miklos@szeredi.hu> wrote: > On Sun, Nov 6, 2016 at 8:14 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote: >> On Wed, Oct 26, 2016 at 2:12 PM, Miklos Szeredi <miklos@szeredi.hu> wrote: >>> On Tue, Oct 25, 2016 at 1:57 PM, Raphael Hertzog <hertzog@debian.org> wrote: >>> >>>> Do you plan to make it the default in the future when it has been >>>> available for a while? >>>> >>>> Barring any regression introduced by your patch, it seems that the feature >>>> is best available by default since it allows legitimate operations to >>>> succeed that are otherwise refused. I understand that it makes it >>>> impossible to mount the overlay filesystem with an older kernel but is >>>> that problem more widespread than the one we're fixing here? On my side, >>>> overlayfs is only used in scenarios where the kernel is always the same >>>> (or newer compared to what created the initial filesystem). >>> >>> I think it would be safe to make it the default if upperdir is empty. >>> Nonempty implies that it was created with old kernel (or it was >>> crafted by hand). But there should be a way to explicitly turn it >>> off; either because of the need for backward compatibility or because >>> the old format is simply easier to work with for humans. >>> >>> How about: >>> >>> - If upper is nonempty, then leave redirect feature alone except when >>> mount option "-oredirect=on" is used to force enabling it. >>> - If upper is empty, then enable redirect feature except when mount >>> option "-oredirect=off" is used to force disabling it. >> >> I don't like this empty-nonempty upper logic. >> >> I think this feature should be off by default and be enabled >> explicitly in mount option. >> Available features could be listed in sysfs /sys/fs/overlay/..., like ext4 does. >> >> Overlayfs mounting anyway is complicated operation. >> User must know a lot about it and provide persistent state for each mount: >> list layers in correct order, work and uppder directory on the same disk, etc. >> Enabled features is a part of this state. >> >> Probably this could be solved in userspace tool "mount.overlay" - it could load >> features and layers from config file or xattr and set required mount >> options automatically. > > It seems there are some conflicting opinions here. I have mine too, > but I'll let this simmer and concentrate on actual features for now. Ok =) let's try keep as simple as possible. I've stumbled on somehow related problem - concurrent copy-ups are strictly serialized by rename locks. Obviously, file copying could be done in parallel: locks are required only for final rename. Because of that overlay slower that aufs for some workloads. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-07 9:58 ` Konstantin Khlebnikov @ 2016-11-07 10:04 ` Miklos Szeredi 2016-11-07 10:08 ` Konstantin Khlebnikov 0 siblings, 1 reply; 52+ messages in thread From: Miklos Szeredi @ 2016-11-07 10:04 UTC (permalink / raw) To: Konstantin Khlebnikov Cc: Raphael Hertzog, Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List On Mon, Nov 7, 2016 at 10:58 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote: > I've stumbled on somehow related problem - concurrent copy-ups are > strictly serialized by rename locks. > Obviously, file copying could be done in parallel: locks are required > only for final rename. > Because of that overlay slower that aufs for some workloads. Easy to fix: for each copy up create a separate subdir of "work". Then the contention is only for the time of creating the subdir, which is very short. Thanks, Miklos ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-07 10:04 ` Miklos Szeredi @ 2016-11-07 10:08 ` Konstantin Khlebnikov 2016-11-07 13:38 ` Amir Goldstein 0 siblings, 1 reply; 52+ messages in thread From: Konstantin Khlebnikov @ 2016-11-07 10:08 UTC (permalink / raw) To: Miklos Szeredi Cc: Raphael Hertzog, Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List On Mon, Nov 7, 2016 at 1:04 PM, Miklos Szeredi <miklos@szeredi.hu> wrote: > On Mon, Nov 7, 2016 at 10:58 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote: > >> I've stumbled on somehow related problem - concurrent copy-ups are >> strictly serialized by rename locks. >> Obviously, file copying could be done in parallel: locks are required >> only for final rename. >> Because of that overlay slower that aufs for some workloads. > > Easy to fix: for each copy up create a separate subdir of "work". > Then the contention is only for the time of creating the subdir, which > is very short. Yeah, but lock_rename() also takes per-sb s_vfs_rename_mutex (kludge by Al Viro) I think proper synchronization for concurrent copy-up (for example round flag on ovl_entry) and locking rename only for rename could be better. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-07 10:08 ` Konstantin Khlebnikov @ 2016-11-07 13:38 ` Amir Goldstein 2016-11-10 22:56 ` Amir Goldstein 0 siblings, 1 reply; 52+ messages in thread From: Amir Goldstein @ 2016-11-07 13:38 UTC (permalink / raw) To: Konstantin Khlebnikov Cc: Miklos Szeredi, Raphael Hertzog, Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List On Mon, Nov 7, 2016 at 12:08 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote: > On Mon, Nov 7, 2016 at 1:04 PM, Miklos Szeredi <miklos@szeredi.hu> wrote: >> On Mon, Nov 7, 2016 at 10:58 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote: >> >>> I've stumbled on somehow related problem - concurrent copy-ups are >>> strictly serialized by rename locks. >>> Obviously, file copying could be done in parallel: locks are required >>> only for final rename. >>> Because of that overlay slower that aufs for some workloads. >> >> Easy to fix: for each copy up create a separate subdir of "work". >> Then the contention is only for the time of creating the subdir, which >> is very short. > > Yeah, but lock_rename() also takes per-sb s_vfs_rename_mutex (kludge by Al Viro) > I think proper synchronization for concurrent copy-up (for example > round flag on ovl_entry) and locking rename only for rename could be > better. Removing s_vfs_rename_mutex from copy-up path is something I have been pondering about. Assuming that I understand Al's comment above vfs_rename() correctly, the sole purpose of per-sb serialization is to prevent loop creations. However, how can one create a loop by moving a non-directory? So it looks like at least for the non-dir copy up case, a much finer grained lock is in order. Anyway, it's on my todo list, as concurrent operation performance on overlayfs is important to out use case. Amir. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-07 13:38 ` Amir Goldstein @ 2016-11-10 22:56 ` Amir Goldstein 2016-11-11 9:46 ` Konstantin Khlebnikov 0 siblings, 1 reply; 52+ messages in thread From: Amir Goldstein @ 2016-11-10 22:56 UTC (permalink / raw) To: Konstantin Khlebnikov Cc: Miklos Szeredi, Raphael Hertzog, Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List On Mon, Nov 7, 2016 at 3:38 PM, Amir Goldstein <amir73il@gmail.com> wrote: > On Mon, Nov 7, 2016 at 12:08 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote: >> On Mon, Nov 7, 2016 at 1:04 PM, Miklos Szeredi <miklos@szeredi.hu> wrote: >>> On Mon, Nov 7, 2016 at 10:58 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote: >>> >>>> I've stumbled on somehow related problem - concurrent copy-ups are >>>> strictly serialized by rename locks. >>>> Obviously, file copying could be done in parallel: locks are required >>>> only for final rename. >>>> Because of that overlay slower that aufs for some workloads. >>> >>> Easy to fix: for each copy up create a separate subdir of "work". >>> Then the contention is only for the time of creating the subdir, which >>> is very short. >> >> Yeah, but lock_rename() also takes per-sb s_vfs_rename_mutex (kludge by Al Viro) >> I think proper synchronization for concurrent copy-up (for example >> round flag on ovl_entry) and locking rename only for rename could be >> better. > > Removing s_vfs_rename_mutex from copy-up path is something I have been > pondering about. > Assuming that I understand Al's comment above vfs_rename() correctly, > the sole purpose of per-sb serialization is to prevent loop creations. > However, how can one create a loop by moving a non-directory? > So it looks like at least for the non-dir copy up case, a much finer grained > lock is in order. > I posted patches to relax the s_vfs_rename_mutex for copy-up and whiteout in some use cases. Konstantin, It would be useful to know if those patches help with your use case. Thanks, Amir. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-10 22:56 ` Amir Goldstein @ 2016-11-11 9:46 ` Konstantin Khlebnikov 2016-11-11 10:06 ` Miklos Szeredi 0 siblings, 1 reply; 52+ messages in thread From: Konstantin Khlebnikov @ 2016-11-11 9:46 UTC (permalink / raw) To: Amir Goldstein Cc: Miklos Szeredi, Raphael Hertzog, Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List On Fri, Nov 11, 2016 at 1:56 AM, Amir Goldstein <amir73il@gmail.com> wrote: > On Mon, Nov 7, 2016 at 3:38 PM, Amir Goldstein <amir73il@gmail.com> wrote: >> On Mon, Nov 7, 2016 at 12:08 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote: >>> On Mon, Nov 7, 2016 at 1:04 PM, Miklos Szeredi <miklos@szeredi.hu> wrote: >>>> On Mon, Nov 7, 2016 at 10:58 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote: >>>> >>>>> I've stumbled on somehow related problem - concurrent copy-ups are >>>>> strictly serialized by rename locks. >>>>> Obviously, file copying could be done in parallel: locks are required >>>>> only for final rename. >>>>> Because of that overlay slower that aufs for some workloads. >>>> >>>> Easy to fix: for each copy up create a separate subdir of "work". >>>> Then the contention is only for the time of creating the subdir, which >>>> is very short. >>> >>> Yeah, but lock_rename() also takes per-sb s_vfs_rename_mutex (kludge by Al Viro) >>> I think proper synchronization for concurrent copy-up (for example >>> round flag on ovl_entry) and locking rename only for rename could be >>> better. >> >> Removing s_vfs_rename_mutex from copy-up path is something I have been >> pondering about. >> Assuming that I understand Al's comment above vfs_rename() correctly, >> the sole purpose of per-sb serialization is to prevent loop creations. >> However, how can one create a loop by moving a non-directory? >> So it looks like at least for the non-dir copy up case, a much finer grained >> lock is in order. >> > > > I posted patches to relax the s_vfs_rename_mutex for copy-up and > whiteout in some use cases. > > Konstantin, > > It would be useful to know if those patches help with your use case. > Well.. I think relaxing only s_vfs_rename_mutex wouldn't help much here. Copying is still serialized by i_mutex on workdir? Data copying should be done without rename locks at all. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-11 9:46 ` Konstantin Khlebnikov @ 2016-11-11 10:06 ` Miklos Szeredi 2016-11-11 12:42 ` Amir Goldstein 0 siblings, 1 reply; 52+ messages in thread From: Miklos Szeredi @ 2016-11-11 10:06 UTC (permalink / raw) To: Konstantin Khlebnikov Cc: Amir Goldstein, Raphael Hertzog, Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List On Fri, Nov 11, 2016 at 10:46 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote: > On Fri, Nov 11, 2016 at 1:56 AM, Amir Goldstein <amir73il@gmail.com> wrote: >> On Mon, Nov 7, 2016 at 3:38 PM, Amir Goldstein <amir73il@gmail.com> wrote: >>> On Mon, Nov 7, 2016 at 12:08 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote: >>>> On Mon, Nov 7, 2016 at 1:04 PM, Miklos Szeredi <miklos@szeredi.hu> wrote: >>>>> On Mon, Nov 7, 2016 at 10:58 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote: >>>>> >>>>>> I've stumbled on somehow related problem - concurrent copy-ups are >>>>>> strictly serialized by rename locks. >>>>>> Obviously, file copying could be done in parallel: locks are required >>>>>> only for final rename. >>>>>> Because of that overlay slower that aufs for some workloads. >>>>> >>>>> Easy to fix: for each copy up create a separate subdir of "work". >>>>> Then the contention is only for the time of creating the subdir, which >>>>> is very short. >>>> >>>> Yeah, but lock_rename() also takes per-sb s_vfs_rename_mutex (kludge by Al Viro) >>>> I think proper synchronization for concurrent copy-up (for example >>>> round flag on ovl_entry) and locking rename only for rename could be >>>> better. >>> >>> Removing s_vfs_rename_mutex from copy-up path is something I have been >>> pondering about. >>> Assuming that I understand Al's comment above vfs_rename() correctly, >>> the sole purpose of per-sb serialization is to prevent loop creations. >>> However, how can one create a loop by moving a non-directory? >>> So it looks like at least for the non-dir copy up case, a much finer grained >>> lock is in order. >>> >> >> >> I posted patches to relax the s_vfs_rename_mutex for copy-up and >> whiteout in some use cases. >> >> Konstantin, >> >> It would be useful to know if those patches help with your use case. >> > > Well.. I think relaxing only s_vfs_rename_mutex wouldn't help much here. > Copying is still serialized by i_mutex on workdir? > Data copying should be done without rename locks at all. We do need something to prevent multiple copy-ups starting up in parallel on the same file, though. Thanks, Miklos ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-11 10:06 ` Miklos Szeredi @ 2016-11-11 12:42 ` Amir Goldstein 2016-11-13 9:11 ` Amir Goldstein 0 siblings, 1 reply; 52+ messages in thread From: Amir Goldstein @ 2016-11-11 12:42 UTC (permalink / raw) To: Miklos Szeredi Cc: Konstantin Khlebnikov, Raphael Hertzog, Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List On Fri, Nov 11, 2016 at 12:06 PM, Miklos Szeredi <miklos@szeredi.hu> wrote: > On Fri, Nov 11, 2016 at 10:46 AM, Konstantin Khlebnikov > <koct9i@gmail.com> wrote: >> On Fri, Nov 11, 2016 at 1:56 AM, Amir Goldstein <amir73il@gmail.com> wrote: >>> On Mon, Nov 7, 2016 at 3:38 PM, Amir Goldstein <amir73il@gmail.com> wrote: >>>> On Mon, Nov 7, 2016 at 12:08 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote: >>>>> On Mon, Nov 7, 2016 at 1:04 PM, Miklos Szeredi <miklos@szeredi.hu> wrote: >>>>>> On Mon, Nov 7, 2016 at 10:58 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote: >>>>>> >>>>>>> I've stumbled on somehow related problem - concurrent copy-ups are >>>>>>> strictly serialized by rename locks. >>>>>>> Obviously, file copying could be done in parallel: locks are required >>>>>>> only for final rename. >>>>>>> Because of that overlay slower that aufs for some workloads. >>>>>> >>>>>> Easy to fix: for each copy up create a separate subdir of "work". >>>>>> Then the contention is only for the time of creating the subdir, which >>>>>> is very short. >>>>> >>>>> Yeah, but lock_rename() also takes per-sb s_vfs_rename_mutex (kludge by Al Viro) >>>>> I think proper synchronization for concurrent copy-up (for example >>>>> round flag on ovl_entry) and locking rename only for rename could be >>>>> better. >>>> >>>> Removing s_vfs_rename_mutex from copy-up path is something I have been >>>> pondering about. >>>> Assuming that I understand Al's comment above vfs_rename() correctly, >>>> the sole purpose of per-sb serialization is to prevent loop creations. >>>> However, how can one create a loop by moving a non-directory? >>>> So it looks like at least for the non-dir copy up case, a much finer grained >>>> lock is in order. >>>> >>> >>> >>> I posted patches to relax the s_vfs_rename_mutex for copy-up and >>> whiteout in some use cases. >>> >>> Konstantin, >>> >>> It would be useful to know if those patches help with your use case. >>> >> >> Well.. I think relaxing only s_vfs_rename_mutex wouldn't help much here. >> Copying is still serialized by i_mutex on workdir? >> Data copying should be done without rename locks at all. > > We do need something to prevent multiple copy-ups starting up in > parallel on the same file, though. > I guess an inode_lock on the copy-up victim should suffice? I will look into it as soon as I am done with profiling. So far I ran only 2 rm -rf threads on 2 different overlay mounts on the same underlying fs and s_vfs_rename_mutex was contended about ~4% of the time. In this test, copy-up is not dominant - only ~2% for the directory copy-ups, but vfs_whiteouts take 20% and the vfs_rename itself 10%, both with s_vfs_rename_mutex held. Amir. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-11 12:42 ` Amir Goldstein @ 2016-11-13 9:11 ` Amir Goldstein 0 siblings, 0 replies; 52+ messages in thread From: Amir Goldstein @ 2016-11-13 9:11 UTC (permalink / raw) To: Miklos Szeredi Cc: Konstantin Khlebnikov, Raphael Hertzog, Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List On Fri, Nov 11, 2016 at 2:42 PM, Amir Goldstein <amir73il@gmail.com> wrote: > On Fri, Nov 11, 2016 at 12:06 PM, Miklos Szeredi <miklos@szeredi.hu> wrote: >> On Fri, Nov 11, 2016 at 10:46 AM, Konstantin Khlebnikov >> <koct9i@gmail.com> wrote: >>> On Fri, Nov 11, 2016 at 1:56 AM, Amir Goldstein <amir73il@gmail.com> wrote: >>>> On Mon, Nov 7, 2016 at 3:38 PM, Amir Goldstein <amir73il@gmail.com> wrote: >>>>> On Mon, Nov 7, 2016 at 12:08 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote: >>>>>> On Mon, Nov 7, 2016 at 1:04 PM, Miklos Szeredi <miklos@szeredi.hu> wrote: >>>>>>> On Mon, Nov 7, 2016 at 10:58 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote: >>>>>>> >>>>>>>> I've stumbled on somehow related problem - concurrent copy-ups are >>>>>>>> strictly serialized by rename locks. >>>>>>>> Obviously, file copying could be done in parallel: locks are required >>>>>>>> only for final rename. >>>>>>>> Because of that overlay slower that aufs for some workloads. >>>>>>> >>>>>>> Easy to fix: for each copy up create a separate subdir of "work". >>>>>>> Then the contention is only for the time of creating the subdir, which >>>>>>> is very short. >>>>>> >>>>>> Yeah, but lock_rename() also takes per-sb s_vfs_rename_mutex (kludge by Al Viro) >>>>>> I think proper synchronization for concurrent copy-up (for example >>>>>> round flag on ovl_entry) and locking rename only for rename could be >>>>>> better. >>>>> >>>>> Removing s_vfs_rename_mutex from copy-up path is something I have been >>>>> pondering about. >>>>> Assuming that I understand Al's comment above vfs_rename() correctly, >>>>> the sole purpose of per-sb serialization is to prevent loop creations. >>>>> However, how can one create a loop by moving a non-directory? >>>>> So it looks like at least for the non-dir copy up case, a much finer grained >>>>> lock is in order. >>>>> >>>> >>>> >>>> I posted patches to relax the s_vfs_rename_mutex for copy-up and >>>> whiteout in some use cases. >>>> >>>> Konstantin, >>>> >>>> It would be useful to know if those patches help with your use case. >>>> >>> >>> Well.. I think relaxing only s_vfs_rename_mutex wouldn't help much here. >>> Copying is still serialized by i_mutex on workdir? >>> Data copying should be done without rename locks at all. >> >> We do need something to prevent multiple copy-ups starting up in >> parallel on the same file, though. >> > > I guess an inode_lock on the copy-up victim should suffice? Just to follow up on this hijacked thread. I posted patches to lock the overlay inode of copied up file and relaxed the lock_rename during data copy up. Amir. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-06 19:14 ` Konstantin Khlebnikov 2016-11-07 8:07 ` Miklos Szeredi @ 2016-11-07 11:03 ` Raphael Hertzog 2016-11-07 11:31 ` Konstantin Khlebnikov 1 sibling, 1 reply; 52+ messages in thread From: Raphael Hertzog @ 2016-11-07 11:03 UTC (permalink / raw) To: Konstantin Khlebnikov Cc: Miklos Szeredi, Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List Hello, On Sun, 06 Nov 2016, Konstantin Khlebnikov wrote: > > - If upper is nonempty, then leave redirect feature alone except when > > mount option "-oredirect=on" is used to force enabling it. > > - If upper is empty, then enable redirect feature except when mount > > option "-oredirect=off" is used to force disabling it. > > I don't like this empty-nonempty upper logic. Why? (I don't have the feeling that your subsequent paragraphs answer this question... unless "overlayfs mounting is hard, let's complicate it even more" is your answer) > I think this feature should be off by default and be enabled > explicitly in mount option. > Available features could be listed in sysfs /sys/fs/overlay/..., like ext4 does. TTBOMK in ext4, they are set at mkfs time and the default feature set comes from /etc/mke2fs.conf. There's nothing like that for overlayfs. > Overlayfs mounting anyway is complicated operation. > User must know a lot about it and provide persistent state for each mount: > list layers in correct order, work and uppder directory on the same disk, etc. > Enabled features is a part of this state. A large part of the users are not direct overlayfs users, they use application (like schroot and live-build in my case) that rely on overlayfs to offer some user-modifiable throw-away chroots or some persistency on top of a read-only image. In both cases, the upper directory start empty. I find it highly disturbing to have to modify all those applications just to get the correct semantics to rename a directory. > Probably this could be solved in userspace tool "mount.overlay" - it could load > features and layers from config file or xattr and set required mount > options automatically. I'm all for having a better API to mount overlayfs, but I don't think blocking on the "redirect=on" by default is a good way to get this. Cheers, -- Raphaël Hertzog ◈ Debian Developer Support Debian LTS: http://www.freexian.com/services/debian-lts.html Learn to master Debian: http://debian-handbook.info/get/ ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-07 11:03 ` Raphael Hertzog @ 2016-11-07 11:31 ` Konstantin Khlebnikov 2016-11-07 13:42 ` Raphael Hertzog 0 siblings, 1 reply; 52+ messages in thread From: Konstantin Khlebnikov @ 2016-11-07 11:31 UTC (permalink / raw) To: Raphael Hertzog Cc: Miklos Szeredi, Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List On Mon, Nov 7, 2016 at 2:03 PM, Raphael Hertzog <hertzog@debian.org> wrote: > Hello, > > On Sun, 06 Nov 2016, Konstantin Khlebnikov wrote: >> > - If upper is nonempty, then leave redirect feature alone except when >> > mount option "-oredirect=on" is used to force enabling it. >> > - If upper is empty, then enable redirect feature except when mount >> > option "-oredirect=off" is used to force disabling it. >> >> I don't like this empty-nonempty upper logic. > > Why? (I don't have the feeling that your subsequent paragraphs answer this > question... unless "overlayfs mounting is hard, let's complicate it even > more" is your answer) Mixing flags from mount options, xattrs and emptiness of upper layer doesn't make it simpler. We have clear statement that options in /proc/mounts describes overlay instance, let's keep feeding state in this way. > >> I think this feature should be off by default and be enabled >> explicitly in mount option. >> Available features could be listed in sysfs /sys/fs/overlay/..., like ext4 does. > > TTBOMK in ext4, they are set at mkfs time and the default feature set > comes from /etc/mke2fs.conf. There's nothing like that for overlayfs. /etc/mke2fs.conf is used by mkfs.ext4 - state is saved in super-block inside filesystem. overlayfs have no persistent super-block. > >> Overlayfs mounting anyway is complicated operation. >> User must know a lot about it and provide persistent state for each mount: >> list layers in correct order, work and uppder directory on the same disk, etc. >> Enabled features is a part of this state. > > A large part of the users are not direct overlayfs users, they use > application (like schroot and live-build in my case) that rely on > overlayfs to offer some user-modifiable throw-away chroots or some > persistency on top of a read-only image. In both cases, the upper > directory start empty. > > I find it highly disturbing to have to modify all those applications just > to get the correct semantics to rename a directory. Other application are already aware about overlay layout and use it. We cannot enable by default new backward incompatible features. Returning -EXDEV is a completely correct semantics for rename, most applications employ broken assumptions about this syscall =) > >> Probably this could be solved in userspace tool "mount.overlay" - it could load >> features and layers from config file or xattr and set required mount >> options automatically. > > I'm all for having a better API to mount overlayfs, but I don't think > blocking on the "redirect=on" by default is a good way to get this. > > Cheers, > -- > Raphaël Hertzog ◈ Debian Developer > > Support Debian LTS: http://www.freexian.com/services/debian-lts.html > Learn to master Debian: http://debian-handbook.info/get/ ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-07 11:31 ` Konstantin Khlebnikov @ 2016-11-07 13:42 ` Raphael Hertzog 2016-11-10 22:39 ` Miklos Szeredi 0 siblings, 1 reply; 52+ messages in thread From: Raphael Hertzog @ 2016-11-07 13:42 UTC (permalink / raw) To: Konstantin Khlebnikov Cc: Miklos Szeredi, Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List On Mon, 07 Nov 2016, Konstantin Khlebnikov wrote: > > Why? (I don't have the feeling that your subsequent paragraphs answer this > > question... unless "overlayfs mounting is hard, let's complicate it even > > more" is your answer) > > Mixing flags from mount options, xattrs and emptiness of upper layer > doesn't make it simpler. It depends for whom. It does make it simpler for applications that just want overlayfs to work like other normal filesystems. > We have clear statement that options in /proc/mounts describes overlay > instance, let's keep feeding state in this way. Didn't you say above that xattrs provide flags too? > > I find it highly disturbing to have to modify all those applications just > > to get the correct semantics to rename a directory. > > Other application are already aware about overlay layout and use it. > We cannot enable by default new backward incompatible features. On the opposite, if we have to modify those applications to add a new mount option, then they will no longer work with older versions of overlayfs... so you move the complexity down to applications if they want to work with multiple kernel versions. There's no technical problem to enable a new backward incompatible feature. It's just that you don't want to do it in case the user wants to mount it again with an older kernel. So this is just policy. So what about a new mount option that defines a compatibility level? 0: initial feature set 1: with renamedir flag It would default to 1 but the user can set it to "0" to keep compatibility with older versions of overlayfs. In the future, as more backward incompatible changes are added, you add new levels and define the values of the various flags based on this setting. > Returning -EXDEV is a completely correct semantics for rename, > most applications employ broken assumptions about this syscall =) Maybe (I don't know what standards say), but then what matters is real-life. And in real-life, it's somewhat unexpected to get back -EXDEV when the rename() happens in the same directory (and has therefore no chance to cross any mount boundary). Cheers, -- Raphaël Hertzog ◈ Debian Developer Support Debian LTS: http://www.freexian.com/services/debian-lts.html Learn to master Debian: http://debian-handbook.info/get/ ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-07 13:42 ` Raphael Hertzog @ 2016-11-10 22:39 ` Miklos Szeredi 2016-11-11 9:41 ` Konstantin Khlebnikov 2016-11-13 10:00 ` Amir Goldstein 0 siblings, 2 replies; 52+ messages in thread From: Miklos Szeredi @ 2016-11-10 22:39 UTC (permalink / raw) To: Raphael Hertzog Cc: Konstantin Khlebnikov, Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List New version is at: git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git #redirect News: - it actually should work in all cases - when rename is not cross directory, just store the new name instead of a full path, as suggested by Amir - when redirect path is too long fall back to EXDEV (the max length should probably be a module param) About turning the feature on or off. Yes, maybe the empty checking is too complicated. Going one simpler: - default to old behavior, turn on with mount option - add module option and kernel compile option to turn on the feature by default I guess distros wil simply enable this by default, since back compatibility is basically a non-issue. Thanks, Miklos ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-10 22:39 ` Miklos Szeredi @ 2016-11-11 9:41 ` Konstantin Khlebnikov 2016-11-13 10:00 ` Amir Goldstein 1 sibling, 0 replies; 52+ messages in thread From: Konstantin Khlebnikov @ 2016-11-11 9:41 UTC (permalink / raw) To: Miklos Szeredi Cc: Raphael Hertzog, Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List On Fri, Nov 11, 2016 at 1:39 AM, Miklos Szeredi <miklos@szeredi.hu> wrote: > New version is at: > > git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git #redirect > > News: > - it actually should work in all cases > - when rename is not cross directory, just store the new name instead > of a full path, as suggested by Amir > - when redirect path is too long fall back to EXDEV (the max length > should probably be a module param) > > About turning the feature on or off. Yes, maybe the empty checking is > too complicated. Going one simpler: > > - default to old behavior, turn on with mount option > - add module option and kernel compile option to turn on the feature by default > > I guess distros wil simply enable this by default, since back > compatibility is basically a non-issue. Looks good. I suppose module parameter exposed in /sys/module/overlay/parameters/ could be documented as recommended way for detecting presence of that feature. -- Konstantin ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-10 22:39 ` Miklos Szeredi 2016-11-11 9:41 ` Konstantin Khlebnikov @ 2016-11-13 10:00 ` Amir Goldstein 2016-11-14 16:25 ` Amir Goldstein 1 sibling, 1 reply; 52+ messages in thread From: Amir Goldstein @ 2016-11-13 10:00 UTC (permalink / raw) To: Miklos Szeredi Cc: Raphael Hertzog, Konstantin Khlebnikov, Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List On Fri, Nov 11, 2016 at 12:39 AM, Miklos Szeredi <miklos@szeredi.hu> wrote: > New version is at: > > git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git #redirect > > News: > - it actually should work in all cases > - when rename is not cross directory, just store the new name instead > of a full path, as suggested by Amir > - when redirect path is too long fall back to EXDEV (the max length > should probably be a module param) > Looks goods, except for the case of change from relative to absolute redirect of the victim dentry. IIUC, ovl_set_redirect() will return immediately because ovl_dentry_is_redirect() and will not get to setting the absolute redirect. It passed my sanity tests (including recycle test) and on top of my copy up lock changes. You can add Reviewed-by/Tested-by me to 1380846 ovl: redirect on rename-dir > About turning the feature on or off. Yes, maybe the empty checking is > too complicated. Going one simpler: > > - default to old behavior, turn on with mount option > - add module option and kernel compile option to turn on the feature by default > I don't see these changes on your branch. Are these your plans for a future version? Thanks, Amir. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-13 10:00 ` Amir Goldstein @ 2016-11-14 16:25 ` Amir Goldstein 2016-11-16 22:00 ` Miklos Szeredi 0 siblings, 1 reply; 52+ messages in thread From: Amir Goldstein @ 2016-11-14 16:25 UTC (permalink / raw) To: Miklos Szeredi Cc: Raphael Hertzog, Konstantin Khlebnikov, Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List On Sun, Nov 13, 2016 at 12:00 PM, Amir Goldstein <amir73il@gmail.com> wrote: > On Fri, Nov 11, 2016 at 12:39 AM, Miklos Szeredi <miklos@szeredi.hu> wrote: >> New version is at: >> >> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git #redirect >> >> News: >> - it actually should work in all cases >> - when rename is not cross directory, just store the new name instead >> of a full path, as suggested by Amir >> - when redirect path is too long fall back to EXDEV (the max length >> should probably be a module param) >> > > Looks goods, except for the case of change from relative to absolute > redirect of the victim dentry. IIUC, ovl_set_redirect() will return immediately > because ovl_dentry_is_redirect() and will not get to setting the absolute > redirect. > I added some more tests to catch this problem at: https://github.com/amir73il/unionmount-testsuite.git #ovl_rename_dir In the new version, upper recycling is optional and you can set a bound to the number of layers. This is needed to catch the bug because the scenario is: - Rename populated dir in same dir - Move the populated dir to another dir - Re-mount - Populated dir is empty The following test run demonstrates it: $ sudo ./run --ov=0 rename-move-dir ... TEST rename-move-dir.py:37: Rename populated dir and move into another ./run --rename /mnt/a/dir102 /mnt/a/dir102x ./run --rename /mnt/a/dir102 /mnt/a/dir102x -E ENOENT ./run --rename /mnt/a/dir102x /mnt/a/empty102/dir102 ./run --rename /mnt/a/dir102x /mnt/a/empty102/dir102 -E ENOENT ./run --open-file /mnt/a/dir102 -r -d -E ENOENT ./run --open-file /mnt/a/dir102x -r -d -E ENOENT ./run --open-file /mnt/a/empty102/dir102 -r -d /mnt/a/empty102/dir102/a: File is missing Amir. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-14 16:25 ` Amir Goldstein @ 2016-11-16 22:00 ` Miklos Szeredi 2016-11-18 15:37 ` Amir Goldstein 0 siblings, 1 reply; 52+ messages in thread From: Miklos Szeredi @ 2016-11-16 22:00 UTC (permalink / raw) To: Amir Goldstein Cc: Raphael Hertzog, Konstantin Khlebnikov, Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List On Mon, Nov 14, 2016 at 5:25 PM, Amir Goldstein <amir73il@gmail.com> wrote: > On Sun, Nov 13, 2016 at 12:00 PM, Amir Goldstein <amir73il@gmail.com> wrote: >> Looks goods, except for the case of change from relative to absolute >> redirect of the victim dentry. IIUC, ovl_set_redirect() will return immediately >> because ovl_dentry_is_redirect() and will not get to setting the absolute >> redirect. >> > > I added some more tests to catch this problem at: > https://github.com/amir73il/unionmount-testsuite.git #ovl_rename_dir Thanks for testing. Force pushed updated version to the usual place: git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git #redirect This also has the xattr feature thing replaced with mount option, module param and kernel config option. Thanks, Miklos ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-16 22:00 ` Miklos Szeredi @ 2016-11-18 15:37 ` Amir Goldstein 2016-11-20 11:39 ` Amir Goldstein 2016-11-21 9:54 ` Miklos Szeredi 0 siblings, 2 replies; 52+ messages in thread From: Amir Goldstein @ 2016-11-18 15:37 UTC (permalink / raw) To: Miklos Szeredi Cc: Raphael Hertzog, Konstantin Khlebnikov, Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List On Thu, Nov 17, 2016 at 12:00 AM, Miklos Szeredi <miklos@szeredi.hu> wrote: > > On Mon, Nov 14, 2016 at 5:25 PM, Amir Goldstein <amir73il@gmail.com> wrote: > > On Sun, Nov 13, 2016 at 12:00 PM, Amir Goldstein <amir73il@gmail.com> wrote: > > >> Looks goods, except for the case of change from relative to absolute > >> redirect of the victim dentry. IIUC, ovl_set_redirect() will return immediately > >> because ovl_dentry_is_redirect() and will not get to setting the absolute > >> redirect. > >> > > > > I added some more tests to catch this problem at: > > https://github.com/amir73il/unionmount-testsuite.git #ovl_rename_dir > > Thanks for testing. > > Force pushed updated version to the usual place: > > git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git #redirect > Found one typo and one bug in error that can cause crash on dput(ERR_PTR(err)): diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig index 21ddac7..0daac51 100644 --- a/fs/overlayfs/Kconfig +++ b/fs/overlayfs/Kconfig @@ -15,7 +15,7 @@ config OVERLAY_FS_REDIRECT_DIR help If this config option is enabled then overlay filesystems will use redirects when renaming directories by default. In this case it is - still possible possible to turn off redirects globally with the + still possible to turn off redirects globally with the "redirect_dir=off" module option or on a filesystem instance basis with the "redirect_dir=off" mount option. diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c index 5eaa9f9..a19fc5c 100644 --- a/fs/overlayfs/namei.c +++ b/fs/overlayfs/namei.c @@ -105,11 +105,12 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d, this = lookup_one_len_unlocked(name, base, namelen); if (IS_ERR(this)) { - if (PTR_ERR(this) == -ENOENT || - PTR_ERR(this) == -ENAMETOOLONG) { + err = PTR_ERR(this); + if (err == -ENOENT || err == -ENAMETOOLONG) { this = NULL; + goto out; } - goto out; + return err; } if (!this->d_inode) goto put_and_out; > This also has the xattr feature thing replaced with mount option, > module param and kernel config option. > I like the kernel config/module param/mount option for enabling/disabling the feature. But I still think that we should write the features xattr on the first redirect rename. The features xattr tell us what can be found on the layer, so we would be wise to keep it around for all sorts of backward compatibility aspect. Amir. ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-18 15:37 ` Amir Goldstein @ 2016-11-20 11:39 ` Amir Goldstein 2016-11-21 9:54 ` Miklos Szeredi 1 sibling, 0 replies; 52+ messages in thread From: Amir Goldstein @ 2016-11-20 11:39 UTC (permalink / raw) To: Miklos Szeredi Cc: Raphael Hertzog, Konstantin Khlebnikov, Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List On Fri, Nov 18, 2016 at 5:37 PM, Amir Goldstein <amir73il@gmail.com> wrote: > On Thu, Nov 17, 2016 at 12:00 AM, Miklos Szeredi <miklos@szeredi.hu> wrote: >> >> On Mon, Nov 14, 2016 at 5:25 PM, Amir Goldstein <amir73il@gmail.com> wrote: >> > On Sun, Nov 13, 2016 at 12:00 PM, Amir Goldstein <amir73il@gmail.com> wrote: >> >> >> Looks goods, except for the case of change from relative to absolute >> >> redirect of the victim dentry. IIUC, ovl_set_redirect() will return immediately >> >> because ovl_dentry_is_redirect() and will not get to setting the absolute >> >> redirect. >> >> >> > >> > I added some more tests to catch this problem at: >> > https://github.com/amir73il/unionmount-testsuite.git #ovl_rename_dir >> >> Thanks for testing. >> >> Force pushed updated version to the usual place: >> >> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git #redirect >> > > Found one typo and one bug in error that can cause crash on dput(ERR_PTR(err)): > > diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig > index 21ddac7..0daac51 100644 > --- a/fs/overlayfs/Kconfig > +++ b/fs/overlayfs/Kconfig > @@ -15,7 +15,7 @@ config OVERLAY_FS_REDIRECT_DIR > help > If this config option is enabled then overlay filesystems will use > redirects when renaming directories by default. In this case it is > - still possible possible to turn off redirects globally with the > + still possible to turn off redirects globally with the > "redirect_dir=off" module option or on a filesystem instance basis > with the "redirect_dir=off" mount option. > > diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c > index 5eaa9f9..a19fc5c 100644 > --- a/fs/overlayfs/namei.c > +++ b/fs/overlayfs/namei.c > @@ -105,11 +105,12 @@ static int ovl_lookup_single(struct dentry > *base, struct ovl_lookup_data *d, > > this = lookup_one_len_unlocked(name, base, namelen); > if (IS_ERR(this)) { > - if (PTR_ERR(this) == -ENOENT || > - PTR_ERR(this) == -ENAMETOOLONG) { > + err = PTR_ERR(this); > + if (err == -ENOENT || err == -ENAMETOOLONG) { > this = NULL; > + goto out; > } > - goto out; > + return err; > } > if (!this->d_inode) > goto put_and_out; > I just realized that this bug is already in overlayfs-next, so posted a patch to fix it. >> This also has the xattr feature thing replaced with mount option, >> module param and kernel config option. >> > > I like the kernel config/module param/mount option for > enabling/disabling the feature. > > But I still think that we should write the features xattr on the first > redirect rename. > The features xattr tell us what can be found on the layer, so we would > be wise to > keep it around for all sorts of backward compatibility aspect. > > Amir. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-18 15:37 ` Amir Goldstein 2016-11-20 11:39 ` Amir Goldstein @ 2016-11-21 9:54 ` Miklos Szeredi 2016-11-21 10:13 ` Amir Goldstein 1 sibling, 1 reply; 52+ messages in thread From: Miklos Szeredi @ 2016-11-21 9:54 UTC (permalink / raw) To: Amir Goldstein Cc: Raphael Hertzog, Konstantin Khlebnikov, Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List On Fri, Nov 18, 2016 at 4:37 PM, Amir Goldstein <amir73il@gmail.com> wrote: > Found one typo and one bug in error that can cause crash on dput(ERR_PTR(err)): Thanks. Fixes force pushed to overlayfs-next. Also pushed the redirect patches to overlayfs-next, as they seem to have matured enough. Thanks, Miklos ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-21 9:54 ` Miklos Szeredi @ 2016-11-21 10:13 ` Amir Goldstein 2016-11-21 10:16 ` Miklos Szeredi 0 siblings, 1 reply; 52+ messages in thread From: Amir Goldstein @ 2016-11-21 10:13 UTC (permalink / raw) To: Miklos Szeredi Cc: Raphael Hertzog, Konstantin Khlebnikov, Miklos Szeredi, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List On Mon, Nov 21, 2016 at 11:54 AM, Miklos Szeredi <miklos@szeredi.hu> wrote: > On Fri, Nov 18, 2016 at 4:37 PM, Amir Goldstein <amir73il@gmail.com> wrote: > >> Found one typo and one bug in error that can cause crash on dput(ERR_PTR(err)): > > Thanks. > > Fixes force pushed to overlayfs-next. All right. I had the (wrong) impression the next was not a rewindable branch. > > Also pushed the redirect patches to overlayfs-next, as they seem to > have matured enough. > Agreed. Amir. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-21 10:13 ` Amir Goldstein @ 2016-11-21 10:16 ` Miklos Szeredi 2016-11-22 13:42 ` Amir Goldstein 0 siblings, 1 reply; 52+ messages in thread From: Miklos Szeredi @ 2016-11-21 10:16 UTC (permalink / raw) To: Amir Goldstein Cc: Miklos Szeredi, Raphael Hertzog, Konstantin Khlebnikov, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List On Mon, Nov 21, 2016 at 11:13 AM, Amir Goldstein <amir73il@gmail.com> wrote: > On Mon, Nov 21, 2016 at 11:54 AM, Miklos Szeredi <miklos@szeredi.hu> wrote: >> On Fri, Nov 18, 2016 at 4:37 PM, Amir Goldstein <amir73il@gmail.com> wrote: >> >>> Found one typo and one bug in error that can cause crash on dput(ERR_PTR(err)): >> >> Thanks. >> >> Fixes force pushed to overlayfs-next. > > All right. I had the (wrong) impression the next was not a rewindable branch. That depends on how many devel branches are broken by such an action. In case of overlayfs-next, there doesn't appear to be too much of that, so for now I feel free to mess with it. Thanks, Miklos ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-21 10:16 ` Miklos Szeredi @ 2016-11-22 13:42 ` Amir Goldstein 0 siblings, 0 replies; 52+ messages in thread From: Amir Goldstein @ 2016-11-22 13:42 UTC (permalink / raw) To: Miklos Szeredi Cc: Miklos Szeredi, Raphael Hertzog, Konstantin Khlebnikov, linux-unionfs, Guillem Jover, linux-fsdevel, Linux Kernel Mailing List On Mon, Nov 21, 2016 at 12:16 PM, Miklos Szeredi <mszeredi@redhat.com> wrote: > On Mon, Nov 21, 2016 at 11:13 AM, Amir Goldstein <amir73il@gmail.com> wrote: >> On Mon, Nov 21, 2016 at 11:54 AM, Miklos Szeredi <miklos@szeredi.hu> wrote: >>> On Fri, Nov 18, 2016 at 4:37 PM, Amir Goldstein <amir73il@gmail.com> wrote: >>> >>>> Found one typo and one bug in error that can cause crash on dput(ERR_PTR(err)): >>> >>> Thanks. >>> >>> Fixes force pushed to overlayfs-next. >> >> All right. I had the (wrong) impression the next was not a rewindable branch. > > That depends on how many devel branches are broken by such an action. > In case of overlayfs-next, there doesn't appear to be too much of > that, so for now I feel free to mess with it. > All right. Just posted another minor fix to display redirect=xx on /proc/mounts when it is due. Feel free to squash or apply or whatever. *Strictly* FYI, here is something I have been working on, on top of redirect_dir, which I need for one of our use cases. Since it is working and passed all the sanity tests, I am letting you all know about it in case it's relevant for anyone else. Not even going to post the patches yet, just a link and an abstract. Amir. https://github.com/amir73il/linux.git #redirect_fh =============================== ovl: redirect merged dir by file handle on copy up When mounted with mount option redirect_dir=fh, every copy up of lower directory stores the lower dir file handle in redirect xattr of upper dir. After the redirect at copy up, renaming upper merged directory requires no further action. This method has some advantages over absolute path redirect: - it is more compact in stored xattr size - it is not limited by lengths of full paths - lookup redirect is more efficient for very nested directories It also has some disadvantages over absolute path redirect: - it requires setting the redirect xattr for all layers of merged dirs - it requires that all lower layers are on the same file system, which support exportfs ops - file handles will become stale if overlay lower directories where to be copied to another location Signed-off-by: Amir Goldstein <amir73il@gmail.com> ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-10-25 7:34 ` [PATCH 3/3] ovl: redirect on rename-dir Miklos Szeredi 2016-10-25 11:57 ` Raphael Hertzog @ 2016-10-25 12:49 ` Amir Goldstein 2016-10-26 11:26 ` Miklos Szeredi 2016-10-28 16:15 ` Al Viro 2 siblings, 1 reply; 52+ messages in thread From: Amir Goldstein @ 2016-10-25 12:49 UTC (permalink / raw) To: Miklos Szeredi Cc: linux-unionfs, Guillem Jover, Raphael Hertzog, linux-fsdevel, linux-kernel On Tue, Oct 25, 2016 at 10:34 AM, Miklos Szeredi <mszeredi@redhat.com> wrote: > Current code returns EXDEV when a directory would need to be copied up to > move. We could copy up the directory tree in this case, but there's > another solution: point to old lower directory from moved upper directory. > > This is achieved with a "trusted.overlay.redirect" xattr storing the path > relative to the root of the overlay. After such attribute has been set, > the directory can be moved without further actions required. > Nice! I played with a similar, but simpler idea last month: https://github.com/amir73il/linux/commit/a3907e732984fb4fd340544ca27c45cac8be8060 it only handle renames within same parent. Down the road, you may want to consider differentiating this simpler case of "name redirect", as it incurs less path lookups. An entry can always be promoted from "name redirect" to "full path redirect" > This is a backward incompatible feature, old kernels won't be able to > correctly mount an overlay containing redirected directories. > > Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> > --- > Documentation/filesystems/overlayfs.txt | 21 ++++++- > fs/overlayfs/copy_up.c | 20 ++----- > fs/overlayfs/dir.c | 86 +++++++++++++++++++--------- > fs/overlayfs/namei.c | 99 ++++++++++++++++++++++++++++++--- > fs/overlayfs/overlayfs.h | 4 ++ > fs/overlayfs/ovl_entry.h | 4 ++ > fs/overlayfs/super.c | 25 +++++---- > fs/overlayfs/util.c | 19 +++++++ > 8 files changed, 217 insertions(+), 61 deletions(-) > > diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt > index 5108425157ac..fae48ab3b36b 100644 > --- a/Documentation/filesystems/overlayfs.txt > +++ b/Documentation/filesystems/overlayfs.txt > @@ -130,6 +130,23 @@ directory. > Readdir on directories that are not merged is simply handled by the > underlying directory (upper or lower). > > +renaming directories > +-------------------- > + > +When renaming a directory that is on the lower layer or merged (i.e. the > +directory was not created on the upper layer to start with) overlayfs can > +handle it in two different ways: > + > +1) return EXDEV error: this error is returned by rename(2) when trying to > + move a file or directory across filesystem boundaries. Hence > + applications are usually prepared to hande this error (mv(1) for example > + recursively copies the directory tree). This is the default behavior. > + > +2) If the "redirect_dir" feature is enabled, then the directory will be > + copied up (but not the contents). Then the "trusted.overlay.redirect" > + extended attribute is set to the path of the original location from the > + root of the overlay. Finally the directory is moved to the new > + location. > > Non-directories > --------------- > @@ -201,8 +218,8 @@ If a file with multiple hard links is copied up, then this will > "break" the link. Changes will not be propagated to other names > referring to the same inode. > > -Directory trees are not copied up. If rename(2) is performed on a directory > -which is on the lower layer or is merged, then -EXDEV will be returned. > +Unless "redirect_dir" feature is enabled, rename(2) on a lower or merged > +directory will fail with EXDEV. > > Changes to underlying filesystems > --------------------------------- > diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c > index 49f6158bb04c..0d7075208099 100644 > --- a/fs/overlayfs/copy_up.c > +++ b/fs/overlayfs/copy_up.c > @@ -323,17 +323,11 @@ static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir, > /* > * Copy up a single dentry > * > - * Directory renames only allowed on "pure upper" (already created on > - * upper filesystem, never copied up). Directories which are on lower or > - * are merged may not be renamed. For these -EXDEV is returned and > - * userspace has to deal with it. This means, when copying up a > - * directory we can rely on it and ancestors being stable. > - * > - * Non-directory renames start with copy up of source if necessary. The > - * actual rename will only proceed once the copy up was successful. Copy > - * up uses upper parent i_mutex for exclusion. Since rename can change > - * d_parent it is possible that the copy up will lock the old parent. At > - * that point the file will have already been copied up anyway. > + * All renames start with copy up of source if necessary. The actual > + * rename will only proceed once the copy up was successful. Copy up uses > + * upper parent i_mutex for exclusion. Since rename can change d_parent it > + * is possible that the copy up will lock the old parent. At that point > + * the file will have already been copied up anyway. > */ > int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry, > struct path *lowerpath, struct kstat *stat) > @@ -345,7 +339,6 @@ int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry, > struct path parentpath; > struct dentry *lowerdentry = lowerpath->dentry; > struct dentry *upperdir; > - struct dentry *upperdentry; > const char *link = NULL; > > if (WARN_ON(!workdir)) > @@ -371,8 +364,7 @@ int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry, > pr_err("overlayfs: failed to lock workdir+upperdir\n"); > goto out_unlock; > } > - upperdentry = ovl_dentry_upper(dentry); > - if (upperdentry) { > + if (ovl_dentry_upper(dentry)) { > /* Raced with another copy-up? Nothing to do, then... */ > err = 0; > goto out_unlock; > diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c > index 2c1057d747cb..065e0211f9b6 100644 > --- a/fs/overlayfs/dir.c > +++ b/fs/overlayfs/dir.c > @@ -757,6 +757,40 @@ static bool ovl_type_merge_or_lower(struct dentry *dentry) > return OVL_TYPE_MERGE(type) || !OVL_TYPE_UPPER(type); > } > > +static bool ovl_can_move(struct dentry *dentry) > +{ > + return ovl_redirect_dir(dentry->d_sb) || > + !d_is_dir(dentry) || !ovl_type_merge_or_lower(dentry); > +} > + > +static int ovl_set_redirect(struct dentry *dentry) > +{ > + char *buf; > + char *path; > + int err; > + > + if (ovl_dentry_is_redirect(dentry)) > + return 0; > + > + buf = __getname(); > + if (!buf) > + return -ENOMEM; > + > + path = dentry_path_raw(dentry, buf, PATH_MAX); > + err = PTR_ERR(path); > + if (IS_ERR(path)) > + goto putname; > + > + err = ovl_do_setxattr(ovl_dentry_upper(dentry), OVL_XATTR_REDIRECT, > + path, strlen(path), 0); > +putname: > + __putname(buf); > + if (!err) > + ovl_dentry_set_redirect(dentry); > + > + return err; > +} > + > static int ovl_rename(struct inode *olddir, struct dentry *old, > struct inode *newdir, struct dentry *new, > unsigned int flags) > @@ -784,9 +818,9 @@ static int ovl_rename(struct inode *olddir, struct dentry *old, > > /* Don't copy up directory trees */ > err = -EXDEV; > - if (is_dir && ovl_type_merge_or_lower(old)) > + if (!ovl_can_move(old)) > goto out; > - if (!overwrite && new_is_dir && ovl_type_merge_or_lower(new)) > + if (!overwrite && !ovl_can_move(new)) > goto out; > > err = ovl_want_write(old); > @@ -837,7 +871,6 @@ static int ovl_rename(struct inode *olddir, struct dentry *old, > > trap = lock_rename(new_upperdir, old_upperdir); > > - > olddentry = lookup_one_len(old->d_name.name, old_upperdir, > old->d_name.len); > err = PTR_ERR(olddentry); > @@ -880,31 +913,34 @@ static int ovl_rename(struct inode *olddir, struct dentry *old, > if (WARN_ON(olddentry->d_inode == newdentry->d_inode)) > goto out_dput; > > - if (is_dir && !old_opaque && ovl_lower_positive(new)) { > - err = ovl_set_opaque(olddentry); > - if (err) > - goto out_dput; > - ovl_dentry_set_opaque(old, true); > + if (is_dir) { > + if (ovl_type_merge_or_lower(old)) { > + err = ovl_set_redirect(old); There is a fair chance of getting ENOSPC/EDQUOT here and confuse user space. Would it be better to convert these non fatal errors with EXDEV, so user space will gracefully fallback to recursive rename/clone/copy? > + if (err) > + goto out_dput; > + } else if (!old_opaque && ovl_lower_positive(new)) { > + err = ovl_set_opaque(olddentry); > + if (err) > + goto out_dput; > + ovl_dentry_set_opaque(old, true); > + } > } > - if (!overwrite && > - new_is_dir && !new_opaque && ovl_lower_positive(old)) { > - err = ovl_set_opaque(newdentry); > - if (err) > - goto out_dput; > - ovl_dentry_set_opaque(new, true); > + if (!overwrite && new_is_dir) { > + if (ovl_type_merge_or_lower(new)) { > + err = ovl_set_redirect(new); > + if (err) > + goto out_dput; > + } else if (!new_opaque && ovl_lower_positive(old)) { > + err = ovl_set_opaque(newdentry); > + if (err) > + goto out_dput; > + ovl_dentry_set_opaque(new, true); > + } > } > > - if (old_opaque || new_opaque) { > - err = ovl_do_rename(old_upperdir->d_inode, olddentry, > - new_upperdir->d_inode, newdentry, > - flags); > - } else { > - /* No debug for the plain case */ > - BUG_ON(flags & ~RENAME_EXCHANGE); > - err = vfs_rename(old_upperdir->d_inode, olddentry, > - new_upperdir->d_inode, newdentry, > - NULL, flags); > - } > + err = ovl_do_rename(old_upperdir->d_inode, olddentry, > + new_upperdir->d_inode, newdentry, > + flags); > if (err) > goto out_dput; > > diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c > index f4057fcb0246..c7cacbb8ce09 100644 > --- a/fs/overlayfs/namei.c > +++ b/fs/overlayfs/namei.c > @@ -8,6 +8,7 @@ > */ > > #include <linux/fs.h> > +#include <linux/mount.h> > #include <linux/namei.h> > #include <linux/xattr.h> > #include "overlayfs.h" > @@ -48,6 +49,28 @@ static bool ovl_is_opaquedir(struct dentry *dentry) > return false; > } > > +static int ovl_check_redirect(struct dentry *dentry, char **bufp) > +{ > + int res; > + > + res = vfs_getxattr(dentry, OVL_XATTR_REDIRECT, NULL, 0); > + if (res > 0) { > + char *buf = kzalloc(res + 1, GFP_KERNEL); > + > + if (!buf) > + return -ENOMEM; > + > + res = vfs_getxattr(dentry, OVL_XATTR_REDIRECT, buf, res); > + if (res < 0) > + return res; > + > + kfree(*bufp); > + *bufp = buf; > + } > + > + return 0; > +} > + > /* > * Returns next layer in stack starting from top. > * Returns -1 if this is the last layer. > @@ -80,11 +103,13 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, > unsigned int ctr = 0; > struct inode *inode = NULL; > bool upperopaque = false; > + bool upperredirect = false; > bool stop = false; > bool isdir = false; > struct dentry *this; > unsigned int i; > int err; > + char *redirect = NULL; > > old_cred = ovl_override_creds(dentry->d_sb); > upperdir = ovl_upperdentry_dereference(poe); > @@ -110,11 +135,23 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, > isdir = true; > if (poe->numlower && ovl_is_opaquedir(this)) > stop = upperopaque = true; > + else if (ovl_redirect_dir(dentry->d_sb)) { > + err = ovl_check_redirect(this, > + &redirect); > + if (err) > + goto out; > + > + if (redirect) > + upperredirect = true; > + } > } > } > upperdentry = this; > } > > + if (redirect) > + poe = dentry->d_sb->s_root->d_fsdata; > + > if (!stop && poe->numlower) { > err = -ENOMEM; > stack = kcalloc(poe->numlower, sizeof(struct path), GFP_KERNEL); > @@ -125,15 +162,39 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, > for (i = 0; !stop && i < poe->numlower; i++) { > struct path lowerpath = poe->lowerstack[i]; > > - this = ovl_lookup_real(lowerpath.dentry, &dentry->d_name); > - err = PTR_ERR(this); > - if (IS_ERR(this)) { > - /* > - * If it's positive, then treat ENAMETOOLONG as ENOENT. > - */ > - if (err == -ENAMETOOLONG && (upperdentry || ctr)) > - continue; > - goto out_put; > + if (!redirect) { > + this = ovl_lookup_real(lowerpath.dentry, &dentry->d_name); > + err = PTR_ERR(this); > + if (IS_ERR(this)) { > + /* > + * If it's positive, then treat ENAMETOOLONG as ENOENT. > + */ > + if (err == -ENAMETOOLONG && (upperdentry || ctr)) > + continue; > + goto out_put; > + } > + } else { > + struct path thispath; > + > + err = vfs_path_lookup(lowerpath.dentry, lowerpath.mnt, > + redirect, 0, &thispath); > + > + if (err) { > + if (err == -ENOENT || err == -ENAMETOOLONG) > + this = NULL; > + } else { > + this = thispath.dentry; > + mntput(thispath.mnt); > + if (!this->d_inode) { > + dput(this); > + this = NULL; > + } else if (ovl_dentry_weird(this)) { > + dput(this); > + err = -EREMOTE; > + } > + } > + if (err) > + goto out_put; > } > if (!this) > continue; > @@ -162,6 +223,23 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, > stack[ctr].dentry = this; > stack[ctr].mnt = lowerpath.mnt; > ctr++; > + > + if (!stop && i != poe->numlower - 1 && > + d_is_dir(this) && ovl_redirect_dir(dentry->d_sb)) { > + err = ovl_check_redirect(this, &redirect); > + if (err) > + goto out_put; > + > + if (redirect && poe != dentry->d_sb->s_root->d_fsdata) { > + poe = dentry->d_sb->s_root->d_fsdata; > + Now you are about to continue looping until new value of poe->numlower, which is >= then olf value of poe->numlower, but 'stack' was allocated according to old value of poe->numlower, so aren't you in danger of overflowing it? Please add a comment to explain the purpose of this loop rewind. > + for (i = 0; i < poe->numlower; i++) > + if (poe->lowerstack[i].mnt == lowerpath.mnt) > + break; > + if (WARN_ON(i == poe->numlower)) > + break; > + } > + } > } > > oe = ovl_alloc_entry(ctr); > @@ -192,9 +270,11 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, > > revert_creds(old_cred); > oe->opaque = upperopaque; > + oe->redirect = upperredirect; > oe->__upperdentry = upperdentry; > memcpy(oe->lowerstack, stack, sizeof(struct path) * ctr); > kfree(stack); > + kfree(redirect); > dentry->d_fsdata = oe; > d_add(dentry, inode); > > @@ -209,6 +289,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, > out_put_upper: > dput(upperdentry); > out: > + kfree(redirect); > revert_creds(old_cred); > return ERR_PTR(err); > } > diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h > index d61d5b9d0d91..9d80ce367ad8 100644 > --- a/fs/overlayfs/overlayfs.h > +++ b/fs/overlayfs/overlayfs.h > @@ -20,6 +20,7 @@ enum ovl_path_type { > #define OVL_XATTR_PREFIX XATTR_TRUSTED_PREFIX "overlay." > #define OVL_XATTR_OPAQUE OVL_XATTR_PREFIX "opaque" > #define OVL_XATTR_FEATURES OVL_XATTR_PREFIX "features" > +#define OVL_XATTR_REDIRECT OVL_XATTR_PREFIX "redirect" > > #define OVL_ISUPPER_MASK 1UL > > @@ -157,6 +158,9 @@ void ovl_set_dir_cache(struct dentry *dentry, struct ovl_dir_cache *cache); > bool ovl_dentry_is_opaque(struct dentry *dentry); > bool ovl_dentry_is_whiteout(struct dentry *dentry); > void ovl_dentry_set_opaque(struct dentry *dentry, bool opaque); > +bool ovl_redirect_dir(struct super_block *sb); > +bool ovl_dentry_is_redirect(struct dentry *dentry); > +void ovl_dentry_set_redirect(struct dentry *dentry); > void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry); > void ovl_inode_init(struct inode *inode, struct inode *realinode, > bool is_upper); > diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h > index 3b7ba59ad27e..2b22645535ff 100644 > --- a/fs/overlayfs/ovl_entry.h > +++ b/fs/overlayfs/ovl_entry.h > @@ -26,6 +26,9 @@ struct ovl_fs { > struct ovl_config config; > /* creds of process who forced instantiation of super block */ > const struct cred *creator_cred; > + > + /* Check for "redirect" on directories */ > + bool redirect_dir; > }; > > /* private information held for every overlayfs dentry */ > @@ -36,6 +39,7 @@ struct ovl_entry { > struct { > u64 version; > bool opaque; > + bool redirect; > }; > struct rcu_head rcu; > }; > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c > index d6dc8d905d00..fc22a8a2e0d0 100644 > --- a/fs/overlayfs/super.c > +++ b/fs/overlayfs/super.c > @@ -397,7 +397,7 @@ static struct dentry *ovl_workdir_create(struct vfsmount *mnt, > goto out_unlock; > } > > -static int ovl_check_features(struct dentry *root) > +static int ovl_check_features(struct ovl_fs *ufs, struct dentry *root) > { > int res; > char *buf, *tmp, *p; > @@ -421,8 +421,12 @@ static int ovl_check_features(struct dentry *root) > res = 0; > tmp = buf; > while ((p = strsep(&tmp, ",")) != NULL) { > - res = -EINVAL; > - pr_err("overlayfs: feature '%s' not supported\n", p); > + if (strcmp(p, "redirect_dir") == 0) { > + ufs->redirect_dir = true; > + } else { > + res = -EINVAL; > + pr_err("overlayfs: feature '%s' not supported\n", p); > + } > } > out_free: > kfree(buf); > @@ -494,8 +498,8 @@ static int ovl_mount_dir(const char *name, struct path *path) > return err; > } > > -static int ovl_lower_dir(const char *name, struct path *path, long *namelen, > - int *stack_depth, bool *remote) > +static int ovl_lower_dir(const char *name, struct path *path, > + struct ovl_fs *ufs, int *stack_depth, bool *remote) > { > int err; > struct kstatfs statfs; > @@ -504,7 +508,7 @@ static int ovl_lower_dir(const char *name, struct path *path, long *namelen, > if (err) > goto out; > > - err = ovl_check_features(path->dentry); > + err = ovl_check_features(ufs, path->dentry); > if (err) > goto out_put; > > @@ -513,7 +517,7 @@ static int ovl_lower_dir(const char *name, struct path *path, long *namelen, > pr_err("overlayfs: statfs failed on '%s'\n", name); > goto out_put; > } > - *namelen = max(*namelen, statfs.f_namelen); > + ufs->lower_namelen = max(ufs->lower_namelen, statfs.f_namelen); > *stack_depth = max(*stack_depth, path->mnt->mnt_sb->s_stack_depth); > > if (ovl_dentry_remote(path->dentry)) > @@ -730,7 +734,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent) > goto out_put_upperpath; > } > > - err = ovl_check_features(upperpath.dentry); > + err = ovl_check_features(ufs, upperpath.dentry); > if (err) > goto out_put_upperpath; > > @@ -771,9 +775,8 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent) > > lower = lowertmp; > for (numlower = 0; numlower < stacklen; numlower++) { > - err = ovl_lower_dir(lower, &stack[numlower], > - &ufs->lower_namelen, &sb->s_stack_depth, > - &remote); > + err = ovl_lower_dir(lower, &stack[numlower], ufs, > + &sb->s_stack_depth, &remote); > if (err) > goto out_put_lowerpath; > > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c > index 0d45a84468d2..06dae4cabd53 100644 > --- a/fs/overlayfs/util.c > +++ b/fs/overlayfs/util.c > @@ -176,6 +176,25 @@ void ovl_dentry_set_opaque(struct dentry *dentry, bool opaque) > oe->opaque = opaque; > } > > +bool ovl_redirect_dir(struct super_block *sb) > +{ > + struct ovl_fs *ofs = sb->s_fs_info; > + > + return ofs->redirect_dir; > +} > + > +bool ovl_dentry_is_redirect(struct dentry *dentry) > +{ > + struct ovl_entry *oe = dentry->d_fsdata; > + return oe->redirect; > +} > + > +void ovl_dentry_set_redirect(struct dentry *dentry) > +{ > + struct ovl_entry *oe = dentry->d_fsdata; > + oe->redirect = true; > +} > + > void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry) > { > struct ovl_entry *oe = dentry->d_fsdata; > -- > 2.5.5 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-10-25 12:49 ` Amir Goldstein @ 2016-10-26 11:26 ` Miklos Szeredi 2016-10-26 12:11 ` Amir Goldstein ` (2 more replies) 0 siblings, 3 replies; 52+ messages in thread From: Miklos Szeredi @ 2016-10-26 11:26 UTC (permalink / raw) To: Amir Goldstein Cc: Miklos Szeredi, linux-unionfs, Guillem Jover, Raphael Hertzog, linux-fsdevel, linux-kernel On Tue, Oct 25, 2016 at 2:49 PM, Amir Goldstein <amir73il@gmail.com> wrote: >> @@ -880,31 +913,34 @@ static int ovl_rename(struct inode *olddir, struct dentry *old, >> if (WARN_ON(olddentry->d_inode == newdentry->d_inode)) >> goto out_dput; >> >> - if (is_dir && !old_opaque && ovl_lower_positive(new)) { >> - err = ovl_set_opaque(olddentry); >> - if (err) >> - goto out_dput; >> - ovl_dentry_set_opaque(old, true); >> + if (is_dir) { >> + if (ovl_type_merge_or_lower(old)) { >> + err = ovl_set_redirect(old); > > There is a fair chance of getting ENOSPC/EDQUOT here and confuse user space. > Would it be better to convert these non fatal errors with EXDEV, so > user space will > gracefully fallback to recursive rename/clone/copy? Recursive copy up will surely consume more space than an xattr? >> @@ -162,6 +223,23 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, >> stack[ctr].dentry = this; >> stack[ctr].mnt = lowerpath.mnt; >> ctr++; >> + >> + if (!stop && i != poe->numlower - 1 && >> + d_is_dir(this) && ovl_redirect_dir(dentry->d_sb)) { >> + err = ovl_check_redirect(this, &redirect); >> + if (err) >> + goto out_put; >> + >> + if (redirect && poe != dentry->d_sb->s_root->d_fsdata) { >> + poe = dentry->d_sb->s_root->d_fsdata; >> + > > Now you are about to continue looping until new value of poe->numlower, > which is >= then olf value of poe->numlower, but 'stack' was allocated > according to old value of poe->numlower, so aren't you in danger of > overflowing it? > > Please add a comment to explain the purpose of this loop rewind. We are jumping to a stack possibly wider than the current one and need to find the layer where to continue the downward traversal. I'll add the comment. BTW I don't remember having tested this, so it might possibly be buggy. Automatic multi-layer testing would really be good. What we basically need is: - create normal (two layer) overlay (with interesting constructs, whiteout, opaque dir, redirect) - umount - create three layer overlay where the two lower layers come from the previous upper/lower layers - do more interesting things There's one such test in xfstests but it would be good to have more. Thanks, Miklos ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-10-26 11:26 ` Miklos Szeredi @ 2016-10-26 12:11 ` Amir Goldstein 2016-10-26 12:51 ` Miklos Szeredi 2016-10-26 19:56 ` Amir Goldstein 2016-10-30 22:00 ` Amir Goldstein 2 siblings, 1 reply; 52+ messages in thread From: Amir Goldstein @ 2016-10-26 12:11 UTC (permalink / raw) To: Miklos Szeredi Cc: Miklos Szeredi, linux-unionfs, Guillem Jover, Raphael Hertzog, linux-fsdevel, linux-kernel On Wed, Oct 26, 2016 at 2:26 PM, Miklos Szeredi <miklos@szeredi.hu> wrote: > On Tue, Oct 25, 2016 at 2:49 PM, Amir Goldstein <amir73il@gmail.com> wrote: > >>> @@ -880,31 +913,34 @@ static int ovl_rename(struct inode *olddir, struct dentry *old, >>> if (WARN_ON(olddentry->d_inode == newdentry->d_inode)) >>> goto out_dput; >>> >>> - if (is_dir && !old_opaque && ovl_lower_positive(new)) { >>> - err = ovl_set_opaque(olddentry); >>> - if (err) >>> - goto out_dput; >>> - ovl_dentry_set_opaque(old, true); >>> + if (is_dir) { >>> + if (ovl_type_merge_or_lower(old)) { >>> + err = ovl_set_redirect(old); >> >> There is a fair chance of getting ENOSPC/EDQUOT here and confuse user space. >> Would it be better to convert these non fatal errors with EXDEV, so >> user space will >> gracefully fallback to recursive rename/clone/copy? > > Recursive copy up will surely consume more space than an xattr? Surely. But that is not what I meant. In Ext4, for example, the total size of extended attributes for an inode is limited to a single block, so you can get ENOSPC with alot of free space in the file system. You can also get ERANGE if the redirect path length > 256. I'm just saying that instead of failing the move with ERANGE or uncalled for ENOSPC, it is better for user space to get EXDEV, from which there is a sane fallback. BTW, mv (from linux-utils) falls back from -EXDEV to recursive move, which is quite efficient with clone up. > >>> @@ -162,6 +223,23 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, >>> stack[ctr].dentry = this; >>> stack[ctr].mnt = lowerpath.mnt; >>> ctr++; >>> + >>> + if (!stop && i != poe->numlower - 1 && >>> + d_is_dir(this) && ovl_redirect_dir(dentry->d_sb)) { >>> + err = ovl_check_redirect(this, &redirect); >>> + if (err) >>> + goto out_put; >>> + >>> + if (redirect && poe != dentry->d_sb->s_root->d_fsdata) { >>> + poe = dentry->d_sb->s_root->d_fsdata; >>> + >> >> Now you are about to continue looping until new value of poe->numlower, >> which is >= then olf value of poe->numlower, but 'stack' was allocated >> according to old value of poe->numlower, so aren't you in danger of >> overflowing it? >> >> Please add a comment to explain the purpose of this loop rewind. > > We are jumping to a stack possibly wider than the current one and need > to find the layer where to continue the downward traversal. I'll add > the comment. > > BTW I don't remember having tested this, so it might possibly be > buggy. Automatic multi-layer testing would really be good. What we > basically need is: > > - create normal (two layer) overlay (with interesting constructs, > whiteout, opaque dir, redirect) > - umount > - create three layer overlay where the two lower layers come from the > previous upper/lower layers > - do more interesting things > > There's one such test in xfstests but it would be good to have more. > > Thanks, > Miklos ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-10-26 12:11 ` Amir Goldstein @ 2016-10-26 12:51 ` Miklos Szeredi 0 siblings, 0 replies; 52+ messages in thread From: Miklos Szeredi @ 2016-10-26 12:51 UTC (permalink / raw) To: Amir Goldstein Cc: Miklos Szeredi, linux-unionfs, Guillem Jover, Raphael Hertzog, linux-fsdevel, linux-kernel On Wed, Oct 26, 2016 at 2:11 PM, Amir Goldstein <amir73il@gmail.com> wrote: > On Wed, Oct 26, 2016 at 2:26 PM, Miklos Szeredi <miklos@szeredi.hu> wrote: >> On Tue, Oct 25, 2016 at 2:49 PM, Amir Goldstein <amir73il@gmail.com> wrote: >> >>>> @@ -880,31 +913,34 @@ static int ovl_rename(struct inode *olddir, struct dentry *old, >>>> if (WARN_ON(olddentry->d_inode == newdentry->d_inode)) >>>> goto out_dput; >>>> >>>> - if (is_dir && !old_opaque && ovl_lower_positive(new)) { >>>> - err = ovl_set_opaque(olddentry); >>>> - if (err) >>>> - goto out_dput; >>>> - ovl_dentry_set_opaque(old, true); >>>> + if (is_dir) { >>>> + if (ovl_type_merge_or_lower(old)) { >>>> + err = ovl_set_redirect(old); >>> >>> There is a fair chance of getting ENOSPC/EDQUOT here and confuse user space. >>> Would it be better to convert these non fatal errors with EXDEV, so >>> user space will >>> gracefully fallback to recursive rename/clone/copy? >> >> Recursive copy up will surely consume more space than an xattr? > > Surely. But that is not what I meant. > In Ext4, for example, the total size of extended attributes for an > inode is limited to a single block, > so you can get ENOSPC with alot of free space in the file system. Ah. > > You can also get ERANGE if the redirect path length > 256. The 256 limit is for the name of the xattr, not the value. Thanks, Miklos ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-10-26 11:26 ` Miklos Szeredi 2016-10-26 12:11 ` Amir Goldstein @ 2016-10-26 19:56 ` Amir Goldstein 2016-10-30 22:00 ` Amir Goldstein 2 siblings, 0 replies; 52+ messages in thread From: Amir Goldstein @ 2016-10-26 19:56 UTC (permalink / raw) To: Miklos Szeredi Cc: Miklos Szeredi, linux-unionfs, Guillem Jover, Raphael Hertzog, linux-fsdevel, linux-kernel On Wed, Oct 26, 2016 at 2:26 PM, Miklos Szeredi <miklos@szeredi.hu> wrote: > On Tue, Oct 25, 2016 at 2:49 PM, Amir Goldstein <amir73il@gmail.com> wrote: > >>> @@ -880,31 +913,34 @@ static int ovl_rename(struct inode *olddir, struct dentry *old, >>> if (WARN_ON(olddentry->d_inode == newdentry->d_inode)) >>> goto out_dput; >>> >>> - if (is_dir && !old_opaque && ovl_lower_positive(new)) { >>> - err = ovl_set_opaque(olddentry); >>> - if (err) >>> - goto out_dput; >>> - ovl_dentry_set_opaque(old, true); >>> + if (is_dir) { >>> + if (ovl_type_merge_or_lower(old)) { >>> + err = ovl_set_redirect(old); >> >> There is a fair chance of getting ENOSPC/EDQUOT here and confuse user space. >> Would it be better to convert these non fatal errors with EXDEV, so >> user space will >> gracefully fallback to recursive rename/clone/copy? > > Recursive copy up will surely consume more space than an xattr? > >>> @@ -162,6 +223,23 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, >>> stack[ctr].dentry = this; >>> stack[ctr].mnt = lowerpath.mnt; >>> ctr++; >>> + >>> + if (!stop && i != poe->numlower - 1 && >>> + d_is_dir(this) && ovl_redirect_dir(dentry->d_sb)) { >>> + err = ovl_check_redirect(this, &redirect); >>> + if (err) >>> + goto out_put; >>> + >>> + if (redirect && poe != dentry->d_sb->s_root->d_fsdata) { >>> + poe = dentry->d_sb->s_root->d_fsdata; >>> + >> >> Now you are about to continue looping until new value of poe->numlower, >> which is >= then olf value of poe->numlower, but 'stack' was allocated >> according to old value of poe->numlower, so aren't you in danger of >> overflowing it? >> >> Please add a comment to explain the purpose of this loop rewind. > > We are jumping to a stack possibly wider than the current one and need > to find the layer where to continue the downward traversal. I'll add > the comment. > OK. my point was that you need to allocate an sb max depth stack in advance, in case you need to jump to a wider stack. > BTW I don't remember having tested this, so it might possibly be > buggy. Automatic multi-layer testing would really be good. What we > basically need is: > > - create normal (two layer) overlay (with interesting constructs, > whiteout, opaque dir, redirect) > - umount > - create three layer overlay where the two lower layers come from the > previous upper/lower layers > - do more interesting things > > There's one such test in xfstests but it would be good to have more. > I just sent 2 patches to fix 2 overlayfs xfstests failures. With these 2 changes, the entire quick test group passes on my setup (short of one test that also fails on ext4 and xfs). Now I will start to think about instrumenting generic xfstests with lower/upper files and then with rotating upper (i.e. layer stack). Amir. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-10-26 11:26 ` Miklos Szeredi 2016-10-26 12:11 ` Amir Goldstein 2016-10-26 19:56 ` Amir Goldstein @ 2016-10-30 22:00 ` Amir Goldstein 2016-10-31 14:59 ` Miklos Szeredi 2 siblings, 1 reply; 52+ messages in thread From: Amir Goldstein @ 2016-10-30 22:00 UTC (permalink / raw) To: Miklos Szeredi Cc: Miklos Szeredi, linux-unionfs, Guillem Jover, Raphael Hertzog, linux-fsdevel On Wed, Oct 26, 2016 at 2:26 PM, Miklos Szeredi <miklos@szeredi.hu> wrote: > ... > > BTW I don't remember having tested this, so it might possibly be > buggy. Automatic multi-layer testing would really be good. What we > basically need is: > > - create normal (two layer) overlay (with interesting constructs, > whiteout, opaque dir, redirect) > - umount > - create three layer overlay where the two lower layers come from the > previous upper/lower layers > - do more interesting things > > There's one such test in xfstests but it would be good to have more. > I have a POC for generic addition of layers for unionmount-testsuite I pushed it to my github: https://github.com/amir73il/unionmount-testsuite.git #ovl_rename_dir remount(ctx) rotates upper to top of lower layers and adds a new upper layer For now, remount(ctx) is called after successful mkdir and rename dir to catch directory redirect bugs. As it happens, it is very easy to find a bug, because it is enough to just umount an mount the overlay without even adding a new layer after directory rename to corrupt the overlayfs lookup. Almost every rename-*-dir test demonstrates the problem Going forward, I intend to hook the remount(ctx) event in a cleaner way, so for every VFS op in every subtest, it is possible to configure a test where the operation happens: - without any remount - with umount + mount - with umount + mount with a new layer Cheers, Amir. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-10-30 22:00 ` Amir Goldstein @ 2016-10-31 14:59 ` Miklos Szeredi 2016-10-31 15:02 ` Amir Goldstein 0 siblings, 1 reply; 52+ messages in thread From: Miklos Szeredi @ 2016-10-31 14:59 UTC (permalink / raw) To: Amir Goldstein Cc: Miklos Szeredi, linux-unionfs, Guillem Jover, Raphael Hertzog, linux-fsdevel, Miklos Szeredi On Sun, Oct 30, 2016 at 11:00 PM, Amir Goldstein <amir73il@gmail.com> wrote: > I have a POC for generic addition of layers for unionmount-testsuite > I pushed it to my github: > https://github.com/amir73il/unionmount-testsuite.git #ovl_rename_dir Cool. But... this seems to be missing the "remount_union" module. Thanks, Miklos ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-10-31 14:59 ` Miklos Szeredi @ 2016-10-31 15:02 ` Amir Goldstein 0 siblings, 0 replies; 52+ messages in thread From: Amir Goldstein @ 2016-10-31 15:02 UTC (permalink / raw) To: Miklos Szeredi Cc: Miklos Szeredi, linux-unionfs, Guillem Jover, Raphael Hertzog, linux-fsdevel On Mon, Oct 31, 2016 at 4:59 PM, Miklos Szeredi <mszeredi@redhat.com> wrote: > On Sun, Oct 30, 2016 at 11:00 PM, Amir Goldstein <amir73il@gmail.com> wrote: > >> I have a POC for generic addition of layers for unionmount-testsuite >> I pushed it to my github: >> https://github.com/amir73il/unionmount-testsuite.git #ovl_rename_dir > > Cool. But... this seems to be missing the "remount_union" module. Doh! fixes and force pushed > > Thanks, > Miklos ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-10-25 7:34 ` [PATCH 3/3] ovl: redirect on rename-dir Miklos Szeredi 2016-10-25 11:57 ` Raphael Hertzog 2016-10-25 12:49 ` Amir Goldstein @ 2016-10-28 16:15 ` Al Viro 2016-11-03 15:50 ` Miklos Szeredi 2 siblings, 1 reply; 52+ messages in thread From: Al Viro @ 2016-10-28 16:15 UTC (permalink / raw) To: Miklos Szeredi Cc: linux-unionfs, Guillem Jover, Raphael Hertzog, linux-fsdevel, linux-kernel On Tue, Oct 25, 2016 at 09:34:47AM +0200, Miklos Szeredi wrote: > Current code returns EXDEV when a directory would need to be copied up to > move. We could copy up the directory tree in this case, but there's > another solution: point to old lower directory from moved upper directory. > > This is achieved with a "trusted.overlay.redirect" xattr storing the path > relative to the root of the overlay. After such attribute has been set, > the directory can be moved without further actions required. > > This is a backward incompatible feature, old kernels won't be able to > correctly mount an overlay containing redirected directories. > + err = vfs_path_lookup(lowerpath.dentry, lowerpath.mnt, > + redirect, 0, &thispath); > + > + if (err) { > + if (err == -ENOENT || err == -ENAMETOOLONG) > + this = NULL; > + } else { > + this = thispath.dentry; > + mntput(thispath.mnt); > + if (!this->d_inode) { > + dput(this); > + this = NULL; > + } else if (ovl_dentry_weird(this)) { > + dput(this); > + err = -EREMOTE; > + } > + } I'm not happy with that one - you are relying upon the fairly subtle assertions here. 1) Had lowerpath.mnt *not* been a privately cloned one with nothing mounted on it, you would've been screwed. 2) Had that thing contained a "jumper" symlink (a-la procfs ones), you would've been screwed. Currently only procfs has those, and it would've been rejected before getting there, but this is brittle and non-obvious. 3) Any automount point in there (nfs4 referrals, etc.) can break the assumption that nothing could've been mounted on it. And _that_ might have not been stepped onto; back when the path had been stored, there'd been no automount point at all, so we have avoided ovl_dentry_weird() rejects, and by now nothing on the path had been visited yet, so ovl_dentry_weird() didn't have a chance to trigger. Note that calling it on the last dentry is no good - we might have crossed the automount point in the middle of that path, so this last dentry might be nice and shiny - and on another filesystem. So unlike (1) and (2) it's not just a fishy-looking thing that happens to work for non-local reasons; AFAICS, it's actually a bug. I'm not sure if vfs_path_lookup() is the right tool here. It might be usable for making such a tool, but as it is you are setting one hell of a trap for yourself... It might be made to work, if we figure out the right semantics for disabling symlinks on per-vfsmount basis (and no, the posted nolinks patches are not it) and mark these private clones with that and with similar "disable automount traversals" flag (again, needs the right semantics; the area is convoluted as it is). But in that case I would strongly recommend adding an exported wrapper around vfs_path_lookup() that would verify that these flags *are* set. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-10-28 16:15 ` Al Viro @ 2016-11-03 15:50 ` Miklos Szeredi 2016-11-04 9:29 ` Amir Goldstein 0 siblings, 1 reply; 52+ messages in thread From: Miklos Szeredi @ 2016-11-03 15:50 UTC (permalink / raw) To: Al Viro Cc: Miklos Szeredi, linux-unionfs, Guillem Jover, Raphael Hertzog, linux-fsdevel, linux-kernel On Fri, Oct 28, 2016 at 6:15 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Tue, Oct 25, 2016 at 09:34:47AM +0200, Miklos Szeredi wrote: >> Current code returns EXDEV when a directory would need to be copied up to >> move. We could copy up the directory tree in this case, but there's >> another solution: point to old lower directory from moved upper directory. >> >> This is achieved with a "trusted.overlay.redirect" xattr storing the path >> relative to the root of the overlay. After such attribute has been set, >> the directory can be moved without further actions required. >> >> This is a backward incompatible feature, old kernels won't be able to >> correctly mount an overlay containing redirected directories. > >> + err = vfs_path_lookup(lowerpath.dentry, lowerpath.mnt, >> + redirect, 0, &thispath); >> + >> + if (err) { >> + if (err == -ENOENT || err == -ENAMETOOLONG) >> + this = NULL; >> + } else { >> + this = thispath.dentry; >> + mntput(thispath.mnt); >> + if (!this->d_inode) { >> + dput(this); >> + this = NULL; >> + } else if (ovl_dentry_weird(this)) { >> + dput(this); >> + err = -EREMOTE; >> + } >> + } > > I'm not happy with that one - you are relying upon the fairly subtle > assertions here. > 1) Had lowerpath.mnt *not* been a privately cloned one with nothing > mounted on it, you would've been screwed. > 2) Had that thing contained a "jumper" symlink (a-la procfs ones), > you would've been screwed. Currently only procfs has those, and it would've > been rejected before getting there, but this is brittle and non-obvious. > 3) Any automount point in there (nfs4 referrals, etc.) can > break the assumption that nothing could've been mounted on it. And _that_ > might have not been stepped onto; back when the path had been stored, there'd > been no automount point at all, so we have avoided ovl_dentry_weird() rejects, > and by now nothing on the path had been visited yet, so ovl_dentry_weird() > didn't have a chance to trigger. Note that calling it on the last dentry > is no good - we might have crossed the automount point in the middle of that > path, so this last dentry might be nice and shiny - and on another filesystem. > So unlike (1) and (2) it's not just a fishy-looking thing that happens to > work for non-local reasons; AFAICS, it's actually a bug. > > I'm not sure if vfs_path_lookup() is the right tool here. It might be > usable for making such a tool, but as it is you are setting one hell of > a trap for yourself... Agreed, it's not the right tool. A custom loop of lookup_one_len's should work much better and doesn't add all that much complexity. Updated patch pushed to: git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git #redirect This version also passes the recycling tests by Amir and enables the redirect feature by default on an empty upperdir. Thanks, Miklos ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-03 15:50 ` Miklos Szeredi @ 2016-11-04 9:29 ` Amir Goldstein 2016-11-04 13:48 ` Miklos Szeredi 0 siblings, 1 reply; 52+ messages in thread From: Amir Goldstein @ 2016-11-04 9:29 UTC (permalink / raw) To: Miklos Szeredi Cc: Al Viro, Miklos Szeredi, linux-unionfs, Guillem Jover, Raphael Hertzog, linux-fsdevel, linux-kernel On Thu, Nov 3, 2016 at 5:50 PM, Miklos Szeredi <miklos@szeredi.hu> wrote: > On Fri, Oct 28, 2016 at 6:15 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: >> On Tue, Oct 25, 2016 at 09:34:47AM +0200, Miklos Szeredi wrote: ... >> >> I'm not sure if vfs_path_lookup() is the right tool here. It might be >> usable for making such a tool, but as it is you are setting one hell of >> a trap for yourself... > > Agreed, it's not the right tool. A custom loop of lookup_one_len's > should work much better and doesn't add all that much complexity. > Updated patch pushed to: > > git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git #redirect > > This version also passes the recycling tests by Amir and enables the > redirect feature by default on an empty upperdir. > Miklos, You did not address my comment about the 'stack' allocation overflow in ovl_lookup I believe the (possible) overflow is demonstrated by the following debug patch: diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c index c7cacbb..7171bfb 100644 --- a/fs/overlayfs/namei.c +++ b/fs/overlayfs/namei.c @@ -231,5 +231,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, goto out_put; if (redirect && poe != dentry->d_sb->s_root->d_fsdata) { + int stackroom = poe->numlower - ctr; + poe = dentry->d_sb->s_root->d_fsdata; @@ -238,6 +240,8 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, break; if (WARN_ON(i == poe->numlower)) break; + if (WARN_ON(poe->numlower - i - 1 > stackroom)) + break; } } } -- 2.7.4 In cases where a directory is moved into another directory with merge history shorter then total number of layers, lookup will need to grow the 'stack' while redirecting. Bug will be hit only after remount or dcache drop, which was the reason I wrote the recycle test in the first place... I instrumented unionmount-tests with test name prints to kmsg (a la xfstests) Pushed to https://github.com/amir73il/unionmount-testsuite.git #ovl_rename_dir And as you can see, 5 subtests hit the overflow warning. [ 1759.692281] TEST rename-new-dir.py:161: Rename empty dir over removed empty lower dir [ 1759.747217] WARNING: CPU: 0 PID: 9065 at /home/amir/src/linux/fs/overlayfs/namei.c:271 ovl_lookup+0x81d/0x870 [overlay] [ 1759.748887] TEST rename-new-dir.py:172: Rename empty dir over removed populated lower dir [ 1759.836195] WARNING: CPU: 2 PID: 9065 at /home/amir/src/linux/fs/overlayfs/namei.c:271 ovl_lookup+0x81d/0x870 [overlay] [ 1763.519285] TEST rename-new-pop-dir.py:170: Rename new dir over removed unioned empty dir [ 1763.592055] WARNING: CPU: 3 PID: 9065 at /home/amir/src/linux/fs/overlayfs/namei.c:271 ovl_lookup+0x81d/0x870 [overlay] [ 1763.592989] TEST rename-new-pop-dir.py:183: Rename new dir over removed unioned dir, different files [ 1763.658290] WARNING: CPU: 3 PID: 9065 at /home/amir/src/linux/fs/overlayfs/namei.c:271 ovl_lookup+0x81d/0x870 [overlay] [ 1763.660379] TEST rename-new-pop-dir.py:197: Rename new dir over removed unioned dir, same files [ 1763.731482] WARNING: CPU: 0 PID: 9065 at /home/amir/src/linux/fs/overlayfs/namei.c:271 ovl_lookup+0x81d/0x870 [overlay] I hope I am not missing anything. Cheers, Amir. ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH 3/3] ovl: redirect on rename-dir 2016-11-04 9:29 ` Amir Goldstein @ 2016-11-04 13:48 ` Miklos Szeredi 0 siblings, 0 replies; 52+ messages in thread From: Miklos Szeredi @ 2016-11-04 13:48 UTC (permalink / raw) To: Amir Goldstein Cc: Al Viro, Miklos Szeredi, linux-unionfs, Guillem Jover, Raphael Hertzog, linux-fsdevel, linux-kernel On Fri, Nov 4, 2016 at 10:29 AM, Amir Goldstein <amir73il@gmail.com> wrote: > You did not address my comment about the 'stack' allocation overflow > in ovl_lookup > I believe the (possible) overflow is demonstrated by the following debug patch: Oops, missed that. Good spotting! And there's more shit that unionfs-testsuite didn't discover (not even involving multiple layers): rm -rf /lower /upper /work mkdir -p /lower/a/b/c /upper /work mount -t overlay overlay -oupperdir=/upper,lowerdir=/lower,workdir=/work /mnt mv /mnt/a /mnt/z mv /mnt/z/b /mnt/q ls /mnt/q umount /mnt mount -t overlay overlay -oupperdir=/upper,lowerdir=/lower,workdir=/work /mnt ls /mnt/q umount /mnt Next update coming up... Thanks, Miklos ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 0/3] overlayfs: allow moving directory trees 2016-10-25 7:34 [PATCH 0/3] overlayfs: allow moving directory trees Miklos Szeredi ` (2 preceding siblings ...) 2016-10-25 7:34 ` [PATCH 3/3] ovl: redirect on rename-dir Miklos Szeredi @ 2016-10-25 20:25 ` Amir Goldstein 2016-10-26 9:37 ` Amir Goldstein 2016-10-26 9:34 ` [PATCH] ovl: check for emptiness of redirect dir Amir Goldstein 4 siblings, 1 reply; 52+ messages in thread From: Amir Goldstein @ 2016-10-25 20:25 UTC (permalink / raw) To: Miklos Szeredi Cc: linux-unionfs, Guillem Jover, Raphael Hertzog, linux-fsdevel On Tue, Oct 25, 2016 at 10:34 AM, Miklos Szeredi <mszeredi@redhat.com> wrote: > This allows overlayfs to move directory trees (residing on lower layer) > without having to recursively copy up the whole tree first. > > This series is available in git at: > > git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git#redirect > Gave this a run through xfstests -g quick, pjdfstest and unionmount-testsuite. After cleaning out all the xerr= failure expectations from unionmount tests, all tests pass except for rename-pop-dir, which appears to be successful in removal of non empty dir. Changes to unionmount-testsuite are available on my github: https://github.com/amir73il/unionmount-testsuite/tree/ovl_rename_dir > And is on top of the overlayfs-next branch. > > --- > Miklos Szeredi (3): > ovl: check fs features > vfs: export vfs_path_lookup() > ovl: redirect on rename-dir > > Documentation/filesystems/overlayfs.txt | 33 ++++++++++- > fs/internal.h | 2 - > fs/overlayfs/copy_up.c | 20 ++----- > fs/overlayfs/dir.c | 86 +++++++++++++++++++--------- > fs/overlayfs/namei.c | 99 ++++++++++++++++++++++++++++++--- > fs/overlayfs/overlayfs.h | 5 ++ > fs/overlayfs/ovl_entry.h | 4 ++ > fs/overlayfs/super.c | 56 +++++++++++++++++-- > fs/overlayfs/util.c | 19 +++++++ > include/linux/namei.h | 2 + > 10 files changed, 268 insertions(+), 58 deletions(-) > > -- > 2.5.5 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 0/3] overlayfs: allow moving directory trees 2016-10-25 20:25 ` [PATCH 0/3] overlayfs: allow moving directory trees Amir Goldstein @ 2016-10-26 9:37 ` Amir Goldstein 0 siblings, 0 replies; 52+ messages in thread From: Amir Goldstein @ 2016-10-26 9:37 UTC (permalink / raw) To: Miklos Szeredi Cc: linux-unionfs, Guillem Jover, Raphael Hertzog, linux-fsdevel On Tue, Oct 25, 2016 at 11:25 PM, Amir Goldstein <amir73il@gmail.com> wrote: > On Tue, Oct 25, 2016 at 10:34 AM, Miklos Szeredi <mszeredi@redhat.com> wrote: >> This allows overlayfs to move directory trees (residing on lower layer) >> without having to recursively copy up the whole tree first. >> >> This series is available in git at: >> >> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git#redirect >> > > Gave this a run through xfstests -g quick, pjdfstest and unionmount-testsuite. > After cleaning out all the xerr= failure expectations from unionmount tests, > all tests pass except for rename-pop-dir, which appears to be successful in > removal of non empty dir. > I sent out a patch to ovl_remove_upper() that fixes the problem. > Changes to unionmount-testsuite are available on my github: > https://github.com/amir73il/unionmount-testsuite/tree/ovl_rename_dir > >> And is on top of the overlayfs-next branch. >> >> --- >> Miklos Szeredi (3): >> ovl: check fs features >> vfs: export vfs_path_lookup() >> ovl: redirect on rename-dir >> >> Documentation/filesystems/overlayfs.txt | 33 ++++++++++- >> fs/internal.h | 2 - >> fs/overlayfs/copy_up.c | 20 ++----- >> fs/overlayfs/dir.c | 86 +++++++++++++++++++--------- >> fs/overlayfs/namei.c | 99 ++++++++++++++++++++++++++++++--- >> fs/overlayfs/overlayfs.h | 5 ++ >> fs/overlayfs/ovl_entry.h | 4 ++ >> fs/overlayfs/super.c | 56 +++++++++++++++++-- >> fs/overlayfs/util.c | 19 +++++++ >> include/linux/namei.h | 2 + >> 10 files changed, 268 insertions(+), 58 deletions(-) >> >> -- >> 2.5.5 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH] ovl: check for emptiness of redirect dir 2016-10-25 7:34 [PATCH 0/3] overlayfs: allow moving directory trees Miklos Szeredi ` (3 preceding siblings ...) 2016-10-25 20:25 ` [PATCH 0/3] overlayfs: allow moving directory trees Amir Goldstein @ 2016-10-26 9:34 ` Amir Goldstein 2016-10-26 10:45 ` Miklos Szeredi 4 siblings, 1 reply; 52+ messages in thread From: Amir Goldstein @ 2016-10-26 9:34 UTC (permalink / raw) To: Miklos Szeredi Cc: Guillem Jover, Raphael Hertzog, linux-unionfs, linux-fsdevel Before introducing redirect_dir feature, the condition !ovl_lower_positive(dentry) for a directory, implied that it is a pure upper directory, which may be removed if empty. Now that directory can be redirect, it is possible that upper does not cover any lower (i.e. !ovl_lower_positive(dentry)), but the directory is a merge (with redirected path) and maybe non empty. Check for this case in ovl_remove_upper(). This change fixes the following test case from rename-pop-dir.py of unionmount-testsuite: """Remove dir and rename old name""" d = ctx.non_empty_dir() d2 = ctx.no_dir() ctx.rmdir(d, err=ENOTEMPTY) ctx.rename(d, d2) ctx.rmdir(d, err=ENOENT) ctx.rmdir(d2, err=ENOTEMPTY) ./run --ov rename-pop-dir /mnt/a/no_dir103: Expected error (Directory not empty) was not produced Signed-off-by: Amir Goldstein <amir73il@gmail.com> --- fs/overlayfs/dir.c | 31 ++++++++++++++++++++++--------- 1 file changed, 22 insertions(+), 9 deletions(-) diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c index 065e021..d750ed9 100644 --- a/fs/overlayfs/dir.c +++ b/fs/overlayfs/dir.c @@ -672,9 +672,18 @@ static int ovl_remove_upper(struct dentry *dentry, bool is_dir) { struct dentry *upperdir = ovl_dentry_upper(dentry->d_parent); struct inode *dir = upperdir->d_inode; - struct dentry *upper; + struct dentry *upper = NULL; + struct dentry *opaquedir = NULL; int err; + /* Redirect dir can be !ovl_lower_positive && OVL_TYPE_MERGE */ + if (is_dir && ovl_dentry_is_redirect(dentry)) { + opaquedir = ovl_check_empty_and_clear(dentry); + err = PTR_ERR(opaquedir); + if (IS_ERR(opaquedir)) + return err; + } + inode_lock_nested(dir, I_MUTEX_PARENT); upper = lookup_one_len(dentry->d_name.name, upperdir, dentry->d_name.len); @@ -683,14 +692,15 @@ static int ovl_remove_upper(struct dentry *dentry, bool is_dir) goto out_unlock; err = -ESTALE; - if (upper == ovl_dentry_upper(dentry)) { - if (is_dir) - err = vfs_rmdir(dir, upper); - else - err = vfs_unlink(dir, upper, NULL); - ovl_dentry_version_inc(dentry->d_parent); - } - dput(upper); + if ((opaquedir && upper != opaquedir) || + upper != ovl_dentry_upper(dentry)) + goto out_dput_upper; + + if (is_dir) + err = vfs_rmdir(dir, upper); + else + err = vfs_unlink(dir, upper, NULL); + ovl_dentry_version_inc(dentry->d_parent); /* * Keeping this dentry hashed would mean having to release @@ -700,8 +710,11 @@ static int ovl_remove_upper(struct dentry *dentry, bool is_dir) */ if (!err) d_drop(dentry); +out_dput_upper: + dput(upper); out_unlock: inode_unlock(dir); + dput(opaquedir); return err; } -- 2.7.4 ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH] ovl: check for emptiness of redirect dir 2016-10-26 9:34 ` [PATCH] ovl: check for emptiness of redirect dir Amir Goldstein @ 2016-10-26 10:45 ` Miklos Szeredi 0 siblings, 0 replies; 52+ messages in thread From: Miklos Szeredi @ 2016-10-26 10:45 UTC (permalink / raw) To: Amir Goldstein Cc: Guillem Jover, Raphael Hertzog, linux-unionfs, linux-fsdevel On Wed, Oct 26, 2016 at 11:34 AM, Amir Goldstein <amir73il@gmail.com> wrote: > Before introducing redirect_dir feature, the condition > !ovl_lower_positive(dentry) for a directory, implied that > it is a pure upper directory, which may be removed if empty. > > Now that directory can be redirect, it is possible that > upper does not cover any lower (i.e. !ovl_lower_positive(dentry)), > but the directory is a merge (with redirected path) and maybe > non empty. > > Check for this case in ovl_remove_upper(). > > This change fixes the following test case from rename-pop-dir.py > of unionmount-testsuite: > > """Remove dir and rename old name""" > d = ctx.non_empty_dir() > d2 = ctx.no_dir() > > ctx.rmdir(d, err=ENOTEMPTY) > ctx.rename(d, d2) > ctx.rmdir(d, err=ENOENT) > ctx.rmdir(d2, err=ENOTEMPTY) > > ./run --ov rename-pop-dir > /mnt/a/no_dir103: Expected error (Directory not empty) was not produced > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > --- > fs/overlayfs/dir.c | 31 ++++++++++++++++++++++--------- > 1 file changed, 22 insertions(+), 9 deletions(-) > > diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c > index 065e021..d750ed9 100644 > --- a/fs/overlayfs/dir.c > +++ b/fs/overlayfs/dir.c > @@ -672,9 +672,18 @@ static int ovl_remove_upper(struct dentry *dentry, bool is_dir) > { > struct dentry *upperdir = ovl_dentry_upper(dentry->d_parent); > struct inode *dir = upperdir->d_inode; > - struct dentry *upper; > + struct dentry *upper = NULL; > + struct dentry *opaquedir = NULL; > int err; > > + /* Redirect dir can be !ovl_lower_positive && OVL_TYPE_MERGE */ > + if (is_dir && ovl_dentry_is_redirect(dentry)) { > + opaquedir = ovl_check_empty_and_clear(dentry); > + err = PTR_ERR(opaquedir); > + if (IS_ERR(opaquedir)) > + return err; > + } > + > inode_lock_nested(dir, I_MUTEX_PARENT); > upper = lookup_one_len(dentry->d_name.name, upperdir, > dentry->d_name.len); > @@ -683,14 +692,15 @@ static int ovl_remove_upper(struct dentry *dentry, bool is_dir) > goto out_unlock; > > err = -ESTALE; > - if (upper == ovl_dentry_upper(dentry)) { > - if (is_dir) > - err = vfs_rmdir(dir, upper); > - else > - err = vfs_unlink(dir, upper, NULL); > - ovl_dentry_version_inc(dentry->d_parent); > - } > - dput(upper); > + if ((opaquedir && upper != opaquedir) || > + upper != ovl_dentry_upper(dentry)) This will always error out for the opaquedir case. So it needs to be: if ((opaquedir && upper != opaquedir) || (!opaquedir && upper != ovl_dentry_upper(dentry))) Pushed to "redirect" branch with this fix. Thanks, Miklos ^ permalink raw reply [flat|nested] 52+ messages in thread
end of thread, other threads:[~2016-11-22 13:42 UTC | newest] Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-10-25 7:34 [PATCH 0/3] overlayfs: allow moving directory trees Miklos Szeredi 2016-10-25 7:34 ` [PATCH 1/3] ovl: check fs features Miklos Szeredi 2016-10-25 7:34 ` Miklos Szeredi 2016-10-25 11:24 ` Amir Goldstein 2016-11-05 20:40 ` Amir Goldstein 2016-10-25 7:34 ` [PATCH 2/3] vfs: export vfs_path_lookup() Miklos Szeredi 2016-10-25 7:34 ` [PATCH 3/3] ovl: redirect on rename-dir Miklos Szeredi 2016-10-25 11:57 ` Raphael Hertzog 2016-10-26 11:12 ` Miklos Szeredi 2016-10-28 12:56 ` Raphael Hertzog 2016-10-28 12:59 ` Miklos Szeredi 2016-11-06 19:14 ` Konstantin Khlebnikov 2016-11-07 8:07 ` Miklos Szeredi 2016-11-07 9:58 ` Konstantin Khlebnikov 2016-11-07 10:04 ` Miklos Szeredi 2016-11-07 10:08 ` Konstantin Khlebnikov 2016-11-07 13:38 ` Amir Goldstein 2016-11-10 22:56 ` Amir Goldstein 2016-11-11 9:46 ` Konstantin Khlebnikov 2016-11-11 10:06 ` Miklos Szeredi 2016-11-11 12:42 ` Amir Goldstein 2016-11-13 9:11 ` Amir Goldstein 2016-11-07 11:03 ` Raphael Hertzog 2016-11-07 11:31 ` Konstantin Khlebnikov 2016-11-07 13:42 ` Raphael Hertzog 2016-11-10 22:39 ` Miklos Szeredi 2016-11-11 9:41 ` Konstantin Khlebnikov 2016-11-13 10:00 ` Amir Goldstein 2016-11-14 16:25 ` Amir Goldstein 2016-11-16 22:00 ` Miklos Szeredi 2016-11-18 15:37 ` Amir Goldstein 2016-11-20 11:39 ` Amir Goldstein 2016-11-21 9:54 ` Miklos Szeredi 2016-11-21 10:13 ` Amir Goldstein 2016-11-21 10:16 ` Miklos Szeredi 2016-11-22 13:42 ` Amir Goldstein 2016-10-25 12:49 ` Amir Goldstein 2016-10-26 11:26 ` Miklos Szeredi 2016-10-26 12:11 ` Amir Goldstein 2016-10-26 12:51 ` Miklos Szeredi 2016-10-26 19:56 ` Amir Goldstein 2016-10-30 22:00 ` Amir Goldstein 2016-10-31 14:59 ` Miklos Szeredi 2016-10-31 15:02 ` Amir Goldstein 2016-10-28 16:15 ` Al Viro 2016-11-03 15:50 ` Miklos Szeredi 2016-11-04 9:29 ` Amir Goldstein 2016-11-04 13:48 ` Miklos Szeredi 2016-10-25 20:25 ` [PATCH 0/3] overlayfs: allow moving directory trees Amir Goldstein 2016-10-26 9:37 ` Amir Goldstein 2016-10-26 9:34 ` [PATCH] ovl: check for emptiness of redirect dir Amir Goldstein 2016-10-26 10:45 ` Miklos Szeredi
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.