Linux-NFS Archive on lore.kernel.org
 help / color / Atom feed
* [RFC] is ovl_fh->fid really intended to be misaligned?
@ 2019-11-14 15:47 Al Viro
  2019-11-14 17:02 ` J. Bruce Fields
  2019-11-14 19:55 ` Miklos Szeredi
  0 siblings, 2 replies; 8+ messages in thread
From: Al Viro @ 2019-11-14 15:47 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: linux-fsdevel, linux-nfs, J. Bruce Fields

AFAICS, this
        bytes = (fh->len - offsetof(struct ovl_fh, fid));
        real = exportfs_decode_fh(mnt, (struct fid *)fh->fid,
                                  bytes >> 2, (int)fh->type,
                                  connected ? ovl_acceptable : NULL, mnt);
in ovl_decode_real_fh() combined with
                origin = ovl_decode_real_fh(fh, ofs->lower_layers[i].mnt,
                                            connected);
in ovl_check_origin_fh(),
        /* First lookup overlay inode in inode cache by origin fh */
        err = ovl_check_origin_fh(ofs, fh, false, NULL, &stack);
in ovl_lower_fh_to_d() and
        struct ovl_fh *fh = (struct ovl_fh *) fid;
...
                 ovl_lower_fh_to_d(sb, fh);
in ovl_fh_to_dentry() leads to the pointer to struct fid passed to
exportfs_decode_fh() being 21 bytes ahead of that passed to
ovl_fh_to_dentry().  

However, alignment of struct fid pointers is 32 bits and quite a few
places dealing with those (including ->fh_to_dentry() instances)
do access them directly.  Argument of ->fh_to_dentry() is supposed
to be 32bit-aligned, and callers generally guarantee that.  Your
code, OTOH, violates the alignment systematically there - what
it passes to layers' ->fh_to_dentry() (by way of exportfs_decode_fh())
always has two lower bits different from what it got itself.

What do we do with that?  One solution would be to insert sane padding
in ovl_fh, but the damn thing appears to be stored as-is in xattrs on
disk, so that would require rather unpleasant operations reinserting
the padding on the fly ;-/

Another is to declare struct fid unaligned with explicit use of
__aligned in declaration and let all code normally dealing with
those pay the price.  Frankly, I don't like that - it's overlayfs
mess, so penalizing the much older users doesn't sound like a good idea.
Worse, any code that does (like overlayfs) cast the incoming
struct fid * to a pointer to its own struct will still be in
trouble - explicit cast is explicit cast, and it discards all
alignment information (as yours has done).

BTW, your ->encode_fh() appears to report the length greater than
the object it has returned...  Granted, the 3 overreported bytes
will be right after the end of 4n+1-byte kmalloc'ed area, so they
won't fall over the page boundary, but the values in those are left
uninitialized.  And they are sent in over-the-wire representations;
you ignore those on decode, but they are there.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] is ovl_fh->fid really intended to be misaligned?
  2019-11-14 15:47 [RFC] is ovl_fh->fid really intended to be misaligned? Al Viro
@ 2019-11-14 17:02 ` J. Bruce Fields
  2019-11-14 19:55 ` Miklos Szeredi
  1 sibling, 0 replies; 8+ messages in thread
From: J. Bruce Fields @ 2019-11-14 17:02 UTC (permalink / raw)
  To: Al Viro; +Cc: Amir Goldstein, linux-fsdevel, linux-nfs

On Thu, Nov 14, 2019 at 03:47:23PM +0000, Al Viro wrote:
> AFAICS, this
>         bytes = (fh->len - offsetof(struct ovl_fh, fid));
>         real = exportfs_decode_fh(mnt, (struct fid *)fh->fid,
>                                   bytes >> 2, (int)fh->type,
>                                   connected ? ovl_acceptable : NULL, mnt);
> in ovl_decode_real_fh() combined with
>                 origin = ovl_decode_real_fh(fh, ofs->lower_layers[i].mnt,
>                                             connected);
> in ovl_check_origin_fh(),
>         /* First lookup overlay inode in inode cache by origin fh */
>         err = ovl_check_origin_fh(ofs, fh, false, NULL, &stack);
> in ovl_lower_fh_to_d() and
>         struct ovl_fh *fh = (struct ovl_fh *) fid;
> ...
>                  ovl_lower_fh_to_d(sb, fh);
> in ovl_fh_to_dentry() leads to the pointer to struct fid passed to
> exportfs_decode_fh() being 21 bytes ahead of that passed to
> ovl_fh_to_dentry().  
> 
> However, alignment of struct fid pointers is 32 bits and quite a few
> places dealing with those (including ->fh_to_dentry() instances)
> do access them directly.  Argument of ->fh_to_dentry() is supposed
> to be 32bit-aligned, and callers generally guarantee that.  Your
> code, OTOH, violates the alignment systematically there - what
> it passes to layers' ->fh_to_dentry() (by way of exportfs_decode_fh())
> always has two lower bits different from what it got itself.
> 
> What do we do with that?  One solution would be to insert sane padding
> in ovl_fh, but the damn thing appears to be stored as-is in xattrs on
> disk, so that would require rather unpleasant operations reinserting
> the padding on the fly ;-/

Note that filehandles given to clients also have unlimited lifetimes, so
once we give them out we're committed to accepting them forever, at
least in theory.  The on-disk xattrs are probably the bigger deal,
though.

Would inserting the padding on the fly really be that bad?

> Another is to declare struct fid unaligned with explicit use of
> __aligned in declaration and let all code normally dealing with
> those pay the price.  Frankly, I don't like that - it's overlayfs
> mess, so penalizing the much older users doesn't sound like a good idea.
> Worse, any code that does (like overlayfs) cast the incoming
> struct fid * to a pointer to its own struct will still be in
> trouble - explicit cast is explicit cast, and it discards all
> alignment information (as yours has done).
> 
> BTW, your ->encode_fh() appears to report the length greater than
> the object it has returned...  Granted, the 3 overreported bytes
> will be right after the end of 4n+1-byte kmalloc'ed area, so they
> won't fall over the page boundary, but the values in those are left
> uninitialized.  And they are sent in over-the-wire representations;
> you ignore those on decode, but they are there.

So it's a minor information leak, at least.

--b.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] is ovl_fh->fid really intended to be misaligned?
  2019-11-14 15:47 [RFC] is ovl_fh->fid really intended to be misaligned? Al Viro
  2019-11-14 17:02 ` J. Bruce Fields
@ 2019-11-14 19:55 ` Miklos Szeredi
  2019-11-14 20:07   ` Amir Goldstein
  2019-11-14 20:55   ` Al Viro
  1 sibling, 2 replies; 8+ messages in thread
From: Miklos Szeredi @ 2019-11-14 19:55 UTC (permalink / raw)
  To: Al Viro; +Cc: Amir Goldstein, linux-fsdevel, linux-nfs, J. Bruce Fields

On Thu, Nov 14, 2019 at 03:47:23PM +0000, Al Viro wrote:
> AFAICS, this
>         bytes = (fh->len - offsetof(struct ovl_fh, fid));
>         real = exportfs_decode_fh(mnt, (struct fid *)fh->fid,
>                                   bytes >> 2, (int)fh->type,
>                                   connected ? ovl_acceptable : NULL, mnt);
> in ovl_decode_real_fh() combined with
>                 origin = ovl_decode_real_fh(fh, ofs->lower_layers[i].mnt,
>                                             connected);
> in ovl_check_origin_fh(),
>         /* First lookup overlay inode in inode cache by origin fh */
>         err = ovl_check_origin_fh(ofs, fh, false, NULL, &stack);
> in ovl_lower_fh_to_d() and
>         struct ovl_fh *fh = (struct ovl_fh *) fid;
> ...
>                  ovl_lower_fh_to_d(sb, fh);
> in ovl_fh_to_dentry() leads to the pointer to struct fid passed to
> exportfs_decode_fh() being 21 bytes ahead of that passed to
> ovl_fh_to_dentry().  
> 
> However, alignment of struct fid pointers is 32 bits and quite a few
> places dealing with those (including ->fh_to_dentry() instances)
> do access them directly.  Argument of ->fh_to_dentry() is supposed
> to be 32bit-aligned, and callers generally guarantee that.  Your
> code, OTOH, violates the alignment systematically there - what
> it passes to layers' ->fh_to_dentry() (by way of exportfs_decode_fh())
> always has two lower bits different from what it got itself.
> 
> What do we do with that?  One solution would be to insert sane padding
> in ovl_fh, but the damn thing appears to be stored as-is in xattrs on
> disk, so that would require rather unpleasant operations reinserting
> the padding on the fly ;-/

Something like this?  Totally untested...

Thanks,
Miklos

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index b801c6353100..60a4ca72cb4e 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -253,7 +253,7 @@ struct ovl_fh *ovl_encode_real_fh(struct dentry *real, bool is_upper)
 
 	BUILD_BUG_ON(MAX_HANDLE_SZ + offsetof(struct ovl_fh, fid) > 255);
 	fh_len = offsetof(struct ovl_fh, fid) + buflen;
-	fh = kmalloc(fh_len, GFP_KERNEL);
+	fh = kzalloc(fh_len, GFP_KERNEL);
 	if (!fh) {
 		fh = ERR_PTR(-ENOMEM);
 		goto out;
@@ -271,7 +271,7 @@ struct ovl_fh *ovl_encode_real_fh(struct dentry *real, bool is_upper)
 	 */
 	if (is_upper)
 		fh->flags |= OVL_FH_FLAG_PATH_UPPER;
-	fh->len = fh_len;
+	fh->len = fh_len - OVL_FH_WIRE_OFFSET;
 	fh->uuid = *uuid;
 	memcpy(fh->fid, buf, buflen);
 
@@ -300,7 +300,8 @@ int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
 	/*
 	 * Do not fail when upper doesn't support xattrs.
 	 */
-	err = ovl_check_setxattr(dentry, upper, OVL_XATTR_ORIGIN, fh,
+	err = ovl_check_setxattr(dentry, upper, OVL_XATTR_ORIGIN,
+				 fh ? OVL_FH_START(fh) : NULL,
 				 fh ? fh->len : 0, 0);
 	kfree(fh);
 
@@ -317,7 +318,8 @@ static int ovl_set_upper_fh(struct dentry *upper, struct dentry *index)
 	if (IS_ERR(fh))
 		return PTR_ERR(fh);
 
-	err = ovl_do_setxattr(index, OVL_XATTR_UPPER, fh, fh->len, 0);
+	err = ovl_do_setxattr(index, OVL_XATTR_UPPER,
+			      OVL_FH_START(fh), fh->len, 0);
 
 	kfree(fh);
 	return err;
diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index 73c9775215b3..dedda95c7746 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -234,7 +234,7 @@ static int ovl_d_to_fh(struct dentry *dentry, char *buf, int buflen)
 	if (fh->len > buflen)
 		goto fail;
 
-	memcpy(buf, (char *)fh, fh->len);
+	memcpy(buf, OVL_FH_START(fh), fh->len);
 	err = fh->len;
 
 out:
@@ -260,6 +260,7 @@ static int ovl_dentry_to_fh(struct dentry *dentry, u32 *fid, int *max_len)
 
 	/* Round up to dwords */
 	*max_len = (len + 3) >> 2;
+	memset(fid + len, 0, (*max_len << 2) - len);
 	return OVL_FILEID;
 }
 
@@ -781,7 +782,7 @@ static struct dentry *ovl_fh_to_dentry(struct super_block *sb, struct fid *fid,
 				       int fh_len, int fh_type)
 {
 	struct dentry *dentry = NULL;
-	struct ovl_fh *fh = (struct ovl_fh *) fid;
+	struct ovl_fh *fh = (void *) fid - OVL_FH_WIRE_OFFSET;
 	int len = fh_len << 2;
 	unsigned int flags = 0;
 	int err;
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index e9717c2f7d45..f22a65359df1 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -86,7 +86,8 @@ static int ovl_acceptable(void *ctx, struct dentry *dentry)
  */
 int ovl_check_fh_len(struct ovl_fh *fh, int fh_len)
 {
-	if (fh_len < sizeof(struct ovl_fh) || fh_len < fh->len)
+	if (fh_len < sizeof(struct ovl_fh) - OVL_FH_WIRE_OFFSET ||
+	    fh_len < fh->len)
 		return -EINVAL;
 
 	if (fh->magic != OVL_FH_MAGIC)
@@ -119,11 +120,11 @@ static struct ovl_fh *ovl_get_fh(struct dentry *dentry, const char *name)
 	if (res == 0)
 		return NULL;
 
-	fh = kzalloc(res, GFP_KERNEL);
+	fh = kzalloc(res + OVL_FH_WIRE_OFFSET, GFP_KERNEL);
 	if (!fh)
 		return ERR_PTR(-ENOMEM);
 
-	res = vfs_getxattr(dentry, name, fh, res);
+	res = vfs_getxattr(dentry, name, fh + OVL_FH_WIRE_OFFSET, res);
 	if (res < 0)
 		goto fail;
 
@@ -161,7 +162,7 @@ struct dentry *ovl_decode_real_fh(struct ovl_fh *fh, struct vfsmount *mnt,
 	if (!uuid_equal(&fh->uuid, &mnt->mnt_sb->s_uuid))
 		return NULL;
 
-	bytes = (fh->len - offsetof(struct ovl_fh, fid));
+	bytes = (fh->len + OVL_FH_WIRE_OFFSET - offsetof(struct ovl_fh, fid));
 	real = exportfs_decode_fh(mnt, (struct fid *)fh->fid,
 				  bytes >> 2, (int)fh->type,
 				  connected ? ovl_acceptable : NULL, mnt);
@@ -433,7 +434,8 @@ int ovl_verify_set_fh(struct dentry *dentry, const char *name,
 
 	err = ovl_verify_fh(dentry, name, fh);
 	if (set && err == -ENODATA)
-		err = ovl_do_setxattr(dentry, name, fh, fh->len, 0);
+		err = ovl_do_setxattr(dentry, name,
+				      OVL_FH_START(fh), fh->len, 0);
 	if (err)
 		goto fail;
 
@@ -512,12 +514,12 @@ int ovl_verify_index(struct ovl_fs *ofs, struct dentry *index)
 
 	err = -ENOMEM;
 	len = index->d_name.len / 2;
-	fh = kzalloc(len, GFP_KERNEL);
+	fh = kzalloc(len + OVL_FH_WIRE_OFFSET, GFP_KERNEL);
 	if (!fh)
 		goto fail;
 
 	err = -EINVAL;
-	if (hex2bin((u8 *)fh, index->d_name.name, len))
+	if (hex2bin(OVL_FH_START(fh), index->d_name.name, len))
 		goto fail;
 
 	err = ovl_check_fh_len(fh, len);
@@ -603,7 +605,7 @@ static int ovl_get_index_name_fh(struct ovl_fh *fh, struct qstr *name)
 	if (!n)
 		return -ENOMEM;
 
-	s  = bin2hex(n, fh, fh->len);
+	s  = bin2hex(n, OVL_FH_START(fh), fh->len);
 	*name = (struct qstr) QSTR_INIT(n, s - n);
 
 	return 0;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 6934bcf030f0..c62083671a12 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -74,8 +74,13 @@ enum ovl_entry_flag {
 /* The type returned by overlay exportfs ops when encoding an ovl_fh handle */
 #define OVL_FILEID	0xfb
 
-/* On-disk and in-memeory format for redirect by file handle */
+#define OVL_FH_WIRE_OFFSET 3
+#define OVL_FH_START(fh) ((void *)(fh) + OVL_FH_WIRE_OFFSET)
 struct ovl_fh {
+	/* make sure fid is 32bit aligned */
+	u8 padding[OVL_FH_WIRE_OFFSET];
+
+	/* Wire/xattr encoding begins here*/
 	u8 version;	/* 0 */
 	u8 magic;	/* 0xfb */
 	u8 len;		/* size of this header + size of fid */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] is ovl_fh->fid really intended to be misaligned?
  2019-11-14 19:55 ` Miklos Szeredi
@ 2019-11-14 20:07   ` Amir Goldstein
  2019-11-14 23:13     ` Amir Goldstein
  2019-11-14 20:55   ` Al Viro
  1 sibling, 1 reply; 8+ messages in thread
From: Amir Goldstein @ 2019-11-14 20:07 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Al Viro, linux-fsdevel, Linux NFS Mailing List, J. Bruce Fields

On Thu, Nov 14, 2019 at 9:55 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Thu, Nov 14, 2019 at 03:47:23PM +0000, Al Viro wrote:
> > AFAICS, this
> >         bytes = (fh->len - offsetof(struct ovl_fh, fid));
> >         real = exportfs_decode_fh(mnt, (struct fid *)fh->fid,
> >                                   bytes >> 2, (int)fh->type,
> >                                   connected ? ovl_acceptable : NULL, mnt);
> > in ovl_decode_real_fh() combined with
> >                 origin = ovl_decode_real_fh(fh, ofs->lower_layers[i].mnt,
> >                                             connected);
> > in ovl_check_origin_fh(),
> >         /* First lookup overlay inode in inode cache by origin fh */
> >         err = ovl_check_origin_fh(ofs, fh, false, NULL, &stack);
> > in ovl_lower_fh_to_d() and
> >         struct ovl_fh *fh = (struct ovl_fh *) fid;
> > ...
> >                  ovl_lower_fh_to_d(sb, fh);
> > in ovl_fh_to_dentry() leads to the pointer to struct fid passed to
> > exportfs_decode_fh() being 21 bytes ahead of that passed to
> > ovl_fh_to_dentry().
> >
> > However, alignment of struct fid pointers is 32 bits and quite a few
> > places dealing with those (including ->fh_to_dentry() instances)
> > do access them directly.  Argument of ->fh_to_dentry() is supposed
> > to be 32bit-aligned, and callers generally guarantee that.  Your
> > code, OTOH, violates the alignment systematically there - what
> > it passes to layers' ->fh_to_dentry() (by way of exportfs_decode_fh())
> > always has two lower bits different from what it got itself.
> >
> > What do we do with that?  One solution would be to insert sane padding
> > in ovl_fh, but the damn thing appears to be stored as-is in xattrs on
> > disk, so that would require rather unpleasant operations reinserting
> > the padding on the fly ;-/
>
> Something like this?  Totally untested...
>

I was going to suggest something similar using

struct ovl_fhv1 {
        u8 pad[3];
        struct ovl_fh fhv0;
} __packed;

New overlayfs exported file handles on-wire could be ovl_fhv1,
but we can easily convert old ovl_fhv to ovl_fhv1
on-the-fly on decode (if we care about those few users at all)

xattrs would still be stored and read as ovl_fh v0.

Thanks,
Amir.

>
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index b801c6353100..60a4ca72cb4e 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -253,7 +253,7 @@ struct ovl_fh *ovl_encode_real_fh(struct dentry *real, bool is_upper)
>
>         BUILD_BUG_ON(MAX_HANDLE_SZ + offsetof(struct ovl_fh, fid) > 255);
>         fh_len = offsetof(struct ovl_fh, fid) + buflen;
> -       fh = kmalloc(fh_len, GFP_KERNEL);
> +       fh = kzalloc(fh_len, GFP_KERNEL);
>         if (!fh) {
>                 fh = ERR_PTR(-ENOMEM);
>                 goto out;
> @@ -271,7 +271,7 @@ struct ovl_fh *ovl_encode_real_fh(struct dentry *real, bool is_upper)
>          */
>         if (is_upper)
>                 fh->flags |= OVL_FH_FLAG_PATH_UPPER;
> -       fh->len = fh_len;
> +       fh->len = fh_len - OVL_FH_WIRE_OFFSET;
>         fh->uuid = *uuid;
>         memcpy(fh->fid, buf, buflen);
>
> @@ -300,7 +300,8 @@ int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
>         /*
>          * Do not fail when upper doesn't support xattrs.
>          */
> -       err = ovl_check_setxattr(dentry, upper, OVL_XATTR_ORIGIN, fh,
> +       err = ovl_check_setxattr(dentry, upper, OVL_XATTR_ORIGIN,
> +                                fh ? OVL_FH_START(fh) : NULL,
>                                  fh ? fh->len : 0, 0);
>         kfree(fh);
>
> @@ -317,7 +318,8 @@ static int ovl_set_upper_fh(struct dentry *upper, struct dentry *index)
>         if (IS_ERR(fh))
>                 return PTR_ERR(fh);
>
> -       err = ovl_do_setxattr(index, OVL_XATTR_UPPER, fh, fh->len, 0);
> +       err = ovl_do_setxattr(index, OVL_XATTR_UPPER,
> +                             OVL_FH_START(fh), fh->len, 0);
>
>         kfree(fh);
>         return err;
> diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
> index 73c9775215b3..dedda95c7746 100644
> --- a/fs/overlayfs/export.c
> +++ b/fs/overlayfs/export.c
> @@ -234,7 +234,7 @@ static int ovl_d_to_fh(struct dentry *dentry, char *buf, int buflen)
>         if (fh->len > buflen)
>                 goto fail;
>
> -       memcpy(buf, (char *)fh, fh->len);
> +       memcpy(buf, OVL_FH_START(fh), fh->len);
>         err = fh->len;
>
>  out:
> @@ -260,6 +260,7 @@ static int ovl_dentry_to_fh(struct dentry *dentry, u32 *fid, int *max_len)
>
>         /* Round up to dwords */
>         *max_len = (len + 3) >> 2;
> +       memset(fid + len, 0, (*max_len << 2) - len);
>         return OVL_FILEID;
>  }
>
> @@ -781,7 +782,7 @@ static struct dentry *ovl_fh_to_dentry(struct super_block *sb, struct fid *fid,
>                                        int fh_len, int fh_type)
>  {
>         struct dentry *dentry = NULL;
> -       struct ovl_fh *fh = (struct ovl_fh *) fid;
> +       struct ovl_fh *fh = (void *) fid - OVL_FH_WIRE_OFFSET;
>         int len = fh_len << 2;
>         unsigned int flags = 0;
>         int err;
> diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
> index e9717c2f7d45..f22a65359df1 100644
> --- a/fs/overlayfs/namei.c
> +++ b/fs/overlayfs/namei.c
> @@ -86,7 +86,8 @@ static int ovl_acceptable(void *ctx, struct dentry *dentry)
>   */
>  int ovl_check_fh_len(struct ovl_fh *fh, int fh_len)
>  {
> -       if (fh_len < sizeof(struct ovl_fh) || fh_len < fh->len)
> +       if (fh_len < sizeof(struct ovl_fh) - OVL_FH_WIRE_OFFSET ||
> +           fh_len < fh->len)
>                 return -EINVAL;
>
>         if (fh->magic != OVL_FH_MAGIC)
> @@ -119,11 +120,11 @@ static struct ovl_fh *ovl_get_fh(struct dentry *dentry, const char *name)
>         if (res == 0)
>                 return NULL;
>
> -       fh = kzalloc(res, GFP_KERNEL);
> +       fh = kzalloc(res + OVL_FH_WIRE_OFFSET, GFP_KERNEL);
>         if (!fh)
>                 return ERR_PTR(-ENOMEM);
>
> -       res = vfs_getxattr(dentry, name, fh, res);
> +       res = vfs_getxattr(dentry, name, fh + OVL_FH_WIRE_OFFSET, res);
>         if (res < 0)
>                 goto fail;
>
> @@ -161,7 +162,7 @@ struct dentry *ovl_decode_real_fh(struct ovl_fh *fh, struct vfsmount *mnt,
>         if (!uuid_equal(&fh->uuid, &mnt->mnt_sb->s_uuid))
>                 return NULL;
>
> -       bytes = (fh->len - offsetof(struct ovl_fh, fid));
> +       bytes = (fh->len + OVL_FH_WIRE_OFFSET - offsetof(struct ovl_fh, fid));
>         real = exportfs_decode_fh(mnt, (struct fid *)fh->fid,
>                                   bytes >> 2, (int)fh->type,
>                                   connected ? ovl_acceptable : NULL, mnt);
> @@ -433,7 +434,8 @@ int ovl_verify_set_fh(struct dentry *dentry, const char *name,
>
>         err = ovl_verify_fh(dentry, name, fh);
>         if (set && err == -ENODATA)
> -               err = ovl_do_setxattr(dentry, name, fh, fh->len, 0);
> +               err = ovl_do_setxattr(dentry, name,
> +                                     OVL_FH_START(fh), fh->len, 0);
>         if (err)
>                 goto fail;
>
> @@ -512,12 +514,12 @@ int ovl_verify_index(struct ovl_fs *ofs, struct dentry *index)
>
>         err = -ENOMEM;
>         len = index->d_name.len / 2;
> -       fh = kzalloc(len, GFP_KERNEL);
> +       fh = kzalloc(len + OVL_FH_WIRE_OFFSET, GFP_KERNEL);
>         if (!fh)
>                 goto fail;
>
>         err = -EINVAL;
> -       if (hex2bin((u8 *)fh, index->d_name.name, len))
> +       if (hex2bin(OVL_FH_START(fh), index->d_name.name, len))
>                 goto fail;
>
>         err = ovl_check_fh_len(fh, len);
> @@ -603,7 +605,7 @@ static int ovl_get_index_name_fh(struct ovl_fh *fh, struct qstr *name)
>         if (!n)
>                 return -ENOMEM;
>
> -       s  = bin2hex(n, fh, fh->len);
> +       s  = bin2hex(n, OVL_FH_START(fh), fh->len);
>         *name = (struct qstr) QSTR_INIT(n, s - n);
>
>         return 0;
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index 6934bcf030f0..c62083671a12 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -74,8 +74,13 @@ enum ovl_entry_flag {
>  /* The type returned by overlay exportfs ops when encoding an ovl_fh handle */
>  #define OVL_FILEID     0xfb
>
> -/* On-disk and in-memeory format for redirect by file handle */
> +#define OVL_FH_WIRE_OFFSET 3
> +#define OVL_FH_START(fh) ((void *)(fh) + OVL_FH_WIRE_OFFSET)
>  struct ovl_fh {
> +       /* make sure fid is 32bit aligned */
> +       u8 padding[OVL_FH_WIRE_OFFSET];
> +
> +       /* Wire/xattr encoding begins here*/
>         u8 version;     /* 0 */
>         u8 magic;       /* 0xfb */
>         u8 len;         /* size of this header + size of fid */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] is ovl_fh->fid really intended to be misaligned?
  2019-11-14 19:55 ` Miklos Szeredi
  2019-11-14 20:07   ` Amir Goldstein
@ 2019-11-14 20:55   ` Al Viro
  1 sibling, 0 replies; 8+ messages in thread
From: Al Viro @ 2019-11-14 20:55 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Amir Goldstein, linux-fsdevel, linux-nfs, J. Bruce Fields

On Thu, Nov 14, 2019 at 08:55:44PM +0100, Miklos Szeredi wrote:

>  	BUILD_BUG_ON(MAX_HANDLE_SZ + offsetof(struct ovl_fh, fid) > 255);
>  	fh_len = offsetof(struct ovl_fh, fid) + buflen;

IOW, 3 bytes longer now.  OK...

> -	fh = kmalloc(fh_len, GFP_KERNEL);
> +	fh = kzalloc(fh_len, GFP_KERNEL);
>  	if (!fh) {
>  		fh = ERR_PTR(-ENOMEM);
>  		goto out;
> @@ -271,7 +271,7 @@ struct ovl_fh *ovl_encode_real_fh(struct dentry *real, bool is_upper)
>  	 */
>  	if (is_upper)
>  		fh->flags |= OVL_FH_FLAG_PATH_UPPER;
> -	fh->len = fh_len;
> +	fh->len = fh_len - OVL_FH_WIRE_OFFSET;

... same value as before

>  	fh->uuid = *uuid;
>  	memcpy(fh->fid, buf, buflen);

Is there any point to two allocations + memcpy here?  It's not as if you kept those
ovl_fh instances allocated for a long time - the caller frees them shortly.  So
why bother with buf at all?  Just allocate your fh max-sized and bloody pass &fh->fid
to exportfs_encode_fh() there.  I would suggest this for declaration, if you want
to pad it in front:
struct ovl_fh {
	u8 __pad[OVL_FH_WIRE_OFFSET];
        u8 version;     /* 0 */
        u8 magic;       /* 0xfb */
        u8 len;         /* size of this header + size of fid */
        u8 flags;       /* OVL_FH_FLAG_* */
        u8 type;        /* fid_type of fid */
        uuid_t uuid;    /* uuid of filesystem */
	union {
		struct fid fid;      /* file identifier */
		u8 storage[];
	};
} __packed;
with BUILD_BUG_ON(offsetof(struct ovl_fh, fid) % 4); somewhere in there.
Size calculation would be
        fh_len = offsetof(struct ovl_fh, storage[MAX_HANDLE_SIZE]);


And check that resulting fhandle size does *not* exceed 32 words, just to make sure.

> @@ -234,7 +234,7 @@ static int ovl_d_to_fh(struct dentry *dentry, char *buf, int buflen)
>  	if (fh->len > buflen)
>  		goto fail;
>  
> -	memcpy(buf, (char *)fh, fh->len);
> +	memcpy(buf, OVL_FH_START(fh), fh->len);
>  	err = fh->len;

Incidentally, memcpy() doesn't need any casts - it takes any data pointer types
just fine...

>  out:
> @@ -260,6 +260,7 @@ static int ovl_dentry_to_fh(struct dentry *dentry, u32 *fid, int *max_len)
>  
>  	/* Round up to dwords */
>  	*max_len = (len + 3) >> 2;
> +	memset(fid + len, 0, (*max_len << 2) - len);

No.  This is broken - fid is u32 pointer.  What you want here is

memcpy(fid, OVL_FH_START(fh), fh->len);
memset((char *)fid + fh->len, 0, OVL_FH_START(fh), OVL_FH_WIRE_OFFSET);
len = fh->len + OVL_FH_WIRE_OFFSET;

in the d_to_fh part, then just have *max_len = len / 4;

Or just do the max-sized allocation for fh and have ovl_encode_real_fh()
zero everything past fh->len; then this would just be a plain memcpy() and
to hell with separate memset()...

Incidentally, I really don't understand why these three functions separated
from each other.  Why not do all of that in ovl_encode_fh()?  AFAICS, the
logics gets simpler than way.


>  	return OVL_FILEID;
>  }


> @@ -781,7 +782,7 @@ static struct dentry *ovl_fh_to_dentry(struct super_block *sb, struct fid *fid,
>  				       int fh_len, int fh_type)
>  {
>  	struct dentry *dentry = NULL;
> -	struct ovl_fh *fh = (struct ovl_fh *) fid;
> +	struct ovl_fh *fh = (void *) fid - OVL_FH_WIRE_OFFSET;

Not enough, I'm afraid...  Look what happens when you get to passing the
payload to ovl_decode_real_fh().  You pass a misaligned pointer (1 mod 4)
in there, then an equally misaligned pointer is passed to exportfs_decode_fh().
You really need to move that sucker here - the underlying fhandle is really
misaligned in what gets passed to you.


> @@ -119,11 +120,11 @@ static struct ovl_fh *ovl_get_fh(struct dentry *dentry, const char *name)
>  	if (res == 0)
>  		return NULL;
>  
> -	fh = kzalloc(res, GFP_KERNEL);
> +	fh = kzalloc(res + OVL_FH_WIRE_OFFSET, GFP_KERNEL);
>  	if (!fh)
>  		return ERR_PTR(-ENOMEM);
>  
> -	res = vfs_getxattr(dentry, name, fh, res);
> +	res = vfs_getxattr(dentry, name, fh + OVL_FH_WIRE_OFFSET, res);
>  	if (res < 0)
>  		goto fail;

BTW, is there any point in precisely-sized allocations here?  Again,
the result won't live long, and we know the upper limit on the size,
so why bother with two vfs_getxattr() calls, etc?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] is ovl_fh->fid really intended to be misaligned?
  2019-11-14 20:07   ` Amir Goldstein
@ 2019-11-14 23:13     ` Amir Goldstein
  2019-11-14 23:49       ` Al Viro
  0 siblings, 1 reply; 8+ messages in thread
From: Amir Goldstein @ 2019-11-14 23:13 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Al Viro, linux-fsdevel, Linux NFS Mailing List, J. Bruce Fields,
	overlayfs

[-- Attachment #1: Type: text/plain, Size: 2785 bytes --]

On Thu, Nov 14, 2019 at 10:07 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Nov 14, 2019 at 9:55 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Thu, Nov 14, 2019 at 03:47:23PM +0000, Al Viro wrote:
> > > AFAICS, this
> > >         bytes = (fh->len - offsetof(struct ovl_fh, fid));
> > >         real = exportfs_decode_fh(mnt, (struct fid *)fh->fid,
> > >                                   bytes >> 2, (int)fh->type,
> > >                                   connected ? ovl_acceptable : NULL, mnt);
> > > in ovl_decode_real_fh() combined with
> > >                 origin = ovl_decode_real_fh(fh, ofs->lower_layers[i].mnt,
> > >                                             connected);
> > > in ovl_check_origin_fh(),
> > >         /* First lookup overlay inode in inode cache by origin fh */
> > >         err = ovl_check_origin_fh(ofs, fh, false, NULL, &stack);
> > > in ovl_lower_fh_to_d() and
> > >         struct ovl_fh *fh = (struct ovl_fh *) fid;
> > > ...
> > >                  ovl_lower_fh_to_d(sb, fh);
> > > in ovl_fh_to_dentry() leads to the pointer to struct fid passed to
> > > exportfs_decode_fh() being 21 bytes ahead of that passed to
> > > ovl_fh_to_dentry().
> > >
> > > However, alignment of struct fid pointers is 32 bits and quite a few
> > > places dealing with those (including ->fh_to_dentry() instances)
> > > do access them directly.  Argument of ->fh_to_dentry() is supposed
> > > to be 32bit-aligned, and callers generally guarantee that.  Your
> > > code, OTOH, violates the alignment systematically there - what
> > > it passes to layers' ->fh_to_dentry() (by way of exportfs_decode_fh())
> > > always has two lower bits different from what it got itself.
> > >
> > > What do we do with that?  One solution would be to insert sane padding
> > > in ovl_fh, but the damn thing appears to be stored as-is in xattrs on
> > > disk, so that would require rather unpleasant operations reinserting
> > > the padding on the fly ;-/
> >
> > Something like this?  Totally untested...
> >
>
> I was going to suggest something similar using
>
> struct ovl_fhv1 {
>         u8 pad[3];
>         struct ovl_fh fhv0;
> } __packed;
>
> New overlayfs exported file handles on-wire could be ovl_fhv1,
> but we can easily convert old ovl_fhv to ovl_fhv1
> on-the-fly on decode (if we care about those few users at all)
>
> xattrs would still be stored and read as ovl_fh v0.
>

See attached.
IMHO it looks much easier to verify that these changes are correct
compared to your open coded offset shifting all over the place.

It even passed the exportfs tests first try.
Only some index tests are failing.

If you like this version, I can fix up the failures and add Al's
suggestions to simplify code with OVL_FH_MAX_SIZE
memory allocations.

Thanks,
Amir.

[-- Attachment #2: 0001-ovl-make-sure-real-file-handle-is-32bit-aligned-in-m.patch.txt --]
[-- Type: text/plain, Size: 11043 bytes --]

From af5b954f17a219e9a00eebbac605f0cdffaf179d Mon Sep 17 00:00:00 2001
From: Amir Goldstein <amir73il@gmail.com>
Date: Fri, 15 Nov 2019 00:54:33 +0200
Subject: [PATCH] ovl: make sure real file handle is 32bit aligned in memory

WIP

Seprate on-disk encoding from in-memory and on-wire resresentation
of overlay file handle.

In-memory and on-wire we only ever pass around pointers to struct
ovl_fh, which encapsulates at offset 3 the on-disk format struct
ovl_fb. struct ovl_fb encapsulates at offset 21 the real file handle.
That makes sure that the real file handle is always 32bit aligned
in-memory when passed down to the underlying filesystem.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c   | 28 ++++++++++++++-------------
 fs/overlayfs/export.c    | 22 ++++++++++-----------
 fs/overlayfs/namei.c     | 42 ++++++++++++++++++++--------------------
 fs/overlayfs/overlayfs.h | 29 ++++++++++++++++++++++-----
 4 files changed, 70 insertions(+), 51 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index b801c6353100..7f94f3223710 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -251,18 +251,20 @@ struct ovl_fh *ovl_encode_real_fh(struct dentry *real, bool is_upper)
 	    WARN_ON(fh_type == FILEID_INVALID))
 		goto out;
 
-	BUILD_BUG_ON(MAX_HANDLE_SZ + offsetof(struct ovl_fh, fid) > 255);
-	fh_len = offsetof(struct ovl_fh, fid) + buflen;
+	/* Make sure the real fid stays 32bit aligned */
+	BUILD_BUG_ON(OVL_FH_FID_OFFSET % 4);
+	BUILD_BUG_ON(MAX_HANDLE_SZ + OVL_FH_FID_OFFSET > 255);
+	fh_len = OVL_FH_FID_OFFSET + buflen;
 	fh = kmalloc(fh_len, GFP_KERNEL);
 	if (!fh) {
 		fh = ERR_PTR(-ENOMEM);
 		goto out;
 	}
 
-	fh->version = OVL_FH_VERSION;
-	fh->magic = OVL_FH_MAGIC;
-	fh->type = fh_type;
-	fh->flags = OVL_FH_FLAG_CPU_ENDIAN;
+	fh->fb.version = OVL_FH_VERSION;
+	fh->fb.magic = OVL_FH_MAGIC;
+	fh->fb.type = fh_type;
+	fh->fb.flags = OVL_FH_FLAG_CPU_ENDIAN;
 	/*
 	 * When we will want to decode an overlay dentry from this handle
 	 * and all layers are on the same fs, if we get a disconncted real
@@ -270,10 +272,10 @@ struct ovl_fh *ovl_encode_real_fh(struct dentry *real, bool is_upper)
 	 * it to upperdentry or to lowerstack is by checking this flag.
 	 */
 	if (is_upper)
-		fh->flags |= OVL_FH_FLAG_PATH_UPPER;
-	fh->len = fh_len;
-	fh->uuid = *uuid;
-	memcpy(fh->fid, buf, buflen);
+		fh->fb.flags |= OVL_FH_FLAG_PATH_UPPER;
+	fh->fb.len = fh_len - OVL_FH_WIRE_OFFSET;
+	fh->fb.uuid = *uuid;
+	memcpy(fh->fb.fid, buf, buflen);
 
 out:
 	kfree(buf);
@@ -300,8 +302,8 @@ int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
 	/*
 	 * Do not fail when upper doesn't support xattrs.
 	 */
-	err = ovl_check_setxattr(dentry, upper, OVL_XATTR_ORIGIN, fh,
-				 fh ? fh->len : 0, 0);
+	err = ovl_check_setxattr(dentry, upper, OVL_XATTR_ORIGIN, fh->buf,
+				 fh ? fh->fb.len : 0, 0);
 	kfree(fh);
 
 	return err;
@@ -317,7 +319,7 @@ static int ovl_set_upper_fh(struct dentry *upper, struct dentry *index)
 	if (IS_ERR(fh))
 		return PTR_ERR(fh);
 
-	err = ovl_do_setxattr(index, OVL_XATTR_UPPER, fh, fh->len, 0);
+	err = ovl_do_setxattr(index, OVL_XATTR_UPPER, fh->buf, fh->fb.len, 0);
 
 	kfree(fh);
 	return err;
diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index 73c9775215b3..7408ac2f3a04 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -231,11 +231,12 @@ static int ovl_d_to_fh(struct dentry *dentry, char *buf, int buflen)
 		return PTR_ERR(fh);
 
 	err = -EOVERFLOW;
-	if (fh->len > buflen)
+	if (fh->fb.len + OVL_FH_WIRE_OFFSET > buflen)
 		goto fail;
 
-	memcpy(buf, (char *)fh, fh->len);
-	err = fh->len;
+	buflen = fh->fb.len + OVL_FH_WIRE_OFFSET;
+	memcpy(buf, fh, buflen);
+	err = buflen;
 
 out:
 	kfree(fh);
@@ -243,8 +244,8 @@ static int ovl_d_to_fh(struct dentry *dentry, char *buf, int buflen)
 
 fail:
 	pr_warn_ratelimited("overlayfs: failed to encode file handle (%pd2, err=%i, buflen=%d, len=%d, type=%d)\n",
-			    dentry, err, buflen, fh ? (int)fh->len : 0,
-			    fh ? fh->type : 0);
+			    dentry, err, buflen, fh ? (int)fh->fb.len : 0,
+			    fh ? fh->fb.type : 0);
 	goto out;
 }
 
@@ -256,11 +257,8 @@ static int ovl_dentry_to_fh(struct dentry *dentry, u32 *fid, int *max_len)
 	if (res <= 0)
 		return FILEID_INVALID;
 
-	len = res;
-
-	/* Round up to dwords */
-	*max_len = (len + 3) >> 2;
-	return OVL_FILEID;
+	*max_len = res >> 2;
+	return OVL_FILEID_V1;
 }
 
 static int ovl_encode_fh(struct inode *inode, u32 *fid, int *max_len,
@@ -787,14 +785,14 @@ static struct dentry *ovl_fh_to_dentry(struct super_block *sb, struct fid *fid,
 	int err;
 
 	err = -EINVAL;
-	if (fh_type != OVL_FILEID)
+	if (fh_type != OVL_FILEID_V1)
 		goto out_err;
 
 	err = ovl_check_fh_len(fh, len);
 	if (err)
 		goto out_err;
 
-	flags = fh->flags;
+	flags = fh->fb.flags;
 	dentry = (flags & OVL_FH_FLAG_PATH_UPPER) ?
 		 ovl_upper_fh_to_d(sb, fh) :
 		 ovl_lower_fh_to_d(sb, fh);
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index f47c591402d7..0d193faa740c 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -84,21 +84,21 @@ static int ovl_acceptable(void *ctx, struct dentry *dentry)
  * Return -ENODATA for "origin unknown".
  * Return <0 for an invalid file handle.
  */
-int ovl_check_fh_len(struct ovl_fh *fh, int fh_len)
+int ovl_check_fb_len(struct ovl_fb *fb, int fb_len)
 {
-	if (fh_len < sizeof(struct ovl_fh) || fh_len < fh->len)
+	if (fb_len < sizeof(struct ovl_fb) || fb_len < fb->len)
 		return -EINVAL;
 
-	if (fh->magic != OVL_FH_MAGIC)
+	if (fb->magic != OVL_FH_MAGIC)
 		return -EINVAL;
 
 	/* Treat larger version and unknown flags as "origin unknown" */
-	if (fh->version > OVL_FH_VERSION || fh->flags & ~OVL_FH_FLAG_ALL)
+	if (fb->version > OVL_FH_VERSION || fb->flags & ~OVL_FH_FLAG_ALL)
 		return -ENODATA;
 
 	/* Treat endianness mismatch as "origin unknown" */
-	if (!(fh->flags & OVL_FH_FLAG_ANY_ENDIAN) &&
-	    (fh->flags & OVL_FH_FLAG_BIG_ENDIAN) != OVL_FH_FLAG_CPU_ENDIAN)
+	if (!(fb->flags & OVL_FH_FLAG_ANY_ENDIAN) &&
+	    (fb->flags & OVL_FH_FLAG_BIG_ENDIAN) != OVL_FH_FLAG_CPU_ENDIAN)
 		return -ENODATA;
 
 	return 0;
@@ -119,15 +119,15 @@ static struct ovl_fh *ovl_get_fh(struct dentry *dentry, const char *name)
 	if (res == 0)
 		return NULL;
 
-	fh = kzalloc(res, GFP_KERNEL);
+	fh = kzalloc(res + OVL_FH_WIRE_OFFSET, GFP_KERNEL);
 	if (!fh)
 		return ERR_PTR(-ENOMEM);
 
-	res = vfs_getxattr(dentry, name, fh, res);
+	res = vfs_getxattr(dentry, name, fh->buf, res);
 	if (res < 0)
 		goto fail;
 
-	err = ovl_check_fh_len(fh, res);
+	err = ovl_check_fb_len(&fh->fb, res);
 	if (err < 0) {
 		if (err == -ENODATA)
 			goto out;
@@ -158,12 +158,12 @@ struct dentry *ovl_decode_real_fh(struct ovl_fh *fh, struct vfsmount *mnt,
 	 * Make sure that the stored uuid matches the uuid of the lower
 	 * layer where file handle will be decoded.
 	 */
-	if (!uuid_equal(&fh->uuid, &mnt->mnt_sb->s_uuid))
+	if (!uuid_equal(&fh->fb.uuid, &mnt->mnt_sb->s_uuid))
 		return NULL;
 
-	bytes = (fh->len - offsetof(struct ovl_fh, fid));
-	real = exportfs_decode_fh(mnt, (struct fid *)fh->fid,
-				  bytes >> 2, (int)fh->type,
+	bytes = (fh->fb.len - offsetof(struct ovl_fb, fid));
+	real = exportfs_decode_fh(mnt, (struct fid *)fh->fb.fid,
+				  bytes >> 2, (int)fh->fb.type,
 				  connected ? ovl_acceptable : NULL, mnt);
 	if (IS_ERR(real)) {
 		/*
@@ -173,7 +173,7 @@ struct dentry *ovl_decode_real_fh(struct ovl_fh *fh, struct vfsmount *mnt,
 		 * index entries correctly.
 		 */
 		if (real == ERR_PTR(-ESTALE) &&
-		    !(fh->flags & OVL_FH_FLAG_PATH_UPPER))
+		    !(fh->fb.flags & OVL_FH_FLAG_PATH_UPPER))
 			real = NULL;
 		return real;
 	}
@@ -410,7 +410,7 @@ static int ovl_verify_fh(struct dentry *dentry, const char *name,
 	if (IS_ERR(ofh))
 		return PTR_ERR(ofh);
 
-	if (fh->len != ofh->len || memcmp(fh, ofh, fh->len))
+	if (fh->fb.len != ofh->fb.len || memcmp(fh->buf, ofh->buf, fh->fb.len))
 		err = -ESTALE;
 
 	kfree(ofh);
@@ -441,7 +441,7 @@ int ovl_verify_set_fh(struct dentry *dentry, const char *name,
 
 	err = ovl_verify_fh(dentry, name, fh);
 	if (set && err == -ENODATA)
-		err = ovl_do_setxattr(dentry, name, fh, fh->len, 0);
+		err = ovl_do_setxattr(dentry, name, fh->buf, fh->fb.len, 0);
 	if (err)
 		goto fail;
 
@@ -515,17 +515,17 @@ int ovl_verify_index(struct ovl_fs *ofs, struct dentry *index)
 		goto fail;
 
 	err = -EINVAL;
-	if (index->d_name.len < sizeof(struct ovl_fh)*2)
+	if (index->d_name.len < sizeof(struct ovl_fb)*2)
 		goto fail;
 
 	err = -ENOMEM;
 	len = index->d_name.len / 2;
-	fh = kzalloc(len, GFP_KERNEL);
+	fh = kzalloc(len + OVL_FH_WIRE_OFFSET, GFP_KERNEL);
 	if (!fh)
 		goto fail;
 
 	err = -EINVAL;
-	if (hex2bin((u8 *)fh, index->d_name.name, len))
+	if (hex2bin(fh->buf, index->d_name.name, len))
 		goto fail;
 
 	err = ovl_check_fh_len(fh, len);
@@ -607,11 +607,11 @@ static int ovl_get_index_name_fh(struct ovl_fh *fh, struct qstr *name)
 {
 	char *n, *s;
 
-	n = kcalloc(fh->len, 2, GFP_KERNEL);
+	n = kcalloc(fh->fb.len, 2, GFP_KERNEL);
 	if (!n)
 		return -ENOMEM;
 
-	s  = bin2hex(n, fh, fh->len);
+	s  = bin2hex(n, fh->buf, fh->fb.len);
 	*name = (struct qstr) QSTR_INIT(n, s - n);
 
 	return 0;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 6934bcf030f0..a1ccfbe99b50 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -71,11 +71,12 @@ enum ovl_entry_flag {
 #error Endianness not defined
 #endif
 
-/* The type returned by overlay exportfs ops when encoding an ovl_fh handle */
-#define OVL_FILEID	0xfb
+/* The type returned by overlay exportfs ops when encoding ovl_fh handles */
+#define OVL_FILEID_V0	0xfb
+#define OVL_FILEID_V1	0xfa
 
-/* On-disk and in-memeory format for redirect by file handle */
-struct ovl_fh {
+/* On-disk format for "origin" file handle */
+struct ovl_fb {
 	u8 version;	/* 0 */
 	u8 magic;	/* 0xfb */
 	u8 len;		/* size of this header + size of fid */
@@ -85,6 +86,18 @@ struct ovl_fh {
 	u8 fid[0];	/* file identifier */
 } __packed;
 
+/* In-memory and on-wire format for overlay file handle */
+struct ovl_fh {
+	u8 padding[3];	/* make sure fb.fid is 32bit aligned */
+	union {
+		struct ovl_fb fb;
+		char buf[0];
+	};
+} __packed;
+
+#define OVL_FH_WIRE_OFFSET offsetof(struct ovl_fh, fb)
+#define OVL_FH_FID_OFFSET (OVL_FH_WIRE_OFFSET + offsetof(struct ovl_fb, fid))
+
 static inline int ovl_do_rmdir(struct inode *dir, struct dentry *dentry)
 {
 	int err = vfs_rmdir(dir, dentry);
@@ -302,7 +315,13 @@ static inline void ovl_inode_unlock(struct inode *inode)
 
 
 /* namei.c */
-int ovl_check_fh_len(struct ovl_fh *fh, int fh_len);
+int ovl_check_fb_len(struct ovl_fb *fb, int fb_len);
+
+static inline int ovl_check_fh_len(struct ovl_fh *fh, int fh_len)
+{
+	return ovl_check_fb_len(&fh->fb, fh_len - OVL_FH_WIRE_OFFSET);
+}
+
 struct dentry *ovl_decode_real_fh(struct ovl_fh *fh, struct vfsmount *mnt,
 				  bool connected);
 int ovl_check_origin_fh(struct ovl_fs *ofs, struct ovl_fh *fh, bool connected,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] is ovl_fh->fid really intended to be misaligned?
  2019-11-14 23:13     ` Amir Goldstein
@ 2019-11-14 23:49       ` Al Viro
  2019-11-15  6:07         ` Amir Goldstein
  0 siblings, 1 reply; 8+ messages in thread
From: Al Viro @ 2019-11-14 23:49 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, linux-fsdevel, Linux NFS Mailing List,
	J. Bruce Fields, overlayfs

On Fri, Nov 15, 2019 at 01:13:15AM +0200, Amir Goldstein wrote:

> See attached.
> IMHO it looks much easier to verify that these changes are correct
> compared to your open coded offset shifting all over the place.
> 
> It even passed the exportfs tests first try.
> Only some index tests are failing.
> 
> If you like this version, I can fix up the failures and add Al's
> suggestions to simplify code with OVL_FH_MAX_SIZE
> memory allocations.

Huh?  Correct me if I'm wrong, but doesn't that patch make it
reject all old fhandles just on the type check?  That includes
anything sent over the wire, or stored in xattrs, or represented
as names in indexdir...

_If_ we can change the fhandle layout at will, everything's
easy - you can add padding in any way you like.  But fhandles
are a lot more permanent than this approach would require -
mere rebooting the server into a new kernel must not make the
clients see -ESTALE on everything they'd imported from it!
And then there's implied "you can throw indexdir contents at
any time", which is also not a good thing to do.

Sure, introducing a variant with better layout would be nice,
and using a new type for it is pretty much required, but
you can't just discard old-layout fhandles you'd ever issued.

I'm afraid it's a lot stickier than you think; decisions on
fhandle layout are nearly as permanent as those on the
storage layouts for a filesystem.  Especially in case of
overlayfs, where they are literally a part of the storage
layout - the names in indexdir and the contents of
trusted.overlay.upper xattr on subdirectories in there...

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] is ovl_fh->fid really intended to be misaligned?
  2019-11-14 23:49       ` Al Viro
@ 2019-11-15  6:07         ` Amir Goldstein
  0 siblings, 0 replies; 8+ messages in thread
From: Amir Goldstein @ 2019-11-15  6:07 UTC (permalink / raw)
  To: Al Viro
  Cc: Miklos Szeredi, linux-fsdevel, Linux NFS Mailing List,
	J. Bruce Fields, overlayfs

On Fri, Nov 15, 2019 at 1:49 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Fri, Nov 15, 2019 at 01:13:15AM +0200, Amir Goldstein wrote:
>
> > See attached.
> > IMHO it looks much easier to verify that these changes are correct
> > compared to your open coded offset shifting all over the place.
> >
> > It even passed the exportfs tests first try.
> > Only some index tests are failing.
> >
> > If you like this version, I can fix up the failures and add Al's
> > suggestions to simplify code with OVL_FH_MAX_SIZE
> > memory allocations.
>
> Huh?  Correct me if I'm wrong, but doesn't that patch make it
> reject all old fhandles just on the type check?  That includes
> anything sent over the wire, or stored in xattrs, or represented
> as names in indexdir...

That's not what the patch does.
It does reject OVL_FILEID_V0 in ovl_fh_to_dentry(), but
that can easily be fixed to convert them on-the-fly.
I just left this part out.

The patch does not change on-disk format, not in xattr
and not in index names - or not intentionally anyway, see:

ovl_get_fh():
...
// reading xattr into aligned buffer at offset 3
        res = vfs_getxattr(dentry, name, fh->buf, res);
        if (res < 0)
                goto fail;
...
// checking old v0 on-disk format
        err = ovl_check_fb_len(&fh->fb, res);
...

ovl_verify_set_fh():
...
// encode correctly aligned buffer
        fh = ovl_encode_real_fh(real, is_upper);
...
// store old format into xattr
                err = ovl_do_setxattr(dentry, name, fh->buf, fh->fb.len, 0);

Similarly in ovl_get_index_name_fh():
// here was a bug
        n = kcalloc(fh->fb.len + OVL_FH_WIRE_OFFSET, 2, GFP_KERNEL);
...
// hex encode the old format
        s  = bin2hex(n, fh->buf, fh->fb.len);

>
> _If_ we can change the fhandle layout at will, everything's
> easy - you can add padding in any way you like.  But fhandles
> are a lot more permanent than this approach would require -
> mere rebooting the server into a new kernel must not make the
> clients see -ESTALE on everything they'd imported from it!
> And then there's implied "you can throw indexdir contents at
> any time", which is also not a good thing to do.
>

I think I understand the confusion.
There is no requirement that the file handles stored on-disk
will be the same format as those exported on wire.
The file handles coming from wire are rarely compared
with those on-disk and vice versa. IIRC, there is a single
place that uses an fh from wire to lookup on-disk:

ovl_lower_fh_to_d():
...
         /* Then lookup indexed upper/whiteout by origin fh */
        if (ofs->indexdir) {
                index = ovl_get_index_fh(ofs, fh);

But that is the exception.
Most of the time, the on-disk fh are used to make sure that
union inode numbers are persistently unique across the overlay.

> Sure, introducing a variant with better layout would be nice,
> and using a new type for it is pretty much required, but
> you can't just discard old-layout fhandles you'd ever issued.
>

In truth, we have ample evidence that ovl nfs export is not
widely in use. We know that docker and such have disabled
index because it exposed mount leak bugs that they have.
And we got too few nfs export bug reports (single reporter
IIRC).
But still, as I said, it is quite easy to respect OVL_FILEID_V0
file handles that were handed out to nfs clients after upgrade.

Unless I am still missing something...

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, back to index

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-14 15:47 [RFC] is ovl_fh->fid really intended to be misaligned? Al Viro
2019-11-14 17:02 ` J. Bruce Fields
2019-11-14 19:55 ` Miklos Szeredi
2019-11-14 20:07   ` Amir Goldstein
2019-11-14 23:13     ` Amir Goldstein
2019-11-14 23:49       ` Al Viro
2019-11-15  6:07         ` Amir Goldstein
2019-11-14 20:55   ` Al Viro

Linux-NFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nfs/0 linux-nfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nfs linux-nfs/ https://lore.kernel.org/linux-nfs \
		linux-nfs@vger.kernel.org
	public-inbox-index linux-nfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-nfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git