All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] NFS: allow name_to_handle_at() to work for Amazon EFS.
@ 2017-11-30 20:56 NeilBrown
  2017-12-04  3:27 ` [PATCH] fhandle: avoid -EINVAL if requested size is too large NeilBrown
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: NeilBrown @ 2017-11-30 20:56 UTC (permalink / raw)
  To: Trond Myklebust, Anna.Schumaker, Al Viro, Andrew Morton, Linus Torvalds
  Cc: lkml, linux-nfs, linux-fsdevel, Lennart Poettering

[-- Attachment #1: Type: text/plain, Size: 2172 bytes --]


Amazon EFS provides an NFSv4.1 filesystem which appears to use
(close to) full length (128 byte) file handles.
This causes the handle reported by name_to_handle_at() to exceed
MAX_HANDLE_SZ, resulting in
  EOVERFLOW if 128 bytes were requested, or
  EINVAL if the size reported by the previous call was requested.

To fix this, increase MAX_HANDLE_SIZE a little, and add a BUILD_BUG
so that this sort of inconsistent error reporting won't happen again.

Link: https://github.com/systemd/systemd/issues/7082#issuecomment-347380436
Signed-off-by: NeilBrown <neilb@suse.com>
---

Sorry for the scatter-gun To: list.  This is really more of a VFS patch
than an NFS patch and sending patches in either direction has been a bit
hit-and-miss lately, so I'm hoping Andrew will take it unless someone
else objects or beats him to it...

Thanks,
NeilBrown

 fs/nfs/export.c          | 2 ++
 include/linux/exportfs.h | 7 +++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/export.c b/fs/nfs/export.c
index 83fd09fc8f77..23b2fc3ab2bb 100644
--- a/fs/nfs/export.c
+++ b/fs/nfs/export.c
@@ -39,6 +39,8 @@ nfs_encode_fh(struct inode *inode, __u32 *p, int *max_len, struct inode *parent)
 	size_t fh_size = offsetof(struct nfs_fh, data) + server_fh->size;
 	int len = EMBED_FH_OFF + XDR_QUADLEN(fh_size);
 
+	BUILD_BUG_ON_MSG(EMBED_FH_OFF + NFS_MAXFHSIZE > MAX_HANDLE_SZ,
+		"MAX_HANDLE_SZ is too small");
 	dprintk("%s: max fh len %d inode %p parent %p",
 		__func__, *max_len, inode, parent);
 
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index 0d3037419bc7..e7ab1b071c5e 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -11,8 +11,11 @@ struct iomap;
 struct super_block;
 struct vfsmount;
 
-/* limit the handle size to NFSv4 handle size now */
-#define MAX_HANDLE_SZ 128
+/* Must be larger than NFSv4 file handle, but small
+ * enough for an on-stack allocation. overlayfs doesn't
+ * want this too close to 255.
+ */
+#define MAX_HANDLE_SZ 200
 
 /*
  * The fileid_type identifies how the file within the filesystem is encoded.
-- 
2.14.0.rc0.dirty


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH] fhandle: avoid -EINVAL if requested size is too large.
  2017-11-30 20:56 [PATCH] NFS: allow name_to_handle_at() to work for Amazon EFS NeilBrown
@ 2017-12-04  3:27 ` NeilBrown
  2017-12-06 19:05 ` [PATCH] NFS: allow name_to_handle_at() to work for Amazon EFS J. Bruce Fields
  2017-12-07  2:07 ` Linus Torvalds
  2 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2017-12-04  3:27 UTC (permalink / raw)
  To: Trond Myklebust, Anna.Schumaker, Al Viro, Andrew Morton, Linus Torvalds
  Cc: lkml, linux-nfs, linux-fsdevel, Lennart Poettering

[-- Attachment #1: Type: text/plain, Size: 1883 bytes --]


Lennart Poettering observes that if the newly increased
MAX_HANDLE_SZ is exported to user space, and then used in an
application running on an old kernel, name_to_handle_at() will
report -EINVAL, which is unhelpful and inconsistent with
the documentation.

So:
 1/ add a comment making it clear that the new value must
    not be exposed to user-space.
 2/ remove the completely unnecessary restriction on the size
    of buffer provided by the application.

Reported-by: Lennart Poettering <lennart@poettering.net>
Signed-off-by: NeilBrown <neilb@suse.com>
---

This is a followup to the previous fhandle-for-Amazon-EFS
patch which adds further refinements.

Thanks,
NeilBrown


 fs/fhandle.c             | 2 +-
 include/linux/exportfs.h | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/fhandle.c b/fs/fhandle.c
index 0ace128f5d23..30bf1c20e143 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -35,7 +35,7 @@ static long do_sys_name_to_handle(struct path *path,
 		return -EFAULT;
 
 	if (f_handle.handle_bytes > MAX_HANDLE_SZ)
-		return -EINVAL;
+		f_handle.handle_bytes = MAX_HANDLE_SZ;
 
 	handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_bytes,
 			 GFP_KERNEL);
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index e7ab1b071c5e..097704e80c89 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -14,6 +14,10 @@ struct vfsmount;
 /* Must be larger than NFSv4 file handle, but small
  * enough for an on-stack allocation. overlayfs doesn't
  * want this too close to 255.
+ * NOTE: This value MUST NOT be exported to user-space.
+ * Applications must only ever see MAX_HANDLE_SZ == 128.
+ * If they try a larger number on older kernels, they
+ * will get -EINVAL which will be confusing.
  */
 #define MAX_HANDLE_SZ 200
 
-- 
2.14.0.rc0.dirty


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] NFS: allow name_to_handle_at() to work for Amazon EFS.
  2017-11-30 20:56 [PATCH] NFS: allow name_to_handle_at() to work for Amazon EFS NeilBrown
  2017-12-04  3:27 ` [PATCH] fhandle: avoid -EINVAL if requested size is too large NeilBrown
@ 2017-12-06 19:05 ` J. Bruce Fields
  2017-12-07  2:07 ` Linus Torvalds
  2 siblings, 0 replies; 12+ messages in thread
From: J. Bruce Fields @ 2017-12-06 19:05 UTC (permalink / raw)
  To: NeilBrown
  Cc: Trond Myklebust, Anna.Schumaker, Al Viro, Andrew Morton,
	Linus Torvalds, lkml, linux-nfs, linux-fsdevel,
	Lennart Poettering

On Fri, Dec 01, 2017 at 07:56:38AM +1100, NeilBrown wrote:
> 
> Amazon EFS provides an NFSv4.1 filesystem which appears to use
> (close to) full length (128 byte) file handles.

I wonder why?

Anyway, it seems  unfortunate that systemd should need to worry about
filehandle sizes at all, given that (as far as I can tell) all it's
really calling name_to_handle_at() for is the mount_id, so it can
recognize mountpoints.

Userland NFS servers were a major motivation for the filehandle code.
And filehandles larger than 128 bytes aren't much use to them.
Hopefully they'll be able to identify the problem and fail early on.

Oh well.  ACK to both patches from me.

--b.

> This causes the handle reported by name_to_handle_at() to exceed
> MAX_HANDLE_SZ, resulting in
>   EOVERFLOW if 128 bytes were requested, or
>   EINVAL if the size reported by the previous call was requested.
> 
> To fix this, increase MAX_HANDLE_SIZE a little, and add a BUILD_BUG
> so that this sort of inconsistent error reporting won't happen again.
> 
> Link: https://github.com/systemd/systemd/issues/7082#issuecomment-347380436
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
> 
> Sorry for the scatter-gun To: list.  This is really more of a VFS patch
> than an NFS patch and sending patches in either direction has been a bit
> hit-and-miss lately, so I'm hoping Andrew will take it unless someone
> else objects or beats him to it...
> 
> Thanks,
> NeilBrown
> 
>  fs/nfs/export.c          | 2 ++
>  include/linux/exportfs.h | 7 +++++--
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/nfs/export.c b/fs/nfs/export.c
> index 83fd09fc8f77..23b2fc3ab2bb 100644
> --- a/fs/nfs/export.c
> +++ b/fs/nfs/export.c
> @@ -39,6 +39,8 @@ nfs_encode_fh(struct inode *inode, __u32 *p, int *max_len, struct inode *parent)
>  	size_t fh_size = offsetof(struct nfs_fh, data) + server_fh->size;
>  	int len = EMBED_FH_OFF + XDR_QUADLEN(fh_size);
>  
> +	BUILD_BUG_ON_MSG(EMBED_FH_OFF + NFS_MAXFHSIZE > MAX_HANDLE_SZ,
> +		"MAX_HANDLE_SZ is too small");
>  	dprintk("%s: max fh len %d inode %p parent %p",
>  		__func__, *max_len, inode, parent);
>  
> diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
> index 0d3037419bc7..e7ab1b071c5e 100644
> --- a/include/linux/exportfs.h
> +++ b/include/linux/exportfs.h
> @@ -11,8 +11,11 @@ struct iomap;
>  struct super_block;
>  struct vfsmount;
>  
> -/* limit the handle size to NFSv4 handle size now */
> -#define MAX_HANDLE_SZ 128
> +/* Must be larger than NFSv4 file handle, but small
> + * enough for an on-stack allocation. overlayfs doesn't
> + * want this too close to 255.
> + */
> +#define MAX_HANDLE_SZ 200
>  
>  /*
>   * The fileid_type identifies how the file within the filesystem is encoded.
> -- 
> 2.14.0.rc0.dirty
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] NFS: allow name_to_handle_at() to work for Amazon EFS.
  2017-11-30 20:56 [PATCH] NFS: allow name_to_handle_at() to work for Amazon EFS NeilBrown
  2017-12-04  3:27 ` [PATCH] fhandle: avoid -EINVAL if requested size is too large NeilBrown
  2017-12-06 19:05 ` [PATCH] NFS: allow name_to_handle_at() to work for Amazon EFS J. Bruce Fields
@ 2017-12-07  2:07 ` Linus Torvalds
  2017-12-07  3:20   ` NeilBrown
  2 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2017-12-07  2:07 UTC (permalink / raw)
  To: NeilBrown
  Cc: Trond Myklebust, Anna Schumaker, Al Viro, Andrew Morton, lkml,
	linux-nfs, linux-fsdevel, Lennart Poettering

On Thu, Nov 30, 2017 at 12:56 PM, NeilBrown <neilb@suse.com> wrote:
>
> -/* limit the handle size to NFSv4 handle size now */
> -#define MAX_HANDLE_SZ 128
> +/* Must be larger than NFSv4 file handle, but small
> + * enough for an on-stack allocation. overlayfs doesn't
> + * want this too close to 255.
> + */
> +#define MAX_HANDLE_SZ 200

This really smells for so many reasons.

Also, that really is starting to be a fairly big stack allocation, and
it seems to be used in exactly one place (show_mark_fhandle), which
makes me go "why is that on the stack anyway?".

Could we just allocate a buffer at open time or something?

               Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] NFS: allow name_to_handle_at() to work for Amazon EFS.
  2017-12-07  2:07 ` Linus Torvalds
@ 2017-12-07  3:20   ` NeilBrown
  2017-12-07  4:04     ` Amir Goldstein
  2017-12-07  5:34     ` Matthew Wilcox
  0 siblings, 2 replies; 12+ messages in thread
From: NeilBrown @ 2017-12-07  3:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Trond Myklebust, Anna Schumaker, Al Viro, Andrew Morton, lkml,
	linux-nfs, linux-fsdevel, Lennart Poettering

[-- Attachment #1: Type: text/plain, Size: 5602 bytes --]

On Wed, Dec 06 2017, Linus Torvalds wrote:

> On Thu, Nov 30, 2017 at 12:56 PM, NeilBrown <neilb@suse.com> wrote:
>>
>> -/* limit the handle size to NFSv4 handle size now */
>> -#define MAX_HANDLE_SZ 128
>> +/* Must be larger than NFSv4 file handle, but small
>> + * enough for an on-stack allocation. overlayfs doesn't
>> + * want this too close to 255.
>> + */
>> +#define MAX_HANDLE_SZ 200
>
> This really smells for so many reasons.
>
> Also, that really is starting to be a fairly big stack allocation, and
> it seems to be used in exactly one place (show_mark_fhandle), which
> makes me go "why is that on the stack anyway?".
>
> Could we just allocate a buffer at open time or something?
>
>                Linus

"open time" would be when /proc/X/fdinfo/Y was opened in
seq_fdinfo_open(), and allocating a file_handle there seems a bit odd.

We can allocate in fs/notify/fdinfo.c:show_fdinfo() which is
the earliest 'notify' specific code to run.  There is no
opportunity to return an error but GFP_KERNEL allocations under 1 page
never fail..

This patch allocates a single buffer for all inodes reported for a given
inotify fdinfo, and if the allocation files, the filehandle is silently
left blank.  More surgery would be needed to be able to return an error.

Is that at all suitable?

Thanks,
NeilBrown

From: NeilBrown <neilb@suse.com>
Subject: fs/notify: don't put file handle buffer on stack.

A file handle buffer is not tiny, and could need to be larger in future,
so it isn't safe to allocate one on the stack.  Instead, we need to
kmalloc().

There is no way to return an error status from a ->show_fdinfo()
function, so if the kmalloc fails, we silently exclude the filehandle
from the output.  As it is at the end of line, this shouldn't
upset parsing too much.

Signed-off-by: NeilBrown <neilb@suse.com>

diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
index d478629c728b..20d863b9ae16 100644
--- a/fs/notify/fdinfo.c
+++ b/fs/notify/fdinfo.c
@@ -23,56 +23,58 @@
 
 static void show_fdinfo(struct seq_file *m, struct file *f,
 			void (*show)(struct seq_file *m,
-				     struct fsnotify_mark *mark))
+				     struct fsnotify_mark *mark,
+				     struct fid *fh))
 {
 	struct fsnotify_group *group = f->private_data;
 	struct fsnotify_mark *mark;
+	struct fid *fh = kmalloc(MAX_HANDLE_SZ, GFP_KERNEL);
 
 	mutex_lock(&group->mark_mutex);
 	list_for_each_entry(mark, &group->marks_list, g_list) {
-		show(m, mark);
+		show(m, mark, fh);
 		if (seq_has_overflowed(m))
 			break;
 	}
 	mutex_unlock(&group->mark_mutex);
+	kfree(fh);
 }
 
 #if defined(CONFIG_EXPORTFS)
-static void show_mark_fhandle(struct seq_file *m, struct inode *inode)
+static void show_mark_fhandle(struct seq_file *m, struct inode *inode,
+			      struct fid *fhbuf)
 {
-	struct {
-		struct file_handle handle;
-		u8 pad[MAX_HANDLE_SZ];
-	} f;
 	int size, ret, i;
+	unsigned char *bytes;
 
-	f.handle.handle_bytes = sizeof(f.pad);
-	size = f.handle.handle_bytes >> 2;
+	if (!fhbuf)
+		return;
+	size = MAX_HANDLE_SZ >> 2;
 
-	ret = exportfs_encode_inode_fh(inode, (struct fid *)f.handle.f_handle, &size, 0);
+	ret = exportfs_encode_inode_fh(inode, fhbuf, &size, 0);
 	if ((ret == FILEID_INVALID) || (ret < 0)) {
 		WARN_ONCE(1, "Can't encode file handler for inotify: %d\n", ret);
 		return;
 	}
 
-	f.handle.handle_type = ret;
-	f.handle.handle_bytes = size * sizeof(u32);
-
-	seq_printf(m, "fhandle-bytes:%x fhandle-type:%x f_handle:",
-		   f.handle.handle_bytes, f.handle.handle_type);
+	seq_printf(m, "fhandle-bytes:%zx fhandle-type:%x f_handle:",
+		   size * sizeof(u32), ret);
 
-	for (i = 0; i < f.handle.handle_bytes; i++)
-		seq_printf(m, "%02x", (int)f.handle.f_handle[i]);
+	bytes = (unsigned char *)(fhbuf->raw);
+	for (i = 0; i < size * sizeof(u32); i++)
+		seq_printf(m, "%02x", bytes[i]);
 }
 #else
-static void show_mark_fhandle(struct seq_file *m, struct inode *inode)
+static void show_mark_fhandle(struct seq_file *m, struct inode *inode,
+			      struct fid *fhbuf)
 {
 }
 #endif
 
 #ifdef CONFIG_INOTIFY_USER
 
-static void inotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
+static void inotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark,
+			   struct fid *fhbuf)
 {
 	struct inotify_inode_mark *inode_mark;
 	struct inode *inode;
@@ -93,7 +95,7 @@ static void inotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
 		seq_printf(m, "inotify wd:%x ino:%lx sdev:%x mask:%x ignored_mask:%x ",
 			   inode_mark->wd, inode->i_ino, inode->i_sb->s_dev,
 			   mask, mark->ignored_mask);
-		show_mark_fhandle(m, inode);
+		show_mark_fhandle(m, inode, fhbuf);
 		seq_putc(m, '\n');
 		iput(inode);
 	}
@@ -108,7 +110,8 @@ void inotify_show_fdinfo(struct seq_file *m, struct file *f)
 
 #ifdef CONFIG_FANOTIFY
 
-static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
+static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark,
+			    struct fid *fhbuf)
 {
 	unsigned int mflags = 0;
 	struct inode *inode;
@@ -123,7 +126,7 @@ static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
 		seq_printf(m, "fanotify ino:%lx sdev:%x mflags:%x mask:%x ignored_mask:%x ",
 			   inode->i_ino, inode->i_sb->s_dev,
 			   mflags, mark->mask, mark->ignored_mask);
-		show_mark_fhandle(m, inode);
+		show_mark_fhandle(m, inode, fhbuf);
 		seq_putc(m, '\n');
 		iput(inode);
 	} else if (mark->connector->flags & FSNOTIFY_OBJ_TYPE_VFSMOUNT) {

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] NFS: allow name_to_handle_at() to work for Amazon EFS.
  2017-12-07  3:20   ` NeilBrown
@ 2017-12-07  4:04     ` Amir Goldstein
  2017-12-08  2:17       ` NeilBrown
  2017-12-07  5:34     ` Matthew Wilcox
  1 sibling, 1 reply; 12+ messages in thread
From: Amir Goldstein @ 2017-12-07  4:04 UTC (permalink / raw)
  To: NeilBrown
  Cc: Linus Torvalds, Trond Myklebust, Anna Schumaker, Al Viro,
	Andrew Morton, lkml, linux-nfs, linux-fsdevel,
	Lennart Poettering, Pavel Emelyanov, Jan Kara

On Thu, Dec 7, 2017 at 5:20 AM, NeilBrown <neilb@suse.com> wrote:
> On Wed, Dec 06 2017, Linus Torvalds wrote:
>
>> On Thu, Nov 30, 2017 at 12:56 PM, NeilBrown <neilb@suse.com> wrote:
>>>
>>> -/* limit the handle size to NFSv4 handle size now */
>>> -#define MAX_HANDLE_SZ 128
>>> +/* Must be larger than NFSv4 file handle, but small
>>> + * enough for an on-stack allocation. overlayfs doesn't
>>> + * want this too close to 255.
>>> + */
>>> +#define MAX_HANDLE_SZ 200
>>
>> This really smells for so many reasons.
>>
>> Also, that really is starting to be a fairly big stack allocation, and
>> it seems to be used in exactly one place (show_mark_fhandle), which
>> makes me go "why is that on the stack anyway?".
>>
>> Could we just allocate a buffer at open time or something?
>>
>>                Linus
>
> "open time" would be when /proc/X/fdinfo/Y was opened in
> seq_fdinfo_open(), and allocating a file_handle there seems a bit odd.
>
> We can allocate in fs/notify/fdinfo.c:show_fdinfo() which is
> the earliest 'notify' specific code to run.  There is no
> opportunity to return an error but GFP_KERNEL allocations under 1 page
> never fail..
>
> This patch allocates a single buffer for all inodes reported for a given
> inotify fdinfo, and if the allocation files, the filehandle is silently
> left blank.  More surgery would be needed to be able to return an error.
>
> Is that at all suitable?
>
> Thanks,
> NeilBrown
>
> From: NeilBrown <neilb@suse.com>
> Subject: fs/notify: don't put file handle buffer on stack.
>
> A file handle buffer is not tiny, and could need to be larger in future,
> so it isn't safe to allocate one on the stack.  Instead, we need to
> kmalloc().
>
> There is no way to return an error status from a ->show_fdinfo()
> function, so if the kmalloc fails, we silently exclude the filehandle
> from the output.  As it is at the end of line, this shouldn't
> upset parsing too much.

It shouldn't upset parsing because that would be the same out
output as without CONFIG_EXPORTFS. AFAIK this information
is used by CRUI.

>
> Signed-off-by: NeilBrown <neilb@suse.com>
>
> diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
> index d478629c728b..20d863b9ae16 100644
> --- a/fs/notify/fdinfo.c
> +++ b/fs/notify/fdinfo.c
> @@ -23,56 +23,58 @@
>
>  static void show_fdinfo(struct seq_file *m, struct file *f,
>                         void (*show)(struct seq_file *m,
> -                                    struct fsnotify_mark *mark))
> +                                    struct fsnotify_mark *mark,
> +                                    struct fid *fh))
>  {
>         struct fsnotify_group *group = f->private_data;
>         struct fsnotify_mark *mark;
> +       struct fid *fh = kmalloc(MAX_HANDLE_SZ, GFP_KERNEL);
>
>         mutex_lock(&group->mark_mutex);
>         list_for_each_entry(mark, &group->marks_list, g_list) {
> -               show(m, mark);
> +               show(m, mark, fh);
>                 if (seq_has_overflowed(m))
>                         break;
>         }
>         mutex_unlock(&group->mark_mutex);
> +       kfree(fh);
>  }
>
>  #if defined(CONFIG_EXPORTFS)
> -static void show_mark_fhandle(struct seq_file *m, struct inode *inode)
> +static void show_mark_fhandle(struct seq_file *m, struct inode *inode,
> +                             struct fid *fhbuf)
>  {
> -       struct {
> -               struct file_handle handle;
> -               u8 pad[MAX_HANDLE_SZ];
> -       } f;
>         int size, ret, i;
> +       unsigned char *bytes;
>
> -       f.handle.handle_bytes = sizeof(f.pad);
> -       size = f.handle.handle_bytes >> 2;
> +       if (!fhbuf)
> +               return;
> +       size = MAX_HANDLE_SZ >> 2;
>
> -       ret = exportfs_encode_inode_fh(inode, (struct fid *)f.handle.f_handle, &size, 0);
> +       ret = exportfs_encode_inode_fh(inode, fhbuf, &size, 0);
>         if ((ret == FILEID_INVALID) || (ret < 0)) {
>                 WARN_ONCE(1, "Can't encode file handler for inotify: %d\n", ret);

This WARN_ONCE is out of order. It is perfectly valid for inotify/fanotify
to watch over fs that doesn't support exportfs. Care to clean it up?
Perhaps a pr_warn_ratelimited() for either !fhbuf or can't encode?

Cheers,
Amir.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] NFS: allow name_to_handle_at() to work for Amazon EFS.
  2017-12-07  3:20   ` NeilBrown
  2017-12-07  4:04     ` Amir Goldstein
@ 2017-12-07  5:34     ` Matthew Wilcox
  1 sibling, 0 replies; 12+ messages in thread
From: Matthew Wilcox @ 2017-12-07  5:34 UTC (permalink / raw)
  To: NeilBrown
  Cc: Linus Torvalds, Trond Myklebust, Anna Schumaker, Al Viro,
	Andrew Morton, lkml, linux-nfs, linux-fsdevel,
	Lennart Poettering

On Thu, Dec 07, 2017 at 02:20:05PM +1100, NeilBrown wrote:
> We can allocate in fs/notify/fdinfo.c:show_fdinfo() which is
> the earliest 'notify' specific code to run.  There is no
> opportunity to return an error but GFP_KERNEL allocations under 1 page
> never fail..

"never"

 * The default allocator behavior depends on the request size. We have a concept
 * of so called costly allocations (with order > PAGE_ALLOC_COSTLY_ORDER).
 * !costly allocations are too essential to fail so they are implicitly
 * non-failing by default (with some exceptions like OOM victims might fail so
 * the caller still has to check for failures) while costly requests try to be
 * not disruptive and back off even without invoking the OOM killer.
 * The following three modifiers might be used to override some of these
 * implicit rules

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] NFS: allow name_to_handle_at() to work for Amazon EFS.
  2017-12-07  4:04     ` Amir Goldstein
@ 2017-12-08  2:17       ` NeilBrown
  2017-12-19 12:42           ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: NeilBrown @ 2017-12-08  2:17 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Linus Torvalds, Trond Myklebust, Anna Schumaker, Al Viro,
	Andrew Morton, lkml, linux-nfs, linux-fsdevel,
	Lennart Poettering, Pavel Emelyanov, Jan Kara

[-- Attachment #1: Type: text/plain, Size: 5019 bytes --]

On Thu, Dec 07 2017, Amir Goldstein wrote:

> On Thu, Dec 7, 2017 at 5:20 AM, NeilBrown <neilb@suse.com> wrote:
>> On Wed, Dec 06 2017, Linus Torvalds wrote:
>>
>>> On Thu, Nov 30, 2017 at 12:56 PM, NeilBrown <neilb@suse.com> wrote:
>>>>
>>>> -/* limit the handle size to NFSv4 handle size now */
>>>> -#define MAX_HANDLE_SZ 128
>>>> +/* Must be larger than NFSv4 file handle, but small
>>>> + * enough for an on-stack allocation. overlayfs doesn't
>>>> + * want this too close to 255.
>>>> + */
>>>> +#define MAX_HANDLE_SZ 200
>>>
>>> This really smells for so many reasons.
>>>
>>> Also, that really is starting to be a fairly big stack allocation, and
>>> it seems to be used in exactly one place (show_mark_fhandle), which
>>> makes me go "why is that on the stack anyway?".
>>>
>>> Could we just allocate a buffer at open time or something?
>>>
>>>                Linus
>>
>> "open time" would be when /proc/X/fdinfo/Y was opened in
>> seq_fdinfo_open(), and allocating a file_handle there seems a bit odd.
>>
>> We can allocate in fs/notify/fdinfo.c:show_fdinfo() which is
>> the earliest 'notify' specific code to run.  There is no
>> opportunity to return an error but GFP_KERNEL allocations under 1 page
>> never fail..
>>
>> This patch allocates a single buffer for all inodes reported for a given
>> inotify fdinfo, and if the allocation files, the filehandle is silently
>> left blank.  More surgery would be needed to be able to return an error.
>>
>> Is that at all suitable?
>>
>> Thanks,
>> NeilBrown
>>
>> From: NeilBrown <neilb@suse.com>
>> Subject: fs/notify: don't put file handle buffer on stack.
>>
>> A file handle buffer is not tiny, and could need to be larger in future,
>> so it isn't safe to allocate one on the stack.  Instead, we need to
>> kmalloc().
>>
>> There is no way to return an error status from a ->show_fdinfo()
>> function, so if the kmalloc fails, we silently exclude the filehandle
>> from the output.  As it is at the end of line, this shouldn't
>> upset parsing too much.
>
> It shouldn't upset parsing because that would be the same out
> output as without CONFIG_EXPORTFS. AFAIK this information
> is used by CRUI.
>
>>
>> Signed-off-by: NeilBrown <neilb@suse.com>
>>
>> diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
>> index d478629c728b..20d863b9ae16 100644
>> --- a/fs/notify/fdinfo.c
>> +++ b/fs/notify/fdinfo.c
>> @@ -23,56 +23,58 @@
>>
>>  static void show_fdinfo(struct seq_file *m, struct file *f,
>>                         void (*show)(struct seq_file *m,
>> -                                    struct fsnotify_mark *mark))
>> +                                    struct fsnotify_mark *mark,
>> +                                    struct fid *fh))
>>  {
>>         struct fsnotify_group *group = f->private_data;
>>         struct fsnotify_mark *mark;
>> +       struct fid *fh = kmalloc(MAX_HANDLE_SZ, GFP_KERNEL);
>>
>>         mutex_lock(&group->mark_mutex);
>>         list_for_each_entry(mark, &group->marks_list, g_list) {
>> -               show(m, mark);
>> +               show(m, mark, fh);
>>                 if (seq_has_overflowed(m))
>>                         break;
>>         }
>>         mutex_unlock(&group->mark_mutex);
>> +       kfree(fh);
>>  }
>>
>>  #if defined(CONFIG_EXPORTFS)
>> -static void show_mark_fhandle(struct seq_file *m, struct inode *inode)
>> +static void show_mark_fhandle(struct seq_file *m, struct inode *inode,
>> +                             struct fid *fhbuf)
>>  {
>> -       struct {
>> -               struct file_handle handle;
>> -               u8 pad[MAX_HANDLE_SZ];
>> -       } f;
>>         int size, ret, i;
>> +       unsigned char *bytes;
>>
>> -       f.handle.handle_bytes = sizeof(f.pad);
>> -       size = f.handle.handle_bytes >> 2;
>> +       if (!fhbuf)
>> +               return;
>> +       size = MAX_HANDLE_SZ >> 2;
>>
>> -       ret = exportfs_encode_inode_fh(inode, (struct fid *)f.handle.f_handle, &size, 0);
>> +       ret = exportfs_encode_inode_fh(inode, fhbuf, &size, 0);
>>         if ((ret == FILEID_INVALID) || (ret < 0)) {
>>                 WARN_ONCE(1, "Can't encode file handler for inotify: %d\n", ret);
>
> This WARN_ONCE is out of order. It is perfectly valid for inotify/fanotify
> to watch over fs that doesn't support exportfs. Care to clean it up?
> Perhaps a pr_warn_ratelimited() for either !fhbuf or can't encode?

If I were going to clean it up, I would need to do more than remove the
WARN_ONCE(), which almost certainly never fires.

exportfs_encode_inode_fh() should only be called if sb->s_export_op is
not NULL.
When it is NULL, it means that the filesystem doesn't support file
handles.
do_sys_name_to_handle() tests this, as does nfsd.  But this inotify code
doesn't.
So it can report a "file handle" for a file for which file handles
aren't supported.  It will use the default export_encode_fh which
reports i_ino and i_generation, which may or may not be stable or
meaningful.

So yes, this code does need a bit of cleaning up....

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] NFS: allow name_to_handle_at() to work for Amazon EFS.
  2017-12-08  2:17       ` NeilBrown
@ 2017-12-19 12:42           ` Jan Kara
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Kara @ 2017-12-19 12:42 UTC (permalink / raw)
  To: NeilBrown
  Cc: Amir Goldstein, Linus Torvalds, Trond Myklebust, Anna Schumaker,
	Al Viro, Andrew Morton, lkml, linux-nfs, linux-fsdevel,
	Lennart Poettering, Pavel Emelyanov, Jan Kara

[-- Attachment #1: Type: text/plain, Size: 5397 bytes --]

On Fri 08-12-17 13:17:31, NeilBrown wrote:
> On Thu, Dec 07 2017, Amir Goldstein wrote:
> 
> > On Thu, Dec 7, 2017 at 5:20 AM, NeilBrown <neilb@suse.com> wrote:
> >> On Wed, Dec 06 2017, Linus Torvalds wrote:
> >>
> >>> On Thu, Nov 30, 2017 at 12:56 PM, NeilBrown <neilb@suse.com> wrote:
> >>>>
> >>>> -/* limit the handle size to NFSv4 handle size now */
> >>>> -#define MAX_HANDLE_SZ 128
> >>>> +/* Must be larger than NFSv4 file handle, but small
> >>>> + * enough for an on-stack allocation. overlayfs doesn't
> >>>> + * want this too close to 255.
> >>>> + */
> >>>> +#define MAX_HANDLE_SZ 200
> >>>
> >>> This really smells for so many reasons.
> >>>
> >>> Also, that really is starting to be a fairly big stack allocation, and
> >>> it seems to be used in exactly one place (show_mark_fhandle), which
> >>> makes me go "why is that on the stack anyway?".
> >>>
> >>> Could we just allocate a buffer at open time or something?
> >>>
> >>>                Linus
> >>
> >> "open time" would be when /proc/X/fdinfo/Y was opened in
> >> seq_fdinfo_open(), and allocating a file_handle there seems a bit odd.
> >>
> >> We can allocate in fs/notify/fdinfo.c:show_fdinfo() which is
> >> the earliest 'notify' specific code to run.  There is no
> >> opportunity to return an error but GFP_KERNEL allocations under 1 page
> >> never fail..
> >>
> >> This patch allocates a single buffer for all inodes reported for a given
> >> inotify fdinfo, and if the allocation files, the filehandle is silently
> >> left blank.  More surgery would be needed to be able to return an error.
> >>
> >> Is that at all suitable?
> >>
> >> Thanks,
> >> NeilBrown
> >>
> >> From: NeilBrown <neilb@suse.com>
> >> Subject: fs/notify: don't put file handle buffer on stack.
> >>
> >> A file handle buffer is not tiny, and could need to be larger in future,
> >> so it isn't safe to allocate one on the stack.  Instead, we need to
> >> kmalloc().
> >>
> >> There is no way to return an error status from a ->show_fdinfo()
> >> function, so if the kmalloc fails, we silently exclude the filehandle
> >> from the output.  As it is at the end of line, this shouldn't
> >> upset parsing too much.
> >
> > It shouldn't upset parsing because that would be the same out
> > output as without CONFIG_EXPORTFS. AFAIK this information
> > is used by CRUI.
> >
> >>
> >> Signed-off-by: NeilBrown <neilb@suse.com>
> >>
> >> diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
> >> index d478629c728b..20d863b9ae16 100644
> >> --- a/fs/notify/fdinfo.c
> >> +++ b/fs/notify/fdinfo.c
> >> @@ -23,56 +23,58 @@
> >>
> >>  static void show_fdinfo(struct seq_file *m, struct file *f,
> >>                         void (*show)(struct seq_file *m,
> >> -                                    struct fsnotify_mark *mark))
> >> +                                    struct fsnotify_mark *mark,
> >> +                                    struct fid *fh))
> >>  {
> >>         struct fsnotify_group *group = f->private_data;
> >>         struct fsnotify_mark *mark;
> >> +       struct fid *fh = kmalloc(MAX_HANDLE_SZ, GFP_KERNEL);
> >>
> >>         mutex_lock(&group->mark_mutex);
> >>         list_for_each_entry(mark, &group->marks_list, g_list) {
> >> -               show(m, mark);
> >> +               show(m, mark, fh);
> >>                 if (seq_has_overflowed(m))
> >>                         break;
> >>         }
> >>         mutex_unlock(&group->mark_mutex);
> >> +       kfree(fh);
> >>  }
> >>
> >>  #if defined(CONFIG_EXPORTFS)
> >> -static void show_mark_fhandle(struct seq_file *m, struct inode *inode)
> >> +static void show_mark_fhandle(struct seq_file *m, struct inode *inode,
> >> +                             struct fid *fhbuf)
> >>  {
> >> -       struct {
> >> -               struct file_handle handle;
> >> -               u8 pad[MAX_HANDLE_SZ];
> >> -       } f;
> >>         int size, ret, i;
> >> +       unsigned char *bytes;
> >>
> >> -       f.handle.handle_bytes = sizeof(f.pad);
> >> -       size = f.handle.handle_bytes >> 2;
> >> +       if (!fhbuf)
> >> +               return;
> >> +       size = MAX_HANDLE_SZ >> 2;
> >>
> >> -       ret = exportfs_encode_inode_fh(inode, (struct fid *)f.handle.f_handle, &size, 0);
> >> +       ret = exportfs_encode_inode_fh(inode, fhbuf, &size, 0);
> >>         if ((ret == FILEID_INVALID) || (ret < 0)) {
> >>                 WARN_ONCE(1, "Can't encode file handler for inotify: %d\n", ret);
> >
> > This WARN_ONCE is out of order. It is perfectly valid for inotify/fanotify
> > to watch over fs that doesn't support exportfs. Care to clean it up?
> > Perhaps a pr_warn_ratelimited() for either !fhbuf or can't encode?
> 
> If I were going to clean it up, I would need to do more than remove the
> WARN_ONCE(), which almost certainly never fires.
> 
> exportfs_encode_inode_fh() should only be called if sb->s_export_op is
> not NULL.
> When it is NULL, it means that the filesystem doesn't support file
> handles.
> do_sys_name_to_handle() tests this, as does nfsd.  But this inotify code
> doesn't.
> So it can report a "file handle" for a file for which file handles
> aren't supported.  It will use the default export_encode_fh which
> reports i_ino and i_generation, which may or may not be stable or
> meaningful.
> 
> So yes, this code does need a bit of cleaning up....

So something like the patch below?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-fsnotify-Do-not-show-file-handles-for-unsupported-fi.patch --]
[-- Type: text/x-patch, Size: 935 bytes --]

>From 66a6c05ae2fbe6cfcb24ca3088de39885a6fa5b8 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Tue, 19 Dec 2017 13:38:54 +0100
Subject: [PATCH] fsnotify: Do not show file handles for unsupported
 filesystems

Filesystems not setting their s_export_op do not support file handles.
Do no try to encode them using exportfs_encode_inode_fh() since that may
fail or return garbage.

Reported-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/fdinfo.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
index d478629c728b..041c2b0cc145 100644
--- a/fs/notify/fdinfo.c
+++ b/fs/notify/fdinfo.c
@@ -46,6 +46,9 @@ static void show_mark_fhandle(struct seq_file *m, struct inode *inode)
 	} f;
 	int size, ret, i;
 
+	if (!inode->i_sb->s_export_op)
+		return;
+
 	f.handle.handle_bytes = sizeof(f.pad);
 	size = f.handle.handle_bytes >> 2;
 
-- 
2.12.3


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] NFS: allow name_to_handle_at() to work for Amazon EFS.
@ 2017-12-19 12:42           ` Jan Kara
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Kara @ 2017-12-19 12:42 UTC (permalink / raw)
  To: NeilBrown
  Cc: Amir Goldstein, Linus Torvalds, Trond Myklebust, Anna Schumaker,
	Al Viro, Andrew Morton, lkml, linux-nfs, linux-fsdevel,
	Lennart Poettering, Pavel Emelyanov, Jan Kara

[-- Attachment #1: Type: text/plain, Size: 5397 bytes --]

On Fri 08-12-17 13:17:31, NeilBrown wrote:
> On Thu, Dec 07 2017, Amir Goldstein wrote:
> 
> > On Thu, Dec 7, 2017 at 5:20 AM, NeilBrown <neilb@suse.com> wrote:
> >> On Wed, Dec 06 2017, Linus Torvalds wrote:
> >>
> >>> On Thu, Nov 30, 2017 at 12:56 PM, NeilBrown <neilb@suse.com> wrote:
> >>>>
> >>>> -/* limit the handle size to NFSv4 handle size now */
> >>>> -#define MAX_HANDLE_SZ 128
> >>>> +/* Must be larger than NFSv4 file handle, but small
> >>>> + * enough for an on-stack allocation. overlayfs doesn't
> >>>> + * want this too close to 255.
> >>>> + */
> >>>> +#define MAX_HANDLE_SZ 200
> >>>
> >>> This really smells for so many reasons.
> >>>
> >>> Also, that really is starting to be a fairly big stack allocation, and
> >>> it seems to be used in exactly one place (show_mark_fhandle), which
> >>> makes me go "why is that on the stack anyway?".
> >>>
> >>> Could we just allocate a buffer at open time or something?
> >>>
> >>>                Linus
> >>
> >> "open time" would be when /proc/X/fdinfo/Y was opened in
> >> seq_fdinfo_open(), and allocating a file_handle there seems a bit odd.
> >>
> >> We can allocate in fs/notify/fdinfo.c:show_fdinfo() which is
> >> the earliest 'notify' specific code to run.  There is no
> >> opportunity to return an error but GFP_KERNEL allocations under 1 page
> >> never fail..
> >>
> >> This patch allocates a single buffer for all inodes reported for a given
> >> inotify fdinfo, and if the allocation files, the filehandle is silently
> >> left blank.  More surgery would be needed to be able to return an error.
> >>
> >> Is that at all suitable?
> >>
> >> Thanks,
> >> NeilBrown
> >>
> >> From: NeilBrown <neilb@suse.com>
> >> Subject: fs/notify: don't put file handle buffer on stack.
> >>
> >> A file handle buffer is not tiny, and could need to be larger in future,
> >> so it isn't safe to allocate one on the stack.  Instead, we need to
> >> kmalloc().
> >>
> >> There is no way to return an error status from a ->show_fdinfo()
> >> function, so if the kmalloc fails, we silently exclude the filehandle
> >> from the output.  As it is at the end of line, this shouldn't
> >> upset parsing too much.
> >
> > It shouldn't upset parsing because that would be the same out
> > output as without CONFIG_EXPORTFS. AFAIK this information
> > is used by CRUI.
> >
> >>
> >> Signed-off-by: NeilBrown <neilb@suse.com>
> >>
> >> diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
> >> index d478629c728b..20d863b9ae16 100644
> >> --- a/fs/notify/fdinfo.c
> >> +++ b/fs/notify/fdinfo.c
> >> @@ -23,56 +23,58 @@
> >>
> >>  static void show_fdinfo(struct seq_file *m, struct file *f,
> >>                         void (*show)(struct seq_file *m,
> >> -                                    struct fsnotify_mark *mark))
> >> +                                    struct fsnotify_mark *mark,
> >> +                                    struct fid *fh))
> >>  {
> >>         struct fsnotify_group *group = f->private_data;
> >>         struct fsnotify_mark *mark;
> >> +       struct fid *fh = kmalloc(MAX_HANDLE_SZ, GFP_KERNEL);
> >>
> >>         mutex_lock(&group->mark_mutex);
> >>         list_for_each_entry(mark, &group->marks_list, g_list) {
> >> -               show(m, mark);
> >> +               show(m, mark, fh);
> >>                 if (seq_has_overflowed(m))
> >>                         break;
> >>         }
> >>         mutex_unlock(&group->mark_mutex);
> >> +       kfree(fh);
> >>  }
> >>
> >>  #if defined(CONFIG_EXPORTFS)
> >> -static void show_mark_fhandle(struct seq_file *m, struct inode *inode)
> >> +static void show_mark_fhandle(struct seq_file *m, struct inode *inode,
> >> +                             struct fid *fhbuf)
> >>  {
> >> -       struct {
> >> -               struct file_handle handle;
> >> -               u8 pad[MAX_HANDLE_SZ];
> >> -       } f;
> >>         int size, ret, i;
> >> +       unsigned char *bytes;
> >>
> >> -       f.handle.handle_bytes = sizeof(f.pad);
> >> -       size = f.handle.handle_bytes >> 2;
> >> +       if (!fhbuf)
> >> +               return;
> >> +       size = MAX_HANDLE_SZ >> 2;
> >>
> >> -       ret = exportfs_encode_inode_fh(inode, (struct fid *)f.handle.f_handle, &size, 0);
> >> +       ret = exportfs_encode_inode_fh(inode, fhbuf, &size, 0);
> >>         if ((ret == FILEID_INVALID) || (ret < 0)) {
> >>                 WARN_ONCE(1, "Can't encode file handler for inotify: %d\n", ret);
> >
> > This WARN_ONCE is out of order. It is perfectly valid for inotify/fanotify
> > to watch over fs that doesn't support exportfs. Care to clean it up?
> > Perhaps a pr_warn_ratelimited() for either !fhbuf or can't encode?
> 
> If I were going to clean it up, I would need to do more than remove the
> WARN_ONCE(), which almost certainly never fires.
> 
> exportfs_encode_inode_fh() should only be called if sb->s_export_op is
> not NULL.
> When it is NULL, it means that the filesystem doesn't support file
> handles.
> do_sys_name_to_handle() tests this, as does nfsd.  But this inotify code
> doesn't.
> So it can report a "file handle" for a file for which file handles
> aren't supported.  It will use the default export_encode_fh which
> reports i_ino and i_generation, which may or may not be stable or
> meaningful.
> 
> So yes, this code does need a bit of cleaning up....

So something like the patch below?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-fsnotify-Do-not-show-file-handles-for-unsupported-fi.patch --]
[-- Type: text/x-patch, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] NFS: allow name_to_handle_at() to work for Amazon EFS.
  2017-12-19 12:42           ` Jan Kara
  (?)
@ 2017-12-20 21:23           ` NeilBrown
  -1 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2017-12-20 21:23 UTC (permalink / raw)
  To: Jan Kara
  Cc: Amir Goldstein, Linus Torvalds, Trond Myklebust, Anna Schumaker,
	Al Viro, Andrew Morton, lkml, linux-nfs, linux-fsdevel,
	Lennart Poettering, Pavel Emelyanov, Jan Kara

[-- Attachment #1: Type: text/plain, Size: 6859 bytes --]

On Tue, Dec 19 2017, Jan Kara wrote:

> On Fri 08-12-17 13:17:31, NeilBrown wrote:
>> On Thu, Dec 07 2017, Amir Goldstein wrote:
>> 
>> > On Thu, Dec 7, 2017 at 5:20 AM, NeilBrown <neilb@suse.com> wrote:
>> >> On Wed, Dec 06 2017, Linus Torvalds wrote:
>> >>
>> >>> On Thu, Nov 30, 2017 at 12:56 PM, NeilBrown <neilb@suse.com> wrote:
>> >>>>
>> >>>> -/* limit the handle size to NFSv4 handle size now */
>> >>>> -#define MAX_HANDLE_SZ 128
>> >>>> +/* Must be larger than NFSv4 file handle, but small
>> >>>> + * enough for an on-stack allocation. overlayfs doesn't
>> >>>> + * want this too close to 255.
>> >>>> + */
>> >>>> +#define MAX_HANDLE_SZ 200
>> >>>
>> >>> This really smells for so many reasons.
>> >>>
>> >>> Also, that really is starting to be a fairly big stack allocation, and
>> >>> it seems to be used in exactly one place (show_mark_fhandle), which
>> >>> makes me go "why is that on the stack anyway?".
>> >>>
>> >>> Could we just allocate a buffer at open time or something?
>> >>>
>> >>>                Linus
>> >>
>> >> "open time" would be when /proc/X/fdinfo/Y was opened in
>> >> seq_fdinfo_open(), and allocating a file_handle there seems a bit odd.
>> >>
>> >> We can allocate in fs/notify/fdinfo.c:show_fdinfo() which is
>> >> the earliest 'notify' specific code to run.  There is no
>> >> opportunity to return an error but GFP_KERNEL allocations under 1 page
>> >> never fail..
>> >>
>> >> This patch allocates a single buffer for all inodes reported for a given
>> >> inotify fdinfo, and if the allocation files, the filehandle is silently
>> >> left blank.  More surgery would be needed to be able to return an error.
>> >>
>> >> Is that at all suitable?
>> >>
>> >> Thanks,
>> >> NeilBrown
>> >>
>> >> From: NeilBrown <neilb@suse.com>
>> >> Subject: fs/notify: don't put file handle buffer on stack.
>> >>
>> >> A file handle buffer is not tiny, and could need to be larger in future,
>> >> so it isn't safe to allocate one on the stack.  Instead, we need to
>> >> kmalloc().
>> >>
>> >> There is no way to return an error status from a ->show_fdinfo()
>> >> function, so if the kmalloc fails, we silently exclude the filehandle
>> >> from the output.  As it is at the end of line, this shouldn't
>> >> upset parsing too much.
>> >
>> > It shouldn't upset parsing because that would be the same out
>> > output as without CONFIG_EXPORTFS. AFAIK this information
>> > is used by CRUI.
>> >
>> >>
>> >> Signed-off-by: NeilBrown <neilb@suse.com>
>> >>
>> >> diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
>> >> index d478629c728b..20d863b9ae16 100644
>> >> --- a/fs/notify/fdinfo.c
>> >> +++ b/fs/notify/fdinfo.c
>> >> @@ -23,56 +23,58 @@
>> >>
>> >>  static void show_fdinfo(struct seq_file *m, struct file *f,
>> >>                         void (*show)(struct seq_file *m,
>> >> -                                    struct fsnotify_mark *mark))
>> >> +                                    struct fsnotify_mark *mark,
>> >> +                                    struct fid *fh))
>> >>  {
>> >>         struct fsnotify_group *group = f->private_data;
>> >>         struct fsnotify_mark *mark;
>> >> +       struct fid *fh = kmalloc(MAX_HANDLE_SZ, GFP_KERNEL);
>> >>
>> >>         mutex_lock(&group->mark_mutex);
>> >>         list_for_each_entry(mark, &group->marks_list, g_list) {
>> >> -               show(m, mark);
>> >> +               show(m, mark, fh);
>> >>                 if (seq_has_overflowed(m))
>> >>                         break;
>> >>         }
>> >>         mutex_unlock(&group->mark_mutex);
>> >> +       kfree(fh);
>> >>  }
>> >>
>> >>  #if defined(CONFIG_EXPORTFS)
>> >> -static void show_mark_fhandle(struct seq_file *m, struct inode *inode)
>> >> +static void show_mark_fhandle(struct seq_file *m, struct inode *inode,
>> >> +                             struct fid *fhbuf)
>> >>  {
>> >> -       struct {
>> >> -               struct file_handle handle;
>> >> -               u8 pad[MAX_HANDLE_SZ];
>> >> -       } f;
>> >>         int size, ret, i;
>> >> +       unsigned char *bytes;
>> >>
>> >> -       f.handle.handle_bytes = sizeof(f.pad);
>> >> -       size = f.handle.handle_bytes >> 2;
>> >> +       if (!fhbuf)
>> >> +               return;
>> >> +       size = MAX_HANDLE_SZ >> 2;
>> >>
>> >> -       ret = exportfs_encode_inode_fh(inode, (struct fid *)f.handle.f_handle, &size, 0);
>> >> +       ret = exportfs_encode_inode_fh(inode, fhbuf, &size, 0);
>> >>         if ((ret == FILEID_INVALID) || (ret < 0)) {
>> >>                 WARN_ONCE(1, "Can't encode file handler for inotify: %d\n", ret);
>> >
>> > This WARN_ONCE is out of order. It is perfectly valid for inotify/fanotify
>> > to watch over fs that doesn't support exportfs. Care to clean it up?
>> > Perhaps a pr_warn_ratelimited() for either !fhbuf or can't encode?
>> 
>> If I were going to clean it up, I would need to do more than remove the
>> WARN_ONCE(), which almost certainly never fires.
>> 
>> exportfs_encode_inode_fh() should only be called if sb->s_export_op is
>> not NULL.
>> When it is NULL, it means that the filesystem doesn't support file
>> handles.
>> do_sys_name_to_handle() tests this, as does nfsd.  But this inotify code
>> doesn't.
>> So it can report a "file handle" for a file for which file handles
>> aren't supported.  It will use the default export_encode_fh which
>> reports i_ino and i_generation, which may or may not be stable or
>> meaningful.
>> 
>> So yes, this code does need a bit of cleaning up....
>
> So something like the patch below?
>

I prefer to fix it in exportfs, as in
 https://patchwork.kernel.org/patch/10104253/

Thanks,
NeilBrown


> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
> From 66a6c05ae2fbe6cfcb24ca3088de39885a6fa5b8 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Tue, 19 Dec 2017 13:38:54 +0100
> Subject: [PATCH] fsnotify: Do not show file handles for unsupported
>  filesystems
>
> Filesystems not setting their s_export_op do not support file handles.
> Do no try to encode them using exportfs_encode_inode_fh() since that may
> fail or return garbage.
>
> Reported-by: NeilBrown <neilb@suse.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/notify/fdinfo.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
> index d478629c728b..041c2b0cc145 100644
> --- a/fs/notify/fdinfo.c
> +++ b/fs/notify/fdinfo.c
> @@ -46,6 +46,9 @@ static void show_mark_fhandle(struct seq_file *m, struct inode *inode)
>  	} f;
>  	int size, ret, i;
>  
> +	if (!inode->i_sb->s_export_op)
> +		return;
> +
>  	f.handle.handle_bytes = sizeof(f.pad);
>  	size = f.handle.handle_bytes >> 2;
>  
> -- 
> 2.12.3

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] NFS: allow name_to_handle_at() to work for Amazon EFS.
@ 2017-11-30 20:56 NeilBrown
  0 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2017-11-30 20:56 UTC (permalink / raw)
  To: Trond Myklebust, Anna.Schumaker, Al Viro, Andrew Morton, Linus Torvalds
  Cc: lkml, linux-nfs, linux-fsdevel, Lennart Poettering

[-- Attachment #1: Type: text/plain, Size: 2172 bytes --]


Amazon EFS provides an NFSv4.1 filesystem which appears to use
(close to) full length (128 byte) file handles.
This causes the handle reported by name_to_handle_at() to exceed
MAX_HANDLE_SZ, resulting in
  EOVERFLOW if 128 bytes were requested, or
  EINVAL if the size reported by the previous call was requested.

To fix this, increase MAX_HANDLE_SIZE a little, and add a BUILD_BUG
so that this sort of inconsistent error reporting won't happen again.

Link: https://github.com/systemd/systemd/issues/7082#issuecomment-347380436
Signed-off-by: NeilBrown <neilb@suse.com>
---

Sorry for the scatter-gun To: list.  This is really more of a VFS patch
than an NFS patch and sending patches in either direction has been a bit
hit-and-miss lately, so I'm hoping Andrew will take it unless someone
else objects or beats him to it...

Thanks,
NeilBrown

 fs/nfs/export.c          | 2 ++
 include/linux/exportfs.h | 7 +++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/export.c b/fs/nfs/export.c
index 83fd09fc8f77..23b2fc3ab2bb 100644
--- a/fs/nfs/export.c
+++ b/fs/nfs/export.c
@@ -39,6 +39,8 @@ nfs_encode_fh(struct inode *inode, __u32 *p, int *max_len, struct inode *parent)
 	size_t fh_size = offsetof(struct nfs_fh, data) + server_fh->size;
 	int len = EMBED_FH_OFF + XDR_QUADLEN(fh_size);
 
+	BUILD_BUG_ON_MSG(EMBED_FH_OFF + NFS_MAXFHSIZE > MAX_HANDLE_SZ,
+		"MAX_HANDLE_SZ is too small");
 	dprintk("%s: max fh len %d inode %p parent %p",
 		__func__, *max_len, inode, parent);
 
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index 0d3037419bc7..e7ab1b071c5e 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -11,8 +11,11 @@ struct iomap;
 struct super_block;
 struct vfsmount;
 
-/* limit the handle size to NFSv4 handle size now */
-#define MAX_HANDLE_SZ 128
+/* Must be larger than NFSv4 file handle, but small
+ * enough for an on-stack allocation. overlayfs doesn't
+ * want this too close to 255.
+ */
+#define MAX_HANDLE_SZ 200
 
 /*
  * The fileid_type identifies how the file within the filesystem is encoded.
-- 
2.14.0.rc0.dirty


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-12-20 21:23 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-30 20:56 [PATCH] NFS: allow name_to_handle_at() to work for Amazon EFS NeilBrown
2017-12-04  3:27 ` [PATCH] fhandle: avoid -EINVAL if requested size is too large NeilBrown
2017-12-06 19:05 ` [PATCH] NFS: allow name_to_handle_at() to work for Amazon EFS J. Bruce Fields
2017-12-07  2:07 ` Linus Torvalds
2017-12-07  3:20   ` NeilBrown
2017-12-07  4:04     ` Amir Goldstein
2017-12-08  2:17       ` NeilBrown
2017-12-19 12:42         ` Jan Kara
2017-12-19 12:42           ` Jan Kara
2017-12-20 21:23           ` NeilBrown
2017-12-07  5:34     ` Matthew Wilcox
2017-11-30 20:56 NeilBrown

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.