All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ovl: don't expose EOPENSTALE to userspace
@ 2017-04-25 13:57 Amir Goldstein
  2017-04-26 13:47 ` Miklos Szeredi
  0 siblings, 1 reply; 7+ messages in thread
From: Amir Goldstein @ 2017-04-25 13:57 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Trond Myklebust, Jeff Layton, J . Bruce Fields, Tyler Hicks,
	Al Viro, linux-unionfs, linux-fsdevel, stable

An overlay dentry holds a lower dentry that may belong to an NFS mount.

Overlayfs calls ovl_path_open() to get a file descriptor of a lower file
for copying up its data and for a lower directory for listing its
content.

If that lower dentry gets invalidated after ovl_dentry_revalidate() and
before ovl_path_open(), the internal error code 518 (EOPENSTALE), which
is not a POSIX error code, will be exposed to the user.

Check the internal return value -EOPENSTALE in ovl_path_open(), just
the same as it is checked in path_openat() and replace it with the POSIX
error code ESTALE.

Cc: <stable@vger.kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/util.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Miklos,

Following a similar bug report and a fix I posted for fanotify,
I realized that overlayfs can also expose EOPENSTALE to userspace.

I did not try to reproduce this with real lower NFS, only tested
patch correctness using a debug patch.

I guess a reproducer could work with ovl_copy_up_data() that is
delayed for a long time without concurrent copy up in kernel <= v4.10
and lower file is pulled underneath while copy up is pending.
However, with concurrent copy up in v4.11 that reproducer will be
of no use.

The only other suspect I found for dentry_open() of an NFS dentry
without checking -EOPENSTALE is ecryptfs.

David,
I think fscache won't be storing it local storage on NFS, right?

Added CC stable. Although I am not sure how important this fix is,
seems simple enough that breaking POSIX would justify patching stable.

Anyway, if you think this calls for a VFS helper I don't mind
doing it, I'm just terrible with names so please suggest one.
In that case, should we make dentry_open() static? non exportable?

Thanks,
Amir.

diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 89789bc..931d199 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -328,7 +328,11 @@ bool ovl_is_whiteout(struct dentry *dentry)
 
 struct file *ovl_path_open(struct path *path, int flags)
 {
-	return dentry_open(path, flags | O_NOATIME, current_cred());
+	struct file *f = dentry_open(path, flags | O_NOATIME, current_cred());
+
+	if (unlikely(IS_ERR(f) && PTR_ERR(f) == -EOPENSTALE))
+		return ERR_PTR(-ESTALE);
+	return f;
 }
 
 int ovl_copy_up_start(struct dentry *dentry)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] ovl: don't expose EOPENSTALE to userspace
  2017-04-25 13:57 [PATCH] ovl: don't expose EOPENSTALE to userspace Amir Goldstein
@ 2017-04-26 13:47 ` Miklos Szeredi
  2017-04-26 14:06     ` Marko Rauhamaa
  0 siblings, 1 reply; 7+ messages in thread
From: Miklos Szeredi @ 2017-04-26 13:47 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Trond Myklebust, Jeff Layton, J . Bruce Fields, Tyler Hicks,
	Al Viro, linux-unionfs, linux-fsdevel, stable

On Tue, Apr 25, 2017 at 04:57:55PM +0300, Amir Goldstein wrote:
> An overlay dentry holds a lower dentry that may belong to an NFS mount.
> 
> Overlayfs calls ovl_path_open() to get a file descriptor of a lower file
> for copying up its data and for a lower directory for listing its
> content.
> 
> If that lower dentry gets invalidated after ovl_dentry_revalidate() and
> before ovl_path_open(), the internal error code 518 (EOPENSTALE), which
> is not a POSIX error code, will be exposed to the user.
> 
> Check the internal return value -EOPENSTALE in ovl_path_open(), just
> the same as it is checked in path_openat() and replace it with the POSIX
> error code ESTALE.

I'm looking at the EOPENSTALE story and it very much looks like we can just
replace the single use with ESTALE and handle the lookup retry logic nuances
inside the lookup code...

Below is untested patch.

Thanks,
Miklos


diff --git a/Documentation/filesystems/path-lookup.md b/Documentation/filesystems/path-lookup.md
index 1b39e084a2b2..41f2b25b6ac3 100644
--- a/Documentation/filesystems/path-lookup.md
+++ b/Documentation/filesystems/path-lookup.md
@@ -1,4 +1,4 @@
-<head>
+\<head>
 <style> p { max-width:50em} ol, ul {max-width: 40em}</style>
 </head>
 
@@ -1150,11 +1150,10 @@ code.
  created file will be performed by `vfs_open()`, just as if the name
  were found in the dcache.
 
-2. `vfs_open()` can fail with `-EOPENSTALE` if the cached information
- wasn't quite current enough.  Rather than restarting the lookup from
- the top with `LOOKUP_REVAL` set, `lookup_open()` is called instead,
- giving the filesystem a chance to resolve small inconsistencies.
- If that doesn't work, only then is the lookup restarted from the top.
+2. `vfs_open()` can fail with `-ESTALE` if the cached information wasn't
+ quite current enough.  The lookup will be restarted first without
+ LOOKUP_REVAL and then with.
+
 
 3. An open with O_CREAT **does** follow a symlink in the final component,
      unlike other creation system calls (like `mkdir`).  So the sequence:
diff --git a/fs/namei.c b/fs/namei.c
index d41fab78798b..af0b31340ada 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3187,7 +3187,7 @@ static int lookup_open(struct nameidata *nd, struct path *path,
  */
 static int do_last(struct nameidata *nd,
 		   struct file *file, const struct open_flags *op,
-		   int *opened)
+		   int *opened, bool firstpass)
 {
 	struct dentry *dir = nd->path.dentry;
 	int open_flag = op->open_flag;
@@ -3342,8 +3342,16 @@ static int do_last(struct nameidata *nd,
 		goto out;
 	BUG_ON(*opened & FILE_OPENED); /* once it's opened, it's opened */
 	error = vfs_open(&nd->path, file, current_cred());
-	if (error)
+	if (error) {
+		/*
+		 * If we got ESTALE here it likely means that only need to
+		 * revalidate the last component.  So first restart without
+		 * LOOKUP_REVAL.
+		 */
+		if (error == -ESTALE && firstpass)
+			error = -ECHILD;
 		goto out;
+	}
 	*opened |= FILE_OPENED;
 opened:
 	error = open_check_o_direct(file);
@@ -3458,6 +3466,7 @@ static struct file *path_openat(struct nameidata *nd,
 	struct file *file;
 	int opened = 0;
 	int error;
+	bool firstpass = flags & LOOKUP_RCU;
 
 	file = get_empty_filp();
 	if (IS_ERR(file))
@@ -3483,7 +3492,7 @@ static struct file *path_openat(struct nameidata *nd,
 		return ERR_CAST(s);
 	}
 	while (!(error = link_path_walk(s, nd)) &&
-		(error = do_last(nd, file, op, &opened)) > 0) {
+	       (error = do_last(nd, file, op, &opened, firstpass)) > 0) {
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
 		s = trailing_symlink(nd);
 		if (IS_ERR(s)) {
@@ -3497,15 +3506,9 @@ static struct file *path_openat(struct nameidata *nd,
 		BUG_ON(!error);
 		put_filp(file);
 	}
-	if (unlikely(error)) {
-		if (error == -EOPENSTALE) {
-			if (flags & LOOKUP_RCU)
-				error = -ECHILD;
-			else
-				error = -ESTALE;
-		}
+	if (unlikely(error))
 		file = ERR_PTR(error);
-	}
+
 	return file;
 }
 
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 0efba77789b9..ecd2e1b165b1 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -39,7 +39,7 @@ nfs4_file_open(struct inode *inode, struct file *filp)
 	 *
 	 * We only get this far for a cached positive dentry.  We skipped
 	 * revalidation, so handle it here by dropping the dentry and returning
-	 * -EOPENSTALE.  The VFS will retry the lookup/create/open.
+	 * -ESTALE.  The VFS will retry the lookup/create/open.
 	 */
 
 	dprintk("NFS: open file(%pd2)\n", dentry);
@@ -99,7 +99,7 @@ nfs4_file_open(struct inode *inode, struct file *filp)
 
 out_drop:
 	d_drop(dentry);
-	err = -EOPENSTALE;
+	err = -ESTALE;
 	goto out_put_ctx;
 }
 
diff --git a/include/linux/errno.h b/include/linux/errno.h
index 7ce9fb1b7d28..28a979008af2 100644
--- a/include/linux/errno.h
+++ b/include/linux/errno.h
@@ -16,7 +16,6 @@
 #define ENOIOCTLCMD	515	/* No ioctl command */
 #define ERESTART_RESTARTBLOCK 516 /* restart by calling sys_restart_syscall */
 #define EPROBE_DEFER	517	/* Driver requests probe retry */
-#define EOPENSTALE	518	/* open found a stale dentry */
 
 /* Defined for the NFSv3 protocol */
 #define EBADHANDLE	521	/* Illegal NFS file handle */

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] ovl: don't expose EOPENSTALE to userspace
  2017-04-26 13:47 ` Miklos Szeredi
@ 2017-04-26 14:06     ` Marko Rauhamaa
  0 siblings, 0 replies; 7+ messages in thread
From: Marko Rauhamaa @ 2017-04-26 14:06 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Amir Goldstein, Trond Myklebust, Jeff Layton, J . Bruce Fields,
	Tyler Hicks, Al Viro, linux-unionfs, linux-fsdevel, stable

Miklos Szeredi <miklos@szeredi.hu>:

> I'm looking at the EOPENSTALE story and it very much looks like we can
> just replace the single use with ESTALE and handle the lookup retry
> logic nuances inside the lookup code...

The fanotify problem is not simply a matter of choosing a POSIX name for
an error. The question is, what problems should an fanotify application
be prepared to handle and what should it do about them?

Since a misbehaving fanotify application is likely to hang the entire
operating system, it needs very clear guidelines for correct behavior.
In particular, when the application does a read(2) on an fanotify file
descriptor and gets back an error code, how is the application to
recover gracefully and safely?

Amir's patch shields the fanotify application from EOPENSTALE. I would
very much like an extensive list of errors that read(2) on a fanotify fd
can return. As it stands, I'm only aware of EAGAIN in the nonblocking
case and EINTR in the blocking case -- and even those haven't been
explicitly documented.


Marko

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ovl: don't expose EOPENSTALE to userspace
@ 2017-04-26 14:06     ` Marko Rauhamaa
  0 siblings, 0 replies; 7+ messages in thread
From: Marko Rauhamaa @ 2017-04-26 14:06 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Amir Goldstein, Trond Myklebust, Jeff Layton, J . Bruce Fields,
	Tyler Hicks, Al Viro, linux-unionfs, linux-fsdevel, stable

Miklos Szeredi <miklos@szeredi.hu>:

> I'm looking at the EOPENSTALE story and it very much looks like we can
> just replace the single use with ESTALE and handle the lookup retry
> logic nuances inside the lookup code...

The fanotify problem is not simply a matter of choosing a POSIX name for
an error. The question is, what problems should an fanotify application
be prepared to handle and what should it do about them?

Since a misbehaving fanotify application is likely to hang the entire
operating system, it needs very clear guidelines for correct behavior.
In particular, when the application does a read(2) on an fanotify file
descriptor and gets back an error code, how is the application to
recover gracefully and safely?

Amir's patch shields the fanotify application from EOPENSTALE. I would
very much like an extensive list of errors that read(2) on a fanotify fd
can return. As it stands, I'm only aware of EAGAIN in the nonblocking
case and EINTR in the blocking case -- and even those haven't been
explicitly documented.


Marko

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ovl: don't expose EOPENSTALE to userspace
  2017-04-26 14:06     ` Marko Rauhamaa
  (?)
@ 2017-04-26 15:45     ` Amir Goldstein
  2017-04-27  7:40         ` Marko Rauhamaa
  -1 siblings, 1 reply; 7+ messages in thread
From: Amir Goldstein @ 2017-04-26 15:45 UTC (permalink / raw)
  To: Marko Rauhamaa
  Cc: Miklos Szeredi, Trond Myklebust, Jeff Layton, J . Bruce Fields,
	Tyler Hicks, Al Viro, linux-unionfs, linux-fsdevel, stable [v4.9]

On Wed, Apr 26, 2017 at 5:06 PM, Marko Rauhamaa
<marko.rauhamaa@f-secure.com> wrote:
> Miklos Szeredi <miklos@szeredi.hu>:
>
>> I'm looking at the EOPENSTALE story and it very much looks like we can
>> just replace the single use with ESTALE and handle the lookup retry
>> logic nuances inside the lookup code...
>
> The fanotify problem is not simply a matter of choosing a POSIX name for
> an error. The question is, what problems should an fanotify application
> be prepared to handle and what should it do about them?
>
> Since a misbehaving fanotify application is likely to hang the entire
> operating system, it needs very clear guidelines for correct behavior.
> In particular, when the application does a read(2) on an fanotify file
> descriptor and gets back an error code, how is the application to
> recover gracefully and safely?
>
> Amir's patch shields the fanotify application from EOPENSTALE. I would
> very much like an extensive list of errors that read(2) on a fanotify fd
> can return. As it stands, I'm only aware of EAGAIN in the nonblocking
> case and EINTR in the blocking case -- and even those haven't been
> explicitly documented.
>

There are more error that you can get same way that you got
EOPENSTALE. The fact that I filtered EOPENSTALE is fixing a POSIX bug,
but it does not fix the general problem you described.

For example, I know you can get ENODEV, because I got it on
out test env. This is the case of a "stale device node" - by the time
you get to read an access event generated on a device file, the device
that this file represents does not exists and cannot be opened.

As with the case of EOPENSTALE, your app should just read again
when that happens.

You can either get the error from read() or not. This is documented in
man page:
       *  If  a  call  to  read(2) processes multiple events from the fanotify
          queue and an error occurs, the return value will be the total length
          of the events successfully copied to the user-space buffer before
          the error occurred.  The return value will not be -1, and errno will
          not be set.  Thus, the reading application has no way to
detect the error.


Amir.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ovl: don't expose EOPENSTALE to userspace
  2017-04-26 15:45     ` Amir Goldstein
@ 2017-04-27  7:40         ` Marko Rauhamaa
  0 siblings, 0 replies; 7+ messages in thread
From: Marko Rauhamaa @ 2017-04-27  7:40 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Trond Myklebust, Jeff Layton, J . Bruce Fields,
	Tyler Hicks, Al Viro, linux-unionfs, linux-fsdevel, stable [v4.9]

Amir Goldstein <amir73il@gmail.com>:
> There are more error that you can get same way that you got
> EOPENSTALE. The fact that I filtered EOPENSTALE is fixing a POSIX bug,
> but it does not fix the general problem you described.
>
> For example, I know you can get ENODEV, because I got it on
> out test env. This is the case of a "stale device node" - by the time
> you get to read an access event generated on a device file, the device
> that this file represents does not exists and cannot be opened.
>
> As with the case of EOPENSTALE, your app should just read again
> when that happens.

Thing is, I must treat every unknown error value as critical and exit.
That's because the problem might be persistent and spinning in a read
loop hangs the system.

Unless, of course, the system call API explicitly declares all problems
as transient and tells the application to retry indefinitely. ENODEV
certainly doesn't have the ring of a transient error.

I don't really see why read(2) on the fanotify fd should return any
random errors. The fanotify fd is purely a software device and should
only have a fixed, documented set of failure modes. The other file
descriptor (metadata->fd) is tied to a real file and could fail in a
million ways; that's understandable.


Marko

-- 
+358 44 990 4795
Skype: marko.rauhamaa_f-secure

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ovl: don't expose EOPENSTALE to userspace
@ 2017-04-27  7:40         ` Marko Rauhamaa
  0 siblings, 0 replies; 7+ messages in thread
From: Marko Rauhamaa @ 2017-04-27  7:40 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Trond Myklebust, Jeff Layton, J . Bruce Fields,
	Tyler Hicks, Al Viro, linux-unionfs, linux-fsdevel, stable [v4.9]

Amir Goldstein <amir73il@gmail.com>:
> There are more error that you can get same way that you got
> EOPENSTALE. The fact that I filtered EOPENSTALE is fixing a POSIX bug,
> but it does not fix the general problem you described.
>
> For example, I know you can get ENODEV, because I got it on
> out test env. This is the case of a "stale device node" - by the time
> you get to read an access event generated on a device file, the device
> that this file represents does not exists and cannot be opened.
>
> As with the case of EOPENSTALE, your app should just read again
> when that happens.

Thing is, I must treat every unknown error value as critical and exit.
That's because the problem might be persistent and spinning in a read
loop hangs the system.

Unless, of course, the system call API explicitly declares all problems
as transient and tells the application to retry indefinitely. ENODEV
certainly doesn't have the ring of a transient error.

I don't really see why read(2) on the fanotify fd should return any
random errors. The fanotify fd is purely a software device and should
only have a fixed, documented set of failure modes. The other file
descriptor (metadata->fd) is tied to a real file and could fail in a
million ways; that's understandable.


Marko

-- 
+358 44 990 4795
Skype: marko.rauhamaa_f-secure

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-04-27  7:40 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-25 13:57 [PATCH] ovl: don't expose EOPENSTALE to userspace Amir Goldstein
2017-04-26 13:47 ` Miklos Szeredi
2017-04-26 14:06   ` Marko Rauhamaa
2017-04-26 14:06     ` Marko Rauhamaa
2017-04-26 15:45     ` Amir Goldstein
2017-04-27  7:40       ` Marko Rauhamaa
2017-04-27  7:40         ` Marko Rauhamaa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.