From: Jeff Layton <jlayton@redhat.com>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	Malahal Naineni <malahal@us.ibm.com>,
	Steve Dickson <SteveD@redhat.com>,
	linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org, viro@zeniv.linux.org.uk,
	hch@infradead.org, michael.brantley@deshaw.com,
	sven.breuner@itwm.fraunhofer.de, chuck.lever@oracle.com,
	pstaubach@exagrid.com, trond.myklebust@fys.uio.no,
	rees@umich.edu
Subject: Re: [PATCH RFC v3] vfs: make fstatat retry once on ESTALE errors from getattr call
Date: Mon, 23 Apr 2012 09:50:21 -0400
Message-ID: <20120423095021.1a91a23b@tlielax.poochiereds.net>
In-Reply-To: <20120423133412.GB13681@fieldses.org>

On Mon, 23 Apr 2012 09:34:12 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Mon, Apr 23, 2012 at 09:12:55AM -0400, Jeff Layton wrote:
> > On Mon, 23 Apr 2012 09:00:09 -0400
> > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > 
> > > On Mon, Apr 23, 2012 at 08:00:12AM -0400, Jeff Layton wrote:
> > > > On Sun, 22 Apr 2012 07:40:57 +0200
> > > > Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > > 
> > > > > On Fri, Apr 20, 2012 at 11:13 PM, Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > On Fri, 20 Apr 2012 15:37:26 -0500
> > > > > > Malahal Naineni <malahal@us.ibm.com> wrote:
> > > > > >
> > > > > >> Steve Dickson [SteveD@redhat.com] wrote:
> > > > > >> > > 2) if we assume that it is fairly representative of one, how can we
> > > > > >> > > achieve retrying indefinitely with NFS, or at least some large finite
> > > > > >> > > amount?
> > > > > >> > The amount of looping would be pure speculation. If the problem
> > > > > >> > cannot be handled by one simple retry I would say we simply pass the
> > > > > >> > error up to the app... It's an application issue...
> > > > > >>
> > > > > >> As someone said, ESTALE is an incorrect errno for a path based call.
> > > > > >> How about turning ESTALE into ENOENT after a retry or few retries?
> > > > > >>
> > > > > >
> > > > > > It's not really the same thing. One could envision an application
> > > > > > that's repeatedly renaming a new file on top of another one. The file
> > > > > > is never missing from the namespace of the server, but you could still
> > > > > > end up getting an ESTALE.
> > > > > >
> > > > > > That would break other atomicity guarantees in an even worse way, IMO...
> > > > > 
> > > > > For directory operations ESTALE *is* equivalent to ENOENT if already
> > > > > retrying with LOOKUP_REVAL.  Think about it.  Atomic replacement by
> > > > > another directory with rename(2) is not an excuse here actually.
> > > > > Local filesystems too can end up with IS_DEAD directory after lookup
> > > > > in that case.
> > > > > 
> > > > 
> > > > Doesn't that violate POSIX? rename(2) is supposed to be atomic, and I
> > > > can't see where there's any exception for that for directories.
> > > 
> > > Hm, but that only allows atomic replacement of the last component of a
> > > path.
> > > 
> > > Suppose you're looking up a path, you've so far reached intermediate
> > > directory "D", and the next step of the lookup (of some entry in D)
> > > returns ESTALE.  Then either:
> > > 
> > > 	- D has since been unlinked, and ENOENT is obviously right.
> > > 	- D was unlinked and then replaced by something else, in which
> > > 	  case there was still a moment when ENOENT was correct.
> > > 	- D was replaced atomically by a rename.  But for the rename to
> > > 	  work it must have been replacing an empty directory, so there
> > > 	  was still a moment when ENOENT would have been correct.
> > 
> > I don't think so...D should always exist in the namespace, so ENOENT
> > would not be correct.
> 
> The operation above is a lookup in D, not a lookup of D.
>
> > Just because it was empty doesn't mean that it
> > didn't exist...
> > 
> > > 	  (Exception: if D was actually a regular file or some other
> > > 	  non-directory object, then ENOTDIR would be the right error:
> > > 	  but if you're able to get at least object type atomically with
> > > 	  a lookup, then you should have noticed this already on lookup
> > > 	  of D.)
> > > 
> > > I think that's what Miklos meant?
> > > 
> > > --b.
> > 
> > Here's an example -- suppose we have two directories: /foo
> > and /bar. /bar is empty. We call:
> > 
> >     rename("/foo","/bar");
> > 
> > ...and at the same time, someone is calling:
> > 
> >     stat("/bar");
> > 
> > ...the calls race and in this condition the stat() gets ESTALE back
> > -- /bar got replaced after we did the lookup.
> > 
> > According to POSIX, the name "/bar" should never be absent from the
> > namespace in this situation, so I'm not sure I understand why returning
> > ENOENT here would be acceptable.
> 
> Yes, agreed, my assertion was just that an ESTALE on a lookup of a
> non-final component is probably equivalent to ENOENT.
> 
> I'm not sure if that's what Miklos meant.
> 

Ahh, OK, sorry, I misunderstood. Yeah, in that case I suppose it would
be OK to replace ESTALE with ENOENT. So, to illustrate...

Suppose we're trying to stat("/bar/baz") instead in the above example.
Then we could just return ENOENT on an ESTALE return, for the
reasons that Bruce outlined. If the dir was stale, then there was
at least one point in time where we *know* that "baz" didn't exist.

That doesn't seem like it'll work as a general solution though, since it
wouldn't apply to an ESTALE on the last component. For that we'd need
to do something different -- retry the operation in some form. Still, it
might be a potential optimization in the path walking code to avoid
retrying in some cases.
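
To make that split concrete, here is a minimal sketch in plain userspace
C. It is not actual VFS code. The names classify_estale(), estale_result(),
enum estale_action and the already_retried flag are all made up for
illustration. The only things carried over from the discussion are the
assumptions that ESTALE on a non-final component could be reported as
ENOENT, and that ESTALE on the last component gets one retry before the
error is passed up.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcome of seeing an error during a path walk. */
enum estale_action {
	ESTALE_PASS_UP,		/* return the error unchanged */
	ESTALE_AS_ENOENT,	/* report ENOENT instead of ESTALE */
	ESTALE_RETRY,		/* redo the operation once */
};

/*
 * Decide what to do with an error, given whether it was hit on the
 * last component of the path and whether we have already retried.
 */
static enum estale_action classify_estale(int err, bool last_component,
					   bool already_retried)
{
	if (err != ESTALE)
		return ESTALE_PASS_UP;
	/* Stale intermediate dir: the entry did not exist at some point. */
	if (!last_component)
		return ESTALE_AS_ENOENT;
	/* Last component: retry once, then give up and pass ESTALE up. */
	return already_retried ? ESTALE_PASS_UP : ESTALE_RETRY;
}

/* Map the action to the errno the caller would see (0 means "retry"). */
static int estale_result(int err, bool last_component, bool already_retried)
{
	switch (classify_estale(err, last_component, already_retried)) {
	case ESTALE_AS_ENOENT:
		return ENOENT;
	case ESTALE_RETRY:
		return 0;
	default:
		return err;
	}
}
```

In the stat("/bar/baz") case, an ESTALE while looking up "baz" in a stale
/bar is the non-final case and would come back as ENOENT. In the
stat("/bar") case, the ESTALE shows up on the last component, so the
sketch asks for one retry and only passes ESTALE up if that fails too.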

-- 
Jeff Layton <jlayton@redhat.com>
