All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Always report a writeback error once
@ 2018-04-23 20:42 Matthew Wilcox
  2018-04-23 20:57 ` Andres Freund
  2018-04-24 12:24 ` Jeff Layton
  0 siblings, 2 replies; 7+ messages in thread
From: Matthew Wilcox @ 2018-04-23 20:42 UTC (permalink / raw)
  To: linux-fsdevel, Jeff Layton, linux-kernel; +Cc: Andres Freund


The errseq_t infrastructure assumes that errors which occurred before
the file descriptor was opened are of no interest to the application.
This turns out to be a regression for some applications, notably Postgres.

Before errseq_t, a writeback error would be reported exactly once (as
long as the inode remained in memory), so Postgres could open a file,
call fsync() and find out whether there had been a writeback error on
that file from another process.

This patch restores that behaviour by reporting errors to file descriptors
which are opened after the error occurred, but before it was reported
to any file descriptor.

Cc: stable@vger.kernel.org
Fixes: 5660e13d2fd6 ("fs: new infrastructure for writeback error handling and reporting")
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>

diff --git a/lib/errseq.c b/lib/errseq.c
index df782418b333..093f1fba4ee0 100644
--- a/lib/errseq.c
+++ b/lib/errseq.c
@@ -119,19 +119,11 @@ EXPORT_SYMBOL(errseq_set);
 errseq_t errseq_sample(errseq_t *eseq)
 {
 	errseq_t old = READ_ONCE(*eseq);
-	errseq_t new = old;
 
-	/*
-	 * For the common case of no errors ever having been set, we can skip
-	 * marking the SEEN bit. Once an error has been set, the value will
-	 * never go back to zero.
-	 */
-	if (old != 0) {
-		new |= ERRSEQ_SEEN;
-		if (old != new)
-			cmpxchg(eseq, old, new);
-	}
-	return new;
+	/* If nobody has seen this error yet, then we can be the first. */
+	if (!(old & ERRSEQ_SEEN))
+		old = 0;
+	return old;
 }
 EXPORT_SYMBOL(errseq_sample);
 

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] Always report a writeback error once
  2018-04-23 20:42 [PATCH] Always report a writeback error once Matthew Wilcox
@ 2018-04-23 20:57 ` Andres Freund
  2018-04-23 21:43   ` Matthew Wilcox
  2018-04-24 12:24 ` Jeff Layton
  1 sibling, 1 reply; 7+ messages in thread
From: Andres Freund @ 2018-04-23 20:57 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, Jeff Layton, linux-kernel

Hi,

On 2018-04-23 13:42:08 -0700, Matthew Wilcox wrote:
> The errseq_t infrastructure assumes that errors which occurred before
> the file descriptor was opened are of no interest to the application.
> This turns out to be a regression for some applications, notably Postgres.
> 
> Before errseq_t, a writeback error would be reported exactly once (as
> long as the inode remained in memory), so Postgres could open a file,
> call fsync() and find out whether there had been a writeback error on
> that file from another process.
> 
> This patch restores that behaviour by reporting errors to file descriptors
> which are opened after the error occurred, but before it was reported
> to any file descriptor.
> 
> Cc: stable@vger.kernel.org
> Fixes: 5660e13d2fd6 ("fs: new infrastructure for writeback error handling and reporting")
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> 
> diff --git a/lib/errseq.c b/lib/errseq.c
> index df782418b333..093f1fba4ee0 100644
> --- a/lib/errseq.c
> +++ b/lib/errseq.c
> @@ -119,19 +119,11 @@ EXPORT_SYMBOL(errseq_set);
>  errseq_t errseq_sample(errseq_t *eseq)
>  {

There's a comment above this:
 *
 * This function allows callers to sample an errseq_t value, marking it as
 * "seen" if required.

>  	errseq_t old = READ_ONCE(*eseq);
> -	errseq_t new = old;
>  
> -	/*
> -	 * For the common case of no errors ever having been set, we can skip
> -	 * marking the SEEN bit. Once an error has been set, the value will
> -	 * never go back to zero.
> -	 */
> -	if (old != 0) {
> -		new |= ERRSEQ_SEEN;
> -		if (old != new)
> -			cmpxchg(eseq, old, new);
> -	}
> -	return new;

Which seems not to be true anymore after this hunk.


> +	/* If nobody has seen this error yet, then we can be the first. */
> +	if (!(old & ERRSEQ_SEEN))
> +		old = 0;
> +	return old;
>  }
>  EXPORT_SYMBOL(errseq_sample);

I've never really looked at this code in any depth before, but won't
this potentially lead to the same error being reported on multiple FDs?
Imagine two fds (potentially in different processes) getting the 0
returned by errseq_sample() because it's not ERRSEQ_SEEN. Afaict
file_check_and_advance_wb_err() will return an error that's always
unlike 0 in that case, and thus the error will returned on both fds?

I'm personally perfectly fine with that, but it's not necessarily what's
described as desired in your email?.

Greetings,

Andres Freund

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Always report a writeback error once
  2018-04-23 20:57 ` Andres Freund
@ 2018-04-23 21:43   ` Matthew Wilcox
  2018-04-23 21:50     ` Andres Freund
  2018-04-23 21:51     ` Matthew Wilcox
  0 siblings, 2 replies; 7+ messages in thread
From: Matthew Wilcox @ 2018-04-23 21:43 UTC (permalink / raw)
  To: Andres Freund; +Cc: linux-fsdevel, Jeff Layton, linux-kernel

On Mon, Apr 23, 2018 at 01:57:30PM -0700, Andres Freund wrote:
> On 2018-04-23 13:42:08 -0700, Matthew Wilcox wrote:
> > @@ -119,19 +119,11 @@ EXPORT_SYMBOL(errseq_set);
> >  errseq_t errseq_sample(errseq_t *eseq)
> >  {
> 
> There's a comment above this:
>  *
>  * This function allows callers to sample an errseq_t value, marking it as
>  * "seen" if required.

Oh, good catch.  I'll fix that.  Thanks!

> I've never really looked at this code in any depth before, but won't
> this potentially lead to the same error being reported on multiple FDs?
> Imagine two fds (potentially in different processes) getting the 0
> returned by errseq_sample() because it's not ERRSEQ_SEEN. Afaict
> file_check_and_advance_wb_err() will return an error that's always
> unlike 0 in that case, and thus the error will returned on both fds?
> 
> I'm personally perfectly fine with that, but it's not necessarily what's
> described as desired in your email?.

That's what I was trying to say with this paragraph:

> > This patch restores that behaviour by reporting errors to file descriptors
> > which are opened after the error occurred, but before it was reported
> > to any file descriptor.

Maybe it was a little unclear?  I'd appreciate a suggestion on making
it clearer.

I think this behaviour is perfectly justifiable.  Writeback errors occur
asynchronously to open.  Userspace can't tell the difference between:

kernel: writeback
p1: open
p2: open
p1: fsync
p2: fsync

and

p1: open
p2: open
kernel: writeback
p1: fsync
p2: fsync

Since both p1 and p2 would get the writeback error in the second case,
they can both get the writeback error in the first place.

p2 can't rely on this, after all we could have the sequence:

p1: open
p1: fsync
p2: open
p2: fsync

and p2 will not see the error, but it wouldn't have seen the error
before the errseq_t infrastructure went in.  And maybe p1 handled the
error three weeks ago; we don't want p2 to see the error.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Always report a writeback error once
  2018-04-23 21:43   ` Matthew Wilcox
@ 2018-04-23 21:50     ` Andres Freund
  2018-04-23 21:51     ` Matthew Wilcox
  1 sibling, 0 replies; 7+ messages in thread
From: Andres Freund @ 2018-04-23 21:50 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, Jeff Layton, linux-kernel

On 2018-04-23 14:43:48 -0700, Matthew Wilcox wrote:
> On Mon, Apr 23, 2018 at 01:57:30PM -0700, Andres Freund wrote:
> > I've never really looked at this code in any depth before, but won't
> > this potentially lead to the same error being reported on multiple FDs?
> > Imagine two fds (potentially in different processes) getting the 0
> > returned by errseq_sample() because it's not ERRSEQ_SEEN. Afaict
> > file_check_and_advance_wb_err() will return an error that's always
> > unlike 0 in that case, and thus the error will returned on both fds?
> > 
> > I'm personally perfectly fine with that, but it's not necessarily what's
> > described as desired in your email?.
> 
> That's what I was trying to say with this paragraph:
> 
> > > This patch restores that behaviour by reporting errors to file descriptors
> > > which are opened after the error occurred, but before it was reported
> > > to any file descriptor.
> 
> Maybe it was a little unclear?  I'd appreciate a suggestion on making
> it clearer.

I think I was thinking of the following paragraph from your commit
message:

> Before errseq_t, a writeback error would be reported exactly once (as
> long as the inode remained in memory), so Postgres could open a file,
> call fsync() and find out whether there had been a writeback error on
> that file from another process.

Note the "exactly once", making "that behaviour" in the following
paragraph potentially refer to exactly once:

> This patch restores that behaviour by reporting errors to file descriptors
> which are opened after the error occurred, but before it was reported
> to any file descriptor.

Just adding a sentence here saying something like "This means that the
error might be reported to more fds than before." or such would address
that potential ambiguity?

> I think this behaviour is perfectly justifiable.

I agree.

Greetings,

Andres Freund

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Always report a writeback error once
  2018-04-23 21:43   ` Matthew Wilcox
  2018-04-23 21:50     ` Andres Freund
@ 2018-04-23 21:51     ` Matthew Wilcox
  2018-04-23 21:53       ` Andres Freund
  1 sibling, 1 reply; 7+ messages in thread
From: Matthew Wilcox @ 2018-04-23 21:51 UTC (permalink / raw)
  To: Andres Freund; +Cc: linux-fsdevel, Jeff Layton, linux-kernel

On Mon, Apr 23, 2018 at 02:43:48PM -0700, Matthew Wilcox wrote:
> On Mon, Apr 23, 2018 at 01:57:30PM -0700, Andres Freund wrote:
> > On 2018-04-23 13:42:08 -0700, Matthew Wilcox wrote:
> > > @@ -119,19 +119,11 @@ EXPORT_SYMBOL(errseq_set);
> > >  errseq_t errseq_sample(errseq_t *eseq)
> > >  {
> > 
> > There's a comment above this:
> >  *
> >  * This function allows callers to sample an errseq_t value, marking it as
> >  * "seen" if required.
> 
> Oh, good catch.  I'll fix that.  Thanks!

How does this look?

@@ -111,27 +111,22 @@ EXPORT_SYMBOL(errseq_set);
  * errseq_sample() - Grab current errseq_t value.
  * @eseq: Pointer to errseq_t to be sampled.
  *
- * This function allows callers to sample an errseq_t value, marking it as
- * "seen" if required.
+ * This function allows callers to initialise their errseq_t variable.
+ * If the error has been "seen", new callers will not see an old error.
+ * If there is an unseen error in @eseq, the caller of this function will
+ * see it the next time it checks for an error.
  *
+ * Context: Any context.
  * Return: The current errseq value.
  */
 errseq_t errseq_sample(errseq_t *eseq)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Always report a writeback error once
  2018-04-23 21:51     ` Matthew Wilcox
@ 2018-04-23 21:53       ` Andres Freund
  0 siblings, 0 replies; 7+ messages in thread
From: Andres Freund @ 2018-04-23 21:53 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, Jeff Layton, linux-kernel

On 2018-04-23 14:51:50 -0700, Matthew Wilcox wrote:
> How does this look?
> 
> @@ -111,27 +111,22 @@ EXPORT_SYMBOL(errseq_set);
>   * errseq_sample() - Grab current errseq_t value.
>   * @eseq: Pointer to errseq_t to be sampled.
>   *
> - * This function allows callers to sample an errseq_t value, marking it as
> - * "seen" if required.
> + * This function allows callers to initialise their errseq_t variable.
> + * If the error has been "seen", new callers will not see an old error.
> + * If there is an unseen error in @eseq, the caller of this function will
> + * see it the next time it checks for an error.
>   *
> + * Context: Any context.
>   * Return: The current errseq value.
>   */
>  errseq_t errseq_sample(errseq_t *eseq)

LGTM.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Always report a writeback error once
  2018-04-23 20:42 [PATCH] Always report a writeback error once Matthew Wilcox
  2018-04-23 20:57 ` Andres Freund
@ 2018-04-24 12:24 ` Jeff Layton
  1 sibling, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2018-04-24 12:24 UTC (permalink / raw)
  To: Matthew Wilcox, linux-fsdevel, linux-kernel; +Cc: Andres Freund

On Mon, 2018-04-23 at 13:42 -0700, Matthew Wilcox wrote:
> The errseq_t infrastructure assumes that errors which occurred before
> the file descriptor was opened are of no interest to the application.
> This turns out to be a regression for some applications, notably Postgres.
> 
> Before errseq_t, a writeback error would be reported exactly once (as
> long as the inode remained in memory), so Postgres could open a file,
> call fsync() and find out whether there had been a writeback error on
> that file from another process.
> 
> This patch restores that behaviour by reporting errors to file descriptors
> which are opened after the error occurred, but before it was reported
> to any file descriptor.
> 
> Cc: stable@vger.kernel.org
> Fixes: 5660e13d2fd6 ("fs: new infrastructure for writeback error handling and reporting")
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> 
> diff --git a/lib/errseq.c b/lib/errseq.c
> index df782418b333..093f1fba4ee0 100644
> --- a/lib/errseq.c
> +++ b/lib/errseq.c
> @@ -119,19 +119,11 @@ EXPORT_SYMBOL(errseq_set);
>  errseq_t errseq_sample(errseq_t *eseq)
>  {
>  	errseq_t old = READ_ONCE(*eseq);
> -	errseq_t new = old;
>  
> -	/*
> -	 * For the common case of no errors ever having been set, we can skip
> -	 * marking the SEEN bit. Once an error has been set, the value will
> -	 * never go back to zero.
> -	 */
> -	if (old != 0) {
> -		new |= ERRSEQ_SEEN;
> -		if (old != new)
> -			cmpxchg(eseq, old, new);
> -	}
> -	return new;
> +	/* If nobody has seen this error yet, then we can be the first. */
> +	if (!(old & ERRSEQ_SEEN))
> +		old = 0;
> +	return old;
>  }
>  EXPORT_SYMBOL(errseq_sample);
>  

Patch looks good to me, modulo the comment fix that Andres pointed out.

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-04-24 12:24 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-23 20:42 [PATCH] Always report a writeback error once Matthew Wilcox
2018-04-23 20:57 ` Andres Freund
2018-04-23 21:43   ` Matthew Wilcox
2018-04-23 21:50     ` Andres Freund
2018-04-23 21:51     ` Matthew Wilcox
2018-04-23 21:53       ` Andres Freund
2018-04-24 12:24 ` Jeff Layton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.