linux-fsdevel.vger.kernel.org archive mirror
From: Jeff Layton <jlayton@redhat.com>
To: 焦晓冬 <milestonejxd@gmail.com>, R.E.Wolff@bitwizard.nl
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: POSIX violation by writeback error
Date: Tue, 04 Sep 2018 07:09:34 -0400
Message-ID: <82ffc434137c2ca47a8edefbe7007f5cbecd1cca.camel@redhat.com>
In-Reply-To: <CAJDTihzqn3whQ47uUOxGYk4Je4S10ehNEQCtfb=j--iCsdDqgQ@mail.gmail.com>

On Tue, 2018-09-04 at 16:58 +0800, 焦晓冬 wrote:
> On Tue, Sep 4, 2018 at 3:53 PM Rogier Wolff <R.E.Wolff@bitwizard.nl> wrote:
> 
> ...
> > > 
> > > Jlayton's patch is a simple but wonderful idea for correct error
> > > reporting. One crucial problem still remains to be fixed, though. Does
> > > anyone have an idea?
> > > 
> > > The crucial problem is that a read() after a successful
> > > open()-write()-close() sequence may return old data.
> > > 
> > > That can happen when an asynchronous writeback error occurs after
> > > close() and the inode/mapping is evicted before the read().
> > 
> > Suppose I have 1GB of RAM. Suppose I open a file, write 0.5GB to it
> > and then close it. Then I repeat this 9 times.
> > 
> > Now, when writing those files to storage fails, there is 5GB of data
> > to remember and only 1GB of RAM.
> > 
> > I can choose any part of that 5GB and try to read it.
> > 
> > Where do you suggest we store that data?
> 
> That certainly cannot be done. But shouldn't we at least report an
> error on read()? Silently returning wrong data may cause further damage,
> such as removing the wrong files because they were marked as garbage in
> the old version of the file.
> 

Is the data wrong, though? You tried to write and the write failed.
Eventually we want to be able to get at the data that's actually in the
file -- at what point does that happen?

If I get an error back on a read, why should I think that it has
anything at all to do with writes that previously failed? It may even
have been written by a completely separate process that I had nothing at
all to do with.
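To make this concrete from the application side: on Linux, an asynchronous writeback failure is reported through fsync() (and, on some filesystems, close()), not through a later read(). The C sketch below shows a writer that checks those return values. It assumes a POSIX system, and save_file() is a hypothetical helper name used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write the whole buffer to path, then fsync() and
 * close(). Returns 0 on success or -errno from the first failing call.
 */
int save_file(const char *path, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    const char *p = buf;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -errno;

    /* write() may be partial or interrupted, so loop until done. */
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            int err = errno;
            close(fd);
            return -err;
        }
        p += n;
        len -= (size_t)n;
    }

    /* A failure to write the dirty pages back is reported here. */
    if (fsync(fd) < 0) {
        int err = errno;
        close(fd);
        return -err;
    }

    /* Some filesystems (NFS, for example) may also report errors at close(). */
    if (close(fd) < 0)
        return -errno;
    return 0;
}
```

A caller that ignores the fsync() result has no reliable way to learn that its data never reached storage; an error from a later read() would not identify which earlier write failed, which is the point made above.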

> As I see it, this is all about error reporting.
> 
> As for a suggestion: maybe the error flag of the inode/mapping, or the
> entire inode, should not be evicted if there was an error. That hopefully
> won't take much memory. In extreme conditions, where too many errored
> inodes need to stay in memory, maybe we should panic rather than spread
> the error.
> 
> > 
> > In the easy case, where the data fits in RAM, you COULD write a
> > solution. But when the hardware fails, the SYSTEM will not be able to
> > follow the POSIX rules.
> 
> Nope, we are able to follow the rules. The approach above is one way to
> do that.
> 
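The quoted proposal, keeping a small per-inode error record in memory after the inode's pages are evicted, could look roughly like the C sketch below. It is purely hypothetical and not an existing kernel facility: the fixed-size table, the err_table_* names, and clearing a record once fsync() has reported it are all assumptions for illustration. A full table corresponds to the "extreme conditions" where the proposal suggests panicking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table size; a real design would need a policy for this. */
#define ERR_TABLE_SLOTS 4

struct err_record {
    uint64_t ino;   /* inode number (device omitted for brevity) */
    int err;        /* positive errno; 0 means the slot is free */
};

static struct err_record err_table[ERR_TABLE_SLOTS];

/*
 * Writeback path: remember an error for ino after its pages are evicted.
 * Returns false when the table is full, which is where the proposal would
 * panic instead of silently forgetting the error.
 */
bool err_table_remember(uint64_t ino, int err)
{
    assert(err > 0);
    size_t free_slot = ERR_TABLE_SLOTS;
    for (size_t i = 0; i < ERR_TABLE_SLOTS; i++) {
        if (err_table[i].err != 0 && err_table[i].ino == ino) {
            err_table[i].err = err;   /* reuse the existing record */
            return true;
        }
        if (err_table[i].err == 0 && free_slot == ERR_TABLE_SLOTS)
            free_slot = i;
    }
    if (free_slot == ERR_TABLE_SLOTS)
        return false;
    err_table[free_slot].ino = ino;
    err_table[free_slot].err = err;
    return true;
}

/* read() path: the error to report for ino, or 0 if none is recorded. */
int err_table_lookup(uint64_t ino)
{
    for (size_t i = 0; i < ERR_TABLE_SLOTS; i++)
        if (err_table[i].err != 0 && err_table[i].ino == ino)
            return err_table[i].err;
    return 0;
}

/* Assumed policy: forget the record once fsync() has reported it. */
void err_table_clear(uint64_t ino)
{
    for (size_t i = 0; i < ERR_TABLE_SLOTS; i++)
        if (err_table[i].err != 0 && err_table[i].ino == ino)
            err_table[i].err = 0;
}
```

The open question the sketch leaves unanswered is the same one raised in the reply: when, if ever, an unreported record may be discarded.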

This is something we discussed at LSF this year.

We could attempt to keep dirty data around for a little while, at least
long enough to ensure that reads reflect earlier writes until the errors
can be scraped out by fsync. That would sort of redefine fsync from
being "ensure that my writes are flushed" to "synchronize my cache with
the current state of the file".
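One way to picture "scraping errors out by fsync" is a per-mapping error sequence that each open file samples when it is opened and compares against on fsync, so that a single writeback failure is reported once to every opener. The C sketch below is a simplified, single-threaded model loosely based on the kernel's errseq_t (lib/errseq.c). It is not the kernel code: it omits the atomics, returns a positive errno, and the wb_errseq_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Layout (modeled on errseq_t): the low 12 bits hold the errno, bit 12
 * records that the current error has been reported to someone, and the
 * remaining high bits count errors recorded after an earlier one was seen.
 */
typedef uint32_t wb_errseq;

#define WB_ERRNO_MASK 0xfffu       /* errno values fit in 12 bits */
#define WB_SEEN       (1u << 12)   /* current error was reported */
#define WB_CTR_INC    (1u << 13)   /* one step of the counter */

/* Writeback path: record a (positive) errno against the mapping. */
void wb_errseq_set(wb_errseq *mapping, int err)
{
    assert(err > 0 && (unsigned)err <= WB_ERRNO_MASK);
    wb_errseq old = *mapping;
    wb_errseq cur = (old & ~(WB_ERRNO_MASK | WB_SEEN)) | (unsigned)err;
    /* Bump the counter only if the previous error was already reported,
     * so openers that saw it can tell that a new error arrived. */
    if (old & WB_SEEN)
        cur += WB_CTR_INC;
    *mapping = cur;
}

/* open() path: sample the current state for this open file. */
wb_errseq wb_errseq_sample(const wb_errseq *mapping)
{
    wb_errseq old = *mapping;
    /* An error nobody has seen yet is left for this opener to report. */
    return (old & WB_SEEN) ? old : 0;
}

/* fsync() path: report an error recorded since *since, then catch up. */
int wb_errseq_check_and_advance(wb_errseq *mapping, wb_errseq *since)
{
    wb_errseq cur = *mapping;
    if ((cur | WB_SEEN) == (*since | WB_SEEN))
        return 0;                       /* nothing new for this opener */
    cur |= WB_SEEN;                     /* mark the error as reported */
    *mapping = cur;
    *since = cur;
    return (int)(cur & WB_ERRNO_MASK);
}
```

In this model, two processes that opened the file before a failure each get the error from their next fsync() exactly once, while a process that opens the file after the error was already reported does not. That is the "ensure my writes are flushed" meaning of fsync; the redefinition discussed above would additionally require the dirty pages to stay around until those checks happen.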

The problem of course is that applications are not required to do fsync
at all. At what point do we give up on it, and toss out the pages that
can't be cleaned?

We could allow for a tunable that does a kernel panic if writeback fails,
the errors are never fetched via fsync, and we run out of memory. I
don't think that is something most users would want, though.

Another thought: maybe we could OOM kill any process that has the file
open and then toss out the page data in that situation?
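The options above can be summarized as a decision function for dirty pages that cannot be written back. This C sketch is hypothetical and is not kernel code: the enum, the struct fields (including a panic tunable), and the final fallback of dropping the pages when nobody has the file open are assumptions made to show how the proposals relate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes for pages whose writeback failed. */
enum unwritable_action {
    KEEP_PAGES,        /* keep dirty data so reads reflect earlier writes */
    DROP_PAGES,        /* toss the pages that can't be cleaned */
    PANIC_SYSTEM,      /* opt-in tunable: fail loudly instead */
    OOM_KILL_OPENERS,  /* kill processes with the file open, then drop */
};

struct unwritable_state {
    bool error_fetched;     /* has fsync() reported the error? */
    bool memory_exhausted;  /* would keeping the pages exhaust memory? */
    bool panic_tunable;     /* hypothetical knob enabling the panic option */
    bool file_open;         /* does any process still have the file open? */
};

enum unwritable_action choose_action(const struct unwritable_state *s)
{
    assert(s != NULL);
    /* Once fsync() has scraped out the error, the pages can go. */
    if (s->error_fetched)
        return DROP_PAGES;
    /* Otherwise keep them for as long as memory allows. */
    if (!s->memory_exhausted)
        return KEEP_PAGES;
    if (s->panic_tunable)
        return PANIC_SYSTEM;
    if (s->file_open)
        return OOM_KILL_OPENERS;
    /* Assumed fallback: nobody can observe the stale data via an open fd. */
    return DROP_PAGES;
}
```

Every branch except KEEP_PAGES eventually discards data that was never written, so the choice is only about who finds out and how.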

I'm wide open to (good) ideas here.
-- 
Jeff Layton <jlayton@redhat.com>


Thread overview: 51+ messages
2018-09-04  6:32 POSIX violation by writeback error 焦晓冬
2018-09-04  7:53 ` Rogier Wolff
2018-09-04  8:58   ` 焦晓冬
2018-09-04  9:29     ` Rogier Wolff
2018-09-04 10:45       ` 焦晓冬
2018-09-04 11:09     ` Jeff Layton [this message]
2018-09-04 14:56       ` 焦晓冬
2018-09-04 15:44         ` Jeff Layton
2018-09-04 16:12           ` J. Bruce Fields
2018-09-04 16:23             ` Rogier Wolff
2018-09-04 18:54               ` J. Bruce Fields
2018-09-04 20:18                 ` Jeff Layton
2018-09-04 20:35                   ` Vito Caputo
2018-09-04 21:02                     ` Matthew Wilcox
2018-09-05  0:51                     ` Dave Chinner
2018-09-05  8:24                   ` 焦晓冬
2018-09-05 10:55                     ` Jeff Layton
2018-09-05 12:07                       ` Rogier Wolff
2018-09-06  2:57                         ` Dave Chinner
2018-09-06  9:17                           ` Rogier Wolff
2018-09-24 23:09                             ` Alan Cox
2018-09-05 13:53                       ` J. Bruce Fields
2018-09-05  7:08           ` Rogier Wolff
2018-09-05  7:39             ` Martin Steigerwald
2018-09-05  8:04               ` Rogier Wolff
2018-09-05  8:37                 ` 焦晓冬
2018-09-05 12:07                   ` Austin S. Hemmelgarn
2018-09-05 12:46                     ` Rogier Wolff
2018-09-05  9:32                 ` Martin Steigerwald
2018-09-05  7:37           ` Martin Steigerwald
2018-09-05 11:42             ` Jeff Layton
2018-09-05  8:09           ` 焦晓冬
2018-09-05 13:08             ` Theodore Y. Ts'o
2018-09-24 23:21               ` Alan Cox
2018-09-06  7:28             ` 焦晓冬
     [not found] <CAJDTihx2yaR-_-9Ks1PoFcrKNZgUOoLdN-wRTTMV76Jg_dCLrw@mail.gmail.com>
2018-09-04 10:56 ` Jeff Layton
2018-09-24 23:30   ` Alan Cox
2018-09-25 11:15     ` Jeff Layton
2018-09-25 15:46       ` Theodore Y. Ts'o
2018-09-25 16:17         ` Rogier Wolff
2018-09-25 16:39         ` Alan Cox
2018-09-25 16:41         ` Jeff Layton
2018-09-25 22:30           ` Theodore Y. Ts'o
2018-09-26 18:10             ` Alan Cox
2018-09-26 21:49               ` Theodore Y. Ts'o
2018-09-27 22:48                 ` Alan Cox
2018-09-27  7:18               ` Rogier Wolff
2018-09-27 12:43             ` Jeff Layton
2018-09-27 14:27               ` Theodore Y. Ts'o
2018-09-25 17:35         ` Adam Borowski
2018-09-25 22:46           ` Theodore Y. Ts'o
