From: Brian Foster <bfoster@redhat.com>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: linux-bcachefs@vger.kernel.org
Subject: Re: fstests generic/441 -- occasional bcachefs failure
Date: Sat, 4 Feb 2023 16:33:41 -0500
Message-ID: <Y97PNdqx82rot2WC@bfoster>
In-Reply-To: <Y9w/oMzjVp90c/Pk@moria.home.lan>

On Thu, Feb 02, 2023 at 05:56:32PM -0500, Kent Overstreet wrote:
> On Tue, Jan 31, 2023 at 11:04:11AM -0500, Brian Foster wrote:
> > On Mon, Jan 30, 2023 at 12:06:34PM -0500, Kent Overstreet wrote:
> > > On Fri, Jan 27, 2023 at 09:50:05AM -0500, Brian Foster wrote:
> > > > Something else that occurred to me while looking further at this is that
> > > > we can also avoid the error in this case fairly easily by bailing out of
> > > > bch2_fsync() if page writeback fails, as opposed to the unconditional
> > > > flush -> sync meta -> flush log sequence that returns the first error
> > > > anyways. That would prevent marking the inode with a new sequence number
> > > > when I/Os are obviously failing. The caveat is that the test still
> > > > fails, now with a "Read-only file system" error instead of EIO, because
> > > > the filesystem is shut down by the time the vfs write inode path actually
> > > > runs.
> > > 
> > > If some pages did write successfully we don't want to skip the rest of
> > > the fsync, though.
> > > 
> > 
> > What does it matter if the fsync() has already failed? ISTM this is
> > pretty standard error-handling behavior across major fs', but it's not
> > clear to me if there's some bcachefs-specific quirk that warrants
> > different handling..
> > 
> > FWIW, I think I've been able to fix this test with a couple small
> > tweaks:
> > 
> > 1. Change bch2_fsync() to return on first error.
> 
> I suppose it doesn't make sense to flush the journal if
> file_write_and_wait_range() didn't successfully do anything - but I
> would prefer to keep the behaviour where we do flush the journal on
> partial writeback.
> 

That seems reasonable to me in principle...

> What if we plumbed a did_work parameter through?
> 

... but I'm not sure how that's supposed to work. Tracking submits
from the current flush wouldn't be hard, but it wouldn't tell us about
writeback that might have occurred in the background before fsync was
called. It also doesn't give us any information about which I/Os
succeeded or failed.
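
Just to make that concrete, the plumbing would have to look something
like the sketch below. All of these names are hypothetical (nothing
like this exists in the tree); the comments mark exactly where the
idea runs out of information:

#include <linux/fs.h>
#include <linux/pagemap.h>

/*
 * Hypothetical sketch of the did_work idea; none of these names exist
 * in the tree.  It shows what this path can observe (submissions from
 * this flush) and what it can't (background writeback that happened
 * before fsync, or which I/Os failed).
 */
static int flush_range_did_work(struct address_space *mapping,
				loff_t start, loff_t end,
				bool *did_work)
{
	int ret;

	*did_work = false;

	/* Kick off writeback for dirty pages in the range. */
	ret = filemap_fdatawrite_range(mapping, start, end);
	if (ret)
		return ret;

	/*
	 * Assumption for illustration only: treat a clean return as
	 * "work was done".  In reality there is no interface that says
	 * whether anything was actually submitted, and nothing here can
	 * see writeback that completed (or failed) before fsync ran,
	 * which is exactly the problem described above.
	 */
	*did_work = true;

	/* Wait for the I/O; the first recorded error surfaces here. */
	return filemap_fdatawait_range(mapping, start, end);
}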

I was thinking about whether we could look at PageError pages or
perhaps sample errseq or something in the error case, but fdatawait
can clear page errors, and errseq only bumps the sequence value if
it's been sampled since the last update (so there's no way to
distinguish "one failure" from "everything failed").

The only other thing that comes to mind is explicit
submit/completion/error counting and tracking somewhere in the mapping
or inode or some such, but that seems like overkill given fsync error
semantics. I.e., when fsync returns an error, we're basically telling
the user "data loss somewhere in the file," without any indication of
where or how much, so the user basically has to recreate the file or
toss it anyways. It would be nice to improve on that, but that's a
much bigger problem than I'm trying to address here. ;P
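
For completeness, the shape of that overkill option would be roughly
the below; purely hypothetical, nothing like it hangs off the mapping
or inode today:

#include <linux/types.h>

/*
 * Hypothetical per-mapping writeback accounting.  Even with counters
 * like these, fsync could still only report that data was lost
 * somewhere in the file, not where or how much, unless the tracking
 * were made per-range.
 */
struct wb_error_stats {
	atomic_t submitted;	/* writeback bios submitted */
	atomic_t completed;	/* completed successfully */
	atomic_t failed;	/* completed with an error */
};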

So I dunno.. I'll try to think a bit more about it, but atm I'm not
seeing any sort of elegant way to fix this test and also preserve the
inode flush in the event of writeback failure. Thoughts? ISTM that
it's reasonable to skip the flush on wb error, given that fsync
behavior and error semantics seem common across all major fs', and
since background writeback should still try to handle it in short
order anyways, but it's your tree.. :)
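
FWIW, in generic VFS terms the early return I'm describing amounts to
no more than the below. This is a sketch, not the actual patch:
generic helpers stand in for the bcachefs-specific calls, and the
journal flush is left as a placeholder:

#include <linux/fs.h>

/*
 * Sketch of tweak 1: return on the first error instead of always
 * doing flush -> sync meta -> flush log and returning the first
 * error afterwards.
 */
static int fsync_first_error_sketch(struct file *file, loff_t start,
				    loff_t end, int datasync)
{
	int ret;

	/* Flush and wait on dirty pages; bail if writeback failed. */
	ret = file_write_and_wait_range(file, start, end);
	if (ret)
		return ret;

	/* Only sync inode metadata once data writeback succeeded. */
	ret = sync_inode_metadata(file_inode(file), 1);
	if (ret)
		return ret;

	/* The journal/log flush would follow here in the real fs. */
	return 0;
}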

Brian


Thread overview: 24+ messages
2023-01-25 15:45 fstests generic/441 -- occasional bcachefs failure Brian Foster
2023-01-26 15:08 ` Kent Overstreet
2023-01-27  7:21   ` Kent Overstreet
2023-01-27 14:50   ` Brian Foster
2023-01-30 17:06     ` Kent Overstreet
2023-01-31 16:04       ` Brian Foster
2023-02-01 14:34         ` Kent Overstreet
2023-02-02 15:50           ` Brian Foster
2023-02-02 17:09             ` Freezing (was: Re: fstests generic/441 -- occasional bcachefs failure) Kent Overstreet
2023-02-02 20:04               ` Brian Foster
2023-02-02 22:39                 ` Kent Overstreet
2023-02-03  0:51               ` Dave Chinner
2023-02-04  0:35                 ` Kent Overstreet
2023-02-07  0:03                   ` Dave Chinner
2023-02-16 20:04                     ` Eric Wheeler
2023-02-20 22:19                       ` Dave Chinner
2023-02-20 23:23                         ` Kent Overstreet
2023-02-02 22:56         ` fstests generic/441 -- occasional bcachefs failure Kent Overstreet
2023-02-04 21:33           ` Brian Foster [this message]
2023-02-04 22:15             ` Kent Overstreet
2023-02-06 15:33               ` Brian Foster
2023-02-06 22:18                 ` Kent Overstreet
2023-02-09 12:57                   ` Brian Foster
2023-02-09 14:58                     ` Kent Overstreet
