All of lore.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 0/2 V4] Resubmit items failed during writeback
Date: Wed, 21 Jun 2017 06:15:26 -0400	[thread overview]
Message-ID: <20170621101526.GC28914@bfoster.bfoster> (raw)
In-Reply-To: <20170621004550.GF4733@birch.djwong.org>

On Tue, Jun 20, 2017 at 05:45:50PM -0700, Darrick J. Wong wrote:
> On Mon, Jun 19, 2017 at 02:51:11PM -0400, Brian Foster wrote:
> > On Mon, Jun 19, 2017 at 10:42:03AM -0700, Darrick J. Wong wrote:
> > > On Mon, Jun 19, 2017 at 09:51:22AM -0400, Brian Foster wrote:
> > > > On Fri, Jun 16, 2017 at 12:54:43PM +0200, Carlos Maiolino wrote:
> > > > > Hi,
> > > > > 
> > > > > there goes a new version of this patchset based on previous reviews on V3.
> > > > > 
> > > > > Changelogs added separated to each patch.
> > > > > 
> > > > 
> > > > Hi Carlos,
> > > > 
> > > > I pointed out the last things that I'm aware of that I think need to be
> > > > fixed in this series (along with a few nits here and there). That said,
> > > > it's already been pointed out that we probably want an xfstests test
> > > > case to go along with this before it would be merged. Is that something
> > > > you are still planning on?
> > > > 
> > > > I'd actually like to take this a bit farther than a regression test and
> > > > see if we can improve our ability to test the error handling code in
> > > > general. What do you (or anybody else..) think about including a patch
> > > > in this series that introduces the ability to inject metadata writeback
> > > > errors on DEBUG kernels? For example, consider something that just sets
> > > > b_error at the top of xfs_buf_iodone_callbacks() based on a random value
> > > > and configurable error frequency. This could use XFS_TEST_ERROR() or
> > > > something like a new DEBUG sysfs attribute in the error configuration
> > > > (see log_badcrc_factor for a similar example).
> > > 
> > > Sounds reasonable.
> > > 
> > > I wonder if it would be more useful to have individual knobs for each
> > > metadata object type so that you could have multiple xfstests, each of
> > > which runs the same software scenario but with different failures
> > > configured?  Then we could test what happens when, say, only inode
> > > writes fail, or bmbt writes fail, etc. rather than one big hammer that's
> > > harder to control?
> > > 
> > 
> > I suppose you could do some of that in the test just by making certain
> > modifications in isolation (e.g., test an inode modification, test a
> > dquot, buffer, etc..). It might be harder to do that from a test script
> > at the granularity of things like bmbt buffers, though that may also be
> > the case for error injection. Did you have something in mind to filter
> > the buffer types.. the blf type perhaps?
> > 
> > > For a moment I also wondered why not use the error injection mechanism
> > > that we already have, rather than inventing more sysfs knobs?  But we
> > > have limited space in xfs_error_injection (29/32 bits used) so we
> > > probably should just switch everything to sysfs knobs.
> > > 
> > 
> > Yeah, I think we discussed this before as well. IIRC, I had already
> > implemented whatever the sysfs knob was and and corresponding test, and
> > it just wasn't really important enough to reimplement the whole thing.
> 
> We probably did, back when the other error injection knobs (drop_writes,
> log/log_badcrc_factor) got added.  Unfortunately now they're all over
> sysfs, which makes for sort of a mess.
> 
> > I didn't really have a preference as to how this would be implemented.
> > On a quick look though, it looks like a downside to the existing
> > injection mechanism is that we can't control the error frequency..? If
> > that's the case, I like the idea of stepping back, perhaps audit the
> > granularity of the broader error injection framework and define a new
> > sysfs interface that also allows controlling things like the error
> > frequency dynamically.
> 
> Yes, I've heard you ask for this twice now.  I agree that being able to
> control the frequency would be a useful feature to enable longer-running
> tests so that you could, say, have the refcount_finish_one error trigger
> once every 1,000,000 updates instead of immediately afterwards.
> 

Yep. It allows modeling for longevity tests as you note above, but I'm
also thinking about the ability to turn a particular error on with 100%
frequency in order to run fast, deterministic regression tests.

> So with that in mind I coded up a quick RFC to create sysfs knobs in
> /sys/fs/xfs/$device/errortag/$tagname ; the units are the same as the
> XFS_RANDOM_ values (i.e. inverted frequency).  Now we're free of the
> limitation of only being able to inject 10 error types across all
> mounted fses, and we can individually disable injection too.
> 

Nice, that sounds very interesting.. thanks!

Brian

> > (I would still like to get something targeted towards testing this
> > series in the meantime.)
> 
> Same here.
> 
> --D
> 
> > Brian
> > 
> > > --D
> > > 
> > > > This would facilitate longer running tests where iodone callback errors
> > > > occur randomly and transiently and we can thus actually exercise the
> > > > error handling and recovery code. I'd love to run some fsstress testing,
> > > > for example, as I'm hoping that would help wring out any further issues
> > > > that could be lurking (particularly with the tricky xfs_iflush_done()
> > > > logic and whatnot). If implemented generally enough, that could also be
> > > > reused for a more simple xfstests regression test for the original
> > > > problem (e.g., mount fs, set error frequency = 100%, modify an inode,
> > > > set error frequency = 0, umount), albeit it would require debug mode.
> > > > Thoughts?
> > > > 
> > > > Brian
> > > > 
> > > > > 
> > > > > -- 
> > > > > 2.9.4
> > > > > 
> > > > > --
> > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > the body of a message to majordomo@vger.kernel.org
> > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2017-06-21 10:15 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-16 10:54 [PATCH 0/2 V4] Resubmit items failed during writeback Carlos Maiolino
2017-06-16 10:54 ` [PATCH 1/2 V4] xfs: Add infrastructure needed for error propagation during buffer IO failure Carlos Maiolino
2017-06-19 13:48   ` Brian Foster
2017-06-20  7:15     ` Carlos Maiolino
2017-06-16 10:54 ` [PATCH 2/2] xfs: Properly retry failed inode items in case of error during buffer writeback Carlos Maiolino
2017-06-16 11:06   ` Carlos Maiolino
2017-06-16 18:35   ` Luis R. Rodriguez
2017-06-16 19:24     ` Darrick J. Wong
2017-06-16 19:37       ` Luis R. Rodriguez
2017-06-16 19:45         ` Eric Sandeen
2017-06-19 10:59           ` Brian Foster
2017-06-20 16:52             ` Luis R. Rodriguez
2017-06-20 17:20               ` Brian Foster
2017-06-20 18:05                 ` Luis R. Rodriguez
2017-06-21 10:10                   ` Brian Foster
2017-06-21 15:25                     ` Luis R. Rodriguez
2017-06-20 18:38                 ` Luis R. Rodriguez
2017-06-20  7:01     ` Carlos Maiolino
2017-06-20 16:24       ` Luis R. Rodriguez
2017-06-21 11:51         ` Carlos Maiolino
2017-06-19 13:49   ` Brian Foster
2017-06-19 15:09     ` Brian Foster
2017-06-19 13:51 ` [PATCH 0/2 V4] Resubmit items failed during writeback Brian Foster
2017-06-19 17:42   ` Darrick J. Wong
2017-06-19 18:51     ` Brian Foster
2017-06-21  0:45       ` Darrick J. Wong
2017-06-21 10:15         ` Brian Foster [this message]
2017-06-21 11:03           ` Carlos Maiolino
2017-06-21 11:51             ` Brian Foster
2017-06-21 16:54               ` Darrick J. Wong
2017-06-22 12:05                 ` Carlos Maiolino
2017-06-22 12:40                   ` Brian Foster
2017-06-30 11:09                     ` Carlos Maiolino
2017-06-30 11:33                       ` Brian Foster
2017-06-30 12:22                         ` Carlos Maiolino
2017-06-30 17:01                           ` Darrick J. Wong
2017-07-03  8:37                             ` Carlos Maiolino
2017-06-21 16:45             ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170621101526.GC28914@bfoster.bfoster \
    --to=bfoster@redhat.com \
    --cc=cmaiolino@redhat.com \
    --cc=darrick.wong@oracle.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.