From: Brian Foster <bfoster@redhat.com>
To: Sage Weil <sage@newdream.net>
Cc: Josef Bacik <jbacik@fb.com>, Jan Kara <jack@suse.cz>,
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing
Date: Mon, 5 Jan 2015 14:02:44 -0500
Message-ID: <20150105190243.GA51005@bfoster.bfoster>
In-Reply-To: <alpine.DEB.2.00.1501050819500.20175@cobra.newdream.net>

On Mon, Jan 05, 2015 at 10:34:57AM -0800, Sage Weil wrote:
> On Wed, 10 Dec 2014, Josef Bacik wrote:
> > On 12/10/2014 06:27 AM, Jan Kara wrote:
> > > On Mon 08-12-14 17:11:41, Josef Bacik wrote:
> > > > Hello,
> > > > 
> > > > We have been doing pretty well at populating xfstests with loads of
> > > > tests to catch regressions and validate we're all working properly.
> > > > One thing that has been lacking is a good way to verify file system
> > > > integrity after a power fail.  This is a core part of what file
> > > > systems are supposed to provide but it is probably the least tested
> > > > aspect.  We have dm-flakey tests in xfstests to test fsync
> > > > correctness, but these tests do not catch the random horrible things
> > > > that can go wrong.  We are still finding horrible scary things that
> > > > go wrong in Btrfs because it is simply hard to reproduce and test
> > > > for.
> > > > 
> > > > I have been working on an idea to do this better - some may have
> > > > seen my dm-power-fail attempt - and I've got a new incarnation of
> > > > the idea thanks to discussions with Zach Brown.  Obviously a lot
> > > > will change in this area between now and March, but it would be
> > > > good to have everybody in the room talking about what they would
> > > > need to build a good, deterministic test to make sure we're always
> > > > presenting a consistent file system and that our fsync() handling
> > > > is working properly.  Thanks,
> > >    I agree we are lacking in testing this aspect. I just don't see much
> > > material for discussion there unless we have something more tangible -
> > > once we have an implementation, we can talk about its pros and cons,
> > > what still needs doing, etc.
> > > 
> > 
> > Right, that's what I was getting at.  I have a solution and have sent it
> > around, but there don't seem to be many people interested in commenting
> > on it.  I figure one of two things will happen:
> > 
> > 1) My solution will go in before LSF, in which case YAY my job is done and
> > this is more of an [ATTEND] than a [TOPIC], or
> > 
> > 2) My solution hasn't gone in yet and I'd like to discuss my methodology and
> > how we can integrate it into xfstests, future features, other areas we could
> > test etc.
> > 
> > Maybe not a full-blown slot, but combined with an overall testing slot,
> > or hell, just a quick lightning talk.  Thanks,
> 
> I have a related topic that may make sense to fit into any discussion 
> about this. Twice recently we've run into trouble using newish or less 
> common (combinations of) syscalls.
> 
> The first instance was with the use of sync_file_range to try to 
> control/limit the amount of dirty data in the page cache.  This, possibly 
> in combination with posix_fadvise(DONTNEED), managed to break the 
> writeback sequence in XFS and led to data corruption after power loss.
> 

Was there a report or any other details on this one? In particular, I'm
wondering if this is related to the problem exposed by xfstests test
xfs/053...

Brian

> The other issue we saw was just a general raft of FIEMAP bugs over the 
> last year or two. We saw cases where even after fsync a fiemap result 
> would not include all extents, and (not unexpectedly) lots of corner cases 
> in several file systems, e.g., around partial blocks at end of file.  (As 
> far as I know everything we saw is resolved in current kernels.)
> 
> I'm not so concerned with these specific bugs, but worried that we 
> (perhaps naively) expected them to be pretty safe.  Perhaps for FIEMAP 
> this is a general case where a newish syscall/ioctl should be tested 
> carefully with our workloads before being relied upon, and we could have 
> worked to make sure e.g. xfstests has appropriate tests.  For power fail 
> testing in particular, though, right now it isn't clear who is testing 
> what under what workloads, so the only really "safe" approach is to stick 
> to whatever syscall combinations we think the rest of the world is using, 
> or make sure we test ourselves.
> 
> As things stand now the other devs are loath to touch any remotely exotic 
> fs call, but that hardly seems ideal.  Hopefully a common framework for 
> power fail testing can improve on this.  Perhaps there are other ways we 
> can make it easier to tell what is (well) tested, and conversely ensure 
> that those tests are well-aligned with what real users are doing...
> 
> sage


Thread overview: 18+ messages
2014-12-08 22:11 [LSF/MM TOPIC] Working towards better power fail testing Josef Bacik
2014-12-10 11:27 ` [Lsf-pc] " Jan Kara
2014-12-10 15:09   ` Josef Bacik
2015-01-05 18:34     ` Sage Weil
2015-01-05 19:02       ` Brian Foster [this message]
2015-01-05 19:13         ` Sage Weil
2015-01-05 19:33           ` Brian Foster
2015-01-05 21:17       ` Jan Kara
2015-01-05 21:47       ` Dave Chinner
2015-01-05 22:26         ` Sage Weil
2015-01-05 23:27           ` Dave Chinner
2015-01-06 17:37             ` Sage Weil
2015-01-06  8:53         ` Jan Kara
2015-01-06 16:39           ` Josef Bacik
2015-01-06 22:07           ` Dave Chinner
2015-01-07 10:10             ` Jan Kara
2015-01-13 17:05 ` Dmitry Monakhov
2015-01-13 17:17   ` Josef Bacik
