* [LSF/MM TOPIC] Working towards better power fail testing
@ 2014-12-08 22:11 Josef Bacik
  2014-12-10 11:27 ` [Lsf-pc] " Jan Kara
  2015-01-13 17:05 ` Dmitry Monakhov
  0 siblings, 2 replies; 18+ messages in thread
From: Josef Bacik @ 2014-12-08 22:11 UTC (permalink / raw)
  To: lsf-pc; +Cc: linux-fsdevel

Hello,

We have been doing pretty well at populating xfstests with loads of 
tests to catch regressions and validate that everything is working 
properly.  One thing that has been lacking is a good way to verify file 
system integrity after a power failure.  This is a core part of what 
file systems are supposed to provide, but it is probably the least 
tested aspect.  We have dm-flakey tests in xfstests to test fsync 
correctness, but these tests do not catch the random horrible things 
that can go wrong.  We are still finding scary problems in Btrfs simply 
because this class of failure is hard to reproduce and test for.

I have been working on an idea to do this better (some may have seen my 
dm-power-fail attempt), and I now have a new incarnation of the idea 
thanks to discussions with Zach Brown.  Obviously a lot will change in 
this area between now and March, but it would be good to have everybody 
in the room talking about what they would need to build a good, 
deterministic test that makes sure we always present a consistent file 
system and that our fsync() handling is working properly.  Thanks,

Josef


* Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing
  2014-12-08 22:11 [LSF/MM TOPIC] Working towards better power fail testing Josef Bacik
@ 2014-12-10 11:27 ` Jan Kara
  2014-12-10 15:09   ` Josef Bacik
  2015-01-13 17:05 ` Dmitry Monakhov
  1 sibling, 1 reply; 18+ messages in thread
From: Jan Kara @ 2014-12-10 11:27 UTC (permalink / raw)
  To: Josef Bacik; +Cc: lsf-pc, linux-fsdevel

On Mon 08-12-14 17:11:41, Josef Bacik wrote:
> Hello,
> 
> We have been doing pretty well at populating xfstests with loads of
> tests to catch regressions and validate we're all working properly.
> One thing that has been lacking is a good way to verify file system
> integrity after a power fail.  This is a core part of what file
> systems are supposed to provide but it is probably the least tested
> aspect.  We have dm-flakey tests in xfstests to test fsync
> correctness, but these tests do not catch the random horrible things
> that can go wrong.  We are still finding horrible scary things that
> go wrong in Btrfs because it is simply hard to reproduce and test
> for.
> 
> I have been working on an idea to do this better, some may have seen
> my dm-power-fail attempt, and I've got a new incarnation of the idea
> thanks to discussions with Zach Brown.  Obviously there will be a
> lot changing in this area in the time between now and March but it
> would be good to have everybody in the room talking about what they
> would need to build a good and deterministic test to make sure we're
> always giving a consistent file system and to make sure our fsync()
> handling is working properly.  Thanks,
  I agree we are lacking in testing this aspect.  I just don't see much
material for discussion there until we have something more tangible -
once we have an implementation, we can talk about its pros and cons,
what still needs doing, etc.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing
  2014-12-10 11:27 ` [Lsf-pc] " Jan Kara
@ 2014-12-10 15:09   ` Josef Bacik
  2015-01-05 18:34     ` Sage Weil
  0 siblings, 1 reply; 18+ messages in thread
From: Josef Bacik @ 2014-12-10 15:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: lsf-pc, linux-fsdevel

On 12/10/2014 06:27 AM, Jan Kara wrote:
> On Mon 08-12-14 17:11:41, Josef Bacik wrote:
>> Hello,
>>
>> We have been doing pretty well at populating xfstests with loads of
>> tests to catch regressions and validate we're all working properly.
>> One thing that has been lacking is a good way to verify file system
>> integrity after a power fail.  This is a core part of what file
>> systems are supposed to provide but it is probably the least tested
>> aspect.  We have dm-flakey tests in xfstests to test fsync
>> correctness, but these tests do not catch the random horrible things
>> that can go wrong.  We are still finding horrible scary things that
>> go wrong in Btrfs because it is simply hard to reproduce and test
>> for.
>>
>> I have been working on an idea to do this better, some may have seen
>> my dm-power-fail attempt, and I've got a new incarnation of the idea
>> thanks to discussions with Zach Brown.  Obviously there will be a
>> lot changing in this area in the time between now and March but it
>> would be good to have everybody in the room talking about what they
>> would need to build a good and deterministic test to make sure we're
>> always giving a consistent file system and to make sure our fsync()
>> handling is working properly.  Thanks,
>    I agree we are lacking in testing this aspect. Just I don't see too much
> material for discussion there, unless we have something more tangible -
> when we have some implementation, we can talk about pros and cons of it,
> what still needs doing etc.
>

Right, that's what I was getting at.  I have a solution and have sent it 
around, but there don't seem to be many people interested in commenting 
on it.  I figure one of two things will happen:

1) My solution will go in before LSF, in which case YAY my job is done 
and this is more of an [ATTEND] than a [TOPIC], or

2) My solution hasn't gone in yet and I'd like to discuss my methodology 
and how we can integrate it into xfstests, future features, other areas 
we could test etc.

Maybe not a full-blown slot, but something combined with an overall 
testing slot, or hell, just a quick lightning talk.  Thanks,

Josef



* Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing
  2014-12-10 15:09   ` Josef Bacik
@ 2015-01-05 18:34     ` Sage Weil
  2015-01-05 19:02       ` Brian Foster
                         ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Sage Weil @ 2015-01-05 18:34 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Jan Kara, lsf-pc, linux-fsdevel

On Wed, 10 Dec 2014, Josef Bacik wrote:
> On 12/10/2014 06:27 AM, Jan Kara wrote:
> > On Mon 08-12-14 17:11:41, Josef Bacik wrote:
> > > Hello,
> > > 
> > > We have been doing pretty well at populating xfstests with loads of
> > > tests to catch regressions and validate we're all working properly.
> > > One thing that has been lacking is a good way to verify file system
> > > integrity after a power fail.  This is a core part of what file
> > > systems are supposed to provide but it is probably the least tested
> > > aspect.  We have dm-flakey tests in xfstests to test fsync
> > > correctness, but these tests do not catch the random horrible things
> > > that can go wrong.  We are still finding horrible scary things that
> > > go wrong in Btrfs because it is simply hard to reproduce and test
> > > for.
> > > 
> > > I have been working on an idea to do this better, some may have seen
> > > my dm-power-fail attempt, and I've got a new incarnation of the idea
> > > thanks to discussions with Zach Brown.  Obviously there will be a
> > > lot changing in this area in the time between now and March but it
> > > would be good to have everybody in the room talking about what they
> > > would need to build a good and deterministic test to make sure we're
> > > always giving a consistent file system and to make sure our fsync()
> > > handling is working properly.  Thanks,
> >    I agree we are lacking in testing this aspect. Just I don't see too much
> > material for discussion there, unless we have something more tangible -
> > when we have some implementation, we can talk about pros and cons of it,
> > what still needs doing etc.
> > 
> 
> Right that's what I was getting at.  I have a solution and have sent it around
> but there doesn't seem to be too many people interested in commenting on it.
> I figure one of two things will happen
> 
> 1) My solution will go in before LSF, in which case YAY my job is done and
> this is more of an [ATTEND] than a [TOPIC], or
> 
> 2) My solution hasn't gone in yet and I'd like to discuss my methodology and
> how we can integrate it into xfstests, future features, other areas we could
> test etc.
> 
> Maybe not a full blown slot but combined with a overall testing slot or hell
> just a quick lightening talk.  Thanks,

I have a related topic that may make sense to fit into any discussion 
about this. Twice recently we've run into trouble using newish or less 
common (combinations of) syscalls.

The first instance was with the use of sync_file_range to try to 
control/limit the amount of dirty data in the page cache.  This, possibly 
in combination with posix_fadvise(DONTNEED), managed to break the 
writeback sequence in XFS and led to data corruption after power loss.
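
For concreteness, the pattern looks roughly like the sketch below.  This 
is only an illustration of the technique, not our actual code; the helper 
name is made up, and a separate fsync()/syncfs() is of course still 
needed for integrity.

#define _GNU_SOURCE             /* sync_file_range() */
#include <fcntl.h>
#include <unistd.h>

/* Bound the amount of dirty page cache for a region we just wrote,
 * then drop the (now clean) pages.  No integrity guarantee by itself. */
void writeback_and_drop(int fd, off_t off, off_t len)
{
        /* kick off writeback for the range without waiting for it */
        sync_file_range(fd, off, len, SYNC_FILE_RANGE_WRITE);

        /* later: wait for that writeback to complete */
        sync_file_range(fd, off, len,
                        SYNC_FILE_RANGE_WAIT_BEFORE |
                        SYNC_FILE_RANGE_WRITE |
                        SYNC_FILE_RANGE_WAIT_AFTER);

        /* and ask the kernel to drop the cached pages for the range */
        posix_fadvise(fd, off, len, POSIX_FADV_DONTNEED);
}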

The other issue we saw was just a general raft of FIEMAP bugs over the 
last year or two. We saw cases where even after fsync a fiemap result 
would not include all extents, and (not unexpectedly) lots of corner cases 
in several file systems, e.g., around partial blocks at end of file.  (As 
far as I know everything we saw is resolved in current kernels.)
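
The query in question is roughly the following FS_IOC_FIEMAP call.  Again, 
this is a minimal sketch rather than our actual code; a real caller would 
loop until it sees FIEMAP_EXTENT_LAST instead of capping the extent count.

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>           /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>       /* struct fiemap, struct fiemap_extent */

int main(int argc, char **argv)
{
        unsigned int i, n = 32;
        struct fiemap *fm;
        int fd;

        if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)
                return 1;
        fsync(fd);              /* what we (wrongly, it turned out) relied on */

        fm = calloc(1, sizeof(*fm) + n * sizeof(struct fiemap_extent));
        fm->fm_start = 0;
        fm->fm_length = ~0ULL;                  /* map the whole file */
        fm->fm_flags = FIEMAP_FLAG_SYNC;
        fm->fm_extent_count = n;

        if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
                perror("FIEMAP");
                return 1;
        }
        for (i = 0; i < fm->fm_mapped_extents; i++)
                printf("extent %u: logical %llu len %llu flags 0x%x\n", i,
                       (unsigned long long)fm->fm_extents[i].fe_logical,
                       (unsigned long long)fm->fm_extents[i].fe_length,
                       (unsigned int)fm->fm_extents[i].fe_flags);
        return 0;
}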

I'm not so concerned with these specific bugs, but worried that we 
(perhaps naively) expected them to be pretty safe.  Perhaps for FIEMAP 
this is a general case where a newish syscall/ioctl should be tested 
carefully with our workloads before being relied upon, and we could have 
worked to make sure e.g. xfstests has appropriate tests.  For power fail 
testing in particular, though, right now it isn't clear who is testing 
what under what workloads, so the only really "safe" approach is to stick 
to whatever syscall combinations we think the rest of the world is using, 
or make sure we test ourselves.

As things stand now the other devs are loath to touch any remotely exotic 
fs call, but that hardly seems ideal.  Hopefully a common framework for 
powerfail testing can improve on this.  Perhaps there are other ways we 
can make it easier to tell what is (well) tested, and conversely to ensure 
that those tests are well-aligned with what real users are doing...

sage


* Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing
  2015-01-05 18:34     ` Sage Weil
@ 2015-01-05 19:02       ` Brian Foster
  2015-01-05 19:13         ` Sage Weil
  2015-01-05 21:17       ` Jan Kara
  2015-01-05 21:47       ` Dave Chinner
  2 siblings, 1 reply; 18+ messages in thread
From: Brian Foster @ 2015-01-05 19:02 UTC (permalink / raw)
  To: Sage Weil; +Cc: Josef Bacik, Jan Kara, lsf-pc, linux-fsdevel

On Mon, Jan 05, 2015 at 10:34:57AM -0800, Sage Weil wrote:
> On Wed, 10 Dec 2014, Josef Bacik wrote:
> > On 12/10/2014 06:27 AM, Jan Kara wrote:
> > > On Mon 08-12-14 17:11:41, Josef Bacik wrote:
> > > > Hello,
> > > > 
> > > > We have been doing pretty well at populating xfstests with loads of
> > > > tests to catch regressions and validate we're all working properly.
> > > > One thing that has been lacking is a good way to verify file system
> > > > integrity after a power fail.  This is a core part of what file
> > > > systems are supposed to provide but it is probably the least tested
> > > > aspect.  We have dm-flakey tests in xfstests to test fsync
> > > > correctness, but these tests do not catch the random horrible things
> > > > that can go wrong.  We are still finding horrible scary things that
> > > > go wrong in Btrfs because it is simply hard to reproduce and test
> > > > for.
> > > > 
> > > > I have been working on an idea to do this better, some may have seen
> > > > my dm-power-fail attempt, and I've got a new incarnation of the idea
> > > > thanks to discussions with Zach Brown.  Obviously there will be a
> > > > lot changing in this area in the time between now and March but it
> > > > would be good to have everybody in the room talking about what they
> > > > would need to build a good and deterministic test to make sure we're
> > > > always giving a consistent file system and to make sure our fsync()
> > > > handling is working properly.  Thanks,
> > >    I agree we are lacking in testing this aspect. Just I don't see too much
> > > material for discussion there, unless we have something more tangible -
> > > when we have some implementation, we can talk about pros and cons of it,
> > > what still needs doing etc.
> > > 
> > 
> > Right that's what I was getting at.  I have a solution and have sent it around
> > but there doesn't seem to be too many people interested in commenting on it.
> > I figure one of two things will happen
> > 
> > 1) My solution will go in before LSF, in which case YAY my job is done and
> > this is more of an [ATTEND] than a [TOPIC], or
> > 
> > 2) My solution hasn't gone in yet and I'd like to discuss my methodology and
> > how we can integrate it into xfstests, future features, other areas we could
> > test etc.
> > 
> > Maybe not a full blown slot but combined with a overall testing slot or hell
> > just a quick lightening talk.  Thanks,
> 
> I have a related topic that may make sense to fit into any discussion 
> about this. Twice recently we've run into trouble using newish or less 
> common (combinations of) syscalls.
> 
> The first instance was with the use of sync_file_range to try to 
> control/limit the amount of dirty data in the page cache.  This, possibly 
> in combination with posix_fadvise(DONTNEED), managed to break the 
> writeback sequence in XFS and led to data corruption after power loss.
> 

Was there a report or any other details on this one? In particular, I'm
wondering if this is related to the problem exposed by xfstests test
xfs/053...

Brian

> The other issue we saw was just a general raft of FIEMAP bugs over the 
> last year or two. We saw cases where even after fsync a fiemap result 
> would not include all extents, and (not unexpectedly) lots of corner cases 
> in several file systems, e.g., around partial blocks at end of file.  (As 
> far as I know everything we saw is resolved in current kernels.)
> 
> I'm not so concerned with these specific bugs, but worried that we 
> (perhaps naively) expected them to be pretty safe.  Perhaps for FIEMAP 
> this is a general case where a newish syscall/ioctl should be tested 
> carefully with our workloads before being relied upon, and we could have 
> worked to make sure e.g. xfstests has appropriate tests.  For power fail 
> testing in particular, though, right now it isn't clear who is testing 
> what under what workloads, so the only really "safe" approach is to stick 
> to whatever syscall combinations we think the rest of the world is using, 
> or make sure we test ourselves.
> 
> As things stand now the other devs are loathe to touch any remotely exotic 
> fs call, but that hardly seems ideal.  Hopefully a common framework for 
> powerfail testing can improve on this.  Perhaps there are other ways we 
> make it easier to tell what is (well) tested, and conversely ensure that 
> those tests are well-aligned with what real users are doing...
> 
> sage


* Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing
  2015-01-05 19:02       ` Brian Foster
@ 2015-01-05 19:13         ` Sage Weil
  2015-01-05 19:33           ` Brian Foster
  0 siblings, 1 reply; 18+ messages in thread
From: Sage Weil @ 2015-01-05 19:13 UTC (permalink / raw)
  To: Brian Foster; +Cc: Josef Bacik, Jan Kara, lsf-pc, linux-fsdevel

On Mon, 5 Jan 2015, Brian Foster wrote:
> On Mon, Jan 05, 2015 at 10:34:57AM -0800, Sage Weil wrote:
> > On Wed, 10 Dec 2014, Josef Bacik wrote:
> > > On 12/10/2014 06:27 AM, Jan Kara wrote:
> > > > On Mon 08-12-14 17:11:41, Josef Bacik wrote:
> > > > > Hello,
> > > > > 
> > > > > We have been doing pretty well at populating xfstests with loads of
> > > > > tests to catch regressions and validate we're all working properly.
> > > > > One thing that has been lacking is a good way to verify file system
> > > > > integrity after a power fail.  This is a core part of what file
> > > > > systems are supposed to provide but it is probably the least tested
> > > > > aspect.  We have dm-flakey tests in xfstests to test fsync
> > > > > correctness, but these tests do not catch the random horrible things
> > > > > that can go wrong.  We are still finding horrible scary things that
> > > > > go wrong in Btrfs because it is simply hard to reproduce and test
> > > > > for.
> > > > > 
> > > > > I have been working on an idea to do this better, some may have seen
> > > > > my dm-power-fail attempt, and I've got a new incarnation of the idea
> > > > > thanks to discussions with Zach Brown.  Obviously there will be a
> > > > > lot changing in this area in the time between now and March but it
> > > > > would be good to have everybody in the room talking about what they
> > > > > would need to build a good and deterministic test to make sure we're
> > > > > always giving a consistent file system and to make sure our fsync()
> > > > > handling is working properly.  Thanks,
> > > >    I agree we are lacking in testing this aspect. Just I don't see too much
> > > > material for discussion there, unless we have something more tangible -
> > > > when we have some implementation, we can talk about pros and cons of it,
> > > > what still needs doing etc.
> > > > 
> > > 
> > > Right that's what I was getting at.  I have a solution and have sent it around
> > > but there doesn't seem to be too many people interested in commenting on it.
> > > I figure one of two things will happen
> > > 
> > > 1) My solution will go in before LSF, in which case YAY my job is done and
> > > this is more of an [ATTEND] than a [TOPIC], or
> > > 
> > > 2) My solution hasn't gone in yet and I'd like to discuss my methodology and
> > > how we can integrate it into xfstests, future features, other areas we could
> > > test etc.
> > > 
> > > Maybe not a full blown slot but combined with a overall testing slot or hell
> > > just a quick lightening talk.  Thanks,
> > 
> > I have a related topic that may make sense to fit into any discussion 
> > about this. Twice recently we've run into trouble using newish or less 
> > common (combinations of) syscalls.
> > 
> > The first instance was with the use of sync_file_range to try to 
> > control/limit the amount of dirty data in the page cache.  This, possibly 
> > in combination with posix_fadvise(DONTNEED), managed to break the 
> > writeback sequence in XFS and led to data corruption after power loss.
> > 
> 
> Was there a report or any other details on this one? In particular, I'm
> wondering if this is related to the problem exposed by xfstests test
> xfs/053...

This is the original thread:

	http://oss.sgi.com/archives/xfs/2013-06/msg00066.html

Looks like 053 is about ACLs though?

sage


* Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing
  2015-01-05 19:13         ` Sage Weil
@ 2015-01-05 19:33           ` Brian Foster
  0 siblings, 0 replies; 18+ messages in thread
From: Brian Foster @ 2015-01-05 19:33 UTC (permalink / raw)
  To: Sage Weil; +Cc: Josef Bacik, Jan Kara, lsf-pc, linux-fsdevel

On Mon, Jan 05, 2015 at 11:13:28AM -0800, Sage Weil wrote:
> On Mon, 5 Jan 2015, Brian Foster wrote:
> > On Mon, Jan 05, 2015 at 10:34:57AM -0800, Sage Weil wrote:
> > > On Wed, 10 Dec 2014, Josef Bacik wrote:
> > > > On 12/10/2014 06:27 AM, Jan Kara wrote:
> > > > > On Mon 08-12-14 17:11:41, Josef Bacik wrote:
> > > > > > Hello,
> > > > > > 
> > > > > > We have been doing pretty well at populating xfstests with loads of
> > > > > > tests to catch regressions and validate we're all working properly.
> > > > > > One thing that has been lacking is a good way to verify file system
> > > > > > integrity after a power fail.  This is a core part of what file
> > > > > > systems are supposed to provide but it is probably the least tested
> > > > > > aspect.  We have dm-flakey tests in xfstests to test fsync
> > > > > > correctness, but these tests do not catch the random horrible things
> > > > > > that can go wrong.  We are still finding horrible scary things that
> > > > > > go wrong in Btrfs because it is simply hard to reproduce and test
> > > > > > for.
> > > > > > 
> > > > > > I have been working on an idea to do this better, some may have seen
> > > > > > my dm-power-fail attempt, and I've got a new incarnation of the idea
> > > > > > thanks to discussions with Zach Brown.  Obviously there will be a
> > > > > > lot changing in this area in the time between now and March but it
> > > > > > would be good to have everybody in the room talking about what they
> > > > > > would need to build a good and deterministic test to make sure we're
> > > > > > always giving a consistent file system and to make sure our fsync()
> > > > > > handling is working properly.  Thanks,
> > > > >    I agree we are lacking in testing this aspect. Just I don't see too much
> > > > > material for discussion there, unless we have something more tangible -
> > > > > when we have some implementation, we can talk about pros and cons of it,
> > > > > what still needs doing etc.
> > > > > 
> > > > 
> > > > Right that's what I was getting at.  I have a solution and have sent it around
> > > > but there doesn't seem to be too many people interested in commenting on it.
> > > > I figure one of two things will happen
> > > > 
> > > > 1) My solution will go in before LSF, in which case YAY my job is done and
> > > > this is more of an [ATTEND] than a [TOPIC], or
> > > > 
> > > > 2) My solution hasn't gone in yet and I'd like to discuss my methodology and
> > > > how we can integrate it into xfstests, future features, other areas we could
> > > > test etc.
> > > > 
> > > > Maybe not a full blown slot but combined with a overall testing slot or hell
> > > > just a quick lightening talk.  Thanks,
> > > 
> > > I have a related topic that may make sense to fit into any discussion 
> > > about this. Twice recently we've run into trouble using newish or less 
> > > common (combinations of) syscalls.
> > > 
> > > The first instance was with the use of sync_file_range to try to 
> > > control/limit the amount of dirty data in the page cache.  This, possibly 
> > > in combination with posix_fadvise(DONTNEED), managed to break the 
> > > writeback sequence in XFS and led to data corruption after power loss.
> > > 
> > 
> > Was there a report or any other details on this one? In particular, I'm
> > wondering if this is related to the problem exposed by xfstests test
> > xfs/053...
> 
> This is the original thread:
> 
> 	http://oss.sgi.com/archives/xfs/2013-06/msg00066.html
> 

Thanks. It does look similar to xfs/053, the intent of which was to
indirectly create the kind of writeback pattern that exposes this.

> Looks like 053 is about ACLs though?
> 

generic/053 does something with ACLs; xfs/053 is the test of interest.
Regardless, from the thread above it sounds like Dave had homed in on
the cause.

Brian

> sage


* Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing
  2015-01-05 18:34     ` Sage Weil
  2015-01-05 19:02       ` Brian Foster
@ 2015-01-05 21:17       ` Jan Kara
  2015-01-05 21:47       ` Dave Chinner
  2 siblings, 0 replies; 18+ messages in thread
From: Jan Kara @ 2015-01-05 21:17 UTC (permalink / raw)
  To: Sage Weil; +Cc: Josef Bacik, Jan Kara, lsf-pc, linux-fsdevel

On Mon 05-01-15 10:34:57, Sage Weil wrote:
> On Wed, 10 Dec 2014, Josef Bacik wrote:
> > On 12/10/2014 06:27 AM, Jan Kara wrote:
> > > On Mon 08-12-14 17:11:41, Josef Bacik wrote:
> > > > Hello,
> > > > 
> > > > We have been doing pretty well at populating xfstests with loads of
> > > > tests to catch regressions and validate we're all working properly.
> > > > One thing that has been lacking is a good way to verify file system
> > > > integrity after a power fail.  This is a core part of what file
> > > > systems are supposed to provide but it is probably the least tested
> > > > aspect.  We have dm-flakey tests in xfstests to test fsync
> > > > correctness, but these tests do not catch the random horrible things
> > > > that can go wrong.  We are still finding horrible scary things that
> > > > go wrong in Btrfs because it is simply hard to reproduce and test
> > > > for.
> > > > 
> > > > I have been working on an idea to do this better, some may have seen
> > > > my dm-power-fail attempt, and I've got a new incarnation of the idea
> > > > thanks to discussions with Zach Brown.  Obviously there will be a
> > > > lot changing in this area in the time between now and March but it
> > > > would be good to have everybody in the room talking about what they
> > > > would need to build a good and deterministic test to make sure we're
> > > > always giving a consistent file system and to make sure our fsync()
> > > > handling is working properly.  Thanks,
> > >    I agree we are lacking in testing this aspect. Just I don't see too much
> > > material for discussion there, unless we have something more tangible -
> > > when we have some implementation, we can talk about pros and cons of it,
> > > what still needs doing etc.
> > > 
> > 
> > Right that's what I was getting at.  I have a solution and have sent it around
> > but there doesn't seem to be too many people interested in commenting on it.
> > I figure one of two things will happen
> > 
> > 1) My solution will go in before LSF, in which case YAY my job is done and
> > this is more of an [ATTEND] than a [TOPIC], or
> > 
> > 2) My solution hasn't gone in yet and I'd like to discuss my methodology and
> > how we can integrate it into xfstests, future features, other areas we could
> > test etc.
> > 
> > Maybe not a full blown slot but combined with a overall testing slot or hell
> > just a quick lightening talk.  Thanks,
> 
> I have a related topic that may make sense to fit into any discussion 
> about this. Twice recently we've run into trouble using newish or less 
> common (combinations of) syscalls.
> 
> The first instance was with the use of sync_file_range to try to 
> control/limit the amount of dirty data in the page cache.  This, possibly 
> in combination with posix_fadvise(DONTNEED), managed to break the 
> writeback sequence in XFS and led to data corruption after power loss.
> 
> The other issue we saw was just a general raft of FIEMAP bugs over the 
> last year or two. We saw cases where even after fsync a fiemap result 
> would not include all extents, and (not unexpectedly) lots of corner cases 
> in several file systems, e.g., around partial blocks at end of file.  (As 
> far as I know everything we saw is resolved in current kernels.)
> 
> I'm not so concerned with these specific bugs, but worried that we 
> (perhaps naively) expected them to be pretty safe.  Perhaps for FIEMAP 
> this is a general case where a newish syscall/ioctl should be tested 
> carefully with our workloads before being relied upon, and we could have 
> worked to make sure e.g. xfstests has appropriate tests.  For power fail 
> testing in particular, though, right now it isn't clear who is testing 
> what under what workloads, so the only really "safe" approach is to stick 
> to whatever syscall combinations we think the rest of the world is using, 
> or make sure we test ourselves.
  So I think we are getting better at providing test cases for new APIs
than we used to be.  I also think fs maintainers are aware of the need to
create xfstests tests whenever a new API is introduced.  So I don't think
we can do much more than write more tests :)

As Josef and you correctly wrote, powerfail testing is one area where we
are rather poor.  Another area which comes to my mind is testing under
memory pressure (which is doable using the error injection framework; I
just don't think anybody has put the necessary effort into actually
running it).
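
For reference, the error injection framework here is the kernel's
fault-injection support (CONFIG_FAULT_INJECTION / CONFIG_FAILSLAB); a
minimal sketch of flipping the failslab knobs through debugfs, with the
paths as documented in Documentation/fault-injection/fault-injection.txt
and debugfs assumed to be mounted at /sys/kernel/debug:

#include <stdio.h>

/* write one fault-injection knob under debugfs */
static int set_knob(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");

        if (!f) {
                perror(path);
                return -1;
        }
        fprintf(f, "%s\n", val);
        return fclose(f);
}

int main(void)
{
        /* fail ~10% of slab allocations, with no limit on the count,
         * and log each injected failure to dmesg */
        set_knob("/sys/kernel/debug/failslab/probability", "10");
        set_knob("/sys/kernel/debug/failslab/times", "-1");
        set_knob("/sys/kernel/debug/failslab/verbose", "1");
        return 0;
}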

So we can probably talk about the areas that need improving and what
needs doing there, but we also need people to actually do the work...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing
  2015-01-05 18:34     ` Sage Weil
  2015-01-05 19:02       ` Brian Foster
  2015-01-05 21:17       ` Jan Kara
@ 2015-01-05 21:47       ` Dave Chinner
  2015-01-05 22:26         ` Sage Weil
  2015-01-06  8:53         ` Jan Kara
  2 siblings, 2 replies; 18+ messages in thread
From: Dave Chinner @ 2015-01-05 21:47 UTC (permalink / raw)
  To: Sage Weil; +Cc: Josef Bacik, Jan Kara, lsf-pc, linux-fsdevel

On Mon, Jan 05, 2015 at 10:34:57AM -0800, Sage Weil wrote:
> On Wed, 10 Dec 2014, Josef Bacik wrote:
> > On 12/10/2014 06:27 AM, Jan Kara wrote:
> The first instance was with the use of sync_file_range to try to 
> control/limit the amount of dirty data in the page cache.  This, possibly 
> in combination with posix_fadvise(DONTNEED), managed to break the 
> writeback sequence in XFS and led to data corruption after power loss.

Corruption after power loss is not brilliant behaviour from XFS
here, but I'll point out for the wider audience that
sync_file_range() provides absolutely no data integrity guarantees
for power loss situations. It's really, really badly named because
it doesn't give the same guarantees as other "sync" functions
filesystems provide. IOWs, if you value your data, the only
interface you can rely on for data integrity is fsync/fdatasync...

> The other issue we saw was just a general raft of FIEMAP bugs over the 
> last year or two. We saw cases where even after fsync a fiemap result 
> would not include all extents, and (not unexpectedly) lots of corner cases 
> in several file systems, e.g., around partial blocks at end of file.  (As 
> far as I know everything we saw is resolved in current kernels.)

Again, this is probably more a misunderstanding of FIEMAP than
anything. FIEMAP is *advisory* and gives no output accuracy
guarantees as userspace cannot prevent the extent maps from changing
at any time. As an example, see the aborted attempt by the 'cp'
utility to use FIEMAP to detect holes when copying sparse files....

> I'm not so concerned with these specific bugs, but worried that we 
> (perhaps naively) expected them to be pretty safe.  Perhaps for FIEMAP 
> this is a general case where a newish syscall/ioctl should be tested 
> carefully with our workloads before being relied upon, and we could have 
> worked to make sure e.g. xfstests has appropriate tests. 

Oh, it does - that's why it mostly works now across all filesystems
that are regularly tested with xfstests.

> For power fail 
> testing in particular, though, right now it isn't clear who is testing 
> what under what workloads, so the only really "safe" approach is to stick 
> to whatever syscall combinations we think the rest of the world is using, 
> or make sure we test ourselves.

Write tests for the regression test suite that filesystem developers
run all the time. ;)

> As things stand now the other devs are loathe to touch any remotely exotic 
> fs call, but that hardly seems ideal.  Hopefully a common framework for 
> powerfail testing can improve on this.  Perhaps there are other ways we 
> make it easier to tell what is (well) tested, and conversely ensure that 
> those tests are well-aligned with what real users are doing...

We don't actually need power failure (or even device failure)
infrastructure to test data integrity on failure. Filesystems just
need a shutdown method that stops any IO from being issued once the
shutdown flag is set. XFS has this and it's used by xfstests via the
"godown" utility to shut the filesystem down in various
circumstances. We've been using this for data integrity and log
recovery testing in xfstests for many years.
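
For anyone who hasn't used it, godown boils down to a single ioctl on
any open file in the target filesystem. Roughly (a sketch, assuming the
xfsprogs development headers for XFS_IOC_GOINGDOWN and the
XFS_FSOP_GOING_FLAGS_* values):

#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <xfs/xfs.h>    /* XFS_IOC_GOINGDOWN, XFS_FSOP_GOING_FLAGS_* */

int main(int argc, char **argv)
{
        /* argv[1]: any file or directory on the XFS filesystem to shut down */
        int fd = argc > 1 ? open(argv[1], O_RDONLY) : -1;
        /* NOLOGFLUSH: don't flush the log or dirty data, i.e. the harshest
         * variant; see xfs_fs.h for the other going-down flags */
        uint32_t flags = XFS_FSOP_GOING_FLAGS_NOLOGFLUSH;

        if (fd < 0 || ioctl(fd, XFS_IOC_GOINGDOWN, &flags) < 0) {
                perror("XFS_IOC_GOINGDOWN");
                return 1;
        }
        close(fd);
        return 0;
}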

Hence we know that if the device behaves correctly w.r.t. cache flushes
and FUA, then the filesystem will behave correctly on power loss. We
don't need a device power fail simulator to tell us that violating
fundamental architectural assumptions will corrupt filesystems....

Unfortunately, nobody else seems to want to implement shutdown
traps, even though it massively improves reliability as it
results in extensive error path testing that can't otherwise be
easily exercised...

So, from an xfstests perspective, I'd much prefer to see
XFS_IOC_GOINGDOWN implemented by other filesystems and have the
tests that use it made generic first. Filesystems need to handle
themselves correctly in simple error conditions before we even start
to consider creating esoteric failure conditions...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing
  2015-01-05 21:47       ` Dave Chinner
@ 2015-01-05 22:26         ` Sage Weil
  2015-01-05 23:27           ` Dave Chinner
  2015-01-06  8:53         ` Jan Kara
  1 sibling, 1 reply; 18+ messages in thread
From: Sage Weil @ 2015-01-05 22:26 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Josef Bacik, Jan Kara, lsf-pc, linux-fsdevel

On Tue, 6 Jan 2015, Dave Chinner wrote:
> On Mon, Jan 05, 2015 at 10:34:57AM -0800, Sage Weil wrote:
> > On Wed, 10 Dec 2014, Josef Bacik wrote:
> > > On 12/10/2014 06:27 AM, Jan Kara wrote:
> > The first instance was with the use of sync_file_range to try to 
> > control/limit the amount of dirty data in the page cache.  This, possibly 
> > in combination with posix_fadvise(DONTNEED), managed to break the 
> > writeback sequence in XFS and led to data corruption after power loss.
> 
> Corruption after power loss is not brilliant behaviour from XFS
> here, but I'll point out for the wider audience that
> sync_file_range() provides absolutely no data integrity guarantees
> for power loss situations. It's really, really badly named because
> it doesn't give the same guarantees as other "sync" functions
> filesystems provide. IOWs, if you value your data, the only
> interface you can rely for data integrity is fsync/fdatasync...

Agreed.  In our case, we used syncfs(2) for data integrity.  
sync_file_range(2) was used only to limit dirty data in the page cache.

> > The other issue we saw was just a general raft of FIEMAP bugs over the 
> > last year or two. We saw cases where even after fsync a fiemap result 
> > would not include all extents, and (not unexpectedly) lots of corner cases 
> > in several file systems, e.g., around partial blocks at end of file.  (As 
> > far as I know everything we saw is resolved in current kernels.)
> 
> Again, this is probably more a misunderstanding of FIEMAP than
> anything. FIEMAP is *advisory* and gives no output accuracy
> guarantees as userspace cannot prevent the extent maps from changing
> at any time. As an example, see the aborted attempt by the 'cp'
> utility to use FIEMAP to detect holes when copying sparse files....

Where did the cp vs FIEMAP discussion play out?  I missed that one.

We only use fiemap to determine which file regions are holes, only after 
fsync, and only when there are no other processes or threads accessing the 
same file (and only when explicitly enabled by the admin since many users 
still have buggy implementations deployed).  Under those circumstances I 
thought it should be reliable...

In retrospect the SEEK_HOLE/SEEK_DATA interface is simpler and better 
suited, but I'm hesitant to fall into the same trap.

> > I'm not so concerned with these specific bugs, but worried that we 
> > (perhaps naively) expected them to be pretty safe.  Perhaps for FIEMAP 
> > this is a general case where a newish syscall/ioctl should be tested 
> > carefully with our workloads before being relied upon, and we could have 
> > worked to make sure e.g. xfstests has appropriate tests. 
> 
> Oh, it does - that's why it mostly works now across all filesystems
> that are regularly tested with xfstests.
> 
> > For power fail 
> > testing in particular, though, right now it isn't clear who is testing 
> > what under what workloads, so the only really "safe" approach is to stick 
> > to whatever syscall combinations we think the rest of the world is using, 
> > or make sure we test ourselves.
> 
> Write tests for the regression test suite that filesystem developers
> run all the time. ;)

Yes (and I assume that you specifically mean xfstests here).

I hope we can get some consensus on what that testing approach will be for 
power failure.  I don't much care whether it's an ioctl each fs implements 
or a dm layer that does about the same thing; I see advantages to both 
approaches.  As long as there is some convergence...

sage


* Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing
  2015-01-05 22:26         ` Sage Weil
@ 2015-01-05 23:27           ` Dave Chinner
  2015-01-06 17:37             ` Sage Weil
  0 siblings, 1 reply; 18+ messages in thread
From: Dave Chinner @ 2015-01-05 23:27 UTC (permalink / raw)
  To: Sage Weil; +Cc: Josef Bacik, Jan Kara, lsf-pc, linux-fsdevel

On Mon, Jan 05, 2015 at 02:26:30PM -0800, Sage Weil wrote:
> On Tue, 6 Jan 2015, Dave Chinner wrote:
> > Again, this is probably more a misunderstanding of FIEMAP than
> > anything. FIEMAP is *advisory* and gives no output accuracy
> > guarantees as userspace cannot prevent the extent maps from changing
> > at any time. As an example, see the aborted attempt by the 'cp'
> > utility to use FIEMAP to detect holes when copying sparse files....
> 
> Where did the cp vs FIEMAP discussion play out?  I missed that one.

Oh, there were several issues - different filesystems exposed
different issues, but the main one is that extent maps don't reflect
newly written cached data that does not have extents allocated for
it, hence the need for SEEK_DATA/SEEK_HOLE for optimal sparse file
traversal:

http://lwn.net/Articles/429345/
http://lwn.net/Articles/440255/

Not to mention the race conditions between extent walking and background
writeback that started to be noticed:

http://lists.openwall.net/linux-ext4/2012/11/13/8

But then there were corruption bugs in the cp FIEMAP code as
well:

http://gnu-coreutils.7620.n7.nabble.com/bug-12656-cp-since-8-11-corrupts-files-td20710.html

> We only use fiemap to determine which file regions are holes, only after 
> fsync, and only when there are no other processes or threads accessing the 
> same file (and only when explicitly enabled by the admin since many users 
> still have buggy implementations deployed).  Under those circumstances I 
> thought it should be reliable...

And when the filesystem does background defragmentation or block
trimming or some other re-organisation of recently accessed files?

> In retrospect the SEEK_HOLE/SEEK_DATA interface is simpler and better 
> suited, but I'm hesitant to fall into the same trap.

SEEK_HOLE/DATA is independent of the underlying file layout, hence
its behaviour is not affected by the filesystem changing the extent
layout of the file in a manner that userspace is not aware of and
cannot control.
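
A sparse-file walk with lseek(2) then looks roughly like this (a sketch
only, not what Ceph would ship; SEEK_DATA/SEEK_HOLE need _GNU_SOURCE
with glibc):

#define _GNU_SOURCE             /* SEEK_DATA / SEEK_HOLE */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(int argc, char **argv)
{
        int fd;
        off_t end, data = 0, hole;

        if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)
                return 1;
        end = lseek(fd, 0, SEEK_END);

        /* walk the allocated (data) regions of a sparse file */
        while ((data = lseek(fd, data, SEEK_DATA)) >= 0) {
                /* every data region is followed by a hole or by EOF */
                hole = lseek(fd, data, SEEK_HOLE);
                printf("data: [%lld, %lld)\n",
                       (long long)data, (long long)hole);
                if (hole >= end)
                        break;
                data = hole;
        }
        close(fd);
        return 0;
}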

> > Write tests for the regression test suite that filesystem developers
> > run all the time. ;)
> 
> Yes (and I assume that you specifically mean xfstests here).

*nod*

> I hope we can get some consensus on what that testing approach
> will be for power failure.  I don't much care whether it's an
> ioctl each fs implements or a dm layer that does about the same
> thing; I see advantages to both approaches.  As long as there is
> some convergence...

Yes, I see advantages to both, too, but there's no point creating
esoteric device error conditions if the filesystem can't correctly
handle and recover from simple shutdown situations....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing
  2015-01-05 21:47       ` Dave Chinner
  2015-01-05 22:26         ` Sage Weil
@ 2015-01-06  8:53         ` Jan Kara
  2015-01-06 16:39           ` Josef Bacik
  2015-01-06 22:07           ` Dave Chinner
  1 sibling, 2 replies; 18+ messages in thread
From: Jan Kara @ 2015-01-06  8:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Sage Weil, Josef Bacik, Jan Kara, lsf-pc, linux-fsdevel

On Tue 06-01-15 08:47:55, Dave Chinner wrote:
> > As things stand now the other devs are loathe to touch any remotely exotic 
> > fs call, but that hardly seems ideal.  Hopefully a common framework for 
> > powerfail testing can improve on this.  Perhaps there are other ways we 
> > make it easier to tell what is (well) tested, and conversely ensure that 
> > those tests are well-aligned with what real users are doing...
> 
> We don't actually need power failure (or even device failure)
> infrastructure to test data integrity on failure. Filesystems just
> need a shutdown method that stops any IO from being issued once the
> shutdown flag is set. XFS has this and it's used by xfstests via the
> "godown" utility to shut the fileystem down in various
> circumstances. We've been using this for data integrity and log
> recovery testing in xfstests for many years.
> 
> Hence we know if the device behaves correctly w.r.t cache flushes
> and FUA then the filesystem will behave correctly on power loss. We
> don't need a device power fail simulator to tell us violating
> fundamental architectural assumptions will corrupt filesystems....
  I think that an fs ioctl cannot easily simulate the situation where
on-device volatile caches aren't properly flushed in all the necessary
cases (we had bugs like this in ext3/4 in the past which were hit by real
users).

I also think that simulating the device failure in a different layer is
simpler than checking for a superblock flag in all the places where the
filesystem submits IO (e.g. ext4 doesn't have a dedicated buffer layer like
XFS has, and we rely on the flusher thread to flush committed metadata to
its final location on disk, so the writeback path completely avoids ext4
code - it's a generic writeback of the block device mapping). So I like the
solution with the dm target more than an fs ioctl, although I agree that
it's more clumsy from the xfstests perspective.
 
								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing
  2015-01-06  8:53         ` Jan Kara
@ 2015-01-06 16:39           ` Josef Bacik
  2015-01-06 22:07           ` Dave Chinner
  1 sibling, 0 replies; 18+ messages in thread
From: Josef Bacik @ 2015-01-06 16:39 UTC (permalink / raw)
  To: Jan Kara, Dave Chinner; +Cc: Sage Weil, lsf-pc, linux-fsdevel

On 01/06/2015 03:53 AM, Jan Kara wrote:
> On Tue 06-01-15 08:47:55, Dave Chinner wrote:
>>> As things stand now the other devs are loathe to touch any remotely exotic
>>> fs call, but that hardly seems ideal.  Hopefully a common framework for
>>> powerfail testing can improve on this.  Perhaps there are other ways we
>>> make it easier to tell what is (well) tested, and conversely ensure that
>>> those tests are well-aligned with what real users are doing...
>>
>> We don't actually need power failure (or even device failure)
>> infrastructure to test data integrity on failure. Filesystems just
>> need a shutdown method that stops any IO from being issued once the
>> shutdown flag is set. XFS has this and it's used by xfstests via the
>> "godown" utility to shut the fileystem down in various
>> circumstances. We've been using this for data integrity and log
>> recovery testing in xfstests for many years.
>>
>> Hence we know if the device behaves correctly w.r.t cache flushes
>> and FUA then the filesystem will behave correctly on power loss. We
>> don't need a device power fail simulator to tell us violating
>> fundamental architectural assumptions will corrupt filesystems....
>    I think that fs ioctl cannot easily simulate the situation where
> on-device volatile caches aren't properly flushed in all the necessary
> cases (we had a bugs like this in ext3/4 in the past which were hit by real
> users).
>

Agreed, my dm thing was meant to expose problems where we do not wait on 
IO properly before writing our super, a problem we've had at least twice 
so far.  I wanted something nice and simple that would quickly expose 
these kinds of bugs.

> I also think that simulating the device failure in a different layer is
> simpler than checking for superblock flag in all the places where the
> filesystem submits IO (e.g. ext4 doesn't have dedicated buffer layer like
> xfs has and we rely on flusher thread to flush committed metadata to final
> location on disk so that writeback path completely avoids ext4 code - it's
> a generic writeback of the block device mapping). So I like the solution
> with the dm target more than a fs ioctl although I agree that it's more
> clumsy from the xfstests perspective.
>

So I'm working on support for xfstests' fsx to emit the proper dm messages 
when it does an fsync, so we can easily build a test to stress-test fsync 
in all the horrible ways that fsx works.  Building tests around the dm 
target I've written is pretty simple; you just do something like

create device
mkfs device
mark the mkfs in the log
mount device
do your operations
unmount
replay log in whichever way you want and verify the contents

The replay thing is accomplished by the library and some helper 
functions in xfstests, so it's no more awkward than what we do with 
dm-flakey, and it gives us a bit more reproducibility and lets us check 
more esoteric failure conditions.

Like Jan says, we all do things differently; we are all our own little 
snowflakes.  I feel like a dm target is a nice solution where we can 
impose a certain set of rules in very little code and all agree that 
it's correct, and then build tests around that.  Then our current fs'es 
will be well tested and any new fs'es will be equally well tested, all 
without having to add fs-specific code that could be buggy.  Thanks,

Josef


* Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing
  2015-01-05 23:27           ` Dave Chinner
@ 2015-01-06 17:37             ` Sage Weil
  0 siblings, 0 replies; 18+ messages in thread
From: Sage Weil @ 2015-01-06 17:37 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Josef Bacik, Jan Kara, lsf-pc, linux-fsdevel

On Tue, 6 Jan 2015, Dave Chinner wrote:
> On Mon, Jan 05, 2015 at 02:26:30PM -0800, Sage Weil wrote:
> > On Tue, 6 Jan 2015, Dave Chinner wrote:
> > > Again, this is probably more a misunderstanding of FIEMAP than
> > > anything. FIEMAP is *advisory* and gives no output accuracy
> > > guarantees as userspace cannot prevent the extent maps from changing
> > > at any time. As an example, see the aborted attempt by the 'cp'
> > > utility to use FIEMAP to detect holes when copying sparse files....
> > 
> > Where did the cp vs FIEMAP discussion play out?  I missed that one.
> 
> Oh, there were several issues - different filesystems exposed
> different issues, but the main one is that extent maps don't reflect
> newly written cached data that do not have extents allocated for
> them, hence the nedd for SEEK_DATA/SEEK_HOLE for optimal sparse file
> traversal:
> 
> http://lwn.net/Articles/429345/
> http://lwn.net/Articles/440255/
> 
> Not to mention race conditions between extent walking and background
> writeback started to noticed:
> 
> http://lists.openwall.net/linux-ext4/2012/11/13/8
> 
> But then there were also corruption bugs in the cp FIEMAP code as
> well:
> 
> http://gnu-coreutils.7620.n7.nabble.com/bug-12656-cp-since-8-11-corrupts-files-td20710.html

Sigh, I didn't look far enough back it seems.

> > We only use fiemap to determine which file regions are holes, only after 
> > fsync, and only when there are no other processes or threads accessing the 
> > same file (and only when explicitly enabled by the admin since many users 
> > still have buggy implementations deployed).  Under those circumstances I 
> > thought it should be reliable...
> 
> And when the filesystem does background defragmentation or block
> trimming or some other re-organisation of recently accessed files?

I wouldn't expect any of those things to change whether the file system 
reports a file extent as allocated or a hole, but now that you mention it, 
and given what we've seen so far, that's probably not the safest bet to 
make.  In any case, SEEK_DATA/HOLE is clearly a more appropriate interface 
and appears to be well supported.  We'll switch to that and probably leave 
it off by default again until we've confirmed there are tests in xfstests 
that match what Ceph is doing.  Thanks, Dave!

In any case, to the original point about converging on power fail testing 
approaches, I'd say it's worth a time slot at LSF.  :)

sage


* Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing
  2015-01-06  8:53         ` Jan Kara
  2015-01-06 16:39           ` Josef Bacik
@ 2015-01-06 22:07           ` Dave Chinner
  2015-01-07 10:10             ` Jan Kara
  1 sibling, 1 reply; 18+ messages in thread
From: Dave Chinner @ 2015-01-06 22:07 UTC (permalink / raw)
  To: Jan Kara; +Cc: Sage Weil, Josef Bacik, lsf-pc, linux-fsdevel

On Tue, Jan 06, 2015 at 09:53:47AM +0100, Jan Kara wrote:
> On Tue 06-01-15 08:47:55, Dave Chinner wrote:
> > > As things stand now the other devs are loathe to touch any remotely exotic 
> > > fs call, but that hardly seems ideal.  Hopefully a common framework for 
> > > powerfail testing can improve on this.  Perhaps there are other ways we 
> > > make it easier to tell what is (well) tested, and conversely ensure that 
> > > those tests are well-aligned with what real users are doing...
> > 
> > We don't actually need power failure (or even device failure)
> > infrastructure to test data integrity on failure. Filesystems just
> > need a shutdown method that stops any IO from being issued once the
> > shutdown flag is set. XFS has this and it's used by xfstests via the
> > "godown" utility to shut the fileystem down in various
> > circumstances. We've been using this for data integrity and log
> > recovery testing in xfstests for many years.
> > 
> > Hence we know if the device behaves correctly w.r.t cache flushes
> > and FUA then the filesystem will behave correctly on power loss. We
> > don't need a device power fail simulator to tell us violating
> > fundamental architectural assumptions will corrupt filesystems....
>   I think that fs ioctl cannot easily simulate the situation where
> on-device volatile caches aren't properly flushed in all the necessary
> cases (we had a bugs like this in ext3/4 in the past which were hit by real
> users).

Sure, I'm not arguing that it does. I'm suggesting that it's the
wrong place to be focussing effort on initially as it assumes the
filesystem behaves correctly on simple device failures.  i.e. if
filesystems fail to do the right thing on a block device that isn't
lossy, then we've got big problems to solve before we even consider
random "volatile cache blocks went missing" corruption and recovery
issues.

i.e. what we need to focus on first is "failure paths are exercised
and work reliably". When we have decent coverage of that for most
filesystems (and we sure as hell don't for btrfs and ext4), then we
can focus on "in this corner case of broken/lying hardware..."

> I also think that simulating the device failure in a different layer is
> simpler than checking for superblock flag in all the places where the
> filesystem submits IO (e.g. ext4 doesn't have dedicated buffer layer like
> xfs has and we rely on flusher thread to flush committed metadata to final

flusher threads call back into the filesystems to write both data
and metadata, so I don't think that's an issue. And there are
relatively few places you'd need to add flag support to (i.e.
wrappers around submit_bh and submit_bio in the relevant layers)
and that would trap all IO.

Don't get fooled by the fact that XFS has lots of shutdown traps;
there really are only three shutdown traps that prevent IO - one in
xfs_buf_submit() for metadata IO, one in xfs_map_blocks() during
->writepage for data IO, and one in xlog_bdstrat() for log IO.

All the other shutdown traps are for aborting operations that may
not reach the IO layer (as many operations will hit cached objects)
or will fail later when the inevitable IO is done (e.g. on
transaction commit). Hence shutdown traps get us fast, reliable
responses to userspace when fatal corruption errors occur, and in
doing so they also provide hooks for testing error paths in ways
that otherwise are very difficult to exercise.

This is my point - shutdown traps are far more useful for *verifying
correct filesystem behaviour in error situations* than something
that just returns errors or corrupts blocks at the IO layer. If we
really want to test behaviour with corrupt random disk blocks,
fsfuzzer already exists ;)

> location on disk so that writeback path completely avoids ext4 code - it's
> a generic writeback of the block device mapping).  So I like the solution
> with the dm target more than a fs ioctl although I agree that it's more
> clumsy from the xfstests perspective.

Wrong perspective. I'm looking at this from a filesystem layer
validation perspective, not an xfstests perspective.  The fs ioctl is
far more useful for exercising and validating filesystem behaviour
in error conditions than a dm device that targets a rare device
failure issue.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing
  2015-01-06 22:07           ` Dave Chinner
@ 2015-01-07 10:10             ` Jan Kara
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Kara @ 2015-01-07 10:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Jan Kara, Sage Weil, Josef Bacik, lsf-pc, linux-fsdevel

On Wed 07-01-15 09:07:06, Dave Chinner wrote:
> On Tue, Jan 06, 2015 at 09:53:47AM +0100, Jan Kara wrote:
> > On Tue 06-01-15 08:47:55, Dave Chinner wrote:
> > > > As things stand now the other devs are loathe to touch any remotely exotic 
> > > > fs call, but that hardly seems ideal.  Hopefully a common framework for 
> > > > powerfail testing can improve on this.  Perhaps there are other ways we 
> > > > make it easier to tell what is (well) tested, and conversely ensure that 
> > > > those tests are well-aligned with what real users are doing...
> > > 
> > > We don't actually need power failure (or even device failure)
> > > infrastructure to test data integrity on failure. Filesystems just
> > > need a shutdown method that stops any IO from being issued once the
> > > shutdown flag is set. XFS has this and it's used by xfstests via the
> > > "godown" utility to shut the fileystem down in various
> > > circumstances. We've been using this for data integrity and log
> > > recovery testing in xfstests for many years.
> > > 
> > > Hence we know if the device behaves correctly w.r.t cache flushes
> > > and FUA then the filesystem will behave correctly on power loss. We
> > > don't need a device power fail simulator to tell us violating
> > > fundamental architectural assumptions will corrupt filesystems....
> >   I think that fs ioctl cannot easily simulate the situation where
> > on-device volatile caches aren't properly flushed in all the necessary
> > cases (we had a bugs like this in ext3/4 in the past which were hit by real
> > users).
> 
> Sure, I'm not arguing that it does. I'm suggesting that it's the
> wrong place to be focussing effort on initially as it assumes the
> filesystem behaves correctly on simple device failures.  i.e. if
> filesystems fail to do the right thing on a block device that isn't
> lossy, then we've got big problems to solve before we even consider
> random "volatile cache blocks went missing" corruption and recovery
> issues.
> 
> i.e. what we need to focus on first is "failure paths are exercised
> and work reliably". When we have decent coverage of that for most
> filesystems (and we sure as hell don't for btrfs and ext4), then we
> can focus on "in this corner case of broken/lying hardware..."
> 
> > I also think that simulating the device failure in a different layer is
> > simpler than checking for superblock flag in all the places where the
> > filesystem submits IO (e.g. ext4 doesn't have dedicated buffer layer like
> > xfs has and we rely on flusher thread to flush committed metadata to final
> 
> flusher threads call back into the filesystems to write both data
> and metadata, so I don't think that's an issue. And there's
> relatively few places you'd need to add flag support to (i.e.
> wrappers around submit_bh and submit_bio in the relevant layers)
> and that would trap all IO.
  Well, they don't for ext4. Ext4 metadata is backed by the block device
mapping. That mapping is written back using generic_writepages(), which
ends up calling blkdev_writepage(), which just calls
block_write_full_page() with the blkdev_get_block() handler. The bad
thing is that at that point we don't have the context to decide which
filesystem the writeback is coming from, since the only inode we have is
the block device inode belonging to the block device superblock. So I
don't see an easy way to solve this problem for ext4.
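
To make that concrete, the ->writepage for the block device mapping is
just this (quoting fs/block_dev.c roughly from memory):

/*
 * No filesystem context here at all -- 'page' belongs to the bdev
 * inode, and blkdev_get_block() maps it 1:1 onto the device.
 */
static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
{
	return block_write_full_page(page, blkdev_get_block, wbc);
}

There is simply nowhere in that path to ask "has the ext4 fs using this
device been shut down?".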

> Don't get fooled by the fact that XFS has lots of shutdown traps;
> there really are only three shutdown traps that prevent IO - one in
> xfs_buf_submit() for metadata IO, one in xfs_map_blocks() during
> ->writepage for data IO, and one in xlog_bdstrat() for log IO.
> 
> All the other shutdown traps are for aborting operations that may
> not reach the IO layer (as many operations will hit cached objects)
> or will fail later when the inevitable IO is done (e.g. on
> transaction commit). Hence shutdown traps get us fast, reliable
> responses to userspace when fatal corruption errors occur, and in
> doing so they also provide hooks for testing error paths in ways
> that otherwise are very difficult to exercise.
  Ext4 detects whether the fs is shut down in some cases as well and
bails out early - by checking whether the journal is aborted (the
is_journal_aborted() checks). So it, for example, doesn't start any new
transaction when the fs is shut down. It is easy to add an ext4 ioctl()
which will abort the journal, and that will test the error paths we
have. It's just that it will be a very different test from a situation
where the device goes away, power fails, or similar cases. For verifying
those cases, having a target which just starts returning EIO for any
submitted IO is much easier for ext4.
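
For what it's worth, the ioctl variant really would be tiny -- something
like the following (untested sketch, the name is invented, no such ioctl
exists today), so there is no problem having both; it just doesn't
replace the device-level test:

/*
 * Hypothetical helper for fs/ext4/ioctl.c: abort the journal so that
 * every subsequent transaction start trips over the existing
 * is_journal_aborted() checks and the error paths get exercised.
 */
static int ext4_ioc_abort_journal(struct super_block *sb)
{
	if (!capable(CAP_SYS_ADMIN))
		return -EPERM;
	if (EXT4_SB(sb)->s_journal)
		jbd2_journal_abort(EXT4_SB(sb)->s_journal, -EIO);
	return 0;
}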

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [LSF/MM TOPIC] Working towards better power fail testing
  2014-12-08 22:11 [LSF/MM TOPIC] Working towards better power fail testing Josef Bacik
  2014-12-10 11:27 ` [Lsf-pc] " Jan Kara
@ 2015-01-13 17:05 ` Dmitry Monakhov
  2015-01-13 17:17   ` Josef Bacik
  1 sibling, 1 reply; 18+ messages in thread
From: Dmitry Monakhov @ 2015-01-13 17:05 UTC (permalink / raw)
  To: Josef Bacik, lsf-pc; +Cc: linux-fsdevel

Josef Bacik <jbacik@fb.com> writes:

> Hello,
>
> We have been doing pretty well at populating xfstests with loads of 
> tests to catch regressions and validate we're all working properly.  One 
> thing that has been lacking is a good way to verify file system 
> integrity after a power fail.  This is a core part of what file systems 
> are supposed to provide but it is probably the least tested aspect.  We 
> have dm-flakey tests in xfstests to test fsync correctness, but these 
> tests do not catch the random horrible things that can go wrong.  We are 
> still finding horrible scary things that go wrong in Btrfs because it is 
> simply hard to reproduce and test for.
>
> I have been working on an idea to do this better, some may have seen my 
> dm-power-fail attempt, and I've got a new incarnation of the idea thanks 
> to discussions with Zach Brown.  Obviously there will be a lot changing 
> in this area in the time between now and March but it would be good to 
> have everybody in the room talking about what they would need to build a 
> good and deterministic test to make sure we're always giving a 
> consistent file system and to make sure our fsync() handling is working 
> properly.  Thanks,
I submitted generic/019 a long time ago. The test is fine and has helped
uncover several bugs, but it is not ideal because the current power
failure simulation (via fail_make_request) is not completely atomic.
So I would like to join the discussion on how we can make power failure
simulation completely atomic.
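
For context, the simulation is just the generic fault injection knobs;
the test's shell does the equivalent of roughly this (knob paths from
memory -- see Documentation/fault-injection/fault-injection.txt and the
test itself for the real sequence):

#include <stdio.h>

/* Write a small string to a sysfs/debugfs attribute. */
static int write_attr(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fputs(val, f);
	return fclose(f);
}

/* Make every new bio submitted to <disk> (e.g. "sdb") fail from now on. */
int arm_fail_make_request(const char *disk)
{
	char path[128];

	write_attr("/sys/kernel/debug/fail_make_request/probability", "100");
	write_attr("/sys/kernel/debug/fail_make_request/times", "-1");
	snprintf(path, sizeof(path), "/sys/block/%s/make-it-fail", disk);
	return write_attr(path, "1");
}

The knobs only affect bios submitted after they are flipped (and the
flipping itself is several separate writes), so anything already in
flight still completes, and the cut-off is not the single point in time
that real power loss is.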

BTW, I would also like to share the hw-flush utility (which our QA team
uses for power-fail/SSD-cache testing) and the harness for it.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [LSF/MM TOPIC] Working towards better power fail testing
  2015-01-13 17:05 ` Dmitry Monakhov
@ 2015-01-13 17:17   ` Josef Bacik
  0 siblings, 0 replies; 18+ messages in thread
From: Josef Bacik @ 2015-01-13 17:17 UTC (permalink / raw)
  To: Dmitry Monakhov, lsf-pc; +Cc: linux-fsdevel

On 01/13/2015 12:05 PM, Dmitry Monakhov wrote:
> Josef Bacik <jbacik@fb.com> writes:
>
>> Hello,
>>
>> We have been doing pretty well at populating xfstests with loads of
>> tests to catch regressions and validate we're all working properly.  One
>> thing that has been lacking is a good way to verify file system
>> integrity after a power fail.  This is a core part of what file systems
>> are supposed to provide but it is probably the least tested aspect.  We
>> have dm-flakey tests in xfstests to test fsync correctness, but these
>> tests do not catch the random horrible things that can go wrong.  We are
>> still finding horrible scary things that go wrong in Btrfs because it is
>> simply hard to reproduce and test for.
>>
>> I have been working on an idea to do this better, some may have seen my
>> dm-power-fail attempt, and I've got a new incarnation of the idea thanks
>> to discussions with Zach Brown.  Obviously there will be a lot changing
>> in this area in the time between now and March but it would be good to
>> have everybody in the room talking about what they would need to build a
>> good and deterministic test to make sure we're always giving a
>> consistent file system and to make sure our fsync() handling is working
>> properly.  Thanks,
> I submitted generic/019 a long time ago. The test is fine and has helped
> uncover several bugs, but it is not ideal because the current power
> failure simulation (via fail_make_request) is not completely atomic.
> So I would like to join the discussion on how we can make power failure
> simulation completely atomic.
>

Yeah, I did the first dm-flakey tests and extended them some.  These are
good baselines, but I've recently hit a few bugs in btrfs that would have
required us to crash at exactly the right spot to hit them, which is what
I want to build for: something we can run through all the possible crash
scenarios to make sure we're always leaving a consistent fs.

> BTW I also would like to share hw-flush utility (which our QA team use for
> use power-fail/SSD-cache testing) and harness for it.
>

That would be super cool; the more testing we have around making sure
we're waiting for stuff properly and flushing caches properly, the
better.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-01-13 17:17 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-08 22:11 [LSF/MM TOPIC] Working towards better power fail testing Josef Bacik
2014-12-10 11:27 ` [Lsf-pc] " Jan Kara
2014-12-10 15:09   ` Josef Bacik
2015-01-05 18:34     ` Sage Weil
2015-01-05 19:02       ` Brian Foster
2015-01-05 19:13         ` Sage Weil
2015-01-05 19:33           ` Brian Foster
2015-01-05 21:17       ` Jan Kara
2015-01-05 21:47       ` Dave Chinner
2015-01-05 22:26         ` Sage Weil
2015-01-05 23:27           ` Dave Chinner
2015-01-06 17:37             ` Sage Weil
2015-01-06  8:53         ` Jan Kara
2015-01-06 16:39           ` Josef Bacik
2015-01-06 22:07           ` Dave Chinner
2015-01-07 10:10             ` Jan Kara
2015-01-13 17:05 ` Dmitry Monakhov
2015-01-13 17:17   ` Josef Bacik
