* Simulating disk failure with a writeback cache
@ 2016-12-15  2:26 Kent Overstreet
  2016-12-15  4:15 ` Josef Bacik
  0 siblings, 1 reply; 15+ messages in thread
From: Kent Overstreet @ 2016-12-15  2:26 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: david, hch

As many tests as there are in xfstests that simulate disk failure/powerloss, I'm
having a hard time believing that no one's bothered to write code to simulate a
writeback cache (so the test can drop the cached writes and test flush/fua
correctness).

So does anyone know if such code exists and I just missed it?

Or failing that, any suggestions on the easiest way to hack something up? This
is turning into a really irritating problem because it'd be simple enough to
write from scratch, but given the amount of code we have that does stuff like
this writing it from scratch seems rather silly - hacking loop to do buffered IO
instead of O_DIRECT would almost do it, I'd think, except I'm looking at loop.c
and just trying to follow the entry points and control flow is making my blood
pressure rise.

Ideally we'd have something that could easily slot into xfstests, which is using
dm-flakey for these tests right now...

Any ideas?
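
As a rough illustration of the behaviour being asked for, here is a minimal
Python sketch of a block device with a volatile write cache. It is a toy model
under stated assumptions, not an existing kernel or xfstests interface: the
class name, method names, and block size are all hypothetical, and a "power
loss" here drops every cached write, whereas a real disk may have persisted any
subset of them.

```python
class WritebackCacheDisk:
    """Toy block device with a volatile write cache (hypothetical model)."""

    def __init__(self, nr_blocks, block_size=512):
        self.block_size = block_size
        self.stable = [bytes(block_size)] * nr_blocks  # survives power loss
        self.cache = {}                                # lost on power loss

    def write(self, blk, data, fua=False):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        if fua:
            # FUA: the write reaches stable media before it completes.
            self.stable[blk] = data
            self.cache.pop(blk, None)  # drop any older cached copy
        else:
            self.cache[blk] = data

    def read(self, blk):
        # Reads see cached data first, as they would on a real device.
        return self.cache.get(blk, self.stable[blk])

    def flush(self):
        # FLUSH: everything written so far becomes durable.
        for blk, data in self.cache.items():
            self.stable[blk] = data
        self.cache.clear()

    def power_loss(self):
        # Simplification: all unflushed writes are lost at once.
        self.cache.clear()


def demo():
    disk = WritebackCacheDisk(nr_blocks=4, block_size=4)
    disk.write(0, b"AAAA")
    disk.flush()                      # block 0 is now durable
    disk.write(1, b"BBBB")            # cached only, will be lost
    disk.write(2, b"CCCC", fua=True)  # durable without a flush
    disk.power_loss()
    return [disk.read(i) for i in range(4)]
```

After the simulated power loss, only the flushed block and the FUA block keep
their data; a filesystem that relied on block 1 being on disk without issuing a
flush would be caught by a test built on this kind of model.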


* Re: Simulating disk failure with a writeback cache
  2016-12-15  2:26 Simulating disk failure with a writeback cache Kent Overstreet
@ 2016-12-15  4:15 ` Josef Bacik
  2016-12-18 20:12   ` Kent Overstreet
  0 siblings, 1 reply; 15+ messages in thread
From: Josef Bacik @ 2016-12-15  4:15 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-fsdevel, david, hch

dm-log-writes is probably what you want.  Thanks,

Josef


* Re: Simulating disk failure with a writeback cache
  2016-12-15  4:15 ` Josef Bacik
@ 2016-12-18 20:12   ` Kent Overstreet
  2016-12-18 20:38     ` Josef Bacik
  0 siblings, 1 reply; 15+ messages in thread
From: Kent Overstreet @ 2016-12-18 20:12 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-fsdevel, david, hch

On Thu, Dec 15, 2016 at 04:15:27AM +0000, Josef Bacik wrote:
> dm-log-writes is probably what you want.  Thanks,

Oh, that actually looks pretty cool.

Don't suppose anyone is working on making use of it in xfstests?


* Re: Simulating disk failure with a writeback cache
  2016-12-18 20:12   ` Kent Overstreet
@ 2016-12-18 20:38     ` Josef Bacik
  2016-12-18 20:46       ` Kent Overstreet
  0 siblings, 1 reply; 15+ messages in thread
From: Josef Bacik @ 2016-12-18 20:38 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-fsdevel, david, hch

On Sun, Dec 18, 2016 at 3:12 PM, Kent Overstreet 
<kent.overstreet@gmail.com> wrote:
> On Thu, Dec 15, 2016 at 04:15:27AM +0000, Josef Bacik wrote:
>>  dm-log-writes is probably what you want.  Thanks,
> 
> Oh, that actually looks pretty cool.
> 
> Don't suppose anyone is working on making use of it in xfstests?

Actually I had two xfstests, one that used fsstress and just made sure 
every commit point was valid (every FUA/FLUSH it found in the log) and 
then one that modified fsx to output a known good image every time it 
ran fsync and marked the log, to make sure fsync did the correct thing.  I 
need to go back and clean them up and get them upstream, but I've been 
pretty heavily distracted with other things for the last year or two.  
I'll make getting those upstream a priority after Christmas.  Thanks,

Josef
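
The first test described above (replay to every FUA/FLUSH in the log and check
the result) can be sketched roughly as follows. This is a simplified model, not
the dm-log-writes on-disk format or Josef's actual test: `LogEntry`, its field
names, and the toy consistency check are all assumptions made for illustration,
and treating each FLUSH/FUA entry as a commit point that includes its own data
is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """One logged write. Field names are illustrative, not the real format."""
    sector: int
    data: bytes
    flush: bool = False
    fua: bool = False


def commit_points(log):
    """Yield how many entries to replay to stop right after each FLUSH/FUA."""
    for i, entry in enumerate(log):
        if entry.flush or entry.fua:
            yield i + 1


def replay(log, nr_entries, nr_sectors):
    """Replay the first nr_entries writes onto a blank device."""
    device = [b""] * nr_sectors
    for entry in log[:nr_entries]:
        device[entry.sector] = entry.data
    return device


def failing_commit_points(log, nr_sectors, is_consistent):
    """Return the commit points at which the replayed image fails the check."""
    return [n for n in commit_points(log)
            if not is_consistent(replay(log, n, nr_sectors))]


def toy_check(device):
    """Toy filesystem: sector 0 holds a count N; sectors 1..N must be written."""
    count = int(device[0].decode() or "0")
    return all(device[i] for i in range(1, count + 1))
```

In a real test the `is_consistent` callback would be something like "mount the
replayed device and run fsck"; the toy check only stands in for that step. A
log that writes the commit record before the data it refers to shows up as a
failing commit point.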



* Re: Simulating disk failure with a writeback cache
  2016-12-18 20:38     ` Josef Bacik
@ 2016-12-18 20:46       ` Kent Overstreet
  2016-12-18 21:19         ` Josef Bacik
  0 siblings, 1 reply; 15+ messages in thread
From: Kent Overstreet @ 2016-12-18 20:46 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-fsdevel, david, hch

On Sun, Dec 18, 2016 at 03:38:54PM -0500, Josef Bacik wrote:
> On Sun, Dec 18, 2016 at 3:12 PM, Kent Overstreet <kent.overstreet@gmail.com>
> wrote:
> > On Thu, Dec 15, 2016 at 04:15:27AM +0000, Josef Bacik wrote:
> > >  dm-log-writes is probably what you want.  Thanks,
> > 
> > Oh, that actually looks pretty cool.
> > 
> > Don't suppose anyone is working on making use of it in xfstests?
> 
> Actually I had two xfstests, one that used fsstress and just made sure every
> commit point was valid (every FUA/FLUSH it found in the log) and then one
> that modified fsx to output a known good image every time it ran fsync and
> mark the log to make sure fsync did the correct thing.  I need to go back
> and clean them up and get them upstream, but I've been pretty heavily
> distracted with other things for the last year or two.  I'll make getting
> those upstream a priority after Christmas.  Thanks,

Don't suppose you could throw up just what you've currently got somewhere? It'd
save me some work.


* Re: Simulating disk failure with a writeback cache
  2016-12-18 20:46       ` Kent Overstreet
@ 2016-12-18 21:19         ` Josef Bacik
  2016-12-19  2:51           ` Kent Overstreet
  0 siblings, 1 reply; 15+ messages in thread
From: Josef Bacik @ 2016-12-18 21:19 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-fsdevel, david, hch

Eeesh these are older than I thought, I pushed to
https://github.com/josefbacik/fstests.git.  The fsx work is there and is
generic, the fsstress one has some btrfs specific stuff but you can just pull
that crap out and it'll work on anything.  Let me know if you need anything
else, thanks,

Josef


* Re: Simulating disk failure with a writeback cache
  2016-12-18 21:19         ` Josef Bacik
@ 2016-12-19  2:51           ` Kent Overstreet
  2016-12-19  3:07             ` Kent Overstreet
  0 siblings, 1 reply; 15+ messages in thread
From: Kent Overstreet @ 2016-12-19  2:51 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-fsdevel, david, hch

On Sun, Dec 18, 2016 at 09:19:57PM +0000, Josef Bacik wrote:
> Eeesh these are older than I thought, I pushed to
> https://github.com/josefbacik/fstests.git.  The fsx work is there and is
> generic, the fsstress one has some btrfs specific stuff but you can just pull
> that crap out and it'll work on anything.  Let me know if you need anything
> else, thanks,

Thanks

I got your first fsx based test up and running, but - did you get it to pass
with any existing filesystems?

I finally figured out what I'm seeing, in _check_files() where it's looping over
the marks: I changed it so that it always checks the fsync marks in the order
in which they were created, but when it goes to check the very first mark it's
getting the very last version of the file (at least, the file size is consistent
with that).

I just pushed what I'm working off of, but I don't think it's anything I
broke...

https://evilpiepirate.org/git/xfstests.git


* Re: Simulating disk failure with a writeback cache
  2016-12-19  2:51           ` Kent Overstreet
@ 2016-12-19  3:07             ` Kent Overstreet
  2016-12-19 12:58               ` Josef Bacik
  2016-12-19 15:27               ` Josef Bacik
  0 siblings, 2 replies; 15+ messages in thread
From: Kent Overstreet @ 2016-12-19  3:07 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-fsdevel, david, hch

On Sun, Dec 18, 2016 at 05:51:11PM -0900, Kent Overstreet wrote:
> On Sun, Dec 18, 2016 at 09:19:57PM +0000, Josef Bacik wrote:
> > Eeesh these are older than I thought, I pushed to
> > https://github.com/josefbacik/fstests.git.  The fsx work is there and is
> > generic, the fsstress one has some btrfs specific stuff but you can just pull
> > that crap out and it'll work on anything.  Let me know if you need anything
> > else, thanks,
> 
> Thanks
> 
> I got your first fsx based test up and running, but - did you get it to pass
> with any existing filesystems?
> 
> I finally figured out what I'm seeing, in _check_files() where it's looping over
> the marks: I changed it so that it always checks the fsync marks in the order
> in which they were created, but when it goes to check the very first mark it's
> getting the very last version of the file (at least, the file size is consistent
> with that).
> 
> I just pushed what I'm working off of, but I don't think it's anything I
> broke...
> 
> https://evilpiepirate.org/git/xfstests.git

Just figured it out - log replay doesn't touch anything that hadn't been written
at that point in the log, so the newer journal entries were still there. fun...

If I blow away the journal before replay, it works.
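
The effect described here can be reproduced with a small model. The sketch
below assumes a log of (sector, data) pairs and a device represented as a
Python list; both are illustrative stand-ins, not the real replay tool. It
replays the same log prefix onto a device that still holds the end-of-run
contents and onto a wiped one.

```python
def replay_prefix(device, log, nr_entries):
    """Apply the first nr_entries logged writes; other sectors are untouched."""
    for sector, data in log[:nr_entries]:
        device[sector] = data


def demo():
    # Hypothetical log: a superblock write followed by three journal entries.
    log = [(0, b"sb"), (1, b"journal-1"), (2, b"journal-2"), (3, b"journal-3")]
    nr_sectors = 4

    # Device as it looked at the end of the test run (whole log applied).
    used = [b""] * nr_sectors
    replay_prefix(used, log, len(log))

    # Replaying to an early mark on top of it leaves the newer entries behind.
    replay_prefix(used, log, 2)

    # Replaying the same prefix onto a wiped device gives the intended state.
    wiped = [b""] * nr_sectors
    replay_prefix(wiped, log, 2)
    return used, wiped
```

In the `used` result, sectors 2 and 3 still contain journal entries from later
in the run, which is why the first mark appeared to hold the last version of
the file.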


* Re: Simulating disk failure with a writeback cache
  2016-12-19  3:07             ` Kent Overstreet
@ 2016-12-19 12:58               ` Josef Bacik
  2016-12-19 15:27               ` Josef Bacik
  1 sibling, 0 replies; 15+ messages in thread
From: Josef Bacik @ 2016-12-19 12:58 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-fsdevel, david, hch


> On Dec 18, 2016, at 10:07 PM, Kent Overstreet <kent.overstreet@gmail.com> wrote:
> 
>> On Sun, Dec 18, 2016 at 05:51:11PM -0900, Kent Overstreet wrote:
>>> On Sun, Dec 18, 2016 at 09:19:57PM +0000, Josef Bacik wrote:
>>> Eeesh these are older than I thought, I pushed to
>>> https://github.com/josefbacik/fstests.git.  The fsx work is there and is
>>> generic, the fsstress one has some btrfs specific stuff but you can just pull
>>> that crap out and it'll work on anything.  Let me know if you need anything
>>> else, thanks,
>> 
>> Thanks
>> 
>> I got your first fsx based test up and running, but - did you get it to pass
>> with any existing filesystems?
>> 
>> I finally figured out what I'm seeing, in _check_files() where it's looping over
>> the marks: I changed it so that it always checks the fsync marks in the order
>> in which they were created, but when it goes to check the very first mark it's
>> getting the very last version of the file (at least, the file size is consistent
>> with that).
>> 
>> I just pushed what I'm working off of, but I don't think it's anything I
>> broke...
>> 
>> https://evilpiepirate.org/git/xfstests.git
> 
> Just figured it out - log replay doesn't touch anything that hadn't been written
> at that point in the log, so the newer journal entries were still there. fun...
> 
> If I blow away the journal before replay, it works.

I'm confused, are you talking about the dm-log-writes log?  Thanks,

Josef


* Re: Simulating disk failure with a writeback cache
  2016-12-19  3:07             ` Kent Overstreet
  2016-12-19 12:58               ` Josef Bacik
@ 2016-12-19 15:27               ` Josef Bacik
  2016-12-19 20:55                 ` Kent Overstreet
  1 sibling, 1 reply; 15+ messages in thread
From: Josef Bacik @ 2016-12-19 15:27 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-fsdevel, david, hch

On Sun, Dec 18, 2016 at 10:07 PM, Kent Overstreet 
<kent.overstreet@gmail.com> wrote:
> On Sun, Dec 18, 2016 at 05:51:11PM -0900, Kent Overstreet wrote:
>> On Sun, Dec 18, 2016 at 09:19:57PM +0000, Josef Bacik wrote:
>> > Eeesh these are older than I thought, I pushed to
>> > https://github.com/josefbacik/fstests.git.  The fsx work is there and is
>> > generic, the fsstress one has some btrfs specific stuff but you can just pull
>> > that crap out and it'll work on anything.  Let me know if you need anything
>> > else, thanks,
>>
>> Thanks
>>
>> I got your first fsx based test up and running, but - did you get it to pass
>> with any existing filesystems?
>>
>> I finally figured out what I'm seeing, in _check_files() where it's looping over
>> the marks: I changed it so that it always checks the fsync marks in the order
>> in which they were created, but when it goes to check the very first mark it's
>> getting the very last version of the file (at least, the file size is consistent
>> with that).
>>
>> I just pushed what I'm working off of, but I don't think it's anything I
>> broke...
>>
>> https://evilpiepirate.org/git/xfstests.git
>
> Just figured it out - log replay doesn't touch anything that hadn't been written
> at that point in the log, so the newer journal entries were still there. fun...
>
> If I blow away the journal before replay, it works.

Oh sorry for some reason I only saw this reply but not your previous 
one about seeing the last version of the file.  Yeah that's kind of 
annoying, I'll make a note to fix the test to do a wipefs before doing 
the log replay.  Does using wipefs work as well as dd'ing the first bit 
of the disk for you?  Thanks,

Josef



* Re: Simulating disk failure with a writeback cache
  2016-12-19 15:27               ` Josef Bacik
@ 2016-12-19 20:55                 ` Kent Overstreet
  2016-12-19 21:00                   ` Josef Bacik
  0 siblings, 1 reply; 15+ messages in thread
From: Kent Overstreet @ 2016-12-19 20:55 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-fsdevel, david, hch

On Mon, Dec 19, 2016 at 10:27:34AM -0500, Josef Bacik wrote:
> Oh sorry for some reason I only saw this reply but not your previous one
> about seeing the last version of the file.  Yeah that's kind of annoying,
> I'll make a note to fix the test to do a wipefs before doing the log replay.
> Does using wipefs work as well as dd'ing the first bit of the disk for you?
> Thanks,

That worked for the short test, but it won't work in general - the issue is that
in bcache/bcachefs, btree nodes are also log structured, so if we just blow away
the journal we'll still find btree node entries from the future (and it'll
complain loudly on finding btree node entries far in the future of the most
recent journal entry).

For the bug I'm hunting now I'm gonna see if I can reproduce it with a small
enough filesystem that I can just dd over the entire device between replays.


* Re: Simulating disk failure with a writeback cache
  2016-12-19 20:55                 ` Kent Overstreet
@ 2016-12-19 21:00                   ` Josef Bacik
  2016-12-19 21:53                     ` Kent Overstreet
  0 siblings, 1 reply; 15+ messages in thread
From: Josef Bacik @ 2016-12-19 21:00 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-fsdevel, david, hch

On Mon, Dec 19, 2016 at 3:55 PM, Kent Overstreet 
<kent.overstreet@gmail.com> wrote:
> On Mon, Dec 19, 2016 at 10:27:34AM -0500, Josef Bacik wrote:
>> Oh sorry for some reason I only saw this reply but not your previous one
>> about seeing the last version of the file.  Yeah that's kind of annoying,
>> I'll make a note to fix the test to do a wipefs before doing the log replay.
>> Does using wipefs work as well as dd'ing the first bit of the disk for you?
>> Thanks,
>
> That worked for the short test, but it won't work in general - the issue is that
> in bcache/bcachefs, btree nodes are also log structured, so if we just blow away
> the journal we'll still find btree node entries from the future (and it'll
> complain loudly on finding btree node entries far in the future of the most
> recent journal entry).
>
> For the bug I'm hunting now I'm gonna see if I can reproduce it with a small
> enough filesystem that I can just dd over the entire device between replays.

Ok that works fine in the short term for you, but ideally you wouldn't 
have to fix the test every time you wanted to run it on bcachefs.  Do 
you have a suggestion of how to make this generic enough to support you 
as well without dd'ing the whole drive?  Thanks,

Josef



* Re: Simulating disk failure with a writeback cache
  2016-12-19 21:00                   ` Josef Bacik
@ 2016-12-19 21:53                     ` Kent Overstreet
  2016-12-20  1:01                       ` Josef Bacik
  0 siblings, 1 reply; 15+ messages in thread
From: Kent Overstreet @ 2016-12-19 21:53 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-fsdevel, david, hch

On Mon, Dec 19, 2016 at 04:00:40PM -0500, Josef Bacik wrote:
> Ok that works fine in the short term for you, but ideally you wouldn't have
> to fix the test every time you wanted to run it on bcachefs.  Do you have a
> suggestion of how to make this generic enough to support you as well without
> dd'ing the whole drive?  Thanks,

Yes, but it'd be a lot of work :)

Change log-writes so that instead of replaying the log, you have it create a
block device representing that specific point in the log - and then service read
requests by looking them up however you like. So, it'd be just like mounting
snapshots.

It might not be too difficult since we can deal with crappy performance or
memory usage here, but I don't know if it's really worth the effort. 
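
A minimal sketch of the lookup this would need, assuming the log is available
as an in-memory list of (sector, data) pairs: index the log by sector once, then
answer a read "as of" any point in the log with a binary search. The class and
method names are hypothetical, and this is user-space Python rather than a
device-mapper target.

```python
import bisect
from collections import defaultdict


class LogSnapshots:
    """Answer reads against any prefix of a write log (hypothetical model)."""

    def __init__(self, log, block_size):
        self.log = log
        self.zero = bytes(block_size)
        # sector -> ascending list of log indices that wrote that sector
        self.writes = defaultdict(list)
        for i, (sector, _data) in enumerate(log):
            self.writes[sector].append(i)

    def read(self, sector, nr_entries):
        """Contents of `sector` after the first `nr_entries` log entries."""
        indices = self.writes.get(sector, [])
        # Number of writes to this sector with log index < nr_entries.
        pos = bisect.bisect_left(indices, nr_entries)
        if pos == 0:
            return self.zero  # never written at that point: reads as zeroes
        return self.log[indices[pos - 1]][1]
```

Because sectors never written before the chosen point read back as zeroes, no
stale data from later in the run can leak into the view, which is the property
that wiping the device between replays was approximating.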


* Re: Simulating disk failure with a writeback cache
  2016-12-19 21:53                     ` Kent Overstreet
@ 2016-12-20  1:01                       ` Josef Bacik
  2016-12-20  1:30                         ` Kent Overstreet
  0 siblings, 1 reply; 15+ messages in thread
From: Josef Bacik @ 2016-12-20  1:01 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-fsdevel, david, hch


> On Dec 19, 2016, at 4:53 PM, Kent Overstreet <kent.overstreet@gmail.com> wrote:
> 
>> On Mon, Dec 19, 2016 at 04:00:40PM -0500, Josef Bacik wrote:
>> Ok that works fine in the short term for you, but ideally you wouldn't have
>> to fix the test every time you wanted to run it on bcachefs.  Do you have a
>> suggestion of how to make this generic enough to support you as well without
>> dd'ing the whole drive?  Thanks,
> 
> Yes, but it'd be a lot of work :)
> 
> Change log-writes so that instead of replaying the log, you have it create a
> block device representing that specific point in the log - and then service read
> requests by looking them up however you like. So, it'd be just like mounting
> snapshots.
> 
> It might not be too difficult since we can deal with crappy performance or
> memory usage here, but I don't know if it's really worth the effort. 

One thing I was definitely going to do is use the dm thin provisioning so I
could snapshot between each check so we don't have to replay from the
beginning every time we want to check a different mark.  Replaying the whole
thing is fine for small runs like the fsx thing, but when you want to say run
fsstress for a minute and replay to every fua and fsck and mount it gets
tedious fast.  Thanks,

Josef
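
The idea can be sketched as follows: replay forward from the previous mark
instead of from the start, and run each check against a snapshot so that
anything the check writes (journal recovery at mount time, for example) does
not disturb the replay target. This is a toy model only; the list copy stands
in for a dm-thin snapshot, and the function name, the (sector, data) log
layout, and the `check` callback are assumptions for illustration.

```python
def check_marks_incrementally(log, marks, nr_sectors, check):
    """Check every mark while replaying each log entry only once.

    log:   list of (sector, data) writes, in order
    marks: entry counts at which the device state should be checked
    check: callable taking a snapshot (a list) and returning a result
    """
    device = [b""] * nr_sectors
    replayed = 0
    results = {}
    for mark in sorted(marks):
        # Replay only the entries between the previous mark and this one.
        for sector, data in log[replayed:mark]:
            device[sector] = data
        replayed = mark
        # Stand-in for a thin snapshot: the check may scribble on its copy.
        snapshot = list(device)
        results[mark] = check(snapshot)
    return results, replayed
```

With this structure the replay cost is linear in the length of the log, rather
than growing with the number of marks times the log length.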


* Re: Simulating disk failure with a writeback cache
  2016-12-20  1:01                       ` Josef Bacik
@ 2016-12-20  1:30                         ` Kent Overstreet
  0 siblings, 0 replies; 15+ messages in thread
From: Kent Overstreet @ 2016-12-20  1:30 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-fsdevel, david, hch

On Tue, Dec 20, 2016 at 01:01:29AM +0000, Josef Bacik wrote:
> One thing I was definitely going to do is use the dm thin provisioning so I
> could snapshot between each check so we don't have to replay from the
> beginning every time we want to check a different mark.  Replaying the whole
> thing is fine for small runs like the fsx thing, but when you want to say run
> fsstress for a minute and replay to every fua and fsck and mount it gets
> tedious fast.  Thanks,

I actually just converted my new test to do that:

https://evilpiepirate.org/git/ktest.git/commit/

Now most of the time in the test is spent in mount/unmount (bcachefs doesn't
have a standalone fsck yet; we check everything fsck should, but it's done at
mount time).
