* Simulating disk failure with a writeback cache
From: Kent Overstreet @ 2016-12-15 2:26 UTC
To: linux-fsdevel; +Cc: david, hch

As many tests as there are in xfstests that simulate disk failure/powerloss, I'm having a hard time believing that no one's bothered to write code to simulate a writeback cache (so the test can drop the cached writes and test flush/fua correctness).

So does anyone know if such code exists and I just missed it?

Or failing that, any suggestions on the easiest way to hack something up? This is turning into a really irritating problem because it'd be simple enough to write from scratch, but given the amount of code we have that does stuff like this writing it from scratch seems rather silly - hacking loop to do buffered IO instead of O_DIRECT would almost do it, I'd think, except I'm looking at loop.c and just trying to follow the entry points and control flow is making my blood pressure rise.

Ideally we'd have something that could easily slot into xfstests, which is using dm-flakey for these tests right now...

Any ideas?
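For readers who want the idea in concrete form, here is a minimal userspace sketch of what "simulating a writeback cache" could mean. It is a toy model, not kernel code and not anything posted in this thread; the class name `WritebackCacheDisk` and its methods are hypothetical. Ordinary writes sit in a volatile cache, a flush or a FUA write makes data durable, and a simulated power loss drops whatever is still cached.

```python
class WritebackCacheDisk:
    """Toy block device with a volatile write cache (hypothetical model)."""

    def __init__(self, nr_blocks):
        self.media = [b"\0"] * nr_blocks  # contents that survive power loss
        self.cache = {}                   # block number -> data, volatile

    def write(self, block, data, fua=False):
        if fua:
            # FUA: this one write must reach the media before it completes.
            self.media[block] = data
            self.cache.pop(block, None)   # older cached data is superseded
        else:
            self.cache[block] = data

    def flush(self):
        # FLUSH: everything written so far becomes durable.
        for block, data in self.cache.items():
            self.media[block] = data
        self.cache.clear()

    def read(self, block):
        # Reads see the cache first, like a real drive.
        return self.cache.get(block, self.media[block])

    def power_loss(self):
        # Cached writes that were never flushed are lost.
        self.cache.clear()


# A "filesystem" that writes data, then a FUA commit record, but forgets
# the flush in between: after power loss the commit exists without its data.
disk = WritebackCacheDisk(4)
disk.write(0, b"data")
disk.write(1, b"commit", fua=True)
disk.power_loss()
```

After the simulated power loss, block 1 holds the commit record while block 0 has reverted to zeroes, which is the class of flush/FUA ordering bug the original message wants tests to catch.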
* Re: Simulating disk failure with a writeback cache
From: Josef Bacik @ 2016-12-15 4:15 UTC
To: Kent Overstreet; +Cc: linux-fsdevel, david, hch

dm-log-writes is probably what you want. Thanks,

Josef
* Re: Simulating disk failure with a writeback cache
From: Kent Overstreet @ 2016-12-18 20:12 UTC
To: Josef Bacik; +Cc: linux-fsdevel, david, hch

On Thu, Dec 15, 2016 at 04:15:27AM +0000, Josef Bacik wrote:
> dm-log-writes is probably what you want. Thanks,

Oh, that actually looks pretty cool.

Don't suppose anyone is working on making use of it in xfstests?
* Re: Simulating disk failure with a writeback cache
From: Josef Bacik @ 2016-12-18 20:38 UTC
To: Kent Overstreet; +Cc: linux-fsdevel, david, hch

On Sun, Dec 18, 2016 at 3:12 PM, Kent Overstreet <kent.overstreet@gmail.com> wrote:
> Oh, that actually looks pretty cool.
>
> Don't suppose anyone is working on making use of it in xfstests?

Actually I had two xfstests: one that used fsstress and just made sure every commit point was valid (every FUA/FLUSH it found in the log), and one that modified fsx to output a known good image every time it ran fsync and mark the log, to make sure fsync did the correct thing. I need to go back and clean them up and get them upstream, but I've been pretty heavily distracted with other things for the last year or two. I'll make getting those upstream a priority after Christmas. Thanks,

Josef
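The "every commit point is valid" check described above can be sketched in a few lines. This is a simplified model under stated assumptions, not the actual xfstests code and not the dm-log-writes on-disk format: the `Entry` record, `replay`, `check_commit_points`, and the toy consistency rule are all hypothetical names made up for illustration. The log is an ordered list of writes, and after every entry flagged FLUSH or FUA the image is rebuilt from scratch and handed to a checker.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    block: int = -1        # -1 means "no data", e.g. a bare flush
    data: bytes = b""
    flush: bool = False
    fua: bool = False


def replay(log, end, nr_blocks):
    """Build a fresh image containing only the writes in log[:end]."""
    image = [b""] * nr_blocks
    for e in log[:end]:
        if e.block >= 0:
            image[e.block] = e.data
    return image


def check_commit_points(log, nr_blocks, is_consistent):
    """Return the log indexes of FLUSH/FUA entries whose image fails the check."""
    failures = []
    for i, e in enumerate(log):
        if e.flush or e.fua:
            if not is_consistent(replay(log, i + 1, nr_blocks)):
                failures.append(i)
    return failures


def is_consistent(image):
    # Toy rule: block 0 is a commit record naming a block that must hold data.
    return image[0] == b"" or image[int(image[0])] != b""


good_log = [Entry(1, b"A"), Entry(flush=True), Entry(0, b"1", fua=True)]
bad_log = [Entry(0, b"1", fua=True), Entry(1, b"A"), Entry(flush=True)]

good_failures = check_commit_points(good_log, 2, is_consistent)
bad_failures = check_commit_points(bad_log, 2, is_consistent)
```

In a real test the checker would be something like fsck or a mount attempt rather than a Python predicate; the loop structure is the part this sketch is meant to show.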
* Re: Simulating disk failure with a writeback cache
From: Kent Overstreet @ 2016-12-18 20:46 UTC
To: Josef Bacik; +Cc: linux-fsdevel, david, hch

On Sun, Dec 18, 2016 at 03:38:54PM -0500, Josef Bacik wrote:
> I'll make getting those upstream a priority after Christmas. Thanks,

Don't suppose you could throw up just what you've currently got somewhere? It'd save me some work.
* Re: Simulating disk failure with a writeback cache
From: Josef Bacik @ 2016-12-18 21:19 UTC
To: Kent Overstreet; +Cc: linux-fsdevel, david, hch

Eeesh these are older than I thought, I pushed to https://github.com/josefbacik/fstests.git. The fsx work is there and is generic, the fsstress one has some btrfs specific stuff but you can just pull that crap out and it'll work on anything. Let me know if you need anything else, thanks,

Josef
* Re: Simulating disk failure with a writeback cache
From: Kent Overstreet @ 2016-12-19 2:51 UTC
To: Josef Bacik; +Cc: linux-fsdevel, david, hch

On Sun, Dec 18, 2016 at 09:19:57PM +0000, Josef Bacik wrote:
> Eeesh these are older than I thought, I pushed to
> https://github.com/josefbacik/fstests.git. The fsx work is there and is
> generic, the fsstress one has some btrfs specific stuff but you can just pull
> that crap out and it'll work on anything. Let me know if you need anything
> else, thanks,

Thanks

I got your first fsx based test up and running, but - did you get it to pass with any existing filesystems?

I finally figured out what I'm seeing, in _check_files() where it's looping over the marks: I changed it so that it always checks the fsync marks in the order they were created, but when it goes to check the very first mark it's getting the very last version of the file (at least, the file size is consistent with that).

I just pushed what I'm working off of, but I don't think it's anything I broke...

https://evilpiepirate.org/git/xfstests.git
* Re: Simulating disk failure with a writeback cache
From: Kent Overstreet @ 2016-12-19 3:07 UTC
To: Josef Bacik; +Cc: linux-fsdevel, david, hch

On Sun, Dec 18, 2016 at 05:51:11PM -0900, Kent Overstreet wrote:
> I got your first fsx based test up and running, but - did you get it to pass
> with any existing filesystems?
>
> [...] when it goes to check the very first mark it's getting
> the very last version of the file (at least, the file size is consistent with
> that).

Just figured it out - log replay doesn't touch anything that hadn't been written at that point in the log, so the newer journal entries were still there. fun...

If I blow away the journal before replay, it works.
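The failure mode described above is easy to reproduce in a toy model. The sketch below is an illustration under assumptions, not the test's actual code: `replay_onto` and the three-entry log are hypothetical. Replaying to an earlier mark on a device that previously held a later replay leaves the later blocks in place, because replay only overwrites blocks that appear in the log prefix.

```python
def replay_onto(device, log, end, wipe=False):
    """Apply log[:end] to an existing device image, optionally wiping it first."""
    if wipe:
        for i in range(len(device)):
            device[i] = None
    for block, data in log[:end]:
        device[block] = data
    return device


# Three journal entries written to three different blocks.
log = [(0, "journal v1"), (1, "journal v2"), (2, "journal v3")]

# Replay to the last mark, then to the first mark on the same device.
device = [None] * 3
replay_onto(device, log, 3)
stale = list(replay_onto(device, log, 1))

# Same sequence, but wiping the device before the second replay.
device = [None] * 3
replay_onto(device, log, 3)
clean = list(replay_onto(device, log, 1, wipe=True))
```

Without the wipe, the device still contains "journal v2" and "journal v3" after replaying to the first mark, so a filesystem that scans for the newest journal entry will recover the latest state rather than the state at that mark.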
* Re: Simulating disk failure with a writeback cache
From: Josef Bacik @ 2016-12-19 12:58 UTC
To: Kent Overstreet; +Cc: linux-fsdevel, david, hch

On Dec 18, 2016, at 10:07 PM, Kent Overstreet <kent.overstreet@gmail.com> wrote:
> Just figured it out - log replay doesn't touch anything that hadn't been written
> at that point in the log, so the newer journal entries were still there. fun...
>
> If I blow away the journal before replay, it works.

I'm confused, are you talking about the log writes log? Thanks,

Josef
* Re: Simulating disk failure with a writeback cache
From: Josef Bacik @ 2016-12-19 15:27 UTC
To: Kent Overstreet; +Cc: linux-fsdevel, david, hch

On Sun, Dec 18, 2016 at 10:07 PM, Kent Overstreet <kent.overstreet@gmail.com> wrote:
> Just figured it out - log replay doesn't touch anything that hadn't been written
> at that point in the log, so the newer journal entries were still there. fun...
>
> If I blow away the journal before replay, it works.

Oh sorry for some reason I only saw this reply but not your previous one about seeing the last version of the file. Yeah that's kind of annoying, I'll make a note to fix the test to do a wipefs before doing the log replay. Does using wipefs work as well as dd'ing the first bit of the disk for you? Thanks,

Josef
* Re: Simulating disk failure with a writeback cache
From: Kent Overstreet @ 2016-12-19 20:55 UTC
To: Josef Bacik; +Cc: linux-fsdevel, david, hch

On Mon, Dec 19, 2016 at 10:27:34AM -0500, Josef Bacik wrote:
> Oh sorry for some reason I only saw this reply but not your previous one
> about seeing the last version of the file. Yeah that's kind of annoying,
> I'll make a note to fix the test to do a wipefs before doing the log replay.
> Does using wipefs work as well as dd'ing the first bit of the disk for you?
> Thanks,

That worked for the short test, but it won't work in general - the issue is that in bcache/bcachefs, btree nodes are also log structured, so if we just blow away the journal we'll still find btree node entries from the future (and it'll complain loudly on finding btree node entries far in the future of the most recent journal entry).

For the bug I'm hunting now I'm gonna see if I can reproduce it with a small enough filesystem that I can just dd over the entire device between replays.
* Re: Simulating disk failure with a writeback cache
From: Josef Bacik @ 2016-12-19 21:00 UTC
To: Kent Overstreet; +Cc: linux-fsdevel, david, hch

On Mon, Dec 19, 2016 at 3:55 PM, Kent Overstreet <kent.overstreet@gmail.com> wrote:
> For the bug I'm hunting now I'm gonna see if I can reproduce it with a small
> enough filesystem that I can just dd over the entire device between replays.

Ok that works fine in the short term for you, but ideally you wouldn't have to fix the test every time you wanted to run it on bcachefs. Do you have a suggestion of how to make this generic enough to support you as well without dd'ing the whole drive? Thanks,

Josef
* Re: Simulating disk failure with a writeback cache
From: Kent Overstreet @ 2016-12-19 21:53 UTC
To: Josef Bacik; +Cc: linux-fsdevel, david, hch

On Mon, Dec 19, 2016 at 04:00:40PM -0500, Josef Bacik wrote:
> Ok that works fine in the short term for you, but ideally you wouldn't have
> to fix the test every time you wanted to run it on bcachefs. Do you have a
> suggestion of how to make this generic enough to support you as well without
> dd'ing the whole drive? Thanks,

Yes, but it'd be a lot of work :)

Change log-writes so that instead of replaying the log, you have it create a block device representing that specific point in the log - and then service read requests by looking them up however you like. So, it'd be just like mounting snapshots.

It might not be too difficult since we can deal with crappy performance or memory usage here, but I don't know if it's really worth the effort.
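One way to picture the proposal above is a read-only view that answers reads directly from the log instead of copying anything onto a device. The sketch below is only a userspace illustration of that lookup, with hypothetical names (`LogSnapshot`, `read`); nothing like it exists in dm-log-writes as far as this thread says. Each view indexes the last write to every block within its log prefix, so blocks written later in the log can never leak into an earlier view.

```python
class LogSnapshot:
    """Read-only view of the device as of log[:end] (hypothetical sketch)."""

    def __init__(self, log, end, zero=b"\0"):
        self.log = log
        self.zero = zero
        # Map each block to the position of its last write within the prefix.
        self.index = {}
        for pos, (block, _data) in enumerate(log[:end]):
            self.index[block] = pos

    def read(self, block):
        pos = self.index.get(block)
        # Blocks never written before this point read back as zeroes.
        return self.zero if pos is None else self.log[pos][1]


log = [(0, b"a"), (1, b"b"), (0, b"c")]
first = LogSnapshot(log, 1)   # state after the first write only
last = LogSnapshot(log, 3)    # state after all three writes
```

Several views over the same log can coexist, which is what makes this resemble mounting snapshots; the cost is an index per view, which fits the remark that poor performance or memory usage would be tolerable in a test.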
* Re: Simulating disk failure with a writeback cache
From: Josef Bacik @ 2016-12-20 1:01 UTC
To: Kent Overstreet; +Cc: linux-fsdevel, david, hch

On Dec 19, 2016, at 4:53 PM, Kent Overstreet <kent.overstreet@gmail.com> wrote:
> Change log-writes so that instead of replaying the log, you have it create a
> block device representing that specific point in the log - and then service read
> requests by looking them up however you like. So, it'd be just like mounting
> snapshots.
>
> It might not be too difficult since we can deal with crappy performance or
> memory usage here, but I don't know if it's really worth the effort.

One thing I was definitely going to do is use the dm thin provisioning so I could snapshot between each check so we don't have to replay from the beginning every time we want to check a different mark. Replaying the whole thing is fine for small runs like the fsx thing, but when you want to say run fsstress for a minute and replay to every fua and fsck and mount it gets tedious fast. Thanks,

Josef
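The saving from snapshotting between checks can be shown with a small model. This is a sketch under assumptions, not the dm-thin setup itself: `check_marks_incrementally` is a hypothetical helper, and copying a Python list stands in for taking a thin snapshot. The base image is only ever advanced forward through the log, and each check runs against a copy so that anything the check modifies (journal replay at mount, for example) never touches the base.

```python
def check_marks_incrementally(log, marks, nr_blocks, check):
    """Check every mark while applying each log entry to the base image once.

    marks maps a mark name to the log position it refers to.
    Returns (results per mark, number of log entries applied).
    """
    image = [b""] * nr_blocks
    pos = 0
    applied = 0
    results = {}
    for name, end in sorted(marks.items(), key=lambda kv: kv[1]):
        # Advance the base image from the previous mark to this one.
        for block, data in log[pos:end]:
            image[block] = data
            applied += 1
        pos = end
        snapshot = list(image)   # stands in for a dm-thin snapshot
        results[name] = check(snapshot)
    return results, applied


log = [(0, b"x"), (1, b"y"), (0, b"z"), (2, b"w")]
marks = {"a": 1, "b": 3, "c": 4}


def count_written(image):
    # Toy check: how many blocks hold data at this mark.
    return sum(1 for b in image if b)


results, applied = check_marks_incrementally(log, marks, 3, count_written)
```

Here four log entries are applied in total, whereas replaying from the beginning for each mark would apply 1 + 3 + 4 = 8; the gap grows quadratically with the number of marks, which is why full replays get tedious on long fsstress runs.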
* Re: Simulating disk failure with a writeback cache
From: Kent Overstreet @ 2016-12-20 1:30 UTC
To: Josef Bacik; +Cc: linux-fsdevel, david, hch

On Tue, Dec 20, 2016 at 01:01:29AM +0000, Josef Bacik wrote:
> One thing I was definitely going to do is use the dm thin provisioning so I
> could snapshot between each check so we don't have to replay from the
> beginning every time we want to check a different mark. Replaying the whole
> thing is fine for small runs like the fsx thing, but when you want to say run
> fsstress for a minute and replay to every fua and fsck and mount it gets
> tedious fast. Thanks,

I actually just converted my new test to do that: https://evilpiepirate.org/git/ktest.git/commit/

Now, most of the time in the test is spent in mount/unmount (bcachefs doesn't have a standalone fsck yet, we check everything fsck should but it's done at mount time).