* [PATCH 0/4] fix up generic dmlogwrites tests to work with XFS @ 2020-08-26 14:38 Brian Foster 2020-08-26 14:38 ` [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS Brian Foster ` (3 more replies) 0 siblings, 4 replies; 19+ messages in thread From: Brian Foster @ 2020-08-26 14:38 UTC (permalink / raw) To: fstests; +Cc: linux-xfs Hi all, We've still had lingering false positive failure reports on some of these generic dmlogwrites tests on XFS due to metadata ordering issues. The logwrites mechanism relies on discard to provide zeroing behavior to avoid this, but if that is not available, this can result in subtle failures that take time to diagnose. This series updates the remaining generic dmlogwrites tests to use the same scheme we used in generic/482 to address this problem, which is to explicitly use a thin volume for predictable discard support. It also adds a discard zeroing behavior check as a backstop against future tests. The thought crossed my mind of pushing much of this code down into the common dmlogwrites code to reduce duplication, but I didn't want to get too deep into the weeds of reworking the common code to address this problem in a handful of tests. Thoughts, reviews, flames appreciated. Brian Brian Foster (4): generic: require discard zero behavior for dmlogwrites on XFS generic/455: use thin volume for dmlogwrites target device generic/457: use thin volume for dmlogwrites target device generic/470: use thin volume for dmlogwrites target device common/dmlogwrites | 10 ++++++++-- common/rc | 14 ++++++++++++++ tests/generic/455 | 36 ++++++++++++++++++++++-------------- tests/generic/457 | 33 +++++++++++++++++++++------------ tests/generic/470 | 24 ++++++++++++++++++------ 5 files changed, 83 insertions(+), 34 deletions(-) -- 2.25.4 ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS 2020-08-26 14:38 [PATCH 0/4] fix up generic dmlogwrites tests to work with XFS Brian Foster @ 2020-08-26 14:38 ` Brian Foster 2020-08-27 6:58 ` Amir Goldstein 2020-08-27 7:38 ` Christoph Hellwig 2020-08-26 14:38 ` [PATCH 2/4] generic/455: use thin volume for dmlogwrites target device Brian Foster ` (2 subsequent siblings) 3 siblings, 2 replies; 19+ messages in thread From: Brian Foster @ 2020-08-26 14:38 UTC (permalink / raw) To: fstests; +Cc: linux-xfs Several generic fstests use dm-log-writes to test the filesystem for consistency at various crash recovery points. dm-log-writes and the associated replay mechanism rely on zeroing via discard to clear stale blocks when moving to various points in time of the fs. If the storage doesn't provide zeroing or the discard requests exceed the hardcoded maximum (128MB) of the fallback solution to physically write zeroes, stale blocks are left around in the target fs. This scheme is known to cause issues on XFS v5 superblocks if recovery observes metadata from a future variant of an fs that has been replayed to an older point in time. This corrupts the filesystem and leads to false test failures. generic/482 already works around this problem by using a thin volume as the target device, which provides consistent and efficient discard zeroing behavior, but other tests have seen similar issues on XFS. Add an XFS specific check to the dmlogwrites init time code that requires discard zeroing support and otherwise skips the test to avoid false positive failures. Signed-off-by: Brian Foster <bfoster@redhat.com> --- common/dmlogwrites | 10 ++++++++-- common/rc | 14 ++++++++++++++ tests/generic/470 | 2 +- 3 files changed, 23 insertions(+), 3 deletions(-) diff --git a/common/dmlogwrites b/common/dmlogwrites index 573f4b8a..92cc6ce2 100644 --- a/common/dmlogwrites +++ b/common/dmlogwrites @@ -43,9 +43,10 @@ _require_log_writes_dax_mountopt() _require_test_program "log-writes/replay-log" local ret=0 - local mountopt=$1 + local dev=$1 + local mountopt=$2 - _log_writes_init $SCRATCH_DEV + _log_writes_init $dev _log_writes_mkfs > /dev/null 2>&1 _log_writes_mount "-o $mountopt" > /dev/null 2>&1 # Check options to be sure. @@ -66,6 +67,11 @@ _log_writes_init() [ -z "$blkdev" ] && _fail \ "block dev must be specified for _log_writes_init" + # XFS requires discard zeroing support on the target device to work + # reliably with dm-log-writes. Use dm-thin devices in tests that want + # to provide reliable discard zeroing support. + [ $FSTYP == "xfs" ] && _require_discard_zeroes $blkdev + local BLK_DEV_SIZE=`blockdev --getsz $blkdev` LOGWRITES_NAME=logwrites-test LOGWRITES_DMDEV=/dev/mapper/$LOGWRITES_NAME diff --git a/common/rc b/common/rc index aa5a7409..fedb5221 100644 --- a/common/rc +++ b/common/rc @@ -4313,6 +4313,20 @@ _require_mknod() rm -f $TEST_DIR/$seq.null } +# check that discard is supported and subsequent reads return zeroes +_require_discard_zeroes() +{ + local dev=$1 + + _require_command "$BLKDISCARD_PROG" blkdiscard + + $XFS_IO_PROG -c "pwrite -S 0xcd 0 4k" $dev > /dev/null 2>&1 || + _fail "write error" + $BLKDISCARD_PROG -o 0 -l 1m $dev || _notrun "no discard support" + hexdump -n 4096 $dev | head -n 1 | grep cdcd && + _notrun "no discard zeroing support" +} + init_rc ################################################################################ diff --git a/tests/generic/470 b/tests/generic/470 index fd6da563..707b6237 100755 --- a/tests/generic/470 +++ b/tests/generic/470 @@ -35,7 +35,7 @@ rm -f $seqres.full _supported_fs generic _supported_os Linux _require_scratch -_require_log_writes_dax_mountopt "dax" +_require_log_writes_dax_mountopt $SCRATCH_DEV "dax" _require_xfs_io_command "mmap" "-S" _require_xfs_io_command "log_writes" -- 2.25.4 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS 2020-08-26 14:38 ` [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS Brian Foster @ 2020-08-27 6:58 ` Amir Goldstein 2020-08-27 7:02 ` Christoph Hellwig 2020-08-27 7:38 ` Christoph Hellwig 1 sibling, 1 reply; 19+ messages in thread From: Amir Goldstein @ 2020-08-27 6:58 UTC (permalink / raw) To: Brian Foster; +Cc: fstests, linux-xfs, Christoph Hellwig On Wed, Aug 26, 2020 at 5:38 PM Brian Foster <bfoster@redhat.com> wrote: > > Several generic fstests use dm-log-writes to test the filesystem for > consistency at various crash recovery points. dm-log-writes and the > associated replay mechanism rely on zeroing via discard to clear > stale blocks when moving to various points in time of the fs. If the > storage doesn't provide zeroing or the discard requests exceed the > hardcoded maximum (128MB) of the fallback solution to physically > write zeroes, stale blocks are left around in the target fs. This > scheme is known to cause issues on XFS v5 superblocks if recovery > observes metadata from a future variant of an fs that has been > replayed to an older point in time. This corrupts the filesystem and > leads to false test failures. > > generic/482 already works around this problem by using a thin volume > as the target device, which provides consistent and efficient > discard zeroing behavior, but other tests have seen similar issues > on XFS. Add an XFS specific check to the dmlogwrites init time code > that requires discard zeroing support and otherwise skips the test > to avoid false positive failures. > > Signed-off-by: Brian Foster <bfoster@redhat.com> > --- > common/dmlogwrites | 10 ++++++++-- > common/rc | 14 ++++++++++++++ > tests/generic/470 | 2 +- > 3 files changed, 23 insertions(+), 3 deletions(-) > > diff --git a/common/dmlogwrites b/common/dmlogwrites > index 573f4b8a..92cc6ce2 100644 > --- a/common/dmlogwrites > +++ b/common/dmlogwrites > @@ -43,9 +43,10 @@ _require_log_writes_dax_mountopt() > _require_test_program "log-writes/replay-log" > > local ret=0 > - local mountopt=$1 > + local dev=$1 > + local mountopt=$2 > > - _log_writes_init $SCRATCH_DEV > + _log_writes_init $dev > _log_writes_mkfs > /dev/null 2>&1 > _log_writes_mount "-o $mountopt" > /dev/null 2>&1 > # Check options to be sure. > @@ -66,6 +67,11 @@ _log_writes_init() > [ -z "$blkdev" ] && _fail \ > "block dev must be specified for _log_writes_init" > > + # XFS requires discard zeroing support on the target device to work > + # reliably with dm-log-writes. Use dm-thin devices in tests that want > + # to provide reliable discard zeroing support. > + [ $FSTYP == "xfs" ] && _require_discard_zeroes $blkdev > + I imagine that ext4 could also be burned by this. Do we have a reason to limit this requirement to xfs? I prefer to make it generic. > local BLK_DEV_SIZE=`blockdev --getsz $blkdev` > LOGWRITES_NAME=logwrites-test > LOGWRITES_DMDEV=/dev/mapper/$LOGWRITES_NAME > diff --git a/common/rc b/common/rc > index aa5a7409..fedb5221 100644 > --- a/common/rc > +++ b/common/rc > @@ -4313,6 +4313,20 @@ _require_mknod() > rm -f $TEST_DIR/$seq.null > } > > +# check that discard is supported and subsequent reads return zeroes > +_require_discard_zeroes() > +{ > + local dev=$1 > + > + _require_command "$BLKDISCARD_PROG" blkdiscard > + > + $XFS_IO_PROG -c "pwrite -S 0xcd 0 4k" $dev > /dev/null 2>&1 || > + _fail "write error" > + $BLKDISCARD_PROG -o 0 -l 1m $dev || _notrun "no discard support" > + hexdump -n 4096 $dev | head -n 1 | grep cdcd && > + _notrun "no discard zeroing support" > +} > + I am fine with your solution, but if there was a discussion on the best way to solve the problem, I missed it, so would like to hear what Chritoph has to say. Thanks, Amir. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS 2020-08-27 6:58 ` Amir Goldstein @ 2020-08-27 7:02 ` Christoph Hellwig 2020-08-27 7:29 ` Amir Goldstein 0 siblings, 1 reply; 19+ messages in thread From: Christoph Hellwig @ 2020-08-27 7:02 UTC (permalink / raw) To: Amir Goldstein; +Cc: Brian Foster, fstests, linux-xfs, Christoph Hellwig On Thu, Aug 27, 2020 at 09:58:09AM +0300, Amir Goldstein wrote: > I imagine that ext4 could also be burned by this. > Do we have a reason to limit this requirement to xfs? > I prefer to make it generic. The whole idea that discard zeroes data is just completely broken. Discard is a hint that is explicitly allowed to do nothing. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS 2020-08-27 7:02 ` Christoph Hellwig @ 2020-08-27 7:29 ` Amir Goldstein 2020-08-27 7:37 ` Christoph Hellwig 2020-08-27 14:11 ` Brian Foster 0 siblings, 2 replies; 19+ messages in thread From: Amir Goldstein @ 2020-08-27 7:29 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Brian Foster, fstests, linux-xfs, Josef Bacik On Thu, Aug 27, 2020 at 10:02 AM Christoph Hellwig <hch@infradead.org> wrote: > > On Thu, Aug 27, 2020 at 09:58:09AM +0300, Amir Goldstein wrote: > > I imagine that ext4 could also be burned by this. > > Do we have a reason to limit this requirement to xfs? > > I prefer to make it generic. > > The whole idea that discard zeroes data is just completely broken. > Discard is a hint that is explicitly allowed to do nothing. I figured you'd say something like that :) but since we are talking about dm-thin as a solution for predictable behavior at the moment and this sanity check helps avoiding adding new tests that can fail to some extent, is the proposed bandaid good enough to keep those tests alive until a better solution is proposed? Another concern that I have is that perhaps adding dm-thin to the fsx tests would change timing of io in a subtle way and hide the original bugs that they caught: 47c7d0b19502 ("xfs: fix incorrect log_flushed on fsync") 3af423b03435 ("xfs: evict CoW fork extents when performing finsert/fcollapse") 51e3ae81ec58 ("ext4: fix interaction between i_size, fallocate, and delalloc after a crash") Because at the time I (re)wrote Josef's fsx tests, they flushed out the bugs on spinning rust much more frequently than on SSD. So Brian, if you could verify that the fsx tests still catch the original bugs by reverting the fixes or running with an old kernel and probably running on a slow disk that would be great. I know it is not a simple request, but I just don't know how else to trust these changes. I guess a quick way for you to avoid the hassle is to add _require_discard_zeroes (patch #1) and drop the rest. Thanks, Amir. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS 2020-08-27 7:29 ` Amir Goldstein @ 2020-08-27 7:37 ` Christoph Hellwig 2020-08-27 15:57 ` Josef Bacik 2020-08-27 14:11 ` Brian Foster 1 sibling, 1 reply; 19+ messages in thread From: Christoph Hellwig @ 2020-08-27 7:37 UTC (permalink / raw) To: Amir Goldstein Cc: Christoph Hellwig, Brian Foster, fstests, linux-xfs, Josef Bacik On Thu, Aug 27, 2020 at 10:29:05AM +0300, Amir Goldstein wrote: > I figured you'd say something like that :) > but since we are talking about dm-thin as a solution for predictable > behavior at the moment and this sanity check helps avoiding adding > new tests that can fail to some extent, is the proposed bandaid good enough > to keep those tests alive until a better solution is proposed? Well, the problem is that a test that wants to reliable nuke data needs to... *drumroll* reliably nuke data. Which means zeroing or at least a known pattern. discard doesn't give you that. I don't see how a plain discard is going to work for any file system for that particular case. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS 2020-08-27 7:37 ` Christoph Hellwig @ 2020-08-27 15:57 ` Josef Bacik 2020-08-27 17:02 ` Christoph Hellwig 2020-08-29 6:44 ` Christoph Hellwig 0 siblings, 2 replies; 19+ messages in thread From: Josef Bacik @ 2020-08-27 15:57 UTC (permalink / raw) To: Christoph Hellwig, Amir Goldstein; +Cc: Brian Foster, fstests, linux-xfs On 8/27/20 3:37 AM, Christoph Hellwig wrote: > On Thu, Aug 27, 2020 at 10:29:05AM +0300, Amir Goldstein wrote: >> I figured you'd say something like that :) >> but since we are talking about dm-thin as a solution for predictable >> behavior at the moment and this sanity check helps avoiding adding >> new tests that can fail to some extent, is the proposed bandaid good enough >> to keep those tests alive until a better solution is proposed? > > Well, the problem is that a test that wants to reliable nuke data needs > to... *drumroll* reliably nuke data. Which means zeroing or at least > a known pattern. discard doesn't give you that. > > I don't see how a plain discard is going to work for any file system > for that particular case. > This sort of brings up a good point, the whole point of DISCARD support in log-writes was to expose problems where we may have been discarding real data we cared about, hence adding the forced zero'ing stuff for devices that didn't support discard. But that made the incorrect assumption that a drive with actual discard support would actually return 0's for discarded data. That assumption was based on hardware that did actually do that, but now we live in the brave new world of significantly shittier drives. Does dm-thinp reliably unmap the ranges we discard, and thus give us this zero'ing behavior? Because we might as well just use that for everything so log-writes doesn't have to resort to pwrite()'ing zeros everywhere. Thanks, Josef ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS 2020-08-27 15:57 ` Josef Bacik @ 2020-08-27 17:02 ` Christoph Hellwig 2020-08-27 18:35 ` Brian Foster 2020-08-29 6:44 ` Christoph Hellwig 1 sibling, 1 reply; 19+ messages in thread From: Christoph Hellwig @ 2020-08-27 17:02 UTC (permalink / raw) To: Josef Bacik Cc: Christoph Hellwig, Amir Goldstein, Brian Foster, fstests, linux-xfs On Thu, Aug 27, 2020 at 11:57:03AM -0400, Josef Bacik wrote: > This sort of brings up a good point, the whole point of DISCARD support in > log-writes was to expose problems where we may have been discarding real > data we cared about, hence adding the forced zero'ing stuff for devices that > didn't support discard. But that made the incorrect assumption that a drive > with actual discard support would actually return 0's for discarded data. > That assumption was based on hardware that did actually do that, but now we > live in the brave new world of significantly shittier drives. Does dm-thinp > reliably unmap the ranges we discard, and thus give us this zero'ing > behavior? Because we might as well just use that for everything so > log-writes doesn't have to resort to pwrite()'ing zeros everywhere. Thanks, We have a write zeroes operation in the block layer. For some devices this is as efficient as discard, and that should (I think) dm. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS 2020-08-27 17:02 ` Christoph Hellwig @ 2020-08-27 18:35 ` Brian Foster 2020-08-29 6:46 ` Christoph Hellwig 0 siblings, 1 reply; 19+ messages in thread From: Brian Foster @ 2020-08-27 18:35 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Josef Bacik, Amir Goldstein, fstests, linux-xfs On Thu, Aug 27, 2020 at 06:02:42PM +0100, Christoph Hellwig wrote: > On Thu, Aug 27, 2020 at 11:57:03AM -0400, Josef Bacik wrote: > > This sort of brings up a good point, the whole point of DISCARD support in > > log-writes was to expose problems where we may have been discarding real > > data we cared about, hence adding the forced zero'ing stuff for devices that > > didn't support discard. But that made the incorrect assumption that a drive > > with actual discard support would actually return 0's for discarded data. > > That assumption was based on hardware that did actually do that, but now we > > live in the brave new world of significantly shittier drives. Does dm-thinp > > reliably unmap the ranges we discard, and thus give us this zero'ing > > behavior? Because we might as well just use that for everything so > > log-writes doesn't have to resort to pwrite()'ing zeros everywhere. Thanks, > That's pretty much what this series does. It only modifies the generic tests because I didn't want to mess with the others (all btrfs, I think) that might not have any issues, but I wouldn't be opposed to burying the logic into the dmlogwrites bits so it just always creates a thin volume behind the scenes. If we were going to do that, I'd prefer to do it as a follow up to these patches (dropping patch 1, most likely) so at least they can remain enabled on XFS for the time being. OTOH, perhaps the thinp behavior could be internal, but conditional based on XFS. It's not really clear to me if this problem is more of an XFS phenomenon or just that XFS happens to have some unique recovery checking logic that explicitly detects it. It seems more like the latter, but I don't know enough about ext4 or btrfs to say.. > We have a write zeroes operation in the block layer. For some devices > this is as efficient as discard, and that should (I think) dm. > Do you mean BLKZEROOUT? I see that is more efficient than writing zeroes from userspace, but I don't think it's efficient enough to solve this problem. It takes about 3m to manually zero my 15GB lvm (dm-linear) scratch device on my test vm via dd using sync writes. A 'blkdiscard -z' saves me about half that time, but IIRC this is an operation that would occur every time the logwrites device is replayed to a particular recovery point (which can happen many times per test). Brian ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS 2020-08-27 18:35 ` Brian Foster @ 2020-08-29 6:46 ` Christoph Hellwig 2020-08-30 13:30 ` Amir Goldstein 2020-08-31 13:37 ` Brian Foster 0 siblings, 2 replies; 19+ messages in thread From: Christoph Hellwig @ 2020-08-29 6:46 UTC (permalink / raw) To: Brian Foster Cc: Christoph Hellwig, Josef Bacik, Amir Goldstein, fstests, linux-xfs On Thu, Aug 27, 2020 at 02:35:07PM -0400, Brian Foster wrote: > OTOH, perhaps the thinp behavior could be internal, but conditional > based on XFS. It's not really clear to me if this problem is more of an > XFS phenomenon or just that XFS happens to have some unique recovery > checking logic that explicitly detects it. It seems more like the > latter, but I don't know enough about ext4 or btrfs to say.. The way I understand the tests (and Josefs mail seems to confirm that) is that it uses discards to ensure data disappears. Unfortunately that's only how discard sometimes work, but not all the time. > > We have a write zeroes operation in the block layer. For some devices > > this is as efficient as discard, and that should (I think) dm. > > > > Do you mean BLKZEROOUT? I see that is more efficient than writing zeroes > from userspace, but I don't think it's efficient enough to solve this > problem. It takes about 3m to manually zero my 15GB lvm (dm-linear) > scratch device on my test vm via dd using sync writes. A 'blkdiscard -z' > saves me about half that time, but IIRC this is an operation that would > occur every time the logwrites device is replayed to a particular > recovery point (which can happen many times per test). Are we talking about the block layer interface or the userspace syscall one? I though it was the former, in which case REQ_OP_WRITE_ZEROES is the interface. User interface is harder - you need to use fallocate on the block device, but the flags are mapped kinda weird: FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE guarantees you a REQ_OP_WRITE_ZEROES, but there is a bunch of other variants that include fallbacks. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS 2020-08-29 6:46 ` Christoph Hellwig @ 2020-08-30 13:30 ` Amir Goldstein 2020-08-31 13:37 ` Brian Foster 1 sibling, 0 replies; 19+ messages in thread From: Amir Goldstein @ 2020-08-30 13:30 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Brian Foster, Josef Bacik, fstests, linux-xfs On Sat, Aug 29, 2020 at 9:47 AM Christoph Hellwig <hch@infradead.org> wrote: > > On Thu, Aug 27, 2020 at 02:35:07PM -0400, Brian Foster wrote: > > OTOH, perhaps the thinp behavior could be internal, but conditional > > based on XFS. It's not really clear to me if this problem is more of an > > XFS phenomenon or just that XFS happens to have some unique recovery > > checking logic that explicitly detects it. It seems more like the > > latter, but I don't know enough about ext4 or btrfs to say.. > > The way I understand the tests (and Josefs mail seems to confirm that) > is that it uses discards to ensure data disappears. Unfortunately > that's only how discard sometimes work, but not all the time. > I think we are confusing two slightly different uses of discard in those tests. One use case is that dm-logwrites records discards in test workloads and then needs to replay them to simulate the sequence of IO event up to a crash point. I think that use case is less interesting, because as Christoph points out, discard is not reliable, so I think we should get rid of " -o discard" in the tests - it did not catch any issues that I know of. But there is another discard in those tests issued by _log_writes_mkfs (at least it does for xfs and ext4). This discard has the by product of making sure that replay from the start to point in time, first wipes all stale data from the replay block device. Of course the problems we encounter is that it does not wipe all stale data when not running on dm-thinp, which is why I suggested to always use dm-thinp for replay device, but I can live perfectly well with Brian's v1 patches where both workload and replay are done on dm-thinp. Josef had two variants for those tests, one doing "replay from start" for every checkpoint and utilizing this discard-as-wipe behavior and one flavor that used dm-thinp to take snapshots and replay from snapshot T to the next mark. I remember someone tried converting some of the tests to the snapshot flavor, but it turned out to be slower, so we left it as is (always replay from the start). In conclusion, I *think* the correct fix for the failing tests is: 1. Use dm-thinp for all those tests (as v1 does) 2. In _log_writes_replay_log{,_range}() start by explicitly wiping dm-thinp, either with with hole punch command or by re-creating the new thinp volume, whichever is faster. instead of relying on the replay of discard operation recorded from mkfs that sort of kind of worked by mistake. Thanks, Amir. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS 2020-08-29 6:46 ` Christoph Hellwig 2020-08-30 13:30 ` Amir Goldstein @ 2020-08-31 13:37 ` Brian Foster 1 sibling, 0 replies; 19+ messages in thread From: Brian Foster @ 2020-08-31 13:37 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Josef Bacik, Amir Goldstein, fstests, linux-xfs On Sat, Aug 29, 2020 at 07:46:59AM +0100, Christoph Hellwig wrote: > On Thu, Aug 27, 2020 at 02:35:07PM -0400, Brian Foster wrote: > > OTOH, perhaps the thinp behavior could be internal, but conditional > > based on XFS. It's not really clear to me if this problem is more of an > > XFS phenomenon or just that XFS happens to have some unique recovery > > checking logic that explicitly detects it. It seems more like the > > latter, but I don't know enough about ext4 or btrfs to say.. > > The way I understand the tests (and Josefs mail seems to confirm that) > is that it uses discards to ensure data disappears. Unfortunately > that's only how discard sometimes work, but not all the time. > I think Amir's followup describes how the infrastructure uses discard better than I could. I'm not intimately familiar with how it works, so my goal was to take the same approach as generic/482 and update the tests to provide the predictable behavior expected by the infrastructure. If folks want to revisit all of that to improve the tests to not rely on discard and break that dependency, that seems like a fine direction, but it also seems that can come later as improvements to the broader infrastructure. > > > We have a write zeroes operation in the block layer. For some devices > > > this is as efficient as discard, and that should (I think) dm. > > > > > > > Do you mean BLKZEROOUT? I see that is more efficient than writing zeroes > > from userspace, but I don't think it's efficient enough to solve this > > problem. It takes about 3m to manually zero my 15GB lvm (dm-linear) > > scratch device on my test vm via dd using sync writes. A 'blkdiscard -z' > > saves me about half that time, but IIRC this is an operation that would > > occur every time the logwrites device is replayed to a particular > > recovery point (which can happen many times per test). > > Are we talking about the block layer interface or the userspace syscall > one? I though it was the former, in which case REQ_OP_WRITE_ZEROES > is the interface. User interface is harder - you need to use fallocate > on the block device, but the flags are mapped kinda weird: > > FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE guarantees you a > REQ_OP_WRITE_ZEROES, but there is a bunch of other variants that include > fallbacks. > I was using the BLKZEROOUT ioctl in my previous test because fallocate(PUNCH_HOLE|KEEP_SIZE) (zeroing offload) isn't supported on this device. I see similar results as above with fallocate(PUNCH_HOLE|KEEP_SIZE) though, which seems to fall back to __blkdev_issue_zero_pages() in that case. Brian ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS 2020-08-27 15:57 ` Josef Bacik 2020-08-27 17:02 ` Christoph Hellwig @ 2020-08-29 6:44 ` Christoph Hellwig 1 sibling, 0 replies; 19+ messages in thread From: Christoph Hellwig @ 2020-08-29 6:44 UTC (permalink / raw) To: Josef Bacik Cc: Christoph Hellwig, Amir Goldstein, Brian Foster, fstests, linux-xfs On Thu, Aug 27, 2020 at 11:57:03AM -0400, Josef Bacik wrote: > > > > This sort of brings up a good point, the whole point of DISCARD support in > log-writes was to expose problems where we may have been discarding real > data we cared about, hence adding the forced zero'ing stuff for devices that > didn't support discard. But that made the incorrect assumption that a drive > with actual discard support would actually return 0's for discarded data. > That assumption was based on hardware that did actually do that, but now we > live in the brave new world of significantly shittier drives. Does dm-thinp > reliably unmap the ranges we discard, and thus give us this zero'ing > behavior? Because we might as well just use that for everything so > log-writes doesn't have to resort to pwrite()'ing zeros everywhere. Thanks, So the right fix is to replace issuing REQ_OP_DISCARD bios with REQ_OP_WRITE_ZEROES (plus the check for that). That'll give you the effective zeroing part if the device supports it, and you can still fall back to manual zeroing (and blk-lib.c actually has helpers for that as well). ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS 2020-08-27 7:29 ` Amir Goldstein 2020-08-27 7:37 ` Christoph Hellwig @ 2020-08-27 14:11 ` Brian Foster [not found] ` <CAOQ4uxj6RKX01kKKc_SGZJegWEKaF+D8ZNJGALvh4o0c5bBcBg@mail.gmail.com> 1 sibling, 1 reply; 19+ messages in thread From: Brian Foster @ 2020-08-27 14:11 UTC (permalink / raw) To: Amir Goldstein; +Cc: Christoph Hellwig, fstests, linux-xfs, Josef Bacik On Thu, Aug 27, 2020 at 10:29:05AM +0300, Amir Goldstein wrote: > On Thu, Aug 27, 2020 at 10:02 AM Christoph Hellwig <hch@infradead.org> wrote: > > > > On Thu, Aug 27, 2020 at 09:58:09AM +0300, Amir Goldstein wrote: > > > I imagine that ext4 could also be burned by this. > > > Do we have a reason to limit this requirement to xfs? > > > I prefer to make it generic. > > > > The whole idea that discard zeroes data is just completely broken. > > Discard is a hint that is explicitly allowed to do nothing. > > I figured you'd say something like that :) > but since we are talking about dm-thin as a solution for predictable > behavior at the moment and this sanity check helps avoiding adding > new tests that can fail to some extent, is the proposed bandaid good enough > to keep those tests alive until a better solution is proposed? > > Another concern that I have is that perhaps adding dm-thin to the fsx tests > would change timing of io in a subtle way and hide the original bugs that they > caught: > > 47c7d0b19502 ("xfs: fix incorrect log_flushed on fsync") > 3af423b03435 ("xfs: evict CoW fork extents when performing finsert/fcollapse") > 51e3ae81ec58 ("ext4: fix interaction between i_size, fallocate, and > delalloc after a crash") > > Because at the time I (re)wrote Josef's fsx tests, they flushed out the bugs > on spinning rust much more frequently than on SSD. > > So Brian, if you could verify that the fsx tests still catch the original bugs > by reverting the fixes or running with an old kernel and probably running > on a slow disk that would be great. > I made these particular changes because it's how we previously addressed generic/482 and they were a pretty clear and quick way to address the problem. I'm pretty sure I've seen these tests reproduce legitimate issues with or without thin, but that's from memory so I can't confirm that with certainty or suggest that applies for every regression these have discovered in the past. If we want to go down that path, I'd say lets just assume those no longer reproduce. That doesn't make these tests any less broken on XFS (v5, at least) today, so I guess I'd drop the thin changes (note again that generic/482 is already using dmthinp) and just disable these tests on XFS until we can come up with an acceptable solution to make them more reliable. That's slightly unfortunate because I think the tests are quite effective, but we're continuing to see spurious failures that are not trivial to diagnose. IIRC, I did at one point experiment with removing the (128M?) physical zeroing limit from the associated logwrites tool, but that somewhat predictably caused the test to take an extremely long time. I'm not sure I even allowed it to finish, tbh. Anyways, perhaps some options are to allow a larger physical zeroing limit and remove the tests from the auto group, use smaller target devices to reduce the amount of zeroing required, require the user to have a thinp or some such volume as a scratch dev already or otherwise find some other more efficient zeroing mechanism. > I know it is not a simple request, but I just don't know how else to trust > these changes. I guess a quick way for you to avoid the hassle is to add > _require_discard_zeroes (patch #1) and drop the rest. > That was going to be my fallback suggestion if the changes to the tests were problematic for whatever reason. Christoph points out that the discard zeroing logic is not predictable in general as it might be on dm-thinp, so I think for now we should probably just notrun these on XFS. I'll send something like that as a v2 of patch 1. Brian > Thanks, > Amir. > ^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <CAOQ4uxj6RKX01kKKc_SGZJegWEKaF+D8ZNJGALvh4o0c5bBcBg@mail.gmail.com>]
* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS [not found] ` <CAOQ4uxj6RKX01kKKc_SGZJegWEKaF+D8ZNJGALvh4o0c5bBcBg@mail.gmail.com> @ 2020-08-28 14:10 ` Brian Foster 0 siblings, 0 replies; 19+ messages in thread From: Brian Foster @ 2020-08-28 14:10 UTC (permalink / raw) To: Amir Goldstein; +Cc: Christoph Hellwig, Josef Bacik, fstests, linux-xfs Re-add lists to CC. On Fri, Aug 28, 2020 at 06:47:06AM +0300, Amir Goldstein wrote: > On Thu, Aug 27, 2020, 5:11 PM Brian Foster <bfoster@redhat.com> wrote: > > > On Thu, Aug 27, 2020 at 10:29:05AM +0300, Amir Goldstein wrote: > > > On Thu, Aug 27, 2020 at 10:02 AM Christoph Hellwig <hch@infradead.org> > > wrote: > > > > > > > > On Thu, Aug 27, 2020 at 09:58:09AM +0300, Amir Goldstein wrote: > > > > > I imagine that ext4 could also be burned by this. > > > > > Do we have a reason to limit this requirement to xfs? > > > > > I prefer to make it generic. > > > > > > > > The whole idea that discard zeroes data is just completely broken. > > > > Discard is a hint that is explicitly allowed to do nothing. > > > > > > I figured you'd say something like that :) > > > but since we are talking about dm-thin as a solution for predictable > > > behavior at the moment and this sanity check helps avoiding adding > > > new tests that can fail to some extent, is the proposed bandaid good > > enough > > > to keep those tests alive until a better solution is proposed? > > > > > > Another concern that I have is that perhaps adding dm-thin to the fsx > > tests > > > would change timing of io in a subtle way and hide the original bugs > > that they > > > caught: > > > > > > 47c7d0b19502 ("xfs: fix incorrect log_flushed on fsync") > > > 3af423b03435 ("xfs: evict CoW fork extents when performing > > finsert/fcollapse") > > > 51e3ae81ec58 ("ext4: fix interaction between i_size, fallocate, and > > > delalloc after a crash") > > > > > > Because at the time I (re)wrote Josef's fsx tests, they flushed out the > > bugs > > > on spinning rust much more frequently than on SSD. > > > > > > So Brian, if you could verify that the fsx tests still catch the > > original bugs > > > by reverting the fixes or running with an old kernel and probably running > > > on a slow disk that would be great. > > > > > > > I made these particular changes because it's how we previously addressed > > generic/482 and they were a pretty clear and quick way to address the > > problem. I'm pretty sure I've seen these tests reproduce legitimate > > issues with or without thin, but that's from memory so I can't confirm > > that with certainty or suggest that applies for every regression these > > have discovered in the past. > > > > If we want to go down that path, I'd say lets just assume those no > > longer reproduce. That doesn't make these tests any less broken on XFS > > (v5, at least) today, so I guess I'd drop the thin changes (note again > > that generic/482 is already using dmthinp) and just disable these tests > > on XFS until we can come up with an acceptable solution to make them > > more reliable. That's slightly unfortunate because I think the tests are > > quite effective, but we're continuing to see spurious failures that are > > not trivial to diagnose. > > > > > IIRC, I did at one point experiment with removing the (128M?) physical > > zeroing limit from the associated logwrites tool, but that somewhat > > predictably caused the test to take an extremely long time. I'm not sure > > I even allowed it to finish, tbh. Anyways, perhaps some options are to > > allow a larger physical zeroing limit and remove the tests from the auto > > group, use smaller target devices to reduce the amount of zeroing > > required, require the user to have a thinp or some such volume as a > > scratch dev already or otherwise find some other more efficient zeroing > > mechanism. > > > > > I know it is not a simple request, but I just don't know how else to > > trust > > > these changes. I guess a quick way for you to avoid the hassle is to add > > > _require_discard_zeroes (patch #1) and drop the rest. > > > > > > > That was going to be my fallback suggestion if the changes to the tests > > were problematic for whatever reason. Christoph points out that the > > discard zeroing logic is not predictable in general as it might be on > > dm-thinp, so I think for now we should probably just notrun these on > > XFS. I'll send something like that as a v2 of patch 1. > > > > Maybe there is a better way to stay safe and not sorry. > > Can we use dm thinp only for replay but not for fsx? > I suppose reliable zeroing is only needed in replay? > I think that would work, but might clutter or complicate the tests by using multiple devices for I/O vs. replay. That kind of strikes me as overkill given that two of these run multiple instances of fsx (which has a fixed size file) and the other looks like it runs a simple xfs_io command with a fixed amount of I/O. Ironically, generic/482 seems like the test with the most I/O overhead (via fsstress), but it's been using dm-thin for a while now. I think if we're really that concerned with dm-thin allocation overhead, we should either configure the cluster size to amortize the cost of it or just look into using smaller block devices so replay-log falls back to manual zeroing when discard doesn't work. A quick test of the latter doesn't show a huge increase in test runtime for 455/457, but I'd also have to confirm that this works as expected and eliminates the spurious corruption issue. Of course, a tradeoff could be that if discard does work but doesn't provide zeroing (which Christoph already explained is unpredictable), then I think we're still susceptible to the original problem. Perhaps we could create a mode that simply forces manual zeroing over discards if there isn't one already... Brian > Thanks, > Amir. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS 2020-08-26 14:38 ` [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS Brian Foster 2020-08-27 6:58 ` Amir Goldstein @ 2020-08-27 7:38 ` Christoph Hellwig 1 sibling, 0 replies; 19+ messages in thread From: Christoph Hellwig @ 2020-08-27 7:38 UTC (permalink / raw) To: Brian Foster; +Cc: fstests, linux-xfs On Wed, Aug 26, 2020 at 10:38:12AM -0400, Brian Foster wrote: > Several generic fstests use dm-log-writes to test the filesystem for > consistency at various crash recovery points. dm-log-writes and the > associated replay mechanism rely on zeroing via discard to clear > stale blocks when moving to various points in time of the fs. If the > storage doesn't provide zeroing or the discard requests exceed the > hardcoded maximum (128MB) of the fallback solution to physically > write zeroes, stale blocks are left around in the target fs. This > scheme is known to cause issues on XFS v5 superblocks if recovery > observes metadata from a future variant of an fs that has been > replayed to an older point in time. This corrupts the filesystem and > leads to false test failures. > > generic/482 already works around this problem by using a thin volume > as the target device, which provides consistent and efficient > discard zeroing behavior, but other tests have seen similar issues > on XFS. Add an XFS specific check to the dmlogwrites init time code > that requires discard zeroing support and otherwise skips the test > to avoid false positive failures. > > Signed-off-by: Brian Foster <bfoster@redhat.com> > --- > common/dmlogwrites | 10 ++++++++-- > common/rc | 14 ++++++++++++++ > tests/generic/470 | 2 +- > 3 files changed, 23 insertions(+), 3 deletions(-) > > diff --git a/common/dmlogwrites b/common/dmlogwrites > index 573f4b8a..92cc6ce2 100644 > --- a/common/dmlogwrites > +++ b/common/dmlogwrites > @@ -43,9 +43,10 @@ _require_log_writes_dax_mountopt() > _require_test_program "log-writes/replay-log" > > local ret=0 > - local mountopt=$1 > + local dev=$1 > + local mountopt=$2 > > - _log_writes_init $SCRATCH_DEV > + _log_writes_init $dev > _log_writes_mkfs > /dev/null 2>&1 > _log_writes_mount "-o $mountopt" > /dev/null 2>&1 > # Check options to be sure. > @@ -66,6 +67,11 @@ _log_writes_init() > [ -z "$blkdev" ] && _fail \ > "block dev must be specified for _log_writes_init" > > + # XFS requires discard zeroing support on the target device to work > + # reliably with dm-log-writes. Use dm-thin devices in tests that want > + # to provide reliable discard zeroing support. > + [ $FSTYP == "xfs" ] && _require_discard_zeroes $blkdev > + > local BLK_DEV_SIZE=`blockdev --getsz $blkdev` > LOGWRITES_NAME=logwrites-test > LOGWRITES_DMDEV=/dev/mapper/$LOGWRITES_NAME > diff --git a/common/rc b/common/rc > index aa5a7409..fedb5221 100644 > --- a/common/rc > +++ b/common/rc > @@ -4313,6 +4313,20 @@ _require_mknod() > rm -f $TEST_DIR/$seq.null > } > > +# check that discard is supported and subsequent reads return zeroes > +_require_discard_zeroes() > +{ > + local dev=$1 > + > + _require_command "$BLKDISCARD_PROG" blkdiscard > + > + $XFS_IO_PROG -c "pwrite -S 0xcd 0 4k" $dev > /dev/null 2>&1 || > + _fail "write error" > + $BLKDISCARD_PROG -o 0 -l 1m $dev || _notrun "no discard support" > + hexdump -n 4096 $dev | head -n 1 | grep cdcd && > + _notrun "no discard zeroing support" > +} This test is bogus. a discard may zero parts of the request or all of it. It may decided to zero based on the LBA, the physical block number in the SSD, the phase of the moon or a random number generator. ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 2/4] generic/455: use thin volume for dmlogwrites target device 2020-08-26 14:38 [PATCH 0/4] fix up generic dmlogwrites tests to work with XFS Brian Foster 2020-08-26 14:38 ` [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS Brian Foster @ 2020-08-26 14:38 ` Brian Foster 2020-08-26 14:38 ` [PATCH 3/4] generic/457: " Brian Foster 2020-08-26 14:38 ` [PATCH 4/4] generic/470: " Brian Foster 3 siblings, 0 replies; 19+ messages in thread From: Brian Foster @ 2020-08-26 14:38 UTC (permalink / raw) To: fstests; +Cc: linux-xfs dmlogwrites support for XFS depends on discard zeroing support of the intended target device. Update the test to use a thin volume and allow it to run consistently and reliably on XFS. Signed-off-by: Brian Foster <bfoster@redhat.com> --- tests/generic/455 | 36 ++++++++++++++++++++++-------------- 1 file changed, 22 insertions(+), 14 deletions(-) diff --git a/tests/generic/455 b/tests/generic/455 index 05621220..72a44fda 100755 --- a/tests/generic/455 +++ b/tests/generic/455 @@ -16,12 +16,14 @@ status=1 # failure is the default! _cleanup() { _log_writes_cleanup + _dmthin_cleanup } trap "_cleanup; exit \$status" 0 1 2 3 15 # get standard environment, filters and checks . ./common/rc . ./common/filter +. ./common/dmthin . ./common/dmlogwrites # real QA test starts here @@ -30,6 +32,7 @@ _supported_os Linux _require_test _require_scratch_nocheck _require_log_writes +_require_dm_target thin-pool rm -f $seqres.full @@ -42,13 +45,12 @@ check_files() local filename=$(basename $i) local mark="${filename##*.}" echo "checking $filename" >> $seqres.full - _log_writes_replay_log $filename $SCRATCH_DEV - _scratch_mount + _log_writes_replay_log $filename $DMTHIN_VOL_DEV + _dmthin_mount local expected_md5=$(_md5_checksum $i) local md5=$(_md5_checksum $SCRATCH_MNT/$name) [ "${md5}" != "${expected_md5}" ] && _fail "$filename md5sum mismatched" - _scratch_unmount - _check_scratch_fs + _dmthin_check_fs done } @@ -56,8 +58,16 @@ SANITY_DIR=$TEST_DIR/fsxtests rm -rf $SANITY_DIR mkdir $SANITY_DIR +devsize=$((1024*1024*200 / 512)) # 200m phys/virt size +csize=$((1024*64 / 512)) # 64k cluster size +lowspace=$((1024*1024 / 512)) # 1m low space threshold + +# Use a thin device to provide deterministic discard behavior. Discards are used +# by the log replay tool for fast zeroing to prevent out-of-order replay issues. +_dmthin_init $devsize $devsize $csize $lowspace + # Create the log -_log_writes_init $SCRATCH_DEV +_log_writes_init $DMTHIN_VOL_DEV _log_writes_mkfs >> $seqres.full 2>&1 @@ -88,14 +98,13 @@ _log_writes_mark last _log_writes_unmount _log_writes_mark end _log_writes_remove -_check_scratch_fs +_dmthin_check_fs # check pre umount echo "checking pre umount" >> $seqres.full -_log_writes_replay_log last $SCRATCH_DEV -_scratch_mount -_scratch_unmount -_check_scratch_fs +_log_writes_replay_log last $DMTHIN_VOL_DEV +_dmthin_mount +_dmthin_check_fs for j in `seq 0 $((NUM_FILES-1))`; do check_files testfile$j @@ -103,14 +112,13 @@ done # Check the end echo "checking post umount" >> $seqres.full -_log_writes_replay_log end $SCRATCH_DEV -_scratch_mount +_log_writes_replay_log end $DMTHIN_VOL_DEV +_dmthin_mount for j in `seq 0 $((NUM_FILES-1))`; do md5=$(_md5_checksum $SCRATCH_MNT/testfile$j) [ "${md5}" != "${test_md5[$j]}" ] && _fail "testfile$j end md5sum mismatched" done -_scratch_unmount -_check_scratch_fs +_dmthin_check_fs echo "Silence is golden" status=0 -- 2.25.4 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 3/4] generic/457: use thin volume for dmlogwrites target device 2020-08-26 14:38 [PATCH 0/4] fix up generic dmlogwrites tests to work with XFS Brian Foster 2020-08-26 14:38 ` [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS Brian Foster 2020-08-26 14:38 ` [PATCH 2/4] generic/455: use thin volume for dmlogwrites target device Brian Foster @ 2020-08-26 14:38 ` Brian Foster 2020-08-26 14:38 ` [PATCH 4/4] generic/470: " Brian Foster 3 siblings, 0 replies; 19+ messages in thread From: Brian Foster @ 2020-08-26 14:38 UTC (permalink / raw) To: fstests; +Cc: linux-xfs dmlogwrites support for XFS depends on discard zeroing support of the intended target device. Update the test to use a thin volume and allow it to run consistently and reliably on XFS. Signed-off-by: Brian Foster <bfoster@redhat.com> --- tests/generic/457 | 33 +++++++++++++++++++++------------ 1 file changed, 21 insertions(+), 12 deletions(-) diff --git a/tests/generic/457 b/tests/generic/457 index 82367304..42a064d8 100755 --- a/tests/generic/457 +++ b/tests/generic/457 @@ -16,6 +16,7 @@ status=1 # failure is the default! _cleanup() { _log_writes_cleanup + _dmthin_cleanup } trap "_cleanup; exit \$status" 0 1 2 3 15 @@ -23,6 +24,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15 . ./common/rc . ./common/filter . ./common/reflink +. ./common/dmthin . ./common/dmlogwrites # real QA test starts here @@ -32,6 +34,7 @@ _require_test _require_scratch_reflink _require_cp_reflink _require_log_writes +_require_dm_target thin-pool rm -f $seqres.full @@ -44,13 +47,12 @@ check_files() local filename=$(basename $i) local mark="${filename##*.}" echo "checking $filename" >> $seqres.full - _log_writes_replay_log $filename $SCRATCH_DEV - _scratch_mount + _log_writes_replay_log $filename $DMTHIN_VOL_DEV + _dmthin_mount local expected_md5=$(_md5_checksum $i) local md5=$(_md5_checksum $SCRATCH_MNT/$name) [ "${md5}" != "${expected_md5}" ] && _fail "$filename md5sum mismatched" - _scratch_unmount - _check_scratch_fs + _dmthin_check_fs done } @@ -58,8 +60,16 @@ SANITY_DIR=$TEST_DIR/fsxtests rm -rf $SANITY_DIR mkdir $SANITY_DIR +devsize=$((1024*1024*200 / 512)) # 200m phys/virt size +csize=$((1024*64 / 512)) # 64k cluster size +lowspace=$((1024*1024 / 512)) # 1m low space threshold + +# Use a thin device to provide deterministic discard behavior. Discards are used +# by the log replay tool for fast zeroing to prevent out-of-order replay issues. +_dmthin_init $devsize $devsize $csize $lowspace + # Create the log -_log_writes_init $SCRATCH_DEV +_log_writes_init $DMTHIN_VOL_DEV _log_writes_mkfs >> $seqres.full 2>&1 @@ -92,14 +102,13 @@ _log_writes_mark last _log_writes_unmount _log_writes_mark end _log_writes_remove -_check_scratch_fs +_dmthin_check_fs # check pre umount echo "checking pre umount" >> $seqres.full -_log_writes_replay_log last $SCRATCH_DEV -_scratch_mount -_scratch_unmount -_check_scratch_fs +_log_writes_replay_log last $DMTHIN_VOL_DEV +_dmthin_mount +_dmthin_check_fs for j in `seq 0 $((NUM_FILES-1))`; do check_files testfile$j @@ -107,8 +116,8 @@ done # Check the end echo "checking post umount" >> $seqres.full -_log_writes_replay_log end $SCRATCH_DEV -_scratch_mount +_log_writes_replay_log end $DMTHIN_VOL_DEV +_dmthin_mount for j in `seq 0 $((NUM_FILES-1))`; do md5=$(_md5_checksum $SCRATCH_MNT/testfile$j) [ "${md5}" != "${test_md5[$j]}" ] && _fail "testfile$j end md5sum mismatched" -- 2.25.4 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 4/4] generic/470: use thin volume for dmlogwrites target device 2020-08-26 14:38 [PATCH 0/4] fix up generic dmlogwrites tests to work with XFS Brian Foster ` (2 preceding siblings ...) 2020-08-26 14:38 ` [PATCH 3/4] generic/457: " Brian Foster @ 2020-08-26 14:38 ` Brian Foster 3 siblings, 0 replies; 19+ messages in thread From: Brian Foster @ 2020-08-26 14:38 UTC (permalink / raw) To: fstests; +Cc: linux-xfs dmlogwrites support for XFS depends on discard zeroing support of the intended target device. Update the test to use a thin volume and allow it to run consistently and reliably on XFS. Signed-off-by: Brian Foster <bfoster@redhat.com> --- tests/generic/470 | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/tests/generic/470 b/tests/generic/470 index 707b6237..cb7ab484 100755 --- a/tests/generic/470 +++ b/tests/generic/470 @@ -20,12 +20,14 @@ _cleanup() { cd / _log_writes_cleanup + _dmthin_cleanup rm -f $tmp.* } # get standard environment, filters and checks . ./common/rc . ./common/filter +. ./common/dmthin . ./common/dmlogwrites # remove previous $seqres.full before test @@ -35,11 +37,21 @@ rm -f $seqres.full _supported_fs generic _supported_os Linux _require_scratch -_require_log_writes_dax_mountopt $SCRATCH_DEV "dax" +_require_dm_target thin-pool _require_xfs_io_command "mmap" "-S" _require_xfs_io_command "log_writes" -_log_writes_init $SCRATCH_DEV +devsize=$((1024*1024*200 / 512)) # 200m phys/virt size +csize=$((1024*64 / 512)) # 64k cluster size +lowspace=$((1024*1024 / 512)) # 1m low space threshold + +# Use a thin device to provide deterministic discard behavior. Discards are used +# by the log replay tool for fast zeroing to prevent out-of-order replay issues. +_dmthin_init $devsize $devsize $csize $lowspace + +_require_log_writes_dax_mountopt $DMTHIN_VOL_DEV "dax" + +_log_writes_init $DMTHIN_VOL_DEV _log_writes_mkfs >> $seqres.full 2>&1 _log_writes_mount -o dax @@ -52,14 +64,14 @@ $XFS_IO_PROG -t -c "truncate $LEN" -c "mmap -S 0 $LEN" -c "mwrite 0 $LEN" \ # Unmount the scratch dir and tear down the log writes target _log_writes_unmount _log_writes_remove -_check_scratch_fs +_dmthin_check_fs # destroy previous filesystem so we can be sure our rebuild works -_scratch_mkfs >> $seqres.full 2>&1 +_mkfs_dev $DMTHIN_VOL_DEV >> $seqres.full 2>&1 # check pre-unmap state -_log_writes_replay_log preunmap $SCRATCH_DEV -_scratch_mount +_log_writes_replay_log preunmap $DMTHIN_VOL_DEV +_dmthin_mount # We should see $SCRATCH_MNT/test as having 1 MiB in block allocations du -sh $SCRATCH_MNT/test | _filter_scratch | _filter_spaces -- 2.25.4 ^ permalink raw reply related [flat|nested] 19+ messages in thread
end of thread, other threads:[~2020-08-31 13:37 UTC | newest] Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-08-26 14:38 [PATCH 0/4] fix up generic dmlogwrites tests to work with XFS Brian Foster 2020-08-26 14:38 ` [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS Brian Foster 2020-08-27 6:58 ` Amir Goldstein 2020-08-27 7:02 ` Christoph Hellwig 2020-08-27 7:29 ` Amir Goldstein 2020-08-27 7:37 ` Christoph Hellwig 2020-08-27 15:57 ` Josef Bacik 2020-08-27 17:02 ` Christoph Hellwig 2020-08-27 18:35 ` Brian Foster 2020-08-29 6:46 ` Christoph Hellwig 2020-08-30 13:30 ` Amir Goldstein 2020-08-31 13:37 ` Brian Foster 2020-08-29 6:44 ` Christoph Hellwig 2020-08-27 14:11 ` Brian Foster [not found] ` <CAOQ4uxj6RKX01kKKc_SGZJegWEKaF+D8ZNJGALvh4o0c5bBcBg@mail.gmail.com> 2020-08-28 14:10 ` Brian Foster 2020-08-27 7:38 ` Christoph Hellwig 2020-08-26 14:38 ` [PATCH 2/4] generic/455: use thin volume for dmlogwrites target device Brian Foster 2020-08-26 14:38 ` [PATCH 3/4] generic/457: " Brian Foster 2020-08-26 14:38 ` [PATCH 4/4] generic/470: " Brian Foster
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).