From: Amir Goldstein
Date: Tue, 28 Nov 2017 23:22:24 +0200
Subject: Re: [PATCH] dm-log-writes: invalidate the bdev's for both of our devices
To: Josef Bacik
Cc: Mike Snitzer, dm-devel@redhat.com, linux-fsdevel, Josef Bacik
In-Reply-To: <20171128210554.hyqw27egx5kjsv3q@destiny>
References: <1511890225-16601-1-git-send-email-josef@toxicpanda.com> <20171128210554.hyqw27egx5kjsv3q@destiny>

On Tue, Nov 28, 2017 at 11:05 PM, Josef Bacik wrote:
> On Tue, Nov 28, 2017 at 10:40:24PM +0200, Amir Goldstein wrote:
>> On Tue, Nov 28, 2017 at 9:29 PM, Amir Goldstein wrote:
>> > On Tue, Nov 28, 2017 at 7:30 PM, Josef Bacik wrote:
>> >> From: Josef Bacik
>> >>
>> >> Amir noticed that sometimes the xfstests using dm-log-writes would fail
>> >> randomly but would work fine after trying again manually. This is
>> >> because dm-log-writes writes directly to the device, but the log replay
>> >> tools read and write via the block device page cache. Sometimes this
>> >> resulted in stale data being in the block device's page cache which
>> >> would result in random failures. To handle this simply invalidate the
>> >> block device page cache on destruction so any replay of the log device
>> >> that follows will be forced to read the new real contents.
>> >>
>> >> Reported-and-tested-by: Amir Goldstein
>> >
>> > I'm fine with the Reported-by, but let's wait a while with this patch so
>> > I have more time to torture it.
>> > The incidents I got even before the patch did not happen more than
>> > a handful of times after running for a few days, so I need some more
>> > days to validate the fix.
>> > I had already sent you some weird output. Let's see what else comes
>> > along.
>> >
>>
>> Sorry, no cigar.
>> Another run just completed with a malformed log and a corrupted fs.
>>
>> The _check_scratch_fs that fails is the one right after _log_writes_remove,
>> just like the report that I sent before this patch,
>> and the LOGWRITES_DEV itself has a malformed entry before the "end" mark
>> or even before the last fsync mark:
>>
>> ./src/log-writes/replay-log -v --log $LOGWRITES_DEV --find --end-mark testfile1.mark17
>> Malformed entry @112134
>>
>> For what it's worth, I am testing on spinning disks, 100G scratch dev.
>> Right now, I zoomed in on the following fsx seeds that managed to fail the test
>> a few times already, but in different ways, so I'm not sure the seeds are more
>> than voodoo:
>> seeds=(4597 4598 4599 4600)
>>
>> I'll start running the same test but with fsx running on the test partition,
>> just to get a feel for running the same fsx threads on bare xfs.
>>
>> Any other ideas?
>>
>
> Is there anything special about your devices? Are they 4k drives? The corrupt
> log is not awesome, was it still corrupt after the test bailed out? Thanks,
>

No, nothing special. Boring 4TB WD drive.
I just reported on the xfstests thread that the problem was reproduced with xfs
on the scratch partition, where dm-log-writes is not in the picture, so for now
dm-log-writes is off the hook.

I still need to explain the malformed log, but I will follow the xfs corruption
lead first.

Thanks,
Amir.