From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dmitry Monakhov
Subject: Re: [LSF/MM TOPIC] Working towards better power fail testing
Date: Tue, 13 Jan 2015 20:05:06 +0300
Message-ID: <87r3uy3931.fsf@openvz.org>
References: <5486221D.6000006@fb.com>
Mime-Version: 1.0
Content-Type: text/plain
Cc: linux-fsdevel@vger.kernel.org
To: Josef Bacik , lsf-pc@lists.linux-foundation.org
Return-path: 
Received: from mail-we0-f180.google.com ([74.125.82.180]:46218 "EHLO mail-we0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752703AbbAMRMV (ORCPT ); Tue, 13 Jan 2015 12:12:21 -0500
Received: by mail-we0-f180.google.com with SMTP id w62so4138517wes.11 for ; Tue, 13 Jan 2015 09:12:20 -0800 (PST)
In-Reply-To: <5486221D.6000006@fb.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: 

Josef Bacik writes:

> Hello,
>
> We have been doing pretty well at populating xfstests with loads of
> tests to catch regressions and validate we're all working properly. One
> thing that has been lacking is a good way to verify file system
> integrity after a power fail. This is a core part of what file systems
> are supposed to provide but it is probably the least tested aspect. We
> have dm-flakey tests in xfstests to test fsync correctness, but these
> tests do not catch the random horrible things that can go wrong. We are
> still finding horrible scary things that go wrong in Btrfs because it is
> simply hard to reproduce and test for.
>
> I have been working on an idea to do this better, some may have seen my
> dm-power-fail attempt, and I've got a new incarnation of the idea thanks
> to discussions with Zach Brown. Obviously there will be a lot changing
> in this area in the time between now and March but it would be good to
> have everybody in the room talking about what they would need to build a
> good and deterministic test to make sure we're always giving a
> consistent file system and to make sure our fsync() handling is working
> properly.
Thanks, I submitted generic/019 a long time ago. The test works fine and has helped uncover several bugs, but it is not ideal because the current power-failure simulation (via fail_make_request) is not completely atomic. So I would like to attend a discussion of how we can make power-failure simulation completely atomic. BTW, I would also like to share the hw-flush utility (which our QA team uses for power-fail/SSD-cache testing) and a harness for it.
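
For context, fail_make_request is driven through the kernel's fault-injection debugfs knobs. A minimal sketch of how such injection is typically armed (the device name and values here are illustrative, not from the thread; this assumes a kernel built with CONFIG_FAIL_MAKE_REQUEST and debugfs mounted at /sys/kernel/debug):

```shell
# Sketch only: assumes CONFIG_FAIL_MAKE_REQUEST=y, debugfs mounted at
# /sys/kernel/debug, and "sdb" as an example target device.
FAIL=/sys/kernel/debug/fail_make_request

echo 100 > $FAIL/probability   # fail 100% of matching requests
echo -1  > $FAIL/times         # no limit on how many failures
echo 0   > $FAIL/verbose       # keep dmesg quiet

echo 1 > /sys/block/sdb/make-it-fail   # arm injection for this disk
```

This per-request failure point is presumably the source of the non-atomicity mentioned above: requests already submitted when the knob flips can still complete, so the simulated "power cut" is not a single instant across all in-flight I/O.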