From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dmitry Monakhov
Subject: Re: [LSF/MM TOPIC] Working towards better power fail testing
Date: Tue, 13 Jan 2015 20:05:06 +0300
Message-ID: <87r3uy3931.fsf@openvz.org>
References: <5486221D.6000006@fb.com>
Mime-Version: 1.0
Content-Type: text/plain
Cc: linux-fsdevel@vger.kernel.org
To: Josef Bacik , lsf-pc@lists.linux-foundation.org
Return-path: 
Received: from mail-we0-f180.google.com ([74.125.82.180]:46218 "EHLO mail-we0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752703AbbAMRMV (ORCPT ); Tue, 13 Jan 2015 12:12:21 -0500
Received: by mail-we0-f180.google.com with SMTP id w62so4138517wes.11 for ; Tue, 13 Jan 2015 09:12:20 -0800 (PST)
In-Reply-To: <5486221D.6000006@fb.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: 

Josef Bacik writes:

> Hello,
>
> We have been doing pretty well at populating xfstests with loads of
> tests to catch regressions and validate we're all working properly. One
> thing that has been lacking is a good way to verify file system
> integrity after a power fail. This is a core part of what file systems
> are supposed to provide but it is probably the least tested aspect. We
> have dm-flakey tests in xfstests to test fsync correctness, but these
> tests do not catch the random horrible things that can go wrong. We are
> still finding horrible scary things that go wrong in Btrfs because it is
> simply hard to reproduce and test for.
>
> I have been working on an idea to do this better, some may have seen my
> dm-power-fail attempt, and I've got a new incarnation of the idea thanks
> to discussions with Zach Brown. Obviously there will be a lot changing
> in this area in the time between now and March but it would be good to
> have everybody in the room talking about what they would need to build a
> good and deterministic test to make sure we're always giving a
> consistent file system and to make sure our fsync() handling is working
> properly.
Thanks, I submitted generic/019 a long time ago. The test works fine and has helped uncover several bugs, but it is not ideal because the current power-failure simulation (via fail_make_request) is not completely atomic. So I would like to attend a discussion of how we can make power-failure simulation completely atomic. BTW, I would also like to share the hw-flush utility (which our QA team uses for power-fail/SSD-cache testing) and a harness for it.
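
For context, fail_make_request is driven through the kernel's fault-injection debugfs knobs. A minimal sketch of how such injection is typically armed (the device name and values here are illustrative, not from the thread; this assumes a kernel built with CONFIG_FAIL_MAKE_REQUEST and debugfs mounted at /sys/kernel/debug):

```shell
# Sketch only: assumes CONFIG_FAIL_MAKE_REQUEST=y, debugfs mounted at
# /sys/kernel/debug, and "sdb" as an example target device.
FAIL=/sys/kernel/debug/fail_make_request

echo 100 > $FAIL/probability   # fail 100% of matching requests
echo -1  > $FAIL/times         # no limit on how many failures
echo 0   > $FAIL/verbose       # keep dmesg quiet

echo 1 > /sys/block/sdb/make-it-fail   # arm injection for this disk
```

This per-request failure point is presumably the source of the non-atomicity mentioned above: requests already submitted when the knob flips can still complete, so the simulated "power cut" is not a single instant across all in-flight I/O.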