Message-ID: <4A9BCCEF.7010402@redhat.com>
Date: Mon, 31 Aug 2009 09:15:27 -0400
From: Ric Wheeler
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.1) Gecko/20090806 Fedora/3.0-3.8.b3.fc12 Thunderbird/3.0b3
MIME-Version: 1.0
To: Christoph Hellwig
CC: Michael Tokarev , david@lang.hm, Pavel Machek , Theodore Tso , NeilBrown ,
 Rob Landley , Florian Weimer , Goswin von Brederlow , kernel list ,
 Andrew Morton , mtk.manpages@gmail.com, rdunlap@xenotime.net,
 linux-doc@vger.kernel.org, linux-ext4@vger.kernel.org, corbet@lwn.net
Subject: Re: raid is dangerous but that's secret (was Re: [patch] ext2/3: document conditions when reliable operation is possible)
References: <200908262253.17886.rob@landley.net> <4A967175.5070700@redhat.com>
 <20090827221319.GA1601@ucw.cz> <4A9733C1.2070904@redhat.com>
 <20090828064449.GA27528@elf.ucw.cz> <20090828120854.GA8153@mit.edu>
 <20090830075135.GA1874@ucw.cz> <4A9A88B6.9050902@redhat.com>
 <4A9A9034.8000703@msgid.tls.msk.ru> <20090830163513.GA25899@infradead.org>
In-Reply-To: <20090830163513.GA25899@infradead.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On 08/30/2009 12:35 PM, Christoph Hellwig wrote:
> On Sun, Aug 30, 2009 at 06:44:04PM +0400, Michael Tokarev wrote:
>>> If you lose power with the write caches enabled on that same 5 drive
>>> RAID set, you could lose as much as 5 * 32MB of
>>> freshly written data on a power loss (16-32MB write caches are
>>> common on s-ata disks these days).
>>
>> This is fundamentally wrong. Many filesystems today use either barriers
>> or flushes (if barriers are not supported), and the times when disk drives
>> were lying to the OS that the cache got flushed are long gone.
>
> While most common filesystems do have barrier support, it is:
>
>  - not actually enabled for the two most common filesystems
>  - the support for write barriers and cache flushing tends to be buggy
>    all over our software stack,
>

Or just missing - I think that MD5/6 simply drop the requests at present.

I wonder if it would be worth having MD probe for write cache enabled &
warn if barriers are not supported?

>>> For MD5 (and MD6), you really must run with the write cache disabled
>>> until we get barriers to work for those configurations.
>>
>> I highly doubt barriers will ever be supported on anything but simple
>> raid1, because it's impossible to guarantee ordering across multiple
>> drives. Well, it *is* possible to have write barriers with journalled
>> (and/or with battery-backed-cache) raid[456].
>>
>> Note that even if raid[456] does not support barriers, write cache
>> flushes still work.
>
> All currently working barrier implementations on Linux are built upon
> queue drains and cache flushes, plus sometimes setting the FUA bit.
>
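[Editor's note: the probe Ric suggests above - check whether a member drive's
volatile write cache is enabled and warn if barriers are not supported - can
be sketched from userspace. This is a hedged illustration, not MD code: the
`/sys/block/<dev>/queue/write_cache` sysfs attribute is real on current
kernels but did not exist in 2009, when you would instead ask the drive
directly with `hdparm -W /dev/sdX`.]

```python
import glob

def write_cache_modes():
    """Report each block device's volatile write-cache mode from sysfs.

    Returns a dict like {"sda": "write back"}. "write back" means the
    drive's volatile cache is enabled, so a power loss can drop recently
    acknowledged writes unless flushes/barriers reach the device;
    "write through" means writes are not acknowledged from volatile cache.
    """
    modes = {}
    for path in glob.glob("/sys/block/*/queue/write_cache"):
        dev = path.split("/")[3]          # /sys/block/<dev>/queue/write_cache
        with open(path) as f:
            modes[dev] = f.read().strip()
    return modes

if __name__ == "__main__":
    for dev, mode in sorted(write_cache_modes().items()):
        warn = "  <-- data at risk if flushes are dropped" \
            if mode == "write back" else ""
        print(f"{dev}: {mode}{warn}")
```

On machines with no visible block devices (e.g. inside a container) the
glob simply matches nothing and the report is empty; a real in-kernel
probe would of course query the member devices of the array directly.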