From: Steve Lord
Date: Mon, 27 Jun 2005 15:18:30 -0500
To: Hans Reiser
CC: "Theodore Ts'o", Markus Törnqvist, Horst von Brand, David Masover, Alan Cox, Jeff Garzik, Christoph Hellwig, Andrew Morton, Linux Kernel Mailing List, ReiserFS List
Subject: Re: reiser4 plugins
Message-ID: <42C05F16.5000804@xfs.org>
In-Reply-To: <42C0578F.7030608@namesys.com>
References: <42BB7B32.4010100@slaphack.com> <200506240334.j5O3YowB008100@laptop11.inf.utfsm.cl> <20050627092138.GD11013@nysv.org> <20050627124255.GB6280@thunk.org> <42C0578F.7030608@namesys.com>

Hans Reiser wrote:
> Steve, there is a remark about XFS below which you are going to be more
> expert on.
>
> Theodore Ts'o wrote:
>
>>
>>XFS has similar issues where it assumes that hardware has powerfail
>>interrupts, and that the OS can use said powerfail interrupt to stop
>>DMA's in its tracks on an power failure, so that you don't have
>>garbage written to key filesystem data structures when the memory
>>starts suffering from the dropping voltage on the power bus faster
>>than the DMA engine or the disk drives.
>>So XFS is a great filesystem
>>--- but you'd better be running it on a UPS, or on a system which has
>>power fail interrupts and an OS that knows what to do.  Ext3, because
>>it does physical block journalling, does not suffer from this problem.
>>(Yes, Resierfs uses logical journalling as well, so it suffers from
>>the same problem.)
>>

I presume Ted is referring to problems guaranteeing the integrity of the
journal at recovery time. I am coming into this without all the available
context, so I may be barking up the wrong tree... In particular, I am not
sure how journaling whole blocks protects you from this.

The XFS journal protects itself against partial writes, to a certain
degree. The header of a journal write (inside a 512-byte sector) contains
an array of words which are swapped out from the start of each following
512-byte sector of the journal write. Each of those following sectors then
has the log sequence number (LSN) of the write inserted in place of that
data.

During recovery, we find the most recent LSN, the head of the log, via a
binary chop search; this gives us an associated tail LSN. A scan backwards
from the head LSN is then done, covering the total possible amount of
in-flight data (maximum log buffers x maximum log buffer size). If any of
the sectors has the wrong LSN in its first word, then it and all following
data are discarded from replay. Of course, we will also not replay any
journal entry for which we do not find the transaction commit record.

Now, this protects against some failure cases, but it assumes that sector
writes are atomic: they either happen or they do not. If sector writes are
not atomic, so that one end of a sector can be good while the other is
bad, then a partial sector may get replayed. There have been discussions
about doing this stamping at both the head and the tail of each sector, or
using a checksum instead.
XFS on Linux has had power-cycle crash testing, but there is no way you can
cover all possible hardware configurations. I also seem to recall that some
hardware never recovered from this testing; by that I mean the PC did not
survive the continual power cycling and went up in smoke.

Steve