From: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
To: Theodore Tso <tytso@mit.edu>
Cc: Pavel Machek <pavel@suse.cz>, Chris Friesen <cfriesen@nortel.com>,
	clock@atrey.karlin.mff.cuni.cz,
	kernel list <linux-kernel@vger.kernel.org>,
	aviro@redhat.com
Subject: Re: writing file to disk: not as easy as it looks
Date: Wed, 3 Dec 2008 16:34:18 +0100 (CET)
Message-ID: <Pine.LNX.4.64.0812031613320.5406@artax.karlin.mff.cuni.cz>
In-Reply-To: <20081203050709.GL20858@mit.edu>



On Wed, 3 Dec 2008, Theodore Tso wrote:

> > Ok, "memory failed before disk" is ... bad hardware.
> 
> It's PC class hardware. Live with it.  Back when SGI made their own
> hardware, they noticed this problem, and so they wired up their SGI
> machines with powerfail interrupts, and extra big capacitors in their
> power supplies, and when Irix got a powerfail interrupt, it would
> frantically run around aborting DMA transfers to avoid this particular
> problem.  At least, that's what an old-timer SGI engineer (who is
> unfortunately no longer at SGI) told me.

I heard this too --- I just don't understand why they routed it to an 
interrupt and had the kernel go through the complicated sequence of 
aborting the commands --- instead of simply routing it to the PCI reset 
line --- that would reset the controller and stop it from feeding data to 
the disks.

Also, if they had ECC memory, the chipset should detect unrecoverable 
garbage and respond with target-abort or full system reset and not feed 
bad data to the controller.

> PC class hardware don't have power fail interrupts.  Hence, my advice
> to you is that if you use a filesystem that does logical journalling
> --- better have a UPS.

ATX has a PWR_OK pin that should be deasserted on power failure before the 
voltage drops.

I don't know if motherboards use it --- but there should be no problem 
routing the pin to the chipset reset, so that the chipset is stopped 
before the power goes low.

> > ...but... you seem to be saying that modern filesystems can damage 
> > data even on "sane" hardware.
> 
> The example I gave was one where a disk failure could cause a file
> that had previously been successfully written to disk and fsync()'ed to
> be damaged by another filesystem operation ***in the face of hard
> drive failure***.  Surely that is obvious.  The most obvious case of
> that might be if the disk controller gets confused and slams a data
> block into the wrong location on disk (there's a reason why DIF
> includes the sector number in its checksum and why some enterprise
> databases do the same thing in their tablespace blocks --- it happens
> often enough that paranoid data integrity engineers worry about it).

You can read the block number back from an ATA disk after you write it and 
before you submit the command.
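
The self-identifying block idea from the quoted text (a checksum that 
covers the sector number, so that a block written to the wrong location is 
detected when it is read back) can be sketched roughly as follows. This is 
only an illustration: the struct layout, the names tagged_block, 
block_seal and block_verify, and the use of FNV-1a as the checksum are all 
assumptions made for this example, not the format used by DIF or by any 
real database or filesystem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAYLOAD_SIZE 496	/* hypothetical: 512-byte block minus 16-byte tag */

/* Hypothetical on-disk block that records where it was meant to go. */
struct tagged_block {
	uint64_t sector;	/* sector number the block was written for */
	uint32_t checksum;	/* covers sector number and payload */
	uint32_t reserved;
	uint8_t payload[PAYLOAD_SIZE];
};

/* FNV-1a, used here only as a simple stand-in for a real CRC. */
static uint32_t fnv1a(uint32_t h, const void *data, size_t len)
{
	const uint8_t *p = data;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

static uint32_t block_checksum(const struct tagged_block *b)
{
	uint32_t h = 2166136261u;	/* FNV-1a 32-bit offset basis */

	/* Hash the sector number first, so it is part of the checksum. */
	h = fnv1a(h, &b->sector, sizeof b->sector);
	return fnv1a(h, b->payload, sizeof b->payload);
}

/* Fill in a block destined for the given sector. */
void block_seal(struct tagged_block *b, uint64_t sector,
		const void *data, size_t len)
{
	assert(len <= PAYLOAD_SIZE);
	memset(b, 0, sizeof *b);
	b->sector = sector;
	memcpy(b->payload, data, len);
	b->checksum = block_checksum(b);
}

/*
 * Check a block that was read from read_from_sector.
 * Returns 0 if it is intact and in the right place, -1 if the checksum
 * does not match (corruption), -2 if the block is intact but belongs to
 * a different sector (a misdirected write).
 */
int block_verify(const struct tagged_block *b, uint64_t read_from_sector)
{
	if (b->checksum != block_checksum(b))
		return -1;
	if (b->sector != read_from_sector)
		return -2;
	return 0;
}
```

A reader would call block_verify() with the sector it actually asked the 
disk for; a block that the controller slammed into the wrong location 
then fails the check even though its contents are internally consistent.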

Mikulas
