linux-kernel.vger.kernel.org archive mirror
From: Andreas Dilger <adilger@turbolinux.com>
To: "Albert D. Cahalan" <acahalan@cs.uml.edu>
Cc: Ben LaHaise <bcrl@redhat.com>,
	Ragnar Kjørstad <kernel@ragnark.vestdata.no>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	mike@bigstorage.com, kevin@bigstorage.com, linux-lvm@sistina.com
Subject: Re: [PATCH] 64 bit scsi read/write
Date: Fri, 13 Jul 2001 14:41:52 -0600 (MDT)	[thread overview]
Message-ID: <200107132041.f6DKfqM8013404@webber.adilger.int> (raw)
In-Reply-To: <200107131820.f6DIKvg190902@saturn.cs.uml.edu> "from Albert D. Cahalan at Jul 13, 2001 02:20:57 pm"

Albert writes:
> How can any of this even work?
> 
> Say I have N disks, mirrored, or maybe with parity. I'm trying
> to have a reliable system. I change a file. The write goes out
> to my disks, and power is lost. Some number M, such that 0<M<N,
> of the disks are written before the power loss. The rest of the
> disks don't complete the write. Maybe worse, this is more than
> one sector, and some disks have partial writes.
> 
> Doesn't RAID need a journal or the phase-tree algorithm?
> How does one tell what data is old and what data is new?

Yes, RAID should have a journal or other ordering enforcement, but
it really isn't any worse in this regard than a single disk.  Even
on a single disk you don't have any guarantees of write ordering, so
if you change a file and the power is lost, some of the sectors
will make it to disk and some will not => fsck, with possible data
corruption or loss.

That's why journaled filesystems have a multi-stage commit of I/O:
first to the journal, then to the final location on disk, so there is
no chance of corrupting the metadata.  If you journal data as well,
then the data cannot be corrupted either (though some recent writes
may be lost).
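To make the two-stage commit concrete, here is a toy sketch in Python
(all names and structures are made up for illustration; this is nothing
like the real ext3/jbd code, just the shape of the idea):

```python
# Toy write-ahead journaling sketch (hypothetical, heavily simplified):
# stage 1 records the change in the journal, stage 2 copies committed
# records to their home location.  A crash between the stages leaves a
# replayable journal record; a crash during stage 1 leaves a record
# that fails its checksum and is discarded.

import zlib

journal = []   # stand-in for the on-disk journal
storage = {}   # stand-in for the filesystem proper

def journal_write(block_no, data):
    """Stage 1: append the change plus a checksum to the journal."""
    journal.append((block_no, data, zlib.crc32(data)))
    # (a real fs would flush/fsync the journal here before proceeding)

def checkpoint():
    """Stage 2: copy intact journal records into their final location."""
    global journal
    for block_no, data, crc in journal:
        if zlib.crc32(data) == crc:   # replay only records that are intact
            storage[block_no] = data
    journal = []

journal_write(7, b"new metadata")
checkpoint()
assert storage[7] == b"new metadata"
```

The key property is simply that a block is never overwritten in place
until a complete, checksummed copy of the change exists somewhere else.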

RAID 5 throws a wrench into this by not guaranteeing that all of the
blocks in a stripe are consistent (after a crash you don't know which
data blocks and/or parity were written and which were not).  Ideally,
you want a multi-stage commit for RAID as well: write the data first
and the parity afterwards, so that on reboot you trust the data rather
than the parity.  You still have a problem if a disk had already failed
and you then crash, because reconstruction has to trust the (possibly
stale) parity.
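A toy illustration of why a half-written stripe is dangerous (made-up
Python, byte-wise XOR parity as in RAID 5):

```python
# RAID-5 style parity is the byte-wise XOR of the data blocks in a
# stripe.  If a crash lets the new data reach disk but not the new
# parity (or vice versa), the stripe is internally inconsistent, and a
# later disk failure "reconstructs" garbage.

def parity(blocks):
    """Byte-wise XOR of equal-sized blocks."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

stripe = [b"AAAA", b"BBBB", b"CCCC"]
p = parity(stripe)                 # parity consistent with the stripe

# Crash scenario: the new data block hits disk, the parity update is lost.
stripe[1] = b"XXXX"                # data write completed
#                                  # parity rewrite never happened

# If the disk holding block 0 now dies, rebuilding it from the stale
# parity does NOT give back b"AAAA":
rebuilt = parity([stripe[1], stripe[2], p])
assert rebuilt != b"AAAA"
```

With consistent parity the same XOR would recover the missing block
exactly, which is why ordering the data and parity writes matters.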

With a data-journaled fs you don't care what RAID does, because the fs
journal knows which transactions were in progress.  If an I/O was being
written into the journal and did not complete, it is discarded.  If it
was fully written into the journal but the write into the fs did not
finish, it will be re-written on recovery.  In either case you don't
care whether the RAID finished the write or not.
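The recovery decision can be sketched in a few lines (again a made-up
toy, not real jbd code): a transaction is replayed only if its commit
record made it into the journal, otherwise it is dropped.

```python
# Toy journal replay: data records accumulate until a commit record is
# seen, at which point the whole transaction is applied.  Records after
# the last commit belong to an incomplete transaction and are discarded.

def recover(journal_records, storage):
    pending = []
    for rec in journal_records:
        if rec["type"] == "data":
            pending.append(rec)
        elif rec["type"] == "commit":
            for r in pending:              # fully journaled: replay it
                storage[r["block"]] = r["data"]
            pending = []
    # anything left in `pending` never committed -- drop it
    return storage

log = [
    {"type": "data", "block": 1, "data": b"one"},
    {"type": "commit"},
    {"type": "data", "block": 2, "data": b"two"},   # crash before commit
]
fs = recover(log, {})
assert fs == {1: b"one"}
```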

Note that LVM (the original topic) does NOT do any RAID work at all;
it just presents a virtually contiguous disk made up of one or more
real disks (or stacked on top of RAID).
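That linear mapping is trivial to sketch (hypothetical Python; device
names and extent numbers are invented for illustration):

```python
# LVM-style linear mapping: a logical volume is just a concatenation of
# extents carved from physical volumes, with no redundancy of its own.

def lv_to_pv(lba, extents):
    """Map a logical block to (physical_volume, physical_block).

    extents: list of (pv_name, start_block, length) in logical order.
    """
    offset = 0
    for pv, start, length in extents:
        if lba < offset + length:
            return pv, start + (lba - offset)
        offset += length
    raise ValueError("block beyond end of logical volume")

# A logical volume built from 500 blocks of sda then 500 blocks of sdb:
ext = [("sda", 1000, 500), ("sdb", 0, 500)]
assert lv_to_pv(10, ext) == ("sda", 1010)
assert lv_to_pv(600, ext) == ("sdb", 100)
```

Any redundancy or crash-consistency guarantees come from the layers
above (the fs journal) or below (the RAID), not from the mapping itself.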

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert

Thread overview: 71+ messages
2001-07-01  4:53 [RFC][PATCH] first cut 64 bit block support Ben LaHaise
2001-07-03  4:53 ` Ragnar Kjørstad
2001-07-04  2:19   ` [PATCH] 64 bit scsi read/write Ben LaHaise
2001-07-04  7:11     ` Alan Cox
2001-07-05  6:34     ` Ragnar Kjørstad
2001-07-05  7:35       ` Ben LaHaise
2001-07-13 18:20         ` Albert D. Cahalan
2001-07-13 20:41           ` Andreas Dilger [this message]
2001-07-13 21:07             ` Chris Wedgwood
2001-07-13 22:04               ` Andreas Dilger
2001-07-14  0:49                 ` Jonathan Lundell
2001-07-14 12:27                 ` Paul Jakma
2001-07-14 14:48                   ` Chris Wedgwood
2001-07-14 15:42                     ` Paul Jakma
2001-07-14 17:18                       ` Chris Wedgwood
2001-07-20 17:03                       ` Stephen C. Tweedie
2001-07-16 18:53                   ` Andreas Dilger
2001-07-16 19:13                     ` Ragnar Kjørstad
2001-07-13 21:14             ` Alan Cox
2001-07-14  3:23               ` Andrew Morton
2001-07-14  8:45                 ` Alan Cox
2001-07-14 14:50                   ` Chris Wedgwood
2001-07-14 20:11                     ` Daniel Phillips
2001-07-15  1:21                       ` Andrew Morton
2001-07-15  1:53                         ` Daniel Phillips
2001-07-15  3:36                       ` Chris Wedgwood
2001-07-15  6:05                         ` John Alvord
2001-07-15  6:07                           ` Chris Wedgwood
2001-07-15 13:16                             ` Ken Hirsch
2001-07-15 14:50                               ` Chris Wedgwood
2001-07-15 22:14                               ` Daniel Phillips
2001-07-17  0:31                             ` Juan Quintela
2001-07-15 13:44                         ` Daniel Phillips
2001-07-15 14:39                           ` Chris Wedgwood
2001-07-15 15:32                             ` Alan Cox
2001-07-15 15:33                               ` Chris Wedgwood
2001-07-15 16:24                               ` Chris Wedgwood
2001-07-15 15:06                           ` Jonathan Lundell
2001-07-15 15:22                             ` Chris Wedgwood
2001-07-15 17:44                             ` Jonathan Lundell
2001-07-15 17:47                             ` Justin T. Gibbs
2001-07-15 23:14                               ` Rod Van Meter
2001-07-16  0:37                                 ` Jonathan Lundell
2001-07-16 15:11                                   ` Rod Van Meter
2001-07-16  8:56                               ` Chris Wedgwood
2001-07-16 13:19                                 ` Daniel Phillips
2001-07-16  1:08                           ` Albert D. Cahalan
2001-07-16  8:49                             ` Chris Wedgwood
2001-07-21 19:18                             ` Alexander Griesser
2001-07-22  3:52                               ` Albert D. Cahalan
2001-07-23 14:41                                 ` Daniel Phillips
2001-07-24  4:29                                   ` Albert D. Cahalan
2001-07-24 11:45                                     ` Daniel Phillips
2001-07-14 15:41                   ` Jonathan Lundell
2001-07-14 17:00                     ` Chris Wedgwood
2001-07-14 17:33                   ` Jonathan Lundell
2001-07-15  4:02                     ` Chris Wedgwood
2001-07-15  5:46                     ` Jonathan Lundell
2001-07-15 17:10                   ` Chris Wedgwood
2001-07-15 17:39                   ` Jonathan Lundell
2001-07-26  2:18     ` Ragnar Kjørstad
2001-07-26 16:24       ` Andreas Dilger
2001-08-10 19:42       ` Ben LaHaise
2001-08-10 19:51       ` Ragnar Kjørstad
2001-08-10 20:02         ` Ben LaHaise
2001-08-11  0:18           ` Steve Lord
2001-08-11 21:44       ` Matti Aarnio
2001-07-04 10:16 ` [RFC][PATCH] first cut 64 bit block support Chris Wedgwood
2001-07-04 16:59   ` Ben LaHaise
2001-07-14 15:08 [PATCH] 64 bit scsi read/write Ed Tomlinson
2001-07-19  7:35 [PATCH] 64 bit SCSI read/write Andre Hedrick
