linux-kernel.vger.kernel.org archive mirror
* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 16:04 Are linux-fs's drive-fault-tolerant by concept? Stephan von Krawczynski
@ 2003-04-19 15:29 ` Alan Cox
  2003-04-19 17:00   ` Stephan von Krawczynski
  2003-04-19 21:13   ` Jos Hulzink
  2003-04-19 16:22 ` John Bradford
       [not found] ` <20030419161011$0136@gated-at.bofh.it>
  2 siblings, 2 replies; 74+ messages in thread
From: Alan Cox @ 2003-04-19 15:29 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: Linux Kernel Mailing List

On Sad, 2003-04-19 at 17:04, Stephan von Krawczynski wrote:
> after shooting down one of these bloody cute new very-big-and-poor IDE drives
> today, I wonder whether it would be a good idea to give the linux-fs (namely
> my preferred reiser and ext2 :-) some fault-tolerance. I remember there were
> some discussions on this issue some time ago, and I seem to remember it was
> decided against because it should be the driver's job to give the fs a clean
> space to live in, right?
 
Sometimes disks just go bang. They seem to do it distressingly more
often nowadays, which (while handy for criminals and pirates) is annoying
for the rest of us. Putting magic in the file system to handle this is
hard to do well, and at best you get things like ext2/ext3 have now -
the ability to recover data in the event of some corruption - unless you
get into really fancy stuff.

Buy IDE disks in pairs, use md RAID-1, and remember to continually send the
hosed ones back to the vendor/shop (and if they keep appearing DOA, to
your local trading standards/fair trading type bodies).
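
For reference, a raidtools-style /etc/raidtab describing such a pair might
look like this (the device names are examples only; mkraid /dev/md1 then
builds the mirror):

	raiddev /dev/md1
		raid-level		1
		nr-raid-disks		2
		nr-spare-disks		0
		persistent-superblock	1
		chunk-size		4
		device			/dev/hda1
		raid-disk		0
		device			/dev/hdc1
		raid-disk		1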

Perhaps someone should also start a scoreboard for people to report dead
IDE drives by vendor ;)

Alan


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Are linux-fs's drive-fault-tolerant by concept?
@ 2003-04-19 16:04 Stephan von Krawczynski
  2003-04-19 15:29 ` Alan Cox
                   ` (2 more replies)
  0 siblings, 3 replies; 74+ messages in thread
From: Stephan von Krawczynski @ 2003-04-19 16:04 UTC (permalink / raw)
  To: linux-kernel

Hello all,

after shooting down one of these bloody cute new very-big-and-poor IDE drives
today, I wonder whether it would be a good idea to give the linux-fs (namely
my preferred reiser and ext2 :-) some fault-tolerance. I remember there were
some discussions on this issue some time ago, and I seem to remember it was
decided against because it should be the driver's job to give the fs a clean
space to live in, right?
Unfortunately today's reality seems to have gotten a lot worse compared to one
year ago. I cannot remember a lot of failed drives back then, but today about
20% seem to ship DOA already. Most drives I came across have only small
problems (a few dead sectors), but they seem to produce quite a lot of trouble
- at least on my 3ware in a non-RAID setup, the box partly dies because
reiser feels quite unhappy about the non-recoverable disk errors.
I know this question can get religious, but to name my only point: wouldn't it
be good defensive programming style _not_ to rely on proven-to-be-unreliable
hardware manufacturers? Thing is: you cannot prevent buying bad hardware these
days, because just about every manufacturer has already sold bad apples ...

Regards,
Stephan

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 16:04 Are linux-fs's drive-fault-tolerant by concept? Stephan von Krawczynski
  2003-04-19 15:29 ` Alan Cox
@ 2003-04-19 16:22 ` John Bradford
  2003-04-19 16:36   ` Russell King
                     ` (3 more replies)
       [not found] ` <20030419161011$0136@gated-at.bofh.it>
  2 siblings, 4 replies; 74+ messages in thread
From: John Bradford @ 2003-04-19 16:22 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: linux-kernel

> I wonder whether it would be a good idea to give the linux-fs
> (namely my preferred reiser and ext2 :-) some fault-tolerance.

Fault tolerance should be done at a lower level than the filesystem.

Linux filesystems are used on many different devices, not just hard
disks.  Different devices can fail in different ways - a disk might
have a lot of bad sectors in a block, a tape[1] might have a single
track which becomes unreadable, and solid state devices might get
a few random bits flipped all over them if a charged particle passes
through them.

The filesystem doesn't know or care what device it is stored on, and
therefore shouldn't try to predict likely failures.

A RAID-0 array and regular backups are the best way to protect your
data.

[1] Although it is uncommon to use a tape as a block device, you never
know.  It's certainly possible (though not necessarily with Linux).

John.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 16:22 ` John Bradford
@ 2003-04-19 16:36   ` Russell King
  2003-04-19 16:45     ` John Bradford
  2003-04-19 16:52   ` Stephan von Krawczynski
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 74+ messages in thread
From: Russell King @ 2003-04-19 16:36 UTC (permalink / raw)
  To: John Bradford; +Cc: Stephan von Krawczynski, linux-kernel

On Sat, Apr 19, 2003 at 05:22:18PM +0100, John Bradford wrote:
> A RAID-0 array and regular backups are the best way to protect your
> data.

Correction.  RAID-0 is the best way to lose your data.  If any device
containing any part of the array goes down, you lose at least some of
your data.

RAID-1 is the redundant raid level, where each device in the set
contains a duplicate of the other device(s).

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 16:36   ` Russell King
@ 2003-04-19 16:45     ` John Bradford
  0 siblings, 0 replies; 74+ messages in thread
From: John Bradford @ 2003-04-19 16:45 UTC (permalink / raw)
  To: Russell King; +Cc: John Bradford, Stephan von Krawczynski, linux-kernel

> > A RAID-0 array and regular backups are the best way to protect your
> > data.
> 
> > Correction.  RAID-0 is the best way to lose your data.  If any device
> > containing any part of the array goes down, you lose at least some of
> > your data.
> 
> RAID-1 is the redundant raid level, where each device in the set
> contains a duplicate of the other device(s).

Yes, sorry about that, I was being stupid again :-).

I meant a mirrored array.

John.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 16:22 ` John Bradford
  2003-04-19 16:36   ` Russell King
@ 2003-04-19 16:52   ` Stephan von Krawczynski
  2003-04-19 20:04     ` John Bradford
                       ` (2 more replies)
  2003-04-19 17:54   ` Felipe Alfaro Solana
  2003-04-25  0:07   ` Stewart Smith
  3 siblings, 3 replies; 74+ messages in thread
From: Stephan von Krawczynski @ 2003-04-19 16:52 UTC (permalink / raw)
  To: John Bradford; +Cc: linux-kernel

On Sat, 19 Apr 2003 17:22:18 +0100 (BST)
John Bradford <john@grabjohn.com> wrote:

> > I wonder whether it would be a good idea to give the linux-fs
> > (namely my preferred reiser and ext2 :-) some fault-tolerance.
> 
> Fault tolerance should be done at a lower level than the filesystem.

I know it _should_ be, in a nice and easy world. Unfortunately real life is
different. The simple question is: you have tons of low-level drivers for all
kinds of storage media, but comparably few filesystems. To me this sounds like
the preferred place for this type of behaviour is the fs, because all drivers
inherit the feature if it lives in the fs.

> Linux filesystems are used on many different devices, not just hard
> disks.  Different devices can fail in different ways - a disk might
> have a lot of bad sectors in a block, a tape[1] might have a single
> track which becomes unreadable, and solid state devices might get
> a few random bits flipped all over them if a charged particle passes
> through them.
> 
> The filesystem doesn't know or care what device it is stored on, and
> therefore shouldn't try to predict likely failures.

It should not predict failures; it should possibly only say: "ok, the driver
told me the block I just wanted to write has an error, so let's try a
different one and mark this block in my personal bad-block list as unusable."
This does not sound all too complex. There is a free-block-list anyway...
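
As a user-space sketch, the policy being asked for here is small - something
like the following C fragment, where every name is hypothetical and
driver_write() merely stands in for whatever error the real driver reports:

/* Simulation of the proposed fs-side policy: on a write error,
 * blacklist the block and retry on a freshly allocated one. */
#include <stdbool.h>
#include <string.h>

#define NBLOCKS 1024

static bool bad[NBLOCKS];        /* the per-fs bad-block list           */
static bool used[NBLOCKS];       /* allocation bitmap (free-block list) */
static char disk[NBLOCKS][512];

/* Stand-in for the driver: pretend block 7 has gone bad. */
static bool driver_write(int blk, const char *data)
{
	if (blk == 7)
		return false;
	memcpy(disk[blk], data, 512);
	return true;
}

static int alloc_block(void)
{
	for (int i = 0; i < NBLOCKS; i++)
		if (!used[i] && !bad[i]) {
			used[i] = true;
			return i;
		}
	return -1;			/* the medium really is full */
}

/* Returns the block the data landed on, or -1 if out of space. */
int fs_write_block(const char *data)
{
	int blk;

	while ((blk = alloc_block()) >= 0) {
		if (driver_write(blk, data))
			return blk;
		bad[blk] = true;	/* never hand this block out again */
		used[blk] = false;
	}
	return -1;
}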

> A RAID-0 array and regular backups are the best way to protect your
> data.

RAID-1 obviously ;-)

> [1] Although it is uncommon to use a tape as a block device, you never
> know.  It's certainly possible (though not necessarily with Linux).

From the fs point of view it makes no difference living on disk or tape or a
tesa-strip.

Regards,
Stephan



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 15:29 ` Alan Cox
@ 2003-04-19 17:00   ` Stephan von Krawczynski
  2003-04-19 22:04     ` Alan Cox
  2003-04-20 13:59     ` John Bradford
  2003-04-19 21:13   ` Jos Hulzink
  1 sibling, 2 replies; 74+ messages in thread
From: Stephan von Krawczynski @ 2003-04-19 17:00 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

On 19 Apr 2003 16:29:36 +0100
Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> On Sad, 2003-04-19 at 17:04, Stephan von Krawczynski wrote:
> > after shooting down one of these bloody cute new very-big-and-poor IDE
> > drives today, I wonder whether it would be a good idea to give the
> > linux-fs (namely my preferred reiser and ext2 :-) some fault-tolerance.
> > I remember there were some discussions on this issue some time ago, and
> > I seem to remember it was decided against because it should be the
> > driver's job to give the fs a clean space to live in, right?
>  
> Sometimes disks just go bang. They seem to do it distressingly more
> often nowadays, which (while handy for criminals and pirates) is annoying
> for the rest of us. Putting magic in the file system to handle this is
> hard to do well, and at best you get things like ext2/ext3 have now -
> the ability to recover data in the event of some corruption - unless you
> get into really fancy stuff.

Ok, you mean active error-recovery on reading. My basic point is the writing
case. A simple handling of write errors from the driver level and a retry to
write to a different location could help a lot, I guess.

> Buy IDE disks in pairs, use md RAID-1, and remember to continually send the
> hosed ones back to the vendor/shop (and if they keep appearing DOA, to
> your local trading standards/fair trading type bodies).

Just to give some numbers: of 25 disks I bought during the last half year, 16
have gone dead within the first month. This is ridiculous. Of course they are
all returned and guarantee-replaced, but it gets on one's nerves to
continuously replace disks. The rate could be lowered if one could use them
for at least 4 months (or up to a threshold number of bad blocks mapped by
the fs - still under guarantee, but fewer replacement cycles).

> Perhaps someone should also start a scoreboard for people to report dead
> IDE drives by vendor ;)

I sure have a contribution for it.

> Alan

Regards,
Stephan


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
       [not found] ` <20030419161011$0136@gated-at.bofh.it>
@ 2003-04-19 17:18   ` Florian Weimer
  2003-04-19 18:07     ` Stephan von Krawczynski
                       ` (2 more replies)
  0 siblings, 3 replies; 74+ messages in thread
From: Florian Weimer @ 2003-04-19 17:18 UTC (permalink / raw)
  To: linux-kernel

Stephan von Krawczynski <skraw@ithnet.com> writes:

> Most drives I came across have only small problems (a few dead sectors),

IDE disks automatically remap defective sectors, so you won't see any
of them unless the disk is already quite broken.

Some disks (notably the IBM DTLA series) cannot deal with sudden power
failures during write operations.  In such a case, the sector has an
incorrect checksum and cannot be read until after the next write.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 16:22 ` John Bradford
  2003-04-19 16:36   ` Russell King
  2003-04-19 16:52   ` Stephan von Krawczynski
@ 2003-04-19 17:54   ` Felipe Alfaro Solana
  2003-04-25  0:07   ` Stewart Smith
  3 siblings, 0 replies; 74+ messages in thread
From: Felipe Alfaro Solana @ 2003-04-19 17:54 UTC (permalink / raw)
  To: John Bradford; +Cc: Stephan von Krawczynski, linux-kernel

On Sat, 2003-04-19 at 18:22, John Bradford wrote:
> A RAID-0 array and regular backups are the best way to protect your
> data.

I assume you meant a RAID-1 or RAID-10 array ;-)
-- 
Please AVOID sending me WORD, EXCEL or POWERPOINT attachments.
See http://www.fsf.org/philosophy/no-word-attachments.html
Linux Registered User #287198


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 17:18   ` Florian Weimer
@ 2003-04-19 18:07     ` Stephan von Krawczynski
  2003-04-19 18:41       ` Dr. David Alan Gilbert
  2003-04-19 22:02     ` Alan Cox
  2003-04-25  0:11     ` Stewart Smith
  2 siblings, 1 reply; 74+ messages in thread
From: Stephan von Krawczynski @ 2003-04-19 18:07 UTC (permalink / raw)
  To: Florian Weimer; +Cc: linux-kernel, Alan Cox

On Sat, 19 Apr 2003 19:18:56 +0200
Florian Weimer <fw@deneb.enyo.de> wrote:

> Stephan von Krawczynski <skraw@ithnet.com> writes:
> 
> > Most drives I came across have only small problems (a few dead sectors),
> 
> IDE disks automatically remap defective sectors, so you won't see any
> of them unless the disk is already quite broken.

One year ago I thought basically the same; just to give you some info on today's case (on 2.4.21-pre7-ac1):

Apr 18 22:08:53 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x4, unit #0. 
Apr 18 22:08:57 admin kernel: 3w-xxxx: scsi2: AEN: ERROR: Drive error: Port #0.
Apr 18 22:08:57 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x80, unit #0.
Apr 18 22:08:58 admin kernel: 3w-xxxx: scsi2: Reset succeeded.
Apr 18 22:10:11 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x4b, unit #0.
Apr 18 22:10:13 admin kernel: 3w-xxxx: scsi2: Reset succeeded.
Apr 18 22:11:20 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x58, unit #0. 
Apr 18 22:11:23 admin kernel: 3w-xxxx: scsi2: Reset succeeded.
Apr 18 23:11:27 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x1b, unit #0.
Apr 18 23:11:27 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x1b, unit #0.
Apr 18 23:11:31 admin kernel: 3w-xxxx: scsi2: Reset succeeded.

Apr 19 00:15:47 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x1b, unit #0.
Apr 19 00:15:47 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x1b, unit #0.
Apr 19 00:15:48 admin kernel: 3w-xxxx: scsi2: Reset succeeded.
Apr 19 00:16:03 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x80, unit #0.
Apr 19 00:16:07 admin kernel: 3w-xxxx: scsi2: AEN: ERROR: Drive error: Port #0.
Apr 19 00:16:09 admin kernel: 3w-xxxx: scsi2: Reset succeeded.
Apr 19 00:16:26 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xcb, flags = 0x37, unit #1.
Apr 19 00:16:26 admin kernel: 3w-xxxx: scsi2: AEN: ERROR: Drive error: Port #0.
Apr 19 00:16:26 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xcb, flags = 0x37, unit #1.
Apr 19 00:16:26 admin kernel:  I/O error: dev 08:21, sector 125092104
Apr 19 00:16:26 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xcb, flags = 0x37, unit #1.
Apr 19 00:16:26 admin kernel:  I/O error: dev 08:21, sector 125092104
Apr 19 00:28:06 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x24, unit #0.
Apr 19 00:28:10 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x80, unit #0.
Apr 19 00:28:36 admin kernel: 3w-xxxx: scsi2: Unit #0: Command (f7419c00) timed out, resetting card.
Apr 19 00:28:43 admin kernel: 3w-xxxx: scsi2: Reset succeeded.
Apr 19 00:56:23 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x80, unit #0.
Apr 19 00:56:23 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x4, unit #0. 
Apr 19 00:56:23 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x9, unit #0. 
Apr 19 00:56:23 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x80, unit #0.
Apr 19 00:56:23 admin last message repeated 2 times

Apr 19 00:56:27 admin kernel: 3w-xxxx: scsi2: AEN: ERROR: Drive error: Port #0.
Apr 19 00:56:27 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x80, unit #0.
Apr 19 00:56:54 admin kernel: 3w-xxxx: scsi2: Unit #0: Command (f7415200) timed out, resetting card.
Apr 19 00:56:54 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x80, unit #0. 
Apr 19 00:56:56 admin kernel: 3w-xxxx: scsi2: Reset succeeded.
Apr 19 00:57:30 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x1b, unit #0.
Apr 19 00:57:34 admin kernel: 3w-xxxx: scsi2: AEN: WARNING: ATA port timeout: Port #0.
Apr 19 00:57:59 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x1b, unit #0.
Apr 19 00:58:03 admin kernel: 3w-xxxx: scsi2: AEN: WARNING: ATA port timeout: Port #0.
Apr 19 00:58:29 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x1b, unit #0.
Apr 19 00:58:32 admin kernel: 3w-xxxx: scsi2: AEN: WARNING: ATA port timeout: Port #0.
Apr 19 00:58:58 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x1b, unit #0.
Apr 19 00:59:02 admin kernel: 3w-xxxx: scsi2: AEN: WARNING: ATA port timeout: Port #0.
Apr 19 00:59:27 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x1b, unit #0.
Apr 19 00:59:31 admin kernel: 3w-xxxx: scsi2: AEN: WARNING: ATA port timeout: Port #0.
Apr 19 00:59:56 admin kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x1b, unit #0.

And then reiserfs goes mad:

Apr 19 00:59:56 admin kernel:  I/O error: dev 08:11, sector 53320
Apr 19 00:59:56 admin kernel: journal-601, buffer write failed   
Apr 19 00:59:56 admin kernel: kernel BUG at prints.c:334!
Apr 19 00:59:56 admin kernel: invalid operand: 0000
Apr 19 00:59:56 admin kernel: CPU:    1
Apr 19 00:59:56 admin kernel: EIP:    0010:[reiserfs_panic+56/112]    Not tainted
Apr 19 00:59:56 admin kernel: EIP:    0010:[<c0188128>]    Not tainted
Apr 19 00:59:56 admin kernel: EFLAGS: 00010282
Apr 19 00:59:56 admin kernel: eax: 00000024   ebx: f6ce8c00   ecx: 00000001   edx: c02cb6cc
Apr 19 00:59:56 admin kernel: esi: 00000000   edi: f6ce8c00   ebp: 00000006   esp: c34f5eb8
Apr 19 00:59:56 admin kernel: ds: 0018   es: 0018   ss: 0018
Apr 19 00:59:56 admin kernel: Process kupdated (pid: 9, stackpage=c34f5000)
Apr 19 00:59:56 admin kernel: Stack: c029a58c c036c5c0 f6ce8c00 f8c136b4 c019352a f6ce8c00 c02a3220 00001000
Apr 19 00:59:56 admin kernel:        e32965c0 00000009 00000007 00000000 da2b6c80 00000000 00000014 dde3b000
Apr 19 00:59:56 admin kernel:        00000004 c01976a1 f6ce8c00 f8c136b4 00000001 00000006 f8c1c3c4 00000004
Apr 19 00:59:56 admin kernel: Call Trace:    [flush_commit_list+714/1104] [do_journal_end+1649/2976] [flush_old_commits+292/448] [reiserfs_write_super+112/128] [syn
Apr 19 00:59:56 admin kernel: Call Trace:    [<c019352a>] [<c01976a1>] [<c01968b4>] [<c0184e40>] [<c014894c>]
Apr 19 00:59:56 admin kernel:   [sync_old_buffers+60/176] [kupdate+253/320] [rest_init+0/96] [rest_init+0/96] [arch_kernel_thread+46/64] [kupdate+0/320]
Apr 19 00:59:56 admin kernel:   [<c01479ac>] [<c0147d1d>] [<c0105000>] [<c0105000>] [<c010581e>] [<c0147c20>]
Apr 19 00:59:56 admin kernel: 

Apr 19 00:59:56 admin kernel: Code: 0f 0b 4e 01 58 d4 29 c0 85 db 74 0e 0f b7 43 08 89 04 24 e8
Apr 19 00:59:56 admin kernel:  SCSI disk error : host 2 channel 0 id 0 lun 0 return code = 2   
Apr 19 00:59:56 admin kernel:  I/O error: dev 08:11, sector 285225664
Apr 19 00:59:56 admin kernel:  I/O error: dev 08:11, sector 285225672
Apr 19 00:59:56 admin kernel: SCSI disk error : host 2 channel 0 id 0 lun 0 return code = 2
Apr 19 00:59:56 admin kernel:  I/O error: dev 08:11, sector 285226176
Apr 19 00:59:56 admin kernel:  I/O error: dev 08:11, sector 285226184
Apr 19 00:59:56 admin kernel: SCSI disk error : host 2 channel 0 id 0 lun 0 return code = 2
Apr 19 00:59:56 admin kernel:  I/O error: dev 08:11, sector 285225920
Apr 19 00:59:56 admin kernel:  I/O error: dev 08:11, sector 285225928
Apr 19 00:59:56 admin kernel: SCSI disk error : host 2 channel 0 id 0 lun 0 return code = 2
Apr 19 00:59:56 admin kernel:  I/O error: dev 08:11, sector 285226432
Apr 19 00:59:56 admin kernel:  I/O error: dev 08:11, sector 285226440
Apr 19 00:59:56 admin kernel: SCSI disk error : host 2 channel 0 id 0 lun 0 return code = 2
Apr 19 00:59:56 admin kernel:  I/O error: dev 08:11, sector 285226688
Apr 19 00:59:56 admin kernel:  I/O error: dev 08:11, sector 285226696
Apr 19 00:59:56 admin kernel: SCSI disk error : host 2 channel 0 id 0 lun 0 return code = 2
Apr 19 00:59:56 admin kernel:  I/O error: dev 08:11, sector 285226944
Apr 19 00:59:56 admin kernel:  I/O error: dev 08:11, sector 285226952
Apr 19 01:00:00 admin kernel: 3w-xxxx: scsi2: AEN: WARNING: ATA port timeout: Port #0.

Things turn out to be a bit more complicated, as you may notice ...

Regards,
Stephan



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 18:07     ` Stephan von Krawczynski
@ 2003-04-19 18:41       ` Dr. David Alan Gilbert
  2003-04-19 20:56         ` Helge Hafting
                           ` (4 more replies)
  0 siblings, 5 replies; 74+ messages in thread
From: Dr. David Alan Gilbert @ 2003-04-19 18:41 UTC (permalink / raw)
  To: linux-kernel

Hi,
  Besides the problem that most drive manufacturers now seem to use
cheese as the data storage surface, I think there are some other
problems:

  1) I don't trust drive firmware.
  2) I don't think all drives are set to remap sectors by default.
  3) I don't believe that all drivers recover neatly from a drive error.
  4) It is OK saying return the drive and get a new one - but many of
     us can't do this in a commercial environment where the contents of
     the drive are confidential - leading to stacks of dead drives
     (often many inside their now short warranty periods).

To be fair, I'm not sure if it is only the drive firmware I don't trust -
it could be the controllers and the IDE drivers as well - I don't know.

While RAID works well for drives that just go pop and die, for drives
with dodgy firmware we just sit there and watch the filesystems decay.
I don't think the kernel can do much about that - but it is a sad state.

I'd find two things useful in this respect:
  1) A tool to check the consistency of a RAID - presuming I shut my
     RAID down safely, I should actually be able to use the redundant
     information to test it; this should reveal corruption early.
     (Perhaps the kernel could check a few sectors a second in the
     background.)

  2) A disc exerciser - something that I can use to see if this drive,
     connected to this controller, on this motherboard, on this kernel,
     actually works and keeps its data safe before I put it into live
     service.  (A sketch of such a loop is below.)
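
For the second item, the core of such an exerciser is tiny; a user-space C
sketch (the device path, test size and pattern are arbitrary placeholders,
and a serious tool would add O_DIRECT, random seeks and far longer runs):

/* Write a deterministic pattern across a device, read it back and
 * compare.  Destroys the device contents - point it at a scratch disk. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK 4096

int main(int argc, char **argv)
{
	const char *dev = argc > 1 ? argv[1] : "/dev/hdX";  /* placeholder */
	char out[CHUNK], in[CHUNK];
	long i, chunks = 1024;			/* 4 MB test area */
	int fd = open(dev, O_RDWR);

	if (fd < 0) { perror(dev); return 1; }

	srand(42);				/* deterministic pattern */
	for (i = 0; i < chunks; i++) {
		for (int j = 0; j < CHUNK; j++)
			out[j] = rand() & 0xff;
		if (write(fd, out, CHUNK) != CHUNK) { perror("write"); return 1; }
	}
	fsync(fd);		/* flush the page cache to the device */

	lseek(fd, 0, SEEK_SET);
	srand(42);		/* regenerate the very same pattern */
	for (i = 0; i < chunks; i++) {
		for (int j = 0; j < CHUNK; j++)
			out[j] = rand() & 0xff;
		if (read(fd, in, CHUNK) != CHUNK) { perror("read"); return 1; }
		if (memcmp(in, out, CHUNK))
			fprintf(stderr, "mismatch in chunk %ld\n", i);
	}
	close(fd);
	return 0;
}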

Dave (After a few weeks of fighting pissy IDE hard drives)

 ---------------- Have a happy GNU millennium! ----------------------   
/ Dr. David Alan Gilbert    | Running GNU/Linux on Alpha,68K| Happy  \ 
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 16:52   ` Stephan von Krawczynski
@ 2003-04-19 20:04     ` John Bradford
  2003-04-19 20:33       ` Andreas Dilger
  2003-04-19 20:38       ` Stephan von Krawczynski
  2003-04-19 20:05     ` John Bradford
  2003-04-19 23:13     ` Arnaldo Carvalho de Melo
  2 siblings, 2 replies; 74+ messages in thread
From: John Bradford @ 2003-04-19 20:04 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: John Bradford, linux-kernel

> > > I wonder whether it would be a good idea to give the linux-fs
> > > (namely my preferred reiser and ext2 :-) some fault-tolerance.
> > 
> > Fault tolerance should be done at a lower level than the filesystem.
> 
> I know it _should_ be, in a nice and easy world. Unfortunately
> real life is different. The simple question is: you have tons of
> low-level drivers for all kinds of storage media, but comparably
> few filesystems. To me this sounds like the preferred place for
> this type of behaviour is the fs, because all drivers inherit the
> feature if it lives in the fs.

The sort of feature you are describing would really belong in a
separate layer, somewhat analogous to the MD driver, but for defect
management.  You could create a virtual block device that has 90% of
the capacity of the real block device, then allocate spare blocks from
the real device if and when blocks failed.
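
A sketch of what that layer boils down to (names invented here; this is not
the real MD or EVMS code) - essentially one lookup table consulted on every
write, plus a spare pool carved out of the hidden 10%:

/* Toy defect-remapping layer: logical blocks 0..VIRT_BLOCKS-1 are
 * exposed, the top 10% of the real device is kept as a spare pool.
 * A failed write permanently redirects the logical block to a spare. */
#define REAL_BLOCKS	1000
#define VIRT_BLOCKS	(REAL_BLOCKS * 9 / 10)

static int remap[VIRT_BLOCKS];		/* -1 = not remapped       */
static int next_spare = VIRT_BLOCKS;	/* spares live at the top  */

int real_write(int physical, const void *data);	/* lower level */

void remap_init(void)
{
	for (int i = 0; i < VIRT_BLOCKS; i++)
		remap[i] = -1;
}

int virt_write(int logical, const void *data)
{
	int phys = remap[logical] >= 0 ? remap[logical] : logical;

	while (real_write(phys, data) != 0) {
		if (next_spare >= REAL_BLOCKS)
			return -1;		/* spare pool exhausted */
		phys = next_spare++;
		remap[logical] = phys;		/* redirect from now on */
	}
	return 0;
}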

> > Linux filesystems are used on many different devices, not just hard
> > disks.  Different devices can fail in different ways - a disk might
> > have a lot of bad sectors in a block, a tape[1] might have a single
> > track which becomes unreadable, and solid state devices might get
> > a few random bits flipped all over them if a charged particle passes
> > through them.
> > 
> > The filesystem doesn't know or care what device it is stored on, and
> > therefore shouldn't try to predict likely failures.
> 
> > It should not predict failures; it should possibly only say: "ok, the driver
> > told me the block I just wanted to write has an error, so let's try a
> > different one and mark this block in my personal bad-block list as
> > unusable."  This does not sound all too complex. There is a free-block-list
> > anyway...

Modern disk devices typically already do this kind of defect management.

> > A RAID-0 array and regular backups are the best way to protect your
> > data.
> 
> RAID-1 obviously ;-)

Obviously :-).

> > [1] Although it is uncommon to use a tape as a block device, you never
> > know.  It's certainly possible (though not necessarily with Linux).
> 
> From the fs point of view it makes no difference living on disk or tape or a
> tesa-strip.

It does if you are trying to avoid likely failures, which is what I
was originally thinking about.

Imagine a four-track tape.  It would be valid reasoning to store
important data such as the directory structure on the two inner
tracks, on the basis that the outer tracks were more susceptible to
edge damage.  If blocks are allocated on all tracks at the same time,
that might mean storing the directory on blocks 2,3,6,7,10,11, etc.
Such a layout wouldn't be very useful on a disk where blocks 1-8 were
written on one track, and 9-16 on the next, because the directory
would needlessly span two tracks.

John.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 16:52   ` Stephan von Krawczynski
  2003-04-19 20:04     ` John Bradford
@ 2003-04-19 20:05     ` John Bradford
  2003-04-19 23:13     ` Arnaldo Carvalho de Melo
  2 siblings, 0 replies; 74+ messages in thread
From: John Bradford @ 2003-04-19 20:05 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: John Bradford, linux-kernel

> > > I wonder whether it would be a good idea to give the linux-fs
> > > (namely my preferred reiser and ext2 :-) some fault-tolerance.
> > 
> > Fault tolerance should be done at a lower level than the filesystem.
> 
> I know it _should_ be, in a nice and easy world. Unfortunately
> real life is different. The simple question is: you have tons of
> low-level drivers for all kinds of storage media, but comparably
> few filesystems. To me this sounds like the preferred place for
> this type of behaviour is the fs, because all drivers inherit the
> feature if it lives in the fs.

Unless you write a tar archive to the raw device :-)

John.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 20:04     ` John Bradford
@ 2003-04-19 20:33       ` Andreas Dilger
  2003-04-21  9:25         ` Denis Vlasenko
  2003-04-19 20:38       ` Stephan von Krawczynski
  1 sibling, 1 reply; 74+ messages in thread
From: Andreas Dilger @ 2003-04-19 20:33 UTC (permalink / raw)
  To: John Bradford; +Cc: Stephan von Krawczynski, linux-kernel

On Apr 19, 2003  21:04 +0100, John Bradford wrote:
> > > > I wonder whether it would be a good idea to give the linux-fs
> > > > (namely my preferred reiser and ext2 :-) some fault-tolerance.

I'm not against this in principle, but in practice it is almost useless.
Modern disk drives do bad sector remapping at write time, so unless something
is terribly wrong you will never see a write error (which is exactly the time
that the filesystem could do such remapping).  Normally, you will only see
an error like this at read time, at which point it is too late to fix.

If you do an fsck, it would normally detect the read error and try to write
back the repaired data, and cause the device to do remapping.  It will not
normally be possible to regenerate metadata with anything less than a full
fsck (if at all).
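
Along the same line, a single known-bad sector can often be forced to remap
from user space simply by writing over it - for example (destructive, and
both the device name and the sector number are pure placeholders here):

	dd if=/dev/zero of=/dev/hdX bs=512 seek=125092104 count=1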

> > > Fault tolerance should be done at a lower level than the filesystem.
> > 
> > I know it _should_ be, in a nice and easy world. Unfortunately
> > real life is different. The simple question is: you have tons of
> > low-level drivers for all kinds of storage media, but comparably
> > few filesystems. To me this sounds like the preferred place for
> > this type of behaviour is the fs, because all drivers inherit the
> > feature if it lives in the fs.
> 
> The sort of feature you are describing would really belong in a
> separate layer, somewhat analogous to the MD driver, but for defect
> management.  You could create a virtual block device that has 90% of
> the capacity of the real block device, then allocate spare blocks from
> the real device if and when blocks failed.

Hmm, like the "bad blocks relocation" plugin for EVMS?

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 20:04     ` John Bradford
  2003-04-19 20:33       ` Andreas Dilger
@ 2003-04-19 20:38       ` Stephan von Krawczynski
  2003-04-20 14:21         ` John Bradford
  1 sibling, 1 reply; 74+ messages in thread
From: Stephan von Krawczynski @ 2003-04-19 20:38 UTC (permalink / raw)
  To: John Bradford; +Cc: linux-kernel

On Sat, 19 Apr 2003 21:04:28 +0100 (BST)
John Bradford <john@grabjohn.com> wrote:

> > > > I wonder whether it would be a good idea to give the linux-fs
> > > > (namely my preferred reiser and ext2 :-) some fault-tolerance.
> > > 
> > > Fault tolerance should be done at a lower level than the filesystem.
> > 
> > I know it _should_ be, in a nice and easy world. Unfortunately
> > real life is different. The simple question is: you have tons of
> > low-level drivers for all kinds of storage media, but comparably
> > few filesystems. To me this sounds like the preferred place for
> > this type of behaviour is the fs, because all drivers inherit the
> > feature if it lives in the fs.
> 
> The sort of feature you are describing would really belong in a
> separate layer, somewhat analogous to the MD driver, but for defect
> management.  You could create a virtual block device that has 90% of
> the capacity of the real block device, then allocate spare blocks from
> the real device if and when blocks failed.

Well, of all work-arounds for the problem this one is probably the worst: it
wastes space on good drives and runs out of space for sure on bad ones.
What is so bad about the simple way: the one who wants to write (e.g. the fs)
and knows _where_ to write simply uses another newly allocated block and dumps
the old one on a blacklist. The blacklist is only there so you can count the
bad blocks (or get their sector numbers) in case you are interested. If you
weren't, you might as well mark them allocated and leave it at that (which I
would consider a _bad_ idea). If there are no free blocks left, well, then the
medium is full. And that is just about the only cause for a write error then
(if the medium is writeable at all).
Don't make the thing bigger than it really is...

> 
> > > The filesystem doesn't know or care what device it is stored on, and
> > > therefore shouldn't try to predict likely failures.
> > 
> > It should not predict failures; it should possibly only say: "ok, the
> > driver told me the block I just wanted to write has an error, so let's
> > try a different one and mark this block in my personal bad-block list
> > as unusable."  This does not sound all too complex. There is a
> > free-block-list anyway...
> 
> Modern disk devices typically already do this kind of defect management.

_should_ do, or do you know for sure?
Take real life: it is near midnight and your IDE hd has just used up its last
available bad-block mapping. The next write comes in and your fs goes boom.
You notice next morning and work is waiting for you. Can you mount it again?
Is your data corrupted? Well, you need some luck ...
On the other hand, you could have been informed by mail that your IDE drive's
fs has just mapped another bad block and the total number of bad blocks just
reached 4 percent of the available space on the drive.
Why do you trust hd manufacturing sites in Malaysia or Hungary or wherever,
when you can make _sure_ yourself?
I don't get it, sorry.


> > > [1] Although it is uncommon to use a tape as a block device, you never
> > > know.  It's certainly possible (though not necessarily with Linux).
> > 
> > From the fs point of view it makes no difference living on disk or tape or
> > a tesa-strip.
> 
> It does if you are trying to avoid likely failures, which is what I
> was originally thinking about.

Well, this is for sure nothing for an fs, you are right. But let's fix the
_easy_ things first ...


-- 
Regards,
Stephan

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 18:41       ` Dr. David Alan Gilbert
@ 2003-04-19 20:56         ` Helge Hafting
  2003-04-19 21:15           ` Valdis.Kletnieks
  2003-04-19 21:57         ` Alan Cox
                           ` (3 subsequent siblings)
  4 siblings, 1 reply; 74+ messages in thread
From: Helge Hafting @ 2003-04-19 20:56 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: linux-kernel

On Sat, Apr 19, 2003 at 07:41:20PM +0100, Dr. David Alan Gilbert wrote:
> 4) It is OK saying return the drive and get a new one - but many of
>    us can't do this in a commercial environment where the contents of
>    the drive are confidential - leading to stacks of dead drives
>    (often many inside their now short warranty periods).

There are commercially available programs that guarantee to
wipe your drive clean - including hidden areas and remapped
sectors.  You should then be able to send drives
back for warranty replacement.

There are also bulk erasers that reset every bit magnetically,
but those will probably void the warranty too.  (You'll
need a low-level reformat to recreate sector addresses on the
suddenly blank surface.)


Helge Hafting  		 


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 15:29 ` Alan Cox
  2003-04-19 17:00   ` Stephan von Krawczynski
@ 2003-04-19 21:13   ` Jos Hulzink
  2003-04-20 16:07     ` Stephan von Krawczynski
  1 sibling, 1 reply; 74+ messages in thread
From: Jos Hulzink @ 2003-04-19 21:13 UTC (permalink / raw)
  To: Alan Cox, Stephan von Krawczynski; +Cc: Linux Kernel Mailing List

On Saturday 19 Apr 2003 17:29, Alan Cox wrote:
> On Sad, 2003-04-19 at 17:04, Stephan von Krawczynski wrote:
> > after shooting down one of these bloody cute new very-big-and-poor IDE
> > drives today, I wonder whether it would be a good idea to give the
> > linux-fs (namely my preferred reiser and ext2 :-) some fault-tolerance.
> > I remember there were some discussions on this issue some time ago, and
> > I seem to remember it was decided against because it should be the
> > driver's job to give the fs a clean space to live in, right?
>
> Sometimes disks just go bang. They seem to do it distressingly more
> often nowadays, which (while handy for criminals and pirates) is annoying
> for the rest of us. Putting magic in the file system to handle this is
> hard to do well, and at best you get things like ext2/ext3 have now -
> the ability to recover data in the event of some corruption - unless you
> get into really fancy stuff.

Basically, disks 1) die or 2) get bad sectors. Unfortunately, all disk
problems I have had so far belong in the 1st category. There is nothing to
recover there, or it must be done by professionals (electrical / mechanical
reconstruction of the drive). Talking about the second category: any disk has
ECC these days, and recoverable errors (sector dying, but data still valid)
are detectable and can be handled (badblocks + sector remapping). This all
has nothing to do with filesystems.

Now there is one error left: the unrecoverable data error. Basically this
means you can't trust the data of an entire sector. It might be possible that
only one bit is wrong, true, but for any read/write-mounted filesystem, you
don't want to continue beyond this point before a decent filesystem check has
been done. It might be an option to mount a partition read-only as soon as
errors are discovered (don't make the mess bigger than it already is).

Fault tolerance in a filesystem layer means in practical terms that you are
guessing what a filesystem should look like, for the disk doesn't answer that
question anymore. IMHO you don't want that to be done automagically, for it
might go right sometimes, but also might trash everything on RW filesystems.

Fault tolerance OK, but the fs layer should only detect errors reported by
the lower-level drivers and handle them gracefully (which is something that
might need a little improvement for some fs drivers), or else trust the data
it gets.

Jos

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 20:56         ` Helge Hafting
@ 2003-04-19 21:15           ` Valdis.Kletnieks
  2003-04-20 10:51             ` Helge Hafting
  0 siblings, 1 reply; 74+ messages in thread
From: Valdis.Kletnieks @ 2003-04-19 21:15 UTC (permalink / raw)
  To: Helge Hafting; +Cc: Dr. David Alan Gilbert, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1203 bytes --]

On Sat, 19 Apr 2003 22:56:21 +0200, Helge Hafting said:

> There are commercially available programs that guarantee to
> wipe your drive clean - including hidden areas and remapped
> sectors.  You should then be able to send drives
> back for warranty replacement.

These don't address the problem - if the drive won't go "ready" because
of a blown servo platter, your data won't get overwritten, but it's still
readable (a number of companies make good money at this).

In general, if the disk is dead enough that you're looking at replacement,
you'll probably not be totally pleased with the results of those programs...

> There are also bulk erasers that reset every bit magnetically,
> but those will probably void the warranty too.  (You'll
> need a low-level reformat to recreate sector addresses on the
> suddenly blank surface.)

Note that this only works well for single-platter disks - the field
you need to reach the *inner* surfaces of the platters, especially for
a 5 or 6 platter disk, is quite astounding....

We ran into this issue recently on one of our Sun servers - Sun *does*
have a program to deal with this, but (a) you have to specifically ask
them and (b) it's an additional charge.

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 18:41       ` Dr. David Alan Gilbert
  2003-04-19 20:56         ` Helge Hafting
@ 2003-04-19 21:57         ` Alan Cox
  2003-04-20 10:09         ` Geert Uytterhoeven
                           ` (2 subsequent siblings)
  4 siblings, 0 replies; 74+ messages in thread
From: Alan Cox @ 2003-04-19 21:57 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Linux Kernel Mailing List

On Sad, 2003-04-19 at 19:41, Dr. David Alan Gilbert wrote:
>   2) I don't think all drives are set to remap sectors by default.

I don't know any that do not remap on write if needed. 

>   3) I don't believe that all drivers recover neatly from a drive error.

For IDE we have some issues with ATA6 drives in certain cases at least. 

>   2) A disc exerciser - something that I can use to see if this drive,
>      connected to this controller, on this motherboard, on this kernel,
>      actually works and keeps its data safe before I put it into live
>      service.

SMART supports some of this.  Andre also has some disk stress testing
tools.
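
With the smartmontools package, for instance, the drive's own self-test and
health data can be driven by hand (the device name is an example):

	smartctl -t long /dev/hda	# start the drive's long self-test
	smartctl -a /dev/hda		# dump SMART attributes and test results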


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 17:18   ` Florian Weimer
  2003-04-19 18:07     ` Stephan von Krawczynski
@ 2003-04-19 22:02     ` Alan Cox
  2003-04-20  8:41       ` Arjan van de Ven
  2003-04-25  0:11     ` Stewart Smith
  2 siblings, 1 reply; 74+ messages in thread
From: Alan Cox @ 2003-04-19 22:02 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Linux Kernel Mailing List

On Sad, 2003-04-19 at 18:18, Florian Weimer wrote:
> Stephan von Krawczynski <skraw@ithnet.com> writes:
> 
> > Most drives I came across have only small problems (a few dead sectors),
> 
> IDE disks automatically remap defective sectors, so you won't see any
> of them unless the disk is already quite broken.

You will if it writes and fails to read back. The disk can't invent a
sector that is gone. 


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 17:00   ` Stephan von Krawczynski
@ 2003-04-19 22:04     ` Alan Cox
  2003-04-20 16:24       ` Stephan von Krawczynski
  2003-04-20 13:59     ` John Bradford
  1 sibling, 1 reply; 74+ messages in thread
From: Alan Cox @ 2003-04-19 22:04 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: Linux Kernel Mailing List

On Sad, 2003-04-19 at 18:00, Stephan von Krawczynski wrote:
> Ok, you mean active error-recovery on reading. My basic point is the writing
> case. A simple handling of write errors from the driver level and a retry to
> write to a different location could help a lot, I guess.

It would make no difference. The IDE drive firmware already knows about
such things.

> Just to give some numbers: of 25 disks I bought during the last half year,
> 16 have gone dead within the first month. This is ridiculous. Of course they
> are all returned and guarantee-replaced, but it gets on one's nerves to
> continuously replace disks. The rate could be lowered if one could use them
> for at least 4 months (or up to a threshold number of bad blocks mapped by
> the fs - still under guarantee, but fewer replacement cycles).

I'd be changing vendors and also looking at my power/heat/vibration for
that level of problems. I'm sure Google considers hard disks a
consumable, but not the rest of us 8)


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 16:52   ` Stephan von Krawczynski
  2003-04-19 20:04     ` John Bradford
  2003-04-19 20:05     ` John Bradford
@ 2003-04-19 23:13     ` Arnaldo Carvalho de Melo
  2 siblings, 0 replies; 74+ messages in thread
From: Arnaldo Carvalho de Melo @ 2003-04-19 23:13 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: John Bradford, linux-kernel

On Sat, Apr 19, 2003 at 06:52:01PM +0200, Stephan von Krawczynski wrote:
> On Sat, 19 Apr 2003 17:22:18 +0100 (BST)
> > A RAID-0 array and regular backups are the best way to protect your
> > data.
> 
> RAID-1 obviously ;-)

Have you considered this:

http://www.complang.tuwien.ac.at/reisner/drbd/

?

- Arnaldo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 22:02     ` Alan Cox
@ 2003-04-20  8:41       ` Arjan van de Ven
  0 siblings, 0 replies; 74+ messages in thread
From: Arjan van de Ven @ 2003-04-20  8:41 UTC (permalink / raw)
  To: Alan Cox; +Cc: Florian Weimer, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 622 bytes --]

On Sun, 2003-04-20 at 00:02, Alan Cox wrote:
> On Sad, 2003-04-19 at 18:18, Florian Weimer wrote:
> > Stephan von Krawczynski <skraw@ithnet.com> writes:
> > 
> > > Most drives I came across have only small problems (a few dead sectors),
> > 
> > IDE disks automatically remap defective sectors, so you won't see any
> > of them unless the disk is already quite broken.
> 
> You will if it writes and fails to read back. The disk can't invent a
> sector that is gone. 

but Linux can if you use a RAID-1 mirror... maybe we should teach the md
layer to write back the data from the other disk on a "bad sector"
error.
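
In sketch form the suggestion is roughly this (function names invented for
illustration; the real md code is organised quite differently):

/* On a read error from one mirror, serve the data from the other
 * and immediately write it back to the failed disk, giving the
 * drive a chance to remap the bad sector on write. */
int disk_read(int disk, long sector, void *buf);	/* lower level */
int disk_write(int disk, long sector, const void *buf);

int raid1_read(long sector, void *buf)
{
	if (disk_read(0, sector, buf) == 0)
		return 0;

	/* Primary failed: fall back to the mirror. */
	if (disk_read(1, sector, buf) != 0)
		return -1;			/* both copies gone */

	/* Rewrite the bad copy; the drive should remap on write. */
	disk_write(0, sector, buf);
	return 0;
}
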

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 18:41       ` Dr. David Alan Gilbert
  2003-04-19 20:56         ` Helge Hafting
  2003-04-19 21:57         ` Alan Cox
@ 2003-04-20 10:09         ` Geert Uytterhoeven
  2003-04-21  8:37         ` Denis Vlasenko
  2003-05-05 12:38         ` Pavel Machek
  4 siblings, 0 replies; 74+ messages in thread
From: Geert Uytterhoeven @ 2003-04-20 10:09 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Linux Kernel Development

On Sat, 19 Apr 2003, Dr. David Alan Gilbert wrote:
>   Besides the problem that most drive manufacturers now seem to use
> cheese as the data storage surface, I think there are some other
> problems:
> 
>   1) I don't trust drive firmware.

Time for Open Source/Free Software to enter a new market segment... ;-)

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 21:15           ` Valdis.Kletnieks
@ 2003-04-20 10:51             ` Helge Hafting
  2003-04-20 19:04               ` Valdis.Kletnieks
  0 siblings, 1 reply; 74+ messages in thread
From: Helge Hafting @ 2003-04-20 10:51 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: linux-kernel

On Sat, Apr 19, 2003 at 05:15:31PM -0400, Valdis.Kletnieks@vt.edu wrote:
> On Sat, 19 Apr 2003 22:56:21 +0200, Helge Hafting said:
> 
> > There are commercially available programs that guarantee to
> > wipe your drive clean - including hidden areas and remapped
> > sectors.  You should then be able to send drives
> > back for warranty replacement.
> 
> These don't address the problem - if the drive won't go "ready" because
> of a blown servo platter, your data won't get overwritten, but it's still
> readable (a number of companies make good money at this).
> 
I see.  Your data is so special that you expect people to pay for
reconstruction, hoping to find something that pays for all
that trouble and more.

> In general, if the disk is dead enough that you're looking at replacement,
> you'll probably not be totally pleased with the results of those programs...
> 
I have replaced a couple of drives in my life - because a few sectors
didn't read back right.  I expect an overwrite program to be just
fine under such circumstances.

> > There are also bulk erasers that reset every bit magnetically,
> > but those will probably void the warranty too.  (You'll
> > need a low-level reformat to recreate sector addresses on the
> > suddenly blank surface.)
> 
> Note that this only works well for single-platter disks - the field
> you need to reach the *inner* surfaces of the platters, especially for
> a 5 or 6 platter disk, is quite astounding....

Why would it be hard to reach the inner surfaces?  The platters
are not superconducting, so the outer ones do not shield the
inner ones from a strong magnetic field.  You should be fine
as long as the field extends far enough to cover the entire
drive.  A high-frequency device might have trouble,
but you don't need that - even a static field will do.

Helge Hafting



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 17:00   ` Stephan von Krawczynski
  2003-04-19 22:04     ` Alan Cox
@ 2003-04-20 13:59     ` John Bradford
  2003-04-20 16:55       ` Stephan von Krawczynski
  1 sibling, 1 reply; 74+ messages in thread
From: John Bradford @ 2003-04-20 13:59 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: Alan Cox, linux-kernel

> Ok, you mean active error-recovery on reading. My basic point is the writing
> case. A simple handling of write errors from the driver level and a retry to
> write to a different location could help a lot, I guess.

A filesystem is not the place for that - it could either be done at a
lower level, like I suggested in a separate post, or at a much higher
level - e.g. a database which encounters a write error could dump its
entire contents to a tape drive, shut down, and page an
administrator, on the basis that the write error indicates impending
drive failure.

> > Buy IDE disks in pairs, use md RAID-1, and remember to continually send the
> > hosed ones back to the vendor/shop (and if they keep appearing DOA, to
> > your local trading standards/fair trading type bodies).
> 
> Just to give some numbers: of 25 disks I bought during the last half
> year, 16 have gone dead within the first month. This is
> ridiculous. Of course they are all returned and guarantee-replaced,
> but it gets on one's nerves to continuously replace disks. The rate
> could be lowered if one could use them for at least 4 months (or up
> to a threshold number of bad blocks mapped by the fs - still under
> guarantee, but fewer replacement cycles).

Are you using the disks within their operational limits?  Are you sure
they are not overheating and/or being run 24/7 when they are not
intended to be?

John.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 20:38       ` Stephan von Krawczynski
@ 2003-04-20 14:21         ` John Bradford
  2003-04-21  9:09           ` Denis Vlasenko
  0 siblings, 1 reply; 74+ messages in thread
From: John Bradford @ 2003-04-20 14:21 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: John Bradford, linux-kernel

> 
> On Sat, 19 Apr 2003 21:04:28 +0100 (BST)
> John Bradford <john@grabjohn.com> wrote:
> 
> > > > > I wonder whether it would be a good idea to give the linux-fs
> > > > > (namely my preferred reiser and ext2 :-) some fault-tolerance.
> > > > 
> > > > Fault tolerance should be done at a lower level than the filesystem.
> > > 
> > > I know it _should_ be, in a nice and easy world. Unfortunately
> > > real life is different. The simple question is: you have tons of
> > > low-level drivers for all kinds of storage media, but comparably
> > > few filesystems. To me this sounds like the preferred place for
> > > this type of behaviour is the fs, because all drivers inherit the
> > > feature if it lives in the fs.
> > 
> > The sort of feature you are describing would really belong in a
> > separate layer, somewhat analogous to the MD driver, but for defect
> > management.  You could create a virtual block device that has 90% of
> > the capacity of the real block device, then allocate spare blocks from
> > the real device if and when blocks failed.
> 
> Well, of all work-arounds for the problem this one is probably the
> worst: it wastes space on good drives and runs out of space for sure
> on bad ones.

If 10% of the disk is bad, I wouldn't continue using it.

> What is so bad about the simple way: the one who wants to write
> (e.g. the fs) and knows _where_ to write simply uses another newly
> allocated block and dumps the old one on a blacklist. The blacklist
> is only there so you can count the bad blocks (or get their sector
> numbers) in case you are interested. If you weren't, you might as
> well mark them allocated and leave it at that (which I would consider
> a _bad_ idea). If there are no free blocks left, well, then the
> medium is full. And that is just about the only cause for a write
> error then (if the medium is writeable at all).

Modern disks generally do this kind of thing themselves.  By the time
a disk actually reports write errors, I wouldn't want to continue
using it.  Preferably, I want to know _before_ then, generally by
using S.M.A.R.T. data.

> Don't make the thing bigger than it really is...

The problem you are describing doesn't really exist in a lot of
cases.  Modern hard disks do not have fixed areas corresponding to
specific blocks - they allocate the available space to blocks as
required.  The disk will just allocate a different area to hold the
block that was originally on the defective part of the media when that
block is re-written.

John.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 21:13   ` Jos Hulzink
@ 2003-04-20 16:07     ` Stephan von Krawczynski
  2003-04-20 16:40       ` John Bradford
  0 siblings, 1 reply; 74+ messages in thread
From: Stephan von Krawczynski @ 2003-04-20 16:07 UTC (permalink / raw)
  To: Jos Hulzink; +Cc: alan, linux-kernel

On Sat, 19 Apr 2003 23:13:53 +0200
Jos Hulzink <josh@stack.nl> wrote:

> [...]
> Fault tolerance in a filesystem layer means in practical terms that you are
> guessing what a filesystem should look like, for the disk doesn't answer that
> question anymore. IMHO you don't want that to be done automagically, for it
> might go right sometimes, but also might trash everything on RW filesystems.

Let me clarify again: I don't want fancy stuff inside the filesystem that
magically knows something about right-or-wrong. The only _very small_
enhancement I would like to see is: the driver tells the fs there is an error
while writing a certain block => the fs tries writing the same data onto
another block. That's it, no magic, no RAID stuff. Very simple.

> Fault tolerance OK, but the fs layer should only detect errors reported by
> the lower-level drivers and handle them gracefully (which is something that
> might need a little improvement for some fs drivers), or else trust the data
> it gets.

You are completely right, I don't want any more than that: nice management of
an error a low-level driver reports to the fs. Only I would like to see as an
fs-answer to this: ok, let's try another part of the media. Currently it just
sinks like the Titanic.

Regards,
Stephan

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 22:04     ` Alan Cox
@ 2003-04-20 16:24       ` Stephan von Krawczynski
  0 siblings, 0 replies; 74+ messages in thread
From: Stephan von Krawczynski @ 2003-04-20 16:24 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

On 19 Apr 2003 23:04:36 +0100
Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> On Sad, 2003-04-19 at 18:00, Stephan von Krawczynski wrote:
> > Ok, you mean active error-recovery on reading. My basic point is the
> > writing case. A simple handling of write errors from the driver level and
> > a retry to write to a different location could help a lot, I guess.
> 
> It would make no difference. The IDE drive firmware already knows about
> such things.

Hm, maybe this is only another field where "knowing" differs from "doing" (the
right thing) sometimes.

> > Just to give some numbers: of 25 disks I bought during the last half year,
> > 16 have gone dead within the first month. This is ridiculous. Of course
> > they are all returned and guarantee-replaced, but it gets on one's nerves
> > to continuously replace disks. The rate could be lowered if one could use
> > them for at least 4 months (or up to a threshold number of bad blocks
> > mapped by the fs - still under guarantee, but fewer replacement cycles).
> 
> I'd be changing vendors and also looking at my power/heat/vibration for
> that level of problems. I'm sure google consider hard disks as a
> consumable but not the rest of us 8)

Maybe I have something in common with google: I am re-writing large parts
(well over 50%) of the hard drive's capacity on a daily basis (in the
discussed setup).
How many people really do that?

Regards,
Stephan


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-20 16:07     ` Stephan von Krawczynski
@ 2003-04-20 16:40       ` John Bradford
  2003-04-20 17:01         ` Stephan von Krawczynski
  0 siblings, 1 reply; 74+ messages in thread
From: John Bradford @ 2003-04-20 16:40 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: Jos Hulzink, alan, linux-kernel

> > Fault tolerance in a filesystem layer means in practical terms
> > that you are guessing what a filesystem should look like, for the
> > disk doesn't answer that question anymore. IMHO you don't want
> > that to be done automagically, for it might go right sometimes,
> > but also might trash everything on RW filesystems.
> 
> Let me clarify again: I don't want fancy stuff inside the filesystem that
> magically knows something about right-or-wrong. The only _very small_
> enhancement I would like to see is: driver tells fs there is an error while
> writing a certain block => fs tries writing the same data onto another block.
> That's it, no magic, no RAID stuff. Very simple.

That doesn't belong in the filesystem.

Imagine you have ten blocks free, and you allocate data to all of them
in the filesystem.  The write goes to cache, and succeeds.

30 seconds later, the write cache is flushed, and an error is reported
back from the device.

John.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-20 13:59     ` John Bradford
@ 2003-04-20 16:55       ` Stephan von Krawczynski
  2003-04-20 17:12         ` John Bradford
  0 siblings, 1 reply; 74+ messages in thread
From: Stephan von Krawczynski @ 2003-04-20 16:55 UTC (permalink / raw)
  To: John Bradford; +Cc: alan, linux-kernel

On Sun, 20 Apr 2003 14:59:00 +0100 (BST)
John Bradford <john@grabjohn.com> wrote:

> > Ok, you mean active error-recovery on reading. My basic point is the
> > writing case. A simple handling of write-errors from the drivers level and
> > a retry to write on a different location could help a lot I guess.
> 
> A filesystem is not the place for that - it could either be done at a
> lower level, like I suggested in a separate post, or at a much higher
> level - E.G. a database which encounters a write error could dump its
> entire contents to a tape drive, shut down, and page an
> administrator, on the basis that the write error indicated impending
> drive failure.

Can you tell me what is so particularly bad about the idea of coping a little
bit with braindead (or just-dying) hardware?
See, a car (to name a really good example) is not primarily built to have
accidents. Anyway, everybody might agree that having a safety belt built into
it is a good idea, just to make the best out of a bad situation - even if it
never happens - or not?

> Are you using the disks within their operational limits?  Are you sure
> they are not overheating and/or being run 24/7 when they are not
> intended to be?

No. The only thing we do is completely re-write them once a day (data gets
exchanged). So our usage pattern is not: dump data on it and that's it (like
most people might do with big disks).

Regards,
Stephan


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-20 16:40       ` John Bradford
@ 2003-04-20 17:01         ` Stephan von Krawczynski
  2003-04-20 17:20           ` John Bradford
  0 siblings, 1 reply; 74+ messages in thread
From: Stephan von Krawczynski @ 2003-04-20 17:01 UTC (permalink / raw)
  To: John Bradford; +Cc: josh, alan, linux-kernel

On Sun, 20 Apr 2003 17:40:29 +0100 (BST)
John Bradford <john@grabjohn.com> wrote:

> > > Fault tolerance in a filesystem layer means in practical terms
> > > that you are guessing what a filesystem should look like, for the
> > > disk doesn't answer that question anymore. IMHO you don't want
> > > that to be done automagically, for it might go right sometimes,
> > > but also might trash everything on RW filesystems.
> > 
> > Let me clarify again: I don't want fancy stuff inside the filesystem that
> > magically knows something about right-or-wrong. The only _very small_
> > enhancement I would like to see is: driver tells fs there is an error while
> > writing a certain block => fs tries writing the same data onto another
> > block. That's it, no magic, no RAID stuff. Very simple.
> 
> That doesn't belong in the filesystem.
> 
> Imagine you have ten blocks free, and you allocate data to all of them
> in the filesystem.  The write goes to cache, and succeeds.
> 
> 30 seconds later, the write cache is flushed, and an error is reported
> back from the device.

And where's the problem?
Your case:
Immediate failure. Disk error.

My case:
Immediate failure. Disk error (no space left for replacement)

There's no difference.


Thing is: if there are 11 blocks free and not ten, then you fail and I succeed
(if there's one bad block). You lose data, I don't.


Regards,
Stephan

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-20 16:55       ` Stephan von Krawczynski
@ 2003-04-20 17:12         ` John Bradford
  2003-04-20 17:21           ` Stephan von Krawczynski
  0 siblings, 1 reply; 74+ messages in thread
From: John Bradford @ 2003-04-20 17:12 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: John Bradford, alan, linux-kernel

> 
> On Sun, 20 Apr 2003 14:59:00 +0100 (BST)
> John Bradford <john@grabjohn.com> wrote:
> 
> > > Ok, you mean active error-recovery on reading. My basic point is the
> > > writing case. A simple handling of write-errors from the drivers level and
> > > a retry to write on a different location could help a lot I guess.
> > 
> > A filesystem is not the place for that - it could either be done at a
> > lower level, like I suggested in a separate post, or at a much higher
> > level - E.G. a database which encounters a write error could dump its
> > entire contents to a tape drive, shut down, and page an
> > administrator, on the basis that the write error indicated impending
> > drive failure.
> 
> Can you tell me what is so particularly bad about the idea of coping a
> little bit with braindead (or just-dying) hardware?

Nothing - what is wrong is to implement it in a filesystem, where it
does not belong.

> See, a car (to name a real good example) is not primarily built to have
> accidents.

Stunt cars are built to survive accidents.  All cars _could_ be built
like stunt cars, but they aren't.

> Anyway everybody might agree that having a safety belt built into it
> is a good idea, just to make the best out of a bad situation - even
> if it never happens - , or not?

Exactly, that is why most modern hard disks retry on write failure.

John.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-20 17:01         ` Stephan von Krawczynski
@ 2003-04-20 17:20           ` John Bradford
  2003-04-21  9:32             ` Stephan von Krawczynski
  0 siblings, 1 reply; 74+ messages in thread
From: John Bradford @ 2003-04-20 17:20 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: John Bradford, josh, alan, linux-kernel

> > > > Fault tolerance in a filesystem layer means in practical terms
> > > > that you are guessing what a filesystem should look like, for the
> > > > disk doesn't answer that question anymore. IMHO you don't want
> > > > that to be done automagically, for it might go right sometimes,
> > > > but also might trash everything on RW filesystems.
> > > 
> > > Let me clarify again: I don't want fancy stuff inside the filesystem that
> > > magically knows something about right-or-wrong. The only _very small_
> > > enhancement I would like to see is: driver tells fs there is an
> > > error while writing a certain block => fs tries writing the same
> > > data onto another block. That's it, no magic, no RAID
> > > stuff. Very simple. 
> > 
> > That doesn't belong in the filesystem.
> > 
> > Imagine you have ten blocks free, and you allocate data to all of them
> > in the filesystem.  The write goes to cache, and succeeds.
> > 
> > 30 seconds later, the write cache is flushed, and an error is reported
> > back from the device.
> 
> And where's the problem?
> Your case:
> Immediate failure. Disk error.
> 
> My case:
> Immediate failure. Disk error (no space left for replacement)
> 
> There's no difference.

In my case, the machine can continue as normal.  The filesystem is
intact, (with no blocks free).  The block device driver has to cope
with the error, which could be as simple as holding the data in RAM
until an operator has been paged to replace the disk.

In your case, the filesystem is no longer in a usable state.  If that
was the root filesystem, the machine will, at best, probably go into
single user mode, with a read-only root filesystem.

> Thing is: If there are 11 blocks free and not ten, then you fail

Wrong.  See above.

> and I succeed (if there's one bad block). You lose data, I don't.

John.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-20 17:12         ` John Bradford
@ 2003-04-20 17:21           ` Stephan von Krawczynski
  2003-04-20 18:48             ` Alan Cox
  0 siblings, 1 reply; 74+ messages in thread
From: Stephan von Krawczynski @ 2003-04-20 17:21 UTC (permalink / raw)
  To: John Bradford; +Cc: john, alan, linux-kernel

On Sun, 20 Apr 2003 18:12:54 +0100 (BST)
John Bradford <john@grabjohn.com> wrote:

> > Can you tell me what is so particularly bad about the idea of coping a
> > little bit with braindead (or just-dying) hardware?
> 
> Nothing - what is wrong is to implement it in a filesystem, where it
> does not belong.

I know you probably favor a layer between the low-level driver and the fs.
Sure, it is a clean design, and sure, it sounds like overhead (Yet Another
Layer).

> > See, a car (to name a real good example) is not primarily built to have
> > accidents.
> 
> Stunt cars are built to survive accidents.  All cars _could_ be built
> like stunt cars, but they aren't.

Well, I do really hope that my BMW is built to survive accidents, too. Because
if it is not, I will go and buy a Mercedes immediately. We are looking at
passive safety here, and if it _can_ make a difference to spend one buck more,
then I will do so ...

Regards,
Stephan


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-20 17:21           ` Stephan von Krawczynski
@ 2003-04-20 18:48             ` Alan Cox
  2003-04-20 20:00               ` John Bradford
  0 siblings, 1 reply; 74+ messages in thread
From: Alan Cox @ 2003-04-20 18:48 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: John Bradford, Linux Kernel Mailing List

On Sul, 2003-04-20 at 18:21, Stephan von Krawczynski wrote:
> I know you probably favor a layer between the low-level driver and the fs.
> Sure, it is a clean design, and sure, it sounds like overhead (Yet Another
> Layer).

Wrong again - it's actually irrelevant to the cost of mirroring data; the cost
is entirely in the PCI and memory bandwidth. The raid1 management overhead is
almost nil.
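
(And setting such a mirror up is a one-liner these days, e.g. with mdadm -
assuming two IDE disks on separate channels:

    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/hda1 /dev/hdc1

The fs on top of /dev/md0 never sees the single-disk write error.)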


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-20 10:51             ` Helge Hafting
@ 2003-04-20 19:04               ` Valdis.Kletnieks
  0 siblings, 0 replies; 74+ messages in thread
From: Valdis.Kletnieks @ 2003-04-20 19:04 UTC (permalink / raw)
  To: Helge Hafting; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1997 bytes --]

On Sun, 20 Apr 2003 12:51:54 +0200, Helge Hafting said:

> > These don't address the problem - if the drive won't go "ready" because
> > of a blown servo platter, your data won't get overwritten but it's still
> > readable (a number of companies make good money at this).
> > 
> I see.  Your data is so special that you expect people to pay for
> reconstruction hoping to find something that pays for all
> that trouble and more.

No, I don't consider my data that special - I just say "screw it" and
go looking for the backups.  However, my point is that it's not THAT
hard to salvage data off drives - it's easy enough that companies are
making a living doing it.

The question is regarding sites that are paranoid about shipping out
their data when such recovery is possible.

> Why would it be hard to reach the inner surfaces - the disks
> are not superconducting so the outer ones do not shield the
> inner ones from a strong magnetic field.  You should be fine
> as long as the field extends far enough to get the entire
> drive.  A high-frequency device might have trouble,
> but you don't need that - even a static field will do.

I didn't say it was impossible - I said you had to use a degausser that
was up to the task.  A degausser that is rated to bulk-erase tape media
is probably *NOT* sufficiently strong to do a multi-platter disk.  Most
tapes need 350 Oe or so to bulk, while disks are in the 1500 Oe range.

And you need a really big-ass magnet to make a 1500 Oe field big enough to
cover an entire disk (especially if it's an older disk of the 5.25" or
even larger variety - older IBM mainframe disks were up to 14" platters).

The Canadian government seems to agree - they recommend either getting the
degaussing wand between the platters (page 10) or using a cavity degausser
(page 12).  Note that they do *not* certify cavity degaussers for disk drives
over 3 1/2" with a coercivity over 1100 Oe.

http://collection.nlc-bnc.ca/100/200/301/cse-cst/cse_app-ef/itspl-02.pdf



[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-20 18:48             ` Alan Cox
@ 2003-04-20 20:00               ` John Bradford
  2003-04-21  1:51                 ` jw schultz
  0 siblings, 1 reply; 74+ messages in thread
From: John Bradford @ 2003-04-20 20:00 UTC (permalink / raw)
  To: Alan Cox
  Cc: Stephan von Krawczynski, John Bradford, Linux Kernel Mailing List

> > I know you probably favor a layer between the low-level driver and the
> > fs. Sure, it is a clean design, and sure, it sounds like overhead (Yet
> > Another Layer).
> 
> Wrong again - it's actually irrelevant to the cost of mirroring data; the
> cost is entirely in the PCI and memory bandwidth. The raid1 management
> overhead is almost nil.

Actually what I was suggesting was even simpler - in the unlikely
event that we were talking about an MFM or similar interface disk that
_was_ basically like a big floppy, and did no error correction of its
own, we _could_ reserve, say, one sector per track, and create a
fault tolerant device that substituted the spare sector in the event
of a write fault.

The overhead would probably be exactly zero, because nobody would
actually compile the feature in and use it :-).
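
Something like this, hypothetically (not real driver code, and the remap
table would of course have to be persisted somewhere):

    /* one spare sector per track; remap[t] holds the sector it replaced */
    int ft_write(struct ft_dev *d, sector_t s, const void *buf)
    {
            unsigned int t;

            if (raw_write(d->disk, s, buf) == 0)
                    return 0;
            t = track_of(d, s);
            if (d->remap[t] != SECTOR_UNUSED)
                    return -EIO;            /* this track's spare is taken */
            d->remap[t] = s;
            return raw_write(d->disk, track_spare(d, t), buf);
    }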

John.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-20 20:00               ` John Bradford
@ 2003-04-21  1:51                 ` jw schultz
  0 siblings, 0 replies; 74+ messages in thread
From: jw schultz @ 2003-04-21  1:51 UTC (permalink / raw)
  To: Linux Kernel Mailing List

On Sun, Apr 20, 2003 at 09:00:16PM +0100, John Bradford wrote:
> > > I know you probably favor a layer between the low-level driver and the
> > > fs. Sure, it is a clean design, and sure, it sounds like overhead (Yet
> > > Another Layer).
> > 
> > Wrong again - it's actually irrelevant to the cost of mirroring data; the
> > cost is entirely in the PCI and memory bandwidth. The raid1 management
> > overhead is almost nil.
> 
> Actually what I was suggesting was even simpler - in the unlikely
> event that we were talking about an MFM or similar interface disk that
> _was_ basically like a big floppy, and did no error correction of its
> own, we _could_ reserve, say, one sector per track, and create a
> fault tolerant device that substituted the spare sector in the event
> of a write fault.
> 
> The overhead would probably be exactly zero, because nobody would
> actually compile the feature in and use it :-).

UFS used to do this because of MFM disks.  In building a
filesystem you could provide a list of bad blocks and the
filesystem would maintain a block remap table.  On Solaris
at least the reserved space is still required.  By the time
most of the modern filesystems were created any hard disk
worth using in production had its own CPU and memory and
merely emulated a disk drive to the host while managing zone
recording and block remapping internally. 
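
(ext2 still carries that heritage: assuming e2fsprogs is at hand, something
like

    badblocks -o bad.list /dev/hda1   # scan, write block numbers to a file
    mke2fs -l bad.list /dev/hda1      # build the fs around them

puts the listed blocks into the bad-block inode so they are never
allocated.)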

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 18:41       ` Dr. David Alan Gilbert
                           ` (2 preceding siblings ...)
  2003-04-20 10:09         ` Geert Uytterhoeven
@ 2003-04-21  8:37         ` Denis Vlasenko
  2003-05-05 12:38         ` Pavel Machek
  4 siblings, 0 replies; 74+ messages in thread
From: Denis Vlasenko @ 2003-04-21  8:37 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, linux-kernel

On 19 April 2003 21:41, Dr. David Alan Gilbert wrote:
> Hi,
>   Besides the problem that most drive manufacturers now seem to use
> cheese as the data storage surface, I think there are some other
> problems:
>
>  1) I don't trust drive firmware.
>  2) I don't think all drives are set to remap sectors by default.
>  3) I don't believe that all drivers recover neatly from a drive error.
>  4) It is OK saying return the drive and get a new one - but many of
>     us can't do this in a commercial environment where the contents of
>     the drive are confidential - leading to stacks of dead drives
>     (often many inside their now short warranty periods).

And sadly,

   2) I don't trust Linux (driver + fs)
      will react adequately to disk errors.

Drive failures aren't that frequent, and the relevant code paths are
doomed to stay rarely tested (unless we put 0.00001% 'faked'
failures there ;)
--
vda

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-20 14:21         ` John Bradford
@ 2003-04-21  9:09           ` Denis Vlasenko
  2003-04-21  9:35             ` John Bradford
  0 siblings, 1 reply; 74+ messages in thread
From: Denis Vlasenko @ 2003-04-21  9:09 UTC (permalink / raw)
  To: John Bradford, Stephan von Krawczynski; +Cc: John Bradford, linux-kernel

On 20 April 2003 17:21, John Bradford wrote:
> > What is so bad about the simple way: the one who wants to write
> > (e.g. fs) and knows _where_ to write simply uses another newly
> > allocated block and dumps the old one on a blacklist. The blacklist
> > only for being able to count them (or get the sector-numbers) in
> > case you are interested. If you weren't you might as well mark them
> > allocated and that's it (which I would presume a _bad_ idea). If
> > there are no free blocks left, well, then the medium is full. And
> > that is just about the only cause for a write error then (if the
> > medium is writeable at all).
>
> Modern disks generally do this kind of thing themselves.  By the time
               ^^^^^^^^^^^^
How many times does Stephan need to say it? 'Generally do'
is not enough, because it means 'sometimes they don't'.

Most filesystems *are* designed with badblock lists and such;
it is possible to teach fs drivers to tolerate write errors
by adding affected blocks to the list and continuing (as opposed
to 'remounted ro, BOOM!'). As usual, this can only happen if someone
will step forward and code it.
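
The offline half of this already exists, e.g.:

    e2fsck -c /dev/hdb1   # run badblocks(8), fold hits into the bad-block inode

The missing piece is doing the same thing online, at the moment a write
fails.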

Do you think it would be a Wrong Thing to do?
--
vda

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 20:33       ` Andreas Dilger
@ 2003-04-21  9:25         ` Denis Vlasenko
  2003-04-21  9:42           ` John Bradford
  0 siblings, 1 reply; 74+ messages in thread
From: Denis Vlasenko @ 2003-04-21  9:25 UTC (permalink / raw)
  To: Andreas Dilger, John Bradford; +Cc: Stephan von Krawczynski, linux-kernel

On 19 April 2003 23:33, Andreas Dilger wrote:
> On Apr 19, 2003  21:04 +0100, John Bradford wrote:
> > > > > I wonder whether it would be a good idea to give the linux-fs
> > > > > (namely my preferred reiser and ext2 :-) some
> > > > > fault-tolerance.
>
> I'm not against this in principle, but in practice it is almost
> useless. Modern disk drives do bad sector remapping at write time, so
> unless something is terribly wrong you will never see a write error
> (which is exactly the time that the filesystem could do such
> remapping).  Normally, you will only see an error like this at read
> time, at which point it is too late to fix.

It is *not* useless.

I have at least 4 disks with some bad sectors. Know what?
They are still in use in a school lab and as 'big diskettes'
(transferring movies etc). I refuse to dump them just because
'today's disks are cheap'. I don't want my fs to die just because
these disks develop (ohhhh) a single new bad sector.
--
vda

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-20 17:20           ` John Bradford
@ 2003-04-21  9:32             ` Stephan von Krawczynski
  2003-04-21  9:55               ` John Bradford
  0 siblings, 1 reply; 74+ messages in thread
From: Stephan von Krawczynski @ 2003-04-21  9:32 UTC (permalink / raw)
  To: John Bradford; +Cc: john, josh, alan, linux-kernel

On Sun, 20 Apr 2003 18:20:16 +0100 (BST)
John Bradford <john@grabjohn.com> wrote:

> > > > > Fault tolerance in a filesystem layer means in practical terms
> > > > > that you are guessing what a filesystem should look like, for the
> > > > > disk doesn't answer that question anymore. IMHO you don't want
> > > > > that to be done automagically, for it might go right sometimes,
> > > > > but also might trash everything on RW filesystems.
> > > > 
> > > > Let me clarify again: I don't want fancy stuff inside the filesystem
> > > > that magically knows something about right-or-wrong. The only _very
> > > > small_ enhancement I would like to see is: driver tells fs there is an
> > > > error while writing a certain block => fs tries writing the same
> > > > data onto another block. That's it, no magic, no RAID
> > > > stuff. Very simple. 
> > > 
> > > That doesn't belong in the filesystem.
> > > 
> > > Imagine you have ten blocks free, and you allocate data to all of them
> > > in the filesystem.  The write goes to cache, and succeeds.
> > > 
> > > 30 seconds later, the write cache is flushed, and an error is reported
> > > back from the device.
> > 
> > And where's the problem?
> > Your case:
> > Immediate failure. Disk error.
> > 
> > My case:
> > Immediate failure. Disk error (no space left for replacement)
> > 
> > There's no difference.
> 
> In my case, the machine can continue as normal.  The filesystem is
> intact, (with no blocks free).  The block device driver has to cope
> with the error, which could be as simple as holding the data in RAM
> until an operator has been paged to replace the disk.

Forgive my ignorance, but I have not seen a case up to today where ide, aicX or
3ware has called me up for a replacement unit, written to it and been ok
afterwards. What the heck are you talking of?
I am not really interested in what a low-level driver could do unless there is
none that does it...
And again, how do you think this should work out on your _root_ partition? (see
below)

> In your case, the filesystem is no longer in a usable state.

I have yet to see an fs that's in a writeable state after the medium is full ...

>  If that
> was the root filesystem, the machine will, at best, probably go in to
> single user mode, with a read-only root filesystem.

How come?

> > Thing is: If there are 11 blocks free and not ten, then you fail
> 
> Wrong.  See above.

Please tell me when you were last "paged to replace the disk"? If you can't
tell me, then you know I am right by now.

> > and I succeed (if there's one bad block). You loose data, I don't.
> 
> John.

Regards,
Stephan


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-21  9:09           ` Denis Vlasenko
@ 2003-04-21  9:35             ` John Bradford
  2003-04-21 11:03               ` Stephan von Krawczynski
  2003-04-21 11:22               ` Denis Vlasenko
  0 siblings, 2 replies; 74+ messages in thread
From: John Bradford @ 2003-04-21  9:35 UTC (permalink / raw)
  To: vda; +Cc: John Bradford, Stephan von Krawczynski, linux-kernel

> > > What is so bad about the simple way: the one who wants to write
> > > (e.g. fs) and knows _where_ to write simply uses another newly
> > > allocated block and dumps the old one on a blacklist. The blacklist
> > > only for being able to count them (or get the sector-numbers) in
> > > case you are interested. If you weren't you might as well mark them
> > > allocated and that's it (which I would presume a _bad_ idea). If
> > > there are no free blocks left, well, then the medium is full. And
> > > that is just about the only cause for a write error then (if the
> > > medium is writeable at all).
> >
> > Modern disks generally do this kind of thing themselves.  By the time
>                ^^^^^^^^^^^^
> How many times does Stephan need to say it? 'Generally do'
> is not enough, because it means 'sometimes they don't'.

OK, _ALL_ modern disks do.

Name an IDE or SCSI disk on sale today that doesn't retry on write
failure.  Forget I said 'Generally do'.

> Most filesystems *are* designed with badblock lists and such;
> it is possible to teach fs drivers to tolerate write errors
> by adding affected blocks to the list and continuing (as opposed
> to 'remounted ro, BOOM!'). As usual, this can only happen if someone
> will step forward and code it.
> 
> Do you think it would be a Wrong Thing to do?

Yes, I do.

It achieves nothing useful, and gives people a false sense of security.

We have moved on since the 1980s, and I believe that it is now up to
the drive firmware, or the block device driver, to do this; it has no
place in a filesystem.

John.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-21  9:25         ` Denis Vlasenko
@ 2003-04-21  9:42           ` John Bradford
  2003-04-21 10:25             ` Stephan von Krawczynski
  0 siblings, 1 reply; 74+ messages in thread
From: John Bradford @ 2003-04-21  9:42 UTC (permalink / raw)
  To: vda; +Cc: Andreas Dilger, John Bradford, Stephan von Krawczynski, linux-kernel

> > > > > > I wonder whether it would be a good idea to give the linux-fs
> > > > > > (namely my preferred reiser and ext2 :-) some
> > > > > > fault-tolerance.
> >
> > I'm not against this in principle, but in practise it is almost
> > useless. Modern disk drives do bad sector remapping at write time, so
> > unless something is terribly wrong you will never see a write error
> > (which is exactly the time that the filesystem could do such
> > remapping).  Normally, you will only see an error like this at read
> > time, at which point it is too late to fix.
> 
> It is *not* useless.
> 
> I have at least 4 disks with some bad sectors. Know what?
> They are still in use in a school lab and as 'big diskettes'
> (transferring movies etc). I refuse to dump them just because
> 'today's disks are cheap'. I don't want my fs to die just because
> these disks develop (ohhhh) a single new bad sector.

Read my previous posts.

A layer between device and filesystem can solve this.  It doesn't
belong in the filesystem.

John.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-21  9:32             ` Stephan von Krawczynski
@ 2003-04-21  9:55               ` John Bradford
  2003-04-21 11:24                 ` Stephan von Krawczynski
  0 siblings, 1 reply; 74+ messages in thread
From: John Bradford @ 2003-04-21  9:55 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: John Bradford, josh, alan, linux-kernel

> > > > > > Fault tolerance in a filesystem layer means in practical terms
> > > > > > that you are guessing what a filesystem should look like, for the
> > > > > > disk doesn't answer that question anymore. IMHO you don't want
> > > > > > that to be done automagically, for it might go right sometimes,
> > > > > > but also might trash everything on RW filesystems.
> > > > > 
> > > > > Let me clarify again: I don't want fancy stuff inside the filesystem
> > > > > that magically knows something about right-or-wrong. The only _very
> > > > > small_ enhancement I would like to see is: driver tells fs there is an
> > > > > error while writing a certain block => fs tries writing the same
> > > > > data onto another block. That's it, no magic, no RAID
> > > > > stuff. Very simple. 
> > > > 
> > > > That doesn't belong in the filesystem.
> > > > 
> > > > Imagine you have ten blocks free, and you allocate data to all of them
> > > > in the filesystem.  The write goes to cache, and succeeds.
> > > > 
> > > > 30 seconds later, the write cache is flushed, and an error is reported
> > > > back from the device.
> > > 
> > > And where's the problem?
> > > Your case:
> > > Immediate failure. Disk error.
> > > 
> > > My case:
> > > Immediate failure. Disk error (no space left for replacement)
> > > 
> > > There's no difference.
> > 
> > In my case, the machine can continue as normal.  The filesystem is
> > intact, (with no blocks free).  The block device driver has to cope
> > with the error, which could be as simple as holding the data in RAM
> > until an operator has been paged to replace the disk.
> 
> Forgive my ignorance, but I have not seen a case up to today where
> ide, aicX or 3ware has called me up for a replacement unit, written
> to it and been ok afterwards. What the heck are you talking of?

Modern disks error correct on write.  The user doesn't even know that
they are doing it.  If a disk actually reports back a write error, it
is usually very broken.

Incidentally, I don't see how your idea would even be implementable
without disabling write caching.

> I am not really interested in what a low-level driver could do
> unless there is none that does it...

I assume you mean 'one that does it'.

If nobody was interested in what a low-level driver could do unless
there was one that does it already, we wouldn't be innovating anything
new.

> And again, how do you think this should work out on your _root_
> partition? (see below)

1. Hot plug a new disk
2. Duplicate the read-only root file system on to it
3. Pivot root
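
(Roughly, and assuming the new disk shows up as /dev/hde while the old
root is still readable:

    mke2fs /dev/hde1 && mount /dev/hde1 /mnt
    cp -ax / /mnt                          # duplicate the root fs
    cd /mnt && mkdir old_root && pivot_root . old_root

- hand-waving over init and open file descriptors, of course.)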

> > In your case, the filesystem is no longer in a usable state.
> 
> I have yet to see an fs that's in a writeable state after the medium
> is full ...

It is perfectly writable, for example, for a delete operation.

> >  If that
> > was the root filesystem, the machine will, at best, probably go into
> > single user mode, with a read-only root filesystem.
> 
> How come?

In my opinion, that would be the best course of action, if the device
holding the root filesystem is faulty.

> > > Thing is: If there are 11 blocks free and not ten, then you fail
> > 
> > Wrong.  See above.
> 
> Please tell me when you were last "paged to replace the disk"? If you can't
> tell me, then you know I am right by now.

I have never been paged to replace a disk by a Linux system.

That is why I would like to see this functionality added to Linux.

John.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-21  9:42           ` John Bradford
@ 2003-04-21 10:25             ` Stephan von Krawczynski
  2003-04-21 10:50               ` John Bradford
  0 siblings, 1 reply; 74+ messages in thread
From: Stephan von Krawczynski @ 2003-04-21 10:25 UTC (permalink / raw)
  To: John Bradford; +Cc: vda, adilger, john, linux-kernel

On Mon, 21 Apr 2003 10:42:35 +0100 (BST)
John Bradford <john@grabjohn.com> wrote:

> > > > > > > I wonder whether it would be a good idea to give the linux-fs
> > > > > > > (namely my preferred reiser and ext2 :-) some
> > > > > > > fault-tolerance.
> > >
> > > I'm not against this in principle, but in practice it is almost
> > > useless. Modern disk drives do bad sector remapping at write time, so
> > > unless something is terribly wrong you will never see a write error
> > > (which is exactly the time that the filesystem could do such
> > > remapping).  Normally, you will only see an error like this at read
> > > time, at which point it is too late to fix.
> > 
> > It is *not* useless.
> > 
> > I have at least 4 disks with some bad sectors. Know what?
> > They are still in use in a school lab and as 'big diskettes'
> > (transferring movies etc). I refuse to dump them just because
> > 'today's disks are cheap'. I don't want my fs to die just because
> > these disks develop (ohhhh) a single new bad sector.
> 
> Read my previous posts.
> 
> A layer between device and filesystem can solve this.  It doesn't
> belong in the filesystem.

Yes it _can_, but is it _intelligent_ to do it there?


Ok, lets do it vice versa:

What do you need to do it?
- a free/allocated block list (for knowing where to put the mapped block)
- a bad block list for monitoring purposes
- spare blocks for really putting the data in

You say:
we re-invent/re-install the above information in a new layer. In this case you
have the problem of finding known-to-be-free blocks. In other words, you have
to pre-alloc blocks (a fixed number) on the device, because otherwise you
interfere with the fs. The fs must not see your mapped-blocks-to-be, or else it
will use them sooner or later. In other words you _waste_ them in case they
are never needed.

I say:
we already have the needed information inside every fs, why not use it?
No space wasted, no double information.
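
To make it concrete (just a sketch, field names invented):

    struct fs_blockinfo {
            bitmap_t free_map;  /* every fs has this: which blocks are free */
            list_t   bad_list;  /* new: blocks that failed a write          */
            /* no third structure needed - the "spare" blocks are simply
             * whatever free_map still has to offer */
    };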

If you say "it does not belong to the fs" then please tell me: in what bible
do you read that? Your argument sounds like "god-given" to me. Do you see
simple, arguable technical issues?

I do not say it is _easy_ to do, I say it is an intelligent option. Note the
difference.

Regards,
Stephan


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-21 10:25             ` Stephan von Krawczynski
@ 2003-04-21 10:50               ` John Bradford
  0 siblings, 0 replies; 74+ messages in thread
From: John Bradford @ 2003-04-21 10:50 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: John Bradford, vda, adilger, linux-kernel

> I say:
> we already have the needed information inside every fs, why not use it?
> No space wasted, no double information.

Bad block remapping in software would be useful only on relatively
obscure devices which don't already do it.  I consider it to be
counterproductive, and a bad thing to do on devices that already do
it themselves.

Therefore, it is an ideal candidate for being done in a separate
layer.

Your argument seems to be that if a write error is encountered, the
simplest thing to do is to try another block, and mark the first one
as unusable.

On devices which have a fixed physical area for each logical block,
and where no write caching is done, your approach might be the best
one.

However, modern operating systems often cache a lot of data to write
out when the disk is idle.  The filesystem will typically allocate
this data to various blocks immediately.  A write failure 30 seconds
later would mean that the filesystem then has to change that
allocation from the bad block to a new block.  It's possible that
there might not even be a new block available by that time.  On disks
which do their own bad block management, if one block re-allocation
fails because there is no space left, it's quite possible that all
future re-allocations will fail as well.

On devices which never report back bad blocks to the operating system,
the space for the bad block table is wasted.

John.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-21  9:35             ` John Bradford
@ 2003-04-21 11:03               ` Stephan von Krawczynski
  2003-04-21 12:04                 ` John Bradford
  2003-04-21 11:22               ` Denis Vlasenko
  1 sibling, 1 reply; 74+ messages in thread
From: Stephan von Krawczynski @ 2003-04-21 11:03 UTC (permalink / raw)
  To: John Bradford; +Cc: vda, john, linux-kernel

On Mon, 21 Apr 2003 10:35:21 +0100 (BST)
John Bradford <john@grabjohn.com> wrote:

> > > > What is so bad about the simple way: the one who wants to write
> > > > (e.g. fs) and knows _where_ to write simply uses another newly
> > > > allocated block and dumps the old one on a blacklist. The blacklist
> > > > only for being able to count them (or get the sector-numbers) in
> > > > case you are interested. If you weren't you might as well mark them
> > > > allocated and that's it (which I would presume a _bad_ idea). If
> > > > there are no free blocks left, well, then the medium is full. And
> > > > that is just about the only cause for a write error then (if the
> > > > medium is writeable at all).
> > >
> > > Modern disks generally do this kind of thing themselves.  By the time
> >                ^^^^^^^^^^^^
> > How many times does Stephan need to say it? 'Generally do'
> > is not enough, because it means 'sometimes they don't'.
> 
> OK, _ALL_ modern disks do.

Stop this thread, we are arguing with god.

> Name an IDE or SCSI disk on sale today that doesn't retry on write
> failure.  Forget I said 'Generally do'.

IBM DMVS18V (SCSI)
Maxtor ATA133 160 GB DiamondMax Plus.

Maybe they _should_, but I can tell you they in fact sometimes don't (IBM very,
very seldom, Maxtor just about all the time)

> > Most filesystems *are* designed with badblock lists and such;
> > it is possible to teach fs drivers to tolerate write errors
> > by adding affected blocks to the list and continuing (as opposed
> > to 'remounted ro, BOOM!'). As usual, this can only happen if someone
> > will step forward and code it.
> > 
> > Do you think it would be a Wrong Thing to do?
> 
> Yes, I do.
> 
> It achieves nothing useful, and gives people a false sense of security.

How do _you_ know that? What makes _you_ argue about what _I_ think is useful
and _my_ sense of security? You are on thin ice ...
 
> We have moved on since the 1980s, and I believe that it is now up to
> the drive firmware, or the block device driver, to do this; it has no
> place in a filesystem.

Interestingly, I owned one of those 30 MB MFM Seagate howling drives back in
the 80s. I had no errors on it until I threw it away for its unbelievable
noise level. Today I throw away one (very low-noise) disk about every week for
shooting down yet another fs somewhere near midnight.
Indeed we moved on, only the direction looks sometimes questionable ...

Regards,
Stephan


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-21  9:35             ` John Bradford
  2003-04-21 11:03               ` Stephan von Krawczynski
@ 2003-04-21 11:22               ` Denis Vlasenko
  2003-04-21 11:46                 ` Stephan von Krawczynski
  2003-04-21 12:13                 ` John Bradford
  1 sibling, 2 replies; 74+ messages in thread
From: Denis Vlasenko @ 2003-04-21 11:22 UTC (permalink / raw)
  To: John Bradford; +Cc: John Bradford, Stephan von Krawczynski, linux-kernel

On 21 April 2003 12:35, John Bradford wrote:
> > > Modern disks generally do this kind of thing themselves.  By the
> > > time
> >
> >                ^^^^^^^^^^^^
> > How many times does Stephan need to say it? 'Generally do'
> > is not enough, because it means 'sometimes they don't'.
>
> OK, _ALL_ modern disks do.
>
> Name an IDE or SCSI disk on sale today that doesn't retry on write
> failure.  Forget I said 'Generally do'.

I don't know about drives currently on sale, but I think
it is possible that some Flash or DRAM-based IDE pseudo-disks
do not have extensive sector remapping features. They can just
do the ECC thing and error out.

Also, if a disk just runs out of spare sectors, it has no
option other than to report failure, right? (Oh,
of course it can decide to execute the 'my firmware is buggy'
option instead ;)

But.

The disk, which I hold in my hand *right now*, namely:
	WD Caviar 21200
MDL: WDAC21200-00H
P/N: 99-004211-000
CCC: E3 2 APR 97 S
DCM: AFAAYAW
WD S/N: WT342 251 1943

does have some bad sectors and otherwise performs satisfactorily.
It's my 'big diskette'. So, if I decide to write some MP3s
on it and carry 'em home, and it suddenly strikes a new bad
sector...

Why in hell should I see my fs remounted RO?
Why do I have to read the entire disk to my main disk,
recreate the fs with a new badblock map, write back everything,
and retry writing the MP3???

I prefer a big fat ugly kernel printk (KERN_ERR) across my console
and all the logs: "ext3fs: write error at sector #NNNN. Marking as bad.
Your disk may be failing!"

What's wrong with me?
--
vda

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-21  9:55               ` John Bradford
@ 2003-04-21 11:24                 ` Stephan von Krawczynski
  2003-04-21 11:50                   ` Alan Cox
  2003-04-21 12:14                   ` John Bradford
  0 siblings, 2 replies; 74+ messages in thread
From: Stephan von Krawczynski @ 2003-04-21 11:24 UTC (permalink / raw)
  To: John Bradford; +Cc: john, josh, alan, linux-kernel

On Mon, 21 Apr 2003 10:55:12 +0100 (BST)
John Bradford <john@grabjohn.com> wrote:


> > And again, how do you think this should work out on your _root_
> > partition? (see below)
> 
> 1. Hot plug a new disk

On Linux IDE ??

Are you sure?

Regards,
Stephan



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-21 11:22               ` Denis Vlasenko
@ 2003-04-21 11:46                 ` Stephan von Krawczynski
  2003-04-21 12:13                 ` John Bradford
  1 sibling, 0 replies; 74+ messages in thread
From: Stephan von Krawczynski @ 2003-04-21 11:46 UTC (permalink / raw)
  To: vda; +Cc: john, linux-kernel

On Mon, 21 Apr 2003 14:22:01 +0300
Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua> wrote:

> On 21 April 2003 12:35, John Bradford wrote:
> > > > Modern disks generally do this kind of thing themselves.  By the
> > > > time
> > >
> > >                ^^^^^^^^^^^^
> > > How many times does Stephan need to say it? 'Generally do'
> > > is not enough, because it means 'sometimes they don't'.
> >
> > OK, _ALL_ modern disks do.
> >
> > Name an IDE or SCSI disk on sale today that doesn't retry on write
> > failure.  Forget I said 'Generally do'.
> 
> I don't know about drives currently on sale, but I think
> it is possible that some Flash or DRAM-based IDE pseudo-disks
> do not have extensive sector remapping features. They can just
> do ECC thing and error out.

Good example. Very good example, because it shows the possibility that some
part of a "drive" may be technically damaged and have _no_ influence at all on
the rest of the "media".

> [...]
> I prefer a big fat ugly kernel printk (KERN_ERR) across my console
> and all the logs: "ext3fs: write error at sector #NNNN. Marking as bad.
> Your disk may be failing!"

I would favor that, too.

> What's wrong with me?

Maybe you don't own a good color copy station for printing your own money bills
... ;-)

Regards,
Stephan

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-21 11:24                 ` Stephan von Krawczynski
@ 2003-04-21 11:50                   ` Alan Cox
  2003-04-21 12:14                   ` John Bradford
  1 sibling, 0 replies; 74+ messages in thread
From: Alan Cox @ 2003-04-21 11:50 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: John Bradford, josh, Linux Kernel Mailing List

On Llu, 2003-04-21 at 12:24, Stephan von Krawczynski wrote:
> On Mon, 21 Apr 2003 10:55:12 +0100 (BST)
> John Bradford <john@grabjohn.com> wrote:
> 
> 
> > > And again, how do you think this should work out on your _root_
> > > partition? (see below)
> > 
> > 1. Hot plug a new disk
> 
> On Linux IDE ??
> 
> Are you sure?

I wouldn't recommend it on any IDE unless your controller has the right
features *AND* someone bothered to wire them. Even then you'll need
some user apps to handle the isolation etc.



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-21 11:03               ` Stephan von Krawczynski
@ 2003-04-21 12:04                 ` John Bradford
  0 siblings, 0 replies; 74+ messages in thread
From: John Bradford @ 2003-04-21 12:04 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: John Bradford, vda, linux-kernel

> > Name an IDE or SCSI disk on sale today that doesn't retry on write
> > failure.  Forget I said 'Generally do'.
> 
> IBM DMVS18V (SCSI)
> Maxtor ATA133 160 GB DiamondMax Plus.
> 
> Maybe they _should_, but I can tell you they in fact sometimes don't
> (IBM very, very seldom, Maxtor just about all the time)

How do you know those disks don't retry on write failure?  How do you
know they aren't retrying and failing?

> How do _you_ know that? What makes _you_ argue for what _I_ think is
> useful and _my_ sense of security? You are on thin ice ...

Linux is an open source operating system, you are welcome to add the
feature if you want it.

> > We have moved on since the 1980s, and I believe that it is now up to
> > the drive firmware, or the block device driver, to do this; it has no
> > place in a filesystem.
> 
> Interestingly I owned one of those 30 MB MFM Seagate howling drives
> back in the 80s. I had no errors on it until I threw it away for its
> unbelievable noise rate. Today I throw away one (very low-noise)
> disk about every week for shooting yet another fs somewhere near
> midnight.
> Indeed we moved on, only the direction looks sometimes questionable ...

Ask the disk manufacturers for advice.

John.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-21 11:22               ` Denis Vlasenko
  2003-04-21 11:46                 ` Stephan von Krawczynski
@ 2003-04-21 12:13                 ` John Bradford
  1 sibling, 0 replies; 74+ messages in thread
From: John Bradford @ 2003-04-21 12:13 UTC (permalink / raw)
  To: vda; +Cc: John Bradford, Stephan von Krawczynski, linux-kernel

> > Name an IDE or SCSI disk on sale today that doesn't retry on write
> > failure.  Forget I said 'Generally do'.
> 
> I don't know about drives currently on sale, but I think
> it is possible that some Flash or DRAM-based IDE pseudo-disks
> do not have extensive sector remapping features. They can just
> do ECC thing and error out.

Flash devices generally have wear-leveling, so I assume that they must
be doing some extensive sector remapping all the time.  I could be
wrong on that account, though.

> Also, if a disk just runs out of spare sectors, it has no
> option other than to report failure, right? (Oh,
> of course it can decide to execute the 'my firmware is buggy'
> option instead ;)

Yeah, but if a device which is intelligent about bad-block remapping
actually runs out of spare sectors, that's a different failure than
having a single defective sector.  In a server, it would definitely be
time to replace it.

> But.
> 
> The disk, which I hold in my hand *right now*, namely:
> 	WD Caviar 21200
> MDL: WDAC21200-00H
> P/N: 99-004211-000
> CCC: E3 2 APR 97 S
> DCM: AFAAYAW
> WD S/N: WT342 251 1943
> 
> does have some bad sectors and otherwise performs satisfactorily.

OK.

> It's my 'big diskette'.

[snip]

Then why don't we invent a new filesystem, for known potentially
faulty media, which handles this case - why bloat all the existing
filesystems with code to handle it?  That idea isn't that far away
from the extra layer I suggested a few posts ago, and achieves the
same sort of thing.

John.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-21 11:24                 ` Stephan von Krawczynski
  2003-04-21 11:50                   ` Alan Cox
@ 2003-04-21 12:14                   ` John Bradford
  1 sibling, 0 replies; 74+ messages in thread
From: John Bradford @ 2003-04-21 12:14 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: John Bradford, josh, alan, linux-kernel

> > > And again, how do you think this should work out on your _root_
> > > partition? (see below)
> > 
> > 1. Hot plug a new disk
> 
> On Linux IDE ??
> 
> Are you sure?

Yes.

John.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 16:22 ` John Bradford
                     ` (2 preceding siblings ...)
  2003-04-19 17:54   ` Felipe Alfaro Solana
@ 2003-04-25  0:07   ` Stewart Smith
  2003-04-25  0:52     ` Richard B. Johnson
  3 siblings, 1 reply; 74+ messages in thread
From: Stewart Smith @ 2003-04-25  0:07 UTC (permalink / raw)
  To: John Bradford; +Cc: Stephan von Krawczynski, linux-kernel

On Sunday, April 20, 2003, at 02:22  AM, John Bradford wrote:

>> I wonder whether it would be a good idea to give the linux-fs
>> (namely my preferred reiser and ext2 :-) some fault-tolerance.
>
> Fault tolerance should be done at a lower level than the filesystem.

I would (partly) disagree. On the FS level, you would still have to 
deal with the data having gone away (or become corrupted). Simply 
passing a (known) corrupted block to a FS isn't going to do anything 
useful. Having the FS know that "this data is known crap" could tell it 
to
a) go look at a backup structure (e.g. one of the many superblock 
copies)
b) guess (e.g. in disk allocation bitmap, just think of them all as 
used)
c) fail with error (e.g. "cannot read directory due to a physical 
problem with the disk")
d) try to reconstruct the data (e.g. search around the disk for magic 
numbers)
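
In code terms, a toy dispatch (not from any real fs) would be:

    enum bad_policy { USE_BACKUP, ASSUME_USED, FAIL_LOUD, RECONSTRUCT };

    switch (policy_for(block_type)) {
    case USE_BACKUP:  read_backup_copy(blk);     break;  /* (a) */
    case ASSUME_USED: mark_bitmap_all_used(blk); break;  /* (b) */
    case FAIL_LOUD:   return -EIO;                       /* (c) */
    case RECONSTRUCT: scan_for_magic(blk);       break;  /* (d) */
    }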

<snip>
> The filesystem doesn't know or care what device it is stored on, and
> therefore shouldn't try to predict likely failures.

but it should be tolerant of them and able to recover to some extent. 
Generally, the first sign that a disk is dying (to an end user) is when 
really-weird-stuff(tm) starts happening. A nice error message from the 
file system when they try to go into the directory (or whatever) would 
be a lot nicer.

You could generalize the failure down to an extents type record (i.e. 
offset and length) which would suit 99.9% of cases (I think :). In the
case of post-detection of error, the extra effort is probably worth it.

These kinda issues are coming up in my honors thesis too, so there
might even be the (dreaded) code and discussion sometime near the end 
of the year :)
------------------------------
Stewart Smith
stewartsmith@mac.com
Ph: +61 4 3884 4332
ICQ: 6734154


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 17:18   ` Florian Weimer
  2003-04-19 18:07     ` Stephan von Krawczynski
  2003-04-19 22:02     ` Alan Cox
@ 2003-04-25  0:11     ` Stewart Smith
  2 siblings, 0 replies; 74+ messages in thread
From: Stewart Smith @ 2003-04-25  0:11 UTC (permalink / raw)
  To: Florian Weimer; +Cc: linux-kernel

On Sunday, April 20, 2003, at 03:18  AM, Florian Weimer wrote:
> IDE disks automatically remap defective sectors, so you won't see any
> of them unless the disk is already quite broken.

IIRC:
A drive that has trouble reading a sector will remap it. If the sector 
is dead and it can't read it at all, you're screwed and you don't get 
your data. This is why you still see 'unreadable sector' error messages 
from your drive.
------------------------------
Stewart Smith
stewartsmith@mac.com
Ph: +61 4 3884 4332
ICQ: 6734154


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-25  0:07   ` Stewart Smith
@ 2003-04-25  0:52     ` Richard B. Johnson
  2003-04-25  7:13       ` John Bradford
  0 siblings, 1 reply; 74+ messages in thread
From: Richard B. Johnson @ 2003-04-25  0:52 UTC (permalink / raw)
  To: Stewart Smith; +Cc: John Bradford, Stephan von Krawczynski, linux-kernel

On Fri, 25 Apr 2003, Stewart Smith wrote:

> On Sunday, April 20, 2003, at 02:22  AM, John Bradford wrote:
>
> >> I wonder whether it would be a good idea to give the linux-fs
> >> (namely my preferred reiser and ext2 :-) some fault-tolerance.
> >
> > > Fault tolerance should be done at a lower level than the filesystem.
>
> I would (partly) disagree. On the FS level, you would still have to
> deal with the data having gone away (or become corrupted). Simply
> passing a (known) corrupted block to a FS isn't going to do anything
> useful. Having the FS know that "this data is known crap" could tell it
> to
> a) go look at a backup structure (e.g. one of the many superblock
> copies)
> b) guess (e.g. in disk allocation bitmap, just think of them all as
> used)
> c) fail with error (e.g. "cannot read directory due to a physical
> problem with the disk")
> d) try to reconstruct the data (e.g. search around the disk for magic
> numbers)
>
> <snip>
> > The filesystem doesn't know or care what device it is stored on, and
> > therefore shouldn't try to predict likely failures.
>
> but it should be tolerant of them and able to recover to some extent.
> Generally, the first sign that a disk is dying (to an end user) is when
> really-weird-stuff(tm) starts happening. A nice error message from the
> file system when they try to go into the directory (or whatever) would
> be a lot nicer.
>
> You could generalize the failure down to an extents type record (i.e.
> > offset and length) which would suit 99.9% of cases (I think :). In the
> case of post-detection of error, the extra effort is probably worth it.
>
> these kinda issues are coming up in my honors thesis too, so there
> might even be the (dreaded) code and discussion sometime near the end
> of the year :)
> ------------------------------
> Stewart Smith
> stewartsmith@mac.com
> Ph: +61 4 3884 4332
> ICQ: 6734154

With most devices used for file-systems, almost all writes succeed.
So the file-system doesn't even know that there was some error
until it tries to read the data, probably next week. Through the
ages, attempts to fix this have destroyed any real I/O capability.
You need to re-read the data after you've written it. You can't
read it back right away because it will still be in the device's
sector buffer and will not be read from the physical media. That
means you need to keep copies of data buffers in memory and re-read
and compare a long time after other data was written. That makes
things VAXen-like, with I/O measured in bytes/fortnight. Once you
find a bad "block", you can make these owned by a file...

	DUA0:[000000]BADBLOCK.SYS

... hehehe  so they never get used again. Periodically, you can
look at the size of the file and, if it's getting large, you know
that you are going to have to low-level format the drive and start
all over again ;^)

On Unix/Linux machines, perhaps 90 percent of the file data never
gets written to the physical media. It stays in RAM until it's
deleted. Even if you made new kinds of disk-drives that internally
read-after-write, writing all that data to a drive, on the off-chance
of some error in the 10 percent that stays, is going to take your
I/O bandwidth to a snail's pace.
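
A naive verify pass is trivial to write, which is exactly the trap (a
sketch; note that without O_DIRECT the read-back is likely served from
the page cache and proves nothing about the media):

    #include <sys/types.h>
    #include <string.h>
    #include <unistd.h>

    /* write, flush, read back, compare - and watch throughput halve */
    int write_verified(int fd, off_t off, const char *buf, size_t len)
    {
            char check[len];

            if (pwrite(fd, buf, len, off) != (ssize_t)len)
                    return -1;
            fsync(fd);      /* push it toward the media */
            if (pread(fd, check, len, off) != (ssize_t)len)
                    return -1;
            return memcmp(buf, check, len) ? -1 : 0;
    }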

These Unix file-systems are really RAM disks that overflow to
disk drives. You can run for many days with a disk-drive off-line
and not even know it. Been there, done that.

If you have disk-drive problems, leave your machine on continuously
with the monitor off when you are not using it. The disk drives will
then fail only after a power failure (really). Keep some new ones
handy and keep backups. The drives get beaten up when they start up and
spin down. If you can get them started, they will run for several
continuous years. At the company I work for, we have about 1,000
employees that use PCs. At one time we had a repair department with
10 or so employees doing nothing but repairing PCs. Once I convinced
everybody to keep their machines on <forever>, the repair department
had so little work that it closed. But... if we ever have a power failure,
it takes several weeks to get all those PCs repaired.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-25  0:52     ` Richard B. Johnson
@ 2003-04-25  7:13       ` John Bradford
  0 siblings, 0 replies; 74+ messages in thread
From: John Bradford @ 2003-04-25  7:13 UTC (permalink / raw)
  To: root; +Cc: Stewart Smith, John Bradford, Stephan von Krawczynski, linux-kernel

> With most devices used for file-systems, almost all writes succeed.
> So the file-system doesn't even know that there was some error
> until it tries to read the data, probably next week. Through the
> ages, attempts to fix this have destroyed any real I/O capability.

The fix is to dispense with the disk device altogether, and have a
huge battery-backed RAM.  It's practical already - two gigs of ECC RAM
and some logic to make it appear as an IDE or SCSI device would cost
very little to build.

In fact, you don't even need to do that.

Just put three gigs of RAM in an existing machine, set it to boot
from CD with the root filesystem on a RAM disk, and use further RAM
disks for all of your partitions.  Copy the contents of the RAM disk
containing user data over the LAN to another box every 30 minutes.
Patch the kernel to dump the contents of the RAM disks to another box
over the LAN if it oopses.

I've actually thought of co-locating a machine and running a webserver
entirely from RAM this way.

John.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-19 18:41       ` Dr. David Alan Gilbert
                           ` (3 preceding siblings ...)
  2003-04-21  8:37         ` Denis Vlasenko
@ 2003-05-05 12:38         ` Pavel Machek
  4 siblings, 0 replies; 74+ messages in thread
From: Pavel Machek @ 2003-05-05 12:38 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: linux-kernel

Hi!

>   Besides the problem that most drive manufacturers now seem to use
> cheese as the data storage surface, I think there are some other
> problems:
> 
>   1) I don't trust drive firmware.

I created the crc loop method just for this.
Make it an md5 loop method and you would
be able to trust that...
				Pavel
-- 
				Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...
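
The idea behind such a checksummed loop device can be sketched in a
few lines: every block carries a digest stored at write time and
checked at read time, so silently corrupted data is at least
detected. A minimal illustration (the on-disk layout and the choice
of CRC-32 here are assumptions, not the actual patch):

#include <stddef.h>
#include <stdint.h>

#define BLK 512

struct csum_block {
	uint8_t  data[BLK];
	uint32_t crc;		/* stored when the block is written */
};

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320). */
static uint32_t crc32(const uint8_t *p, size_t n)
{
	uint32_t c = ~0u;
	int i;

	while (n--) {
		c ^= *p++;
		for (i = 0; i < 8; i++)
			c = (c >> 1) ^ (0xEDB88320u & -(c & 1));
	}
	return ~c;
}

/* 0 if the block still matches its digest, -1 if the device
 * returned corrupted data without reporting an I/O error. */
int verify_block(const struct csum_block *b)
{
	return crc32(b->data, BLK) == b->crc ? 0 : -1;
}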


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-21 14:14     ` Valdis.Kletnieks
@ 2003-05-06  7:03       ` Mike Fedyk
  0 siblings, 0 replies; 74+ messages in thread
From: Mike Fedyk @ 2003-05-06  7:03 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Stephan von Krawczynski, linux-kernel

On Mon, Apr 21, 2003 at 10:14:19AM -0400, Valdis.Kletnieks@vt.edu wrote:
> No amount of code wanking in the filesystem is going to save you if you hit
> an error on your swap partition - but an 'md'-like driver might be able to
> save you.

Unless you swap to a file... ;)

That can be an overall benefit if you have one filesystem/partition on a
drive: your swap is closer to your data, and thus seek times are shorter. :)

Now let's watch all of the people mention all of the deadlocks that come
along with this...  (really, I'd like to know.)

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-21 11:19   ` Stephan von Krawczynski
  2003-04-21 11:52     ` Alan Cox
@ 2003-04-21 14:14     ` Valdis.Kletnieks
  2003-05-06  7:03       ` Mike Fedyk
  1 sibling, 1 reply; 74+ messages in thread
From: Valdis.Kletnieks @ 2003-04-21 14:14 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 984 bytes --]

On Mon, 21 Apr 2003 13:19:34 +0200, Stephan von Krawczynski <skraw@ithnet.com>  said:

> I can very well accept that argument. What I am trying to do is only make
> _someone_ writing a fs listen to the problem, and maybe - only maybe - in
> _his_ fs it is not as complicated and so he simply hacks it in. I am only
> arguing for having a choice. Not more. If e.g. reiserfs had the feature I
> could simply shoot all extX stuff and use my preferred fs all the time.
> That's just about it.

So what do you do if your mythical file system supports bad block relocation
but doesn't support something else you need, like journaling or quotas or
whatever?

Nobody's mentioned the most obvious reason why it doesn't belong in the
filesystem, but needs to be in something like the 'md' layer, as (I think)
John Bradford suggested:

No amount of code wanking in the filesystem is going to save you if you hit
an error on your swap partition - but an 'md'-like driver might be able to
save you.


[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-21 11:19   ` Stephan von Krawczynski
@ 2003-04-21 11:52     ` Alan Cox
  2003-04-21 14:14     ` Valdis.Kletnieks
  1 sibling, 0 replies; 74+ messages in thread
From: Alan Cox @ 2003-04-21 11:52 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: Linux Kernel Mailing List

On Llu, 2003-04-21 at 12:19, Stephan von Krawczynski wrote:
> I can very well accept that argument. What I am trying to do is only make
> _someone_ writing a fs listen to the problem, and maybe - only maybe - in _his_
> fs it is not as complicated and so he simply hacks it in. I am only arguing for
> having a choice. Not more. If e.g. reiserfs had the feature I could simply
> shoot all extX stuff and use my preferred fs all the time. That's just about

You can interest Hans Reiser I'm sure. Just find $100,000.

It's economically saner, better designed, and a lot more sensible to just
use md.
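
For comparison, setting up such a mirror with the raidtools of the
day is mostly a config file; a sketch, assuming /dev/hda1 and
/dev/hdc1 as the two halves (all device names here are placeholders):

# /etc/raidtab
raiddev /dev/md0
	raid-level		1
	nr-raid-disks		2
	persistent-superblock	1
	chunk-size		4
	device			/dev/hda1
	raid-disk		0
	device			/dev/hdc1
	raid-disk		1

# mkraid /dev/md0 && mke2fs -j /dev/md0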



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
       [not found] ` <03Apr21.020150edt.41463@gpu.utcc.utoronto.ca>
@ 2003-04-21 11:19   ` Stephan von Krawczynski
  2003-04-21 11:52     ` Alan Cox
  2003-04-21 14:14     ` Valdis.Kletnieks
  0 siblings, 2 replies; 74+ messages in thread
From: Stephan von Krawczynski @ 2003-04-21 11:19 UTC (permalink / raw)
  To: linux-kernel

On Mon, 21 Apr 2003 02:01:46 -0400
someone wrote:

> You write:
> | Can you tell me what is so particularly bad about the idea to cope a
> | little bit with braindead (or just-dying) hardware?
> 
> [...]
>  It probably could be done. I do not think it would be small or easy.
> Especially if filesystem developers feel that modern drives only start
> experiencing user-visible write errors about when they are going to
> explode in general, they may rationally feel that the work is not worth
> it.

I can very well accept that argument. What I am trying to do is only make
_someone_ writing a fs listen to the problem, and maybe - only maybe - in _his_
fs it is not as complicated and so he simply hacks it in. I am only arguing for
having a choice. Not more. If e.g. reiserfs had the feature I could simply
shoot all extX stuff and use my preferred fs all the time. That's just about
it. No religion involved. I am not arguing for this type of feature as a
_must-have_. I only think that, given the neat stuff that is already inside
reiser (just to name my currently preferred fs), it would be very nice to have
write-error recovery as well.

Regards,
Stephan


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-20 17:28 Chuck Ebbert
@ 2003-04-21  9:36 ` Stephan von Krawczynski
  0 siblings, 0 replies; 74+ messages in thread
From: Stephan von Krawczynski @ 2003-04-21  9:36 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: linux-kernel

On Sun, 20 Apr 2003 13:28:21 -0400
Chuck Ebbert <76306.1226@compuserve.com> wrote:

> Stephan wrote:
> 
> 
> > Maybe I have something in common with Google, I am re-writing large parts
> > (well over 50%) of the harddrive's capacity on a daily basis (in the
> > discussed setup). How many people really do that?
> 
> 
>   I'll bet the people who do are using SCSI disks...

Guess what, I would do that too, if it were affordable. I have to say I have
never ever experienced problems like these with SCSI disks. Never.
Currently there is a _big_ difference in price between IDE and SCSI drives. On
the other hand, one could argue that there is a good reason for it...

Regards,
Stephan

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
@ 2003-04-20 17:44 Chuck Ebbert
  0 siblings, 0 replies; 74+ messages in thread
From: Chuck Ebbert @ 2003-04-20 17:44 UTC (permalink / raw)
  To: arjanv; +Cc: linux-kernel


>> You will if it writes and fails to read back. The disk can't invent a
>> sector that is gone. 
>
> but linux can if you use a raid1 mirror... maybe we should teach the md
> layer to write back the data from the other disk on a "bad sector"
> error.


  NTFS does this in the filesystem by moving the affected cluster
somewhere else, then marking it bad in its allocation map.  Of course
in order to do that it has to get notifications from the ft disk
driver...


------
 Chuck
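
A sketch of that repair loop, with hypothetical disk_read()/disk_write()
helpers standing in for the real md internals: on a read error, fetch
the sector from the surviving copy and rewrite it on the failing disk,
which gives the drive a chance to remap the sector from its spare pool.

typedef int (*disk_rw_t)(int disk, unsigned long sector, void *buf);

int mirror_read_repair(disk_rw_t disk_read, disk_rw_t disk_write,
		       unsigned long sector, void *buf)
{
	if (disk_read(0, sector, buf) == 0)
		return 0;		/* primary copy was fine */

	if (disk_read(1, sector, buf) != 0)
		return -1;		/* both copies bad: real data loss */

	/* Good data recovered from the mirror; writing it back gives
	 * the failing drive a chance to reallocate the sector. */
	disk_write(0, sector, buf);	/* best effort, may fail again */
	return 0;
}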

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
@ 2003-04-20 17:44 Chuck Ebbert
  0 siblings, 0 replies; 74+ messages in thread
From: Chuck Ebbert @ 2003-04-20 17:44 UTC (permalink / raw)
  To: John Bradford; +Cc: linux-kernel


>>   I have some ugly code that forces all reads from a mirror set to
>> a specific copy, set via a global sysctl.  This lets you do things
>> like make a backup from disk 0, then verify against disk 1 and take
>> action if something is wrong.
>
> That's interesting.  Have you thought of making it read from _both_
> disks and check that the data matches, before passing it back?


  It didn't seem to be worth doing, since a userspace program could
be written to do the same thing using my small patch.  The only problem
is that it uses a global sysctl affecting every mirror set in the machine,
so it could hurt the performance of every mirror if used under load.


------
 Chuck

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
@ 2003-04-20 17:28 Chuck Ebbert
  0 siblings, 0 replies; 74+ messages in thread
From: Chuck Ebbert @ 2003-04-20 17:28 UTC (permalink / raw)
  To: John Bradford; +Cc: linux-kernel


>>   I buy three drives at a time so I have a matching spare, because AFAIC
>> you shouldn't be doing RAID on unmatched drives.
>
> Err, yes you should :-).
>
> Unless they are spindle synchronised, the advantage of identical
> physical layout diminishes, and the disadvantage of quite possibly
> getting components from the same (faulty) batch increases :-).


 Yeah, I know, and some of my serial numbers are too close together
for comfort, but I still like everything matched up:


hde: MAXTOR 4K060H3, ATA DISK drive
hdg: MAXTOR 4K060H3, ATA DISK drive
hdi: MAXTOR 4K060H3, ATA DISK drive
 hde: hde1 hde2 hde3 hde4 < hde5 hde6 hde7 hde8 hde9 >
 hdg: hdg1 hdg2 hdg3 hdg4 < hdg5 hdg6 hdg7 hdg8 hdg9 >
 hdi: hdi1 hdi2 hdi3 hdi4 < hdi5 hdi6 hdi7 hdi8 hdi9 >



------
 Chuck

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
@ 2003-04-20 17:28 Chuck Ebbert
  2003-04-21  9:36 ` Stephan von Krawczynski
  0 siblings, 1 reply; 74+ messages in thread
From: Chuck Ebbert @ 2003-04-20 17:28 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: linux-kernel

Stephan wrote:


> Maybe I have something in common with Google, I am re-writing large parts (well
> over 50%) of the harddrive's capacity on a daily basis (in the discussed setup).
> How many people really do that?


  I'll bet the people who do are using SCSI disks...


------
 Chuck

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-20 17:03 Chuck Ebbert
@ 2003-04-20 17:25 ` John Bradford
  0 siblings, 0 replies; 74+ messages in thread
From: John Bradford @ 2003-04-20 17:25 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: arjanv, linux-kernel

> >> You will if it writes and fails to read back. The disk can't invent a
> >> sector that is gone. 
> >
> > but linux can if you use a raid1 mirror... maybe we should teach the md
> > layer to write back the data from the other disk on a "bad sector"
> > error.
> 
> 
>   I have some ugly code that forces all reads from a mirror set to
> a specific copy, set via a global sysctl.  This lets you do things
> like make a backup from disk 0, then verify against disk 1 and take
> action if something is wrong.

That's interesting.  Have you thought of making it read from _both_
disks and check that the data matches, before passing it back?

RAID1 mirrors guard against drive failure, but if a drive returns bad
data without reporting an error, that will usually go unnoticed.

By reading from both disks, and checking that the data was the same,
we could guard against broken firmware.

Of course, this would reduce performance quite a bit, but it might have
some uses.

John.
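
A minimal userspace sketch of the read-both-and-compare check (the
device names are just an example raid1 pair, and the mirror must be
idle or mounted read-only, or in-flight writes will show up as false
mismatches):

#define _XOPEN_SOURCE 500	/* for pread() */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BLK 4096

int main(void)
{
	unsigned char a[BLK], b[BLK];
	int fd0 = open("/dev/hde1", O_RDONLY);
	int fd1 = open("/dev/hdg1", O_RDONLY);
	off_t off = 0;
	ssize_t na, nb;

	if (fd0 < 0 || fd1 < 0)
		return 1;

	for (;;) {
		na = pread(fd0, a, BLK, off);
		nb = pread(fd1, b, BLK, off);
		if (na <= 0 || nb <= 0)
			break;		/* end of device or I/O error */
		if (na != nb || memcmp(a, b, na))
			printf("mismatch at offset %lld\n", (long long)off);
		off += na;
	}
	return 0;
}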

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
@ 2003-04-20 17:03 Chuck Ebbert
  2003-04-20 17:25 ` John Bradford
  0 siblings, 1 reply; 74+ messages in thread
From: Chuck Ebbert @ 2003-04-20 17:03 UTC (permalink / raw)
  To: arjanv; +Cc: linux-kernel

arjan wrote:


>> You will if it writes and fails to read back. The disk can't invent a
>> sector that is gone. 
>
> but linux can if you use a raid1 mirror... maybe we should teach the md
> layer to write back the data from the other disk on a "bad sector"
> error.


  I have some ugly code that forces all reads from a mirror set to
a specific copy, set via a global sysctl.  This lets you do things
like make a backup from disk 0, then verify against disk 1 and take
action if something is wrong.


------
 Chuck

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
  2003-04-20 15:06 Chuck Ebbert
@ 2003-04-20 15:19 ` John Bradford
  0 siblings, 0 replies; 74+ messages in thread
From: John Bradford @ 2003-04-20 15:19 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: linux-kernel

> > Buy IDE disks in pairs use md1, and remember to continually send the
> > hosed ones back to the vendor/shop (and if they keep appearing DOA to
> > your local trading standards/fair trading type bodies).
> 
> 
>   I buy three drives at a time so I have a matching spare, because AFAIC
> you shouldn't be doing RAID on unmatched drives.

Err, yes you should :-).

Unless they are spindle synchronised, the advantage of identical
physical layout diminishes, and the disadvantage of quite possibly
getting components from the same (faulty) batch increases :-).

>   Using RAID1 is especially important when using software instead
> of hardware for fault-tolerance because the software is more likely to
> have bugs just because of the 'culture' of hardware vs. software
> developers, and the RAID5 algorithm is very hard to get right anyway,
> especially in failure/rebuild mode.  Even on a hardware controller
> RAID5 is still inherently less reliable.

The advantage of RAID1 over a SLED is probably greater than the
advantage of RAID5 over RAID1.

>  (...and what's all this about unreliable drives, anyway?  Every drive
> I have bought since 1987 still works.)

I haven't had a drive failure for a long time.  Maybe I'm just really
lucky.

John.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Are linux-fs's drive-fault-tolerant by concept?
@ 2003-04-20 15:06 Chuck Ebbert
  2003-04-20 15:19 ` John Bradford
  0 siblings, 1 reply; 74+ messages in thread
From: Chuck Ebbert @ 2003-04-20 15:06 UTC (permalink / raw)
  To: linux-kernel

Alan Cox wrote:


> Buy IDE disks in pairs use md1, and remember to continually send the
> hosed ones back to the vendor/shop (and if they keep appearing DOA to
> your local trading standards/fair trading type bodies).


  I buy three drives at a time so I have a matching spare, because AFAIC
you shouldn't be doing RAID on unmatched drives.

  Using RAID1 is especially important when using software instead
of hardware for fault-tolerance because the software is more likely to
have bugs just because of the 'culture' of hardware vs. software
developers, and the RAID5 algorithm is very hard to get right anyway,
especially in failure/rebuild mode.  Even on a hardware controller
RAID5 is still inherently less reliable.

 (...and what's all this about unreliable drives, anyway?  Every drive
I have bought since 1987 still works.)
------
 Chuck

^ permalink raw reply	[flat|nested] 74+ messages in thread

end of thread, other threads:[~2003-05-06  6:51 UTC | newest]

Thread overview: 74+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-04-19 16:04 Are linux-fs's drive-fault-tolerant by concept? Stephan von Krawczynski
2003-04-19 15:29 ` Alan Cox
2003-04-19 17:00   ` Stephan von Krawczynski
2003-04-19 22:04     ` Alan Cox
2003-04-20 16:24       ` Stephan von Krawczynski
2003-04-20 13:59     ` John Bradford
2003-04-20 16:55       ` Stephan von Krawczynski
2003-04-20 17:12         ` John Bradford
2003-04-20 17:21           ` Stephan von Krawczynski
2003-04-20 18:48             ` Alan Cox
2003-04-20 20:00               ` John Bradford
2003-04-21  1:51                 ` jw schultz
2003-04-19 21:13   ` Jos Hulzink
2003-04-20 16:07     ` Stephan von Krawczynski
2003-04-20 16:40       ` John Bradford
2003-04-20 17:01         ` Stephan von Krawczynski
2003-04-20 17:20           ` John Bradford
2003-04-21  9:32             ` Stephan von Krawczynski
2003-04-21  9:55               ` John Bradford
2003-04-21 11:24                 ` Stephan von Krawczynski
2003-04-21 11:50                   ` Alan Cox
2003-04-21 12:14                   ` John Bradford
2003-04-19 16:22 ` John Bradford
2003-04-19 16:36   ` Russell King
2003-04-19 16:45     ` John Bradford
2003-04-19 16:52   ` Stephan von Krawczynski
2003-04-19 20:04     ` John Bradford
2003-04-19 20:33       ` Andreas Dilger
2003-04-21  9:25         ` Denis Vlasenko
2003-04-21  9:42           ` John Bradford
2003-04-21 10:25             ` Stephan von Krawczynski
2003-04-21 10:50               ` John Bradford
2003-04-19 20:38       ` Stephan von Krawczynski
2003-04-20 14:21         ` John Bradford
2003-04-21  9:09           ` Denis Vlasenko
2003-04-21  9:35             ` John Bradford
2003-04-21 11:03               ` Stephan von Krawczynski
2003-04-21 12:04                 ` John Bradford
2003-04-21 11:22               ` Denis Vlasenko
2003-04-21 11:46                 ` Stephan von Krawczynski
2003-04-21 12:13                 ` John Bradford
2003-04-19 20:05     ` John Bradford
2003-04-19 23:13     ` Arnaldo Carvalho de Melo
2003-04-19 17:54   ` Felipe Alfaro Solana
2003-04-25  0:07   ` Stewart Smith
2003-04-25  0:52     ` Richard B. Johnson
2003-04-25  7:13       ` John Bradford
     [not found] ` <20030419161011$0136@gated-at.bofh.it>
2003-04-19 17:18   ` Florian Weimer
2003-04-19 18:07     ` Stephan von Krawczynski
2003-04-19 18:41       ` Dr. David Alan Gilbert
2003-04-19 20:56         ` Helge Hafting
2003-04-19 21:15           ` Valdis.Kletnieks
2003-04-20 10:51             ` Helge Hafting
2003-04-20 19:04               ` Valdis.Kletnieks
2003-04-19 21:57         ` Alan Cox
2003-04-20 10:09         ` Geert Uytterhoeven
2003-04-21  8:37         ` Denis Vlasenko
2003-05-05 12:38         ` Pavel Machek
2003-04-19 22:02     ` Alan Cox
2003-04-20  8:41       ` Arjan van de Ven
2003-04-25  0:11     ` Stewart Smith
2003-04-20 15:06 Chuck Ebbert
2003-04-20 15:19 ` John Bradford
2003-04-20 17:03 Chuck Ebbert
2003-04-20 17:25 ` John Bradford
2003-04-20 17:28 Chuck Ebbert
2003-04-21  9:36 ` Stephan von Krawczynski
2003-04-20 17:28 Chuck Ebbert
2003-04-20 17:44 Chuck Ebbert
2003-04-20 17:44 Chuck Ebbert
     [not found] <mail.linux.kernel/20030420185512.763df745.skraw@ithnet.com>
     [not found] ` <03Apr21.020150edt.41463@gpu.utcc.utoronto.ca>
2003-04-21 11:19   ` Stephan von Krawczynski
2003-04-21 11:52     ` Alan Cox
2003-04-21 14:14     ` Valdis.Kletnieks
2003-05-06  7:03       ` Mike Fedyk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).