From: Neil Brown
To: Mark Lord
Cc: Helge Hafting, Cynbe ru Taren, linux-kernel@vger.kernel.org
Date: Thu, 19 Jan 2006 10:37:03 +1100
Subject: Re: FYI: RAID5 unusably unstable through 2.6.14
Message-ID: <17358.53535.449726.814333@cse.unsw.edu.au>
In-Reply-To: message from Mark Lord on Wednesday January 18
References: <43CE1E52.3030907@aitel.hist.no> <43CE6997.6090005@rtr.ca>

> Helge Hafting wrote:
> >
> > As other have showed - "mdadm" can reassemble your
> > broken raid - and it'll work well in those cases where
> > the underlying drives indeed are ok. It will fail
> > spectacularly if you have a real double fault though,
> > but then nothing short of raid-6 can save you.
>
> No, actually there are several things we *could* do,
> if only the will-to-do-so existed.

You not only need the will. You also need the ability and the time,
and the three must be combined into the one person...

>
> For example, one bad sector on a drive doesn't mean that
> the entire drive has failed. It just means that one 512-byte
> chunk of the drive has failed.
>
> We could rewrite the failed area of the drive, allowing the
> onboard firmware to repair the fault internally, likely by
> remapping physical sectors. This is nothing unusual, as all
> drives these days ship from the factory with many bad sectors
> that have already been remapped to "fix" them. One or two
> more in the field is no reason to toss a perfectly good drive.

Very recent 2.6 kernels do exactly this. They don't drop a drive on a
read error, only on a write error. On a read error they generate the
data from elsewhere and schedule a write, then a re-read.

NeilBrown
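
For illustration only, the read-error handling described above could be
sketched roughly as follows. This is not the md driver's code. It assumes a
single RAID5-style stripe whose parity block is the XOR of the data blocks,
and every name in it (FlakyDisk, ReadError, xor_blocks, read_with_repair) is
hypothetical. The in-memory disk also assumes that rewriting a bad sector
always repairs it, which a real drive does not guarantee.

```python
from functools import reduce


class ReadError(Exception):
    """Raised when a sector cannot be read."""


class FlakyDisk:
    """Hypothetical in-memory disk. Sectors listed in `bad` fail on read
    until they are rewritten (standing in for a firmware remap on write)."""

    def __init__(self, blocks, bad=()):
        self.blocks = list(blocks)
        self.bad = set(bad)

    def read(self, sector):
        if sector in self.bad:
            raise ReadError(sector)
        return self.blocks[sector]

    def write(self, sector, data):
        self.blocks[sector] = data
        self.bad.discard(sector)  # assumption: the rewrite repairs the sector


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_with_repair(disks, index, sector):
    """Read `sector` from disks[index]. On a read error, rebuild the block
    from the other disks in the stripe, write it back, then re-read."""
    try:
        return disks[index].read(sector)
    except ReadError:
        # A second failing disk raises here: the "real double fault" case.
        others = [d.read(sector) for i, d in enumerate(disks) if i != index]
        rebuilt = xor_blocks(others)
        disks[index].write(sector, rebuilt)
        return disks[index].read(sector)  # raises again if the repair failed
```

With two data disks and one parity disk, a bad sector on the first data disk
is rebuilt from the other two, rewritten, and then read back normally. The
disk is never dropped from the array in this path; in the sketch only an
unrecoverable second read error propagates to the caller.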