From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S261283AbTDUN7N (ORCPT );
	Mon, 21 Apr 2003 09:59:13 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S261288AbTDUN7N (ORCPT );
	Mon, 21 Apr 2003 09:59:13 -0400
Received: from chiark.greenend.org.uk ([193.201.200.170]:61200 "EHLO
	chiark.greenend.org.uk") by vger.kernel.org with ESMTP
	id S261283AbTDUN7L (ORCPT );
	Mon, 21 Apr 2003 09:59:11 -0400
From: Peter Benie
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <16035.64514.811736.721619@chiark.greenend.org.uk>
Date: Mon, 21 Apr 2003 15:11:14 +0100
To: linux-kernel@vger.kernel.org (linux-kernel)
Subject: Verifying a RAID device (Was: Are linux-fs's drive-fault-tolerant by concept?)
In-Reply-To: <200304201725.h3KHP5lU000751@81-2-122-30.bradfords.org.uk>
References: <200304201306_MC3-1-3537-115@compuserve.com>
	<200304201725.h3KHP5lU000751@81-2-122-30.bradfords.org.uk>
X-Mailer: VM 7.03 under 21.4 (patch 6) "Common Lisp" XEmacs Lucid
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

John Bradford writes:
> [Chuck Ebbert writes]
> > I have some ugly code that forces all reads from a mirror set to
> > a specific copy, set via a global sysctl. This lets you do things
> > like make a backup from disk 0, then verify against disk 1 and take
> > action if something is wrong.
>
> That's interesting. Have you thought of making it read from _both_
> disks and check that the data matches, before passing it back?
>
> RAID1 mirrors guard against drive failure, but if a drive returns bad
> data, but doesn't report an error, that will usually go unnoticed.

Checking the disks periodically guards against another important
failure mode. Consider this scenario:

Start with a RAID1 mirror using two apparently working disks.
The first disk develops a media fault; however, this goes unnoticed
because there happens to be no data stored on that part of the media.

Later, a fault is detected during a read of the second disk; the disk
is marked off-line, but all the data is still readable on the first
disk so there's no need to panic yet.

You replace the known faulty disk with a new one. The md driver
automatically reconstructs the array from the first disk. Since the md
driver doesn't know about the filesystem, it reads every disk block,
regardless of whether it contains data or not. The latent error on the
first disk is now discovered, and the first disk is now marked
off-line. The replacement is only partially reconstructed, so it
remains inactive. Oops, there are no active disks left - the md device
has failed!

To guard against this, it is a good idea to periodically read every
block of every disk in a RAID to detect faults early. Ideally, this
should be done as another background task, like reconstruction.
Checking that the data is valid is then a trivial extra step. If a
read fails, you mark the relevant disk off-line, as usual. If the
parity check fails, you just return a bad blocks list.

Peter
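
A minimal sketch of the periodic check described above, as a user-space
simulation in Python. This is not md driver code: Disk, ReadError and
scrub are hypothetical names invented for illustration, and comparing
the mirror copies stands in for the parity check, on the assumption of
a RAID1 set.

```python
from dataclasses import dataclass, field
from typing import List, Set


class ReadError(Exception):
    """Raised when a simulated disk cannot read a block (hypothetical)."""


@dataclass
class Disk:
    """A simulated mirror member; not a real md or kernel structure."""
    name: str
    blocks: List[bytes]
    bad: Set[int] = field(default_factory=set)  # blocks with media faults
    online: bool = True

    def read(self, n: int) -> bytes:
        if n in self.bad:
            raise ReadError(f"{self.name}: media fault at block {n}")
        return self.blocks[n]


def scrub(disks: List[Disk]) -> List[int]:
    """Read every block of every on-line disk, used or not.

    A failed read marks that disk off-line. Blocks whose surviving
    copies disagree are collected and returned as a bad blocks list.
    """
    mismatched: List[int] = []
    for n in range(len(disks[0].blocks)):
        copies = []
        for d in disks:
            if not d.online:
                continue
            try:
                copies.append(d.read(n))
            except ReadError:
                d.online = False  # read failure: mark the disk off-line
        if len(set(copies)) > 1:
            mismatched.append(n)  # copies differ: report, don't guess
    return mismatched
```

Run regularly against a healthy mirror, a pass like this would find the
latent fault on the first disk while the second disk still holds a good
copy, rather than during reconstruction when no good copy is left.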