From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S261578AbVFTWtr (ORCPT ); Mon, 20 Jun 2005 18:49:47 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S261222AbVFTWso (ORCPT ); Mon, 20 Jun 2005 18:48:44 -0400 Received: from cog1.w2cog.org ([206.251.188.12]:50076 "EHLO mail1.w2cog.org") by vger.kernel.org with ESMTP id S262318AbVFTWTo (ORCPT ); Mon, 20 Jun 2005 18:19:44 -0400 Date: Mon, 20 Jun 2005 17:19:19 -0500 (CDT) From: Roy Keene To: Kyle Moffett cc: Erik Slagter , Pavel Machek , linux-kernel@vger.kernel.org Subject: Re: Problem with 2.6 kernel and lots of I/O In-Reply-To: Message-ID: References: <20050601195922.GA589@openzaurus.ucw.cz> <1117966262.5027.9.camel@localhost.localdomain> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org All, Actually, the problem I have isn't specific to the using it over the local device. Quite often I have the problem where the secondary node goes down and comes back up after some time and needs to be resyncd. This is done on the master (raid1_resync) by hot-removing /dev/nbd1 and then hot-adding it back. The result ? The slave node becomes completely unusable despite the fact that only nbd-server processes (two, the listener and the accepted socket) are running on there and nothing in the kernel context (well, at least w.r.t. to nbd, obviously some kernel code is involved ! :-P, but the nbd module doesn't even have to be loaded). And by unusable I mean I can no longer open files for writing, attempting to do so results in a hang until the resync is complete. This is not-so-bad when the slave is being resync'd as the primary is still fully usable, but it really sucks when the primary goes down and needs to be resync'd from the secondary upon coming back up. I'm thinking my system disks' RAID controller may be really horrible, or horribly supported. I have a RAID5 (hardware, uses the megaraid_mbox driver) of 3 x 73gb 10K RPM SCSI-320 disks and my write performance is .. horrible. I've looked at "drbd" and it looks very promising, but I haven't had a chance to implement it yet, but it promises to resolve my resync time issues at least. Roy Keene Planning Systems Inc. On Mon, 6 Jun 2005, Kyle Moffett wrote: > On Jun 5, 2005, at 06:11:02, Erik Slagter wrote: >> On Wed, 2005-06-01 at 21:59 +0200, Pavel Machek wrote: >> >>>> Start RAID in degraded mode with remote device (nbd1) >>>> Hot-add local device (nbd0) >>> >>> Stop right here. You may not use nbd over loopback. >> >> Any specific reason (just curious)? > > IIRC, because of the way the loopback delivers packets from the > same context as they are sent, it is possible (and quite easy) > to either deadlock or peg the CPU and make everything hang and > be unuseable. DRBD likewise used to have problems with testing > over the loopback until they added a special configuration > option to be extra careful and yield CPU. > > Cheers, > Kyle Moffet > >