* Problem with 2.6 kernel and lots of I/O
@ 2005-05-31 15:55 Roy Keene
       [not found] ` <200505312040.30812.bernd-schubert@web.de>
  2005-06-01 19:59 ` Pavel Machek
  0 siblings, 2 replies; 13+ messages in thread
From: Roy Keene @ 2005-05-31 15:55 UTC (permalink / raw)
  To: linux-kernel

Hello,

 	I have a (well, at least one) show-stopping problem with the 2.6 
kernel while doing heavy I/O.  I have a (software) RAID1 of network block 
devices (nbd0 and nbd1) set up on two identical machines in an 
active-passive HA cluster configuration.  When the "primary" node goes 
down and comes back up, it recovers the RAID as follows:

 	Start RAID in degraded mode with remote device (nbd1)
 	Hot-add local device (nbd0)
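
In mdadm terms this is roughly the following (a sketch only; the array
name /dev/md0 is assumed here, and raidtools' raidstart/raidhotadd would
be the older equivalent):

 	mdadm --assemble --run /dev/md0 /dev/nbd1   # start the mirror degraded, remote half only
 	mdadm /dev/md0 --add /dev/nbd0              # hot-add the local half, which kicks off the resync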

This all works.  Hot-adding the local device causes a resync, and that is 
where the problems begin.  Once the resync begins the system becomes 
unusable.  Anything that tries to write to the syslog socket ("/dev/log") 
hangs synchronously until the resync completes.  The system load goes up 
to 18 or so.  Writing to the local disk ("/etc" for example, which is not 
part of the RAID) sometimes hangs.  When the resync is complete everything 
is happy again.  Resyncing takes about 25 minutes (20GB over a dedicated 
1000Mbps network interface to the client) and makes the recovery time 
unacceptable.  Also, during this recovery the OOM killer is occasionally 
invoked and kills something seemingly at random, even though there is 
typically plenty of unused swap lying around beforehand (though perhaps 
"java" just eats all of that VERY quickly and I don't notice, since that 
is what the OOM killer chooses to kill).

Does anyone have any ideas ?


Information about the systems:

Info: Linux cog1 2.6.9-5.0.5.ELsmp #1 SMP Fri Apr 8 14:29:47 EDT 2005 i686 i686 i386 GNU/Linux
Dist: RedHat Enterprise Linux 4
Spec:
     2 x 3.2GHz Xeon (per system; with hyperthreading, 4 logical processors)
     4GB of physical RAM
     2GB of configured swap (partition, contiguous)
     2 x 1000Mbps (Intel 82546GB) network cards (the HA cluster link is
               provided by a crossover cable between the two nodes)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Problem with 2.6 kernel and lots of I/O
       [not found] ` <200505312040.30812.bernd-schubert@web.de>
@ 2005-05-31 19:00   ` Roy Keene
  2005-06-01  1:16     ` Kyle Moffett
  0 siblings, 1 reply; 13+ messages in thread
From: Roy Keene @ 2005-05-31 19:00 UTC (permalink / raw)
  To: bernd-schubert; +Cc: linux-kernel


Bernd,

 	The ENBD project requires a kernel patch for 2.6 support, and I 
would like to stay with the vendor-supplied (and "blessed") kernel for 
support reasons.

 	Since I cannot provide feedback on the latest version of the 
kernel and I don't want to run an "unblessed" kernel, I've opened a 
ticket with RedHat against my support contract for this problem.  I 
mainly posted my question here to see whether this is a known issue with 
2.6 in general or is NBD-specific.

 	I had not heard of "DRBD" before now, but it looks interesting.  I 
am looking into it further.

Thanks,
 	Roy Keene

On Tue, 31 May 2005, Bernd Schubert wrote:

> Hello Roy,
>
> Roy Keene wrote:
>
>> Hello,
>>
>>   I have a (well, at least one) show-stopping problem with the 2.6
>> kernel while doing heavy I/O.  I have a (software) RAID1 of network block
>> devices (nbd0 and nbd1) set up on two identical machines in an
>
> what about using drbd or enbd? AFAIK both are much better tested/suited for
> network raid.
>
> Cheers,
>  Bernd
>
>
> -- 
> Bernd Schubert
> PCI / Theoretische Chemie
> Universität Heidelberg
> INF 229
> 69120 Heidelberg
> e-mail: bernd.schubert@pci.uni-heidelberg.de
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Problem with 2.6 kernel and lots of I/O
  2005-05-31 19:00   ` Roy Keene
@ 2005-06-01  1:16     ` Kyle Moffett
  0 siblings, 0 replies; 13+ messages in thread
From: Kyle Moffett @ 2005-06-01  1:16 UTC (permalink / raw)
  To: Roy Keene; +Cc: bernd-schubert, linux-kernel

On May 31, 2005, at 15:00:46, Roy Keene wrote:
> I had not heard of "DRBD" before now, but it looks interesting.
> I am looking into it further.

For DRBD, I recommend first installing DRBD, then setting up and
installing Heartbeat.  On Debian the process is something like
the following:

# apt-get install kernel-package kernel-source-${VERSION} \
      drbd0.7-module-source drbd0.7-utils heartbeat
# cd /usr/src
# tar -xjf kernel-source-${VERSION}.tar.bz2
# cd kernel-source-${VERSION}
# cp /boot/config-${VERSION}-whatever ./.config
# make-kpkg --revision ${REVISION} --append-to-version -${MOREVERSION} \
      --added-modules drbd --config oldconfig --us --uc configure modules_image
# dpkg -i /usr/src/drbd0.7-module-${VERSION}-${MOREVERSION}_${DRBD_VERSION}+${REVISION}_${ARCH}.deb
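
DRBD also wants a resource defined in /etc/drbd.conf before heartbeat can
manage it.  Roughly, for the 0.7 series (a sketch only; the backing disks,
addresses, and sync rate below are placeholders, not my real config):

   resource r0 {
       protocol C;

       syncer {
           rate 30M;           # cap resync bandwidth so normal I/O stays responsive
       }

       on king {
           device    /dev/drbd0;
           disk      /dev/sda7;
           address   10.0.0.1:7788;
           meta-disk internal;
       }

       on emperor {
           device    /dev/drbd0;
           disk      /dev/sda7;
           address   10.0.0.2:7788;
           meta-disk internal;
       }
   }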

Then read the heartbeat docs to make yourself a /etc/ha.d/haresources
file.  Mine looks like this:
   # Address monarch.csl.tjhsst.edu, Kerberos master, LDAP master
   king    IPaddr::198.38.19.1 Kerberos::TJHSST.EDU::CSL.TJHSST.EDU

   # webkdc.tjhsst.edu
   king    IPaddr::198.38.19.2

   # weblogin.tjhsst.edu
   king    IPaddr::198.38.19.3

   # cups.csl.tjhsst.edu
   king    IPaddr::198.38.19.8

   # mirror.tjhsst.edu
   king    IPaddr::198.38.19.9

   # AFS volumes
   king    drbddisk::afs0  AFSMount::/vicepa
   emperor drbddisk::afs1  AFSMount::/vicepb

NOTE: AFSMount is a custom heartbeat script, and I use a slightly
modified drbddisk script as well.


Cheers,
Kyle Moffett

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/U d- s++: a18 C++++>$ UB/L/X/*++++(+)>$ P+++(++++)>$
L++++(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b++++(++) DI+ D+ G e->++++$ h!*()>++$  
r  !y?(-)
------END GEEK CODE BLOCK------




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Problem with 2.6 kernel and lots of I/O
  2005-05-31 15:55 Problem with 2.6 kernel and lots of I/O Roy Keene
       [not found] ` <200505312040.30812.bernd-schubert@web.de>
@ 2005-06-01 19:59 ` Pavel Machek
  2005-06-05 10:11   ` Erik Slagter
  1 sibling, 1 reply; 13+ messages in thread
From: Pavel Machek @ 2005-06-01 19:59 UTC (permalink / raw)
  To: Roy Keene; +Cc: linux-kernel

Hi!

> 	Start RAID in degraded mode with remote device (nbd1)
> 	Hot-add local device (nbd0)

Stop right here. You may not use nbd over loopback.

				Pavel
-- 
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms         


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Problem with 2.6 kernel and lots of I/O
  2005-06-01 19:59 ` Pavel Machek
@ 2005-06-05 10:11   ` Erik Slagter
  2005-06-06  5:46     ` Kyle Moffett
  0 siblings, 1 reply; 13+ messages in thread
From: Erik Slagter @ 2005-06-05 10:11 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Roy Keene, linux-kernel

On Wed, 2005-06-01 at 21:59 +0200, Pavel Machek wrote:
> > 	Start RAID in degraded mode with remote device (nbd1)
> > 	Hot-add local device (nbd0)
> 
> Stop right here. You may not use nbd over loopback.

Any specific reason (just curious)?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Problem with 2.6 kernel and lots of I/O
  2005-06-05 10:11   ` Erik Slagter
@ 2005-06-06  5:46     ` Kyle Moffett
  2005-06-20 22:19       ` Roy Keene
  0 siblings, 1 reply; 13+ messages in thread
From: Kyle Moffett @ 2005-06-06  5:46 UTC (permalink / raw)
  To: Erik Slagter; +Cc: Pavel Machek, Roy Keene, linux-kernel

On Jun 5, 2005, at 06:11:02, Erik Slagter wrote:
> On Wed, 2005-06-01 at 21:59 +0200, Pavel Machek wrote:
>
>>>     Start RAID in degraded mode with remote device (nbd1)
>>>     Hot-add local device (nbd0)
>>
>> Stop right here. You may not use nbd over loopback.
>
> Any specific reason (just curious)?

IIRC, because of the way the loopback delivers packets from the
same context as they are sent, it is possible (and quite easy)
to either deadlock or peg the CPU and make everything hang and
be unusable.  DRBD likewise used to have problems with testing
over the loopback until they added a special configuration
option to be extra careful and yield the CPU.

Cheers,
Kyle Moffett


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Problem with 2.6 kernel and lots of I/O
  2005-06-06  5:46     ` Kyle Moffett
@ 2005-06-20 22:19       ` Roy Keene
  2005-06-20 23:18         ` Kyle Moffett
  0 siblings, 1 reply; 13+ messages in thread
From: Roy Keene @ 2005-06-20 22:19 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: Erik Slagter, Pavel Machek, linux-kernel

All,

 	Actually, the problem I have isn't specific to using it over the 
local device.  Quite often I have the problem where the secondary node 
goes down and comes back up after some time and needs to be resynced.  
This is done on the master (raid1_resync) by hot-removing /dev/nbd1 and 
then hot-adding it back.
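
In mdadm terms that is roughly (a sketch; the array name /dev/md0 is 
assumed here):

 	mdadm /dev/md0 --fail /dev/nbd1 --remove /dev/nbd1   # drop the dead remote half
 	mdadm /dev/md0 --add /dev/nbd1                       # re-add it once the node is back, starting the resync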

The result?  The slave node becomes completely unusable, despite the fact 
that only the nbd-server processes (two: the listener and the accepted 
socket) are running there, and nothing is running in kernel context (well, 
at least with respect to nbd; obviously some kernel code is involved! :-P, 
but the nbd module doesn't even have to be loaded).  And by unusable I 
mean I can no longer open files for writing; attempting to do so hangs 
until the resync is complete.

This is not so bad when the slave is being resynced, as the primary is 
still fully usable, but it really sucks when the primary goes down and 
needs to be resynced from the secondary upon coming back up.

I'm thinking my system disks' RAID controller may be really horrible, or 
horribly supported.  I have a hardware RAID5 (using the megaraid_mbox 
driver) of 3 x 73GB 10K RPM SCSI-320 disks, and my write performance 
is... horrible.
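
A quick-and-dirty way to sanity-check raw write throughput on that array 
(a sketch only; /data is assumed to sit on the RAID5 volume):

 	time sh -c 'dd if=/dev/zero of=/data/ddtest bs=1M count=2048 && sync'   # ~2GB sequential write, flushed to disk
 	rm /data/ddtest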

I've looked at "drbd" and it looks very promising.  I haven't had a 
chance to implement it yet, but it promises to resolve at least my 
resync-time issues.


 	Roy Keene
 	Planning Systems Inc.

On Mon, 6 Jun 2005, Kyle Moffett wrote:

> On Jun 5, 2005, at 06:11:02, Erik Slagter wrote:
>> On Wed, 2005-06-01 at 21:59 +0200, Pavel Machek wrote:
>> 
>>>>     Start RAID in degraded mode with remote device (nbd1)
>>>>     Hot-add local device (nbd0)
>>> 
>>> Stop right here. You may not use nbd over loopback.
>> 
>> Any specific reason (just curious)?
>
> IIRC, because of the way the loopback delivers packets from the
> same context as they are sent, it is possible (and quite easy)
> to either deadlock or peg the CPU and make everything hang and
> be unusable.  DRBD likewise used to have problems with testing
> over the loopback until they added a special configuration
> option to be extra careful and yield CPU.
>
> Cheers,
> Kyle Moffett
>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Problem with 2.6 kernel and lots of I/O
  2005-06-20 22:19       ` Roy Keene
@ 2005-06-20 23:18         ` Kyle Moffett
  2005-06-20 23:54           ` Roy Keene
  0 siblings, 1 reply; 13+ messages in thread
From: Kyle Moffett @ 2005-06-20 23:18 UTC (permalink / raw)
  To: Roy Keene; +Cc: Erik Slagter, Pavel Machek, linux-kernel

On Jun 20, 2005, at 18:19:19, Roy Keene wrote:
> On Mon, 6 Jun 2005, Kyle Moffett wrote:
>>> IIRC, because of the way the loopback delivers packets from the
>>> same context as they are sent, it is possible (and quite easy)
>>> to either deadlock or peg the CPU and make everything hang and
>>> be unusable.  DRBD likewise used to have problems with testing
>>> over the loopback until they added a special configuration
>>> option to be extra careful and yield CPU.
>
> Actually, the problem I have isn't specific to using it over
> the local device.  Quite often I have the problem where the
> secondary node goes down and comes back up after some time and
> needs to be resynced.  This is done on the master (raid1_resync) by
> hot-removing /dev/nbd1 and then hot-adding it back.

No, see, when you hot-add /dev/nbd1, the kernel md resync thread
begins processing the resync.  The resync operation on two nbds
involves:
   1) Send data request packet from nbd0
   2) Wait for response
   3) Send data packet to nbd1
   4) Wait for response
   5) Repeat until done

On a normal net device, the "Send data request packet" causes the
system to drop the packet on the wire and go away to do other stuff
for a while, whereas on the loopback, it can schedule immediately
to the process receiving the packet, which is the kernel itself.
The kernel then processes the packet and returns the result, over
the loopback.  It then sends the response to the other server over
a real net connection.  During most of this time, the kernel is
taking big locks and turning interrupts off and on and such, causing
massive hangs until resync finishes.  Since you mentioned bad write
performance with your RAID controller, I suspect its driver may also
turn off interrupts, take excessive locks, or do other madness,
further worsening system responsiveness.

Cheers,
Kyle Moffett

--
There are two ways of constructing a software design. One way is to  
make it so simple that there are obviously no deficiencies. And the  
other way is to make it so complicated that there are no obvious  
deficiencies.
  -- C.A.R. Hoare


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Problem with 2.6 kernel and lots of I/O
  2005-06-20 23:18         ` Kyle Moffett
@ 2005-06-20 23:54           ` Roy Keene
  2005-06-21  2:47             ` Kyle Moffett
  2005-06-21  7:41             ` Pavel Machek
  0 siblings, 2 replies; 13+ messages in thread
From: Roy Keene @ 2005-06-20 23:54 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: Erik Slagter, Pavel Machek, linux-kernel

But the problem doesn't occur at the "local" end; it's at the 
"receiving" end (which may be the same thing, but mostly it's not, since I 
tend to reboot the secondary node more).

The problem occurs on the node running `nbd-server' in userspace, which 
does not necessarily have "nbd" support at all.

"nbd1" is a remote nbd device pointing at the secondary server, and it is 
that server that becomes highly unusable.  I'm not sure what you're 
attempting to convey to me, as the server running raid1_resync (reading 
from nbd0, which corresponds to a local nbd-client binding) is perfectly 
usable in the example I gave, but the remote node is not...


 	Roy Keene
 	Planning Systems Inc.

On Mon, 20 Jun 2005, Kyle Moffett wrote:

> On Jun 20, 2005, at 18:19:19, Roy Keene wrote:
>> On Mon, 6 Jun 2005, Kyle Moffett wrote:
>>>> IIRC, because of the way the loopback delivers packets from the
>>>> same context as they are sent, it is possible (and quite easy)
>>>> to either deadlock or peg the CPU and make everything hang and
>>>> be unusable.  DRBD likewise used to have problems with testing
>>>> over the loopback until they added a special configuration
>>>> option to be extra careful and yield CPU.
>> 
>> Actually, the problem I have isn't specific to using it over
>> the local device.  Quite often I have the problem where the
>> secondary node goes down and comes back up after some time and
>> needs to be resynced.  This is done on the master (raid1_resync) by
>> hot-removing /dev/nbd1 and then hot-adding it back.
>
> No, see, when you hot-add /dev/nbd1, the kernel md resync thread
> begins processing the resync.  The resync operation on two nbds
> involves:
>  1) Send data request packet from nbd0
>  2) Wait for response
>  3) Send data packet to nbd1
>  4) Wait for response
>  5) Repeat until done
>
> On a normal net device, the "Send data request packet" causes the
> system to drop the packet on the wire and go away to do other stuff
> for a while, whereas on the loopback, it can schedule immediately
> to the process receiving the packet, which is the kernel itself.
> The kernel then processes the packet and returns the result, over
> the loopback.  It then sends the response to the other server over
> a real net connection.  During most of this time, the kernel is
> taking big locks and turning interrupts off and on and such, causing
> massive hangs until resync finishes.  Since you mentioned bad write
> performance with your RAID controller, I suspect its driver may also
> turn off interrupts, take excessive locks, or do other madness,
> further worsening system responsiveness.
>
> Cheers,
> Kyle Moffett
>
> --
> There are two ways of constructing a software design. One way is to make it 
> so simple that there are obviously no deficiencies. And the other way is to 
> make it so complicated that there are no obvious deficiencies.
> -- C.A.R. Hoare
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Problem with 2.6 kernel and lots of I/O
  2005-06-20 23:54           ` Roy Keene
@ 2005-06-21  2:47             ` Kyle Moffett
  2005-06-21  7:41             ` Pavel Machek
  1 sibling, 0 replies; 13+ messages in thread
From: Kyle Moffett @ 2005-06-21  2:47 UTC (permalink / raw)
  To: Roy Keene; +Cc: Erik Slagter, Pavel Machek, linux-kernel

On Jun 20, 2005, at 19:54:23, Roy Keene wrote:
> But the problem doesn't occur at the "local" end; it's at the  
> "receiving" end (which may be the same thing, but mostly it's not,  
> since I tend to reboot the secondary node more).
>
> The problem occurs on the node running `nbd-server' in userspace,  
> which does not necessarily have "nbd" support at all.
>
> "nbd1" is a remote nbd device pointing at the secondary server, and  
> it is that server that becomes highly unusable.  I'm not sure what  
> you're attempting to convey to me, as the server running raid1_resync  
> (reading from nbd0, which corresponds to a local nbd-client binding)  
> is perfectly usable in the example I gave, but the remote node is not...

Oh!  Sorry, I got your systems confused.  In that case, you are
definitely having a SCSI RAID controller or driver issue.  Please
forgive the confusion.

Cheers,
Kyle Moffett

--
There are two ways of constructing a software design. One way is to  
make it so simple that there are obviously no deficiencies. And the  
other way is to make it so complicated that there are no obvious  
deficiencies.
  -- C.A.R. Hoare


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Problem with 2.6 kernel and lots of I/O
  2005-06-20 23:54           ` Roy Keene
  2005-06-21  2:47             ` Kyle Moffett
@ 2005-06-21  7:41             ` Pavel Machek
  2005-06-21 14:23               ` Roy Keene
  1 sibling, 1 reply; 13+ messages in thread
From: Pavel Machek @ 2005-06-21  7:41 UTC (permalink / raw)
  To: Roy Keene; +Cc: Kyle Moffett, Erik Slagter, linux-kernel

Hi!

> But the problem doesn't occur at the "local" end; it's at the 
> "receiving" end (which may be the same thing, but mostly it's not, since I 
> tend to reboot the secondary node more).
> 
> The problem occurs on the node running `nbd-server' in userspace, which 
> does not necessarily have "nbd" support at all.

nbd-server is a nice and simple userland application, doing no magic. If
that makes the machine unusable... well, fix the machine ;-). It may be an
mm problem or something... It is basically not nbd-related. [Remember,
nbd-server is just another userland process, "nothing to do with nbd",
nothing special.]
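
A sketch of where one might start looking on the mm side (these are the
stock 2.6 writeback sysctls; the values below are only illustrative):

 	cat /proc/sys/vm/dirty_ratio /proc/sys/vm/dirty_background_ratio
 	# experimentally lower the thresholds so writers are throttled earlier:
 	echo 10 > /proc/sys/vm/dirty_ratio
 	echo 5  > /proc/sys/vm/dirty_background_ratio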
									Pavel
-- 
teflon -- maybe it is a trademark, but it should not be.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Problem with 2.6 kernel and lots of I/O
  2005-06-21  7:41             ` Pavel Machek
@ 2005-06-21 14:23               ` Roy Keene
  0 siblings, 0 replies; 13+ messages in thread
From: Roy Keene @ 2005-06-21 14:23 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Kyle Moffett, Erik Slagter, linux-kernel

Exactly my point.  The problem isn't the NBD, it's the lots of I/O.

 	Roy Keene
 	Planning Systems Inc.

On Tue, 21 Jun 2005, Pavel Machek wrote:

> Hi!
>
>> But the problem doesn't occur at the "local" end; it's at the
>> "receiving" end (which may be the same thing, but mostly it's not, since I
>> tend to reboot the secondary node more).
>>
>> The problem occurs on the node running `nbd-server' in userspace, which
>> does not necessarily have "nbd" support at all.
>
> nbd-server is a nice and simple userland application, doing no magic. If
> that makes the machine unusable... well, fix the machine ;-). It may be an
> mm problem or something... It is basically not nbd-related. [Remember,
> nbd-server is just another userland process, "nothing to do with nbd",
> nothing special.]
> 									Pavel
> -- 
> teflon -- maybe it is a trademark, but it should not be.
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Problem with 2.6 kernel and lots of I/O
@ 2005-05-31 16:12 Parag Warudkar
  0 siblings, 0 replies; 13+ messages in thread
From: Parag Warudkar @ 2005-05-31 16:12 UTC (permalink / raw)
  To: Roy Keene, linux-kernel

> Info: Linux cog1 2.6.9-5.0.5.ELsmp #1 SMP Fri Apr 8 14:29:47 EDT 2005 i686 i686 i386 GNU/Linux
> Dist: RedHat Enterprise Linux 4
> Spec:
>      2 x 3.2GHz Xeon (per system; with hyperthreading, 4 logical processors)
>      4GB of physical RAM
>      2GB of configured swap (partition, contiguous)
>      2 x 1000Mbps (Intel 82546GB) network cards (the HA cluster link is
>                provided by a crossover cable between the two nodes)
> -

Since you are using a vendor kernel that is older than the current 2.6 kernel.org one, you are better off posting to the appropriate vendor mailing list, or asking the vendor for support if you have a contract.  Otherwise, try to reproduce the problem with the latest kernel.org kernel and then re-post the information here.

Parag



^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2005-06-21 14:25 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-05-31 15:55 Problem with 2.6 kernel and lots of I/O Roy Keene
     [not found] ` <200505312040.30812.bernd-schubert@web.de>
2005-05-31 19:00   ` Roy Keene
2005-06-01  1:16     ` Kyle Moffett
2005-06-01 19:59 ` Pavel Machek
2005-06-05 10:11   ` Erik Slagter
2005-06-06  5:46     ` Kyle Moffett
2005-06-20 22:19       ` Roy Keene
2005-06-20 23:18         ` Kyle Moffett
2005-06-20 23:54           ` Roy Keene
2005-06-21  2:47             ` Kyle Moffett
2005-06-21  7:41             ` Pavel Machek
2005-06-21 14:23               ` Roy Keene
2005-05-31 16:12 Parag Warudkar
