linux-kernel.vger.kernel.org archive mirror
* [Kernel 2.5] Qlogic 2x00 driver
@ 2002-10-15 19:20 Simon Roscic
  2002-10-15 19:31 ` Arjan van de Ven
  2002-10-16  5:02 ` GrandMasterLee
  0 siblings, 2 replies; 35+ messages in thread
From: Simon Roscic @ 2002-10-15 19:20 UTC (permalink / raw)
  To: linux-kernel

hi,

as the feature freeze of 2.5 comes close, i want to ask if the driver for
the qlogic sanblade 2200/2300 series of hbas will be included in 2.5 ...
are there any plans to do so? has it been discussed before?

i ask because i use those hbas together with ibm's fastt500 storage system,
and it would be nice to have this driver in the default kernel ...

i have been using version 5.36.3 of the qlogic 2x00 driver in production
(vanilla kernel 2.4.17 + qlogic 2x00 driver v5.36.3) since may 2002
and i have never had any problems with this driver ...
(2 lotus domino servers and 1 fileserver, all 3 attached to the ibm fastt500
storage system using qlogic sanblade 2200 cards)

i don't know how many people use those qlogic cards, i got them together
with the fastt500 storage system, ...

the current driver is available here (it's GPL'd):
http://www.qlogic.com/support/os_detail.asp?productid=112&osid=26

the qlogic 2x00 driver is also in andrea arcangeli's "-aa" patches, so possibly
he knows better whether it would be useful to integrate those drivers into 2.5 or
not, ...

thanks for your time,
please CC me, i'm currently not subscribed to lkml,
simon.



* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-15 19:20 [Kernel 2.5] Qlogic 2x00 driver Simon Roscic
@ 2002-10-15 19:31 ` Arjan van de Ven
  2002-10-15 19:53   ` Simon Roscic
  2002-10-16  5:02 ` GrandMasterLee
  1 sibling, 1 reply; 35+ messages in thread
From: Arjan van de Ven @ 2002-10-15 19:31 UTC (permalink / raw)
  To: Simon Roscic; +Cc: linux-kernel


On Tue, 2002-10-15 at 21:20, Simon Roscic wrote:
> hi,
> 
> as the feature freeze of 2.5 comes close, i want to ask if the driver for
> the qlogic sanblade 2200/2300 series of hbas will be included in 2.5 ...
> are there any plans to do so? has it been discussed before?
> 
> i ask because i use those hbas together with ibm's fastt500 storage system,
> and it would be nice to have this driver in the default kernel ...
> 
> i have been using version 5.36.3 of the qlogic 2x00 driver in production
> (vanilla kernel 2.4.17 + qlogic 2x00 driver v5.36.3) since may 2002
> and i have never had any problems with this driver ...

Oh, so you haven't noticed how it buffer-overflows the kernel stack, how
it has major stack hog issues, how it keeps the io_request_lock (and
interrupts disabled) for a WEEK?
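
(For context: a rough sketch of the pattern being complained about -
illustrative only, not actual qla2x00 code; the register offset and
status bit are invented.)

#include <linux/spinlock.h>
#include <linux/blkdev.h>       /* io_request_lock in 2.4.x */
#include <linux/delay.h>
#include <asm/io.h>

#define MBOX_STATUS   0x10      /* invented register offset */
#define MBOX_CMD_DONE 0x01      /* invented status bit */

static void bad_mailbox_wait(unsigned long iobase)
{
        unsigned long flags;

        /* Busy-waiting while holding the global io_request_lock with
         * interrupts disabled stalls all block I/O, and all interrupt
         * handling on this CPU, until the hardware responds. */
        spin_lock_irqsave(&io_request_lock, flags);
        while (!(inw(iobase + MBOX_STATUS) & MBOX_CMD_DONE))
                udelay(10);
        spin_unlock_irqrestore(&io_request_lock, flags);
}

The usual cure is to drop the lock and sleep, or complete the mailbox
command from the interrupt handler, rather than spinning with interrupts off.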







* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-15 19:31 ` Arjan van de Ven
@ 2002-10-15 19:53   ` Simon Roscic
  2002-10-16  2:51     ` Michael Clark
  0 siblings, 1 reply; 35+ messages in thread
From: Simon Roscic @ 2002-10-15 19:53 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel

On Tuesday 15 October 2002 21:31, Arjan van de Ven <arjanv@redhat.com> wrote:
> Oh, so you haven't noticed how it buffer-overflows the kernel stack, how
> it has major stack hog issues, how it keeps the io_request_lock (and
> interrupts disabled) for a WEEK?

doesn't sound good, ...
as i said, i don't have any problems (= failures, data loss, etc.) with this driver,
sounds like i should update to a newer driver version. which qlogic 2x00 driver
version do you recommend? or does this affect all versions of this driver?

(performance on the machines i use is quite good, a dbench 256 gave me
approx. 60 mb/s (ibm xseries 342, 1x pentium 3 - 1.2 ghz, 512 - 1024 mb ram))

arjan, thanks for the info, i didn't notice that the driver was that bad,
i had much to do in the past months, so i possibly missed discussions
about the qlogic 2x00 stuff, sorry,

simon.
(please CC me, i'm not subscribed to lkml)


* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-15 19:53   ` Simon Roscic
@ 2002-10-16  2:51     ` Michael Clark
  2002-10-16  3:56       ` GrandMasterLee
  2002-10-16 16:28       ` Simon Roscic
  0 siblings, 2 replies; 35+ messages in thread
From: Michael Clark @ 2002-10-16  2:51 UTC (permalink / raw)
  To: Simon Roscic; +Cc: Arjan van de Ven, linux-kernel

Version 6.1b5 does appear to be a big improvement from looking
at the code (certainly much more readable than version 4.x and earlier).

Although the method for creating the different modules for
different hardware is pretty ugly.

in qla2300.c

#define ISP2300
[snip]
#include "qla2x00.c"

in qla2200.c

#define ISP2200
[snip]
#include "qla2x00.c"

I'm sure this would have to go before it got in.
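
For illustration, the shape reviewers usually ask for instead is roughly
this - a minimal sketch with invented names, not actual QLogic code:
compile the core once and key the per-chip differences off a table.

struct qla_host;                        /* opaque per-adapter state */

/* One entry per supported chip; the shared core calls through these
 * instead of being recompiled per #define. */
struct qla_chip_info {
        const char *name;
        unsigned short device_id;
        int (*load_firmware)(struct qla_host *ha);
};

int qla2200_load_firmware(struct qla_host *ha);  /* defined per chip */
int qla2300_load_firmware(struct qla_host *ha);

static const struct qla_chip_info qla_chips[] = {
        { "QLA2200", 0x2200, qla2200_load_firmware },
        { "QLA2300", 0x2300, qla2300_load_firmware },
};

The probe routine would then match the PCI device id against the table
and stash the chip_info pointer in the host structure, so a single
compiled core serves both boards.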

~mc

On 10/16/02 03:53, Simon Roscic wrote:
> On Tuesday 15 October 2002 21:31, Arjan van de Ven <arjanv@redhat.com> wrote:
> 
>>Oh, so you haven't noticed how it buffer-overflows the kernel stack, how
>>it has major stack hog issues, how it keeps the io_request_lock (and
>>interrupts disabled) for a WEEK?

This may have been the cause of problems I had running the qla driver with
lvm and ext3 - I was getting oopses with what looked like corrupted bufferheads.

This was happening in pretty much all kernels I tried (a variety of
redhat kernels and aa kernels). Removing LVM solved the problem.
Although I was blaming LVM - maybe it was a buffer overflow in the qla driver.

The rh kernel I tried had quite an old version (4.31) of the driver, which
suffered from problems recovering from LIP resets. The latest 6.x drivers
seem to handle this much better.

~mc



* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-16  2:51     ` Michael Clark
@ 2002-10-16  3:56       ` GrandMasterLee
  2002-10-16  4:30         ` Michael Clark
  2002-10-16 16:28       ` Simon Roscic
  1 sibling, 1 reply; 35+ messages in thread
From: GrandMasterLee @ 2002-10-16  3:56 UTC (permalink / raw)
  To: Michael Clark; +Cc: Simon Roscic, Arjan van de Ven, linux-kernel

You might wanna look at version 6.01 instead. I say this because it's
*not* a beta driver. 


On Tue, 2002-10-15 at 21:51, Michael Clark wrote:
> Version 6.1b5 does appear to be a big improvement from looking
> at the code (certainly much more readable than version 4.x and earlier).
> 
> Although the method for creating the different modules for
> different hardware is pretty ugly.
> 
> in qla2300.c
> 
> #define ISP2300
> [snip]
> #include "qla2x00.c"
> 
> in qla2200.c
> 
> #define ISP2200
> [snip]
> #include "qla2x00.c"
> 
> I'm sure this would have to go before it got in.
> 
> ~mc
> 
> On 10/16/02 03:53, Simon Roscic wrote:
> > On Tuesday 15 October 2002 21:31, Arjan van de Ven <arjanv@redhat.com> wrote:
> > 
> >>Oh, so you haven't noticed how it buffer-overflows the kernel stack, how
> >>it has major stack hog issues, how it keeps the io_request_lock (and
> >>interrupts disabled) for a WEEK?
> 
> This may have been the cause of problems I had running the qla driver with
> lvm and ext3 - I was getting oopses with what looked like corrupted bufferheads.
> 
> This was happening in pretty much all kernels I tried (a variety of
> redhat kernels and aa kernels). Removing LVM solved the problem.
> Although I was blaming LVM - maybe it was a buffer overflow in the qla driver.
> 
> The rh kernel I tried had quite an old version (4.31) of the driver, which
> suffered from problems recovering from LIP resets. The latest 6.x drivers
> seem to handle this much better.
> 
> ~mc
> 


* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-16  3:56       ` GrandMasterLee
@ 2002-10-16  4:30         ` Michael Clark
  2002-10-16  4:35           ` J Sloan
  0 siblings, 1 reply; 35+ messages in thread
From: Michael Clark @ 2002-10-16  4:30 UTC (permalink / raw)
  To: GrandMasterLee; +Cc: Simon Roscic, Arjan van de Ven, linux-kernel

I doubt it will make a difference. LVM and qlogic drivers seem
to be a bad mix. I've already tried the beta5 of 6.01
and the same problem exists - oopses about every 5-8 days.
Removing LVM solved the problem.

The changelog only lists small changes between 6.01 and 6.01b5.

Although one entry suggests a fix for a race in qla2x00_done
that would allow multiple completions on the same IO. Not sure
if this relates to my problem with LVM, as this occurred with
earlier versions of the qlogic driver without the dpc threads.
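
The class of race described there is usually closed with something like
the following - an illustrative sketch with invented names, not the
actual changelog fix:

#include <linux/bitops.h>

#define SP_COMPLETED 0                  /* invented flag bit */

struct scsi_cmnd;                       /* mid-layer command, opaque here */

struct srb {                            /* simplified command wrapper */
        unsigned long flags;
        struct scsi_cmnd *cmd;
        void (*done)(struct scsi_cmnd *);
};

static void srb_complete_once(struct srb *sp)
{
        /* test_and_set_bit is atomic, so exactly one of the interrupt
         * handler and the dpc thread wins; the loser returns, and the
         * command can never be completed twice. */
        if (test_and_set_bit(SP_COMPLETED, &sp->flags))
                return;
        sp->done(sp->cmd);
}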

~mc

On 10/16/02 11:56, GrandMasterLee wrote:
> You might wanna look at version 6.01 instead. I say this because it's
> *not* a beta driver. 
> 
> 
> On Tue, 2002-10-15 at 21:51, Michael Clark wrote:
> 
>>Version 6.1b5 does appear to be a big improvement from looking
>>at the code (certainly much more readable than version 4.x and earlier).
>>
>>Although the method for creating the different modules for
>>different hardware is pretty ugly.
>>
>>in qla2300.c
>>
>>#define ISP2300
>>[snip]
>>#include "qla2x00.c"
>>
>>in qla2200.c
>>
>>#define ISP2200
>>[snip]
>>#include "qla2x00.c"
>>
>>I'm sure this would have to go before it got in.
>>
>>~mc
>>
>>On 10/16/02 03:53, Simon Roscic wrote:
>>
>>>On Tuesday 15 October 2002 21:31, Arjan van de Ven <arjanv@redhat.com> wrote:
>>>
>>>
>>>>Oh, so you haven't noticed how it buffer-overflows the kernel stack, how
>>>>it has major stack hog issues, how it keeps the io_request_lock (and
>>>>interrupts disabled) for a WEEK?
>>>
>>This may have been the cause of problems I had running the qla driver with
>>lvm and ext3 - I was getting oopses with what looked like corrupted bufferheads.
>>
>>This was happening in pretty much all kernels I tried (a variety of
>>redhat kernels and aa kernels). Removing LVM solved the problem.
>>Although I was blaming LVM - maybe it was a buffer overflow in the qla driver.
>>
>>The rh kernel I tried had quite an old version (4.31) of the driver, which
>>suffered from problems recovering from LIP resets. The latest 6.x drivers
>>seem to handle this much better.
>>
>>~mc
>>
> 

-- 
Michael Clark,  . . . . . . . . . . . . . . .  michael@metaparadigm.com
Managing Director,  . . . . . . . . . . . . . . .  phone: +65 6395 6277
Metaparadigm Pte. Ltd.  . . . . . . . . . . . . . mobile: +65 9645 9612
25F Paterson Road, Singapore 238515  . . . . . . . . fax: +65 6234 4043

I'm successful because I'm lucky.  The harder I work, the luckier I get.




* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-16  4:30         ` Michael Clark
@ 2002-10-16  4:35           ` J Sloan
  2002-10-16  4:43             ` GrandMasterLee
                               ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: J Sloan @ 2002-10-16  4:35 UTC (permalink / raw)
  To: Michael Clark
  Cc: GrandMasterLee, Simon Roscic, Arjan van de Ven, linux-kernel

Just to make sure we are on the same page,
was that LVM1, LVM2, or EVMS?

Joe

Michael Clark wrote:

> I doubt it will make a difference. LVM and qlogic drivers seem
> to be a bad mix. I've already tried the beta5 of 6.01
> and the same problem exists - oopses about every 5-8 days.
> Removing LVM solved the problem.





* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-16  4:35           ` J Sloan
@ 2002-10-16  4:43             ` GrandMasterLee
  2002-10-16  6:03               ` Michael Clark
  2002-10-16  4:58             ` GrandMasterLee
  2002-10-16  5:28             ` Michael Clark
  2 siblings, 1 reply; 35+ messages in thread
From: GrandMasterLee @ 2002-10-16  4:43 UTC (permalink / raw)
  To: J Sloan; +Cc: Michael Clark, Simon Roscic, Arjan van de Ven, linux-kernel

My Dell 6650 has been doing this exact behaviour since we got on 5.38.9
and up, using LVM in a production capacity. Both of our servers have
crashed mysteriously, without any kernel dump, etc, but all hardware
diags come out clean.

All hardware configuration bits are as perfect as they can be anyway, and we
still get this behaviour. After 5-6.5 days...the box black screens. So
badly that none of the XFS volumes we have ever enter a clean shutdown. We
must repair them all; today this happened, and we lost one part of the
tablespace on our beta db. We're using LVM1, on 2.4.19-aa1.




On Tue, 2002-10-15 at 23:35, J Sloan wrote:
> Just to make sure we are on the same page,
> was that LVM1, LVM2, or EVMS?
> 
> Joe
> 
> Michael Clark wrote:
> 
> > I doubt it will make a difference. LVM and qlogic drivers seem
> > to be a bad mix. I've already tried the beta5 of 6.01
> > and the same problem exists - oopses about every 5-8 days.
> > Removing LVM solved the problem.
> 
> 
> 


* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-16  4:35           ` J Sloan
  2002-10-16  4:43             ` GrandMasterLee
@ 2002-10-16  4:58             ` GrandMasterLee
  2002-10-16  5:28             ` Michael Clark
  2 siblings, 0 replies; 35+ messages in thread
From: GrandMasterLee @ 2002-10-16  4:58 UTC (permalink / raw)
  To: J Sloan; +Cc: Michael Clark, Simon Roscic, Arjan van de Ven, linux-kernel

On Tue, 2002-10-15 at 23:35, J Sloan wrote:
> Just to make sure we are on the same page,
> was that LVM1, LVM2, or EVMS?
> 
> Joe
> 
> Michael Clark wrote:
> 


Quick question on this: could this problem be exacerbated, perhaps, by
the large pagebuf usage that XFS performs, as well as the FS buffers that it
allocates, since XFS allocates a lot of read/write buffers for logging?






* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-15 19:20 [Kernel 2.5] Qlogic 2x00 driver Simon Roscic
  2002-10-15 19:31 ` Arjan van de Ven
@ 2002-10-16  5:02 ` GrandMasterLee
  2002-10-16 16:38   ` Simon Roscic
  1 sibling, 1 reply; 35+ messages in thread
From: GrandMasterLee @ 2002-10-16  5:02 UTC (permalink / raw)
  To: Simon Roscic; +Cc: linux-kernel

On Tue, 2002-10-15 at 14:20, Simon Roscic wrote:
> hi,
...
> i ask because i use those hbas together with ibm's fastt500 storage system,
> and it would be nice to have this driver in the default kernel ...
> 
> i have been using version 5.36.3 of the qlogic 2x00 driver in production
> (vanilla kernel 2.4.17 + qlogic 2x00 driver v5.36.3) since may 2002
> and i have never had any problems with this driver ...
> (2 lotus domino servers and 1 fileserver, all 3 attached to the ibm fastt500
> storage system using qlogic sanblade 2200 cards)


Do you use LVM, EVMS, MD, other, or none?

TIA

--The GrandMaster


* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-16  4:35           ` J Sloan
  2002-10-16  4:43             ` GrandMasterLee
  2002-10-16  4:58             ` GrandMasterLee
@ 2002-10-16  5:28             ` Michael Clark
  2002-10-16  5:40               ` Andreas Dilger
  2 siblings, 1 reply; 35+ messages in thread
From: Michael Clark @ 2002-10-16  5:28 UTC (permalink / raw)
  To: J Sloan; +Cc: GrandMasterLee, Simon Roscic, Arjan van de Ven, linux-kernel

LVM1 was tried in numerous versions of 2.4.x, both aa and rh versions.

With every one i was getting oopses when using a combination
of ext3, LVM1 and the qla2x00 driver.

Since taking LVM1 out of the picture, my oopsing problem has
gone away. This could of course not be LVM1's fault but rather the
fact that the qla driver is a stack hog or something - i don't have
enough information to draw any conclusions. At the moment
i'm too scared to try LVM again (plus the time it takes to
migrate a few hundred gigs of storage).

~mc

On 10/16/02 12:35, J Sloan wrote:
> Just to make sure we are on the same page,
> was that LVM1, LVM2, or EVMS?
> 
> Joe
> 
> Michael Clark wrote:
> 
>> I doubt it will make a difference. LVM and qlogic drivers seem
>> to be a bad mix. I've already tried the beta5 of 6.01
>> and the same problem exists - oopses about every 5-8 days.
>> Removing LVM solved the problem.



* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-16  5:28             ` Michael Clark
@ 2002-10-16  5:40               ` Andreas Dilger
  2002-10-17  1:59                 ` Andrew Vasquez
  0 siblings, 1 reply; 35+ messages in thread
From: Andreas Dilger @ 2002-10-16  5:40 UTC (permalink / raw)
  To: Michael Clark
  Cc: J Sloan, GrandMasterLee, Simon Roscic, Arjan van de Ven, linux-kernel

On Oct 16, 2002  13:28 +0800, Michael Clark wrote:
> With every one i was getting oopses when using a combination
> of ext3, LVM1 and the qla2x00 driver.
> 
> Since taking LVM1 out of the picture, my oopsing problem has
> gone away. This could of course not be LVM1's fault but rather the
> fact that the qla driver is a stack hog or something - i don't have
> enough information to draw any conclusions. At the moment
> i'm too scared to try LVM again (plus the time it takes to
> migrate a few hundred gigs of storage).

Yes, we have seen that ext3 is a stack hog in some cases, and I
know there were some fixes in later LVM versions to remove some
huge stack allocations.  Arjan also reported stack problems with
qla2x00, so it is not a surprise that the combination causes
problems.

In 2.5 there is the "4k IRQ stack" patch floating around, which
would avoid these problems.
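
For readers who haven't seen those reports, the kind of stack hog being
described looks roughly like this (an invented example, not code from
any of the drivers named):

#include <linux/slab.h>
#include <linux/string.h>
#include <linux/errno.h>

/* Bad: a 4k buffer on a 2.4 kernel stack that is only 8k in total,
 * shared with whatever ext3/LVM/SCSI frames are already below it. */
static int inquiry_on_stack(char *dst)
{
        char buf[4096];

        memset(buf, 0, sizeof(buf));    /* ... filled from the adapter ... */
        memcpy(dst, buf, sizeof(buf));
        return 0;
}

/* Better: move the buffer to the heap; the stack cost drops to one pointer. */
static int inquiry_on_heap(char *dst)
{
        char *buf = kmalloc(4096, GFP_KERNEL);

        if (!buf)
                return -ENOMEM;
        memset(buf, 0, 4096);           /* ... filled from the adapter ... */
        memcpy(dst, buf, 4096);
        kfree(buf);
        return 0;
}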

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/



* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-16  4:43             ` GrandMasterLee
@ 2002-10-16  6:03               ` Michael Clark
  2002-10-16  6:31                 ` GrandMasterLee
  0 siblings, 1 reply; 35+ messages in thread
From: Michael Clark @ 2002-10-16  6:03 UTC (permalink / raw)
  To: GrandMasterLee; +Cc: J Sloan, Simon Roscic, Arjan van de Ven, linux-kernel

On 10/16/02 12:43, GrandMasterLee wrote:
> My Dell 6650 has been doing this exact behaviour since we got on 5.38.9
> and up, using LVM in a production capacity. Both of our servers have
> crashed mysteriously, without any kernel dump, etc, but all hardware
> diags come out clean.

I'll tell you my honest hunch - remove LVM and try again. This has made
my life a little more peaceful lately. Even with the 2-3 minute outages
while our cluster automatically fails over, the hundreds of users whining
about their sessions being disconnected make you a bit depressed.

> All hardware configuration bits are as perfect as they can be anyway, and we
> still get this behaviour. After 5-6.5 days...the box black screens. So
> badly that none of the XFS volumes we have ever enter a clean shutdown. We
> must repair them all; today this happened, and we lost one part of the
> tablespace on our beta db. We're using LVM1, on 2.4.19-aa1.

We had the black screen also, until we got the machines oopsing over
serial. The oops was actually showing up in ext3 with a corrupted
bufferhead. Without LVM, i've measured my longest uptime: 17 days x
4 machines in the cluster (68 machine-days), ie. we only removed it 17 days ago.

~mc


> 
> 
> 
> 
> On Tue, 2002-10-15 at 23:35, J Sloan wrote:
> 
>>Just to make sure we are on the same page,
>>was that LVM1, LVM2, or EVMS?
>>
>>Joe
>>
>>Michael Clark wrote:
>>
>>
>>>I doubt it will make a difference. LVM and qlogic drivers seem
>>>to be a bad mix. I've already tried the beta5 of 6.01
>>>and the same problem exists - oopses about every 5-8 days.
>>>Removing LVM solved the problem.



* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-16  6:03               ` Michael Clark
@ 2002-10-16  6:31                 ` GrandMasterLee
  2002-10-16  6:40                   ` Michael Clark
  0 siblings, 1 reply; 35+ messages in thread
From: GrandMasterLee @ 2002-10-16  6:31 UTC (permalink / raw)
  To: Michael Clark; +Cc: J Sloan, Simon Roscic, Arjan van de Ven, linux-kernel

On Wed, 2002-10-16 at 01:03, Michael Clark wrote:
> On 10/16/02 12:43, GrandMasterLee wrote:
> > My Dell 6650 has been doing this exact behaviour since we got on 5.38.9
> > and up, using LVM in a production capacity. Both of our servers have
> > crashed mysteriously, without any kernel dump, etc, but all hardware
> > diags come out clean.
> 
> I'll tell you my honest hunch - remove LVM and try again. This has made
> my life a little more peaceful lately. Even with the 2-3 minute outages
> while our cluster automatically fails over, the hundreds of users whining
> about their sessions being disconnected make you a bit depressed.

Almost making it to your go-live date, only to have everything come
crashing down all around you is quite depressing.

> > All hardware configuration bits are as perfect as they can be anyway, and we
> > still get this behaviour. After 5-6.5 days...the box black screens. So
> > badly that none of the XFS volumes we have ever enter a clean shutdown. We
> > must repair them all; today this happened, and we lost one part of the
> > tablespace on our beta db. We're using LVM1, on 2.4.19-aa1.
> 
> We had the black screen also, until we got the machines oopsing over
> serial. The oops was actually showing up in ext3 with a corrupted
> bufferhead. Without LVM, i've measured my longest uptime: 17 days x
> 4 machines in the cluster (68 machine-days), ie. we only removed it 17 days ago.
> 
> ~mc


I believe you, that was my next thought, but I didn't know if that would
really help, to be honest. Thanks for the input there.

I've been going crazy trying to catch any piece of sanity out of this
thing, to understand whether this was what was happening or not. I feel a bit
dumb for not trying a serial console yet, but I knew either that or KDB
should tell us something. I will see what we can do; it will take less
time to do this than to reload everything all over again.

Should I remove LVM altogether, or just not use it? In your opinion.




* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-16  6:31                 ` GrandMasterLee
@ 2002-10-16  6:40                   ` Michael Clark
  2002-10-16  6:48                     ` GrandMasterLee
  0 siblings, 1 reply; 35+ messages in thread
From: Michael Clark @ 2002-10-16  6:40 UTC (permalink / raw)
  To: GrandMasterLee; +Cc: J Sloan, Simon Roscic, Arjan van de Ven, linux-kernel

On 10/16/02 14:31, GrandMasterLee wrote:
> On Wed, 2002-10-16 at 01:03, Michael Clark wrote:
> 
>>On 10/16/02 12:43, GrandMasterLee wrote:
>>
[snip]
>>>All hardware configuration bits are as perfect as they can be anyway, and we
>>>still get this behaviour. After 5-6.5 days...the box black screens. So
>>>badly that none of the XFS volumes we have ever enter a clean shutdown. We
>>>must repair them all; today this happened, and we lost one part of the
>>>tablespace on our beta db. We're using LVM1, on 2.4.19-aa1.
>>
>>We had the black screen also, until we got the machines oopsing over
>>serial. The oops was actually showing up in ext3 with a corrupted
>>bufferhead. Without LVM, i've measured my longest uptime: 17 days x
>>4 machines in the cluster (68 machine-days), ie. we only removed it 17 days ago.
> 
> I believe you, that was my next thought, but I didn't know if that would
> really help, to be honest. Thanks for the input there.
> 
> I've been going crazy trying to catch any piece of sanity out of this
> thing, to understand whether this was what was happening or not. I feel a bit
> dumb for not trying a serial console yet, but I knew either that or KDB
> should tell us something. I will see what we can do; it will take less
> time to do this than to reload everything all over again.
> 
> Should I remove LVM altogether, or just not use it? In your opinion.

I just didn't load the module after migrating my volumes. If the problem
is a stack problem, then it's probably not necessarily a bug in LVM
- just that the combination of it, ext3 and the qlogic driver don't mix well
- so if it's not being used, then it won't be increasing the stack footprint.
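
To illustrate the point (a toy sketch, invented name): stack is consumed
per live call frame at run time, not per object file linked into the
kernel, so code that is never called costs nothing.

#include <linux/string.h>

/* Stand-in for something like lvm's request function.  Its 512-byte
 * frame exists only while the function is actually on the call chain;
 * compiled in but never called, it uses no stack at all. */
static int remap_sketch(void)
{
        char scratch[512];

        memset(scratch, 0, sizeof(scratch));
        return scratch[0];
}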

~mc



* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-16  6:40                   ` Michael Clark
@ 2002-10-16  6:48                     ` GrandMasterLee
  2002-10-16  6:59                       ` Michael Clark
  0 siblings, 1 reply; 35+ messages in thread
From: GrandMasterLee @ 2002-10-16  6:48 UTC (permalink / raw)
  To: Michael Clark; +Cc: J Sloan, Simon Roscic, Arjan van de Ven, linux-kernel

On Wed, 2002-10-16 at 01:40, Michael Clark wrote:
...
> > Should I remove LVM altogether, or just not use it? In your opinion.
> 
> I just didn't load the module after migrating my volumes. If the problem
> is a stack problem, then it's probably not necessarily a bug in LVM
> - just that the combination of it, ext3 and the qlogic driver don't mix well
> - so if it's not being used, then it won't be increasing the stack footprint.
> 
> ~mc
> 

Not to be dense, but it's compiled into my kernel, that's why I ask. We
try not to use modules where we can help it. So I'm thinking, if no VGs
are actively used, then LVM won't affect the stack much. I just don't
know if that's true or not.

 --The GrandMaster


* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-16  6:48                     ` GrandMasterLee
@ 2002-10-16  6:59                       ` Michael Clark
  0 siblings, 0 replies; 35+ messages in thread
From: Michael Clark @ 2002-10-16  6:59 UTC (permalink / raw)
  To: GrandMasterLee; +Cc: J Sloan, Simon Roscic, Arjan van de Ven, linux-kernel

On 10/16/02 14:48, GrandMasterLee wrote:
> On Wed, 2002-10-16 at 01:40, Michael Clark wrote:
> ...
> 
>>>Should I remove LVM altogether, or just not use it? In your opinion.
>>
>>I just didn't load the module after migrating my volumes. If the problem
>>is a stack problem, then it's probably not necessarily a bug in LVM
>>- just that the combination of it, ext3 and the qlogic driver don't mix well
>>- so if it's not being used, then it won't be increasing the stack footprint.
> 
> Not to be dense, but it's compiled into my kernel, that's why I ask. We
> try not to use modules where we can help it. So I'm thinking, if no VGs
> are actively used, then LVM won't affect the stack much. I just don't
> know if that's true or not.

Correct. It won't affect the stack at all.

~mc



* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-16  2:51     ` Michael Clark
  2002-10-16  3:56       ` GrandMasterLee
@ 2002-10-16 16:28       ` Simon Roscic
  2002-10-16 16:49         ` Michael Clark
  1 sibling, 1 reply; 35+ messages in thread
From: Simon Roscic @ 2002-10-16 16:28 UTC (permalink / raw)
  To: Michael Clark; +Cc: linux-kernel

On Wednesday 16 October 2002 04:51, Michael Clark <michael@metaparadigm.com>  wrote:
> Version 6.1b5 does appear to be a big improvement from looking
> at the code (certainly much more readable than version 4.x and earlier).
i'll try version 6.01 or so next week and see what happens.
thanks for your help.

> Although the method for creating the different modules for
> different hardware is pretty ugly.
>...
i see.

> This was happening in pretty much all kernels I tried (a variety of
> redhat kernels and aa kernels). Removing LVM solved the problem.
> Although I was blaming LVM - maybe it was a buffer overflow in the qla driver.
looks like i had a lot of luck, because my 3 servers which are using the
qla2x00 5.36.3 driver were running without problems, but i'll update to 6.01
in the next few days.

i don't use lvm, the filesystem i use is xfs, so it smells like i had a lot of luck in
not running into this problem, ...


simon.
(please CC me, i'm not subscribed to lkml)


* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-16  5:02 ` GrandMasterLee
@ 2002-10-16 16:38   ` Simon Roscic
  2002-10-17  3:08     ` GrandMasterLee
  0 siblings, 1 reply; 35+ messages in thread
From: Simon Roscic @ 2002-10-16 16:38 UTC (permalink / raw)
  To: GrandMasterLee; +Cc: linux-kernel

On Wednesday 16 October 2002 07:02, GrandMasterLee <masterlee@digitalroadkill.net> wrote:
> Do you use LVM, EVMS, MD, other, or none?
>

none.
it's an XFS filesystem with the following mount options:
rw,noatime,logbufs=8,logbsize=32768

(this applies to all 3 machines)
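
for concreteness, the matching /etc/fstab line would look something like
this (the device and mount point are made up):

/dev/sda5   /data   xfs   rw,noatime,logbufs=8,logbsize=32768   0 0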

simon.
(please CC me, i'm not subscribed to lkml)



* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-16 16:28       ` Simon Roscic
@ 2002-10-16 16:49         ` Michael Clark
  2002-10-17  3:12           ` GrandMasterLee
  0 siblings, 1 reply; 35+ messages in thread
From: Michael Clark @ 2002-10-16 16:49 UTC (permalink / raw)
  To: Simon Roscic; +Cc: linux-kernel

On 10/17/02 00:28, Simon Roscic wrote:

>>This was happening in pretty much all kernels I tried (a variety of
>>redhat kernels and aa kernels). Removing LVM solved the problem.
>>Although I was blaming LVM - maybe it was a buffer overflow in the qla driver.
> 
> looks like i had a lot of luck, because my 3 servers which are using the
> qla2x00 5.36.3 driver were running without problems, but i'll update to 6.01
> in the next few days.
> 
> i don't use lvm, the filesystem i use is xfs, so it smells like i had a lot of luck in
> not running into this problem, ...

Seems to be the correlation so far. qlogic driver without lvm works okay.
qlogic driver with lvm, oopsorama.

~mc



* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-16  5:40               ` Andreas Dilger
@ 2002-10-17  1:59                 ` Andrew Vasquez
  2002-10-17  2:44                   ` GrandMasterLee
  0 siblings, 1 reply; 35+ messages in thread
From: Andrew Vasquez @ 2002-10-17  1:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Michael Clark, J Sloan, GrandMasterLee, Simon Roscic, Arjan van de Ven

On Tue, 15 Oct 2002, Andreas Dilger wrote:

> On Oct 16, 2002  13:28 +0800, Michael Clark wrote:
> > With every one i was getting oopses when using a combination
> > of ext3, LVM1 and the qla2x00 driver.
> > 
> > Since taking LVM1 out of the picture, my oopsing problem has
> > gone away. This could of course not be LVM1's fault but rather the
> > fact that the qla driver is a stack hog or something - i don't have
> > enough information to draw any conclusions. At the moment
> > i'm too scared to try LVM again (plus the time it takes to
> > migrate a few hundred gigs of storage).
> 
> Yes, we have seen that ext3 is a stack hog in some cases, and I
> know there were some fixes in later LVM versions to remove some
> huge stack allocations.  Arjan also reported stack problems with
> qla2x00, so it is not a surprise that the combination causes
> problems.
> 
The stack issues were a major problem in the 5.3x series driver.  I
believe (I can check tomorrow) that 5.38.9 (the driver Dell distributes)
contains fixes for the stack clobbering -- qla2x00-rh1-3 also contain
the fixes.

IAC, I believe the support tech working with MasterLee had asked
for additional information regarding the configuration as well as
some basic logs.  Ideally we'd like to set up a similar configuration
in house and see what's happening...

--
Andrew Vasquez | praka@san.rr.com |
DSS: 0x508316BB, FP: 79BD 4FAC 7E82 FF70 6C2B  7E8B 168F 5529 5083 16BB


* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-17  1:59                 ` Andrew Vasquez
@ 2002-10-17  2:44                   ` GrandMasterLee
  2002-10-17  3:11                     ` Andrew Vasquez
  0 siblings, 1 reply; 35+ messages in thread
From: GrandMasterLee @ 2002-10-17  2:44 UTC (permalink / raw)
  To: Andrew Vasquez
  Cc: linux-kernel, Michael Clark, J Sloan, Simon Roscic, Arjan van de Ven

On Wed, 2002-10-16 at 20:59, Andrew Vasquez wrote:
> > Yes, we have seen that ext3 is a stack hog in some cases, and I
> > know there were some fixes in later LVM versions to remove some
> > huge stack allocations.  Arjan also reported stack problems with
> > qla2x00, so it is not a surprise that the combination causes
> > problems.
> > 
> The stack issues were a major problem in the 5.3x series driver.  I
> believe (I can check tomorrow) that 5.38.9 (the driver Dell distributes)
> contains fixes for the stack clobbering -- qla2x00-rh1-3 also contain
> the fixes.

Does this mean that 6.01 will NOT work either? What drivers will be
affected? We've already made the move to remove LVM from the mix, but
your comments above give me some doubt as to how definite it is that
the stack clobbering will be fixed by doing so.


> IAC, I believe the support tech working with MasterLee had asked
> for additional information regarding the configuration as well as
> some basic logs.  Ideally we'd like to set up a similar configuration
> in house and see what's happening...

In-house? Just curious. What can "I" do to know if our configuration
won't get broken just by removing LVM? TIA.


> --
> Andrew Vasquez | praka@san.rr.com |
> DSS: 0x508316BB, FP: 79BD 4FAC 7E82 FF70 6C2B  7E8B 168F 5529 5083 16BB


* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-16 16:38   ` Simon Roscic
@ 2002-10-17  3:08     ` GrandMasterLee
  2002-10-17 17:47       ` Simon Roscic
  0 siblings, 1 reply; 35+ messages in thread
From: GrandMasterLee @ 2002-10-17  3:08 UTC (permalink / raw)
  To: Simon Roscic; +Cc: linux-kernel

Do you actually get the lockups then?

On Wed, 2002-10-16 at 11:38, Simon Roscic wrote:
> On Wednesday 16 October 2002 07:02, GrandMasterLee <masterlee@digitalroadkill.net> wrote:
> > Do you use LVM, EVMS, MD, other, or none?
> >
> 
> none.
> it's an XFS filesystem with the following mount options:
> rw,noatime,logbufs=8,logbsize=32768
> 
> (this applies to all 3 machines)
> 
> simon.
> (please CC me, i'm not subscribed to lkml)
> 


* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-17  2:44                   ` GrandMasterLee
@ 2002-10-17  3:11                     ` Andrew Vasquez
  2002-10-17  3:42                       ` GrandMasterLee
  2002-10-17  9:40                       ` Michael Clark
  0 siblings, 2 replies; 35+ messages in thread
From: Andrew Vasquez @ 2002-10-17  3:11 UTC (permalink / raw)
  To: linux-kernel

On Wed, 16 Oct 2002, GrandMasterLee wrote:

> On Wed, 2002-10-16 at 20:59, Andrew Vasquez wrote:
> > > Yes, we have seen that ext3 is a stack hog in some cases, and I
> > > know there were some fixes in later LVM versions to remove some
> > > huge stack allocations.  Arjan also reported stack problems with
> > > qla2x00, so it is not a surprise that the combination causes
> > > problems.
> > > 
> > The stack issues were a major problem in the 5.3x series driver.  I
> > believe (I can check tomorrow) that 5.38.9 (the driver Dell distributes)
> > contains fixes for the stack clobbering -- qla2x00-rh1-3 also contain
> > the fixes.
> 
> Does this mean that 6.01 will NOT work either? What drivers will be
> affected? We've already made the move to remove LVM from the mix, but
> your comments above give me some doubt as to how definite it is that
> the stack clobbering will be fixed by doing so.
> 
The 6.x series driver basically branched from the 5.x series driver.  
Changes made, many moons ago, are already in the 6.x series driver.
To quell your concerns, yes, stack overflow is not an issue with the
6.x series driver. 

I believe if we are to get anywhere regarding this issue, we need to 
shift focus away from stack corruption in early versions of the driver.

> > IAC, I believe the support tech working with MasterLee had asked
> > for additional information regarding the configuration as well as
> > some basic logs.  Ideally we'd like to set up a similar configuration
> > in house and see what's happening...
> 
> In-house?
> 
Sorry, short introduction, Andrew Vasquez, Linux driver development at
QLogic.

> Just curious. What can "I" do to know if our configuration
> won't get broken, just by removing LVM? TIA.
>
I've personally never used LVM before, so I cannot even begin to
attempt to answer your question --  please work with the tech on this
one, if it's a driver problem, we'd like to fix it.

--
Andrew


* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-16 16:49         ` Michael Clark
@ 2002-10-17  3:12           ` GrandMasterLee
  2002-10-17  3:54             ` Michael Clark
  0 siblings, 1 reply; 35+ messages in thread
From: GrandMasterLee @ 2002-10-17  3:12 UTC (permalink / raw)
  To: Michael Clark; +Cc: Simon Roscic, linux-kernel

On Wed, 2002-10-16 at 11:49, Michael Clark wrote:
> On 10/17/02 00:28, Simon Roscic wrote:
> 
> >>This was happening in pretty much all kernels I tried (a variety of
> >>redhat kernels and aa kernels). Removing LVM solved the problem.
> >>Although I was blaming LVM - maybe it was a buffer overflow in the qla driver.
> > 
> > looks like i had a lot of luck, because my 3 servers which are using the
> > qla2x00 5.36.3 driver were running without problems, but i'll update to 6.01
> > in the next few days.
> > 
> > i don't use lvm, the filesystem i use is xfs, so it smells like i had a lot of luck in
> > not running into this problem, ...

So then, it seems that LVM is adding stress to the system in a way that
is bad for the kernel. Perhaps the read-ahead in conjunction with the
large buffers from XFS, plus the number of volumes we run (22 on the
latest machine to crash).

> Seems to be the correlation so far. qlogic driver without lvm works okay.
> qlogic driver with lvm, oopsorama.

Michael, what exactly do your servers do? Are they DB servers with ~1Tb
connected, or file-servers with hundreds of gigs, etc?

> ~mc
> 


* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-17  3:11                     ` Andrew Vasquez
@ 2002-10-17  3:42                       ` GrandMasterLee
  2002-10-17  9:40                       ` Michael Clark
  1 sibling, 0 replies; 35+ messages in thread
From: GrandMasterLee @ 2002-10-17  3:42 UTC (permalink / raw)
  To: Andrew Vasquez; +Cc: linux-kernel

On Wed, 2002-10-16 at 22:11, Andrew Vasquez wrote:
...
> > Does this mean that 6.01 will NOT work either? What drivers will be
> > affected? We've already made the move to remove LVM from the mix, but
> > your comments above give me some doubt as to how definite it is that
> > the stack clobbering will be fixed by doing so.
> >

I was asking because we crashed while using this driver AND LVM.
 
> The 6.x series driver basically branched from the 5.x series driver.  
> Changes made, many moons ago, are already in the 6.x series driver.
> To quell your concerns, yes, stack overflow is not an issue with the
> 6.x series driver. 
> 
> I believe if we are to get anywhere regarding this issue, we need to 
> shift focus away from stack corruption in early versions of the driver.

In this way, you mean that it is not an issue since you guys don't try
to use LVM.


> > > IAC, I believe the support tech working with MasterLee had asked
> > > for additional information regarding the configuration as well as
> > > some basic logs.  Ideally we'd like to set up a similar configuration
> > > in house and see what's happening...
> > 
> > In-house?
> > 
> Sorry, short introduction, Andrew Vasquez, Linux driver development at
> QLogic.

Nice to meet ya. :)

> > Just curious. What can "I" do to know if our configuration
> > won't get broken just by removing LVM? TIA.
> >
> I've personally never used LVM before, so I cannot even begin to
> attempt to answer your question --  

We've removed LVM from the config, per Michael's issue and
recommendation, but I'm just scared that we *could* still see this issue with
XFS and Qlogic. Since you're saying 6.01 has no stack clobbering
issues, then is it XFS, LVM and Qlogic?

> please work with the tech on this
> one, if it's a driver problem, we'd like to fix it.

I'm going to try, but we've got to get up and in production ASAP. Since
it takes *days* to cause the crash, I don't know how I can cause it and
get the stack dump.

> --
> Andrew


* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-17  3:12           ` GrandMasterLee
@ 2002-10-17  3:54             ` Michael Clark
  2002-10-17  4:08               ` GrandMasterLee
  0 siblings, 1 reply; 35+ messages in thread
From: Michael Clark @ 2002-10-17  3:54 UTC (permalink / raw)
  To: GrandMasterLee; +Cc: Simon Roscic, linux-kernel

On 10/17/02 11:12, GrandMasterLee wrote:
> On Wed, 2002-10-16 at 11:49, Michael Clark wrote:
>>Seems to be the correlation so far. qlogic driver without lvm works okay.
>>qlogic driver with lvm, oopsorama.
> 
> 
> Michael, what exactly do your servers do? Are they DB servers with ~1Tb
> connected, or file-servers with hundreds of gigs, etc?

My customer currently has about 400Gb on this particular 4 node Application
cluster (actually 2 x 2 node clusters using kimberlite HA software).

It has 11 logical hosts (services) spread over the 4 nodes, with services such
as Oracle 8.1.7, Oracle Financials (11i), a busy openldap server, busy
netatalk AppleShare servers, and a Cyrus IMAP server. All are on ext3 partitions
and were previously using LVM to slice up the storage.

The cluster usually has around 200-300 active users.

We have had oopses (in ext3) on differing logical hosts which were running
different services, ie. it has oopsed on the node mastering the fileserver,
and also on the node mastering the oracle database.

Cross fingers, since removing LVM (which was the only change we have made,
same kernel) we have had 3 times our longest uptime and still counting.

From earlier emails I had posted, users who were also using qlogic had
responded to me, and none of them had had any problems;
the key factor was that none of them were running LVM - this is what made
me think to try removing it (it was really just a hunch). We had
gone through months of changing kernel versions, changing GigE network
adapters, driver versions, etc, to no avail, then finally the LVM removal.

Given the potential nature of it being a stack problem, the blame really
can't just be pointed at LVM but more at the additive effect it
would have on some underlying stack problem.

I believe the RedHat kernels i tried (rh7.2 2.4.9-34 errata was the most
recent) also had this 'stack' problem. I am currently using 2.4.19pre10aa4.

I would hate to recommend you remove LVM and have it not work, but i
must say it has worked for me (i'm just glad i didn't go to XFS instead
of removing LVM as i did - as this was the other option i was pondering).

~mc



* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-17  3:54             ` Michael Clark
@ 2002-10-17  4:08               ` GrandMasterLee
  2002-10-17  5:03                 ` Michael Clark
  0 siblings, 1 reply; 35+ messages in thread
From: GrandMasterLee @ 2002-10-17  4:08 UTC (permalink / raw)
  To: Michael Clark; +Cc: Simon Roscic, linux-kernel

On Wed, 2002-10-16 at 22:54, Michael Clark wrote:
> On 10/17/02 11:12, GrandMasterLee wrote:
> > On Wed, 2002-10-16 at 11:49, Michael Clark wrote:
> >>Seems to be the correlation so far. qlogic driver without lvm works okay.
> >>qlogic driver with lvm, oopsorama.
> > 
> > 
> > Michael, what exactly do your servers do? Are they DB servers with ~1Tb
> > connected, or file-servers with hundreds of gigs, etc?
> 
> My customer currently has about 400Gb on this particular 4 node Application
> cluster (actually 2 x 2 node clusters using kimberlite HA software).
> 
> It has 11 logical hosts (services) spread over the 4 nodes, with services such
> as Oracle 8.1.7, Oracle Financials (11i), a busy openldap server, busy
> netatalk AppleShare servers, and a Cyrus IMAP server. All are on ext3 partitions
> and were previously using LVM to slice up the storage.

On each of the Nodes, correct?

> The cluster usually has around 200-300 active users.
> 
> We have had oopses (in ext3) on differing logical hosts which were running
> different services, ie. it has oopsed on the node mastering the fileserver,
> and also on the node mastering the oracle database.

And again, each was running LVM in a shared storage mode for failover?

> Cross fingers, since removing LVM (which was the only change we have made,
> same kernel) we have had 3 times our longest uptime and still counting.
> 
> From earlier emails I had posted, users who were also using qlogic had
> responded to me, and none of them had had any problems;
> the key factor was that none of them were running LVM - this is what made
> me think to try removing it (it was really just a hunch). We had
> gone through months of changing kernel versions, changing GigE network
> adapters, driver versions, etc, to no avail, then finally the LVM removal.

Kewl. That makes me feel much better now too. 

> Given the potential nature of it being a stack problem, the blame really
> can't just be pointed at LVM but more at the additive effect it
> would have on some underlying stack problem.
> 
> I believe the RedHat kernels i tried (rh7.2 2.4.9-34 errata was the most
> recent) also had this 'stack' problem. I am currently using 2.4.19pre10aa4.

Kewl. I'm using 2.4.19-aa1 (rc5-aa1, but hell, it's the same thing).

> I would hate to recommend you remove LVM and have it not work, but i
> must say it has worked for me (i'm just glad i didn't go to XFS instead
> of removing LVM as i did - as this was the other option i was pondering).

I hear you. We were pondering changing to EXT3, and not just EXT3, RHAS
also, i.e. more money, unknown kernel config, etc. I was going to be
*very* upset.  Are you running FC2 (qla2300Fs in FC2 config) or FC1?

TIA

> ~mc
> 


* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-17  4:08               ` GrandMasterLee
@ 2002-10-17  5:03                 ` Michael Clark
  0 siblings, 0 replies; 35+ messages in thread
From: Michael Clark @ 2002-10-17  5:03 UTC (permalink / raw)
  To: GrandMasterLee; +Cc: Simon Roscic, linux-kernel

On 10/17/02 12:08, GrandMasterLee wrote:
> On Wed, 2002-10-16 at 22:54, Michael Clark wrote:
> 
>>On 10/17/02 11:12, GrandMasterLee wrote:
>>
>>>On Wed, 2002-10-16 at 11:49, Michael Clark wrote:
>>>
>>>>Seems to be the correlation so far. qlogic driver without lvm works okay.
>>>>qlogic driver with lvm, oopsorama.
>>>
>>>
>>>Michael, what exactly do your servers do? Are they DB servers with ~1Tb
>>>connected, or file-servers with hundreds of gigs, etc?
>>
>>My customer currently has about 400Gb on this particular 4 node Application
>>cluster (actually 2 x 2 node clusters using kimberlite HA software).
>>
>>It has 11 logical hosts (services) spread over the 4 nodes, with services such
>>as Oracle 8.1.7, Oracle Financials (11i), a busy openldap server, busy
>>netatalk AppleShare servers, and a Cyrus IMAP server. All are on ext3 partitions
>>and were previously using LVM to slice up the storage.
> 
> 
> On each of the Nodes, correct?

We had originally planned to split up the storage in the RAID head
using individual luns for each cluster logical host - so we could use SCSI
reservations - but we encountered problems with the RAID head's device queue.

The RAID head has a global queue depth of 64, and to alleviate
queue problems with the RAID head locking up, we needed to minimise
the number of luns, so late in the piece we added LVM to split up the storage.

We are using LVM in a clustered fashion, ie. we export most of the array
as one big lun and slice it into lvs, each one associated with a logical host.
All lvs are accessible from all 4 physical hosts in the cluster. Care and
application locking ensure only 1 physical host mounts any lv/partition at
the same time (except for cluster quorum partitions which need to be accessed
concurrently from 2 nodes - and for these we have separate quorum disks in
the array).

lvm metadata changes are made from one node while the others are down
(or just have volumes deactivated, unmounted, then lvm-mod removed)
to avoid screwing our metadata, because lvm is not cluster aware.

We are not using multipath but have the cluster arranged in a topology
such that the HA RAID head has 2 controllers, with each side of the cluster
hanging off a different one, ie. L and R. If we have a path failure, we will
just lose CPU capacity (25-50% depending). The logical hosts will automatically
move onto a physical node which still has connectivity to the RAID head
(by the cluster software checking connectivity to the quorum partitions).
This gives us a good level of redundancy without the added cost of 2 paths
from each host, ie. after a path failure we run with degraded performance only.

We are using vanilla 2300s.

~mc



* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-17  3:11                     ` Andrew Vasquez
  2002-10-17  3:42                       ` GrandMasterLee
@ 2002-10-17  9:40                       ` Michael Clark
  2002-10-18  6:45                         ` GrandMasterLee
  1 sibling, 1 reply; 35+ messages in thread
From: Michael Clark @ 2002-10-17  9:40 UTC (permalink / raw)
  To: Andrew Vasquez; +Cc: linux-kernel

On 10/17/02 11:11, Andrew Vasquez wrote:
> On Wed, 16 Oct 2002, GrandMasterLee wrote:
> 
> 
>>On Wed, 2002-10-16 at 20:59, Andrew Vasquez wrote:
>>
>>>The stack issues were a major problem in the 5.3x series driver.  I
>>>believe (I can check tomorrow) that 5.38.9 (the driver Dell distributes)
>>>contains fixes for the stack clobbering -- qla2x00-rh1-3 also contain
>>>the fixes.
>>
>>Does this mean that 6.01 will NOT work either? What drivers will be
>>affected? We've already made the move to remove LVM from the mix, but
>>your comments above give me some doubt as to how definite it is that
>>the stack clobbering will be fixed by doing so.
>>
> 
> The 6.x series driver basically branched from the 5.x series driver.  
> Changes made, many moons ago, are already in the 6.x series driver.
> To quell your concerns, yes, stack overflow is not an issue with the
> 6.x series driver. 
> 
> I believe if we are to get anywhere regarding this issue, we need to 
> shift focus away from stack corruption in early versions of the driver.

Well, corruption of bufferheads was happening for me with a potentially
stack-deep setup (ext3+LVM+qlogic). Maybe it has been fixed in the
non-LVM case but is still an issue, as I have had it with 6.0.1b3 -
the stack fix is listed in 6.0b13, which is quite a few releases behind
the one i've had the problem with.

I posted the oops to lk about 3 weeks ago. Wasn't sure it was a qlogic
problem at the time, and still am not certain - maybe just the sum of
stack(ext3+lvm+qlogic). Even if the qla stack was trimmed for the common case,
it may still be a problem when LVM is active, as there would be much
deeper stacks during block io.

http://marc.theaimsgroup.com/?l=linux-kernel&m=103302016311188&w=2

The oops doesn't show qlogic at all, although it is a corrupt bufferhead
that is causing the oops, so it may have been silently corrupted earlier
by a qlogic interrupt or block io submission while deep inside lvm and
ext3 or some such, ie. the oops is one of those difficult sorts that shows
up corruption from some earlier event that is not directly traceable from
the oops itself.

~mc



* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-17  3:08     ` GrandMasterLee
@ 2002-10-17 17:47       ` Simon Roscic
  2002-10-18  6:42         ` GrandMasterLee
  0 siblings, 1 reply; 35+ messages in thread
From: Simon Roscic @ 2002-10-17 17:47 UTC (permalink / raw)
  To: GrandMasterLee; +Cc: linux-kernel

On Thursday 17 October 2002 05:08, GrandMasterLee <masterlee@digitalroadkill.net> wrote:
> Do you actually get the lockups then?

no, i didn't have any lockups, each of the machines currently has an uptime
of only 16 days, and that's because we had to shut down the power in our
whole company for half a day.
the best uptime i had was approx 40-50 days, then i got the following
problem: the lotus domino server processes (not the whole machine) were
freezing every week, but that is a known problem for heavily loaded domino
servers, you have to increase the amount of ipc memory for java or something
(of the domino server), and since i did this everything works without problems.

the "primary" lotus domino server also got quite swap happy in the last weeks,
currently he has to serve almost everything that has to do with notes, the 
second server isn't realy in use yet ...

if you are interested, procinfo -a shows this on one of the 3 machines:
(all 3 are the same, except that the "primary" lotus domino server has 2 cpus
and 2 gb ram, the other 2 have 1 cpu and 1 gb ram)

---------------- procinfo ----------------
Linux 2.4.17-xfs-smp (root@adam-neu) (gcc 2.95.3 20010315 ) #1 2CPU [adam.]

Memory:      Total        Used        Free      Shared     Buffers      Cached
Mem:       2061272     2050784       10488           0        2328     1865288
Swap:      1056124      265652      790472

Bootup: Tue Oct  1 17:42:07 2002    Load average: 0.14 0.08 0.02 1/445 11305

user  :   1d 20:15:32.03   5.7%  page in :1196633058  disk 1:  1670401r  953006w
nice  :       0:00:24.69   0.0%  page out:261985556  disk 2: 27762380r11039499w
system:      13:05:51.19   1.7%  swap in :  5870304  disk 3:        4r       0w
idle  :  29d 18:09:25.15  92.5%  swap out:  5099371  disk 4:        4r       0w
uptime:  16d  1:45:36.53         context :2810591591

irq  0: 138873653 timer                 irq 12:    104970 PS/2 Mouse
irq  1:      5597 keyboard              irq 14:        54 ide0
irq  2:         0 cascade [4]           irq 18:   8659653 ips
irq  3:         1                       irq 20: 421419256 e1000
irq  4:         1                       irq 24:  38444870 qla2200
irq  6:         3                       irq 28:     17728 e100
irq  8:         2 rtc

Kernel Command Line:
  auto BOOT_IMAGE=Linux ro root=803 BOOT_FILE=/boot/vmlinuz

Modules:
 24 *sg               6  lp              25  parport         59 *e100
 48 *e1000          165 *qla2200

Character Devices:                      Block Devices:
  1 mem              10 misc              2 fd
  2 pty              21 sg                3 ide0
  3 ttyp             29 fb                8 sd
  4 ttyS            128 ptm              65 sd
  5 cua             136 pts              66 sd
  6 lp              162 raw
  7 vcs             254 HbaApiDev

File Systems:
[rootfs]            [bdev]              [proc]              [sockfs]
[tmpfs]             [pipefs]            ext3                ext2
[nfs]               [smbfs]             [devpts]            xfs
---------------- procinfo ----------------

the kernel running on the 3 machines is a "vanilla" 2.4.17
plus XFS, plus ext3-0.9.17, plus the intel ether express 100
and intel ether express 1000 drivers (e100 and e1000), and
the qlogic qla2x00 5.36.3 driver ...

i think i will wait for 2.4.20 and then make a new kernel for the 3 machines ...

simon.
(please CC me, i'm not subscribed to lkml)


* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-17 17:47       ` Simon Roscic
@ 2002-10-18  6:42         ` GrandMasterLee
  2002-10-18 15:11           ` Simon Roscic
  0 siblings, 1 reply; 35+ messages in thread
From: GrandMasterLee @ 2002-10-18  6:42 UTC (permalink / raw)
  To: Simon Roscic; +Cc: linux-kernel

On Thu, 2002-10-17 at 12:47, Simon Roscic wrote:
> On Thursday 17 October 2002 05:08, GrandMasterLee <masterlee@digitalroadkill.net> wrote:
> > Do you actually get the lockups then?
> 
> no, i didn't have any lockups; each of the machines currently has an uptime 
> of only 16 days, and that's because we had to shut down the power in our 
> whole company for half a day. 
> ...

One question about your config: are you using, on ANY machines,
QLA2300s or PCI-X, with the 5.38.x or 6.x qlogic drivers? If so, have
you experienced no lockups with those machines too?


> if you are interested, procinfo -a shows this on one of the 3 machines
> (all 3 are the same, except that the "primary" lotus domino server has 2 cpus
> and 2 gb of ram; the other 2 have 1 cpu and 1 gb of ram):
> 
> ---------------- procinfo ----------------
> Linux 2.4.17-xfs-smp (root@adam-neu) (gcc 2.95.3 20010315 ) #1 2CPU [adam.]

Thanks for the info. I will hopefully have >5 days uptime now too. If
not, anyone need a Systems Architect? :-D

--The GrandMaster

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-17  9:40                       ` Michael Clark
@ 2002-10-18  6:45                         ` GrandMasterLee
  0 siblings, 0 replies; 35+ messages in thread
From: GrandMasterLee @ 2002-10-18  6:45 UTC (permalink / raw)
  To: Michael Clark; +Cc: Andrew Vasquez, linux-kernel

On Thu, 2002-10-17 at 04:40, Michael Clark wrote:
> On 10/17/02 11:11, Andrew Vasquez wrote:
> > On Wed, 16 Oct 2002, GrandMasterLee wrote:
> > 
> > 
> >>On Wed, 2002-10-16 at 20:59, Andrew Vasquez wrote:
> >>
> >>>The stack issues were a major problem in the 5.3x series driver.  I
> >>>believe - I can check tomorrow - 5.38.9 (the driver Dell distributes)
> >>>contains fixes for the stack clobbering; qla2x00-rh1-3 also contains
> >>>the fixes.
> >>
> >>Does this mean that 6.01 will NOT work either? Which drivers will be
> >>affected? We've already made the move to remove LVM from the mix, but
> >>your comments above give me some doubt as to how certain it is that
> >>the stack clobbering will be fixed by doing so. 
> >>
> > 
> > The 6.x series driver basically branched from the 5.x series driver.  
> > Changes made many moons ago are already in the 6.x series driver.
> > To quell your concerns: stack overflow is not an issue with the
> > 6.x series driver. 
> > 
> > I believe that if we are to get anywhere regarding this issue, we need to 
> > shift focus away from stack corruption in early versions of the driver.
> 
> Well, corruption of bufferheads was happening for me with a potentially
> stack-deep setup (ext3+LVM+qlogic). Maybe it has been fixed in the
> non-LVM case, but it is still an issue, as I have had it with 6.0.1b3 -
> the stack fix is listed in 6.0b13, which is quite a few releases behind
> the one i've had the problem with.

I don't disagree, but I saw the same things with XFS filesystems on LVM
also. This leads me to my next question: does anyone on this list use
XFS plus QLA2300s with 500GB+ mounted as several volumes on Qlogic
driver 5.38.x or later, and have greater than 20 days uptime to date?


> I posted the oops to lk about 3 weeks ago. Wasn't sure it was a qlogic
> problem at the time, and still am not certain - maybe just the sum of the
> stack (ext3+lvm+qlogic). Even if the qla stack was trimmed for the common case,
> it may still be a problem when LVM is active, as there would be much
> deeper stacks during block io.
> 

Kewl..thanks much.
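
FWIW, for anyone who wants to measure the stack depth Michael describes:
assuming your kernel tree ships scripts/checkstack.pl (present in 2.5-era
trees; treat this as a sketch, not a recipe), per-function stack usage can
be listed from a built vmlinux with:

objdump -d vmlinux | perl scripts/checkstack.pl i386 | head

the biggest stack consumers show up at the top of the list.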

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Kernel 2.5] Qlogic 2x00 driver
  2002-10-18  6:42         ` GrandMasterLee
@ 2002-10-18 15:11           ` Simon Roscic
  0 siblings, 0 replies; 35+ messages in thread
From: Simon Roscic @ 2002-10-18 15:11 UTC (permalink / raw)
  To: GrandMasterLee; +Cc: linux-kernel

On Friday 18 October 2002 08:42, GrandMasterLee <masterlee@digitalroadkill.net> wrote:
> One question about your config, are you using, on ANY machines,
> QLA2300's or PCI-X, and 5.38.x or 6.xx qlogic drivers? If so, then
> you've experienced no lockups with those machines too?

no, the 3 machines i use are basically the same (ibm xseries 342)
and have the same qlogic cards (qla2200); all 3 machines use the
same kernel (2.4.17+xfs+ext3-0.9.17+e100+e1000+qla2x00-5.36.3).

a few details, possibly something here helps you:

---------------- lspci ----------------
00:00.0 Host bridge: ServerWorks CNB20HE (rev 23)
00:00.1 Host bridge: ServerWorks CNB20HE (rev 01)
00:00.2 Host bridge: ServerWorks: Unknown device 0006 (rev 01)
00:00.3 Host bridge: ServerWorks: Unknown device 0006 (rev 01)
00:06.0 VGA compatible controller: S3 Inc. Savage 4 (rev 06)
00:0f.0 ISA bridge: ServerWorks OSB4 (rev 51)
00:0f.1 IDE interface: ServerWorks: Unknown device 0211
00:0f.2 USB Controller: ServerWorks: Unknown device 0220 (rev 04)
01:02.0 RAID bus controller: IBM Netfinity ServeRAID controller
01:03.0 Ethernet controller: Intel Corporation 82543GC Gigabit Ethernet Controller (rev 02)
01:07.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 0c)
02:05.0 SCSI storage controller: QLogic Corp. QLA2200 (rev 05)
---------------------------------------

--------- dmesg (qla stuff)--------
qla2x00: Found  VID=1077 DID=2200 SSVID=1077 SSDID=2
scsi1: Found a QLA2200  @ bus 2, device 0x5, irq 24, iobase 0x2100
scsi(1): Configure NVRAM parameters...
scsi(1): Verifying loaded RISC code...
scsi(1): Verifying chip...
scsi(1): Waiting for LIP to complete...
scsi(1): LIP reset occurred
scsi(1): LIP occurred.
scsi(1): LOOP UP detected
scsi1: Topology - (Loop), Host Loop address  0x7d
scsi-qla0-adapter-port=210000e08b064002\;
scsi-qla0-tgt-0-di-0-node=200600a0b80c3d8c\;
scsi-qla0-tgt-0-di-0-port=200600a0b80c3d8d\;
scsi-qla0-tgt-0-di-0-control=00\;
scsi-qla0-tgt-0-di-0-preferred=ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff\;
scsi1 : QLogic QLA2200 PCI to Fibre Channel Host Adapter: bus 2 device 5 irq 24
        Firmware version:  2.01.37, Driver version 5.36.3
  Vendor: IBM       Model: 3552              Rev: 0401
  Type:   Direct-Access                      ANSI SCSI revision: 03
  Vendor: IBM       Model: 3552              Rev: 0401
  Type:   Direct-Access                      ANSI SCSI revision: 03
  Vendor: IBM       Model: 3552              Rev: 0401
  Type:   Direct-Access                      ANSI SCSI revision: 03
  Vendor: IBM       Model: 3552              Rev: 0401
  Type:   Direct-Access                      ANSI SCSI revision: 03
scsi(1:0:0:0): Enabled tagged queuing, queue depth 16.
scsi(1:0:0:1): Enabled tagged queuing, queue depth 16.
scsi(1:0:0:2): Enabled tagged queuing, queue depth 16.
scsi(1:0:0:3): Enabled tagged queuing, queue depth 16.
Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
Attached scsi disk sdc at scsi1, channel 0, id 0, lun 1
Attached scsi disk sdd at scsi1, channel 0, id 0, lun 2
Attached scsi disk sde at scsi1, channel 0, id 0, lun 3
SCSI device sdb: 125829120 512-byte hdwr sectors (64425 MB)
 sdb: sdb1
SCSI device sdc: 125829120 512-byte hdwr sectors (64425 MB)
 sdc: sdc1
SCSI device sdd: 125829120 512-byte hdwr sectors (64425 MB)
 sdd: sdd1
SCSI device sde: 48599040 512-byte hdwr sectors (24883 MB)
 sde: sde1
---------------------------------------

all 3 machines have the same filesystem layout:

internal storage -> ext3  (linux and programs)
storage on fastt500 -> xfs  (data only)

except for the mount options: because the fileserver also needs
quotas, it has: 

rw,noatime,quota,usrquota,grpquota,logbufs=8,logbsize=32768
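
for reference, the corresponding /etc/fstab line for such an xfs data
volume would look something like this (device and mount point here are
placeholders, not our real ones):

/dev/sdb1  /local/data  xfs  rw,noatime,quota,usrquota,grpquota,logbufs=8,logbsize=32768  0 0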


i consider the primary lotus domino server to be the machine which
has to handle the highest load of the 3, because it currently has
to handle almost everything that has to do with lotus notes in our
company. it's friday afternoon here, so the load isn't really high,
but it's possibly nice for you to know how much load the machine
has to handle:

---------------------------------------
  4:59pm  up 16 days, 23:17,  3 users,  load average: 0.43, 0.18, 0.11
445 processes: 441 sleeping, 4 running, 0 zombie, 0 stopped
CPU0 states:  6.13% user,  1.57% system,  0.0% nice, 91.57% idle
CPU1 states: 26.19% user,  5.27% system,  0.0% nice, 68.17% idle
Mem:  2061272K av, 2049636K used,   11636K free,       0K shrd,    1604K buff
Swap: 1056124K av,  262532K used,  793592K free                 1856420K cached
---------------------------------------
(/local/notesdata is on the fastt500)
adam:/ # lsof |grep /local/notesdata/ |wc -l
  33076
---------------------------------------
adam:/ # lsof |wc -l
   84604
---------------------------------------

simon.
(please CC me, i'm not subscribed to lkml)


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Kernel 2.5] Qlogic 2x00 driver
@ 2002-10-19  2:17 rwhron
  0 siblings, 0 replies; 35+ messages in thread
From: rwhron @ 2002-10-19  2:17 UTC (permalink / raw)
  To: praka; +Cc: linux-kernel

> a short introduction: Andrew Vasquez, Linux driver development at
> QLogic.

I'd like to see the QLogic 6.x driver in the Linus and Marcelo
trees.  In the tests I've run, it has better throughput and
lower latency than the standard driver.  Journaled filesystems
and synchronous I/O are where the 6.x driver shines.

How about the latest final 6.x driver for 2.4.21-pre,
and the 6.x beta driver for 2.5.x? :)

For people who like numbers...

tiobench-0.3.3 is a multithreaded I/O benchmark.

Unit information
================
File size = 12888 megabytes
Blk Size  = 4096 bytes
Num Thr   = number of threads
Rate      = megabytes per second
CPU%      = percentage of CPU used during the test
Latency   = milliseconds
Lat%      = percentage of requests that took longer than the threshold (the 2s and 10s columns)
CPU Eff   = Rate divided by CPU% - throughput per unit of CPU load
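
(as a worked example: in the first sequential-read row below, 50.34 MB/sec
at 28.23% CPU gives 50.34 / 0.2823 = 178.3, matching the CPU Eff of 178 shown.)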


2.4.19-pre10-aa4 has the QLogic 6.x driver.
2.4.19-pre10-aa4-oql has the old (standard) QLogic driver.

Ext3 shows dramatic improvements in throughput and generally
much lower average and maximum latency with the QLogic 6.x driver.

Sequential Reads ext3
                     Num                   Avg      Maximum     Lat%     Lat%  CPU
Kernel               Thr   Rate  (CPU%)  Latency    Latency       2s      10s  Eff
-------------------- ---  --------------------------------------------------------
2.4.19-pre10-aa4       1   50.34 28.23%    0.230     159.48  0.00000  0.00000  178
2.4.19-pre10-aa4-oql   1   49.26 28.80%    0.235     305.75  0.00000  0.00000  171

2.4.19-pre10-aa4      32    7.24  4.34%   50.964   10279.10  1.72278  0.00000  167
2.4.19-pre10-aa4-oql  32    5.00  2.90%   73.941   16359.07  1.99270  0.00000  173

2.4.19-pre10-aa4      64    7.13  4.32%  102.581   21062.55  2.13490  0.00000  165
2.4.19-pre10-aa4-oql  64    4.79  2.78%  152.879   32394.43  2.12285  0.00318  172

2.4.19-pre10-aa4     128    6.92  4.20%  209.363   41943.96  2.22813  1.24283  165
2.4.19-pre10-aa4-oql 128    4.56  2.68%  317.688   66597.95  2.25661  1.98520  170

2.4.19-pre10-aa4     256    6.82  4.09%  418.905   87535.80  2.27172  2.11080  167
2.4.19-pre10-aa4-oql 256    4.43  2.62%  645.476  133976.17  2.30475  2.13893  169


Random Reads ext3
                     Num                   Avg      Maximum     Lat%     Lat%  CPU
Kernel               Thr   Rate  (CPU%)  Latency    Latency       2s      10s  Eff
-------------------- ---  --------------------------------------------------------
2.4.19-pre10-aa4       1    0.64  0.73%   18.338     111.72  0.00000  0.00000   87
2.4.19-pre10-aa4-oql   1    0.63  0.64%   18.482      99.78  0.00000  0.00000  100

2.4.19-pre10-aa4      32    2.38  2.58%  122.580   14073.47  0.65000  0.00000   92
2.4.19-pre10-aa4-oql  32    1.60  1.64%  179.073   19904.25  0.75000  0.00000   98

2.4.19-pre10-aa4      64    2.36  3.05%  202.490   15891.16  3.04939  0.00000   78
2.4.19-pre10-aa4-oql  64    1.60  2.00%  292.337   25545.70  3.20061  0.00000   80

2.4.19-pre10-aa4     128    2.38  2.72%  355.104   17775.10  6.07358  0.00000   88
2.4.19-pre10-aa4-oql 128    1.56  2.25%  536.262   27685.78  6.95565  0.00000   69

2.4.19-pre10-aa4     256    2.39  3.66%  667.890   18035.51 13.02083  0.00000   65
2.4.19-pre10-aa4-oql 256    1.59  2.19%  995.664   27016.55 15.13020  0.00000   73


Sequential Writes ext3
                     Num                   Avg      Maximum     Lat%     Lat%  CPU
Kernel               Thr   Rate  (CPU%)  Latency    Latency       2s      10s  Eff
-------------------- ---  --------------------------------------------------------
2.4.19-pre10-aa4       1   44.19 56.57%    0.243    5740.37  0.00003  0.00000   78
2.4.19-pre10-aa4-oql   1   36.93 47.25%    0.291    6984.53  0.00010  0.00000   78

2.4.19-pre10-aa4      32   20.07 130.9%    8.552   11636.29  0.04701  0.00000   15
2.4.19-pre10-aa4-oql  32   18.21 120.4%   10.188   14347.38  0.09546  0.00000   15

2.4.19-pre10-aa4      64   17.02 115.5%   19.166   37065.25  0.30819  0.00019   15
2.4.19-pre10-aa4-oql  64   14.64 102.4%   21.978   28398.20  0.42292  0.00000   14

2.4.19-pre10-aa4     128   14.50 100.8%   44.053   51945.59  0.87108  0.00175   14
2.4.19-pre10-aa4-oql 128   11.34 86.43%   54.214   48119.96  1.15720  0.00410   13

2.4.19-pre10-aa4     256   11.70 78.84%  104.914   54905.07  2.27391  0.02009   15
2.4.19-pre10-aa4-oql 256    9.13 60.18%  131.867   60897.68  2.84818  0.03341   15


Random Writes ext3
                     Num                   Avg      Maximum     Lat%     Lat%  CPU
Kernel               Thr   Rate  (CPU%)  Latency    Latency       2s      10s  Eff
-------------------- ---  --------------------------------------------------------
2.4.19-pre10-aa4       1    4.42  4.14%    0.086       1.57  0.00000  0.00000  107
2.4.19-pre10-aa4-oql   1    3.54  3.17%    0.086       1.17  0.00000  0.00000  112

2.4.19-pre10-aa4      32    4.24 11.74%    0.280      13.71  0.00000  0.00000   36
2.4.19-pre10-aa4-oql  32    3.47  9.46%    0.294      69.40  0.00000  0.00000   37

2.4.19-pre10-aa4      64    4.25 12.05%    0.283      10.86  0.00000  0.00000   35
2.4.19-pre10-aa4-oql  64    3.47 10.30%    0.356      41.28  0.00000  0.00000   34

2.4.19-pre10-aa4     128    4.38 105.7%   19.575    2590.92  0.75605  0.00000    4
2.4.19-pre10-aa4-oql 128    3.48 11.37%    0.433      97.01  0.00000  0.00000   31

2.4.19-pre10-aa4     256    4.19 11.36%    0.269       9.55  0.00000  0.00000   37
2.4.19-pre10-aa4-oql 256    3.44 10.09%    0.270       8.84  0.00000  0.00000   34

Sequential Reads ext2
                     Num                   Avg      Maximum     Lat%     Lat%  CPU
Kernel               Thr   Rate  (CPU%)  Latency    Latency       2s      10s  Eff
-------------------- ---  --------------------------------------------------------
2.4.19-pre10-aa4       1   50.22 27.99%    0.231     916.73  0.00000  0.00000  179
2.4.19-pre10-aa4-oql   1   49.31 28.56%    0.235     671.86  0.00000  0.00000  173

2.4.19-pre10-aa4      32   40.58 25.30%    8.851   14977.86  0.13511  0.00000  160
2.4.19-pre10-aa4-oql  32   36.54 21.75%    9.534   33524.30  0.15471  0.00009  168

2.4.19-pre10-aa4      64   40.30 24.97%   17.409   29836.18  0.39867  0.00009  161
2.4.19-pre10-aa4-oql  64   36.75 22.04%   18.318   66908.84  0.17503  0.07566  167

2.4.19-pre10-aa4     128   40.58 25.06%   33.476   59144.35  0.40223  0.03750  162
2.4.19-pre10-aa4-oql 128   36.65 21.80%   35.286  116917.79  0.19392  0.16241  168

2.4.19-pre10-aa4     256   40.43 25.21%   64.505  116303.68  0.42705  0.36046  160
2.4.19-pre10-aa4-oql 256   36.67 22.05%   66.686  247490.25  0.22520  0.19671  166


Random Reads ext2
                     Num                   Avg      Maximum     Lat%     Lat%  CPU
Kernel               Thr   Rate  (CPU%)  Latency    Latency       2s      10s  Eff
-------------------- ---  --------------------------------------------------------
2.4.19-pre10-aa4       1    0.73  0.79%   16.139     122.84  0.00000  0.00000   92
2.4.19-pre10-aa4-oql   1    0.74  0.71%   15.813     108.18  0.00000  0.00000  104

2.4.19-pre10-aa4      32    5.21  6.12%   65.858     263.66  0.00000  0.00000   85
2.4.19-pre10-aa4-oql  32    3.49  3.65%   96.096     622.30  0.00000  0.00000   96

2.4.19-pre10-aa4      64    5.34  8.04%  124.586    1666.34  0.00000  0.00000   66
2.4.19-pre10-aa4-oql  64    3.66  3.85%  161.338    9160.40  0.47883  0.00000   95

2.4.19-pre10-aa4     128    5.31  9.59%  188.362    6900.68  1.23488  0.00000   55
2.4.19-pre10-aa4-oql 128    3.69  5.39%  256.389   10303.55  3.70464  0.00000   68

2.4.19-pre10-aa4     256    5.35  7.01%  321.321    7466.05  4.06250  0.00000   76
2.4.19-pre10-aa4-oql 256    3.70  5.84%  445.043   11064.58  8.25521  0.00000   63


Sequential Writes ext2
                     Num                   Avg      Maximum     Lat%     Lat%  CPU
Kernel               Thr   Rate  (CPU%)  Latency    Latency       2s      10s  Eff
-------------------- ---  --------------------------------------------------------
2.4.19-pre10-aa4       1   44.74 29.79%    0.237   10862.42  0.00022  0.00000  150
2.4.19-pre10-aa4-oql   1   39.82 26.00%    0.266   12820.23  0.00136  0.00000  153

2.4.19-pre10-aa4      32   37.54 46.78%    8.435   12528.86  0.02540  0.00000   80
2.4.19-pre10-aa4-oql  32   32.76 38.44%    9.718   12559.96  0.12649  0.00000   85

2.4.19-pre10-aa4      64   37.22 46.35%   16.613   21438.79  0.54842  0.00000   80
2.4.19-pre10-aa4-oql  64   32.77 38.17%   18.967   29070.67  0.52241  0.00003   86

2.4.19-pre10-aa4     128   37.11 45.77%   32.377   48430.99  0.55281  0.00200   81
2.4.19-pre10-aa4-oql 128   32.71 38.04%   37.046   56332.65  0.52779  0.00315   86

2.4.19-pre10-aa4     256   36.97 46.31%   62.846   84414.17  0.58346  0.42848   80
2.4.19-pre10-aa4-oql 256   33.06 38.38%   70.013   93041.19  0.53021  0.45341   86


Random Writes ext2
                     Num                   Avg      Maximum     Lat%     Lat%  CPU
Kernel               Thr   Rate  (CPU%)  Latency    Latency       2s      10s  Eff
-------------------- ---  --------------------------------------------------------
2.4.19-pre10-aa4       1    4.60  3.73%    0.071      11.32  0.00000  0.00000  123
2.4.19-pre10-aa4-oql   1    3.71  2.38%    0.063       1.39  0.00000  0.00000  156

2.4.19-pre10-aa4      32    4.62  8.18%    0.183      17.48  0.00000  0.00000   56
2.4.19-pre10-aa4-oql  32    3.85  6.81%    0.184      13.01  0.00000  0.00000   56

2.4.19-pre10-aa4      64    4.62  8.75%    0.179      11.23  0.00000  0.00000   53
2.4.19-pre10-aa4-oql  64    3.83  6.01%    0.185      11.79  0.00000  0.00000   64

2.4.19-pre10-aa4     128    4.49  7.82%    0.181      11.25  0.00000  0.00000   57
2.4.19-pre10-aa4-oql 128    3.90  7.64%    0.186      13.04  0.00000  0.00000   51

2.4.19-pre10-aa4     256    4.41  8.92%    0.180      10.56  0.00000  0.00000   49
2.4.19-pre10-aa4-oql 256    3.70  7.32%    0.178      11.26  0.00000  0.00000   51



Sequential Reads reiserfs
                      Num                   Avg      Maximum     Lat%     Lat%  CPU
Kernel                Thr   Rate  (CPU%)  Latency    Latency       2s      10s  Eff
--------------------- ---  --------------------------------------------------------
2.4.19-pre10-aa4        1   47.63 30.34%    0.244     150.93  0.00000  0.00000  157
2.4.19-pre10-aa4-oql    1   47.99 30.42%    0.242     153.50  0.00000  0.00000  158

2.4.19-pre10-aa4       32   36.75 25.80%    9.761   12904.09  0.08792  0.00000  142
2.4.19-pre10-aa4-oql   32   30.68 20.76%   11.026   47422.16  0.15427  0.00591  148

2.4.19-pre10-aa4       64   34.26 24.31%   20.720   22812.18  0.60685  0.00000  141
2.4.19-pre10-aa4-oql   64   31.50 20.98%   20.887   74077.79  0.17828  0.09984  150

2.4.19-pre10-aa4      128   35.94 25.46%   37.882   50116.12  0.53921  0.00388  141
2.4.19-pre10-aa4-oql  128   32.41 21.82%   39.056  137240.55  0.20548  0.17551  149

2.4.19-pre10-aa4      256   35.28 25.17%   74.221  102660.73  0.59475  0.48787  140
2.4.19-pre10-aa4-oql  256   31.67 21.85%   73.905  248764.61  0.26824  0.23384  145


Random Reads reiserfs
                      Num                   Avg      Maximum     Lat%     Lat%  CPU
Kernel                Thr   Rate  (CPU%)  Latency    Latency       2s      10s  Eff
--------------------- ---  --------------------------------------------------------
2.4.19-pre10-aa4        1    0.59  0.86%   19.727     129.29  0.00000  0.00000   69
2.4.19-pre10-aa4-oql    1    0.60  0.75%   19.634     124.21  0.00000  0.00000   79

2.4.19-pre10-aa4       32    4.23  5.23%   80.588     363.42  0.00000  0.00000   81
2.4.19-pre10-aa4-oql   32    2.98  3.50%  107.835     352.72  0.00000  0.00000   85

2.4.19-pre10-aa4       64    4.28  5.16%  139.187    7890.49  0.45363  0.00000   83
2.4.19-pre10-aa4-oql   64    3.12  5.36%  184.145   10398.28  0.78125  0.00000   58

2.4.19-pre10-aa4      128    4.51  6.99%  213.281    8206.57  1.71370  0.00000   65
2.4.19-pre10-aa4-oql  128    3.13  4.57%  296.529   12265.36  4.43549  0.00000   68

2.4.19-pre10-aa4      256    4.51  7.61%  378.383    8946.18  6.09375  0.00000   59
2.4.19-pre10-aa4-oql  256    3.10  6.34%  539.497   13397.75 10.57291  0.00000   49


Sequential Writes reiserfs
                      Num                   Avg      Maximum     Lat%     Lat%  CPU
Kernel                Thr   Rate  (CPU%)  Latency    Latency       2s      10s  Eff
--------------------- ---  --------------------------------------------------------
2.4.19-pre10-aa4        1   30.24 49.54%    0.373   33577.32  0.00201  0.00124   61
2.4.19-pre10-aa4-oql    1   28.25 46.58%    0.400   37683.70  0.00204  0.00137   61

2.4.19-pre10-aa4       32   35.33 158.8%    8.978   27421.64  0.13778  0.00102   22
2.4.19-pre10-aa4-oql   32   30.49 145.0%   10.526   48710.93  0.14690  0.00973   21

2.4.19-pre10-aa4       64   32.43 156.6%   19.320   88360.82  0.29164  0.02301   21
2.4.19-pre10-aa4-oql   64   29.31 126.4%   21.320   75316.32  0.33388  0.01316   23

2.4.19-pre10-aa4      128   32.93 140.0%   35.576   95977.80  0.43586  0.11368   24
2.4.19-pre10-aa4-oql  128   29.18 116.8%   40.995   92350.00  0.46883  0.15281   25

2.4.19-pre10-aa4      256   31.38 138.1%   68.583  175532.20  0.67412  0.22071   23
2.4.19-pre10-aa4-oql  256   28.33 117.1%   79.807  128832.50  0.75308  0.32549   24


Random Writes reiserfs
                      Num                   Avg      Maximum     Lat%     Lat%  CPU
Kernel                Thr   Rate  (CPU%)  Latency    Latency       2s      10s  Eff
--------------------- ---  --------------------------------------------------------
2.4.19-pre10-aa4        1    4.29  4.12%    0.094       0.26  0.00000  0.00000  104
2.4.19-pre10-aa4-oql    1    3.47  3.33%    0.095       0.93  0.00000  0.00000  104

2.4.19-pre10-aa4       32    4.36 10.78%    0.256     107.72  0.00000  0.00000   40
2.4.19-pre10-aa4-oql   32    3.51  8.38%    0.226       8.23  0.00000  0.00000   42

2.4.19-pre10-aa4       64    4.41 11.08%    0.231       7.93  0.00000  0.00000   40
2.4.19-pre10-aa4-oql   64    3.52  8.87%    0.231       7.92  0.00000  0.00000   40

2.4.19-pre10-aa4      128    4.36 10.60%    0.244      55.45  0.00000  0.00000   41
2.4.19-pre10-aa4-oql  128    3.61  9.54%    0.378     434.79  0.00000  0.00000   38

2.4.19-pre10-aa4      256    4.21 10.46%    0.857     494.67  0.00000  0.00000   40
2.4.19-pre10-aa4-oql  256    3.40  8.78%    0.665     711.39  0.00000  0.00000   39


Dbench throughput improves roughly 4-8% with the QLogic 6.x driver (computed from the averages below).

2.4.19-pre10-aa4 has the QLogic 6.x driver.
2.4.19-pre10-aa4-oql has the old (standard) QLogic driver.

dbench reiserfs 192 processes		average (5 runs)
2.4.19-pre10-aa4         		 47.98 
2.4.19-pre10-aa4-oql     		 45.24 

dbench reiserfs 64 processes		average
2.4.19-pre10-aa4         		 65.19 
2.4.19-pre10-aa4-oql     		 61.84 

dbench ext3 192 processes		average
2.4.19-pre10-aa4         		 71.62 
2.4.19-pre10-aa4-oql     		 66.19 

dbench ext3 64 processes		Average
2.4.19-pre10-aa4         		 91.68 
2.4.19-pre10-aa4-oql     		 86.16 

dbench ext2 192 processes		Average
2.4.19-pre10-aa4         		153.43 
2.4.19-pre10-aa4-oql     		147.71 

dbench ext2 64 processes		Average
2.4.19-pre10-aa4         		184.89 
2.4.19-pre10-aa4-oql     		178.27 
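
(improvement computed as (new - old) / old; e.g. ext3 at 192 processes:
(71.62 - 66.19) / 66.19 = 8.2%, and ext2 at 64 processes:
(184.89 - 178.27) / 178.27 = 3.7%.)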


Bonnie++ (average of 3 runs) shows improvement in most metrics;
block writes on ext3 are about 15% slower with the old driver
(42.36 vs 49.99 MB/sec).

2.4.19-pre10-aa4 has the QLogic 6.x driver.
2.4.19-pre10-aa4-oql has the old (standard) QLogic driver.

bonnie++-1.02a on ext2
                               -------- Sequential Output ----------  - Sequential Input -  ----- Random -----
                               ------ Block -----  ---- Rewrite ----    ----- Block -----   ----- Seeks  -----
Kernel                 Size    MB/sec   %CPU  Eff  MB/sec  %CPU  Eff    MB/sec  %CPU  Eff    /sec  %CPU   Eff
2.4.19-pre10-aa4       8192     52.00   29.0  179   22.35  20.7  108     51.78  26.3  197   435.9  2.00  21797
2.4.19-pre10-aa4-oql   8192     47.36   26.0  182   21.85  20.0  109     50.68  25.0  203   440.9  1.67  26452

                              ---------Sequential ------------------  ------------- Random -----------------
                              ----- Create -----    ---- Delete ----  ----- Create ----     ---- Delete ----
                       files   /sec  %CPU    Eff    /sec  %CPU   Eff   /sec  %CPU   Eff     /sec  %CPU   Eff
2.4.19-pre10-aa4       65536    174  99.0    175   87563  98.7  8874    170  99.0   172      574  99.0   580
2.4.19-pre10-aa4-oql   65536    172  99.0    174   86867  99.0  8774    170  99.0   172      582  99.0   588

bonnie++-1.02a on ext3
                               -------- Sequential Output ----------  - Sequential Input -  ----- Random -----
                               ------ Block -----  ---- Rewrite ----    ----- Block -----   ----- Seeks  -----
Kernel                 Size    MB/sec   %CPU  Eff  MB/sec  %CPU  Eff    MB/sec  %CPU  Eff    /sec  %CPU   Eff
2.4.19-pre10-aa4       8192     49.99   55.3   90   22.31  23.0   97     51.81  25.3  205   362.3  2.00  18115
2.4.19-pre10-aa4-oql   8192     42.36   46.0   92   21.12  21.7   97     50.75  25.0  203   363.4  1.67  21806

                              ---------Sequential ------------------  ------------- Random -----------------
                              ----- Create -----    ---- Delete ----  ----- Create ----     ---- Delete ----
                       files   /sec  %CPU    Eff    /sec  %CPU   Eff   /sec  %CPU   Eff     /sec  %CPU   Eff
2.4.19-pre10-aa4       65536    127  99.0    128   27237  96.0  2837    129  99.0   130      481  96.3   499
2.4.19-pre10-aa4-oql   65536    125  99.0    127   28173  96.3  2924    128  99.0   129      478  96.0   498

bonnie++-1.02a on reiserfs
                               -------- Sequential Output ----------  - Sequential Input -  ----- Random -----
                               ------ Block -----  ---- Rewrite ----    ----- Block -----   ----- Seeks  -----
Kernel                 Size    MB/sec   %CPU  Eff  MB/sec  %CPU  Eff    MB/sec  %CPU  Eff    /sec  %CPU   Eff
2.4.19-pre10-aa4       8192     32.98   49.0   67   22.65  24.7   92     49.03  28.0  175   363.0  2.00  18152
2.4.19-pre10-aa4-oql   8192     30.34   45.0   67   21.57  23.0   94     49.09  27.7  177   365.2  2.33  15650

                              ---------Sequential ------------------  ------------- Random -----------------
                              ----- Create -----    ---- Delete ----  ----- Create ----     ---- Delete ----
                       files   /sec  %CPU    Eff    /sec  %CPU   Eff   /sec  %CPU   Eff     /sec  %CPU   Eff
2.4.19-pre10-aa4      131072   3634  41.7   8722    2406  33.7  7147   3280  39.7  8270      977  18.3  5331
2.4.19-pre10-aa4-oql  131072   3527  40.3   8745    2295  32.0  7170   3349  40.7  8234      871  16.0  5446

-- 
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html


^ permalink raw reply	[flat|nested] 35+ messages in thread


Thread overview: 35+ messages
2002-10-15 19:20 [Kernel 2.5] Qlogic 2x00 driver Simon Roscic
2002-10-15 19:31 ` Arjan van de Ven
2002-10-15 19:53   ` Simon Roscic
2002-10-16  2:51     ` Michael Clark
2002-10-16  3:56       ` GrandMasterLee
2002-10-16  4:30         ` Michael Clark
2002-10-16  4:35           ` J Sloan
2002-10-16  4:43             ` GrandMasterLee
2002-10-16  6:03               ` Michael Clark
2002-10-16  6:31                 ` GrandMasterLee
2002-10-16  6:40                   ` Michael Clark
2002-10-16  6:48                     ` GrandMasterLee
2002-10-16  6:59                       ` Michael Clark
2002-10-16  4:58             ` GrandMasterLee
2002-10-16  5:28             ` Michael Clark
2002-10-16  5:40               ` Andreas Dilger
2002-10-17  1:59                 ` Andrew Vasquez
2002-10-17  2:44                   ` GrandMasterLee
2002-10-17  3:11                     ` Andrew Vasquez
2002-10-17  3:42                       ` GrandMasterLee
2002-10-17  9:40                       ` Michael Clark
2002-10-18  6:45                         ` GrandMasterLee
2002-10-16 16:28       ` Simon Roscic
2002-10-16 16:49         ` Michael Clark
2002-10-17  3:12           ` GrandMasterLee
2002-10-17  3:54             ` Michael Clark
2002-10-17  4:08               ` GrandMasterLee
2002-10-17  5:03                 ` Michael Clark
2002-10-16  5:02 ` GrandMasterLee
2002-10-16 16:38   ` Simon Roscic
2002-10-17  3:08     ` GrandMasterLee
2002-10-17 17:47       ` Simon Roscic
2002-10-18  6:42         ` GrandMasterLee
2002-10-18 15:11           ` Simon Roscic
2002-10-19  2:17 rwhron
