* Complete I/O starvation with 3ware raid on 2.6
@ 2003-09-25  7:12 Aaron Lehmann
  2003-09-25  7:43 ` Andrew Morton
  0 siblings, 1 reply; 13+ messages in thread
From: Aaron Lehmann @ 2003-09-25  7:12 UTC (permalink / raw)
  To: linux-kernel

I'm running bkcvs HEAD on a newly installed system, and started
copying files over to my RAID 5 from older IDE disks. When I copy
these files, the system becomes unusable. Specifically, any disk
access on the 3ware array, no matter how simple, even starting 'vi' on
a file, takes minutes, or never completes at all. Suspending the process
doing the copying doesn't even help much, because the LEDs on the card
continue blinking for about 30 seconds after the suspension. This
happens whether the IDE drive is using DMA or not. It seems that some
kind of insane queueing is going on. Are there parameters worth
playing with? Should I try the deadline I/O scheduler?
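[For reference: on 2.6 kernels of this era the I/O scheduler is chosen at
boot time with the "elevator=" parameter. A minimal sketch follows, assuming
lilo; the kernel image path and label are placeholders, not taken from this
thread.]

    # lilo.conf fragment: boot the 2.6 kernel with the deadline I/O scheduler
    image=/boot/vmlinuz-2.6.0-test5      # placeholder kernel image
        label=linux-deadline
        append="elevator=deadline"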


* Re: Complete I/O starvation with 3ware raid on 2.6
  2003-09-25  7:12 Complete I/O starvation with 3ware raid on 2.6 Aaron Lehmann
@ 2003-09-25  7:43 ` Andrew Morton
  2003-09-25  7:50   ` Aaron Lehmann
  2003-09-25  7:58   ` Aaron Lehmann
  0 siblings, 2 replies; 13+ messages in thread
From: Andrew Morton @ 2003-09-25  7:43 UTC (permalink / raw)
  To: Aaron Lehmann; +Cc: linux-kernel

Aaron Lehmann <aaronl@vitelus.com> wrote:
>
> I'm running bkcvs HEAD on a newly installed system, and started
> copying files over to my RAID 5 from older IDE disks. When I copy
> these files, the system becomes unusable. Specifically, any disk
> access on the 3ware array, no matter how simple, even starting 'vi' on
> a file, takes minutes, or never completes at all. Suspending the process
> doing the copying doesn't even help much, because the LEDs on the card
> continue blinking for about 30 seconds after the suspension. This
> happens whether the IDE drive is using DMA or not. It seems that some
> kind of insane queueing is going on. Are there parameters worth
> playing with? Should I try the deadline I/O scheduler?

An update to the 3ware driver was merged yesterday.  Have you used earlier
2.5 kernels?



* Re: Complete I/O starvation with 3ware raid on 2.6
  2003-09-25  7:43 ` Andrew Morton
@ 2003-09-25  7:50   ` Aaron Lehmann
  2003-09-25  8:02     ` Nick Piggin
  2003-09-25  7:58   ` Aaron Lehmann
  1 sibling, 1 reply; 13+ messages in thread
From: Aaron Lehmann @ 2003-09-25  7:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Thu, Sep 25, 2003 at 12:43:01AM -0700, Andrew Morton wrote:
> An update to the 3ware driver was merged yesterday.  Have you used earlier
> 2.5 kernels?

Unfortunately not. I copied a day-old CVS tree to the machine but
decided to update before compiling to get the latest-and-greatest. I
did notice the 3ware updates.

I rebooted with the deadline scheduler. It definitely isn't helping.


* Re: Complete I/O starvation with 3ware raid on 2.6
  2003-09-25  7:43 ` Andrew Morton
  2003-09-25  7:50   ` Aaron Lehmann
@ 2003-09-25  7:58   ` Aaron Lehmann
  2003-09-25  8:10     ` Andrew Morton
  1 sibling, 1 reply; 13+ messages in thread
From: Aaron Lehmann @ 2003-09-25  7:58 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Thu, Sep 25, 2003 at 12:43:01AM -0700, Andrew Morton wrote:
> An update to the 3ware driver was merged yesterday.  Have you used earlier
> 2.5 kernels?

More info: The load average is above ten just because of this copy,
and even cat'ing /proc/cpuinfo takes 10 seconds.


* Re: Complete I/O starvation with 3ware raid on 2.6
  2003-09-25  7:50   ` Aaron Lehmann
@ 2003-09-25  8:02     ` Nick Piggin
  0 siblings, 0 replies; 13+ messages in thread
From: Nick Piggin @ 2003-09-25  8:02 UTC (permalink / raw)
  To: Aaron Lehmann; +Cc: Andrew Morton, linux-kernel



Aaron Lehmann wrote:

>On Thu, Sep 25, 2003 at 12:43:01AM -0700, Andrew Morton wrote:
>
>>An update to the 3ware driver was merged yesterday.  Have you used earlier
>>2.5 kernels?
>>
>
>Unfortunately not. I copied a day-old CVS tree to the machine but
>decided to update before compiling to get the latest-and-greatest. I
>did notice the 3ware updates.
>
>I rebooted with the deadline scheduler. It definitely isn't helping.
>

OK, one problem is most likely something I added a month or so ago: a
new process is now assumed not to be a good candidate for anticipation.
This solved some guy's obscure problem, but a lot of programs that
benefit from anticipation (ls, gcc, vi startup, cat, etc.) only submit a
few requests in their lifetime, so they lose most of the gains. I have an
automatic fix for this that I'm testing at the moment.

The other problem could well be a large TCQ depth. AS helps with this, but
it can't do a really good job. Try a TCQ depth of at most 4.




* Re: Complete I/O starvation with 3ware raid on 2.6
  2003-09-25  7:58   ` Aaron Lehmann
@ 2003-09-25  8:10     ` Andrew Morton
  2003-09-25  8:31       ` Aaron Lehmann
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2003-09-25  8:10 UTC (permalink / raw)
  To: Aaron Lehmann; +Cc: linux-kernel

Aaron Lehmann <aaronl@vitelus.com> wrote:
>
> On Thu, Sep 25, 2003 at 12:43:01AM -0700, Andrew Morton wrote:
> > An update to the 3ware driver was merged yesterday.  Have you used earlier
> > 2.5 kernels?
> 
> More info: The load average is above ten just because of this copy,
> and even cat'ing /proc/cpuinfo takes 10 seconds.

A few things to run are `top', `ps' and `vmstat 1'.

And `less Documentation/basic_profiling.txt' ;)
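[For reference, a sketch of what such a capture could look like; the file
names are arbitrary, and the readprofile steps only approximate what
Documentation/basic_profiling.txt describes (boot with a "profile=" option,
reset, reproduce the problem, read the counters back).]

    # log a minute of vmstat samples while the copy is running
    vmstat 1 60 | tee vmstat.log
    # kernel tick profiling, roughly per basic_profiling.txt
    # (requires booting with e.g. "profile=2" on the kernel command line):
    readprofile -r                       # reset the profiling counters
    # ... reproduce the stall ...
    readprofile | sort -n | tail -30     # busiest kernel functions by tick count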




* Re: Complete I/O starvation with 3ware raid on 2.6
  2003-09-25  8:10     ` Andrew Morton
@ 2003-09-25  8:31       ` Aaron Lehmann
  2003-09-25  9:13         ` Nick Piggin
  0 siblings, 1 reply; 13+ messages in thread
From: Aaron Lehmann @ 2003-09-25  8:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Thu, Sep 25, 2003 at 01:10:52AM -0700, Andrew Morton wrote:
> A few things to run are `top', `ps' and `vmstat 1'.

The first two do not show any information out of the ordinary, other
than the fact that the load average is 11 while only two rsync
processes are using any CPU at all.

Here is some vmstat output. It was nontrivial to capture, since
redirecting it to a file seemed to let the heavy I/O block vmstat for
far too long for it to gather useful data. Therefore, I'm not sure the
following is accurate, even though I piped it to head rather than
directly to a file (hoping the pipe would block less than actual file
I/O).


procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 2 10      0   2056   5196 480120    0    0  7735  7677 1135   918 11 12  2 75
 0 12      0   2040   5208 479904    0    0  5644  8196 1106   650  8 10  0 82
 0 11      0   3240   5228 478804    0    0  1148  8248 1078   279  1  1  0 98
 0 11      0   3240   5228 478824    0    0    24  8192 1057   138  0  0  0 100
 0 11      0   4344   5224 477676    0    0  2712  8108 1082   450  4  5  0 91
 0 12      0   2104   5252 479876    0    0  1544  4064 1077   352  3  3  0 94
 0 12      0   2040   5216 479972    0    0  3384  8200 1090   626  5  5  0 90
 0 12      0   1992   5220 479952    0    0  3116  8168 1092   563  4  5  0 91
 0 11      0   4500   5224 477448    0    0    92  3996 1079   125  0  0  0 100
 0 12      0   2096   5232 479892    0    0  3260  8084 1095   589  4  6  0 90
 0 11      0   2032   5232 479912    0    0  4488  8104 1101   511  6  7  0 87
 0 10      0   3808   5232 478356    0    0  8916  2208 1142  1065 12 15  0 73
 0 10      0   4960   5192 477368    0    0 17416  3716 1219  1821 23 31  0 46
 0 11      0   4512   4892 478016    0    0 15200  5132 1206  1652 23 20  0 57
 1 10      0   3808   4900 478760    0    0  6736  5364 1133  1208  9 10  0 81
 0 11      0   2096   4844 480396    0    0 28860 17220 1299  3917 43 39  0 19
 0 10      0   4060   4848 478444    0    0  8544  8008 1129   812 12 12  0 76
 0 10      0   4044   4696 478652    0    0 10928  8112 1169  1026 16 14  0 70
 0 11      0   1996   4680 480612    0    0 12752  8184 1165  1923 19 19  0 63
 1 10      0   1996   4676 480720    0    0  5532  8176 1108   550  6  8  0 86
 0 11      0   1948   4684 480608    0    0  3976  8172 1089   429  6  7  0 87
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 2 10      0   1948   4680 480728    0    0 11144  4024 1150  1683 17 14  0 69
 0 11      0   2160   4700 480424    0    0 16972 12264 1210  1740 24 25  0 51
 0 13      0   2236   4732 480524    0    0 10772  4064 1161  1607 16 15  0 69
 0 11      0   4656   4696 478044    0    0   400  8196 1074   170  0  0  0 100
 0 11      0   4656   4708 478044    0    0     8  7880 1063   129  0  2  0 98


* Re: Complete I/O starvation with 3ware raid on 2.6
  2003-09-25  8:31       ` Aaron Lehmann
@ 2003-09-25  9:13         ` Nick Piggin
  2003-09-25 10:15           ` Aaron Lehmann
  0 siblings, 1 reply; 13+ messages in thread
From: Nick Piggin @ 2003-09-25  9:13 UTC (permalink / raw)
  To: Aaron Lehmann; +Cc: Andrew Morton, linux-kernel



Aaron Lehmann wrote:

>On Thu, Sep 25, 2003 at 01:10:52AM -0700, Andrew Morton wrote:
>
>>A few things to run are `top', `ps' and `vmstat 1'.
>>
>
>The first two do not show any information out of the ordinary other
>than the fact that the load average is 11 while only two rsync
>processes are using any CPU at all.
>

But the load average will be 11 because there are processes stuck in the
kernel somewhere in D state. Have a look for them. They might be things
like pdflush, kswapd, scsi_*, etc. Try getting an Alt+SysRq+T dump of
them as well.
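[A possible way to do that, sketched below; the awk filter is just one way to
pick out D-state tasks, and the sysrq-trigger step needs CONFIG_MAGIC_SYSRQ
compiled in.]

    # list tasks in uninterruptible (D) sleep together with their wait channel
    ps -eo pid,stat,wchan:30,comm | awk '$2 ~ /^D/'
    # with sysrq support built in, the task dump can also be triggered from a shell:
    echo t > /proc/sysrq-trigger         # output goes to dmesg / the kernel log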




* Re: Complete I/O starvation with 3ware raid on 2.6
  2003-09-25  9:13         ` Nick Piggin
@ 2003-09-25 10:15           ` Aaron Lehmann
  2003-09-25 10:25             ` Jens Axboe
  2003-09-25 10:29             ` Nick Piggin
  0 siblings, 2 replies; 13+ messages in thread
From: Aaron Lehmann @ 2003-09-25 10:15 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, linux-kernel

On Thu, Sep 25, 2003 at 07:13:32PM +1000, Nick Piggin wrote:
> But the load average will be 11 because there are processes stuck in the
> kernel somewhere in D state. Have a look for them. They might be things
> like pdflush, kswapd, scsi_*, etc.

They're pdflush and kjournald. I don't have sysrq support compiled in
at the moment.

I've noticed the problem does not occur when the raid can absorb data
faster than the other drive can throw data at it. My naive mind is
pretty sure that this is just an issue of way too much being queued
for writing. If someone could tell me how to control this parameter,
I'd definitely give it a try [tomorrow]. All I've found on my own is
#define TW_Q_LENGTH 256 in 3w-xxxx.h and am not sure if this is the
right thing to change or safe to change.


* Re: Complete I/O starvation with 3ware raid on 2.6
  2003-09-25 10:15           ` Aaron Lehmann
@ 2003-09-25 10:25             ` Jens Axboe
  2003-09-25 10:29             ` Nick Piggin
  1 sibling, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2003-09-25 10:25 UTC (permalink / raw)
  To: Aaron Lehmann; +Cc: Nick Piggin, Andrew Morton, linux-kernel

On Thu, Sep 25 2003, Aaron Lehmann wrote:
> On Thu, Sep 25, 2003 at 07:13:32PM +1000, Nick Piggin wrote:
> > But the load average will be 11 because there are processes stuck in the
> > kernel somewhere in D state. Have a look for them. They might be things
> > like pdflush, kswapd, scsi_*, etc.
> 
> They're pdflush and kjournald. I don't have sysrq support compiled in
> at the moment.
> 
> I've noticed the problem does not occur when the raid can absorb data
> faster than the other drive can throw data at it. My naive mind is
> pretty sure that this is just an issue of way too much being queued
> for writing. If someone could tell me how to control this parameter,
> I'd definitely give it a try [tomorrow]. All I've found on my own is
> #define TW_Q_LENGTH 256 in 3w-xxxx.h and am not sure if this is the
> right thing to change or safe to change.

That is the right define; try setting it to something low, such as 8 or
maybe even 4. Don't go below 3, though.

-- 
Jens Axboe
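[A minimal sketch of that change, assuming the source tree lives in
/usr/src/linux and the 3w-xxxx driver is rebuilt along with the kernel; the
value 8 is just one of the suggestions above.]

    cd /usr/src/linux
    # shrink the 3ware queue depth from 256 to 8 (keep it >= 3)
    sed -i 's/^#define TW_Q_LENGTH.*/#define TW_Q_LENGTH 8/' drivers/scsi/3w-xxxx.h
    make && make modules_install         # then install the new kernel image and reboot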



* Re: Complete I/O starvation with 3ware raid on 2.6
  2003-09-25 10:15           ` Aaron Lehmann
  2003-09-25 10:25             ` Jens Axboe
@ 2003-09-25 10:29             ` Nick Piggin
  1 sibling, 0 replies; 13+ messages in thread
From: Nick Piggin @ 2003-09-25 10:29 UTC (permalink / raw)
  To: Aaron Lehmann; +Cc: Andrew Morton, linux-kernel



Aaron Lehmann wrote:

>On Thu, Sep 25, 2003 at 07:13:32PM +1000, Nick Piggin wrote:
>
>>But the load average will be 11 because there are processes stuck in the
>>kernel somewhere in D state. Have a look for them. They might be things
>>like pdflush, kswapd, scsi_*, etc.
>>
>
>They're pdflush and kjournald. I don't have sysrq support compiled in
>at the moment.
>

OK, it would be good if you could get a couple of sysrq T snapshots then
and post them to the list.

>
>I've noticed the problem does not occur when the raid can absorb data
>faster than the other drive can throw data at it. My naive mind is
>pretty sure that this is just an issue of way too much being queued
>

Although your system (usr, lib, bin etc) is on the IDE disk, right?
And that is only doing reads?

How does your system behave if you are doing just the read side (i.e.
going to /dev/null), or just the write side (coming from /dev/zero)?
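[Something like the following would separate the two sides; the IDE device
node and the array's mount point are placeholders.]

    # read side only: stream the IDE disk into the bit bucket
    dd if=/dev/hda of=/dev/null bs=1M count=2048
    # write side only: stream zeroes onto the 3ware array
    dd if=/dev/zero of=/mnt/raid/zeroes bs=1M count=2048 && rm /mnt/raid/zeroes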

>
>for writing. If someone could tell me how to control this parameter,
>I'd definitely give it a try [tomorrow]. All I've found on my own is
>#define TW_Q_LENGTH 256 in 3w-xxxx.h and am not sure if this is the
>right thing to change or safe to change.
>

That looks like it, try it at 4.



* Re: Complete I/O starvation with 3ware raid on 2.6
  2003-09-25 18:19 Adam Radford
@ 2003-09-28 22:48 ` Aaron Lehmann
  0 siblings, 0 replies; 13+ messages in thread
From: Aaron Lehmann @ 2003-09-28 22:48 UTC (permalink / raw)
  To: Adam Radford; +Cc: 'Nick Piggin', Andrew Morton, linux-kernel

On Thu, Sep 25, 2003 at 11:19:20AM -0700, Adam Radford wrote:
> You should set CONFIG_3W_XXXX_CMD_PER_LUN in your .config to 16 or 32.

Hmm, I tried this, but I learned the hard way that make oldconfig
destroys this line.

What's the right way to do this?


* RE: Complete I/O starvation with 3ware raid on 2.6
@ 2003-09-25 18:19 Adam Radford
  2003-09-28 22:48 ` Aaron Lehmann
  0 siblings, 1 reply; 13+ messages in thread
From: Adam Radford @ 2003-09-25 18:19 UTC (permalink / raw)
  To: 'Nick Piggin', Aaron Lehmann; +Cc: Andrew Morton, linux-kernel

You should set CONFIG_3W_XXXX_CMD_PER_LUN in your .config to 16 or 32.

-Adam
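[As an illustration only, the .config line being described would look like
the fragment below; the option name is taken verbatim from this mail, and as
the follow-up notes, "make oldconfig" drops it when the option is not known
to the tree's Kconfig files.]

    # hypothetical .config fragment
    CONFIG_3W_XXXX_CMD_PER_LUN=16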

