* Linux 2.4.21-rc6 @ 2003-05-29 0:55 Marcelo Tosatti 2003-05-29 1:22 ` Con Kolivas ` (3 more replies) 0 siblings, 4 replies; 114+ messages in thread From: Marcelo Tosatti @ 2003-05-29 0:55 UTC (permalink / raw) To: lkml Hi, Here goes -rc6. I've decided to delay 2.4.21 a bit and try Andrew's fix for the IO stalls/deadlocks. Please test it. Summary of changes from v2.4.21-rc5 to v2.4.21-rc6 ============================================ <c-d.hailfinger.kernel.2003@gmx.net>: o IDE config.in correctness Andi Kleen <ak@muc.de>: o x86-64 fix for the ioport problem Andrew Morton <akpm@digeo.com>: o Fix IO stalls and deadlocks Marcelo Tosatti <marcelo@freak.distro.conectiva>: o Add missing via82xxx PCI ID o Backout erroneous fsync on last opener at close() o Changed EXTRAVERSION to -rc6 ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: Linux 2.4.21-rc6 2003-05-29 0:55 Linux 2.4.21-rc6 Marcelo Tosatti @ 2003-05-29 1:22 ` Con Kolivas 2003-05-29 5:24 ` Marc Wilson 2003-05-29 10:02 ` Con Kolivas ` (2 subsequent siblings) 3 siblings, 1 reply; 114+ messages in thread From: Con Kolivas @ 2003-05-29 1:22 UTC (permalink / raw) To: lkml On Thu, 29 May 2003 10:55, Marcelo Tosatti wrote: > Here goes -rc6. I've decided to delay 2.4.21 a bit and try Andrew's fix > for the IO stalls/deadlocks. Good for you. Well done Marcelo! > Please test it. Yes everyone who gets these stalls please test it also! > Andrew Morton <akpm@digeo.com>: > o Fix IO stalls and deadlocks For those interested these are patches 1 and 2 from akpm's proposed fixes in the looong thread discussing this problem. Con ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: Linux 2.4.21-rc6 2003-05-29 1:22 ` Con Kolivas @ 2003-05-29 5:24 ` Marc Wilson 2003-05-29 5:34 ` Riley Williams 0 siblings, 1 reply; 114+ messages in thread From: Marc Wilson @ 2003-05-29 5:24 UTC (permalink / raw) To: lkml On Thu, May 29, 2003 at 11:22:20AM +1000, Con Kolivas wrote: > On Thu, 29 May 2003 10:55, Marcelo Tosatti wrote: > > Andrew Morton <akpm@digeo.com>: > > o Fix IO stalls and deadlocks > > For those interested these are patches 1 and 2 from akpm's proposed fixes in > the looong thread discussing this problem. Are you sure? I'm no C programmer, but it looks to me like all three patches are in 21-rc6. And I still see the stalls, although they're much reduced. :( I just had mutt freeze cold on me though for ~15 sec when it tried to open my debian-devel mbox (rather large file) while brag was beating on the drive. <whimper> -- Marc Wilson | You have had a long-term stimulation relative to msw@cox.net | business. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: Linux 2.4.21-rc6 2003-05-29 5:24 ` Marc Wilson @ 2003-05-29 5:34 ` Riley Williams 2003-05-29 5:57 ` Marc Wilson 0 siblings, 1 reply; 114+ messages in thread From: Riley Williams @ 2003-05-29 5:34 UTC (permalink / raw) To: Marc Wilson, lkml Hi Marc. > I just had mutt > freeze cold on me though for ~15 sec when > it tried to open my debian-devel mbox (rather large file) > while brag was beating on the drive. > > <whimper> I used to get the same effect when I asked pine to open the Linux-Kernel mailbox on my system. I long since cured that by having procmail split Linux-Kernel mail into multiple mailboxes, one for each calendar week. The basic problem there is that any mail client needs to know just how many messages are in a particular folder to handle that folder, and the only way to do this is to count them all. That's what takes the time when one opens a large folder. Best wishes from Riley. --- * Nothing as pretty as a smile, nothing as ugly as a frown. --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.484 / Virus Database: 282 - Release Date: 27-May-2003 ^ permalink raw reply [flat|nested] 114+ messages in thread
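[Editorial aside: Riley's point about counting messages can be made concrete. In mbox format every message begins with a "From " separator line, so even a rough count means scanning the whole file. A minimal sketch follows; the default mailbox path is an assumption, and naive "From " counting ignores ">From " escaping, so real clients do more work than this.]

```shell
#!/bin/sh
# Rough message count for an mbox folder: each message starts with a
# "From " separator line, so counting them requires reading the whole
# file -- which is why opening a large folder takes time.
# NOTE: default path is hypothetical; this naive grep does not handle
# ">From " escaping the way a real mail client must.
MBOX="${1:-/var/mail/$USER}"
[ -r "$MBOX" ] && grep -c '^From ' "$MBOX"
```

Splitting a huge folder into many small ones, as Riley did, shrinks the file each such scan has to walk.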
* Re: Linux 2.4.21-rc6 2003-05-29 5:34 ` Riley Williams @ 2003-05-29 5:57 ` Marc Wilson 2003-05-29 7:15 ` Riley Williams ` (2 more replies) 0 siblings, 3 replies; 114+ messages in thread From: Marc Wilson @ 2003-05-29 5:57 UTC (permalink / raw) To: lkml On Thu, May 29, 2003 at 06:34:48AM +0100, Riley Williams wrote: > The basic problem there is that any mail client needs to know > just how many messages are in a particular folder to handle that > folder, and the only way to do this is to count them all. That's > what takes the time when one opens a large folder. No, the basic problem there is that the kernel is deadlocking. Read the VERY long thread for the details. I think I have enough on the ball to be able to tell the difference between mutt opening a folder and counting messages, with a counter and percentage indicator advancing, and mutt sitting there deadlocked with the HD activity light stuck on and all the rest of X stuck tight. And it just happened again, so -rc6 is no sure fix. What did y'all that reported the problem had gone away do, patch -rc4 with the akpm patches? ^_^ -- Marc Wilson | Fortune favors the lucky. msw@cox.net | ^ permalink raw reply [flat|nested] 114+ messages in thread
* RE: Linux 2.4.21-rc6 2003-05-29 5:57 ` Marc Wilson @ 2003-05-29 7:15 ` Riley Williams 2003-05-29 8:38 ` Willy Tarreau 2003-06-03 16:02 ` Marcelo Tosatti 2 siblings, 0 replies; 114+ messages in thread From: Riley Williams @ 2003-05-29 7:15 UTC (permalink / raw) To: Marc Wilson; +Cc: Linux Kernel List Hi Marc. >> The basic problem there is that any mail client needs to know >> just how many messages are in a particular folder to handle that >> folder, and the only way to do this is to count them all. That's >> what takes the time when one opens a large folder. > No, the basic problem there is that the kernel is deadlocking. > Read the VERY long thread for the details. > > I think I have enough on the ball to be able to tell the difference > between mutt opening a folder and counting messages, with a counter > and percentage indicator advancing, and mutt sitting there > deadlocked with the HD activity light stuck on and all the rest of > X stuck tight. I thought I was on the ball when a similar situation happened to me. What I observed was that the counters and percentage indicators were NOT advancing for about 30 seconds, and both would then jump up by about 70 messages and the relevant percent rather than counting smoothly through. It was only when I noticed those jumps that I went back to basics and analysed the folder rather than the kernel. However, I apologise profusely for assuming that my experience in what to me appear to be similar circumstances to yours could have any sort of bearing on the problem you are seeing. > And it just happened again, so -rc6 is no sure fix. What did y'all > that reported the problem had gone away do, patch -rc4 with the > akpm patches? In my case, I fixed the problem by splitting the relevant folder up, as stated in my previous message. However, such a solution apparently doesn't work for you, so I'm unable to help any further. Best wishes from Riley. --- * Nothing as pretty as a smile, nothing as ugly as a frown. 
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: Linux 2.4.21-rc6 2003-05-29 5:57 ` Marc Wilson 2003-05-29 7:15 ` Riley Williams @ 2003-05-29 8:38 ` Willy Tarreau 2003-05-29 8:40 ` Willy Tarreau 2003-06-03 16:02 ` Marcelo Tosatti 2 siblings, 1 reply; 114+ messages in thread From: Willy Tarreau @ 2003-05-29 8:38 UTC (permalink / raw) To: lkml Hi ! On Wed, May 28, 2003 at 10:57:35PM -0700, Marc Wilson wrote: > No, the basic problem there is that the kernel is deadlocking. Read the > VERY long thread for the details. I didn't follow this thread, what's its subject, please ? > I think I have enough on the ball to be able to tell the difference between > mutt opening a folder and counting messages, with a counter and percentage > indicator advancing, and mutt sitting there deadlocked with the HD activity > light stuck on and all the rest of X stuck tight. even on -rc3, I don't observe this behaviour. I tried from a cold cache, and mutt took a little less than 3 seconds to open LKML's May folder (35 MB), and progressed very smoothly. Since it's on my Alpha file server, I can't test with X. But the I/O bandwidth and scheduler frequency (1024 HZ) may have an impact. Cheers, Willy ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: Linux 2.4.21-rc6 2003-05-29 8:38 ` Willy Tarreau @ 2003-05-29 8:40 ` Willy Tarreau 0 siblings, 0 replies; 114+ messages in thread From: Willy Tarreau @ 2003-05-29 8:40 UTC (permalink / raw) To: Willy Tarreau; +Cc: lkml On Thu, May 29, 2003 at 10:38:04AM +0200, Willy Tarreau wrote: > Hi ! > > On Wed, May 28, 2003 at 10:57:35PM -0700, Marc Wilson wrote: > > No, the basic problem there is that the kernel is deadlocking. Read the > > VERY long thread for the details. > > I didn't follow this thread, what's its subject, please ? Hmmm never mind, I easily found it (yes, VERY long) ! Cheers, Willy ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: Linux 2.4.21-rc6 2003-05-29 5:57 ` Marc Wilson 2003-05-29 7:15 ` Riley Williams 2003-05-29 8:38 ` Willy Tarreau @ 2003-06-03 16:02 ` Marcelo Tosatti 2003-06-03 16:13 ` Marc-Christian Petersen ` (2 more replies) 2 siblings, 3 replies; 114+ messages in thread From: Marcelo Tosatti @ 2003-06-03 16:02 UTC (permalink / raw) To: Marc Wilson; +Cc: lkml On Wed, 28 May 2003, Marc Wilson wrote: > On Thu, May 29, 2003 at 06:34:48AM +0100, Riley Williams wrote: > > The basic problem there is that any mail client needs to know > > just how many messages are in a particular folder to handle that > > folder, and the only way to do this is to count them all. That's > > what takes the time when one opens a large folder. > > No, the basic problem there is that the kernel is deadlocking. Read the > VERY long thread for the details. > > I think I have enough on the ball to be able to tell the difference between > mutt opening a folder and counting messages, with a counter and percentage > indicator advancing, and mutt sitting there deadlocked with the HD activity > light stuck on and all the rest of X stuck tight. > > And it just happened again, so -rc6 is no sure fix. What did y'all that > reported the problem had gone away do, patch -rc4 with the akpm patches? > ^_^ Ok, so you can reproduce the hangs reliably EVEN with -rc6, Marc? ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: Linux 2.4.21-rc6 2003-06-03 16:02 ` Marcelo Tosatti @ 2003-06-03 16:13 ` Marc-Christian Petersen 2003-06-04 21:54 ` Pavel Machek 2003-06-03 16:30 ` Michael Frank 2003-06-04 4:04 ` Marc Wilson 2 siblings, 1 reply; 114+ messages in thread From: Marc-Christian Petersen @ 2003-06-03 16:13 UTC (permalink / raw) To: Marcelo Tosatti, Marc Wilson; +Cc: lkml On Tuesday 03 June 2003 18:02, Marcelo Tosatti wrote: Hi Marcelo, > Ok, so you can reproduce the hangs reliably EVEN with -rc6, Marc? well, even if you mean Marc Wilson, I also have to say something (as I've written in my previous email some days ago) The pauses/stops are _a lot_ less than w/o the fix but they are _not_ gone. Tested with 2.4.21-rc6. ciao, Marc ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: Linux 2.4.21-rc6 2003-06-03 16:13 ` Marc-Christian Petersen @ 2003-06-04 21:54 ` Pavel Machek 2003-06-05 2:10 ` Michael Frank 0 siblings, 1 reply; 114+ messages in thread From: Pavel Machek @ 2003-06-04 21:54 UTC (permalink / raw) To: Marc-Christian Petersen; +Cc: Marcelo Tosatti, Marc Wilson, lkml Hi! > > Ok, so you can reproduce the hangs reliably EVEN with -rc6, Marc? > well, even if you mean Marc Wilson, I also have to say something (as I've > written in my previous email some days ago) > > The pauses/stops are _a lot_ less than w/o the fix but they are _not_ gone. > Tested with 2.4.21-rc6. If hangs are not worse than 2.4.20, then I'd go ahead with release.... Pavel -- When do you have a heart between your knees? [Johanka's followup: and *two* hearts?] ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: Linux 2.4.21-rc6 2003-06-04 21:54 ` Pavel Machek @ 2003-06-05 2:10 ` Michael Frank 0 siblings, 0 replies; 114+ messages in thread From: Michael Frank @ 2003-06-05 2:10 UTC (permalink / raw) To: Pavel Machek, Marc-Christian Petersen; +Cc: Marcelo Tosatti, Marc Wilson, lkml On Thursday 05 June 2003 05:54, Pavel Machek wrote: > Hi! > > > > Ok, so you can reproduce the hangs reliably EVEN with -rc6, Marc? > > > > well, even if you mean Marc Wilson, I also have to say something (as I've > > written in my previous email some days ago) > > > > The pauses/stops are _a lot_ less than w/o the fix but they are _not_ > > gone. Tested with 2.4.21-rc6. > > If hangs are not worse than 2.4.20, then I'd go ahead with release.... > > I have -rc6 running on a P4 for a few days, doing the test script, compiles, Opera, and found it to be comparable to 2.4.18. It also does well on slower machines of about 1/4 the CPU and disk bandwidth. IMHO, interactivity is reasonable (again just IMHO), and others may disagree. -- Powered by linux-2.5.70-mm3 My current linux related activities in rough order of priority: - Testing of 2.4/2.5 kernel interactivity - Testing of Swsusp for 2.4 - Testing of Opera 7.11 emphasizing interactivity - Research of NFS i/o errors during transfer 2.4>2.5 - Learning 2.5 series kernel debugging with kgdb - it's in the -mm tree - Studying 2.5 series serial and ide drivers, ACPI, S3 * Input and feedback is always welcome * ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: Linux 2.4.21-rc6 2003-06-03 16:02 ` Marcelo Tosatti 2003-06-03 16:13 ` Marc-Christian Petersen @ 2003-06-03 16:30 ` Michael Frank 2003-06-03 16:53 ` Matthias Mueller ` (2 more replies) 2003-06-04 4:04 ` Marc Wilson 2 siblings, 3 replies; 114+ messages in thread From: Michael Frank @ 2003-06-03 16:30 UTC (permalink / raw) To: Marcelo Tosatti, Marc Wilson; +Cc: lkml On Wednesday 04 June 2003 00:02, Marcelo Tosatti wrote: > On Wed, 28 May 2003, Marc Wilson wrote: > > On Thu, May 29, 2003 at 06:34:48AM +0100, Riley Williams wrote: > > > The basic problem there is that any mail client needs to know > > > just how many messages are in a particular folder to handle that > > > folder, and the only way to do this is to count them all. That's > > > what takes the time when one opens a large folder. > > > > No, the basic problem there is that the kernel is deadlocking. Read the > > VERY long thread for the details. > > > > I think I have enough on the ball to be able to tell the difference > > between mutt opening a folder and counting messages, with a counter and > > percentage indicator advancing, and mutt sitting there deadlocked with > > the HD activity light stuck on and all the rest of X stuck tight. > > > > And it just happened again, so -rc6 is no sure fix. What did y'all that > > reported the problem had gone away do, patch -rc4 with the akpm patches? > > ^_^ > > Ok, so you can reproduce the hangs reliably EVEN with -rc6, Marc? -rc6 is better - comparable to 2.4.18 in what I have seen with my script. After the long obscure problems since 2.4.19x, -rc6 could use serious stress-testing. User level testing is not sufficient here - it's just like playing roulette. By serious stress-testing I mean: Everyone testing comes up with one dedicated "tough test" which _must_ be reproducible (program, script) along his line of expertise/application. Two or more of these independent tests are run in combination. This method should increase the coverage drastically. 
Regards Michael ^ permalink raw reply [flat|nested] 114+ messages in thread
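[Editorial aside: Michael's proposal — one reproducible load plus independent checks run in combination — can be sketched as a small latency probe run alongside whatever load one picks (e.g. the big dd used elsewhere in this thread). The sample count, probe path, and 2-second stall threshold below are illustrative assumptions, not values from the thread.]

```shell
#!/bin/sh
# Independent probe to run alongside a reproducible I/O load: it times
# a tiny synchronous write in a loop. During a stall, a write that
# should be near-instant instead takes seconds, which the probe flags.
# N, PROBE and the 2-second threshold are assumptions for illustration.
N="${N:-3}"
PROBE="${PROBE:-/tmp/stallprobe.$$}"
i=0
while [ "$i" -lt "$N" ]; do
    t0=$(date +%s)
    echo probe > "$PROBE"   # small write...
    sync                    # ...forced out to disk synchronously
    t1=$(date +%s)
    d=$((t1 - t0))
    echo "sample $i: ${d}s"
    [ "$d" -gt 2 ] && echo "possible stall: ${d}s"
    i=$((i + 1))
done
rm -f "$PROBE"
```

Run in one terminal while the chosen "tough test" runs in another; two such independent, scriptable tests in combination is exactly the coverage Michael asks for.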
* Re: Linux 2.4.21-rc6 2003-06-03 16:30 ` Michael Frank @ 2003-06-03 16:53 ` Matthias Mueller 2003-06-03 16:59 ` Marc-Christian Petersen 2003-06-04 14:56 ` Jakob Oestergaard 2 siblings, 0 replies; 114+ messages in thread From: Matthias Mueller @ 2003-06-03 16:53 UTC (permalink / raw) To: Michael Frank; +Cc: Marcelo Tosatti, Marc Wilson, lkml On Wed, Jun 04, 2003 at 12:30:27AM +0800, Michael Frank wrote: > On Wednesday 04 June 2003 00:02, Marcelo Tosatti wrote: > -rc6 is better - comparable to 2.4.18 in what I have seen with my script. > > After the long obscure problems since 2.4.19x, -rc6 could use serious > stress-testing. > > User level testing is not sufficient here - it's just like playing roulette. > > By serious stress-testing I mean: > > Everyone testing comes up with one dedicated "tough test" > which _must_ be reproducible (program, script) along his line of > expertise/application. > > Two or more of these independent tests are run in combination. Agreed, and I'm willing to run test scripts on my system, which has these hangs (long ones with 2.4.19-pre1 to 2.4.21-rc5 and only short ones with 2.4.21-rc6). But at the moment I have neither time nor enough knowledge to write a test to reproduce it. So if someone comes up with a suitable test script, I'm happy to try it and use it on different kernel versions. Bye, Matthias -- Matthias.Mueller@rz.uni-karlsruhe.de Rechenzentrum Universitaet Karlsruhe Abteilung Netze ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: Linux 2.4.21-rc6 2003-06-03 16:30 ` Michael Frank 2003-06-03 16:53 ` Matthias Mueller @ 2003-06-03 16:59 ` Marc-Christian Petersen 2003-06-03 17:03 ` Marc-Christian Petersen 2003-06-03 17:23 ` Michael Frank 2003-06-04 14:56 ` Jakob Oestergaard 2 siblings, 2 replies; 114+ messages in thread From: Marc-Christian Petersen @ 2003-06-03 16:59 UTC (permalink / raw) To: Michael Frank, Marcelo Tosatti, Marc Wilson; +Cc: lkml On Tuesday 03 June 2003 18:30, Michael Frank wrote: Hi Michael, > > Ok, so you can reproduce the hangs reliably EVEN with -rc6, Marc? > -rc6 is better - comparable to 2.4.18 in what I have seen with my script. > After the long obscure problems since 2.4.19x, -rc6 could use serious > stress-testing. > User level testing is not sufficient here - it's just like playing > roulette. > By serious stress-testing I mean: > Everone testing comes up with one dedicated "tough test" > which _must_ be reproducible (program, script) along his line of > expertise/application. well, very easy one: dd if=/dev/zero of=/home/largefile bs=16384 count=131072 then use your mouse, your apps, switch between them, use them, _w/o_ pauses, delay, stops or kinda that. If _that_ will work flawlessly for everyone, then it is fixed, if not, it _needs_ to be fixed. ciao, Marc ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: Linux 2.4.21-rc6 2003-06-03 16:59 ` Marc-Christian Petersen @ 2003-06-03 17:03 ` Marc-Christian Petersen 2003-06-03 18:02 ` Anders Karlsson 2003-06-03 17:23 ` Michael Frank 1 sibling, 1 reply; 114+ messages in thread From: Marc-Christian Petersen @ 2003-06-03 17:03 UTC (permalink / raw) To: Michael Frank, Marcelo Tosatti, Marc Wilson; +Cc: lkml On Tuesday 03 June 2003 18:59, Marc-Christian Petersen wrote: Hi again, > well, very easy one: > dd if=/dev/zero of=/home/largefile bs=16384 count=131072 > then use your mouse, your apps, switch between them, use them, _w/o_ > pauses, delay, stops or kinda that. If _that_ will work flawlessly for > everyone, then it is fixed, if not, it _needs_ to be fixed. I forgot to mention. If you have more than 2GB free memory (the above one will create a 2GB file), the test is useless. Have less memory free, so the machine will swap, doesn't matter if the same disk or another or whatever! ciao, Marc ^ permalink raw reply [flat|nested] 114+ messages in thread
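[Editorial aside: Marc-Christian's caveat — the 2 GB dd file must exceed free memory or the test shows nothing — can be checked up front. A sketch reading /proc/meminfo follows; the 2048 MB figure matches the dd above, and treating MemFree + Cached as "effectively free" is a rough simplifying assumption.]

```shell
#!/bin/sh
# Pre-flight check for the dd test above: the 2 GB file only forces
# real writeback/swap pressure if it exceeds effectively-free memory.
# Summing MemFree + Cached from /proc/meminfo is a rough approximation.
FILE_MB=2048
free_kb=$(awk '/^MemFree:/ {f=$2} /^Cached:/ {c=$2} END {print f + c}' /proc/meminfo)
free_mb=$((free_kb / 1024))
echo "effectively free: ${free_mb} MB"
if [ "$free_mb" -ge "$FILE_MB" ]; then
    echo "warning: the test file fits in memory; the stalls will not show"
fi
```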
* Re: Linux 2.4.21-rc6 2003-06-03 17:03 ` Marc-Christian Petersen @ 2003-06-03 18:02 ` Anders Karlsson 2003-06-03 21:12 ` J.A. Magallon 0 siblings, 1 reply; 114+ messages in thread From: Anders Karlsson @ 2003-06-03 18:02 UTC (permalink / raw) To: Marc-Christian Petersen; +Cc: Michael Frank, Marcelo Tosatti, Marc Wilson, LKML [-- Attachment #1: Type: text/plain, Size: 1276 bytes --] Good Evening, On Tue, 2003-06-03 at 18:03, Marc-Christian Petersen wrote: > On Tuesday 03 June 2003 18:59, Marc-Christian Petersen wrote: > > Hi again, > > > well, very easy one: > > dd if=/dev/zero of=/home/largefile bs=16384 count=131072 > > then use your mouse, your apps, switch between them, use them, _w/o_ > > pauses, delay, stops or kinda that. If _that_ will work flawlessly for > > everyone, then it is fixed, if not, it _needs_ to be fixed. > I forgot to mention. If you have more than 2GB free memory (the above one will > create a 2GB file), the test is useless. > > Have less memory free, so the machine will swap, doesn't matter if the same > disk or another or whatever! Would it count if I said I run 2.4.21-rc6-ac1 and had 768MB RAM, ended up using about 250MB swap and when I today suspended VMware and closed a few gnome-terminals, Galeon and Evolution, the mouse cursor would not move, then jump half way across the screen after a second, then 'stick' again before doing another jump. I thought it sounded a little like what you are describing. If more details are required, let me know and I will try and collect what is asked for. Regards, -- Anders Karlsson <anders@trudheim.com> Trudheim Technology Limited [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: Linux 2.4.21-rc6 2003-06-03 18:02 ` Anders Karlsson @ 2003-06-03 21:12 ` J.A. Magallon 2003-06-03 21:18 ` Marc-Christian Petersen 0 siblings, 1 reply; 114+ messages in thread From: J.A. Magallon @ 2003-06-03 21:12 UTC (permalink / raw) To: Anders Karlsson Cc: Marc-Christian Petersen, Michael Frank, Marcelo Tosatti, Marc Wilson, LKML On 06.03, Anders Karlsson wrote: > Good Evening, > > On Tue, 2003-06-03 at 18:03, Marc-Christian Petersen wrote: > > On Tuesday 03 June 2003 18:59, Marc-Christian Petersen wrote: > > > > Hi again, > > > > > well, very easy one: > > > dd if=/dev/zero of=/home/largefile bs=16384 count=131072 > > > then use your mouse, your apps, switch between them, use them, _w/o_ > > > pauses, delay, stops or kinda that. If _that_ will work flawlessly for > > > everyone, then it is fixed, if not, it _needs_ to be fixed. > > I forgot to mention. If you have more than 2GB free memory (the above one will > > create a 2GB file), the test is useless. > > > > Have less memory free, so the machine will swap, doesn't matter if the same > > disk or another or whatever! > > Would it count if I said I run 2.4.21-rc6-ac1 and had 768MB RAM, ended > up using about 250MB swap and when I today suspended VMware and closed a > few gnome-terminals, Galeon and Evolution, the mouse cursor would not > move, then jump half way across the screen after a second, then 'stick' > again before doing another jump. > One vote in the opposite sense (I know, nobody uses plain rc6 ???) I am using a -jam kernel (-aa with some additional patches), and I did not notice anything. Dual PII box with 900 Mb, as buffers were filling memory, no stalls. Just a very small (less than half a second) jump in the cursor under gnome when the memory got full, and then smooth again. I use pointer-focus and was rapidly moving the pointer from window to window to change focus and response was ok. Launching an aterm was instant. -- J.A. 
Magallon <jamagallon@able.es> \ Software is like sex: werewolf.able.es \ It's better when it's free Mandrake Linux release 9.2 (Cooker) for i586 Linux 2.4.21-rc6-jam1 (gcc 3.2.3 (Mandrake Linux 9.2 3.2.3-1mdk)) ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: Linux 2.4.21-rc6 2003-06-03 21:12 ` J.A. Magallon @ 2003-06-03 21:18 ` Marc-Christian Petersen 0 siblings, 0 replies; 114+ messages in thread From: Marc-Christian Petersen @ 2003-06-03 21:18 UTC (permalink / raw) To: J.A. Magallon, Anders Karlsson Cc: Michael Frank, Marcelo Tosatti, Marc Wilson, LKML On Tuesday 03 June 2003 23:12, J.A. Magallon wrote: Hi J.A., > One vote in the opposite sense (I know, nobody uses plain rc6 ???) > I am using a -jam kernel (-aa with some additional patches), and I did > not notice anything. Dual PII box with 900 Mb, as buffers were filling > memory, no stalls. Just a very small (less than half a second) jump in the > cursor under gnome when the memory got full, and then smooth again. > I use pointer-focus and was rapidly moving the pointer from window to > window to change focus and response was ok. Launching an aterm was instant. once again for you ;-) -aa is using the low latency elevator! Pauses/stops are much less with it. ciao, Marc ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: Linux 2.4.21-rc6 2003-06-03 16:59 ` Marc-Christian Petersen 2003-06-03 17:03 ` Marc-Christian Petersen @ 2003-06-03 17:23 ` Michael Frank 1 sibling, 0 replies; 114+ messages in thread From: Michael Frank @ 2003-06-03 17:23 UTC (permalink / raw) To: Marc-Christian Petersen, Marcelo Tosatti, Marc Wilson; +Cc: lkml On Wednesday 04 June 2003 00:59, Marc-Christian Petersen wrote: > On Tuesday 03 June 2003 18:30, Michael Frank wrote: > well, very easy one: > > dd if=/dev/zero of=/home/largefile bs=16384 count=131072 Got that already - more flexible: http://www.ussg.iu.edu/hypermail/linux/kernel/0305.3/1291.html Breaks anything >= 2.4.19 < rc6 in no time. We need more - any ideas? Regards Michael ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: Linux 2.4.21-rc6 2003-06-03 16:30 ` Michael Frank 2003-06-03 16:53 ` Matthias Mueller 2003-06-03 16:59 ` Marc-Christian Petersen @ 2003-06-04 14:56 ` Jakob Oestergaard 2 siblings, 0 replies; 114+ messages in thread From: Jakob Oestergaard @ 2003-06-04 14:56 UTC (permalink / raw) To: Michael Frank; +Cc: Marcelo Tosatti, Marc Wilson, lkml On Wed, Jun 04, 2003 at 12:30:27AM +0800, Michael Frank wrote: ... > > > > Ok, so you can reproduce the hangs reliably EVEN with -rc6, Marc? > > -rc6 is better - comparable to 2.4.18 in what I have seen with my script. I've run 2.4.20 for a long time, and have been seriously plagued with the I/O stalls. On a file server (details below) here I upgraded to 2.4.21-rc6 yesterday. The I/O stalls have *almost* gone away. Best of all, we still have our data intact ;) Server data: ~130 GB data on a ~150 GB ext3fs with >1 million files Software RAID-0+1 on four IDE disks Two promise controllers 1x20262 1x20269 1x Intel eepro100, 1x Intel e1000 dual PIII, half a gig of memory NFS server (mainly v3, many different clients) > > After the long obscure problems since 2.4.19x, -rc6 could use serious > stress-testing. This server rarely has load below 1, but frequently above 15. It may run some compilers and linkers locally, but most of the load comes from NFS serving. So far it's been running for 28 hours with that kind of load. Nothing suspicious in the dmesg yet. I will of course let you all know if it falls on its knees. So far it's all thumbs-up from me! There may still be an occasional stall here and there, but compared to 2.4.20 this is heaven (it really was unbelievably annoying having your emacs stall for 10 seconds every 30 seconds when someone was linking on the cluster) :) A big *thank*you* to Marcelo for deciding to include a fix for the I/O stalls! -- ................................................................ 
: jakob@unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob Østergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: Linux 2.4.21-rc6 2003-06-03 16:02 ` Marcelo Tosatti 2003-06-03 16:13 ` Marc-Christian Petersen 2003-06-03 16:30 ` Michael Frank @ 2003-06-04 4:04 ` Marc Wilson 2 siblings, 0 replies; 114+ messages in thread From: Marc Wilson @ 2003-06-04 4:04 UTC (permalink / raw) To: lkml; +Cc: Marcelo Tosatti On Tue, Jun 03, 2003 at 01:02:45PM -0300, Marcelo Tosatti wrote: > Ok, so you can reproduce the hangs reliably EVEN with -rc6, Marc? Yes, with -rc6, and this: rei $ dd if=/dev/zero of=/home/mwilson/largefile bs=16384 count=131072 The mouse starts skipping soon after the box starts swapping. It eventually catches up, but then when I start up another application, it starts again. I have the test running as I type this e-mail in mutt (with vim as the editor), and there are noticeable pauses where I'm typing, but there isn't anything happening on the screen. It's *much* better than it was with my prior kernel (-rc2), but it's most definitely still there. Anyone got any other test they want me to make on the box? -- Marc Wilson | You're a card which will have to be dealt with. msw@cox.net | ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: Linux 2.4.21-rc6 2003-05-29 0:55 Linux 2.4.21-rc6 Marcelo Tosatti 2003-05-29 1:22 ` Con Kolivas @ 2003-05-29 10:02 ` Con Kolivas 2003-05-29 18:00 ` Georg Nikodym 2003-06-03 19:45 ` Config issue (CONFIG_X86_TSC) " Paul 3 siblings, 0 replies; 114+ messages in thread From: Con Kolivas @ 2003-05-29 10:02 UTC (permalink / raw) To: Marcelo Tosatti, lkml On Thu, 29 May 2003 10:55, Marcelo Tosatti wrote: > Hi, > > Here goes -rc6. I've decided to delay 2.4.21 a bit and try Andrew's fix > for the IO stalls/deadlocks. > > Please test it. > > > Andrew Morton <akpm@digeo.com>: > o Fix IO stalls and deadlocks As this is only patches 1 and 2 from akpm's suggested changes I was wondering if my report got lost in the huge thread so I've included it here: Ok patch combination final score for me is as follows in the presence of a large continuous write: 1 No change 2 No change 3 improvement++; minor hangs with reads 1+2 improvement+++; minor pauses with switching applications 1+2+3 improvement++++; no pauses Applications may start up slowly that's fine. The mouse cursor keeps spinning and responding at all times though with 1+2+3 which it hasn't done in 2.4 for a year or so. Is there a reason the 3rd patch was omitted? Con ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: Linux 2.4.21-rc6 2003-05-29 0:55 Linux 2.4.21-rc6 Marcelo Tosatti 2003-05-29 1:22 ` Con Kolivas 2003-05-29 10:02 ` Con Kolivas @ 2003-05-29 18:00 ` Georg Nikodym 2003-05-29 19:11 ` -rc7 " Marcelo Tosatti 2003-06-03 19:45 ` Config issue (CONFIG_X86_TSC) " Paul 3 siblings, 1 reply; 114+ messages in thread From: Georg Nikodym @ 2003-05-29 18:00 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: lkml [-- Attachment #1: Type: text/plain, Size: 555 bytes --] On Wed, 28 May 2003 21:55:39 -0300 (BRT) Marcelo Tosatti <marcelo@conectiva.com.br> wrote: > Here goes -rc6. I've decided to delay 2.4.21 a bit and try Andrew's > fix for the IO stalls/deadlocks. While others may be dubious about the efficacy of this patch, I've been running -rc6 on my laptop now since sometime last night and have seen nothing odd. In case anybody cares, I'm using both ide and a ieee1394 (for a large external drive [which implies scsi]) and I do a _lot_ of big work with BK so I was seeing the problem within hours previously. -g [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 114+ messages in thread
* -rc7 Re: Linux 2.4.21-rc6 2003-05-29 18:00 ` Georg Nikodym @ 2003-05-29 19:11 ` Marcelo Tosatti 2003-05-29 19:56 ` Krzysiek Taraszka 2003-06-04 10:22 ` Andrea Arcangeli 0 siblings, 2 replies; 114+ messages in thread From: Marcelo Tosatti @ 2003-05-29 19:11 UTC (permalink / raw) To: Georg Nikodym; +Cc: lkml On Thu, 29 May 2003, Georg Nikodym wrote: > On Wed, 28 May 2003 21:55:39 -0300 (BRT) > Marcelo Tosatti <marcelo@conectiva.com.br> wrote: > > > Here goes -rc6. I've decided to delay 2.4.21 a bit and try Andrew's > > fix for the IO stalls/deadlocks. > > While others may be dubious about the efficacy of this patch, I've been > running -rc6 on my laptop now since sometime last night and have seen > nothing odd. > > In case anybody cares, I'm using both ide and a ieee1394 (for a large > external drive [which implies scsi]) and I do a _lot_ of big work with > BK so I was seeing the problem within hours previously. Great! -rc7 will have to be released due to some problems :( ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: -rc7 Re: Linux 2.4.21-rc6
  2003-05-29 19:11 ` -rc7 " Marcelo Tosatti
@ 2003-05-29 19:56 ` Krzysiek Taraszka
  2003-05-29 20:18   ` Krzysiek Taraszka
  2003-06-04 10:22 ` Andrea Arcangeli
  1 sibling, 1 reply; 114+ messages in thread
From: Krzysiek Taraszka @ 2003-05-29 19:56 UTC (permalink / raw)
To: Marcelo Tosatti, Georg Nikodym; +Cc: lkml

[-- Attachment #1: Type: text/plain, Size: 4242 bytes --]

On Thursday, 29 May 2003 at 21:11, Marcelo Tosatti wrote:
> On Thu, 29 May 2003, Georg Nikodym wrote:
> > On Wed, 28 May 2003 21:55:39 -0300 (BRT)
> > Marcelo Tosatti <marcelo@conectiva.com.br> wrote:
> >
> > > Here goes -rc6. I've decided to delay 2.4.21 a bit and try Andrew's
> > > fix for the IO stalls/deadlocks.
> >
> > While others may be dubious about the efficacy of this patch, I've been
> > running -rc6 on my laptop now since sometime last night and have seen
> > nothing odd.
> >
> > In case anybody cares, I'm using both ide and a ieee1394 (for a large
> > external drive [which implies scsi]) and I do a _lot_ of big work with
> > BK so I was seeing the problem within hours previously.
>
> Great!
>
> -rc7 will have to be released due to some problems :(

Hmm, it seems the IDE modules and others are broken. I'm looking for the
reason why. Here are the depmod errors and my .config file:

make[1]: Nothing to be done for `modules_install'.
make[1]: Leaving directory `/home/users/dzimi/rpm/BUILD/linux-2.4.20/arch/i386/lib'
cd /lib/modules/2.4.21-rc6; \
mkdir -p pcmcia; \
find kernel -path '*/pcmcia/*' -name '*.o' | xargs -i -r ln -sf ../{} pcmcia
if [ -r System.map ]; then /sbin/depmod -ae -F System.map 2.4.21-rc6; fi
depmod: *** Unresolved symbols in /lib/modules/2.4.21-rc6/kernel/drivers/ide/ide-disk.o
depmod:         proc_ide_read_geometry
depmod:         ide_remove_proc_entries
depmod: *** Unresolved symbols in /lib/modules/2.4.21-rc6/kernel/drivers/ide/ide-floppy.o
depmod:         proc_ide_read_geometry
depmod:         ide_remove_proc_entries
depmod: *** Unresolved symbols in /lib/modules/2.4.21-rc6/kernel/drivers/ide/ide-probe.o
depmod:         do_ide_request
depmod:         ide_add_generic_settings
depmod:         create_proc_ide_interfaces
depmod: *** Unresolved symbols in /lib/modules/2.4.21-rc6/kernel/drivers/ide/ide-tape.o
depmod:         ide_remove_proc_entries
depmod: *** Unresolved symbols in /lib/modules/2.4.21-rc6/kernel/drivers/ide/ide.o
depmod:         ide_release_dma
depmod:         ide_add_proc_entries
depmod:         cmd640_vlb
depmod:         ide_probe_for_cmd640x
depmod:         ide_scan_pcibus
depmod:         proc_ide_read_capacity
depmod:         proc_ide_create
depmod:         ide_remove_proc_entries
depmod:         destroy_proc_ide_drives
depmod:         proc_ide_destroy
depmod:         create_proc_ide_interfaces
depmod: *** Unresolved symbols in /lib/modules/2.4.21-rc6/kernel/drivers/net/wan/comx.o
depmod:         proc_get_inode
depmod: *** Unresolved symbols in /lib/modules/2.4.21-rc6/kernel/net/atm/common.o
depmod:         free_atm_vcc_sk
depmod:         atm_init_aal34
depmod:         alloc_atm_vcc_sk
depmod:         atm_init_aal0
depmod:         atm_devs
depmod: *** Unresolved symbols in /lib/modules/2.4.21-rc6/kernel/net/atm/pvc.o
depmod:         atm_getsockopt
depmod:         atm_recvmsg
depmod:         atm_release
depmod:         atm_ioctl
depmod:         atm_create
depmod:         atm_sendmsg
depmod:         atm_poll
depmod:         atm_connect
depmod:         atm_proc_init
depmod:         atm_setsockopt
depmod: *** Unresolved symbols in /lib/modules/2.4.21-rc6/kernel/net/atm/resources.o
depmod:         atm_proc_dev_deregister
depmod:         atm_proc_dev_register
depmod: *** Unresolved symbols in /lib/modules/2.4.21-rc6/kernel/net/atm/signaling.o
depmod:         nodev_vccs
depmod:         atm_devs
depmod: *** Unresolved symbols in /lib/modules/2.4.21-rc6/kernel/net/atm/svc.o
depmod:         atm_getsockopt
depmod:         atm_recvmsg
depmod:         free_atm_vcc_sk
depmod:         atm_ioctl
depmod:         atm_create
depmod:         atm_sendmsg
depmod:         atm_poll
depmod:         atm_connect
depmod:         atm_release_vcc_sk
depmod:         atm_setsockopt

My .config (it's the distro config; I'm the PLD kernel packager) is
included. OK, I'm going to fix those trivial (?) problems. Back when I
worked on 2.2.x I made some hacks on ksyms.c; was that correct?
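For anyone chasing the same breakage, a log like the one above can be summarized with a small pipeline. This is only a sketch: the here-doc below inlines a fragment of the report for illustration, and in a real run you would capture depmod's stderr first (e.g. `/sbin/depmod -ae -F System.map 2.4.21-rc6 2> depmod.log`; the `depmod.log` name is just an assumption).

```shell
# Sketch: print each module depmod flagged, with its missing symbols
# indented beneath it. The here-doc stands in for the saved log.
sed -n \
    -e 's/^depmod: \*\*\* Unresolved symbols in //p' \
    -e 's/^depmod: \([A-Za-z0-9_]*\)$/    \1/p' <<'EOF'
depmod: *** Unresolved symbols in /lib/modules/2.4.21-rc6/kernel/drivers/ide/ide-disk.o
depmod: proc_ide_read_geometry
depmod: ide_remove_proc_entries
EOF
```

For the fragment above this prints the `ide-disk.o` path followed by the two indented symbol names, which makes it quick to see that the missing symbols all come from the IDE /proc support.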
-- Krzysiek Taraszka (dzimi@pld.org.pl) http://cyborg.kernel.pl/~dzimi/ [-- Attachment #2: .config --] [-- Type: text/plain, Size: 39697 bytes --] # # Automatically generated by make menuconfig: don't edit # CONFIG_X86=y # CONFIG_SBUS is not set CONFIG_UID16=y # # Code maturity level options # CONFIG_EXPERIMENTAL=y # # Loadable module support # CONFIG_MODULES=y # CONFIG_MODVERSIONS is not set CONFIG_KMOD=y # # Processor type and features # # CONFIG_M386 is not set # CONFIG_M486 is not set # CONFIG_M586 is not set # CONFIG_M586TSC is not set # CONFIG_M586MMX is not set # CONFIG_M686 is not set # CONFIG_MPENTIUMIII is not set # CONFIG_MPENTIUM4 is not set # CONFIG_MK6 is not set CONFIG_MK7=y # CONFIG_MK8 is not set # CONFIG_MELAN is not set # CONFIG_MCRUSOE is not set # CONFIG_MWINCHIPC6 is not set # CONFIG_MWINCHIP2 is not set # CONFIG_MWINCHIP3D is not set # CONFIG_MCYRIXIII is not set # CONFIG_MVIAC3_2 is not set CONFIG_X86_WP_WORKS_OK=y CONFIG_X86_INVLPG=y CONFIG_X86_CMPXCHG=y CONFIG_X86_XADD=y CONFIG_X86_BSWAP=y CONFIG_X86_POPAD_OK=y # CONFIG_RWSEM_GENERIC_SPINLOCK is not set CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_X86_L1_CACHE_SHIFT=6 CONFIG_X86_HAS_TSC=y CONFIG_X86_GOOD_APIC=y CONFIG_X86_USE_3DNOW=y CONFIG_X86_PGE=y CONFIG_X86_USE_PPRO_CHECKSUM=y CONFIG_X86_F00F_WORKS_OK=y CONFIG_X86_MCE=y CONFIG_TOSHIBA=m CONFIG_I8K=m CONFIG_MICROCODE=m CONFIG_X86_MSR=m CONFIG_X86_CPUID=m # CONFIG_NOHIGHMEM is not set CONFIG_HIGHMEM4G=y # CONFIG_HIGHMEM64G is not set CONFIG_HIGHMEM=y CONFIG_HIGHIO=y # CONFIG_MATH_EMULATION is not set CONFIG_MTRR=y # CONFIG_SMP is not set CONFIG_X86_UP_APIC=y CONFIG_X86_UP_IOAPIC=y CONFIG_X86_LOCAL_APIC=y CONFIG_X86_IO_APIC=y # CONFIG_X86_TSC_DISABLE is not set CONFIG_X86_TSC=y # # General setup # CONFIG_NET=y CONFIG_PCI=y # CONFIG_PCI_GOBIOS is not set # CONFIG_PCI_GODIRECT is not set CONFIG_PCI_GOANY=y CONFIG_PCI_BIOS=y CONFIG_PCI_DIRECT=y CONFIG_ISA=y CONFIG_PCI_NAMES=y CONFIG_EISA=y CONFIG_MCA=y CONFIG_HOTPLUG=y # # PCMCIA/CardBus support 
# CONFIG_PCMCIA=m CONFIG_CARDBUS=y CONFIG_TCIC=y CONFIG_I82092=y CONFIG_I82365=y # # PCI Hotplug Support # CONFIG_HOTPLUG_PCI=m CONFIG_HOTPLUG_PCI_COMPAQ=m # CONFIG_HOTPLUG_PCI_COMPAQ_NVRAM is not set CONFIG_HOTPLUG_PCI_IBM=m CONFIG_HOTPLUG_PCI_ACPI=m CONFIG_SYSVIPC=y CONFIG_BSD_PROCESS_ACCT=y CONFIG_SYSCTL=y CONFIG_KCORE_ELF=y # CONFIG_KCORE_AOUT is not set CONFIG_BINFMT_AOUT=m CONFIG_BINFMT_ELF=y CONFIG_BINFMT_MISC=m CONFIG_PM=y CONFIG_ACPI=y # CONFIG_ACPI_DEBUG is not set CONFIG_ACPI_BUSMGR=m CONFIG_ACPI_SYS=m CONFIG_ACPI_CPU=m CONFIG_ACPI_BUTTON=m CONFIG_ACPI_AC=m CONFIG_ACPI_EC=m CONFIG_ACPI_CMBATT=m CONFIG_ACPI_THERMAL=m CONFIG_APM=m # CONFIG_APM_IGNORE_USER_SUSPEND is not set # CONFIG_APM_DO_ENABLE is not set # CONFIG_APM_CPU_IDLE is not set # CONFIG_APM_DISPLAY_BLANK is not set CONFIG_APM_RTC_IS_GMT=y # CONFIG_APM_ALLOW_INTS is not set CONFIG_APM_REAL_MODE_POWER_OFF=y # # Memory Technology Devices (MTD) # CONFIG_MTD=m # CONFIG_MTD_DEBUG is not set CONFIG_MTD_PARTITIONS=m CONFIG_MTD_CONCAT=m CONFIG_MTD_REDBOOT_PARTS=m # CONFIG_MTD_CMDLINE_PARTS is not set CONFIG_MTD_CHAR=m CONFIG_MTD_BLOCK=m CONFIG_MTD_BLOCK_RO=m CONFIG_FTL=m CONFIG_NFTL=m CONFIG_NFTL_RW=y # # RAM/ROM/Flash chip drivers # CONFIG_MTD_CFI=m CONFIG_MTD_JEDECPROBE=m CONFIG_MTD_GEN_PROBE=m CONFIG_MTD_CFI_ADV_OPTIONS=y CONFIG_MTD_CFI_NOSWAP=y # CONFIG_MTD_CFI_BE_BYTE_SWAP is not set # CONFIG_MTD_CFI_LE_BYTE_SWAP is not set # CONFIG_MTD_CFI_GEOMETRY is not set CONFIG_MTD_CFI_INTELEXT=m CONFIG_MTD_CFI_AMDSTD=m # CONFIG_MTD_CFI_STAA is not set CONFIG_MTD_RAM=m CONFIG_MTD_ROM=m CONFIG_MTD_ABSENT=m # CONFIG_MTD_OBSOLETE_CHIPS is not set # CONFIG_MTD_AMDSTD is not set # CONFIG_MTD_SHARP is not set # CONFIG_MTD_JEDEC is not set # # Mapping drivers for chip access # CONFIG_MTD_PHYSMAP=m CONFIG_MTD_PHYSMAP_START=8000000 CONFIG_MTD_PHYSMAP_LEN=4000000 CONFIG_MTD_PHYSMAP_BUSWIDTH=2 CONFIG_MTD_PNC2000=m CONFIG_MTD_SC520CDP=m CONFIG_MTD_NETSC520=m CONFIG_MTD_SBC_GXX=m CONFIG_MTD_ELAN_104NC=m 
CONFIG_MTD_DILNETPC=m CONFIG_MTD_DILNETPC_BOOTSIZE=80000 # CONFIG_MTD_MIXMEM is not set # CONFIG_MTD_OCTAGON is not set # CONFIG_MTD_VMAX is not set # CONFIG_MTD_SCx200_DOCFLASH is not set CONFIG_MTD_L440GX=m # CONFIG_MTD_AMD76XROM is not set CONFIG_MTD_ICH2ROM=m # CONFIG_MTD_NETtel is not set # CONFIG_MTD_SCB2_FLASH is not set CONFIG_MTD_PCI=m # CONFIG_MTD_PCMCIA is not set # # Self-contained MTD device drivers # CONFIG_MTD_PMC551=m CONFIG_MTD_PMC551_BUGFIX=y # CONFIG_MTD_PMC551_DEBUG is not set CONFIG_MTD_SLRAM=m CONFIG_MTD_MTDRAM=m CONFIG_MTDRAM_TOTAL_SIZE=4096 CONFIG_MTDRAM_ERASE_SIZE=128 CONFIG_MTD_BLKMTD=m CONFIG_MTD_DOC1000=m CONFIG_MTD_DOC2000=m CONFIG_MTD_DOC2001=m CONFIG_MTD_DOCPROBE=m CONFIG_MTD_DOCPROBE_ADVANCED=y CONFIG_MTD_DOCPROBE_ADDRESS=0000 CONFIG_MTD_DOCPROBE_HIGH=y CONFIG_MTD_DOCPROBE_55AA=y # # NAND Flash Device Drivers # CONFIG_MTD_NAND=m CONFIG_MTD_NAND_VERIFY_WRITE=y CONFIG_MTD_NAND_IDS=m # # Parallel port support # CONFIG_PARPORT=m CONFIG_PARPORT_PC=m CONFIG_PARPORT_PC_CML1=m CONFIG_PARPORT_SERIAL=m CONFIG_PARPORT_PC_FIFO=y CONFIG_PARPORT_PC_SUPERIO=y CONFIG_PARPORT_PC_PCMCIA=m # CONFIG_PARPORT_AMIGA is not set # CONFIG_PARPORT_MFC3 is not set # CONFIG_PARPORT_ATARI is not set # CONFIG_PARPORT_GSC is not set # CONFIG_PARPORT_SUNBPP is not set # CONFIG_PARPORT_OTHER is not set CONFIG_PARPORT_1284=y # # Plug and Play configuration # CONFIG_PNP=m CONFIG_ISAPNP=m # # Block devices # CONFIG_BLK_DEV_FD=m CONFIG_BLK_DEV_PS2=m CONFIG_BLK_DEV_XD=m CONFIG_PARIDE=m CONFIG_PARIDE_PARPORT=m CONFIG_PARIDE_PD=m CONFIG_PARIDE_PCD=m CONFIG_PARIDE_PF=m CONFIG_PARIDE_PT=m CONFIG_PARIDE_PG=m CONFIG_PARIDE_ATEN=m CONFIG_PARIDE_BPCK=m CONFIG_PARIDE_BPCK6=m CONFIG_PARIDE_COMM=m CONFIG_PARIDE_DSTR=m CONFIG_PARIDE_FIT2=m CONFIG_PARIDE_FIT3=m CONFIG_PARIDE_EPAT=m CONFIG_PARIDE_EPATC8=y CONFIG_PARIDE_EPIA=m CONFIG_PARIDE_FRIQ=m CONFIG_PARIDE_FRPW=m CONFIG_PARIDE_KBIC=m CONFIG_PARIDE_KTTI=m CONFIG_PARIDE_ON20=m CONFIG_PARIDE_ON26=m CONFIG_BLK_CPQ_DA=m 
CONFIG_BLK_CPQ_CISS_DA=m CONFIG_CISS_SCSI_TAPE=y CONFIG_BLK_DEV_DAC960=m CONFIG_BLK_DEV_UMEM=m CONFIG_BLK_DEV_LOOP=m CONFIG_BLK_DEV_NBD=m CONFIG_BLK_DEV_RAM=y CONFIG_BLK_DEV_RAM_SIZE=4096 CONFIG_BLK_DEV_INITRD=y CONFIG_BLK_STATS=y # # Multi-device support (RAID and LVM) # CONFIG_MD=y CONFIG_BLK_DEV_MD=m CONFIG_MD_LINEAR=m CONFIG_MD_RAID0=m CONFIG_MD_RAID1=m CONFIG_MD_RAID5=m CONFIG_MD_MULTIPATH=m CONFIG_BLK_DEV_LVM=m # # Networking options # CONFIG_PACKET=m CONFIG_PACKET_MMAP=y CONFIG_NETLINK_DEV=y CONFIG_NETFILTER=y # CONFIG_NETFILTER_DEBUG is not set CONFIG_FILTER=y CONFIG_UNIX=m CONFIG_INET=y CONFIG_IP_MULTICAST=y CONFIG_IP_ADVANCED_ROUTER=y CONFIG_IP_MULTIPLE_TABLES=y CONFIG_IP_ROUTE_FWMARK=y CONFIG_IP_ROUTE_NAT=y CONFIG_IP_ROUTE_MULTIPATH=y CONFIG_IP_ROUTE_TOS=y CONFIG_IP_ROUTE_VERBOSE=y CONFIG_IP_ROUTE_LARGE_TABLES=y # CONFIG_IP_PNP is not set CONFIG_NET_IPIP=m CONFIG_NET_IPGRE=m CONFIG_NET_IPGRE_BROADCAST=y CONFIG_IP_MROUTE=y CONFIG_IP_PIMSM_V1=y CONFIG_IP_PIMSM_V2=y # CONFIG_ARPD is not set # CONFIG_INET_ECN is not set CONFIG_SYN_COOKIES=y # # IP: Netfilter Configuration # CONFIG_IP_NF_CONNTRACK=m CONFIG_IP_NF_FTP=m # CONFIG_IP_NF_AMANDA is not set # CONFIG_IP_NF_TFTP is not set CONFIG_IP_NF_IRC=m CONFIG_IP_NF_QUEUE=m CONFIG_IP_NF_IPTABLES=m CONFIG_IP_NF_MATCH_LIMIT=m CONFIG_IP_NF_MATCH_MAC=m CONFIG_IP_NF_MATCH_PKTTYPE=m CONFIG_IP_NF_MATCH_MARK=m CONFIG_IP_NF_MATCH_MULTIPORT=m CONFIG_IP_NF_MATCH_TOS=m CONFIG_IP_NF_MATCH_ECN=m CONFIG_IP_NF_MATCH_DSCP=m CONFIG_IP_NF_MATCH_AH_ESP=m CONFIG_IP_NF_MATCH_LENGTH=m CONFIG_IP_NF_MATCH_TTL=m CONFIG_IP_NF_MATCH_TCPMSS=m CONFIG_IP_NF_MATCH_HELPER=m CONFIG_IP_NF_MATCH_STATE=m CONFIG_IP_NF_MATCH_CONNTRACK=m CONFIG_IP_NF_MATCH_UNCLEAN=m CONFIG_IP_NF_MATCH_OWNER=m CONFIG_IP_NF_FILTER=m CONFIG_IP_NF_TARGET_REJECT=m CONFIG_IP_NF_TARGET_MIRROR=m CONFIG_IP_NF_NAT=m CONFIG_IP_NF_NAT_NEEDED=y CONFIG_IP_NF_TARGET_MASQUERADE=m CONFIG_IP_NF_TARGET_REDIRECT=m CONFIG_IP_NF_NAT_LOCAL=y CONFIG_IP_NF_NAT_SNMP_BASIC=m 
CONFIG_IP_NF_NAT_IRC=m CONFIG_IP_NF_NAT_FTP=m CONFIG_IP_NF_MANGLE=m CONFIG_IP_NF_TARGET_TOS=m CONFIG_IP_NF_TARGET_ECN=m CONFIG_IP_NF_TARGET_DSCP=m CONFIG_IP_NF_TARGET_MARK=m CONFIG_IP_NF_TARGET_LOG=m CONFIG_IP_NF_TARGET_ULOG=m CONFIG_IP_NF_TARGET_TCPMSS=m CONFIG_IP_NF_ARPTABLES=m CONFIG_IP_NF_ARPFILTER=m CONFIG_IP_NF_COMPAT_IPCHAINS=m CONFIG_IP_NF_NAT_NEEDED=y CONFIG_IP_NF_COMPAT_IPFWADM=m CONFIG_IP_NF_NAT_NEEDED=y CONFIG_IPV6=m # # IPv6: Netfilter Configuration # CONFIG_IP6_NF_QUEUE=m CONFIG_IP6_NF_IPTABLES=m CONFIG_IP6_NF_MATCH_LIMIT=m CONFIG_IP6_NF_MATCH_MAC=m # CONFIG_IP6_NF_MATCH_RT is not set # CONFIG_IP6_NF_MATCH_OPTS is not set # CONFIG_IP6_NF_MATCH_FRAG is not set # CONFIG_IP6_NF_MATCH_HL is not set CONFIG_IP6_NF_MATCH_MULTIPORT=m CONFIG_IP6_NF_MATCH_OWNER=m CONFIG_IP6_NF_MATCH_MARK=m # CONFIG_IP6_NF_MATCH_IPV6HEADER is not set # CONFIG_IP6_NF_MATCH_AHESP is not set CONFIG_IP6_NF_MATCH_LENGTH=m CONFIG_IP6_NF_MATCH_EUI64=m CONFIG_IP6_NF_FILTER=m CONFIG_IP6_NF_TARGET_LOG=m CONFIG_IP6_NF_MANGLE=m CONFIG_IP6_NF_TARGET_MARK=m CONFIG_KHTTPD=m CONFIG_ATM=m CONFIG_VLAN_8021Q=m CONFIG_IPX=m CONFIG_IPX_INTERN=y CONFIG_ATALK=m # # Appletalk devices # CONFIG_DEV_APPLETALK=y CONFIG_LTPC=m CONFIG_COPS=m CONFIG_COPS_DAYNA=y CONFIG_COPS_TANGENT=y CONFIG_IPDDP=m CONFIG_IPDDP_ENCAP=y CONFIG_IPDDP_DECAP=y CONFIG_DECNET=m CONFIG_DECNET_SIOCGIFCONF=y CONFIG_DECNET_ROUTER=y # CONFIG_DECNET_ROUTE_FWMARK is not set CONFIG_BRIDGE=m CONFIG_X25=m CONFIG_LAPB=m CONFIG_LLC=y CONFIG_NET_DIVERT=y CONFIG_ECONET=m CONFIG_ECONET_AUNUDP=y CONFIG_ECONET_NATIVE=y CONFIG_WAN_ROUTER=m # CONFIG_NET_FASTROUTE is not set # CONFIG_NET_HW_FLOWCONTROL is not set # # QoS and/or fair queueing # CONFIG_NET_SCHED=y CONFIG_NET_SCH_CBQ=m CONFIG_NET_SCH_HTB=m CONFIG_NET_SCH_CSZ=m CONFIG_NET_SCH_PRIO=m CONFIG_NET_SCH_RED=m CONFIG_NET_SCH_SFQ=m CONFIG_NET_SCH_TEQL=m CONFIG_NET_SCH_TBF=m CONFIG_NET_SCH_GRED=m CONFIG_NET_SCH_DSMARK=m CONFIG_NET_SCH_INGRESS=m CONFIG_NET_QOS=y CONFIG_NET_ESTIMATOR=y 
CONFIG_NET_CLS=y CONFIG_NET_CLS_TCINDEX=m CONFIG_NET_CLS_ROUTE4=m CONFIG_NET_CLS_ROUTE=y CONFIG_NET_CLS_FW=m CONFIG_NET_CLS_U32=m CONFIG_NET_CLS_RSVP=m CONFIG_NET_CLS_RSVP6=m CONFIG_NET_CLS_POLICE=y # # Network testing # CONFIG_NET_PKTGEN=m # # Telephony Support # CONFIG_PHONE=m CONFIG_PHONE_IXJ=m CONFIG_PHONE_IXJ_PCMCIA=m # # ATA/IDE/MFM/RLL support # CONFIG_IDE=m # # IDE, ATA and ATAPI Block devices # CONFIG_BLK_DEV_IDE=m # CONFIG_BLK_DEV_HD_IDE is not set # CONFIG_BLK_DEV_HD is not set CONFIG_BLK_DEV_IDEDISK=m # CONFIG_IDEDISK_MULTI_MODE is not set CONFIG_IDEDISK_STROKE=y CONFIG_BLK_DEV_IDECS=m CONFIG_BLK_DEV_IDECD=m CONFIG_BLK_DEV_IDETAPE=m CONFIG_BLK_DEV_IDEFLOPPY=m CONFIG_BLK_DEV_IDESCSI=m CONFIG_IDE_TASK_IOCTL=y CONFIG_BLK_DEV_CMD640=y # CONFIG_BLK_DEV_CMD640_ENHANCED is not set # CONFIG_BLK_DEV_ISAPNP is not set CONFIG_BLK_DEV_IDEPCI=y # CONFIG_BLK_DEV_GENERIC is not set CONFIG_IDEPCI_SHARE_IRQ=y CONFIG_BLK_DEV_IDEDMA_PCI=y CONFIG_BLK_DEV_OFFBOARD=y # CONFIG_BLK_DEV_IDEDMA_FORCED is not set # CONFIG_IDEDMA_PCI_AUTO is not set # CONFIG_IDEDMA_ONLYDISK is not set CONFIG_BLK_DEV_IDEDMA=y # CONFIG_IDEDMA_PCI_WIP is not set # CONFIG_BLK_DEV_ADMA100 is not set CONFIG_BLK_DEV_AEC62XX=y CONFIG_BLK_DEV_ALI15X3=y # CONFIG_WDC_ALI15X3 is not set CONFIG_BLK_DEV_AMD74XX=y # CONFIG_AMD74XX_OVERRIDE is not set CONFIG_BLK_DEV_CMD64X=y # CONFIG_BLK_DEV_TRIFLEX is not set CONFIG_BLK_DEV_CY82C693=y CONFIG_BLK_DEV_CS5530=y CONFIG_BLK_DEV_HPT34X=y # CONFIG_HPT34X_AUTODMA is not set CONFIG_BLK_DEV_HPT366=y CONFIG_BLK_DEV_PIIX=y CONFIG_BLK_DEV_NS87415=y CONFIG_BLK_DEV_OPTI621=y # CONFIG_BLK_DEV_PDC202XX_OLD is not set # CONFIG_PDC202XX_BURST is not set # CONFIG_BLK_DEV_PDC202XX_NEW is not set CONFIG_BLK_DEV_RZ1000=y # CONFIG_BLK_DEV_SC1200 is not set CONFIG_BLK_DEV_SVWKS=y # CONFIG_BLK_DEV_SIIMAGE is not set CONFIG_BLK_DEV_SIS5513=y CONFIG_BLK_DEV_SLC90E66=y CONFIG_BLK_DEV_TRM290=y CONFIG_BLK_DEV_VIA82CXXX=y CONFIG_IDE_CHIPSETS=y CONFIG_BLK_DEV_4DRIVES=y CONFIG_BLK_DEV_ALI14XX=m 
CONFIG_BLK_DEV_DTC2278=m CONFIG_BLK_DEV_HT6560B=m # CONFIG_BLK_DEV_PDC4030 is not set CONFIG_BLK_DEV_QD65XX=m CONFIG_BLK_DEV_UMC8672=m # CONFIG_IDEDMA_AUTO is not set # CONFIG_IDEDMA_IVB is not set # CONFIG_DMA_NONPCI is not set CONFIG_BLK_DEV_IDE_MODES=y CONFIG_BLK_DEV_ATARAID=m CONFIG_BLK_DEV_ATARAID_PDC=m CONFIG_BLK_DEV_ATARAID_HPT=m # CONFIG_BLK_DEV_ATARAID_SII is not set # # SCSI support # CONFIG_SCSI=m CONFIG_BLK_DEV_SD=m CONFIG_SD_EXTRA_DEVS=64 CONFIG_CHR_DEV_ST=m CONFIG_CHR_DEV_OSST=m CONFIG_BLK_DEV_SR=m CONFIG_BLK_DEV_SR_VENDOR=y CONFIG_SR_EXTRA_DEVS=4 CONFIG_CHR_DEV_SG=m # CONFIG_SCSI_DEBUG_QUEUES is not set CONFIG_SCSI_MULTI_LUN=y CONFIG_SCSI_CONSTANTS=y CONFIG_SCSI_LOGGING=y # # SCSI low-level drivers # CONFIG_BLK_DEV_3W_XXXX_RAID=m CONFIG_SCSI_7000FASST=m CONFIG_SCSI_ACARD=m CONFIG_SCSI_AHA152X=m CONFIG_SCSI_AHA1542=m CONFIG_SCSI_AHA1740=m CONFIG_SCSI_AACRAID=m CONFIG_SCSI_AIC7XXX=m CONFIG_AIC7XXX_CMDS_PER_DEVICE=253 CONFIG_AIC7XXX_RESET_DELAY_MS=15000 CONFIG_AIC7XXX_PROBE_EISA_VL=y # CONFIG_AIC7XXX_BUILD_FIRMWARE is not set # CONFIG_SCSI_AIC79XX is not set CONFIG_SCSI_AIC7XXX_OLD=m CONFIG_AIC7XXX_OLD_TCQ_ON_BY_DEFAULT=y CONFIG_AIC7XXX_OLD_CMDS_PER_DEVICE=128 CONFIG_AIC7XXX_OLD_PROC_STATS=y CONFIG_SCSI_DPT_I2O=m CONFIG_SCSI_ADVANSYS=m CONFIG_SCSI_IN2000=m CONFIG_SCSI_AM53C974=m CONFIG_SCSI_MEGARAID=m CONFIG_SCSI_BUSLOGIC=m # CONFIG_SCSI_OMIT_FLASHPOINT is not set CONFIG_SCSI_CPQFCTS=m CONFIG_SCSI_DMX3191D=m CONFIG_SCSI_DTC3280=m CONFIG_SCSI_EATA=m CONFIG_SCSI_EATA_TAGGED_QUEUE=y # CONFIG_SCSI_EATA_LINKED_COMMANDS is not set CONFIG_SCSI_EATA_MAX_TAGS=16 CONFIG_SCSI_EATA_DMA=m CONFIG_SCSI_EATA_PIO=m CONFIG_SCSI_FUTURE_DOMAIN=m CONFIG_SCSI_FD_MCS=m CONFIG_SCSI_GDTH=m CONFIG_SCSI_GENERIC_NCR5380=m # CONFIG_SCSI_GENERIC_NCR53C400 is not set CONFIG_SCSI_G_NCR5380_PORT=y # CONFIG_SCSI_G_NCR5380_MEM is not set CONFIG_SCSI_IBMMCA=m CONFIG_IBMMCA_SCSI_ORDER_STANDARD=y # CONFIG_IBMMCA_SCSI_DEV_RESET is not set CONFIG_SCSI_IPS=m CONFIG_SCSI_INITIO=m 
CONFIG_SCSI_INIA100=m CONFIG_SCSI_PPA=m CONFIG_SCSI_IMM=m # CONFIG_SCSI_IZIP_EPP16 is not set # CONFIG_SCSI_IZIP_SLOW_CTR is not set CONFIG_SCSI_NCR53C406A=m CONFIG_SCSI_NCR_D700=m CONFIG_53C700_IO_MAPPED=y CONFIG_SCSI_NCR53C7xx=m # CONFIG_SCSI_NCR53C7xx_sync is not set CONFIG_SCSI_NCR53C7xx_FAST=y CONFIG_SCSI_NCR53C7xx_DISCONNECT=y CONFIG_SCSI_SYM53C8XX_2=m CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=1 CONFIG_SCSI_SYM53C8XX_DEFAULT_TAGS=16 CONFIG_SCSI_SYM53C8XX_MAX_TAGS=64 CONFIG_SCSI_SYM53C8XX_IOMAPPED=y CONFIG_SCSI_NCR53C8XX=m CONFIG_SCSI_SYM53C8XX=m CONFIG_SCSI_NCR53C8XX_DEFAULT_TAGS=8 CONFIG_SCSI_NCR53C8XX_MAX_TAGS=32 CONFIG_SCSI_NCR53C8XX_SYNC=20 # CONFIG_SCSI_NCR53C8XX_PROFILE is not set # CONFIG_SCSI_NCR53C8XX_IOMAPPED is not set CONFIG_SCSI_NCR53C8XX_PQS_PDS=y # CONFIG_SCSI_NCR53C8XX_SYMBIOS_COMPAT is not set CONFIG_SCSI_MCA_53C9X=m CONFIG_SCSI_PAS16=m CONFIG_SCSI_PCI2000=m CONFIG_SCSI_PCI2220I=m CONFIG_SCSI_PSI240I=m CONFIG_SCSI_QLOGIC_FAS=m CONFIG_SCSI_QLOGIC_ISP=m CONFIG_SCSI_QLOGIC_FC=m # CONFIG_SCSI_QLOGIC_FC_FIRMWARE is not set CONFIG_SCSI_QLOGIC_1280=m CONFIG_SCSI_SEAGATE=m CONFIG_SCSI_SIM710=m CONFIG_SCSI_SYM53C416=m CONFIG_SCSI_DC390T=m # CONFIG_SCSI_DC390T_NOGENSUPP is not set CONFIG_SCSI_T128=m CONFIG_SCSI_U14_34F=m # CONFIG_SCSI_U14_34F_LINKED_COMMANDS is not set CONFIG_SCSI_U14_34F_MAX_TAGS=8 CONFIG_SCSI_ULTRASTOR=m # CONFIG_SCSI_NSP32 is not set CONFIG_SCSI_DEBUG=m # # PCMCIA SCSI adapter support # CONFIG_SCSI_PCMCIA=y CONFIG_PCMCIA_AHA152X=m CONFIG_PCMCIA_FDOMAIN=m CONFIG_PCMCIA_NINJA_SCSI=m CONFIG_PCMCIA_QLOGIC=m # # Fusion MPT device support # CONFIG_FUSION=m # CONFIG_FUSION_BOOT is not set CONFIG_FUSION_MAX_SGE=40 CONFIG_FUSION_ISENSE=m CONFIG_FUSION_CTL=m CONFIG_FUSION_LAN=m CONFIG_NET_FC=y # # IEEE 1394 (FireWire) support (EXPERIMENTAL) # CONFIG_IEEE1394=m CONFIG_IEEE1394_PCILYNX=m CONFIG_IEEE1394_OHCI1394=m CONFIG_IEEE1394_VIDEO1394=m CONFIG_IEEE1394_SBP2=m CONFIG_IEEE1394_SBP2_PHYS_DMA=y CONFIG_IEEE1394_ETH1394=m 
CONFIG_IEEE1394_DV1394=m CONFIG_IEEE1394_RAWIO=m CONFIG_IEEE1394_CMP=m CONFIG_IEEE1394_AMDTP=m # CONFIG_IEEE1394_VERBOSEDEBUG is not set # # I2O device support # CONFIG_I2O=m CONFIG_I2O_PCI=m CONFIG_I2O_BLOCK=m CONFIG_I2O_LAN=m CONFIG_I2O_SCSI=m CONFIG_I2O_PROC=m # # Network device support # CONFIG_NETDEVICES=y # # ARCnet devices # CONFIG_ARCNET=m CONFIG_ARCNET_1201=m CONFIG_ARCNET_1051=m CONFIG_ARCNET_RAW=m CONFIG_ARCNET_COM90xx=m CONFIG_ARCNET_COM90xxIO=m CONFIG_ARCNET_RIM_I=m CONFIG_ARCNET_COM20020=m CONFIG_ARCNET_COM20020_ISA=m CONFIG_ARCNET_COM20020_PCI=m CONFIG_DUMMY=m CONFIG_BONDING=m CONFIG_EQUALIZER=m CONFIG_TUN=m CONFIG_ETHERTAP=m CONFIG_NET_SB1000=m # # Ethernet (10 or 100Mbit) # CONFIG_NET_ETHERNET=y # CONFIG_SUNLANCE is not set CONFIG_HAPPYMEAL=m # CONFIG_SUNBMAC is not set # CONFIG_SUNQE is not set CONFIG_SUNGEM=m CONFIG_NET_VENDOR_3COM=y CONFIG_EL1=m CONFIG_EL2=m CONFIG_ELPLUS=m CONFIG_EL16=m CONFIG_EL3=m CONFIG_3C515=m CONFIG_ELMC=m CONFIG_ELMC_II=m CONFIG_VORTEX=m # CONFIG_TYPHOON is not set CONFIG_LANCE=m CONFIG_NET_VENDOR_SMC=y CONFIG_WD80x3=m CONFIG_ULTRAMCA=m CONFIG_ULTRA=m CONFIG_ULTRA32=m CONFIG_SMC9194=m CONFIG_NET_VENDOR_RACAL=y CONFIG_NI5010=m CONFIG_NI52=m CONFIG_NI65=m CONFIG_AT1700=m CONFIG_DEPCA=m CONFIG_HP100=m CONFIG_NET_ISA=y CONFIG_E2100=m CONFIG_EWRK3=m CONFIG_EEXPRESS=m CONFIG_EEXPRESS_PRO=m CONFIG_HPLAN_PLUS=m CONFIG_HPLAN=m CONFIG_LP486E=m CONFIG_ETH16I=m CONFIG_NE2000=m CONFIG_SKMC=m CONFIG_NE2_MCA=m CONFIG_IBMLANA=m CONFIG_NET_PCI=y CONFIG_PCNET32=m # CONFIG_AMD8111_ETH is not set CONFIG_ADAPTEC_STARFIRE=m CONFIG_AC3200=m CONFIG_APRICOT=m CONFIG_CS89x0=m CONFIG_TULIP=m CONFIG_TULIP_MWI=y CONFIG_TULIP_MMIO=y CONFIG_DE4X5=m CONFIG_DGRS=m CONFIG_DM9102=m CONFIG_EEPRO100=m # CONFIG_EEPRO100_PIO is not set CONFIG_E100=m CONFIG_LNE390=m CONFIG_FEALNX=m CONFIG_NATSEMI=m CONFIG_NE2K_PCI=m CONFIG_NE3210=m CONFIG_ES3210=m CONFIG_8139CP=m CONFIG_8139TOO=m # CONFIG_8139TOO_PIO is not set # CONFIG_8139TOO_TUNE_TWISTER is not set 
CONFIG_8139TOO_8129=y # CONFIG_8139_OLD_RX_RESET is not set CONFIG_SIS900=m CONFIG_EPIC100=m CONFIG_SUNDANCE=m # CONFIG_SUNDANCE_MMIO is not set CONFIG_TLAN=m CONFIG_TC35815=m CONFIG_VIA_RHINE=m # CONFIG_VIA_RHINE_MMIO is not set CONFIG_WINBOND_840=m CONFIG_NET_POCKET=y CONFIG_ATP=m CONFIG_DE600=m CONFIG_DE620=m # # Ethernet (1000 Mbit) # CONFIG_ACENIC=m # CONFIG_ACENIC_OMIT_TIGON_I is not set CONFIG_DL2K=m CONFIG_E1000=m # CONFIG_MYRI_SBUS is not set CONFIG_NS83820=m CONFIG_HAMACHI=m CONFIG_YELLOWFIN=m # CONFIG_R8169 is not set CONFIG_SK98LIN=m CONFIG_TIGON3=m CONFIG_FDDI=y CONFIG_DEFXX=m CONFIG_SKFP=m CONFIG_HIPPI=y CONFIG_ROADRUNNER=m # CONFIG_ROADRUNNER_LARGE_RINGS is not set CONFIG_PLIP=m CONFIG_PPP=m CONFIG_PPP_MULTILINK=y CONFIG_PPP_FILTER=y CONFIG_PPP_ASYNC=m CONFIG_PPP_SYNC_TTY=m CONFIG_PPP_DEFLATE=m CONFIG_PPP_BSDCOMP=m CONFIG_PPPOE=m CONFIG_PPPOATM=m CONFIG_SLIP=m CONFIG_SLIP_COMPRESSED=y CONFIG_SLIP_SMART=y CONFIG_SLIP_MODE_SLIP6=y # # Wireless LAN (non-hamradio) # CONFIG_NET_RADIO=y CONFIG_STRIP=m CONFIG_WAVELAN=m CONFIG_ARLAN=m CONFIG_AIRONET4500=m CONFIG_AIRONET4500_NONCS=m CONFIG_AIRONET4500_PNP=y CONFIG_AIRONET4500_PCI=y # CONFIG_AIRONET4500_ISA is not set # CONFIG_AIRONET4500_I365 is not set CONFIG_AIRONET4500_PROC=m CONFIG_AIRO=m CONFIG_HERMES=m CONFIG_PLX_HERMES=m CONFIG_PCI_HERMES=m CONFIG_PCMCIA_HERMES=m CONFIG_AIRO_CS=m CONFIG_NET_WIRELESS=y # # Token Ring devices # CONFIG_TR=y CONFIG_IBMTR=m CONFIG_IBMOL=m CONFIG_IBMLS=m CONFIG_3C359=m CONFIG_TMS380TR=m CONFIG_TMSPCI=m CONFIG_TMSISA=m CONFIG_ABYSS=m CONFIG_MADGEMC=m CONFIG_SMCTR=m CONFIG_NET_FC=y CONFIG_IPHASE5526=m CONFIG_RCPCI=m CONFIG_SHAPER=m # # Wan interfaces # CONFIG_WAN=y CONFIG_HOSTESS_SV11=m CONFIG_COSA=m CONFIG_COMX=m CONFIG_COMX_HW_COMX=m CONFIG_COMX_HW_LOCOMX=m CONFIG_COMX_HW_MIXCOM=m CONFIG_COMX_HW_MUNICH=m CONFIG_COMX_PROTO_PPP=m CONFIG_COMX_PROTO_LAPB=m CONFIG_COMX_PROTO_FR=m CONFIG_DSCC4=m CONFIG_LANMEDIA=m CONFIG_ATI_XX20=m CONFIG_SEALEVEL_4021=m CONFIG_SYNCLINK_SYNCPPP=m 
CONFIG_HDLC=m # CONFIG_HDLC_RAW is not set # CONFIG_HDLC_CISCO is not set # CONFIG_HDLC_FR is not set CONFIG_HDLC_PPP=y CONFIG_HDLC_X25=y CONFIG_N2=m CONFIG_C101=m CONFIG_FARSYNC=m # CONFIG_HDLC_DEBUG_PKT is not set # CONFIG_HDLC_DEBUG_HARD_HEADER is not set # CONFIG_HDLC_DEBUG_ECN is not set # CONFIG_HDLC_DEBUG_RINGS is not set CONFIG_DLCI=m CONFIG_DLCI_COUNT=24 CONFIG_DLCI_MAX=8 CONFIG_SDLA=m CONFIG_WAN_ROUTER_DRIVERS=y CONFIG_VENDOR_SANGOMA=m CONFIG_WANPIPE_CHDLC=y CONFIG_WANPIPE_FR=y CONFIG_WANPIPE_X25=y CONFIG_WANPIPE_PPP=y CONFIG_WANPIPE_MULTPPP=y CONFIG_CYCLADES_SYNC=m CONFIG_CYCLOMX_X25=y CONFIG_LAPBETHER=m CONFIG_X25_ASY=m CONFIG_SBNI=m # CONFIG_SBNI_MULTILINE is not set # # PCMCIA network device support # CONFIG_NET_PCMCIA=y CONFIG_PCMCIA_3C589=m CONFIG_PCMCIA_3C574=m CONFIG_PCMCIA_FMVJ18X=m CONFIG_PCMCIA_PCNET=m CONFIG_PCMCIA_AXNET=m CONFIG_PCMCIA_NMCLAN=m CONFIG_PCMCIA_SMC91C92=m CONFIG_PCMCIA_XIRC2PS=m CONFIG_ARCNET_COM20020_CS=m CONFIG_PCMCIA_IBMTR=m CONFIG_PCMCIA_XIRCOM=m CONFIG_PCMCIA_XIRTULIP=m CONFIG_NET_PCMCIA_RADIO=y CONFIG_PCMCIA_RAYCS=m CONFIG_PCMCIA_NETWAVE=m CONFIG_PCMCIA_WAVELAN=m CONFIG_AIRONET4500_CS=m # # Amateur Radio support # CONFIG_HAMRADIO=y CONFIG_AX25=m CONFIG_AX25_DAMA_SLAVE=y CONFIG_NETROM=m CONFIG_ROSE=m # # AX.25 network device drivers # CONFIG_MKISS=m CONFIG_6PACK=m CONFIG_BPQETHER=m CONFIG_DMASCC=m CONFIG_SCC=m # CONFIG_SCC_DELAY is not set # CONFIG_SCC_TRXECHO is not set CONFIG_BAYCOM_SER_FDX=m CONFIG_BAYCOM_SER_HDX=m CONFIG_BAYCOM_PAR=m CONFIG_BAYCOM_EPP=m CONFIG_SOUNDMODEM=m CONFIG_SOUNDMODEM_SBC=y CONFIG_SOUNDMODEM_WSS=y CONFIG_SOUNDMODEM_AFSK1200=y CONFIG_SOUNDMODEM_AFSK2400_7=y CONFIG_SOUNDMODEM_AFSK2400_8=y CONFIG_SOUNDMODEM_AFSK2666=y CONFIG_SOUNDMODEM_HAPN4800=y CONFIG_SOUNDMODEM_PSK4800=y CONFIG_SOUNDMODEM_FSK9600=y CONFIG_YAM=m # # IrDA (infrared) support # CONFIG_IRDA=m CONFIG_IRLAN=m CONFIG_IRNET=m CONFIG_IRCOMM=m CONFIG_IRDA_ULTRA=y CONFIG_IRDA_CACHE_LAST_LSAP=y CONFIG_IRDA_FAST_RR=y CONFIG_IRDA_DEBUG=y # # 
Infrared-port device drivers # CONFIG_IRTTY_SIR=m CONFIG_IRPORT_SIR=m CONFIG_DONGLE=y CONFIG_ESI_DONGLE=m CONFIG_ACTISYS_DONGLE=m CONFIG_TEKRAM_DONGLE=m CONFIG_GIRBIL_DONGLE=m CONFIG_LITELINK_DONGLE=m CONFIG_MCP2120_DONGLE=m CONFIG_OLD_BELKIN_DONGLE=m CONFIG_ACT200L_DONGLE=m CONFIG_MA600_DONGLE=m CONFIG_USB_IRDA=m CONFIG_NSC_FIR=m CONFIG_WINBOND_FIR=m # CONFIG_TOSHIBA_OLD is not set CONFIG_TOSHIBA_FIR=m CONFIG_SMC_IRCC_FIR=m CONFIG_ALI_FIR=m CONFIG_VLSI_FIR=m # # ISDN subsystem # CONFIG_ISDN=m CONFIG_ISDN_BOOL=y CONFIG_ISDN_PPP=y CONFIG_ISDN_PPP_VJ=y CONFIG_ISDN_MPP=y CONFIG_ISDN_PPP_BSDCOMP=m CONFIG_ISDN_AUDIO=y CONFIG_ISDN_TTY_FAX=y CONFIG_ISDN_X25=y # # ISDN feature submodules # CONFIG_ISDN_DRV_LOOP=m CONFIG_ISDN_DIVERSION=m # # Passive ISDN cards # CONFIG_ISDN_DRV_HISAX=m CONFIG_ISDN_HISAX=y CONFIG_HISAX_EURO=y CONFIG_DE_AOC=y # CONFIG_HISAX_NO_SENDCOMPLETE is not set # CONFIG_HISAX_NO_LLC is not set # CONFIG_HISAX_NO_KEYPAD is not set CONFIG_HISAX_1TR6=y CONFIG_HISAX_NI1=y CONFIG_HISAX_MAX_CARDS=8 CONFIG_HISAX_16_0=y CONFIG_HISAX_16_3=y CONFIG_HISAX_AVM_A1=y CONFIG_HISAX_IX1MICROR2=y CONFIG_HISAX_ASUSCOM=y CONFIG_HISAX_TELEINT=y CONFIG_HISAX_HFCS=y CONFIG_HISAX_SPORTSTER=y CONFIG_HISAX_MIC=y CONFIG_HISAX_ISURF=y CONFIG_HISAX_HSTSAPHIR=y CONFIG_HISAX_TELESPCI=y CONFIG_HISAX_S0BOX=y CONFIG_HISAX_FRITZPCI=y CONFIG_HISAX_AVM_A1_PCMCIA=y CONFIG_HISAX_ELSA=y CONFIG_HISAX_DIEHLDIVA=y CONFIG_HISAX_SEDLBAUER=y CONFIG_HISAX_NETJET=y CONFIG_HISAX_NETJET_U=y CONFIG_HISAX_NICCY=y CONFIG_HISAX_BKM_A4T=y CONFIG_HISAX_SCT_QUADRO=y CONFIG_HISAX_GAZEL=y CONFIG_HISAX_HFC_PCI=y CONFIG_HISAX_W6692=y CONFIG_HISAX_HFC_SX=y # CONFIG_HISAX_ENTERNOW_PCI is not set # CONFIG_HISAX_DEBUG is not set CONFIG_HISAX_SEDLBAUER_CS=m CONFIG_HISAX_ELSA_CS=m CONFIG_HISAX_AVM_A1_CS=m CONFIG_HISAX_ST5481=m CONFIG_HISAX_FRITZ_PCIPNP=m # CONFIG_USB_AUERISDN is not set # # Active ISDN cards # CONFIG_ISDN_DRV_ICN=m CONFIG_ISDN_DRV_PCBIT=m CONFIG_ISDN_DRV_SC=m CONFIG_ISDN_DRV_ACT2000=m 
CONFIG_ISDN_DRV_EICON=y CONFIG_ISDN_DRV_EICON_DIVAS=m CONFIG_ISDN_DRV_EICON_OLD=m CONFIG_ISDN_DRV_EICON_PCI=y CONFIG_ISDN_DRV_EICON_ISA=y CONFIG_ISDN_DRV_TPAM=m CONFIG_ISDN_CAPI=m CONFIG_ISDN_DRV_AVMB1_VERBOSE_REASON=y CONFIG_ISDN_CAPI_MIDDLEWARE=y CONFIG_ISDN_CAPI_CAPI20=m CONFIG_ISDN_CAPI_CAPIFS_BOOL=y CONFIG_ISDN_CAPI_CAPIFS=m CONFIG_ISDN_CAPI_CAPIDRV=m CONFIG_ISDN_DRV_AVMB1_B1ISA=m CONFIG_ISDN_DRV_AVMB1_B1PCI=m CONFIG_ISDN_DRV_AVMB1_B1PCIV4=y CONFIG_ISDN_DRV_AVMB1_T1ISA=m CONFIG_ISDN_DRV_AVMB1_B1PCMCIA=m CONFIG_ISDN_DRV_AVMB1_AVM_CS=m CONFIG_ISDN_DRV_AVMB1_T1PCI=m CONFIG_ISDN_DRV_AVMB1_C4=m CONFIG_HYSDN=m CONFIG_HYSDN_CAPI=y # # Old CD-ROM drivers (not SCSI, not IDE) # CONFIG_CD_NO_IDESCSI=y CONFIG_AZTCD=m CONFIG_GSCD=m CONFIG_SBPCD=m CONFIG_MCD=m CONFIG_MCD_IRQ=11 CONFIG_MCD_BASE=300 CONFIG_MCDX=m CONFIG_OPTCD=m CONFIG_CM206=m CONFIG_SJCD=m CONFIG_ISP16_CDI=m CONFIG_CDU31A=m CONFIG_CDU535=m # # Input core support # CONFIG_INPUT=m CONFIG_INPUT_KEYBDEV=m CONFIG_INPUT_MOUSEDEV=m CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024 CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768 CONFIG_INPUT_JOYDEV=m CONFIG_INPUT_EVDEV=m # # Character devices # CONFIG_VT=y CONFIG_VT_CONSOLE=y CONFIG_SERIAL=y CONFIG_SERIAL_CONSOLE=y # CONFIG_SERIAL_EXTENDED is not set CONFIG_SERIAL_NONSTANDARD=y CONFIG_COMPUTONE=m CONFIG_ROCKETPORT=m CONFIG_CYCLADES=m CONFIG_CYZ_INTR=y CONFIG_DIGIEPCA=m CONFIG_ESPSERIAL=m CONFIG_MOXA_INTELLIO=m CONFIG_MOXA_SMARTIO=m CONFIG_ISI=m CONFIG_SYNCLINK=m CONFIG_SYNCLINKMP=m CONFIG_N_HDLC=m CONFIG_RISCOM8=m CONFIG_SPECIALIX=m # CONFIG_SPECIALIX_RTSCTS is not set CONFIG_SX=m CONFIG_RIO=m # CONFIG_RIO_OLDPCI is not set # CONFIG_STALDRV is not set CONFIG_UNIX98_PTYS=y CONFIG_UNIX98_PTY_COUNT=512 CONFIG_PRINTER=m CONFIG_LP_CONSOLE=y CONFIG_PPDEV=m # CONFIG_TIPAR is not set # # I2C support # CONFIG_I2C=m CONFIG_I2C_ALGOBIT=m CONFIG_I2C_PHILIPSPAR=m CONFIG_I2C_ELV=m CONFIG_I2C_VELLEMAN=m # CONFIG_SCx200_I2C is not set # CONFIG_SCx200_ACB is not set CONFIG_I2C_ALGOPCF=m CONFIG_I2C_ELEKTOR=m 
CONFIG_I2C_CHARDEV=m CONFIG_I2C_PROC=m # # Mice # CONFIG_BUSMOUSE=m CONFIG_ATIXL_BUSMOUSE=m CONFIG_LOGIBUSMOUSE=m CONFIG_MS_BUSMOUSE=m CONFIG_MOUSE=m CONFIG_PSMOUSE=y CONFIG_82C710_MOUSE=m CONFIG_PC110_PAD=m CONFIG_MK712_MOUSE=m # # Joysticks # CONFIG_INPUT_GAMEPORT=m CONFIG_INPUT_NS558=m CONFIG_INPUT_LIGHTNING=m CONFIG_INPUT_PCIGAME=m CONFIG_INPUT_CS461X=m CONFIG_INPUT_EMU10K1=m CONFIG_INPUT_SERIO=m CONFIG_INPUT_SERPORT=m CONFIG_INPUT_ANALOG=m CONFIG_INPUT_A3D=m CONFIG_INPUT_ADI=m CONFIG_INPUT_COBRA=m CONFIG_INPUT_GF2K=m CONFIG_INPUT_GRIP=m CONFIG_INPUT_INTERACT=m CONFIG_INPUT_TMDC=m CONFIG_INPUT_SIDEWINDER=m CONFIG_INPUT_IFORCE_USB=m CONFIG_INPUT_IFORCE_232=m CONFIG_INPUT_WARRIOR=m CONFIG_INPUT_MAGELLAN=m CONFIG_INPUT_SPACEORB=m CONFIG_INPUT_SPACEBALL=m CONFIG_INPUT_STINGER=m CONFIG_INPUT_DB9=m CONFIG_INPUT_GAMECON=m CONFIG_INPUT_TURBOGRAFX=m CONFIG_QIC02_TAPE=m CONFIG_QIC02_DYNCONF=y # CONFIG_IPMI_HANDLER is not set # CONFIG_IPMI_PANIC_EVENT is not set # CONFIG_IPMI_DEVICE_INTERFACE is not set # CONFIG_IPMI_KCS is not set # CONFIG_IPMI_WATCHDOG is not set # # Watchdog Cards # CONFIG_WATCHDOG=y # CONFIG_WATCHDOG_NOWAYOUT is not set CONFIG_ACQUIRE_WDT=m CONFIG_ADVANTECH_WDT=m # CONFIG_ALIM1535_WDT is not set CONFIG_ALIM7101_WDT=m CONFIG_SC520_WDT=m CONFIG_PCWATCHDOG=m CONFIG_EUROTECH_WDT=m CONFIG_IB700_WDT=m CONFIG_WAFER_WDT=m CONFIG_I810_TCO=m CONFIG_MIXCOMWD=m CONFIG_60XX_WDT=m CONFIG_SC1200_WDT=m # CONFIG_SCx200_WDT is not set CONFIG_SOFT_WATCHDOG=m CONFIG_W83877F_WDT=m CONFIG_WDT=m CONFIG_WDTPCI=m CONFIG_WDT_501=y # CONFIG_WDT_501_FAN is not set CONFIG_MACHZ_WDT=m CONFIG_AMD7XX_TCO=m # CONFIG_SCx200_GPIO is not set CONFIG_AMD_RNG=m CONFIG_INTEL_RNG=m CONFIG_AMD_PM768=m CONFIG_NVRAM=m CONFIG_RTC=m CONFIG_DTLK=m CONFIG_R3964=m CONFIG_APPLICOM=m CONFIG_SONYPI=m # # Ftape, the floppy tape device driver # CONFIG_FTAPE=m CONFIG_ZFTAPE=m CONFIG_ZFT_DFLT_BLK_SZ=10240 CONFIG_ZFT_COMPRESSOR=m CONFIG_FT_NR_BUFFERS=3 CONFIG_FT_PROC_FS=y CONFIG_FT_NORMAL_DEBUG=y # 
CONFIG_FT_FULL_DEBUG is not set # CONFIG_FT_NO_TRACE is not set # CONFIG_FT_NO_TRACE_AT_ALL is not set CONFIG_FT_STD_FDC=y # CONFIG_FT_MACH2 is not set # CONFIG_FT_PROBE_FC10 is not set # CONFIG_FT_ALT_FDC is not set CONFIG_FT_FDC_THR=8 CONFIG_FT_FDC_MAX_RATE=2000 CONFIG_FT_ALPHA_CLOCK=0 CONFIG_AGP=m CONFIG_AGP_INTEL=y CONFIG_AGP_I810=y CONFIG_AGP_VIA=y CONFIG_AGP_AMD=y CONFIG_AGP_AMD_8151=y CONFIG_AGP_SIS=y CONFIG_AGP_ALI=y CONFIG_AGP_SWORKS=y CONFIG_DRM=y # CONFIG_DRM_OLD is not set CONFIG_DRM_NEW=y CONFIG_DRM_TDFX=m CONFIG_DRM_R128=m CONFIG_DRM_RADEON=m CONFIG_DRM_I810=m # CONFIG_DRM_I810_XFREE_41 is not set CONFIG_DRM_I830=m CONFIG_DRM_MGA=m CONFIG_DRM_SIS=m # # PCMCIA character devices # CONFIG_PCMCIA_SERIAL_CS=m CONFIG_SYNCLINK_CS=m CONFIG_MWAVE=m # # Multimedia devices # CONFIG_VIDEO_DEV=m # # Video For Linux # CONFIG_VIDEO_PROC_FS=y CONFIG_I2C_PARPORT=m CONFIG_VIDEO_BT848=m CONFIG_VIDEO_PMS=m CONFIG_VIDEO_BWQCAM=m CONFIG_VIDEO_CQCAM=m CONFIG_VIDEO_W9966=m CONFIG_VIDEO_CPIA=m CONFIG_VIDEO_CPIA_PP=m CONFIG_VIDEO_CPIA_USB=m CONFIG_VIDEO_SAA5249=m CONFIG_TUNER_3036=m CONFIG_VIDEO_STRADIS=m CONFIG_VIDEO_ZORAN=m CONFIG_VIDEO_ZORAN_BUZ=m CONFIG_VIDEO_ZORAN_DC10=m CONFIG_VIDEO_ZORAN_LML33=m CONFIG_VIDEO_ZR36120=m CONFIG_VIDEO_MEYE=m # # Radio Adapters # CONFIG_RADIO_CADET=m CONFIG_RADIO_RTRACK=m CONFIG_RADIO_RTRACK2=m CONFIG_RADIO_AZTECH=m CONFIG_RADIO_GEMTEK=m CONFIG_RADIO_GEMTEK_PCI=m CONFIG_RADIO_MAXIRADIO=m CONFIG_RADIO_MAESTRO=m CONFIG_RADIO_MIROPCM20=m CONFIG_RADIO_MIROPCM20_RDS=m CONFIG_RADIO_SF16FMI=m # CONFIG_RADIO_SF16FMR2 is not set CONFIG_RADIO_TERRATEC=m CONFIG_RADIO_TRUST=m CONFIG_RADIO_TYPHOON=m CONFIG_RADIO_TYPHOON_PROC_FS=y CONFIG_RADIO_ZOLTRIX=m # # File systems # CONFIG_QUOTA=y CONFIG_AUTOFS_FS=m CONFIG_AUTOFS4_FS=m CONFIG_REISERFS_FS=m # CONFIG_REISERFS_CHECK is not set CONFIG_REISERFS_PROC_INFO=y CONFIG_ADFS_FS=m CONFIG_ADFS_FS_RW=y CONFIG_AFFS_FS=m CONFIG_HFS_FS=m CONFIG_BEFS_FS=m # CONFIG_BEFS_DEBUG is not set CONFIG_BFS_FS=m CONFIG_EXT3_FS=m 
CONFIG_JBD=m # CONFIG_JBD_DEBUG is not set CONFIG_FAT_FS=m CONFIG_MSDOS_FS=m CONFIG_UMSDOS_FS=m CONFIG_VFAT_FS=m CONFIG_EFS_FS=m CONFIG_JFFS_FS=m CONFIG_JFFS_FS_VERBOSE=0 CONFIG_JFFS_PROC_FS=y CONFIG_JFFS2_FS=m CONFIG_JFFS2_FS_DEBUG=0 CONFIG_CRAMFS=m CONFIG_TMPFS=y CONFIG_RAMFS=y CONFIG_ISO9660_FS=m CONFIG_JOLIET=y CONFIG_ZISOFS=y CONFIG_JFS_FS=m # CONFIG_JFS_DEBUG is not set CONFIG_JFS_STATISTICS=y CONFIG_MINIX_FS=m CONFIG_VXFS_FS=m CONFIG_NTFS_FS=m CONFIG_NTFS_RW=y CONFIG_HPFS_FS=m CONFIG_PROC_FS=y CONFIG_DEVFS_FS=y # CONFIG_DEVFS_MOUNT is not set # CONFIG_DEVFS_DEBUG is not set CONFIG_DEVPTS_FS=y CONFIG_QNX4FS_FS=m CONFIG_QNX4FS_RW=y CONFIG_ROMFS_FS=y CONFIG_EXT2_FS=m CONFIG_SYSV_FS=m CONFIG_UDF_FS=m CONFIG_UDF_RW=y CONFIG_UFS_FS=m CONFIG_UFS_FS_WRITE=y # # Network File Systems # CONFIG_CODA_FS=m CONFIG_INTERMEZZO_FS=m CONFIG_NFS_FS=m CONFIG_NFS_V3=y # CONFIG_ROOT_NFS is not set CONFIG_NFSD=m CONFIG_NFSD_V3=y CONFIG_NFSD_TCP=y CONFIG_SUNRPC=m CONFIG_LOCKD=m CONFIG_LOCKD_V4=y CONFIG_SMB_FS=m CONFIG_SMB_NLS_DEFAULT=y CONFIG_SMB_NLS_REMOTE="cp437" CONFIG_NCP_FS=m CONFIG_NCPFS_PACKET_SIGNING=y CONFIG_NCPFS_IOCTL_LOCKING=y CONFIG_NCPFS_STRONG=y CONFIG_NCPFS_NFS_NS=y CONFIG_NCPFS_OS2_NS=y CONFIG_NCPFS_SMALLDOS=y CONFIG_NCPFS_NLS=y CONFIG_NCPFS_EXTRAS=y CONFIG_ZISOFS_FS=m # # Partition Types # CONFIG_PARTITION_ADVANCED=y CONFIG_ACORN_PARTITION=y CONFIG_ACORN_PARTITION_ICS=y CONFIG_ACORN_PARTITION_ADFS=y # CONFIG_ACORN_PARTITION_POWERTEC is not set CONFIG_ACORN_PARTITION_RISCIX=y CONFIG_OSF_PARTITION=y CONFIG_AMIGA_PARTITION=y CONFIG_ATARI_PARTITION=y CONFIG_MAC_PARTITION=y CONFIG_MSDOS_PARTITION=y CONFIG_BSD_DISKLABEL=y CONFIG_MINIX_SUBPARTITION=y CONFIG_SOLARIS_X86_PARTITION=y CONFIG_UNIXWARE_DISKLABEL=y CONFIG_LDM_PARTITION=y # CONFIG_LDM_DEBUG is not set CONFIG_SGI_PARTITION=y CONFIG_ULTRIX_PARTITION=y CONFIG_SUN_PARTITION=y CONFIG_EFI_PARTITION=y CONFIG_SMB_NLS=y CONFIG_NLS=y # # Native Language Support # CONFIG_NLS_DEFAULT="iso8859-1" CONFIG_NLS_CODEPAGE_437=m 
CONFIG_NLS_CODEPAGE_737=m CONFIG_NLS_CODEPAGE_775=m CONFIG_NLS_CODEPAGE_850=m CONFIG_NLS_CODEPAGE_852=m CONFIG_NLS_CODEPAGE_855=m CONFIG_NLS_CODEPAGE_857=m CONFIG_NLS_CODEPAGE_860=m CONFIG_NLS_CODEPAGE_861=m CONFIG_NLS_CODEPAGE_862=m CONFIG_NLS_CODEPAGE_863=m CONFIG_NLS_CODEPAGE_864=m CONFIG_NLS_CODEPAGE_865=m CONFIG_NLS_CODEPAGE_866=m CONFIG_NLS_CODEPAGE_869=m CONFIG_NLS_CODEPAGE_936=m CONFIG_NLS_CODEPAGE_950=m CONFIG_NLS_CODEPAGE_932=m CONFIG_NLS_CODEPAGE_949=m CONFIG_NLS_CODEPAGE_874=m CONFIG_NLS_ISO8859_8=m CONFIG_NLS_CODEPAGE_1250=m CONFIG_NLS_CODEPAGE_1251=m CONFIG_NLS_ISO8859_1=m CONFIG_NLS_ISO8859_2=m CONFIG_NLS_ISO8859_3=m CONFIG_NLS_ISO8859_4=m CONFIG_NLS_ISO8859_5=m CONFIG_NLS_ISO8859_6=m CONFIG_NLS_ISO8859_7=m CONFIG_NLS_ISO8859_9=m CONFIG_NLS_ISO8859_13=m CONFIG_NLS_ISO8859_14=m CONFIG_NLS_ISO8859_15=m CONFIG_NLS_KOI8_R=m CONFIG_NLS_KOI8_U=m CONFIG_NLS_UTF8=m # # Console drivers # CONFIG_VGA_CONSOLE=y CONFIG_VIDEO_SELECT=y CONFIG_MDA_CONSOLE=m # # Frame-buffer support # CONFIG_FB=y CONFIG_DUMMY_CONSOLE=y CONFIG_FB_RIVA=m CONFIG_FB_CLGEN=m CONFIG_FB_PM2=m # CONFIG_FB_PM2_FIFO_DISCONNECT is not set # CONFIG_FB_PM2_PCI is not set CONFIG_FB_PM3=m CONFIG_FB_CYBER2000=m CONFIG_FB_VESA=y CONFIG_FB_VGA16=m CONFIG_FB_HGA=m CONFIG_VIDEO_SELECT=y CONFIG_FB_MATROX=m CONFIG_FB_MATROX_MILLENIUM=y CONFIG_FB_MATROX_MYSTIQUE=y CONFIG_FB_MATROX_G450=m CONFIG_FB_MATROX_I2C=m CONFIG_FB_MATROX_MAVEN=m # CONFIG_FB_MATROX_PROC is not set CONFIG_FB_MATROX_MULTIHEAD=y CONFIG_FB_ATY=m CONFIG_FB_ATY_GX=y CONFIG_FB_ATY_CT=y CONFIG_FB_RADEON=m CONFIG_FB_ATY128=m # CONFIG_FB_INTEL is not set CONFIG_FB_SIS=m CONFIG_FB_SIS_300=y CONFIG_FB_SIS_315=y CONFIG_FB_NEOMAGIC=m CONFIG_FB_3DFX=m CONFIG_FB_VOODOO1=m CONFIG_FB_TRIDENT=m CONFIG_FB_VIRTUAL=m CONFIG_FBCON_ADVANCED=y CONFIG_FBCON_MFB=m CONFIG_FBCON_CFB2=m CONFIG_FBCON_CFB4=m CONFIG_FBCON_CFB8=y CONFIG_FBCON_CFB16=y CONFIG_FBCON_CFB24=y CONFIG_FBCON_CFB32=m # CONFIG_FBCON_AFB is not set # CONFIG_FBCON_ILBM is not set # 
CONFIG_FBCON_IPLAN2P2 is not set # CONFIG_FBCON_IPLAN2P4 is not set # CONFIG_FBCON_IPLAN2P8 is not set # CONFIG_FBCON_MAC is not set CONFIG_FBCON_VGA_PLANES=m CONFIG_FBCON_VGA=m CONFIG_FBCON_HGA=m # CONFIG_FBCON_FONTWIDTH8_ONLY is not set CONFIG_FBCON_FONTS=y CONFIG_FONT_8x8=y CONFIG_FONT_8x16=y # CONFIG_FONT_SUN8x16 is not set # CONFIG_FONT_SUN12x22 is not set # CONFIG_FONT_6x11 is not set # CONFIG_FONT_PEARL_8x8 is not set # CONFIG_FONT_ACORN_8x8 is not set # # Sound # CONFIG_SOUND=m CONFIG_SOUND_ALI5455=m CONFIG_SOUND_BT878=m CONFIG_SOUND_CMPCI=m CONFIG_SOUND_CMPCI_FM=y CONFIG_SOUND_CMPCI_FMIO=388 CONFIG_SOUND_CMPCI_FMIO=388 CONFIG_SOUND_CMPCI_MIDI=y CONFIG_SOUND_CMPCI_MPUIO=330 CONFIG_SOUND_CMPCI_JOYSTICK=y CONFIG_SOUND_CMPCI_CM8738=y CONFIG_SOUND_CMPCI_SPDIFINVERSE=y # CONFIG_SOUND_CMPCI_SPDIFLOOP is not set CONFIG_SOUND_CMPCI_SPEAKERS=2 CONFIG_SOUND_EMU10K1=m CONFIG_MIDI_EMU10K1=y CONFIG_SOUND_FUSION=m CONFIG_SOUND_CS4281=m CONFIG_SOUND_ES1370=m CONFIG_SOUND_ES1371=m CONFIG_SOUND_ESSSOLO1=m CONFIG_SOUND_MAESTRO=m CONFIG_SOUND_MAESTRO3=m CONFIG_SOUND_FORTE=m CONFIG_SOUND_ICH=m CONFIG_SOUND_RME96XX=m CONFIG_SOUND_SONICVIBES=m CONFIG_SOUND_TRIDENT=m CONFIG_SOUND_MSNDCLAS=m # CONFIG_MSNDCLAS_HAVE_BOOT is not set CONFIG_MSNDCLAS_INIT_FILE="/etc/sound/msndinit.bin" CONFIG_MSNDCLAS_PERM_FILE="/etc/sound/msndperm.bin" CONFIG_SOUND_MSNDPIN=m # CONFIG_MSNDPIN_HAVE_BOOT is not set CONFIG_MSNDPIN_INIT_FILE="/etc/sound/pndspini.bin" CONFIG_MSNDPIN_PERM_FILE="/etc/sound/pndsperm.bin" CONFIG_SOUND_VIA82CXXX=m CONFIG_MIDI_VIA82CXXX=y CONFIG_SOUND_OSS=m # CONFIG_SOUND_TRACEINIT is not set # CONFIG_SOUND_DMAP is not set CONFIG_SOUND_AD1816=m # CONFIG_SOUND_AD1889 is not set CONFIG_SOUND_SGALAXY=m CONFIG_SOUND_ADLIB=m CONFIG_SOUND_ACI_MIXER=m CONFIG_SOUND_CS4232=m CONFIG_SOUND_SSCAPE=m CONFIG_SOUND_GUS=m # CONFIG_SOUND_GUS16 is not set # CONFIG_SOUND_GUSMAX is not set CONFIG_SOUND_VMIDI=m CONFIG_SOUND_TRIX=m CONFIG_SOUND_MSS=m CONFIG_SOUND_MPU401=m CONFIG_SOUND_NM256=m 
CONFIG_SOUND_MAD16=m CONFIG_MAD16_OLDCARD=y CONFIG_SOUND_PAS=m # CONFIG_PAS_JOYSTICK is not set CONFIG_SOUND_PSS=m # CONFIG_PSS_MIXER is not set # CONFIG_PSS_HAVE_BOOT is not set CONFIG_SOUND_SB=m CONFIG_SOUND_AWE32_SYNTH=m # CONFIG_SOUND_KAHLUA is not set CONFIG_SOUND_WAVEFRONT=m CONFIG_SOUND_MAUI=m CONFIG_SOUND_YM3812=m CONFIG_SOUND_OPL3SA1=m CONFIG_SOUND_OPL3SA2=m CONFIG_SOUND_YMFPCI=m # CONFIG_SOUND_YMFPCI_LEGACY is not set CONFIG_SOUND_UART6850=m CONFIG_SOUND_AEDSP16=m CONFIG_SC6600=y # CONFIG_SC6600_JOY is not set CONFIG_SC6600_CDROM=4 CONFIG_SC6600_CDROMBASE=0 CONFIG_AEDSP16_SBPRO=y CONFIG_AEDSP16_MPU401=y CONFIG_SOUND_TVMIXER=m # # USB support # CONFIG_USB=m CONFIG_USB_DEBUG=y CONFIG_USB_DEVICEFS=y CONFIG_USB_BANDWIDTH=y CONFIG_USB_EHCI_HCD=m CONFIG_USB_UHCI=m CONFIG_USB_UHCI_ALT=m CONFIG_USB_OHCI=m CONFIG_USB_AUDIO=m CONFIG_USB_EMI26=m CONFIG_USB_MIDI=m CONFIG_USB_STORAGE=m CONFIG_USB_STORAGE_DEBUG=y CONFIG_USB_STORAGE_DATAFAB=y CONFIG_USB_STORAGE_FREECOM=y CONFIG_USB_STORAGE_ISD200=y CONFIG_USB_STORAGE_DPCM=y CONFIG_USB_STORAGE_HP8200e=y CONFIG_USB_STORAGE_SDDR09=y CONFIG_USB_STORAGE_SDDR55=y CONFIG_USB_STORAGE_JUMPSHOT=y CONFIG_USB_ACM=m CONFIG_USB_PRINTER=m CONFIG_USB_HID=m CONFIG_USB_HIDINPUT=y CONFIG_USB_HIDDEV=y CONFIG_USB_KBD=m CONFIG_USB_MOUSE=m CONFIG_USB_AIPTEK=m CONFIG_USB_WACOM=m # CONFIG_USB_KBTAB is not set # CONFIG_USB_POWERMATE is not set CONFIG_USB_DC2XX=m CONFIG_USB_MDC800=m CONFIG_USB_SCANNER=m CONFIG_USB_MICROTEK=m CONFIG_USB_HPUSBSCSI=m CONFIG_USB_IBMCAM=m # CONFIG_USB_KONICAWC is not set CONFIG_USB_OV511=m CONFIG_USB_PWC=m CONFIG_USB_SE401=m CONFIG_USB_STV680=m CONFIG_USB_VICAM=m CONFIG_USB_DSBR=m CONFIG_USB_DABUSB=m CONFIG_USB_PEGASUS=m CONFIG_USB_RTL8150=m CONFIG_USB_KAWETH=m CONFIG_USB_CATC=m CONFIG_USB_CDCETHER=m CONFIG_USB_USBNET=m CONFIG_USB_USS720=m # # USB Serial Converter support # CONFIG_USB_SERIAL=m # CONFIG_USB_SERIAL_DEBUG is not set CONFIG_USB_SERIAL_GENERIC=y CONFIG_USB_SERIAL_BELKIN=m CONFIG_USB_SERIAL_WHITEHEAT=m 
CONFIG_USB_SERIAL_DIGI_ACCELEPORT=m CONFIG_USB_SERIAL_EMPEG=m CONFIG_USB_SERIAL_FTDI_SIO=m CONFIG_USB_SERIAL_VISOR=m CONFIG_USB_SERIAL_IPAQ=m CONFIG_USB_SERIAL_IR=m CONFIG_USB_SERIAL_EDGEPORT=m CONFIG_USB_SERIAL_EDGEPORT_TI=m CONFIG_USB_SERIAL_KEYSPAN_PDA=m CONFIG_USB_SERIAL_KEYSPAN=m CONFIG_USB_SERIAL_KEYSPAN_USA28=y CONFIG_USB_SERIAL_KEYSPAN_USA28X=y CONFIG_USB_SERIAL_KEYSPAN_USA28XA=y CONFIG_USB_SERIAL_KEYSPAN_USA28XB=y CONFIG_USB_SERIAL_KEYSPAN_USA19=y CONFIG_USB_SERIAL_KEYSPAN_USA18X=y CONFIG_USB_SERIAL_KEYSPAN_USA19W=y CONFIG_USB_SERIAL_KEYSPAN_USA19QW=y CONFIG_USB_SERIAL_KEYSPAN_USA19QI=y # CONFIG_USB_SERIAL_KEYSPAN_MPR is not set CONFIG_USB_SERIAL_KEYSPAN_USA49W=y # CONFIG_USB_SERIAL_KEYSPAN_USA49WLC is not set CONFIG_USB_SERIAL_MCT_U232=m CONFIG_USB_SERIAL_KLSI=m # CONFIG_USB_SERIAL_KOBIL_SCT is not set CONFIG_USB_SERIAL_PL2303=m CONFIG_USB_SERIAL_CYBERJACK=m CONFIG_USB_SERIAL_XIRCOM=m CONFIG_USB_SERIAL_OMNINET=m CONFIG_USB_RIO500=m CONFIG_USB_AUERSWALD=m CONFIG_USB_TIGL=m CONFIG_USB_BRLVGER=m CONFIG_USB_LCD=m # # Bluetooth support # CONFIG_BLUEZ=m CONFIG_BLUEZ_L2CAP=m CONFIG_BLUEZ_SCO=m # CONFIG_BLUEZ_RFCOMM is not set CONFIG_BLUEZ_BNEP=m # CONFIG_BLUEZ_BNEP_MC_FILTER is not set # CONFIG_BLUEZ_BNEP_PROTO_FILTER is not set # # Bluetooth device drivers # CONFIG_BLUEZ_HCIUSB=m # CONFIG_BLUEZ_USB_SCO is not set CONFIG_BLUEZ_USB_ZERO_PACKET=y CONFIG_BLUEZ_HCIUART=m CONFIG_BLUEZ_HCIUART_H4=y # CONFIG_BLUEZ_HCIUART_BCSP is not set # CONFIG_BLUEZ_HCIUART_BCSP_TXCRC is not set CONFIG_BLUEZ_HCIDTL1=m CONFIG_BLUEZ_HCIBT3C=m CONFIG_BLUEZ_HCIBLUECARD=m # CONFIG_BLUEZ_HCIBTUART is not set CONFIG_BLUEZ_HCIVHCI=m # # Kernel hacking # CONFIG_DEBUG_KERNEL=y CONFIG_DEBUG_STACKOVERFLOW=y # CONFIG_DEBUG_HIGHMEM is not set # CONFIG_DEBUG_SLAB is not set # CONFIG_DEBUG_IOVIRT is not set CONFIG_MAGIC_SYSRQ=y # CONFIG_DEBUG_SPINLOCK is not set # CONFIG_FRAME_POINTER is not set # # Library routines # CONFIG_ZLIB_INFLATE=m CONFIG_ZLIB_DEFLATE=m ^ permalink raw reply [flat|nested] 
114+ messages in thread
* Re: -rc7 Re: Linux 2.4.21-rc6 2003-05-29 19:56 ` Krzysiek Taraszka @ 2003-05-29 20:18 ` Krzysiek Taraszka 2003-06-04 18:17 ` Marcelo Tosatti 0 siblings, 1 reply; 114+ messages in thread From: Krzysiek Taraszka @ 2003-05-29 20:18 UTC (permalink / raw) To: Marcelo Tosatti, Georg Nikodym; +Cc: lkml On Thu, 29 May 2003 at 21:56, Krzysiek Taraszka wrote: > On Thu, 29 May 2003 at 21:11, Marcelo Tosatti wrote: > > On Thu, 29 May 2003, Georg Nikodym wrote: > > > On Wed, 28 May 2003 21:55:39 -0300 (BRT) > > > > > > Marcelo Tosatti <marcelo@conectiva.com.br> wrote: > > > > Here goes -rc6. I've decided to delay 2.4.21 a bit and try Andrew's > > > > fix for the IO stalls/deadlocks. > > > > > > While others may be dubious about the efficacy of this patch, I've been > > > running -rc6 on my laptop now since sometime last night and have seen > > > nothing odd. > > > > > > In case anybody cares, I'm using both ide and a ieee1394 (for a large > > > external drive [which implies scsi]) and I do a _lot_ of big work with > > > BK so I was seeing the problem within hours previously. > > > > Great! > > > > -rc7 will have to be released due to some problems :( > > hmm, it seems the ide modules and others are broken. I'm looking for the reason why. hmm, for the IDE subsystem, ide-proc.o wasn't built for CONFIG_BLK_DEV_IDE=m ... is anyone going to fix it, or shall I prepare and send my own patch here? -- Krzysiek Taraszka (dzimi@pld.org.pl) http://cyborg.kernel.pl/~dzimi/ ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: -rc7 Re: Linux 2.4.21-rc6 2003-05-29 20:18 ` Krzysiek Taraszka @ 2003-06-04 18:17 ` Marcelo Tosatti 2003-06-04 21:41 ` Krzysiek Taraszka 0 siblings, 1 reply; 114+ messages in thread From: Marcelo Tosatti @ 2003-06-04 18:17 UTC (permalink / raw) To: Krzysiek Taraszka; +Cc: Georg Nikodym, lkml On Thu, 29 May 2003, Krzysiek Taraszka wrote: > On Thu, 29 May 2003 at 21:56, Krzysiek Taraszka wrote: > > On Thu, 29 May 2003 at 21:11, Marcelo Tosatti wrote: > > > On Thu, 29 May 2003, Georg Nikodym wrote: > > > > On Wed, 28 May 2003 21:55:39 -0300 (BRT) > > > > > > > > Marcelo Tosatti <marcelo@conectiva.com.br> wrote: > > > > > Here goes -rc6. I've decided to delay 2.4.21 a bit and try Andrew's > > > > > fix for the IO stalls/deadlocks. > > > > > > > > While others may be dubious about the efficacy of this patch, I've been > > > > running -rc6 on my laptop now since sometime last night and have seen > > > > nothing odd. > > > > > > > > In case anybody cares, I'm using both ide and a ieee1394 (for a large > > > > external drive [which implies scsi]) and I do a _lot_ of big work with > > > > BK so I was seeing the problem within hours previously. > > > > > > Great! > > > > > > -rc7 will have to be released due to some problems :( > > > > hmm, it seems the ide modules and others are broken. I'm looking for the reason why. > > hmm, for the IDE subsystem, ide-proc.o wasn't built for CONFIG_BLK_DEV_IDE=m ... > is anyone going to fix it, or shall I prepare and send my own patch here? Feel free to send your own patch, please :) ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: -rc7 Re: Linux 2.4.21-rc6 2003-06-04 18:17 ` Marcelo Tosatti @ 2003-06-04 21:41 ` Krzysiek Taraszka 2003-06-04 22:37 ` Alan Cox 0 siblings, 1 reply; 114+ messages in thread From: Krzysiek Taraszka @ 2003-06-04 21:41 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: Georg Nikodym, lkml On Wednesday, 04 June 2003 at 20:17, Marcelo Tosatti wrote: > On Thu, 29 May 2003, Krzysiek Taraszka wrote: > > On Thu, 29 May 2003 at 21:56, Krzysiek Taraszka wrote: > > > On Thu, 29 May 2003 at 21:11, Marcelo Tosatti wrote: > > > > On Thu, 29 May 2003, Georg Nikodym wrote: > > > > > On Wed, 28 May 2003 21:55:39 -0300 (BRT) > > > > > > > > > > Marcelo Tosatti <marcelo@conectiva.com.br> wrote: > > > > > > Here goes -rc6. I've decided to delay 2.4.21 a bit and try > > > > > > Andrew's fix for the IO stalls/deadlocks. > > > > > > > > > > While others may be dubious about the efficacy of this patch, I've > > > > > been running -rc6 on my laptop now since sometime last night and > > > > > have seen nothing odd. > > > > > > > > > > In case anybody cares, I'm using both ide and a ieee1394 (for a > > > > > large external drive [which implies scsi]) and I do a _lot_ of big > > > > > work with BK so I was seeing the problem within hours previously. > > > > > > > > Great! > > > > > > > > -rc7 will have to be released due to some problems :( > > > > > > hmm, it seems the ide modules and others are broken. I'm looking for the > > > reason why. > > > > hmm, for the IDE subsystem, ide-proc.o wasn't built for CONFIG_BLK_DEV_IDE=m > > ... is anyone going to fix it, or shall I prepare and send my own patch > > here? > > Feel free to send your own patch, please :) Hm, I sent it a few days ago (a reply to Andrzej Krzysztofowicz's post (something with -rc3 in the subject :)) with other fixes, but without the cmd640 fixes. Alan made almost the same changes, but his -ac tree still has a broken cmd640 modular driver (cmd640_vlb is still unresolved). I tried to hack on it, but I dropped it ... maybe tomorrow I'll get back to that code ... or someone else will fix it (maybe Alan?) -- Krzysiek Taraszka (dzimi@pld.org.pl) http://cyborg.kernel.pl/~dzimi/ ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: -rc7 Re: Linux 2.4.21-rc6 2003-06-04 21:41 ` Krzysiek Taraszka @ 2003-06-04 22:37 ` Alan Cox 0 siblings, 0 replies; 114+ messages in thread From: Alan Cox @ 2003-06-04 22:37 UTC (permalink / raw) To: Krzysiek Taraszka; +Cc: Marcelo Tosatti, Georg Nikodym, lkml On Wed, 2003-06-04 at 22:41, Krzysiek Taraszka wrote: > -rc3 in the subject :)) with other fixes, but without the cmd640 fixes. > Alan made almost the same changes, but his -ac tree still has a broken > cmd640 modular driver (cmd640_vlb is still unresolved). > I tried to hack on it, but I dropped it ... maybe tomorrow I'll get back to that code ... > or someone else will fix it (maybe Alan?) cmd640_vlb is gone from the core code in the -ac tree, so that surprises me. Adrian Bunk sent me some more patches to look at. I'm not 100% convinced by them, but there are a few cases left, and some of his stuff certainly fixes real problems. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: -rc7 Re: Linux 2.4.21-rc6 2003-05-29 19:11 ` -rc7 " Marcelo Tosatti 2003-05-29 19:56 ` Krzysiek Taraszka @ 2003-06-04 10:22 ` Andrea Arcangeli 2003-06-04 10:35 ` Marc-Christian Petersen 1 sibling, 1 reply; 114+ messages in thread From: Andrea Arcangeli @ 2003-06-04 10:22 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: Georg Nikodym, lkml On Thu, May 29, 2003 at 04:11:12PM -0300, Marcelo Tosatti wrote: > > > On Thu, 29 May 2003, Georg Nikodym wrote: > > > On Wed, 28 May 2003 21:55:39 -0300 (BRT) > > Marcelo Tosatti <marcelo@conectiva.com.br> wrote: > > > > > Here goes -rc6. I've decided to delay 2.4.21 a bit and try Andrew's > > > fix for the IO stalls/deadlocks. > > > > While others may be dubious about the efficacy of this patch, I've been > > running -rc6 on my laptop now since sometime last night and have seen > > nothing odd. > > > > In case anybody cares, I'm using both ide and a ieee1394 (for a large > > external drive [which implies scsi]) and I do a _lot_ of big work with > > BK so I was seeing the problem within hours previously. > > Great! are you really sure that it is the right fix? I mean, the batching has a basic problem (I was discussing it with Jens two days ago and he said he'd already addressed it in 2.5; I wonder if that could also have an influence on the fact that 2.5 is so much better in fairness). The issue with batching in 2.4 is that it blocks at 0 and wakes at batch_requests, but it does not block new get_request callers from eating requests on the way back from 0 to batch_requests. I mean, there are two directions: when we move from batch_requests down to 0, get_request should return requests; on the way back from 0 to batch_requests, get_request should block (and it doesn't in 2.4; that is the problem). > > -rc7 will have to be released due to some problems :( Andrea ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: -rc7 Re: Linux 2.4.21-rc6 2003-06-04 10:22 ` Andrea Arcangeli @ 2003-06-04 10:35 ` Marc-Christian Petersen 2003-06-04 10:42 ` Jens Axboe 2003-06-04 10:43 ` -rc7 Re: Linux 2.4.21-rc6 Andrea Arcangeli 0 siblings, 2 replies; 114+ messages in thread From: Marc-Christian Petersen @ 2003-06-04 10:35 UTC (permalink / raw) To: Andrea Arcangeli, Marcelo Tosatti; +Cc: Georg Nikodym, lkml On Wednesday 04 June 2003 12:22, Andrea Arcangeli wrote: Hi Andrea, > are you really sure that it is the right fix? > I mean, the batching has a basic problem (I was discussing it with Jens > two days ago and he said he's already addressed in 2.5, I wonder if that > could also have an influence on the fact 2.5 is so much better in > fariness) > the issue with batching in 2.4, is that it is blocking at 0 and waking > at batch_requests. But it's not blocking new get_request to eat requests > in the way back from 0 to batch_requests. I mean, there are two > directions, when we move from batch_requests to 0 get_requests should > return requests. in the way back from 0 to batch_requests the > get_request should block (and it doesn't in 2.4, that is the problem) do you see a chance to fix this up in 2.4? ciao, Marc ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: -rc7 Re: Linux 2.4.21-rc6 2003-06-04 10:35 ` Marc-Christian Petersen @ 2003-06-04 10:42 ` Jens Axboe 2003-06-04 10:46 ` Marc-Christian Petersen 2003-06-04 10:43 ` -rc7 Re: Linux 2.4.21-rc6 Andrea Arcangeli 1 sibling, 1 reply; 114+ messages in thread From: Jens Axboe @ 2003-06-04 10:42 UTC (permalink / raw) To: Marc-Christian Petersen Cc: Andrea Arcangeli, Marcelo Tosatti, Georg Nikodym, lkml On Wed, Jun 04 2003, Marc-Christian Petersen wrote: > On Wednesday 04 June 2003 12:22, Andrea Arcangeli wrote: > > Hi Andrea, > > > are you really sure that it is the right fix? > > I mean, the batching has a basic problem (I was discussing it with Jens > > two days ago and he said he's already addressed in 2.5, I wonder if that > > could also have an influence on the fact 2.5 is so much better in > > fariness) > > the issue with batching in 2.4, is that it is blocking at 0 and waking > > at batch_requests. But it's not blocking new get_request to eat requests > > in the way back from 0 to batch_requests. I mean, there are two > > directions, when we move from batch_requests to 0 get_requests should > > return requests. in the way back from 0 to batch_requests the > > get_request should block (and it doesn't in 2.4, that is the problem) > do you see a chance to fix this up in 2.4? Nick posted a patch to do so the other day and asked people to test. -- Jens Axboe ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: -rc7 Re: Linux 2.4.21-rc6 2003-06-04 10:42 ` Jens Axboe @ 2003-06-04 10:46 ` Marc-Christian Petersen 2003-06-04 10:48 ` Andrea Arcangeli 0 siblings, 1 reply; 114+ messages in thread From: Marc-Christian Petersen @ 2003-06-04 10:46 UTC (permalink / raw) To: Jens Axboe; +Cc: Andrea Arcangeli, Marcelo Tosatti, Georg Nikodym, lkml On Wednesday 04 June 2003 12:42, Jens Axboe wrote: Hi Jens, > > > the issue with batching in 2.4, is that it is blocking at 0 and waking > > > at batch_requests. But it's not blocking new get_request to eat > > > requests in the way back from 0 to batch_requests. I mean, there are > > > two directions, when we move from batch_requests to 0 get_requests > > > should return requests. in the way back from 0 to batch_requests the > > > get_request should block (and it doesn't in 2.4, that is the problem) > > do you see a chance to fix this up in 2.4? > Nick posted a patch to do so the other day and asked people to test. Silly mcp. His mail was CC'ed to me :( ... F*ck huge inbox. ciao, Marc ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: -rc7 Re: Linux 2.4.21-rc6 2003-06-04 10:46 ` Marc-Christian Petersen @ 2003-06-04 10:48 ` Andrea Arcangeli 2003-06-04 11:57 ` Nick Piggin 0 siblings, 1 reply; 114+ messages in thread From: Andrea Arcangeli @ 2003-06-04 10:48 UTC (permalink / raw) To: Marc-Christian Petersen; +Cc: Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml On Wed, Jun 04, 2003 at 12:46:33PM +0200, Marc-Christian Petersen wrote: > On Wednesday 04 June 2003 12:42, Jens Axboe wrote: > > Hi Jens, > > > > > the issue with batching in 2.4, is that it is blocking at 0 and waking > > > > at batch_requests. But it's not blocking new get_request to eat > > > > requests in the way back from 0 to batch_requests. I mean, there are > > > > two directions, when we move from batch_requests to 0 get_requests > > > > should return requests. in the way back from 0 to batch_requests the > > > > get_request should block (and it doesn't in 2.4, that is the problem) > > > do you see a chance to fix this up in 2.4? > > Nick posted a patch to do so the other day and asked people to test. > Silly mcp. His mail was CC'ed to me :( ... F*ck huge inbox. I was probably not CC'ed, I'll search for the email (and I was travelling the last few days so I didn't read every single l-k email yet sorry ;) Andrea ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: -rc7 Re: Linux 2.4.21-rc6 2003-06-04 10:48 ` Andrea Arcangeli @ 2003-06-04 11:57 ` Nick Piggin 2003-06-04 12:00 ` Jens Axboe ` (2 more replies) 0 siblings, 3 replies; 114+ messages in thread From: Nick Piggin @ 2003-06-04 11:57 UTC (permalink / raw) To: Andrea Arcangeli Cc: Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller [-- Attachment #1: Type: text/plain, Size: 2100 bytes --] Andrea Arcangeli wrote: >On Wed, Jun 04, 2003 at 12:46:33PM +0200, Marc-Christian Petersen wrote: > >>On Wednesday 04 June 2003 12:42, Jens Axboe wrote: >> >>Hi Jens, >> >> >>>>>the issue with batching in 2.4, is that it is blocking at 0 and waking >>>>>at batch_requests. But it's not blocking new get_request to eat >>>>>requests in the way back from 0 to batch_requests. I mean, there are >>>>>two directions, when we move from batch_requests to 0 get_requests >>>>>should return requests. in the way back from 0 to batch_requests the >>>>>get_request should block (and it doesn't in 2.4, that is the problem) >>>>> >>>>do you see a chance to fix this up in 2.4? >>>> >>>Nick posted a patch to do so the other day and asked people to test. >>> >>Silly mcp. His mail was CC'ed to me :( ... F*ck huge inbox. >> > >I was probably not CC'ed, I'll search for the email (and I was >travelling the last few days so I didn't read every single l-k email yet >sorry ;) > > The patch I sent is actually against 2.4.20, contrary to my babbling. Reports I have had say it helps, but maybe not as much as Andrew's fixes. Then Matthias Mueller ported my patch to 2.4.21-rc6, which includes Andrew's fixes. It seems that they might be fixing two different problems. It looks promising though. My patch would not affect read IO throughput for a smallish number of readers, because the queue should never fill up. More than one writer, or a lot of readers, could see some throughput drop due to the patch causing the queue to be more FIFO at high loads. I have attached the patch again.
It's against 2.4.20.

Nick

Matthias Mueller wrote:
>Currently I'm running 2.4.21-rc6 with your patch and the patch from Andrew
>and it looks very promising. Both patches seem to address two different
>problems; combined, I can have 2 parallel dds running and play music with
>xmms and notice no sound drops (actually I had one, but that was during
>very high CPU load). Your patch seems to lower IO throughput, but I
>haven't tested this, so no real numbers, just my personal feelings and
>the numbers 'time dd ...' gave me.
>

[-- Attachment #2: blk-fair-batches-24 --]
[-- Type: text/plain, Size: 2612 bytes --]

--- linux-2.4/include/linux/blkdev.h.orig	2003-06-02 21:59:06.000000000 +1000
+++ linux-2.4/include/linux/blkdev.h	2003-06-02 22:39:57.000000000 +1000
@@ -118,13 +118,21 @@ struct request_queue
 	/*
 	 * Boolean that indicates whether this queue is plugged or not.
 	 */
-	char plugged;
+	int plugged:1;

 	/*
 	 * Boolean that indicates whether current_request is active or
 	 * not.
 	 */
-	char head_active;
+	int head_active:1;
+
+	/*
+	 * Booleans that indicate whether the queue's free requests have
+	 * been exhausted and is waiting to drop below the batch_requests
+	 * threshold
+	 */
+	int read_full:1;
+	int write_full:1;

 	unsigned long bounce_pfn;
@@ -140,6 +148,30 @@ struct request_queue
 	wait_queue_head_t wait_for_requests[2];
 };

+static inline void set_queue_full(request_queue_t *q, int rw)
+{
+	if (rw == READ)
+		q->read_full = 1;
+	else
+		q->write_full = 1;
+}
+
+static inline void clear_queue_full(request_queue_t *q, int rw)
+{
+	if (rw == READ)
+		q->read_full = 0;
+	else
+		q->write_full = 0;
+}
+
+static inline int queue_full(request_queue_t *q, int rw)
+{
+	if (rw == READ)
+		return q->read_full;
+	else
+		return q->write_full;
+}
+
 extern unsigned long blk_max_low_pfn, blk_max_pfn;

 #define BLK_BOUNCE_HIGH		(blk_max_low_pfn << PAGE_SHIFT)
--- linux-2.4/drivers/block/ll_rw_blk.c.orig	2003-06-02 21:56:54.000000000 +1000
+++ linux-2.4/drivers/block/ll_rw_blk.c	2003-06-02 22:17:13.000000000 +1000
@@ -513,7 +513,10 @@ static struct request *get_request(reque
 	struct request *rq = NULL;
 	struct request_list *rl = q->rq + rw;

-	if (!list_empty(&rl->free)) {
+	if (list_empty(&rl->free))
+		set_queue_full(q, rw);
+
+	if (!queue_full(q, rw)) {
 		rq = blkdev_free_rq(&rl->free);
 		list_del(&rq->queue);
 		rl->count--;
@@ -594,7 +597,7 @@ static struct request *__get_request_wai
 	add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait);
 	do {
 		set_current_state(TASK_UNINTERRUPTIBLE);
-		if (q->rq[rw].count == 0)
+		if (queue_full(q, rw))
 			schedule();
 		spin_lock_irq(&io_request_lock);
 		rq = get_request(q, rw);
@@ -829,9 +832,14 @@ void blkdev_release_request(struct reque
 	 */
 	if (q) {
 		list_add(&req->queue, &q->rq[rw].free);
-		if (++q->rq[rw].count >= q->batch_requests &&
-		    waitqueue_active(&q->wait_for_requests[rw]))
-			wake_up(&q->wait_for_requests[rw]);
+		q->rq[rw].count++;
+		if (q->rq[rw].count >= q->batch_requests) {
+			if (q->rq[rw].count == q->batch_requests)
+				clear_queue_full(q, rw);
+
+			if (waitqueue_active(&q->wait_for_requests[rw]))
+				wake_up(&q->wait_for_requests[rw]);
+		}
 	}
 }

^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: -rc7 Re: Linux 2.4.21-rc6 2003-06-04 11:57 ` Nick Piggin @ 2003-06-04 12:00 ` Jens Axboe 2003-06-04 12:09 ` Andrea Arcangeli 2003-06-04 12:11 ` Nick Piggin 2003-06-04 12:35 ` Miquel van Smoorenburg 2003-06-09 21:39 ` [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) Chris Mason 2 siblings, 2 replies; 114+ messages in thread From: Jens Axboe @ 2003-06-04 12:00 UTC (permalink / raw) To: Nick Piggin Cc: Andrea Arcangeli, Marc-Christian Petersen, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Wed, Jun 04 2003, Nick Piggin wrote: > Andrea Arcangeli wrote: > > >On Wed, Jun 04, 2003 at 12:46:33PM +0200, Marc-Christian Petersen wrote: > > > >>On Wednesday 04 June 2003 12:42, Jens Axboe wrote: > >> > >>Hi Jens, > >> > >> > >>>>>the issue with batching in 2.4, is that it is blocking at 0 and waking > >>>>>at batch_requests. But it's not blocking new get_request to eat > >>>>>requests in the way back from 0 to batch_requests. I mean, there are > >>>>>two directions, when we move from batch_requests to 0 get_requests > >>>>>should return requests. in the way back from 0 to batch_requests the > >>>>>get_request should block (and it doesn't in 2.4, that is the problem) > >>>>> > >>>>do you see a chance to fix this up in 2.4? > >>>> > >>>Nick posted a patch to do so the other day and asked people to test. > >>> > >>Silly mcp. His mail was CC'ed to me :( ... F*ck huge inbox. > >> > > > >I was probably not CC'ed, I'll search for the email (and I was > >travelling the last few days so I didn't read every single l-k email yet > >sorry ;) > > > > > The patch I sent is actually against 2.4.20, contrary to my > babling. Reports I have had say it helps, but maybe not so > much as Andrew'ss fixes. Then Matthias Mueller ported my patch > to 2.4.21-rc6 which include Andrew's fixes. > > It seems that they might be fixing two different problems. > It looks promising though. 
It is a different problem, I think: yours will fix the starvation of writers (or of readers, though writer starvation is much, much easier to trigger), where someone will repeatedly get cheated by the request allocator. The other problem is still not clear to anyone. I doubt this patch would make any difference (apart from a psychological one) in this case, since you have a single writer and maybe a reader or two. The single writer cannot starve anyone else. -- Jens Axboe ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: -rc7 Re: Linux 2.4.21-rc6 2003-06-04 12:00 ` Jens Axboe @ 2003-06-04 12:09 ` Andrea Arcangeli 2003-06-04 12:20 ` Jens Axboe 2003-06-04 12:11 ` Nick Piggin 1 sibling, 1 reply; 114+ messages in thread From: Andrea Arcangeli @ 2003-06-04 12:09 UTC (permalink / raw) To: Jens Axboe Cc: Nick Piggin, Marc-Christian Petersen, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Wed, Jun 04, 2003 at 02:00:53PM +0200, Jens Axboe wrote: > since you have a single writer and maybe a reader or two. The single > writer cannot starve anyone else. unless you're changing an atime and you have to mark_buffer_dirty or similar (balance_dirty will then write stuff the same way from both cp and the reader). Maybe we can get a stack trace with kgdb to be sure where the reader is blocking. Andrea ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: -rc7 Re: Linux 2.4.21-rc6 2003-06-04 12:09 ` Andrea Arcangeli @ 2003-06-04 12:20 ` Jens Axboe 2003-06-04 20:50 ` Rob Landley 0 siblings, 1 reply; 114+ messages in thread From: Jens Axboe @ 2003-06-04 12:20 UTC (permalink / raw) To: Andrea Arcangeli Cc: Nick Piggin, Marc-Christian Petersen, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Wed, Jun 04 2003, Andrea Arcangeli wrote: > On Wed, Jun 04, 2003 at 02:00:53PM +0200, Jens Axboe wrote: > > since you have a single writer and maybe a reader or two. The single > > writer cannot starve anyone else. > > unless you're changing an atime and you've to mark_buffer_dirty or > similar (balance_dirty will write stuff the same way from cp and the > reader then). Yes you are right, could be. But the whole thing still smells fishy. Read starvation causing mouse stalls, hmm. -- Jens Axboe ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: -rc7 Re: Linux 2.4.21-rc6 2003-06-04 12:20 ` Jens Axboe @ 2003-06-04 20:50 ` Rob Landley 0 siblings, 0 replies; 114+ messages in thread From: Rob Landley @ 2003-06-04 20:50 UTC (permalink / raw) To: Jens Axboe, Andrea Arcangeli; +Cc: lkml [-- Attachment #1: Type: text/plain, Size: 1497 bytes --] On Wednesday 04 June 2003 08:20, Jens Axboe wrote: > On Wed, Jun 04 2003, Andrea Arcangeli wrote: > > On Wed, Jun 04, 2003 at 02:00:53PM +0200, Jens Axboe wrote: > > > since you have a single writer and maybe a reader or two. The single > > > writer cannot starve anyone else. > > > > unless you're changing an atime and you've to mark_buffer_dirty or > > similar (balance_dirty will write stuff the same way from cp and the > > reader then). > > Yes you are right, could be. > > But the whole thing still smells fishy. Read starvation causing mouse > stalls, hmm. If reads from swap get starved, you can have interactive dropouts in just about anything. My desktop is usually pretty deep into swap. I upgrade to machines with four times as much memory, but that usually means the graphics resolution went up and it just lets me keep more windows open in more desktops. (Currently six.) My record was driving the system so deep into swapping frenzy it was still swapping when I came back from lunch. Really. This was under 2.4.4, though. On RH 9/2.4.20-? my record is a little under five minutes of "frozen thrashing on swap" before I got control of the system back. That's just a "go for a soda" break. And at least the mouse cursor never froze for more than a couple seconds at a time during that, even if the desktop was ignoring me... :) Haven't tried 2.5 on anything but servers yet, but it's on my to-do list... Rob (I am the VM subsystem's worst nightmare. Bwahaha.) 
[-- Attachment #2: typescript --] [-- Type: text/plain, Size: 4531 bytes --] Script started on Wed 04 Jun 2003 04:25:29 PM EDT ^[]0;landley@localhost:~[landley@localhost landley]$ cat /proc/meminfo total: used: free: shared: buffers: cached: Mem: 261390336 247234560 14155776 0 9351168 80461824 Swap: 542859264 276152320 266706944 MemTotal: 255264 kB MemFree: 13824 kB MemShared: 0 kB Buffers: 9132 kB Cached: 43372 kB SwapCached: 35204 kB Active: 182324 kB ActiveAnon: 131940 kB ActiveCache: 50384 kB Inact_dirty: 19164 kB Inact_laundry: 14400 kB Inact_clean: 3512 kB Inact_target: 43880 kB HighTotal: 0 kB HighFree: 0 kB LowTotal: 255264 kB LowFree: 13824 kB SwapTotal: 530136 kB SwapFree: 260456 kB ^[]0;landley@localhost:~[landley@localhost landley]$ cat /proc/slabinfo slabinfo - version: 1.1 kmem_cache 65 70 108 2 2 1 ip_fib_hash 11 112 32 1 1 1 urb_priv 0 0 64 0 0 1 journal_head 57 770 48 1 10 1 revoke_table 2 250 12 1 1 1 revoke_record 0 112 32 0 1 1 clip_arp_cache 0 0 128 0 0 1 ip_mrt_cache 0 0 128 0 0 1 tcp_tw_bucket 0 90 128 0 3 1 tcp_bind_bucket 4 224 32 1 2 1 tcp_open_request 0 30 128 0 1 1 inet_peer_cache 0 58 64 0 1 1 ip_dst_cache 5 75 256 1 5 1 arp_cache 2 30 128 1 1 1 blkdev_requests 256 270 128 9 9 1 dnotify_cache 0 0 20 0 0 1 file_lock_cache 0 41 92 0 1 1 fasync_cache 2 200 16 1 1 1 uid_cache 2 112 32 1 1 1 skbuff_head_cache 176 2265 256 32 151 1 sock 589 720 1280 220 240 1 sigqueue 0 29 132 0 1 1 kiobuf 0 0 64 0 0 1 cdev_cache 26 232 64 2 4 1 bdev_cache 4 58 64 1 1 1 mnt_cache 13 58 64 1 1 1 inode_cache 2395 3647 512 519 521 1 dentry_cache 2477 4050 128 135 135 1 dquot 0 0 128 0 0 1 filp 2364 2370 128 79 79 1 names_cache 0 14 4096 0 14 1 buffer_head 16649 30360 128 789 1012 1 mm_struct 173 210 256 14 14 1 vm_area_struct 5840 7770 128 238 259 1 fs_cache 78 116 64 2 2 1 files_cache 78 112 512 15 16 1 signal_cache 243 290 64 5 5 1 sighand_cache 235 253 1408 22 23 4 task_struct 0 0 1792 0 0 1 pte_chain 1958 7590 128 83 253 1 size-131072(DMA) 0 0 131072 0 0 32 
size-131072 0 0 131072 0 0 32 size-65536(DMA) 0 0 65536 0 0 16 size-65536 0 0 65536 0 0 16 size-32768(DMA) 0 0 32768 0 0 8 size-32768 0 0 32768 0 0 8 size-16384(DMA) 0 0 16384 0 0 4 size-16384 0 16 16384 0 16 4 size-8192(DMA) 0 0 8192 0 0 2 size-8192 4 19 8192 4 19 2 size-4096(DMA) 0 0 4096 0 0 1 size-4096 35 75 4096 35 75 1 size-2048(DMA) 0 0 2048 0 0 1 size-2048 8 86 2048 5 43 1 size-1024(DMA) 0 0 1024 0 0 1 size-1024 59 124 1024 18 31 1 size-512(DMA) 0 0 512 0 0 1 size-512 43 200 512 11 25 1 size-256(DMA) 0 0 256 0 0 1 size-256 43 1200 256 8 80 1 size-128(DMA) 1 30 128 1 1 1 size-128 707 3240 128 33 108 1 size-64(DMA) 0 0 128 0 0 1 size-64 377 1170 128 30 39 1 size-32(DMA) 17 58 64 1 1 1 size-32 397 754 64 10 13 1 ^[]0;landley@localhost:~[landley@localhost landley]$ Script done on Wed 04 Jun 2003 04:25:42 PM EDT ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: -rc7 Re: Linux 2.4.21-rc6 2003-06-04 12:00 ` Jens Axboe 2003-06-04 12:09 ` Andrea Arcangeli @ 2003-06-04 12:11 ` Nick Piggin 1 sibling, 0 replies; 114+ messages in thread From: Nick Piggin @ 2003-06-04 12:11 UTC (permalink / raw) To: Jens Axboe Cc: Andrea Arcangeli, Marc-Christian Petersen, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller Jens Axboe wrote: >On Wed, Jun 04 2003, Nick Piggin wrote: > >>Andrea Arcangeli wrote: >> >> >>>On Wed, Jun 04, 2003 at 12:46:33PM +0200, Marc-Christian Petersen wrote: >>> >>> >>>>On Wednesday 04 June 2003 12:42, Jens Axboe wrote: >>>> >>>>Hi Jens, >>>> >>>> >>>> >>>>>>>the issue with batching in 2.4, is that it is blocking at 0 and waking >>>>>>>at batch_requests. But it's not blocking new get_request to eat >>>>>>>requests in the way back from 0 to batch_requests. I mean, there are >>>>>>>two directions, when we move from batch_requests to 0 get_requests >>>>>>>should return requests. in the way back from 0 to batch_requests the >>>>>>>get_request should block (and it doesn't in 2.4, that is the problem) >>>>>>> >>>>>>> >>>>>>do you see a chance to fix this up in 2.4? >>>>>> >>>>>> >>>>>Nick posted a patch to do so the other day and asked people to test. >>>>> >>>>> >>>>Silly mcp. His mail was CC'ed to me :( ... F*ck huge inbox. >>>> >>>> >>>I was probably not CC'ed, I'll search for the email (and I was >>>travelling the last few days so I didn't read every single l-k email yet >>>sorry ;) >>> >>> >>> >>The patch I sent is actually against 2.4.20, contrary to my >>babbling. Reports I have had say it helps, but maybe not so >>much as Andrew's fixes. Then Matthias Mueller ported my patch >>to 2.4.21-rc6, which includes Andrew's fixes. >> >>It seems that they might be fixing two different problems. >>It looks promising though. 
>> > >It is a different problem, I think: yours will fix the starvation of >writers (or readers; writer starvation is much easier to trigger, though) >where someone will repeatedly get cheated by the request allocator. > >The other problem is still not clear to anyone. I doubt this patch would >make any difference (apart from a psychological one) in this case, >since you have a single writer and maybe a reader or two. The single >writer cannot starve anyone else. > You are right about what the patch does. It wouldn't surprise me if there are still other problems, but it could be that the reader has to write some swap or other dirty buffers when trying to get memory itself. I have had 3 or so reports all saying similar things, but it could be psychological, I guess. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: -rc7 Re: Linux 2.4.21-rc6 2003-06-04 11:57 ` Nick Piggin 2003-06-04 12:00 ` Jens Axboe @ 2003-06-04 12:35 ` Miquel van Smoorenburg 2003-06-09 21:39 ` [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) Chris Mason 2 siblings, 0 replies; 114+ messages in thread From: Miquel van Smoorenburg @ 2003-06-04 12:35 UTC (permalink / raw) To: linux-kernel In article <3EDDDEBB.4080209@cyberone.com.au>, Nick Piggin <piggin@cyberone.com.au> wrote:

>- char plugged;
>+ int plugged:1;

This is dangerous:

#include <stdio.h>

struct foo {
	int bla:1;
};

int main(void)
{
	struct foo f;
	f.bla = 1;
	printf("%d\n", f.bla);
	return 0;
}

$ ./a.out
-1

If you want to put "0" and "1" in a 1-bit field, use "unsigned int bla:1".

Mike. -- .. somehow I have a feeling the hurting hasn't even begun yet -- Bill, "The Terrible Thunderlizards" ^ permalink raw reply [flat|nested] 114+ messages in thread
* [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) 2003-06-04 11:57 ` Nick Piggin 2003-06-04 12:00 ` Jens Axboe 2003-06-04 12:35 ` Miquel van Smoorenburg @ 2003-06-09 21:39 ` Chris Mason 2003-06-09 22:19 ` Andrea Arcangeli ` (2 more replies) 2 siblings, 3 replies; 114+ messages in thread From: Chris Mason @ 2003-06-09 21:39 UTC (permalink / raw) To: Nick Piggin Cc: Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller Ok, there are lots of different problems here, and I've spent a little while trying to get some numbers with the __get_request_wait stats patch I posted before. This is all on ext2, since I wanted to rule out interactions with the journal flavors. Basically a dbench 90 run on ext2 rc6 vanilla kernels can generate latencies of over 2700 jiffies in __get_request_wait, with an average latency over 250 jiffies. No, most desktop workloads aren't dbench 90, but between balance_dirty() and the way we send stuff to disk during memory allocations, just about any process can get stuck submitting dirty buffers even if you've just got one process doing a dd if=/dev/zero of=foo. So, for the moment I'm going to pretend people seeing stalls in X are stuck in atime updates or memory allocations, or reading proc or some other silly spot. For the SMP corner cases, I've merged Andrea's fix-pausing patch into rc7, along with an altered form of Nick Piggin's queue_full patch to try and fix the latency problems. The major difference from Nick's patch is that once the queue is marked full, I don't clear the full flag until the wait queue is empty. This means new io can't steal available requests until every existing waiter has been granted a request. The latency results are better, with average time spent in __get_request_wait being around 28 jiffies, and a max of 170 jiffies. The cost is throughput; further benchmarking needs to be done, but I wanted to get this out for review and testing. 
It should at least help us decide if the request allocation code really is causing our problems. The patch below also includes the __get_request_wait latency stats. If people try this and still see stalls, please run elvtune /dev/xxxx and send along the resulting console output. I haven't yet compared this to Andrea's elevator latency code, but the stat patch was originally developed on top of his 2.4.21pre3aa1, where the average wait was 97 jiffies and the max was 318. Anyway, less talk, more code. Treat this with care, it has only been lightly tested. Thanks to Andrea and Nick whose patches this is largely based on: --- 1.9/drivers/block/blkpg.c Sat Mar 30 06:58:05 2002 +++ edited/drivers/block/blkpg.c Mon Jun 9 12:17:24 2003 @@ -261,6 +261,7 @@ return blkpg_ioctl(dev, (struct blkpg_ioctl_arg *) arg); case BLKELVGET: + blk_print_stats(dev); return blkelvget_ioctl(&blk_get_queue(dev)->elevator, (blkelv_ioctl_arg_t *) arg); case BLKELVSET: --- 1.45/drivers/block/ll_rw_blk.c Wed May 28 03:50:02 2003 +++ edited/drivers/block/ll_rw_blk.c Mon Jun 9 17:13:16 2003 @@ -429,6 +429,8 @@ q->rq[READ].count = 0; q->rq[WRITE].count = 0; q->nr_requests = 0; + q->read_full = 0; + q->write_full = 0; si_meminfo(&si); megs = si.totalram >> (20 - PAGE_SHIFT); @@ -442,6 +444,56 @@ spin_lock_init(&q->queue_lock); } +void blk_print_stats(kdev_t dev) +{ + request_queue_t *q; + unsigned long avg_wait; + unsigned long min_wait; + unsigned long high_wait; + unsigned long *d; + + q = blk_get_queue(dev); + if (!q) + return; + + min_wait = q->min_wait; + if (min_wait == ~0UL) + min_wait = 0; + if (q->num_wait) + avg_wait = q->total_wait / q->num_wait; + else + avg_wait = 0; + printk("device %s: num_req %lu, total jiffies waited %lu\n", + kdevname(dev), q->num_req, q->total_wait); + printk("\t%lu forced to wait\n", q->num_wait); + printk("\t%lu min wait, %lu max wait\n", min_wait, q->max_wait); + printk("\t%lu average wait\n", avg_wait); + d = q->deviation; + printk("\t%lu < 100, %lu < 200, %lu 
< 300, %lu < 400, %lu < 500\n", + d[0], d[1], d[2], d[3], d[4]); + high_wait = d[0] + d[1] + d[2] + d[3] + d[4]; + high_wait = q->num_wait - high_wait; + printk("\t%lu waits longer than 500 jiffies\n", high_wait); +} + +static void reset_stats(request_queue_t *q) +{ + q->max_wait = 0; + q->min_wait = ~0UL; + q->total_wait = 0; + q->num_req = 0; + q->num_wait = 0; + memset(q->deviation, 0, sizeof(q->deviation)); +} +void blk_reset_stats(kdev_t dev) +{ + request_queue_t *q; + q = blk_get_queue(dev); + if (!q) + return; + printk("reset latency stats on device %s\n", kdevname(dev)); + reset_stats(q); +} static int __make_request(request_queue_t * q, int rw, struct buffer_head * bh); /** @@ -491,6 +543,9 @@ q->plug_tq.routine = &generic_unplug_device; q->plug_tq.data = q; q->plugged = 0; + + reset_stats(q); + /* * These booleans describe the queue properties. We set the * default (and most common) values here. Other drivers can @@ -508,7 +563,7 @@ * Get a free request. io_request_lock must be held and interrupts * disabled on the way in. Returns NULL if there are no free requests. 
*/ -static struct request *get_request(request_queue_t *q, int rw) +static struct request *__get_request(request_queue_t *q, int rw) { struct request *rq = NULL; struct request_list *rl = q->rq + rw; @@ -521,10 +576,17 @@ rq->cmd = rw; rq->special = NULL; rq->q = q; - } + } else + set_queue_full(q, rw); return rq; } +static struct request *get_request(request_queue_t *q, int rw) +{ + if (queue_full(q, rw)) + return NULL; + return __get_request(q, rw); +} /* * Here's the request allocation design: @@ -588,23 +650,57 @@ static struct request *__get_request_wait(request_queue_t *q, int rw) { register struct request *rq; + int waited = 0; + unsigned long wait_start = jiffies; + unsigned long time_waited; DECLARE_WAITQUEUE(wait, current); - add_wait_queue(&q->wait_for_requests[rw], &wait); + add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait); + do { set_current_state(TASK_UNINTERRUPTIBLE); - generic_unplug_device(q); - if (q->rq[rw].count == 0) - schedule(); spin_lock_irq(&io_request_lock); - rq = get_request(q, rw); + if ((!waited && queue_full(q, rw)) || q->rq[rw].count == 0) { + __generic_unplug_device(q); + spin_unlock_irq(&io_request_lock); + schedule(); + spin_lock_irq(&io_request_lock); + waited = 1; + } + rq = __get_request(q, rw); spin_unlock_irq(&io_request_lock); } while (rq == NULL); remove_wait_queue(&q->wait_for_requests[rw], &wait); current->state = TASK_RUNNING; + + if (!waitqueue_active(&q->wait_for_requests[rw])) + clear_queue_full(q, rw); + + time_waited = jiffies - wait_start; + if (time_waited > q->max_wait) + q->max_wait = time_waited; + if (time_waited && time_waited < q->min_wait) + q->min_wait = time_waited; + q->total_wait += time_waited; + q->num_wait++; + if (time_waited < 500) { + q->deviation[time_waited/100]++; + } + return rq; } +static void get_request_wait_wakeup(request_queue_t *q, int rw) +{ + /* + * avoid losing an unplug if a second __get_request_wait did the + * generic_unplug_device while our __get_request_wait was running 
+ * w/o the queue_lock held and w/ our request out of the queue. + */ + if (waitqueue_active(&q->wait_for_requests[rw])) + wake_up(&q->wait_for_requests[rw]); +} + /* RO fail safe mechanism */ static long ro_bits[MAX_BLKDEV][8]; @@ -829,8 +925,14 @@ */ if (q) { list_add(&req->queue, &q->rq[rw].free); - if (++q->rq[rw].count >= q->batch_requests) - wake_up(&q->wait_for_requests[rw]); + q->rq[rw].count++; + if (q->rq[rw].count >= q->batch_requests) { + smp_mb(); + if (waitqueue_active(&q->wait_for_requests[rw])) + wake_up(&q->wait_for_requests[rw]); + else + clear_queue_full(q, rw); + } } } @@ -948,7 +1050,6 @@ */ max_sectors = get_max_sectors(bh->b_rdev); -again: req = NULL; head = &q->queue_head; /* @@ -957,6 +1058,7 @@ */ spin_lock_irq(&io_request_lock); +again: insert_here = head->prev; if (list_empty(head)) { q->plug_device_fn(q, bh->b_rdev); /* is atomic */ @@ -1042,6 +1144,9 @@ if (req == NULL) { spin_unlock_irq(&io_request_lock); freereq = __get_request_wait(q, rw); + head = &q->queue_head; + spin_lock_irq(&io_request_lock); + get_request_wait_wakeup(q, rw); goto again; } } @@ -1063,6 +1168,7 @@ req->rq_dev = bh->b_rdev; req->start_time = jiffies; req_new_io(req, 0, count); + q->num_req++; blk_started_io(count); add_request(q, req, insert_here); out: @@ -1196,8 +1302,15 @@ bh->b_rdev = bh->b_dev; bh->b_rsector = bh->b_blocknr * count; + get_bh(bh); generic_make_request(rw, bh); + /* fix race condition with wait_on_buffer() */ + smp_mb(); /* spin_unlock may have inclusive semantics */ + if (waitqueue_active(&bh->b_wait)) + wake_up(&bh->b_wait); + + put_bh(bh); switch (rw) { case WRITE: kstat.pgpgout += count; --- 1.83/fs/buffer.c Wed May 14 12:51:00 2003 +++ edited/fs/buffer.c Mon Jun 9 13:55:22 2003 @@ -153,10 +153,23 @@ get_bh(bh); add_wait_queue(&bh->b_wait, &wait); do { - run_task_queue(&tq_disk); set_task_state(tsk, TASK_UNINTERRUPTIBLE); if (!buffer_locked(bh)) break; + /* + * We must read tq_disk in TQ_ACTIVE after the + * add_wait_queue effect is 
visible to other cpus. + * We could unplug some line above it wouldn't matter + * but we can't do that right after add_wait_queue + * without an smp_mb() in between because spin_unlock + * has inclusive semantics. + * Doing it here is the most efficient place so we + * don't do a suprious unplug if we get a racy + * wakeup that make buffer_locked to return 0, and + * doing it here avoids an explicit smp_mb() we + * rely on the implicit one in set_task_state. + */ + run_task_queue(&tq_disk); schedule(); } while (buffer_locked(bh)); tsk->state = TASK_RUNNING; @@ -1507,6 +1520,9 @@ /* Done - end_buffer_io_async will unlock */ SetPageUptodate(page); + + wakeup_page_waiters(page); + return 0; out: @@ -1538,6 +1554,7 @@ } while (bh != head); if (need_unlock) UnlockPage(page); + wakeup_page_waiters(page); return err; } @@ -1765,6 +1782,8 @@ else submit_bh(READ, bh); } + + wakeup_page_waiters(page); return 0; } @@ -2378,6 +2397,7 @@ submit_bh(rw, bh); bh = next; } while (bh != head); + wakeup_page_waiters(page); return 0; } --- 1.49/fs/super.c Wed Dec 18 21:34:24 2002 +++ edited/fs/super.c Mon Jun 9 12:17:24 2003 @@ -726,6 +726,7 @@ if (!fs_type->read_super(s, data, flags & MS_VERBOSE ? 
1 : 0)) goto Einval; s->s_flags |= MS_ACTIVE; + blk_reset_stats(dev); path_release(&nd); return s; --- 1.45/fs/reiserfs/inode.c Thu May 22 16:35:02 2003 +++ edited/fs/reiserfs/inode.c Mon Jun 9 12:17:24 2003 @@ -2048,6 +2048,7 @@ */ if (nr) { submit_bh_for_writepage(arr, nr) ; + wakeup_page_waiters(page); } else { UnlockPage(page) ; } --- 1.23/include/linux/blkdev.h Fri Nov 29 17:03:01 2002 +++ edited/include/linux/blkdev.h Mon Jun 9 17:31:18 2003 @@ -126,6 +126,14 @@ */ char head_active; + /* + * Booleans that indicate whether the queue's free requests have + * been exhausted and is waiting to drop below the batch_requests + * threshold + */ + char read_full; + char write_full; + unsigned long bounce_pfn; /* @@ -138,8 +146,17 @@ * Tasks wait here for free read and write requests */ wait_queue_head_t wait_for_requests[2]; + unsigned long max_wait; + unsigned long min_wait; + unsigned long total_wait; + unsigned long num_req; + unsigned long num_wait; + unsigned long deviation[5]; }; +void blk_reset_stats(kdev_t dev); +void blk_print_stats(kdev_t dev); + #define blk_queue_plugged(q) (q)->plugged #define blk_fs_request(rq) ((rq)->cmd == READ || (rq)->cmd == WRITE) #define blk_queue_empty(q) list_empty(&(q)->queue_head) @@ -156,6 +173,33 @@ } } +static inline void set_queue_full(request_queue_t *q, int rw) +{ + wmb(); + if (rw == READ) + q->read_full = 1; + else + q->write_full = 1; +} + +static inline void clear_queue_full(request_queue_t *q, int rw) +{ + wmb(); + if (rw == READ) + q->read_full = 0; + else + q->write_full = 0; +} + +static inline int queue_full(request_queue_t *q, int rw) +{ + rmb(); + if (rw == READ) + return q->read_full; + else + return q->write_full; +} + extern unsigned long blk_max_low_pfn, blk_max_pfn; #define BLK_BOUNCE_HIGH (blk_max_low_pfn << PAGE_SHIFT) @@ -217,6 +261,7 @@ extern void generic_make_request(int rw, struct buffer_head * bh); extern inline request_queue_t *blk_get_queue(kdev_t dev); extern void blkdev_release_request(struct 
request *); +extern void blk_print_stats(kdev_t dev); /* * Access functions for manipulating queue properties --- 1.19/include/linux/pagemap.h Sun Aug 25 15:32:11 2002 +++ edited/include/linux/pagemap.h Mon Jun 9 14:47:11 2003 @@ -97,6 +97,8 @@ ___wait_on_page(page); } +extern void FASTCALL(wakeup_page_waiters(struct page * page)); + /* * Returns locked page at given index in given cache, creating it if needed. */ --- 1.68/kernel/ksyms.c Fri May 23 17:40:47 2003 +++ edited/kernel/ksyms.c Mon Jun 9 12:17:24 2003 @@ -295,6 +295,7 @@ EXPORT_SYMBOL(filemap_fdatawait); EXPORT_SYMBOL(lock_page); EXPORT_SYMBOL(unlock_page); +EXPORT_SYMBOL(wakeup_page_waiters); /* device registration */ EXPORT_SYMBOL(register_chrdev); --- 1.77/mm/filemap.c Thu Apr 24 11:05:10 2003 +++ edited/mm/filemap.c Mon Jun 9 12:17:24 2003 @@ -812,6 +812,20 @@ return &wait[hash]; } +/* + * This must be called after every submit_bh with end_io + * callbacks that would result into the blkdev layer waking + * up the page after a queue unplug. + */ +void wakeup_page_waiters(struct page * page) +{ + wait_queue_head_t * head; + + head = page_waitqueue(page); + if (waitqueue_active(head)) + wake_up(head); +} + /* * Wait for a page to get unlocked. * ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) 2003-06-09 21:39 ` [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) Chris Mason @ 2003-06-09 22:19 ` Andrea Arcangeli 2003-06-10 0:27 ` Chris Mason 2003-06-10 23:13 ` Chris Mason 2003-06-09 23:51 ` [PATCH] io stalls Nick Piggin 2003-06-11 0:33 ` [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) Andrea Arcangeli 2 siblings, 2 replies; 114+ messages in thread From: Andrea Arcangeli @ 2003-06-09 22:19 UTC (permalink / raw) To: Chris Mason Cc: Nick Piggin, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller [-- Attachment #1: Type: text/plain, Size: 3069 bytes --] Hi, On Mon, Jun 09, 2003 at 05:39:23PM -0400, Chris Mason wrote: > Ok, there are lots of different problems here, and I've spent a little > while trying to get some numbers with the __get_request_wait stats patch > I posted before. This is all on ext2, since I wanted to rule out > interactions with the journal flavors. > > Basically a dbench 90 run on ext2 rc6 vanilla kernels can generate > latencies of over 2700 jiffies in __get_request_wait, with an average > latency over 250 jiffies. > > No, most desktop workloads aren't dbench 90, but between balance_dirty() > and the way we send stuff to disk during memory allocations, just about > any process can get stuck submitting dirty buffers even if you've just > got one process doing a dd if=/dev/zero of=foo. > > So, for the moment I'm going to pretend people seeing stalls in X are > stuck in atime updates or memory allocations, or reading proc or some > other silly spot. > > For the SMP corner cases, I've merged Andrea's fix-pausing patch into > rc7, along with an altered form of Nick Piggin's queue_full patch to try > and fix the latency problems. > > The major difference from Nick's patch is that once the queue is marked > full, I don't clear the full flag until the wait queue is empty. 
This > means new io can't steal available requests until every existing waiter > has been granted a request. > > The latency results are better, with average time spent in > __get_request_wait being around 28 jiffies, and a max of 170 jiffies. > The cost is throughput; further benchmarking needs to be done, but I > wanted to get this out for review and testing. It should at least help > us decide if the request allocation code really is causing our problems. > > The patch below also includes the __get_request_wait latency stats. If > people try this and still see stalls, please run elvtune /dev/xxxx and > send along the resulting console output. > > I haven't yet compared this to Andrea's elevator latency code, but the > stat patch was originally developed on top of his 2.4.21pre3aa1, where > the average wait was 97 jiffies and the max was 318. > > Anyway, less talk, more code. Treat this with care, it has only been > lightly tested. Thanks to Andrea and Nick whose patches this is largely > based on: I spent last Saturday working on this too. This is the status of my current patches; it would be interesting to compare them. They're not very well tested yet, though. They would obsolete the old fix-pausing and the old elevator-lowlatency (I was going to release a new tree today, but I delayed it so I fixed uml today too first [tested with skas and w/o skas]). Those back out the rc7 interactivity changes (the only one that wasn't in my tree was the add_wait_queue_exclusive, which IMHO had better stay for scalability reasons). Of course I would be very interested to know if those two patches (or Chris's one; you also retained the exclusive wakeup) are still greatly improved by removing the _exclusive wakeups and going wake-all (in theory they shouldn't be). 
Andrea [-- Attachment #2: 9980_fix-pausing-4 --] [-- Type: text/plain, Size: 8297 bytes --] diff -urNp --exclude CVS --exclude BitKeeper xx-ref/drivers/block/ll_rw_blk.c xx/drivers/block/ll_rw_blk.c --- xx-ref/drivers/block/ll_rw_blk.c 2003-06-07 15:22:23.000000000 +0200 +++ xx/drivers/block/ll_rw_blk.c 2003-06-07 15:22:27.000000000 +0200 @@ -596,12 +596,20 @@ static struct request *__get_request_wai register struct request *rq; DECLARE_WAITQUEUE(wait, current); - add_wait_queue(&q->wait_for_requests[rw], &wait); + add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait); do { set_current_state(TASK_UNINTERRUPTIBLE); - generic_unplug_device(q); - if (q->rq[rw].count == 0) + if (q->rq[rw].count == 0) { + /* + * All we care about is not to stall if any request + * is been released after we set TASK_UNINTERRUPTIBLE. + * This is the most efficient place to unplug the queue + * in case we hit the race and we can get the request + * without waiting. + */ + generic_unplug_device(q); schedule(); + } spin_lock_irq(q->queue_lock); rq = get_request(q, rw); spin_unlock_irq(q->queue_lock); @@ -611,6 +619,17 @@ static struct request *__get_request_wai return rq; } +static void get_request_wait_wakeup(request_queue_t *q, int rw) +{ + /* + * avoid losing an unplug if a second __get_request_wait did the + * generic_unplug_device while our __get_request_wait was running + * w/o the queue_lock held and w/ our request out of the queue. 
+ */ + if (waitqueue_active(&q->wait_for_requests[rw])) + wake_up(&q->wait_for_requests[rw]); +} + /* RO fail safe mechanism */ static long ro_bits[MAX_BLKDEV][8]; @@ -835,8 +854,11 @@ void blkdev_release_request(struct reque */ if (q) { list_add(&req->queue, &q->rq[rw].free); - if (++q->rq[rw].count >= q->batch_requests) - wake_up(&q->wait_for_requests[rw]); + if (++q->rq[rw].count >= q->batch_requests) { + smp_mb(); + if (waitqueue_active(&q->wait_for_requests[rw])) + wake_up(&q->wait_for_requests[rw]); + } } } @@ -954,7 +976,6 @@ static int __make_request(request_queue_ */ max_sectors = get_max_sectors(bh->b_rdev); -again: req = NULL; head = &q->queue_head; /* @@ -963,6 +984,7 @@ again: */ spin_lock_irq(q->queue_lock); +again: insert_here = head->prev; if (list_empty(head)) { q->plug_device_fn(q, bh->b_rdev); /* is atomic */ @@ -1048,6 +1070,9 @@ get_rq: if (req == NULL) { spin_unlock_irq(q->queue_lock); freereq = __get_request_wait(q, rw); + head = &q->queue_head; + spin_lock_irq(q->queue_lock); + get_request_wait_wakeup(q, rw); goto again; } } @@ -1202,8 +1227,21 @@ void __submit_bh(int rw, struct buffer_h bh->b_rdev = bh->b_dev; bh->b_rsector = blocknr; + /* + * we play with the bh wait queue below, need to keep a + * reference so the buffer doesn't get freed after the + * end_io handler runs + */ + get_bh(bh); + generic_make_request(rw, bh); + /* fix race condition with wait_on_buffer() */ + smp_mb(); /* spin_unlock may have inclusive semantics */ + if (waitqueue_active(&bh->b_wait)) + wake_up(&bh->b_wait); + put_bh(bh); + switch (rw) { case WRITE: kstat.pgpgout += count; diff -urNp --exclude CVS --exclude BitKeeper xx-ref/fs/buffer.c xx/fs/buffer.c --- xx-ref/fs/buffer.c 2003-06-07 15:22:23.000000000 +0200 +++ xx/fs/buffer.c 2003-06-07 15:22:27.000000000 +0200 @@ -158,10 +158,23 @@ void __wait_on_buffer(struct buffer_head get_bh(bh); add_wait_queue(&bh->b_wait, &wait); do { - run_task_queue(&tq_disk); set_task_state(tsk, TASK_UNINTERRUPTIBLE); if 
(!buffer_locked(bh)) break; + /* + * We must read tq_disk in TQ_ACTIVE after the + * add_wait_queue effect is visible to other cpus. + * We could unplug some line above it wouldn't matter + * but we can't do that right after add_wait_queue + * without an smp_mb() in between because spin_unlock + * has inclusive semantics. + * Doing it here is the most efficient place so we + * don't do a suprious unplug if we get a racy + * wakeup that make buffer_locked to return 0, and + * doing it here avoids an explicit smp_mb() we + * rely on the implicit one in set_task_state. + */ + run_task_queue(&tq_disk); schedule(); } while (buffer_locked(bh)); tsk->state = TASK_RUNNING; @@ -1471,6 +1484,7 @@ static int __block_write_full_page(struc if (!page->buffers) create_empty_buffers(page, inode->i_dev, 1 << inode->i_blkbits); + BUG_ON(page_count(page) < 3); head = page->buffers; block = page->index << (PAGE_CACHE_SHIFT - inode->i_blkbits); @@ -1517,6 +1531,9 @@ static int __block_write_full_page(struc /* Done - end_buffer_io_async will unlock */ SetPageUptodate(page); + + wakeup_page_waiters(page); + return 0; out: @@ -1548,6 +1565,7 @@ out: } while (bh != head); if (need_unlock) UnlockPage(page); + wakeup_page_waiters(page); return err; } @@ -1721,6 +1739,7 @@ int block_read_full_page(struct page *pa blocksize = 1 << inode->i_blkbits; if (!page->buffers) create_empty_buffers(page, inode->i_dev, blocksize); + BUG_ON(page_count(page) < 3); head = page->buffers; blocks = PAGE_CACHE_SIZE >> inode->i_blkbits; @@ -1781,6 +1800,8 @@ int block_read_full_page(struct page *pa else submit_bh(READ, bh); } + + wakeup_page_waiters(page); return 0; } @@ -2400,6 +2421,7 @@ int brw_page(int rw, struct page *page, if (!page->buffers) create_empty_buffers(page, dev, size); + BUG_ON(page_count(page) < 3); head = bh = page->buffers; /* Stage 1: lock all the buffers */ @@ -2417,6 +2439,7 @@ int brw_page(int rw, struct page *page, submit_bh(rw, bh); bh = next; } while (bh != head); + 
wakeup_page_waiters(page); return 0; } diff -urNp --exclude CVS --exclude BitKeeper xx-ref/fs/reiserfs/inode.c xx/fs/reiserfs/inode.c --- xx-ref/fs/reiserfs/inode.c 2003-06-07 15:22:11.000000000 +0200 +++ xx/fs/reiserfs/inode.c 2003-06-07 15:22:27.000000000 +0200 @@ -2048,6 +2048,7 @@ static int reiserfs_write_full_page(stru */ if (nr) { submit_bh_for_writepage(arr, nr) ; + wakeup_page_waiters(page); } else { UnlockPage(page) ; } diff -urNp --exclude CVS --exclude BitKeeper xx-ref/include/linux/pagemap.h xx/include/linux/pagemap.h --- xx-ref/include/linux/pagemap.h 2003-06-07 15:22:23.000000000 +0200 +++ xx/include/linux/pagemap.h 2003-06-07 15:22:27.000000000 +0200 @@ -98,6 +98,8 @@ static inline void wait_on_page(struct p ___wait_on_page(page); } +extern void FASTCALL(wakeup_page_waiters(struct page * page)); + /* * Returns locked page at given index in given cache, creating it if needed. */ diff -urNp --exclude CVS --exclude BitKeeper xx-ref/kernel/ksyms.c xx/kernel/ksyms.c --- xx-ref/kernel/ksyms.c 2003-06-07 15:22:23.000000000 +0200 +++ xx/kernel/ksyms.c 2003-06-07 15:22:27.000000000 +0200 @@ -319,6 +319,7 @@ EXPORT_SYMBOL(filemap_fdatasync); EXPORT_SYMBOL(filemap_fdatawait); EXPORT_SYMBOL(lock_page); EXPORT_SYMBOL(unlock_page); +EXPORT_SYMBOL(wakeup_page_waiters); /* device registration */ EXPORT_SYMBOL(register_chrdev); diff -urNp --exclude CVS --exclude BitKeeper xx-ref/mm/filemap.c xx/mm/filemap.c --- xx-ref/mm/filemap.c 2003-06-07 15:22:23.000000000 +0200 +++ xx/mm/filemap.c 2003-06-07 15:22:27.000000000 +0200 @@ -779,6 +779,20 @@ inline wait_queue_head_t * page_waitqueu return wait_table_hashfn(page, &pgdat->wait_table); } +/* + * This must be called after every submit_bh with end_io + * callbacks that would result into the blkdev layer waking + * up the page after a queue unplug. 
+ */ +void wakeup_page_waiters(struct page * page) +{ + wait_queue_head_t * head; + + head = page_waitqueue(page); + if (waitqueue_active(head)) + wake_up(head); +} + /* * Wait for a page to get unlocked. * diff -urNp --exclude CVS --exclude BitKeeper xx-ref/mm/swapfile.c xx/mm/swapfile.c --- xx-ref/mm/swapfile.c 2003-06-07 15:22:23.000000000 +0200 +++ xx/mm/swapfile.c 2003-06-07 15:22:44.000000000 +0200 @@ -984,8 +984,10 @@ asmlinkage long sys_swapon(const char * goto bad_swap; } + get_page(virt_to_page(swap_header)); lock_page(virt_to_page(swap_header)); rw_swap_page_nolock(READ, SWP_ENTRY(type,0), (char *) swap_header); + put_page(virt_to_page(swap_header)); if (!memcmp("SWAP-SPACE",swap_header->magic.magic,10)) swap_header_version = 1; [-- Attachment #3: 9981_elevator-lowlatency-5 --] [-- Type: text/plain, Size: 23407 bytes --] Binary files x-ref/ID and x/ID differ diff -urNp --exclude CVS --exclude BitKeeper x-ref/drivers/block/DAC960.c x/drivers/block/DAC960.c --- x-ref/drivers/block/DAC960.c 2002-11-29 02:22:58.000000000 +0100 +++ x/drivers/block/DAC960.c 2003-06-07 12:37:50.000000000 +0200 @@ -19,8 +19,8 @@ */ -#define DAC960_DriverVersion "2.4.11" -#define DAC960_DriverDate "11 October 2001" +#define DAC960_DriverVersion "2.4.20aa1" +#define DAC960_DriverDate "4 December 2002" #include <linux/version.h> @@ -2975,8 +2975,9 @@ static boolean DAC960_ProcessRequest(DAC Command->SegmentCount = Request->nr_segments; Command->BufferHeader = Request->bh; Command->RequestBuffer = Request->buffer; + Command->Request = Request; blkdev_dequeue_request(Request); - blkdev_release_request(Request); + /* blkdev_release_request(Request); */ DAC960_QueueReadWriteCommand(Command); return true; } @@ -3023,11 +3024,12 @@ static void DAC960_RequestFunction(Reque individual Buffer. 
*/ -static inline void DAC960_ProcessCompletedBuffer(BufferHeader_T *BufferHeader, +static inline void DAC960_ProcessCompletedBuffer(IO_Request_T *Req, BufferHeader_T *BufferHeader, boolean SuccessfulIO) { - blk_finished_io(BufferHeader->b_size >> 9); + blk_finished_io(Req, BufferHeader->b_size >> 9); BufferHeader->b_end_io(BufferHeader, SuccessfulIO); + } @@ -3116,9 +3118,10 @@ static void DAC960_V1_ProcessCompletedCo { BufferHeader_T *NextBufferHeader = BufferHeader->b_reqnext; BufferHeader->b_reqnext = NULL; - DAC960_ProcessCompletedBuffer(BufferHeader, true); + DAC960_ProcessCompletedBuffer(Command->Request, BufferHeader, true); BufferHeader = NextBufferHeader; } + blkdev_release_request(Command->Request); if (Command->Completion != NULL) { complete(Command->Completion); @@ -3161,7 +3164,7 @@ static void DAC960_V1_ProcessCompletedCo { BufferHeader_T *NextBufferHeader = BufferHeader->b_reqnext; BufferHeader->b_reqnext = NULL; - DAC960_ProcessCompletedBuffer(BufferHeader, false); + DAC960_ProcessCompletedBuffer(Command->Request, BufferHeader, false); BufferHeader = NextBufferHeader; } if (Command->Completion != NULL) @@ -3169,6 +3172,7 @@ static void DAC960_V1_ProcessCompletedCo complete(Command->Completion); Command->Completion = NULL; } + blkdev_release_request(Command->Request); } } else if (CommandType == DAC960_ReadRetryCommand || @@ -3180,12 +3184,12 @@ static void DAC960_V1_ProcessCompletedCo Perform completion processing for this single buffer. 
*/ if (CommandStatus == DAC960_V1_NormalCompletion) - DAC960_ProcessCompletedBuffer(BufferHeader, true); + DAC960_ProcessCompletedBuffer(Command->Request, BufferHeader, true); else { if (CommandStatus != DAC960_V1_LogicalDriveNonexistentOrOffline) DAC960_V1_ReadWriteError(Command); - DAC960_ProcessCompletedBuffer(BufferHeader, false); + DAC960_ProcessCompletedBuffer(Command->Request, BufferHeader, false); } if (NextBufferHeader != NULL) { @@ -3203,6 +3207,7 @@ static void DAC960_V1_ProcessCompletedCo DAC960_QueueCommand(Command); return; } + blkdev_release_request(Command->Request); } else if (CommandType == DAC960_MonitoringCommand || CommandOpcode == DAC960_V1_Enquiry || @@ -4222,9 +4227,10 @@ static void DAC960_V2_ProcessCompletedCo { BufferHeader_T *NextBufferHeader = BufferHeader->b_reqnext; BufferHeader->b_reqnext = NULL; - DAC960_ProcessCompletedBuffer(BufferHeader, true); + DAC960_ProcessCompletedBuffer(Command->Request, BufferHeader, true); BufferHeader = NextBufferHeader; } + blkdev_release_request(Command->Request); if (Command->Completion != NULL) { complete(Command->Completion); @@ -4267,9 +4273,10 @@ static void DAC960_V2_ProcessCompletedCo { BufferHeader_T *NextBufferHeader = BufferHeader->b_reqnext; BufferHeader->b_reqnext = NULL; - DAC960_ProcessCompletedBuffer(BufferHeader, false); + DAC960_ProcessCompletedBuffer(Command->Request, BufferHeader, false); BufferHeader = NextBufferHeader; } + blkdev_release_request(Command->Request); if (Command->Completion != NULL) { complete(Command->Completion); @@ -4286,12 +4293,12 @@ static void DAC960_V2_ProcessCompletedCo Perform completion processing for this single buffer. 
*/ if (CommandStatus == DAC960_V2_NormalCompletion) - DAC960_ProcessCompletedBuffer(BufferHeader, true); + DAC960_ProcessCompletedBuffer(Command->Request, BufferHeader, true); else { if (Command->V2.RequestSense.SenseKey != DAC960_SenseKey_NotReady) DAC960_V2_ReadWriteError(Command); - DAC960_ProcessCompletedBuffer(BufferHeader, false); + DAC960_ProcessCompletedBuffer(Command->Request, BufferHeader, false); } if (NextBufferHeader != NULL) { @@ -4319,6 +4326,7 @@ static void DAC960_V2_ProcessCompletedCo DAC960_QueueCommand(Command); return; } + blkdev_release_request(Command->Request); } else if (CommandType == DAC960_MonitoringCommand) { diff -urNp --exclude CVS --exclude BitKeeper x-ref/drivers/block/DAC960.h x/drivers/block/DAC960.h --- x-ref/drivers/block/DAC960.h 2002-01-22 18:54:52.000000000 +0100 +++ x/drivers/block/DAC960.h 2003-06-07 12:37:50.000000000 +0200 @@ -2282,6 +2282,7 @@ typedef struct DAC960_Command unsigned int SegmentCount; BufferHeader_T *BufferHeader; void *RequestBuffer; + IO_Request_T *Request; union { struct { DAC960_V1_CommandMailbox_T CommandMailbox; @@ -4265,12 +4266,4 @@ static void DAC960_Message(DAC960_Messag static void DAC960_CreateProcEntries(void); static void DAC960_DestroyProcEntries(void); - -/* - Export the Kernel Mode IOCTL interface. 
-*/ - -EXPORT_SYMBOL(DAC960_KernelIOCTL); - - #endif /* DAC960_DriverVersion */ diff -urNp --exclude CVS --exclude BitKeeper x-ref/drivers/block/cciss.c x/drivers/block/cciss.c --- x-ref/drivers/block/cciss.c 2003-06-07 12:37:40.000000000 +0200 +++ x/drivers/block/cciss.c 2003-06-07 12:37:50.000000000 +0200 @@ -1990,14 +1990,14 @@ static void start_io( ctlr_info_t *h) } } -static inline void complete_buffers( struct buffer_head *bh, int status) +static inline void complete_buffers(struct request * req, struct buffer_head *bh, int status) { struct buffer_head *xbh; while(bh) { xbh = bh->b_reqnext; bh->b_reqnext = NULL; - blk_finished_io(bh->b_size >> 9); + blk_finished_io(req, bh->b_size >> 9); bh->b_end_io(bh, status); bh = xbh; } @@ -2140,7 +2140,7 @@ static inline void complete_command( ctl pci_unmap_page(hba[cmd->ctlr]->pdev, temp64.val, cmd->SG[i].Len, ddir); } - complete_buffers(cmd->rq->bh, status); + complete_buffers(cmd->rq, cmd->rq->bh, status); #ifdef CCISS_DEBUG printk("Done with %p\n", cmd->rq); #endif /* CCISS_DEBUG */ @@ -2224,7 +2224,7 @@ next: printk(KERN_WARNING "doreq cmd for %d, %x at %p\n", h->ctlr, creq->rq_dev, creq); blkdev_dequeue_request(creq); - complete_buffers(creq->bh, 0); + complete_buffers(creq, creq->bh, 0); end_that_request_last(creq); goto startio; } diff -urNp --exclude CVS --exclude BitKeeper x-ref/drivers/block/cpqarray.c x/drivers/block/cpqarray.c --- x-ref/drivers/block/cpqarray.c 2003-06-07 12:37:38.000000000 +0200 +++ x/drivers/block/cpqarray.c 2003-06-07 12:37:50.000000000 +0200 @@ -169,7 +169,7 @@ static void start_io(ctlr_info_t *h); static inline void addQ(cmdlist_t **Qptr, cmdlist_t *c); static inline cmdlist_t *removeQ(cmdlist_t **Qptr, cmdlist_t *c); -static inline void complete_buffers(struct buffer_head *bh, int ok); +static inline void complete_buffers(struct request * req, struct buffer_head *bh, int ok); static inline void complete_command(cmdlist_t *cmd, int timeout); static void do_ida_intr(int irq, void 
*dev_id, struct pt_regs * regs); @@ -981,7 +981,7 @@ next: printk(KERN_WARNING "doreq cmd for %d, %x at %p\n", h->ctlr, creq->rq_dev, creq); blkdev_dequeue_request(creq); - complete_buffers(creq->bh, 0); + complete_buffers(creq, creq->bh, 0); end_that_request_last(creq); goto startio; } @@ -1082,14 +1082,14 @@ static void start_io(ctlr_info_t *h) } } -static inline void complete_buffers(struct buffer_head *bh, int ok) +static inline void complete_buffers(struct request * req, struct buffer_head *bh, int ok) { struct buffer_head *xbh; while(bh) { xbh = bh->b_reqnext; bh->b_reqnext = NULL; - blk_finished_io(bh->b_size >> 9); + blk_finished_io(req, bh->b_size >> 9); bh->b_end_io(bh, ok); bh = xbh; @@ -1131,7 +1131,7 @@ static inline void complete_command(cmdl (cmd->req.hdr.cmd == IDA_READ) ? PCI_DMA_FROMDEVICE : PCI_DMA_TODEVICE); } - complete_buffers(cmd->rq->bh, ok); + complete_buffers(cmd->rq, cmd->rq->bh, ok); DBGPX(printk("Done with %p\n", cmd->rq);); req_finished_io(cmd->rq); end_that_request_last(cmd->rq); diff -urNp --exclude CVS --exclude BitKeeper x-ref/drivers/block/ll_rw_blk.c x/drivers/block/ll_rw_blk.c --- x-ref/drivers/block/ll_rw_blk.c 2003-06-07 12:37:48.000000000 +0200 +++ x/drivers/block/ll_rw_blk.c 2003-06-07 12:53:40.000000000 +0200 @@ -183,11 +183,12 @@ void blk_cleanup_queue(request_queue_t * { int count = q->nr_requests; - count -= __blk_cleanup_queue(&q->rq[READ]); - count -= __blk_cleanup_queue(&q->rq[WRITE]); + count -= __blk_cleanup_queue(&q->rq); if (count) printk("blk_cleanup_queue: leaked requests (%d)\n", count); + if (atomic_read(&q->nr_sectors)) + printk("blk_cleanup_queue: leaked sectors (%d)\n", atomic_read(&q->nr_sectors)); memset(q, 0, sizeof(*q)); } @@ -396,7 +397,7 @@ void generic_unplug_device(void *data) * * Returns the (new) number of requests which the queue has available. 
*/ -int blk_grow_request_list(request_queue_t *q, int nr_requests) +int blk_grow_request_list(request_queue_t *q, int nr_requests, int max_queue_sectors) { unsigned long flags; /* Several broken drivers assume that this function doesn't sleep, @@ -406,21 +407,31 @@ int blk_grow_request_list(request_queue_ spin_lock_irqsave(q->queue_lock, flags); while (q->nr_requests < nr_requests) { struct request *rq; - int rw; rq = kmem_cache_alloc(request_cachep, SLAB_ATOMIC); if (rq == NULL) break; memset(rq, 0, sizeof(*rq)); rq->rq_status = RQ_INACTIVE; - rw = q->nr_requests & 1; - list_add(&rq->queue, &q->rq[rw].free); - q->rq[rw].count++; + list_add(&rq->queue, &q->rq.free); + q->rq.count++; q->nr_requests++; } + + /* + * Wakeup waiters after both one quarter of the + * max-in-fligh queue and one quarter of the requests + * are available again. + */ q->batch_requests = q->nr_requests / 4; if (q->batch_requests > 32) q->batch_requests = 32; + q->batch_sectors = max_queue_sectors / 4; + + q->max_queue_sectors = max_queue_sectors; + + BUG_ON(!q->batch_sectors); + atomic_set(&q->nr_sectors, 0); spin_unlock_irqrestore(q->queue_lock, flags); return q->nr_requests; } @@ -429,23 +440,26 @@ static void blk_init_free_list(request_q { struct sysinfo si; int megs; /* Total memory, in megabytes */ - int nr_requests; + int nr_requests, max_queue_sectors = MAX_QUEUE_SECTORS; - INIT_LIST_HEAD(&q->rq[READ].free); - INIT_LIST_HEAD(&q->rq[WRITE].free); - q->rq[READ].count = 0; - q->rq[WRITE].count = 0; + INIT_LIST_HEAD(&q->rq.free); + q->rq.count = 0; q->nr_requests = 0; si_meminfo(&si); megs = si.totalram >> (20 - PAGE_SHIFT); - nr_requests = 128; - if (megs < 32) + nr_requests = MAX_NR_REQUESTS; + if (megs < 30) { nr_requests /= 2; - blk_grow_request_list(q, nr_requests); + max_queue_sectors /= 2; + } + /* notice early if anybody screwed the defaults */ + BUG_ON(!nr_requests); + BUG_ON(!max_queue_sectors); + + blk_grow_request_list(q, nr_requests, max_queue_sectors); - 
init_waitqueue_head(&q->wait_for_requests[0]); - init_waitqueue_head(&q->wait_for_requests[1]); + init_waitqueue_head(&q->wait_for_requests); } static int __make_request(request_queue_t * q, int rw, struct buffer_head * bh); @@ -514,12 +528,19 @@ void blk_init_queue(request_queue_t * q, * Get a free request. io_request_lock must be held and interrupts * disabled on the way in. Returns NULL if there are no free requests. */ +static struct request * FASTCALL(get_request(request_queue_t *q, int rw)); static struct request *get_request(request_queue_t *q, int rw) { struct request *rq = NULL; - struct request_list *rl = q->rq + rw; + struct request_list *rl; - if (!list_empty(&rl->free)) { + if (blk_oversized_queue(q)) + goto out; + + rl = &q->rq; + if (list_empty(&rl->free)) + q->full = 1; + if (!q->full) { rq = blkdev_free_rq(&rl->free); list_del(&rq->queue); rl->count--; @@ -529,6 +550,7 @@ static struct request *get_request(reque rq->q = q; } + out: return rq; } @@ -596,10 +618,25 @@ static struct request *__get_request_wai register struct request *rq; DECLARE_WAITQUEUE(wait, current); - add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait); + add_wait_queue_exclusive(&q->wait_for_requests, &wait); do { set_current_state(TASK_UNINTERRUPTIBLE); - if (q->rq[rw].count == 0) { + + /* + * We must read rq.count and blk_oversized_queue() + * and unplug the queue atomically (with the + * spinlock being held for the whole duration of the + * operation). Otherwise we risk to unplug the queue + * before the request is visible in the I/O queue. + * + * On the __make_request side we depend on get_request, + * get_request_wait_wakeup and blk_started_io to run + * under the q->queue_lock and to never release it + * until the request is visible in the I/O queue + * (i.e. after add_request). + */ + spin_lock_irq(q->queue_lock); + if (q->full || blk_oversized_queue(q)) { /* * All we care about is not to stall if any request * is been released after we set TASK_UNINTERRUPTIBLE. 
@@ -607,14 +644,16 @@ static struct request *__get_request_wai * in case we hit the race and we can get the request * without waiting. */ - generic_unplug_device(q); + __generic_unplug_device(q); + + spin_unlock_irq(q->queue_lock); schedule(); + spin_lock_irq(q->queue_lock); } - spin_lock_irq(q->queue_lock); rq = get_request(q, rw); spin_unlock_irq(q->queue_lock); } while (rq == NULL); - remove_wait_queue(&q->wait_for_requests[rw], &wait); + remove_wait_queue(&q->wait_for_requests, &wait); current->state = TASK_RUNNING; return rq; } @@ -626,8 +665,8 @@ static void get_request_wait_wakeup(requ * generic_unplug_device while our __get_request_wait was running * w/o the queue_lock held and w/ our request out of the queue. */ - if (waitqueue_active(&q->wait_for_requests[rw])) - wake_up(&q->wait_for_requests[rw]); + if (waitqueue_active(&q->wait_for_requests)) + wake_up(&q->wait_for_requests); } /* RO fail safe mechanism */ @@ -843,7 +882,6 @@ static inline void add_request(request_q void blkdev_release_request(struct request *req) { request_queue_t *q = req->q; - int rw = req->cmd; req->rq_status = RQ_INACTIVE; req->q = NULL; @@ -853,11 +891,13 @@ void blkdev_release_request(struct reque * assume it has free buffers and check waiters */ if (q) { - list_add(&req->queue, &q->rq[rw].free); - if (++q->rq[rw].count >= q->batch_requests) { + list_add(&req->queue, &q->rq.free); + if (++q->rq.count >= q->batch_requests && !blk_oversized_queue_batch(q)) { + if (q->full) + q->full = 0; smp_mb(); - if (waitqueue_active(&q->wait_for_requests[rw])) - wake_up(&q->wait_for_requests[rw]); + if (waitqueue_active(&q->wait_for_requests)) + wake_up(&q->wait_for_requests); } } } @@ -1003,7 +1043,7 @@ again: req->bhtail->b_reqnext = bh; req->bhtail = bh; req->nr_sectors = req->hard_nr_sectors += count; - blk_started_io(count); + blk_started_io(req, count); drive_stat_acct(req->rq_dev, req->cmd, count, 0); req_new_io(req, 1, count); attempt_back_merge(q, req, max_sectors, max_segments); @@ 
-1025,7 +1065,7 @@ again: req->current_nr_sectors = req->hard_cur_sectors = count; req->sector = req->hard_sector = sector; req->nr_sectors = req->hard_nr_sectors += count; - blk_started_io(count); + blk_started_io(req, count); drive_stat_acct(req->rq_dev, req->cmd, count, 0); req_new_io(req, 1, count); attempt_front_merge(q, head, req, max_sectors, max_segments); @@ -1058,7 +1098,7 @@ get_rq: * See description above __get_request_wait() */ if (rw_ahead) { - if (q->rq[rw].count < q->batch_requests) { + if (q->rq.count < q->batch_requests || blk_oversized_queue_batch(q)) { spin_unlock_irq(q->queue_lock); goto end_io; } @@ -1094,7 +1134,7 @@ get_rq: req->rq_dev = bh->b_rdev; req->start_time = jiffies; req_new_io(req, 0, count); - blk_started_io(count); + blk_started_io(req, count); add_request(q, req, insert_here); out: if (freereq) @@ -1391,7 +1431,7 @@ int end_that_request_first (struct reque if ((bh = req->bh) != NULL) { nsect = bh->b_size >> 9; - blk_finished_io(nsect); + blk_finished_io(req, nsect); req->bh = bh->b_reqnext; bh->b_reqnext = NULL; bh->b_end_io(bh, uptodate); diff -urNp --exclude CVS --exclude BitKeeper x-ref/drivers/scsi/scsi_lib.c x/drivers/scsi/scsi_lib.c --- x-ref/drivers/scsi/scsi_lib.c 2003-06-07 12:37:47.000000000 +0200 +++ x/drivers/scsi/scsi_lib.c 2003-06-07 12:37:50.000000000 +0200 @@ -384,7 +384,7 @@ static Scsi_Cmnd *__scsi_end_request(Scs do { if ((bh = req->bh) != NULL) { nsect = bh->b_size >> 9; - blk_finished_io(nsect); + blk_finished_io(req, nsect); req->bh = bh->b_reqnext; bh->b_reqnext = NULL; sectors -= nsect; diff -urNp --exclude CVS --exclude BitKeeper x-ref/include/linux/blkdev.h x/include/linux/blkdev.h --- x-ref/include/linux/blkdev.h 2003-06-07 12:37:47.000000000 +0200 +++ x/include/linux/blkdev.h 2003-06-07 12:49:16.000000000 +0200 @@ -64,12 +64,6 @@ typedef int (make_request_fn) (request_q typedef void (plug_device_fn) (request_queue_t *q, kdev_t device); typedef void (unplug_device_fn) (void *q); -/* - * Default nr free 
requests per queue, ll_rw_blk will scale it down - * according to available RAM at init time - */ -#define QUEUE_NR_REQUESTS 8192 - struct request_list { unsigned int count; struct list_head free; @@ -80,7 +74,7 @@ struct request_queue /* * the queue request freelist, one for reads and one for writes */ - struct request_list rq[2]; + struct request_list rq; /* * The total number of requests on each queue @@ -93,6 +87,21 @@ struct request_queue int batch_requests; /* + * The total number of 512byte blocks on each queue + */ + atomic_t nr_sectors; + + /* + * Batching threshold for sleep/wakeup decisions + */ + int batch_sectors; + + /* + * The max number of 512byte blocks on each queue + */ + int max_queue_sectors; + + /* * Together with queue_head for cacheline sharing */ struct list_head queue_head; @@ -118,13 +127,20 @@ struct request_queue /* * Boolean that indicates whether this queue is plugged or not. */ - char plugged; + int plugged:1; /* * Boolean that indicates whether current_request is active or * not. 
*/ - char head_active; + int head_active:1; + + /* + * Booleans that indicate whether the queue's free requests have + * been exhausted and is waiting to drop below the batch_requests + * threshold + */ + int full:1; unsigned long bounce_pfn; @@ -137,7 +153,7 @@ struct request_queue /* * Tasks wait here for free read and write requests */ - wait_queue_head_t wait_for_requests[2]; + wait_queue_head_t wait_for_requests; }; #define blk_queue_plugged(q) (q)->plugged @@ -221,7 +237,7 @@ extern void blkdev_release_request(struc /* * Access functions for manipulating queue properties */ -extern int blk_grow_request_list(request_queue_t *q, int nr_requests); +extern int blk_grow_request_list(request_queue_t *q, int nr_requests, int max_queue_sectors); extern void blk_init_queue(request_queue_t *, request_fn_proc *); extern void blk_cleanup_queue(request_queue_t *); extern void blk_queue_headactive(request_queue_t *, int); @@ -245,6 +261,8 @@ extern char * blkdev_varyio[MAX_BLKDEV]; #define MAX_SEGMENTS 128 #define MAX_SECTORS 255 +#define MAX_QUEUE_SECTORS (4 << (20 - 9)) /* 4 mbytes when full sized */ +#define MAX_NR_REQUESTS 1024 /* 1024k when in 512 units, normally min is 1M in 1k units */ #define PageAlignSize(size) (((size) + PAGE_SIZE -1) & PAGE_MASK) @@ -271,8 +289,40 @@ static inline int get_hardsect_size(kdev return retval; } -#define blk_finished_io(nsects) do { } while (0) -#define blk_started_io(nsects) do { } while (0) +static inline int blk_oversized_queue(request_queue_t * q) +{ + return atomic_read(&q->nr_sectors) > q->max_queue_sectors; +} + +static inline int blk_oversized_queue_batch(request_queue_t * q) +{ + return atomic_read(&q->nr_sectors) > q->max_queue_sectors - q->batch_sectors; +} + +static inline void blk_started_io(struct request * req, int nsects) +{ + request_queue_t * q = req->q; + + if (q) + atomic_add(nsects, &q->nr_sectors); + BUG_ON(atomic_read(&q->nr_sectors) < 0); +} + +static inline void blk_finished_io(struct request * req, int 
nsects) +{ + request_queue_t * q = req->q; + + /* special requests belongs to a null queue */ + if (q) { + atomic_sub(nsects, &q->nr_sectors); + if (q->rq.count >= q->batch_requests && !blk_oversized_queue_batch(q)) { + smp_mb(); + if (waitqueue_active(&q->wait_for_requests)) + wake_up(&q->wait_for_requests); + } + } + BUG_ON(atomic_read(&q->nr_sectors) < 0); +} static inline unsigned int blksize_bits(unsigned int size) { diff -urNp --exclude CVS --exclude BitKeeper x-ref/include/linux/elevator.h x/include/linux/elevator.h --- x-ref/include/linux/elevator.h 2002-11-29 02:23:18.000000000 +0100 +++ x/include/linux/elevator.h 2003-06-07 12:37:50.000000000 +0200 @@ -80,7 +80,7 @@ static inline int elevator_request_laten return latency; } -#define ELV_LINUS_SEEK_COST 16 +#define ELV_LINUS_SEEK_COST 1 #define ELEVATOR_NOOP \ ((elevator_t) { \ @@ -93,8 +93,8 @@ static inline int elevator_request_laten #define ELEVATOR_LINUS \ ((elevator_t) { \ - 2048, /* read passovers */ \ - 8192, /* write passovers */ \ + 128, /* read passovers */ \ + 512, /* write passovers */ \ \ elevator_linus_merge, /* elevator_merge_fn */ \ elevator_linus_merge_req, /* elevator_merge_req_fn */ \ diff -urNp --exclude CVS --exclude BitKeeper x-ref/include/linux/nbd.h x/include/linux/nbd.h --- x-ref/include/linux/nbd.h 2003-04-01 12:07:54.000000000 +0200 +++ x/include/linux/nbd.h 2003-06-07 12:37:50.000000000 +0200 @@ -48,7 +48,7 @@ nbd_end_request(struct request *req) spin_lock_irqsave(&io_request_lock, flags); while((bh = req->bh) != NULL) { nsect = bh->b_size >> 9; - blk_finished_io(nsect); + blk_finished_io(req, nsect); req->bh = bh->b_reqnext; bh->b_reqnext = NULL; bh->b_end_io(bh, uptodate); ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6)
  2003-06-09 22:19 ` Andrea Arcangeli
@ 2003-06-10  0:27   ` Chris Mason
  2003-06-10 23:13   ` Chris Mason
  1 sibling, 0 replies; 114+ messages in thread
From: Chris Mason @ 2003-06-10 0:27 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Nick Piggin, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller

On Mon, 2003-06-09 at 18:19, Andrea Arcangeli wrote:
> > Anyway, less talk, more code. Treat this with care, it has only been
> > lightly tested. Thanks to Andrea and Nick whose patches this is largely
> > based on:
>
> I spent last Saturday working on this too. This is the status of my
> current patches, would be interesting to compare them. they're not very
> well tested yet though.
>

I'll try to get some numbers in the morning.

> They would obsoletes the old fix-pausing and the old elevator-lowlatency
> (I was going to release a new tree today, but I delayed it so I fixed
> uml today too first [tested with skas and w/o skas]).
>
> those backout the rc7 interactivity changes (the only one that wasn't in
> my tree was the add_wait_queue_exclusive, that IMHO would better stay
> for scalability reasons).
>

I didn't test without _exclusive for the final iteration of my patch, but in all the early ones using _exclusive improved latencies. I think people are reporting otherwise because they have hit the sweet spot for the number of procs going after the requests. With _exclusive they have a higher chance of getting starved by a new process coming in; without the _exclusive, each waiter has a fighting chance of getting to the free request on their own. Hopefully we can do better with the _exclusive, since it does seem to scale much better.

Aside from the io-in-flight calculations, the major difference between our patches is in __get_request_wait. Once a process waits once, that call to __get_request_wait ignores q->full in my code. I found the q->full checks did help, but as you increased the number of concurrent readers/writers, things broke down to the old high latencies. By delaying the point where q->full was cleared, I could make the latency benefit last for a higher number of procs. Finally I gave up and left it set until all the waiters were gone, which seems to have the most consistent results. The interesting part was that it didn't seem to change the hit in throughput. The cost was about the same between the original patch and my final one, but I need to test more.

-chris

^ permalink raw reply	[flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6)
  2003-06-09 22:19 ` Andrea Arcangeli
  2003-06-10  0:27   ` Chris Mason
@ 2003-06-10 23:13   ` Chris Mason
  2003-06-11  0:16     ` Andrea Arcangeli
  1 sibling, 1 reply; 114+ messages in thread
From: Chris Mason @ 2003-06-10 23:13 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Nick Piggin, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller

On Mon, 2003-06-09 at 18:19, Andrea Arcangeli wrote:
> I spent last Saturday working on this too. This is the status of my
> current patches, would be interesting to compare them. they're not very
> well tested yet though.
>
> They would obsoletes the old fix-pausing and the old elevator-lowlatency
> (I was going to release a new tree today, but I delayed it so I fixed
> uml today too first [tested with skas and w/o skas]).
>
> those backout the rc7 interactivity changes (the only one that wasn't in
> my tree was the add_wait_queue_exclusive, that IMHO would better stay
> for scalability reasons).
>
> Of course I would be very interested to know if those two patches (or
> Chris's one, you also retained the exclusive wakeup) are still greatly
> improved by removing the _exclusive weakups and going wake-all (in
> theory they shouldn't).

Ok, I merged these into rc7 along with the __get_request_wait stats patch. All numbers below were on ext2... I'm calling your patches -aa, even though it's just a small part of the real -aa ;-)

After a dbench 50 run, the -aa __get_request_wait latencies look like this:

device 08:01: num_req 6029, total jiffies waited 213475
	844 forced to wait
	2 min wait, 806 max wait
	252 average wait
	357 < 100, 29 < 200, 110 < 300, 111 < 400, 82 < 500
	155 waits longer than 500 jiffies

I changed my patch to have q->nr_requests at 1024 like yours, and reran the dbench 50:

device 08:01: num_req 11122, total jiffies waited 121573
	8782 forced to wait
	1 min wait, 237 max wait
	13 average wait
	8654 < 100, 126 < 200, 2 < 300, 0 < 400, 0 < 500
	0 waits longer than 500 jiffies

So, I had 5000 more requests for the same workload, and 8000 of my requests were forced to wait (compared to 844 of yours). But the total number of jiffies spent waiting on my patch was lower, as were the average and max waits. Increasing the number of requests with my patch made the system feel slower, even though the __get_request_wait latency numbers didn't change.

On this dbench run, you got a throughput of 118MB/s and I got 90MB/s. The __get_request_wait latency numbers were reliable across runs, but I might as well have thrown a dart to pick throughput numbers. So, the next tests were done with iozone.

On -aa, after iozone -s 100M -i 0 -t 20 (20 procs each doing streaming writes to a private 100M file):

device 08:01: num_req 167133, total jiffies waited 872566
	6424 forced to wait
	4 min wait, 507 max wait
	135 average wait
	2619 < 100, 2020 < 200, 1433 < 300, 325 < 400, 26 < 500
	1 waits longer than 500 jiffies

And the iozone throughput numbers looked like so (again -aa patches):

	Children see throughput for 20 initial writers = 13824.22 KB/sec
	Parent sees throughput for 20 initial writers  =  6811.29 KB/sec
	Min throughput per process                     =   451.99 KB/sec
	Max throughput per process                     =   904.14 KB/sec
	Avg throughput per process                     =   691.21 KB/sec
	Min xfer                                       = 51136.00 KB

The avg throughput per process with vanilla rc7 is 3MB/s; the best I've been able to do with nr_requests at higher levels was 1.3MB/s. With smaller numbers of iozone threads (10 and lower so far) I can match rc7 speeds, but not with 20 procs.

Anyway, my latency numbers for iozone -s 100M -i 0 -t 20:

device 08:01: num_req 146049, total jiffies waited 434025
	130670 forced to wait
	1 min wait, 65 max wait
	3 average wait
	130671 < 100, 0 < 200, 0 < 300, 0 < 400, 0 < 500
	0 waits longer than 500 jiffies

And the iozone reported throughput:

	Children see throughput for 20 initial writers = 19828.92 KB/sec
	Parent sees throughput for 20 initial writers  =  7003.36 KB/sec
	Min throughput per process                     =   526.61 KB/sec
	Max throughput per process                     =  1353.45 KB/sec
	Avg throughput per process                     =   991.45 KB/sec
	Min xfer                                       = 39968.00 KB

The patch I was working on today was almost the same as the one I posted yesterday, the only difference being the hunk below and changes to nr_requests (256 balanced nicely on my box; all numbers above were at 1024). This hunk against my patch from yesterday just avoids an unplug in __get_request_wait if there are still available requests. A process might be waiting in __get_request_wait just because the queue was full, which has little to do with the queue needing an unplug. He'll get woken up later by get_request_wait_wakeup if nobody else manages to wake him (I think).

diff -u edited/drivers/block/ll_rw_blk.c edited/drivers/block/ll_rw_blk.c
--- edited/drivers/block/ll_rw_blk.c	Mon Jun  9 17:13:16 2003
+++ edited/drivers/block/ll_rw_blk.c	Tue Jun 10 16:46:50 2003
@@ -661,7 +661,8 @@
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		spin_lock_irq(&io_request_lock);
 		if ((!waited && queue_full(q, rw)) || q->rq[rw].count == 0) {
-			__generic_unplug_device(q);
+			if (q->rq[rw].count == 0)
+				__generic_unplug_device(q);
 			spin_unlock_irq(&io_request_lock);
 			schedule();
 			spin_lock_irq(&io_request_lock);

^ permalink raw reply	[flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6)
  2003-06-10 23:13   ` Chris Mason
@ 2003-06-11  0:16     ` Andrea Arcangeli
  2003-06-11  0:44       ` Chris Mason
  0 siblings, 1 reply; 114+ messages in thread
From: Andrea Arcangeli @ 2003-06-11 0:16 UTC (permalink / raw)
To: Chris Mason
Cc: Nick Piggin, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller

On Tue, Jun 10, 2003 at 07:13:45PM -0400, Chris Mason wrote:
> On Mon, 2003-06-09 at 18:19, Andrea Arcangeli wrote:
> The avg throughput per process with vanilla rc7 is 3MB/s; the best I've
> been able to do with nr_requests at higher levels was 1.3MB/s. With
> smaller numbers of iozone threads (10 and lower so far) I can match rc7
> speeds, but not with 20 procs.

at least with my patches, I also made this change:

-#define ELV_LINUS_SEEK_COST	16
+#define ELV_LINUS_SEEK_COST	1

 #define ELEVATOR_NOOP							\
 ((elevator_t) {							\
@@ -93,8 +93,8 @@ static inline int elevator_request_laten

 #define ELEVATOR_LINUS							\
 ((elevator_t) {							\
-	2048,				/* read passovers */		\
-	8192,				/* write passovers */		\
+	128,				/* read passovers */		\
+	512,				/* write passovers */		\
 									\

you didn't change the I/O scheduler at all compared to mainline, so there can be quite a lot of difference in the bandwidth average per process between my patches and mainline and your patches (unless you run elvtune or unless you backed out the above).

Anyways, the "130671 < 100, 0 < 200, 0 < 300, 0 < 400, 0 < 500" from your patch sounds perfectly fair, and that's unrelated to the I/O scheduler and the size of the request queue. I believe the most interesting difference is the blocking of tasks until the waitqueue is empty (i.e. clearing the waitqueue-full bit only when nobody is waiting). That is the right thing to do of course; that was a bug in my patch I merged by mistake from Nick's original patch, and that I'm going to fix immediately of course.

Andrea

^ permalink raw reply	[flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) 2003-06-11 0:16 ` Andrea Arcangeli @ 2003-06-11 0:44 ` Chris Mason 0 siblings, 0 replies; 114+ messages in thread From: Chris Mason @ 2003-06-11 0:44 UTC (permalink / raw) To: Andrea Arcangeli Cc: Nick Piggin, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller

On Tue, 2003-06-10 at 20:16, Andrea Arcangeli wrote:
> On Tue, Jun 10, 2003 at 07:13:45PM -0400, Chris Mason wrote:
> > On Mon, 2003-06-09 at 18:19, Andrea Arcangeli wrote:
> > The avg throughput per process with vanilla rc7 is 3MB/s, the best I've
> > been able to do with nr_requests at higher levels was 1.3MB/s. With
> > smaller numbers of iozone threads (10 and lower so far) I can match rc7
> > speeds, but not with 20 procs.
>
> at least with my patches, I also made this change:
>
> -#define ELV_LINUS_SEEK_COST	16
> +#define ELV_LINUS_SEEK_COST	1
>
>  #define ELEVATOR_NOOP						\
>  	((elevator_t) {						\
> @@ -93,8 +93,8 @@ static inline int elevator_request_laten
>
>  #define ELEVATOR_LINUS						\
>  	((elevator_t) {						\
> -		2048,			/* read passovers */	\
> -		8192,			/* write passovers */	\
> +		128,			/* read passovers */	\
> +		512,			/* write passovers */	\
>  								\
>

Right, I had forgotten to elvtune these in before my runs. It shouldn't change the __get_request_wait numbers, except for changes in the percentage of merged requests leading to a different number of requests overall (which my numbers did show).

> you didn't change the I/O scheduler at all compared to mainline, so
> there can be quite a lot of difference in the bandwidth average per
> process between my patches and mainline and your patches (unless you run
> elvtune or unless you backed out the above).
>
> Anyways the 130671 < 100, 0 < 200, 0 < 300, 0 < 400, 0 < 500 from your
> patch sounds perfectly fair and that's unrelated to I/O scheduler and
> size of runqueue. I believe the most interesting difference is the
> blocking of tasks until the waitqueue is empty (i.e.
clearing the
> waitqueue-full bit only when nobody is waiting). That is the right thing
> to do of course; that was a bug in my patch I merged by mistake from
> Nick's original patch, and that I'm going to fix immediately of course.

OK, increasing q->nr_requests also changes the throughput in high merge workloads. Basically if we have 20 procs doing streaming buffered io, the buffers end up mixed together on the dirty list. So assuming we hit the hard dirty limit and all 20 procs are running write_some_buffers(), the only way we'll be able to efficiently merge the end result is if we can get in 20 * 32 requests before unplugging. This is because write_some_buffers grabs 32 buffers at a time, and each caller has to wait fairly in __get_request_wait. With only 128 requests in the run queue, the disk is unplugged before any of the 20 procs has submitted each of their 32 buffers.

It might make sense to change write_some_buffers to work in smaller units; 32 seems like a lot of times to wait in __get_request_wait just for an atime update.

-chris

^ permalink raw reply	[flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-09 21:39 ` [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) Chris Mason 2003-06-09 22:19 ` Andrea Arcangeli @ 2003-06-09 23:51 ` Nick Piggin 2003-06-10 0:32 ` Chris Mason 2003-06-10 1:48 ` Robert White 2003-06-11 0:33 ` [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) Andrea Arcangeli 2 siblings, 2 replies; 114+ messages in thread From: Nick Piggin @ 2003-06-09 23:51 UTC (permalink / raw) To: Chris Mason Cc: Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller Chris Mason wrote: >Ok, there are lots of different problems here, and I've spent a little >while trying to get some numbers with the __get_request_wait stats patch >I posted before. This is all on ext2, since I wanted to rule out >interactions with the journal flavors. > >Basically a dbench 90 run on ext2 rc6 vanilla kernels can generate >latencies of over 2700 jiffies in __get_request_wait, with an average >latency over 250 jiffies. > >No, most desktop workloads aren't dbench 90, but between balance_dirty() >and the way we send stuff to disk during memory allocations, just about >any process can get stuck submitting dirty buffers even if you've just >got one process doing a dd if=/dev/zero of=foo. > >So, for the moment I'm going to pretend people seeing stalls in X are >stuck in atime updates or memory allocations, or reading proc or some >other silly spot. > >For the SMP corner cases, I've merged Andrea's fix-pausing patch into >rc7, along with an altered form of Nick Piggin's queue_full patch to try >and fix the latency problems. > >The major difference from Nick's patch is that once the queue is marked >full, I don't clear the full flag until the wait queue is empty. This >means new io can't steal available requests until every existing waiter >has been granted a request. > Yes, this is probably a good idea. 
>
>The latency results are better, with average time spent in
>__get_request_wait being around 28 jiffies, and a max of 170 jiffies.
>The cost is throughput, further benchmarking needs to be done, but I
>wanted to get this out for review and testing. It should at least help
>us decide if the request allocation code really is causing our problems.
>

Well the latency numbers are good - is this with dbench 90?

snip

>+static inline void set_queue_full(request_queue_t *q, int rw)
>+{
>+	wmb();
>+	if (rw == READ)
>+		q->read_full = 1;
>+	else
>+		q->write_full = 1;
>+}
>+
>+static inline void clear_queue_full(request_queue_t *q, int rw)
>+{
>+	wmb();
>+	if (rw == READ)
>+		q->read_full = 0;
>+	else
>+		q->write_full = 0;
>+}
>+
>+static inline int queue_full(request_queue_t *q, int rw)
>+{
>+	rmb();
>+	if (rw == READ)
>+		return q->read_full;
>+	else
>+		return q->write_full;
>+}
>+

I don't think you need the barriers here, do you?

^ permalink raw reply	[flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-09 23:51 ` [PATCH] io stalls Nick Piggin @ 2003-06-10 0:32 ` Chris Mason 2003-06-10 0:47 ` Nick Piggin 2003-06-10 1:48 ` Robert White 1 sibling, 1 reply; 114+ messages in thread From: Chris Mason @ 2003-06-10 0:32 UTC (permalink / raw) To: Nick Piggin Cc: Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller

On Mon, 2003-06-09 at 19:51, Nick Piggin wrote:
> >
> >The latency results are better, with average time spent in
> >__get_request_wait being around 28 jiffies, and a max of 170 jiffies.
> >The cost is throughput, further benchmarking needs to be done, but I
> >wanted to get this out for review and testing. It should at least help
> >us decide if the request allocation code really is causing our problems.
> >
>
> Well the latency numbers are good - is this with dbench 90?

Yes, that number was dbench 90, but dbench 50, 90, and 120 gave about the same stats with the final patch.

> snip
> >+
> >+static inline int queue_full(request_queue_t *q, int rw)
> >+{
> >+	rmb();
> >+	if (rw == READ)
> >+		return q->read_full;
> >+	else
> >+		return q->write_full;
> >+}
> >+
> >
> I don't think you need the barriers here, do you?

I put the barriers in early on when almost all the calls were done outside spin locks; the current flavor of the patch only does one clear_queue_full without the io_request_lock held. It should be enough to toss a barrier in just that one spot. But I wanted to leave them in so I could move things around until the final version (if there ever is one ;-)

-chris

^ permalink raw reply	[flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-10 0:32 ` Chris Mason @ 2003-06-10 0:47 ` Nick Piggin 0 siblings, 0 replies; 114+ messages in thread From: Nick Piggin @ 2003-06-10 0:47 UTC (permalink / raw) To: Chris Mason Cc: Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller

Chris Mason wrote:

>On Mon, 2003-06-09 at 19:51, Nick Piggin wrote:
>
>>>The latency results are better, with average time spent in
>>>__get_request_wait being around 28 jiffies, and a max of 170 jiffies.
>>>The cost is throughput, further benchmarking needs to be done, but I
>>>wanted to get this out for review and testing. It should at least help
>>>us decide if the request allocation code really is causing our problems.
>>>
>>Well the latency numbers are good - is this with dbench 90?
>>
>
>Yes, that number was dbench 90, but dbench 50, 90, and 120 gave about the
>same stats with the final patch.
>

Great.

>>snip
>>
>>>+
>>>+static inline int queue_full(request_queue_t *q, int rw)
>>>+{
>>>+	rmb();
>>>+	if (rw == READ)
>>>+		return q->read_full;
>>>+	else
>>>+		return q->write_full;
>>>+}
>>>+
>>>
>>I don't think you need the barriers here, do you?
>>
>
>I put the barriers in early on when almost all the calls were done
>outside spin locks, the current flavor of the patch only does one
>clear_queue_full without the io_request_lock held. It should be enough
>to toss a barrier in just that one spot. But I wanted to leave them in
>so I could move things around until the final version (if there ever is
>one ;-)
>

Yeah I see.

^ permalink raw reply	[flat|nested] 114+ messages in thread
* RE: [PATCH] io stalls 2003-06-09 23:51 ` [PATCH] io stalls Nick Piggin 2003-06-10 0:32 ` Chris Mason @ 2003-06-10 1:48 ` Robert White 2003-06-10 2:13 ` Chris Mason 2003-06-10 3:22 ` Nick Piggin 1 sibling, 2 replies; 114+ messages in thread From: Robert White @ 2003-06-10 1:48 UTC (permalink / raw) To: Nick Piggin, Chris Mason Cc: Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org]On Behalf Of Nick Piggin > Chris Mason wrote: > >The major difference from Nick's patch is that once the queue is marked > >full, I don't clear the full flag until the wait queue is empty. This > >means new io can't steal available requests until every existing waiter > >has been granted a request. > Yes, this is probably a good idea. Err... wouldn't this subvert the spirit, if not the warrant, of real time scheduling and time-critical applications? After all we *do* want to all-but-starve lower priority tasks of IO in the presence of higher priority tasks. A select few applications absolutely need to be pampered (think ProTools audio mixing suite on the Mac etc.) and any solution that doesn't take this into account will have to be re-done by the people who want to bring these kinds of tasks to Linux. I am not most familiar with this body of code, but wouldn't the people trying to do audio sampling and gaming get really frosted if they had to wait for a list of lower priority IO events to completely drain before they could get back to work? It would certainly produce really bad encoding of live data streams (etc). From a purely queue-theory standpoint, I'm not even sure why this queue can become "full". Shouldn't the bounding case come about primarily by lack of resources (can't allocate a queue entry or a data block) out where the users can see and cope with the problem before all the expensive blocking and waiting.
Still from a pure-theory standpoint, it would be "better" to make the wait queues priority queues and leave their sizes unbounded. In practice it is expensive to maintain a fully "proper" priority queue for a queue of non-trivial size. Then again, IO isn't cheap over the domain of time anyway. The solution proposed, by limiting the queue size sort-of turns the scheduler's wakeup behavior into that priority queue sorting mechanism. That in turn would (it seems to me) lead to some degenerate behaviors just outside the zone of midline stability. In short several very-high-priority tasks could completely starve out the system if they can consistently submit enough request to fill the queue. [That is: consider a bunch of tasks sleeping in the scheduler because they are waiting for the queue to empty. When they are all woken up, they will actually be scheduled in priority order. So higher priority tasks get first crack at the "empty" queue. If there are "enough" such tasks (which are IO bound on this device) they will keep getting serviced, and then keep going back to sleep on the full queue. (And god help you if they are runaways 8-). The high priority tasks constantly butt in line (because the scheduler is now the keeper of the IO queue) and the lower priority tasks could wait forever.] {please note; I write some fairly massively-threaded applications, it would only take one such application running at a high priority to produce "a substantial number" of high priority processes submitting IO requests, so the scenario, while not common, is potentially real.} (so just off the top of my head...) I would think that the best theoretical solution would be a priority heap. (ignoring heap storage requirements for a moment) you keep the highest priority items in the front of the heap and any time a heap reorg passes a node by you jack that nodes priority by one. 
For an extremely busy queue nothing is starved, but the incline remains high enough to make sure that the truly desperate priorities (of which there should be few in a real world system) will "never" wait behind some dd(1) of vanishingly close to no import. Clearly doing a full heap with only pointers is ugly almost beyond comprehension, and doing a heap in an array would tend to be impractical for a large list under variable conditions. A red-black tree gets too expensive if you use them that many times throughout a system. (and so on) While several possible sort-of-heapish or sort-of-priority-queueish data structures come to mind, I don't have a replacement concept that I can really promote just now... I would say that at a MINIMUM there needs to be some threshold of priority for requests that get to go on a "full list" no matter what. There really "ought to be" a way for requests from higher priority tasks to get closer to the front of the list. There "should be" a priority floor where tasks with lower priorities get their requests queued up with the current first-come-first-served mentality (as we don't need to spend a lot of time thinking about things that have been nice(d) into the noise floor). And then there should be a promotion mechanism to prevent complete starvation. Anything simpler and it is safer from a system stability standpoint to keep with the current high-latency-on-occasion simple queue solution. Rob. ^ permalink raw reply [flat|nested] 114+ messages in thread
* RE: [PATCH] io stalls 2003-06-10 1:48 ` Robert White @ 2003-06-10 2:13 ` Chris Mason 2003-06-10 23:04 ` Robert White 2003-06-10 3:22 ` Nick Piggin 1 sibling, 1 reply; 114+ messages in thread From: Chris Mason @ 2003-06-10 2:13 UTC (permalink / raw) To: Robert White Cc: Nick Piggin, Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Mon, 2003-06-09 at 21:48, Robert White wrote: > From: linux-kernel-owner@vger.kernel.org > [mailto:linux-kernel-owner@vger.kernel.org]On Behalf Of Nick Piggin > > > Chris Mason wrote: > > > >The major difference from Nick's patch is that once the queue is marked > > >full, I don't clear the full flag until the wait queue is empty. This > > >means new io can't steal available requests until every existing waiter > > >has been granted a request. > > > Yes, this is probably a good idea. > > > Err... wouldn't this subvert the spirit, if not the warrant, of real time > scheduling and time-critical applications? > [ lots of interesting points ] Heh, I didn't really make my goals for the patch clear. They go: 1) quantify the stalls people are seeing with real numbers so we can point at a section of code causing bad performance. 2) Provide a somewhat obvious patch that makes the current __get_request_wait call significantly more fair, in hopes of either blaming it for the stalls or removing it from the list of candidates 3) fix the stalls Most of your suggestions are 2.5 discussion material, where real experimental work is going on. The 2.4 io request wait queue isn't working on priorities, the current one tries to be fair to everyone and provide good throughput to everyone at the same time. It's failing on at least one of those, and until we can fix that I don't even want to think about more complex issues. Current users of the vanilla 2.4 tree will hopefully benefit from a lower latency io request wait queue. 
The next best thing to real time is a consistently small wait, which is what my patch is trying for. -chris ^ permalink raw reply [flat|nested] 114+ messages in thread
* RE: [PATCH] io stalls 2003-06-10 2:13 ` Chris Mason @ 2003-06-10 23:04 ` Robert White 2003-06-11 0:58 ` Chris Mason 0 siblings, 1 reply; 114+ messages in thread From: Robert White @ 2003-06-10 23:04 UTC (permalink / raw) To: Chris Mason Cc: Nick Piggin, Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller From: Chris Mason [mailto:mason@suse.com] Sent: Monday, June 09, 2003 7:13 PM > 2) Provide a somewhat obvious patch that makes the current > __get_request_wait call significantly more fair, in hopes of either > blaming it for the stalls or removing it from the list of candidates Without the a_w_q_exclusive() on add_wait_queue the FIFO effect is lost when all the members of the wait queue compete for their timeslice in the scheduler. For all intents and purposes the fairness goes up some (you stop having the one guy sorted to the un-happy end of the disk) but low priority tasks will still always end up stalled on the dirty end of the stick. Basically each new round at the queue-empty moment is a mob rush for the door. With the a_w_q_exclusive(), you get past fair and well into anti-optimal. Your FIFO becomes essentially mandatory with no regard for anything but the order things hit the wait queue. (Particularly on an SMP machine, however) "new requestors" may/will jump to the head of the line because they were never *in* the wait queue. So you have only achieved "fairness" with respect to requests that come in to a io queue that was full-at-the-time of the initial entry into the driver. This becomes exactly like the experience of waiting patiently on line to get off the highway and watching all the rude people driving by you only to cut over and nose into the queue just at the exit sign. So you need the _exclusive if you want any kind of predictable fairness (without getting into anything obscure) but it is still only "fair" for those that were unfortunate enough to end up on the wait queue originally. 
There is a small window for tasks to butt in freely. > 3) fix the stalls Without the _exclusive() you can't have fixed the stalls, you can only have moved the locus-of-blame to the scheduler which may (or may not) have some way to compensate and "fake fairness" built in by coincidence. The thing I suggest in my other email, where you use the non-exclusive version of the routine but temporarily bump the process priority each time a request gets foisted off on the wait_queue instead of the IO queue, actually has semantic fairness built in. This basically builds a fairness elevator that functions both over time-in-queue and original process priority (when built into your basic patch). It's also quite space/time efficient and fairly clear to reader and implementer alike. Rob. ^ permalink raw reply [flat|nested] 114+ messages in thread
* RE: [PATCH] io stalls 2003-06-10 23:04 ` Robert White @ 2003-06-11 0:58 ` Chris Mason 0 siblings, 0 replies; 114+ messages in thread From: Chris Mason @ 2003-06-11 0:58 UTC (permalink / raw) To: Robert White Cc: Nick Piggin, Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Tue, 2003-06-10 at 19:04, Robert White wrote: > From: Chris Mason [mailto:mason@suse.com] > Sent: Monday, June 09, 2003 7:13 PM > > > 2) Provide a somewhat obvious patch that makes the current > > __get_request_wait call significantly more fair, in hopes of either > > blaming it for the stalls or removing it from the list of candidates > > Without the a_w_q_exclusive() on add_wait_queue the FIFO effect is lost when > all the members of the wait queue compete for their timeslice in the > scheduler. For all intents and purposes the fairness goes up some (you stop > having the one guy sorted to the un-happy end of the disk) but low priority > tasks will still always end up stalled on the dirty end of the stick. > Basically each new round at the queue-empty moment is a mob rush for the > door. > > With the a_w_q_exclusive(), you get past fair and well into anti-optimal. > Your FIFO becomes essentially mandatory with no regard for anything but the > order things hit the wait queue. (Particularly on an SMP machine, however) > "new requestors" may/will jump to the head of the line because they were > never *in* the wait queue. The patches flying around force new io into the wait queue any time someone else is already waiting, nobody is allowed to jump to the head of the line. The rest of your ideas are interesting, we just can't smush them into 2.4. Please consider doing some experiments on the 2.5 io schedulers and making suggestions, it's a critical area. -chris ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-10 1:48 ` Robert White 2003-06-10 2:13 ` Chris Mason @ 2003-06-10 3:22 ` Nick Piggin 2003-06-10 21:17 ` Robert White 1 sibling, 1 reply; 114+ messages in thread From: Nick Piggin @ 2003-06-10 3:22 UTC (permalink / raw) To: Robert White Cc: Chris Mason, Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller Robert White wrote: >From: linux-kernel-owner@vger.kernel.org >[mailto:linux-kernel-owner@vger.kernel.org]On Behalf Of Nick Piggin > > >>Chris Mason wrote: >> > >>>The major difference from Nick's patch is that once the queue is marked >>>full, I don't clear the full flag until the wait queue is empty. This >>>means new io can't steal available requests until every existing waiter >>>has been granted a request. >>> > >>Yes, this is probably a good idea. >> > > >Err... wouldn't this subvert the spirit, if not the warrant, of real time >scheduling and time-critical applications? > No, my patch (plus Chris' modification) changes request allocation from an overloaded queue from semi-random (a timing-dependent mixture of LIFO and FIFO) to FIFO. As Chris has shown, the old behaviour can leave a task starved for 2.7s (and theoretically forever) when the FIFO scheme would wake it in < 200ms under similar conditions. > >After all we *do* want to all-but-starve lower priority tasks of IO in the >presence of higher priority tasks. A select few applications absolutely >need to be pampered (think ProTools audio mixing suite on the Mac etc.) and >any solution that doesn't take this into account will have to be re-done by >the people who want to bring these kinds of tasks to Linux. > >I am not most familiar with this body of code, but wouldn't the people >trying to do audio sampling and gaming get really frosted if they had to >wait for a list of lower priority IO events to completely drain before they >could get back to work?
It would certainly produce really bad encoding of >live data streams (etc). > > Actually, there is no priority other than time (ie. FIFO), and seek distance in the IO subsystem. I guess this is why your arguments fall down ;) >From a purely queue-theory standpoint, I'm not even sure why this queue can >become "full". Shouldn't the bounding case come about primarily by lack of >resources (can't allocate a queue entry or a data block) out where the users >can see and cope with the problem before all the expensive blocking and >waiting. > In practice, the problems of having a memory size limited queue outweigh the benefits. > >Still from a pure-theory standpoint, it would be "better" to make the wait >queues priority queues and leave their sizes unbounded. > >In practice it is expensive to maintain a fully "proper" priority queue for >a queue of non-trivial size. Then again, IO isn't cheap over the domain of >time anyway. > If IO priorities were implemented, you still have the problem of starvation. It would be better to simply have a per process limit on request allocation, and implement the priority scheduling in the io scheduler. I think you would find that most processes do just fine with just a couple of requests each, though. > > >The solution proposed, by limiting the queue size sort-of turns the >scheduler's wakeup behavior into that priority queue sorting mechanism. >That in turn would (it seems to me) lead to some degenerate behaviors just >outside the zone of midline stability. In short several very-high-priority >tasks could completely starve out the system if they can consistently submit >enough request to fill the queue. > >[That is: consider a bunch of tasks sleeping in the scheduler because they >are waiting for the queue to empty. When they are all woken up, they will >actually be scheduled in priority order. So higher priority tasks get first >crack at the "empty" queue.
If there are "enough" such tasks (which are IO >bound on this device) they will keep getting serviced, and then keep going >back to sleep on the full queue. (And god help you if they are runaways >8-). The high priority tasks constantly butt in line (because the scheduler >is now the keeper of the IO queue) and the lower priority tasks could wait >forever.] > No, they will be woken up one at a time as requests become freed, and in FIFO order. It might be possible for a higher (CPU) priority task to be woken up before the previous has a chance to run, but this scheme is no worse than before (the solution here is per process request limits, but this is 2.4). ^ permalink raw reply [flat|nested] 114+ messages in thread
* RE: [PATCH] io stalls 2003-06-10 3:22 ` Nick Piggin @ 2003-06-10 21:17 ` Robert White 2003-06-11 0:40 ` Nick Piggin 0 siblings, 1 reply; 114+ messages in thread From: Robert White @ 2003-06-10 21:17 UTC (permalink / raw) To: Nick Piggin Cc: Chris Mason, Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller From: Nick Piggin [mailto:piggin@cyberone.com.au] Sent: Monday, June 09, 2003 8:23 PM > > Actually, there is no priority other than time (ie. FIFO), and > seek distance in the IO subsystem. I guess this is why your > arguments fall down ;) I'll buy that for the most part, though one of the differences I read elsewhere in the thread was the choice between add_wait_queue() and add_wait_queue_exclusive(). You will, however, note that one of the factors that is playing in this patch is process priority. (If I understand correctly) The wait queue in question becomes your FIFOing agent, it is kind of a pre-queue on the actual IO queue, once you reach a "full" condition. In the later case [add_wait_queue_exclusive()] you are strictly FIFO over the set of processes, where the moment-of-order is determined by insertion into the wait queue. In the former case [add_wait_queue()] when the queue is woken up all the waiters will be marked executable on the scheduler, and the scheduler will then (at least tend to) sort the submissions into task priority order. So the higher priority tasks will get to butt into line. Worse, the FIFO is essentially lost to the vagaries of the scheduler so without the _exclusive you have no FIFO at all. I think that is the reason that Chris was saying the add_wait_queue_exclusive() mode "does seem to scale much better." So your "original new" batching agent is really order-of-arrival that becomes anti-sorted by process priority. Which can lead to scheduler induced starvation (and the observed "improvements" by using the strict FIFO created by a_w_q_exclusive). 
The problem is that you get a little communist about the FIFO-ness when you use a_w_q_exclusive() and that can *SERIOUSLY* harm a task that must approach real-time behavior. One solution would be to stick with the add_wait_queue() process-priority influenced never-really-FIFO, but every time a process/task wakes up, and it then doesn't get its request onto the queue, add a small fixed increment to its priority before going back into the wait. This gives you both the process-priority mechanism and a fairness metric. Something like (in pure pseudo-code since I don't have my references here):

	int priority_delta = 0
	while (try_enqueing_io_request() == queue_full) {
		if (current->priority < priority_max) {
			current->priority += priority_increment;
			priority_delta += priority_increment;
		}
		wait_on_queue()
	}
	current->priority -= priority_delta;

(and still, of course, only wake the wait queue when the "full" queue reaches empty.) What that gets you is democratic entry into the io request queue when it is non-full. It gets you seniority-based (plutocratic?) access to the io queue as your request "ages" in the full pool. If the pool gets so large that all the requests are making their tasks reach priority_max then you "degrade" to the fairness of the scheduler, which is an arbitrary but workable metric. You get all that, but you preserve (or invent) a relationship that lets the task priority automagically factor in "for free" so that relative starvation (which is a good thing for deliberately asymmetric task priorities, and matches user expectations) can be achieved without ever having absolute starvation. Further if priority_max isn't priority_system_max you get the real-time-trumps-all behavior that something like a live audio stream encoder may need (for any priority >= priority_max). Rob. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-10 21:17 ` Robert White @ 2003-06-11 0:40 ` Nick Piggin 0 siblings, 0 replies; 114+ messages in thread From: Nick Piggin @ 2003-06-11 0:40 UTC (permalink / raw) To: Robert White Cc: Chris Mason, Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller Robert White wrote: >From: Nick Piggin [mailto:piggin@cyberone.com.au] >Sent: Monday, June 09, 2003 8:23 PM > >>Actually, there is no priority other than time (ie. FIFO), and >>seek distance in the IO subsystem. I guess this is why your >>arguments fall down ;) >> > >I'll buy that for the most part, though one of the differences I read >elsewhere in the thread was the choice between add_wait_queue() and >add_wait_queue_exclusive(). You will, however, note that one of the factors >that is playing in this patch is process priority. > >(If I understand correctly) The wait queue in question becomes your FIFOing >agent, it is kind of a pre-queue on the actual IO queue, once you reach a >"full" condition. > Right. > >In the later case [add_wait_queue_exclusive()] you are strictly FIFO over >the set of processes, where the moment-of-order is determined by insertion >into the wait queue. > >In the former case [add_wait_queue()] when the queue is woken up all the >waiters will be marked executable on the scheduler, and the scheduler will >then (at least tend to) sort the submissions into task priority order. So >the higher priority tasks will get to butt into line. Worse, the FIFO is >essentially lost to the vagaries of the scheduler so without the _exclusive >you have no FIFO at all. > >I think that is the reason that Chris was saying the >add_wait_queue_exclusive() mode "does seem to scale much better." > Yep > >So your "original new" batching agent is really order-of-arrival that >becomes anti-sorted by process priority. 
Which can lead to scheduler >induced starvation (and the observed "improvements" by using the strict FIFO >created by a_w_q_exclusive). The problem is that you get a little communist >about the FIFO-ness when you use a_w_q_exclusive() and that can *SERIOUSLY* >harm a task that must approach real-time behavior. > I think it had better be FIFO for now. If it's not, you're making the worst-case latency worse. It requires a lot of careful testing to get something like that working right. You have some good ideas, and quite possibly they would be worth implementing, but the behaviour of the code is quite complex, especially when you take into account its effect on the io scheduler. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) 2003-06-09 21:39 ` [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) Chris Mason 2003-06-09 22:19 ` Andrea Arcangeli 2003-06-09 23:51 ` [PATCH] io stalls Nick Piggin @ 2003-06-11 0:33 ` Andrea Arcangeli 2003-06-11 0:48 ` [PATCH] io stalls Nick Piggin 2003-06-11 0:54 ` [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) Chris Mason 2 siblings, 2 replies; 114+ messages in thread From: Andrea Arcangeli @ 2003-06-11 0:33 UTC (permalink / raw) To: Chris Mason Cc: Nick Piggin, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Mon, Jun 09, 2003 at 05:39:23PM -0400, Chris Mason wrote: > + if (!waitqueue_active(&q->wait_for_requests[rw])) > + clear_queue_full(q, rw); you've an smp race above, the smp safe implementation is this: if (!waitqueue_active(&q->wait_for_requests[rw])) { clear_queue_full(q, rw); mb(); if (unlikely(waitqueue_active(&q->wait_for_requests[rw]))) wake_up(&q->wait_for_requests[rw]); } I'm also unsure what the "waited" logic does, it doesn't seem necessary. Andrea ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-11 0:33 ` [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) Andrea Arcangeli @ 2003-06-11 0:48 ` Nick Piggin 2003-06-11 1:07 ` Andrea Arcangeli 2003-06-11 0:54 ` [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) Chris Mason 1 sibling, 1 reply; 114+ messages in thread From: Nick Piggin @ 2003-06-11 0:48 UTC (permalink / raw) To: Andrea Arcangeli Cc: Chris Mason, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller Andrea Arcangeli wrote: >On Mon, Jun 09, 2003 at 05:39:23PM -0400, Chris Mason wrote: > >>+ if (!waitqueue_active(&q->wait_for_requests[rw])) >>+ clear_queue_full(q, rw); >> > >you've an smp race above, the smp safe implementation is this: > > if (!waitqueue_active(&q->wait_for_requests[rw])) { > clear_queue_full(q, rw); > mb(); > if (unlikely(waitqueue_active(&q->wait_for_requests[rw]))) > wake_up(&q->wait_for_requests[rw]); > } > >I'm also unsure what the "waited" logic does, it doesn't seem necessary. > When a task is woken up, it is quite likely that the queue is still marked full. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-11 0:48 ` [PATCH] io stalls Nick Piggin @ 2003-06-11 1:07 ` Andrea Arcangeli 0 siblings, 0 replies; 114+ messages in thread From: Andrea Arcangeli @ 2003-06-11 1:07 UTC (permalink / raw) To: Nick Piggin Cc: Chris Mason, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Wed, Jun 11, 2003 at 10:48:23AM +1000, Nick Piggin wrote: > > > Andrea Arcangeli wrote: > > >On Mon, Jun 09, 2003 at 05:39:23PM -0400, Chris Mason wrote: > > > >>+ if (!waitqueue_active(&q->wait_for_requests[rw])) > >>+ clear_queue_full(q, rw); > >> > > > >you've an smp race above, the smp safe implementation is this: > > > > if (!waitqueue_active(&q->wait_for_requests[rw])) { > > clear_queue_full(q, rw); > > mb(); > > if (unlikely(waitqueue_active(&q->wait_for_requests[rw]))) > > wake_up(&q->wait_for_requests[rw]); > > } > > > >I'm also unsure what the "waited" logic does, it doesn't seem necessary. > > > > When a task is woken up, it is quite likely that the > queue is still marked full. but we don't care if it's marked full, see __get_request. If we cared about full it would deadlock anyways (no matter the waited logic) Andrea ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) 2003-06-11 0:33 ` [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) Andrea Arcangeli 2003-06-11 0:48 ` [PATCH] io stalls Nick Piggin @ 2003-06-11 0:54 ` Chris Mason 2003-06-11 1:06 ` Andrea Arcangeli 1 sibling, 1 reply; 114+ messages in thread From: Chris Mason @ 2003-06-11 0:54 UTC (permalink / raw) To: Andrea Arcangeli Cc: Nick Piggin, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Tue, 2003-06-10 at 20:33, Andrea Arcangeli wrote: > On Mon, Jun 09, 2003 at 05:39:23PM -0400, Chris Mason wrote: > > + if (!waitqueue_active(&q->wait_for_requests[rw])) > > + clear_queue_full(q, rw); > > you've an smp race above, the smp safe implementation is this: > clear_queue_full has a wmb() in my patch, and queue_full has a rmb(), I thought that covered these cases? I'd rather remove those though, since the spot you point out is the only place done outside the io_request_lock. > if (!waitqueue_active(&q->wait_for_requests[rw])) { > clear_queue_full(q, rw); > mb(); > if (unlikely(waitqueue_active(&q->wait_for_requests[rw]))) > wake_up(&q->wait_for_requests[rw]); > } > I don't think we need the extra wake_up (this is in __get_request_wait, right?), since it gets done by get_request_wait_wakeup() > I'm also unsure what the "waited" logic does, it doesn't seem necessary. Once a process waits once, they are allowed to ignore the q->full flag. This way existing waiters can make progress even when q->full is set. Without the waited check, q->full will never get cleared because the last writer wouldn't proceed until the last writer was gone. I had to make __get_request for the same reason. -chris ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) 2003-06-11 0:54 ` [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) Chris Mason @ 2003-06-11 1:06 ` Andrea Arcangeli 2003-06-11 1:57 ` Chris Mason 0 siblings, 1 reply; 114+ messages in thread From: Andrea Arcangeli @ 2003-06-11 1:06 UTC (permalink / raw) To: Chris Mason Cc: Nick Piggin, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Tue, Jun 10, 2003 at 08:54:00PM -0400, Chris Mason wrote: > On Tue, 2003-06-10 at 20:33, Andrea Arcangeli wrote: > > On Mon, Jun 09, 2003 at 05:39:23PM -0400, Chris Mason wrote: > > > + if (!waitqueue_active(&q->wait_for_requests[rw])) > > > + clear_queue_full(q, rw); > > > > you've an smp race above, the smp safe implementation is this: > > > > clear_queue_full has a wmb() in my patch, and queue_full has a rmb(), I > thought that covered these cases? I'd rather remove those though, since > the spot you point out is the only place done outside the > io_request_lock. > > > if (!waitqueue_active(&q->wait_for_requests[rw])) { > > clear_queue_full(q, rw); > > mb(); > > if (unlikely(waitqueue_active(&q->wait_for_requests[rw]))) > > wake_up(&q->wait_for_requests[rw]); > > } > > > I don't think we need the extra wake_up (this is in __get_request_wait, > right?), since it gets done by get_request_wait_wakeup() there's no get_request_wait_wakeup in blkdev_release_request. I put the construct in both places though (i've the clear_queue_full explicit as q->full = 0). And I don't think any of your barriers is needed at all, I mean, we only need to be careful to clear it right, we don't need to be careful to set or read it right when it transits from 0 to 1. And the above seems enough to me to get right the clearing. > > I'm also unsure what the "waited" logic does, it doesn't seem necessary. > > Once a process waits once, they are allowed to ignore the q->full flag. > This way existing waiters can make progress even when q->full is set. 
> Without the waited check, q->full will never get cleared because the > last writer wouldn't proceed until the last writer was gone. I had to > make __get_request for the same reason. __get_request makes perfect sense of course and it's needed, this is not the issue, my point about the waited check is that the last writer has to get the wakeup (and the wakeup has nothing to do with the waited check since waited == 0), and after the wakeup it will get the request and it won't re-run the loop, so I don't see why waited is needed. Furthermore even if for whatever reason it doesn't get the request, it will re-set full to 1 and it'll be still the first to get the wakeup, and it will definitely get another wakeup if no request was available. Andrea ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) 2003-06-11 1:06 ` Andrea Arcangeli @ 2003-06-11 1:57 ` Chris Mason 2003-06-11 2:10 ` Andrea Arcangeli 0 siblings, 1 reply; 114+ messages in thread From: Chris Mason @ 2003-06-11 1:57 UTC (permalink / raw) To: Andrea Arcangeli Cc: Nick Piggin, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Tue, 2003-06-10 at 21:06, Andrea Arcangeli wrote: > And I don't think any of your barriers is needed at all, I mean, we only > need to be careful to clear it right, we don't need to be careful to set > or read it right when it transits from 0 to 1. And the above seems > enough to me to get right the clearing. > The current form of the patch has way too many barriers. When I first added them the patch was really different, I left them in because it seems to be easier to remember to rip them out than add them back ;-) > > > I'm also unsure what the "waited" logic does, it doesn't seem necessary. > > > > Once a process waits once, they are allowed to ignore the q->full flag. > > This way existing waiters can make progress even when q->full is set. > > Without the waited check, q->full will never get cleared because the > > last writer wouldn't proceed until the last writer was gone. I had to > > make __get_request for the same reason. > > __get_request makes perfect sense of course and it's needed, this is not > the issue, my point about the waited check is that the last writer has > to get the wakeup (and the wakeup has nothing to do with the waited > check since waited == 0), and after the wakeup it will get the request > and it won't re-run the loop, so I don't see why waited is needed. > Furthmore even if for whatever reason it doesn't get the request, it > will re-set full to 1 and it'll be still the first to get the wakeup, > and it will definitely get another wakeup if none request was available. Ok, I see your point, we don't strictly need the waited check. 
I had added it as an optimization at first, so that those who waited once were not penalized by further queue_full checks. -chris ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) 2003-06-11 1:57 ` Chris Mason @ 2003-06-11 2:10 ` Andrea Arcangeli 2003-06-11 12:24 ` Chris Mason 2003-06-11 17:42 ` Chris Mason 0 siblings, 2 replies; 114+ messages in thread From: Andrea Arcangeli @ 2003-06-11 2:10 UTC (permalink / raw) To: Chris Mason Cc: Nick Piggin, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Tue, Jun 10, 2003 at 09:57:11PM -0400, Chris Mason wrote: > Ok, I see your point, we don't strictly need the waited check. I had > added it as an optimization at first, so that those who waited once were > not penalized by further queue_full checks. I could taste the feeling of not penalizing while reading the code but that's just a feeling, in reality if they blocked it means they set full by themself and there was no request so they want to go to sleep no matter ->full or not ;) Andrea ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) 2003-06-11 2:10 ` Andrea Arcangeli @ 2003-06-11 12:24 ` Chris Mason 2003-06-11 17:42 ` Chris Mason 1 sibling, 0 replies; 114+ messages in thread From: Chris Mason @ 2003-06-11 12:24 UTC (permalink / raw) To: Andrea Arcangeli Cc: Nick Piggin, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Tue, 2003-06-10 at 22:10, Andrea Arcangeli wrote: > On Tue, Jun 10, 2003 at 09:57:11PM -0400, Chris Mason wrote: > > Ok, I see your point, we don't strictly need the waited check. I had > > added it as an optimization at first, so that those who waited once were > > not penalized by further queue_full checks. > > I could taste the feeling of not penalizing while reading the code but > that's just a feeling, in reality if they blocked it means they set full > by themself and there was no request so they want to go to sleep no > matter ->full or not ;) You're completely right, as the patch changed I didn't realize waited wasn't needed anymore ;-) Are you adding the hunk from yesterday to avoid unplugs when q->rq.count != 0? -chris ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) 2003-06-11 2:10 ` Andrea Arcangeli 2003-06-11 12:24 ` Chris Mason @ 2003-06-11 17:42 ` Chris Mason 2003-06-11 18:12 ` Andrea Arcangeli 1 sibling, 1 reply; 114+ messages in thread From: Chris Mason @ 2003-06-11 17:42 UTC (permalink / raw) To: Andrea Arcangeli Cc: Nick Piggin, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller [-- Attachment #1: Type: text/plain, Size: 493 bytes --] Ok here's an updated patch, it changes the barriers around, updates comments, and gets rid of the waited check in __get_request_wait. It is still a combined patch with fix_pausing, queue_full and latency stats, mostly because I want to make really sure any testers are using all three. So, if someone who saw io stalls in 2.4.21-rc could give this a try, I'd be grateful. If you still see stalls with this applied, run elvtune /dev/xxx and send along the resulting console output. -chris [-- Attachment #2: io-stalls-5.diff --] [-- Type: text/plain, Size: 15306 bytes --] --- 1.9/drivers/block/blkpg.c Sat Mar 30 06:58:05 2002 +++ edited/drivers/block/blkpg.c Tue Jun 10 14:49:27 2003 @@ -261,6 +261,7 @@ return blkpg_ioctl(dev, (struct blkpg_ioctl_arg *) arg); case BLKELVGET: + blk_print_stats(dev); return blkelvget_ioctl(&blk_get_queue(dev)->elevator, (blkelv_ioctl_arg_t *) arg); case BLKELVSET: --- 1.45/drivers/block/ll_rw_blk.c Wed May 28 03:50:02 2003 +++ edited/drivers/block/ll_rw_blk.c Wed Jun 11 13:36:10 2003 @@ -429,6 +429,8 @@ q->rq[READ].count = 0; q->rq[WRITE].count = 0; q->nr_requests = 0; + q->read_full = 0; + q->write_full = 0; si_meminfo(&si); megs = si.totalram >> (20 - PAGE_SHIFT); @@ -442,6 +444,56 @@ spin_lock_init(&q->queue_lock); } +void blk_print_stats(kdev_t dev) +{ + request_queue_t *q; + unsigned long avg_wait; + unsigned long min_wait; + unsigned long high_wait; + unsigned long *d; + + q = blk_get_queue(dev); + if (!q) + return; + + min_wait = q->min_wait; + if (min_wait == 
~0UL) + min_wait = 0; + if (q->num_wait) + avg_wait = q->total_wait / q->num_wait; + else + avg_wait = 0; + printk("device %s: num_req %lu, total jiffies waited %lu\n", + kdevname(dev), q->num_req, q->total_wait); + printk("\t%lu forced to wait\n", q->num_wait); + printk("\t%lu min wait, %lu max wait\n", min_wait, q->max_wait); + printk("\t%lu average wait\n", avg_wait); + d = q->deviation; + printk("\t%lu < 100, %lu < 200, %lu < 300, %lu < 400, %lu < 500\n", + d[0], d[1], d[2], d[3], d[4]); + high_wait = d[0] + d[1] + d[2] + d[3] + d[4]; + high_wait = q->num_wait - high_wait; + printk("\t%lu waits longer than 500 jiffies\n", high_wait); +} + +static void reset_stats(request_queue_t *q) +{ + q->max_wait = 0; + q->min_wait = ~0UL; + q->total_wait = 0; + q->num_req = 0; + q->num_wait = 0; + memset(q->deviation, 0, sizeof(q->deviation)); +} +void blk_reset_stats(kdev_t dev) +{ + request_queue_t *q; + q = blk_get_queue(dev); + if (!q) + return; + printk("reset latency stats on device %s\n", kdevname(dev)); + reset_stats(q); +} static int __make_request(request_queue_t * q, int rw, struct buffer_head * bh); /** @@ -491,6 +543,9 @@ q->plug_tq.routine = &generic_unplug_device; q->plug_tq.data = q; q->plugged = 0; + + reset_stats(q); + /* * These booleans describe the queue properties. We set the * default (and most common) values here. Other drivers can @@ -508,7 +563,7 @@ * Get a free request. io_request_lock must be held and interrupts * disabled on the way in. Returns NULL if there are no free requests. 
*/ -static struct request *get_request(request_queue_t *q, int rw) +static struct request *__get_request(request_queue_t *q, int rw) { struct request *rq = NULL; struct request_list *rl = q->rq + rw; @@ -521,35 +576,48 @@ rq->cmd = rw; rq->special = NULL; rq->q = q; - } + } else + set_queue_full(q, rw); return rq; } /* - * Here's the request allocation design: + * get a free request, honoring the queue_full condition + */ +static inline struct request *get_request(request_queue_t *q, int rw) +{ + if (queue_full(q, rw)) + return NULL; + return __get_request(q, rw); +} + +/* + * helper func to do memory barriers and wakeups when we finally decide + * to clear the queue full condition + */ +static inline void clear_full_and_wake(request_queue_t *q, int rw) +{ + clear_queue_full(q, rw); + mb(); + if (unlikely(waitqueue_active(&q->wait_for_requests[rw]))) + wake_up(&q->wait_for_requests[rw]); +} + +/* + * Here's the request allocation design, low latency version: * * 1: Blocking on request exhaustion is a key part of I/O throttling. * * 2: We want to be `fair' to all requesters. We must avoid starvation, and * attempt to ensure that all requesters sleep for a similar duration. Hence * no stealing requests when there are other processes waiting. - * - * 3: We also wish to support `batching' of requests. So when a process is - * woken, we want to allow it to allocate a decent number of requests - * before it blocks again, so they can be nicely merged (this only really - * matters if the process happens to be adding requests near the head of - * the queue). - * - * 4: We want to avoid scheduling storms. This isn't really important, because - * the system will be I/O bound anyway. But it's easy. - * - * There is tension between requirements 2 and 3. Once a task has woken, - * we don't want to allow it to sleep as soon as it takes its second request. - * But we don't want currently-running tasks to steal all the requests - * from the sleepers. 
We handle this with wakeup hysteresis around - * 0 .. batch_requests and with the assumption that request taking is much, - * much faster than request freeing. + * + * There used to be more here, attempting to allow a process to send in a + * number of requests once it has woken up. But, there's no way to + * tell if a process has just been woken up, or if it is a new process + * coming in to steal requests from the waiters. So, we give up and force + * everyone to wait fairly. * * So here's what we do: * @@ -561,50 +629,78 @@ * * When a process wants a new request: * - * b) If free_requests == 0, the requester sleeps in FIFO manner. - * - * b) If 0 < free_requests < batch_requests and there are waiters, - * we still take a request non-blockingly. This provides batching. - * - * c) If free_requests >= batch_requests, the caller is immediately - * granted a new request. + * b) If free_requests == 0, the requester sleeps in FIFO manner, and + * the queue full condition is set. The full condition is not + * cleared until there are no longer any waiters. Once the full + * condition is set, all new io must wait, hopefully for a very + * short period of time. * * When a request is released: * - * d) If free_requests < batch_requests, do nothing. - * - * f) If free_requests >= batch_requests, wake up a single waiter. - * - * The net effect is that when a process is woken at the batch_requests level, - * it will be able to take approximately (batch_requests) requests before - * blocking again (at the tail of the queue). + * c) If free_requests < batch_requests, do nothing. * - * This all assumes that the rate of taking requests is much, much higher - * than the rate of releasing them. Which is very true. + * d) If free_requests >= batch_requests, wake up a single waiter. * - * -akpm, Feb 2002. + * As each waiter gets a request, he wakes another waiter. We do this + * to prevent a race where an unplug might get run before a request makes + * it's way onto the queue. 
The result is a cascade of wakeups, so delaying + * the initial wakeup until we've got batch_requests available helps avoid + * wakeups where there aren't any requests available yet. */ static struct request *__get_request_wait(request_queue_t *q, int rw) { register struct request *rq; + unsigned long wait_start = jiffies; + unsigned long time_waited; DECLARE_WAITQUEUE(wait, current); - add_wait_queue(&q->wait_for_requests[rw], &wait); + add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait); + do { set_current_state(TASK_UNINTERRUPTIBLE); - generic_unplug_device(q); - if (q->rq[rw].count == 0) - schedule(); spin_lock_irq(&io_request_lock); - rq = get_request(q, rw); + if (queue_full(q, rw) || q->rq[rw].count == 0) { + if (q->rq[rw].count == 0) + __generic_unplug_device(q); + spin_unlock_irq(&io_request_lock); + schedule(); + spin_lock_irq(&io_request_lock); + } + rq = __get_request(q, rw); spin_unlock_irq(&io_request_lock); } while (rq == NULL); remove_wait_queue(&q->wait_for_requests[rw], &wait); current->state = TASK_RUNNING; + + if (!waitqueue_active(&q->wait_for_requests[rw])) + clear_full_and_wake(q, rw); + + time_waited = jiffies - wait_start; + if (time_waited > q->max_wait) + q->max_wait = time_waited; + if (time_waited && time_waited < q->min_wait) + q->min_wait = time_waited; + q->total_wait += time_waited; + q->num_wait++; + if (time_waited < 500) { + q->deviation[time_waited/100]++; + } + return rq; } +static void get_request_wait_wakeup(request_queue_t *q, int rw) +{ + /* + * avoid losing an unplug if a second __get_request_wait did the + * generic_unplug_device while our __get_request_wait was running + * w/o the queue_lock held and w/ our request out of the queue. 
+ */ + if (waitqueue_active(&q->wait_for_requests[rw])) + wake_up(&q->wait_for_requests[rw]); +} + /* RO fail safe mechanism */ static long ro_bits[MAX_BLKDEV][8]; @@ -829,8 +925,14 @@ */ if (q) { list_add(&req->queue, &q->rq[rw].free); - if (++q->rq[rw].count >= q->batch_requests) - wake_up(&q->wait_for_requests[rw]); + q->rq[rw].count++; + if (q->rq[rw].count >= q->batch_requests) { + smp_mb(); + if (waitqueue_active(&q->wait_for_requests[rw])) + wake_up(&q->wait_for_requests[rw]); + else + clear_full_and_wake(q, rw); + } } } @@ -948,7 +1050,6 @@ */ max_sectors = get_max_sectors(bh->b_rdev); -again: req = NULL; head = &q->queue_head; /* @@ -957,6 +1058,7 @@ */ spin_lock_irq(&io_request_lock); +again: insert_here = head->prev; if (list_empty(head)) { q->plug_device_fn(q, bh->b_rdev); /* is atomic */ @@ -1042,6 +1144,9 @@ if (req == NULL) { spin_unlock_irq(&io_request_lock); freereq = __get_request_wait(q, rw); + head = &q->queue_head; + spin_lock_irq(&io_request_lock); + get_request_wait_wakeup(q, rw); goto again; } } @@ -1063,6 +1168,7 @@ req->rq_dev = bh->b_rdev; req->start_time = jiffies; req_new_io(req, 0, count); + q->num_req++; blk_started_io(count); add_request(q, req, insert_here); out: @@ -1196,8 +1302,15 @@ bh->b_rdev = bh->b_dev; bh->b_rsector = bh->b_blocknr * count; + get_bh(bh); generic_make_request(rw, bh); + /* fix race condition with wait_on_buffer() */ + smp_mb(); /* spin_unlock may have inclusive semantics */ + if (waitqueue_active(&bh->b_wait)) + wake_up(&bh->b_wait); + + put_bh(bh); switch (rw) { case WRITE: kstat.pgpgout += count; --- 1.83/fs/buffer.c Wed May 14 12:51:00 2003 +++ edited/fs/buffer.c Wed Jun 11 09:56:27 2003 @@ -153,10 +153,23 @@ get_bh(bh); add_wait_queue(&bh->b_wait, &wait); do { - run_task_queue(&tq_disk); set_task_state(tsk, TASK_UNINTERRUPTIBLE); if (!buffer_locked(bh)) break; + /* + * We must read tq_disk in TQ_ACTIVE after the + * add_wait_queue effect is visible to other cpus. 
+ * We could unplug some line above it wouldn't matter + * but we can't do that right after add_wait_queue + * without an smp_mb() in between because spin_unlock + * has inclusive semantics. + * Doing it here is the most efficient place so we + * don't do a suprious unplug if we get a racy + * wakeup that make buffer_locked to return 0, and + * doing it here avoids an explicit smp_mb() we + * rely on the implicit one in set_task_state. + */ + run_task_queue(&tq_disk); schedule(); } while (buffer_locked(bh)); tsk->state = TASK_RUNNING; @@ -1507,6 +1520,9 @@ /* Done - end_buffer_io_async will unlock */ SetPageUptodate(page); + + wakeup_page_waiters(page); + return 0; out: @@ -1538,6 +1554,7 @@ } while (bh != head); if (need_unlock) UnlockPage(page); + wakeup_page_waiters(page); return err; } @@ -1765,6 +1782,8 @@ else submit_bh(READ, bh); } + + wakeup_page_waiters(page); return 0; } @@ -2378,6 +2397,7 @@ submit_bh(rw, bh); bh = next; } while (bh != head); + wakeup_page_waiters(page); return 0; } --- 1.49/fs/super.c Wed Dec 18 21:34:24 2002 +++ edited/fs/super.c Tue Jun 10 14:49:27 2003 @@ -726,6 +726,7 @@ if (!fs_type->read_super(s, data, flags & MS_VERBOSE ? 
1 : 0)) goto Einval; s->s_flags |= MS_ACTIVE; + blk_reset_stats(dev); path_release(&nd); return s; --- 1.45/fs/reiserfs/inode.c Thu May 22 16:35:02 2003 +++ edited/fs/reiserfs/inode.c Tue Jun 10 14:49:27 2003 @@ -2048,6 +2048,7 @@ */ if (nr) { submit_bh_for_writepage(arr, nr) ; + wakeup_page_waiters(page); } else { UnlockPage(page) ; } --- 1.23/include/linux/blkdev.h Fri Nov 29 17:03:01 2002 +++ edited/include/linux/blkdev.h Wed Jun 11 09:56:55 2003 @@ -126,6 +126,14 @@ */ char head_active; + /* + * Booleans that indicate whether the queue's free requests have + * been exhausted and is waiting to drop below the batch_requests + * threshold + */ + char read_full; + char write_full; + unsigned long bounce_pfn; /* @@ -138,8 +146,17 @@ * Tasks wait here for free read and write requests */ wait_queue_head_t wait_for_requests[2]; + unsigned long max_wait; + unsigned long min_wait; + unsigned long total_wait; + unsigned long num_req; + unsigned long num_wait; + unsigned long deviation[5]; }; +void blk_reset_stats(kdev_t dev); +void blk_print_stats(kdev_t dev); + #define blk_queue_plugged(q) (q)->plugged #define blk_fs_request(rq) ((rq)->cmd == READ || (rq)->cmd == WRITE) #define blk_queue_empty(q) list_empty(&(q)->queue_head) @@ -156,6 +173,30 @@ } } +static inline void set_queue_full(request_queue_t *q, int rw) +{ + if (rw == READ) + q->read_full = 1; + else + q->write_full = 1; +} + +static inline void clear_queue_full(request_queue_t *q, int rw) +{ + if (rw == READ) + q->read_full = 0; + else + q->write_full = 0; +} + +static inline int queue_full(request_queue_t *q, int rw) +{ + if (rw == READ) + return q->read_full; + else + return q->write_full; +} + extern unsigned long blk_max_low_pfn, blk_max_pfn; #define BLK_BOUNCE_HIGH (blk_max_low_pfn << PAGE_SHIFT) @@ -217,6 +258,7 @@ extern void generic_make_request(int rw, struct buffer_head * bh); extern inline request_queue_t *blk_get_queue(kdev_t dev); extern void blkdev_release_request(struct request *); +extern void 
blk_print_stats(kdev_t dev); /* * Access functions for manipulating queue properties --- 1.19/include/linux/pagemap.h Sun Aug 25 15:32:11 2002 +++ edited/include/linux/pagemap.h Wed Jun 11 08:57:12 2003 @@ -97,6 +97,8 @@ ___wait_on_page(page); } +extern void FASTCALL(wakeup_page_waiters(struct page * page)); + /* * Returns locked page at given index in given cache, creating it if needed. */ --- 1.68/kernel/ksyms.c Fri May 23 17:40:47 2003 +++ edited/kernel/ksyms.c Tue Jun 10 14:49:27 2003 @@ -295,6 +295,7 @@ EXPORT_SYMBOL(filemap_fdatawait); EXPORT_SYMBOL(lock_page); EXPORT_SYMBOL(unlock_page); +EXPORT_SYMBOL(wakeup_page_waiters); /* device registration */ EXPORT_SYMBOL(register_chrdev); --- 1.77/mm/filemap.c Thu Apr 24 11:05:10 2003 +++ edited/mm/filemap.c Tue Jun 10 14:49:28 2003 @@ -812,6 +812,20 @@ return &wait[hash]; } +/* + * This must be called after every submit_bh with end_io + * callbacks that would result into the blkdev layer waking + * up the page after a queue unplug. + */ +void wakeup_page_waiters(struct page * page) +{ + wait_queue_head_t * head; + + head = page_waitqueue(page); + if (waitqueue_active(head)) + wake_up(head); +} + /* * Wait for a page to get unlocked. * ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) 2003-06-11 17:42 ` Chris Mason @ 2003-06-11 18:12 ` Andrea Arcangeli 2003-06-11 18:27 ` Chris Mason 0 siblings, 1 reply; 114+ messages in thread From: Andrea Arcangeli @ 2003-06-11 18:12 UTC (permalink / raw) To: Chris Mason Cc: Nick Piggin, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Wed, Jun 11, 2003 at 01:42:41PM -0400, Chris Mason wrote: > + if (q->rq[rw].count >= q->batch_requests) { > + smp_mb(); > + if (waitqueue_active(&q->wait_for_requests[rw])) > + wake_up(&q->wait_for_requests[rw]); in my tree I also changed this to: wake_up_nr(&q->wait_for_requests[rw], q->rq[rw].count); otherwise only one waiter will eat the requests, while multiple waiters can eat requests in parallel instead because we freed not just 1 request but many of them. I wonder if my above change is really the right way to implement the removal of the _exclusive line that went in rc6. However with your patch the wake_up_nr (or ~equivalent removal of _exclusive wakeup of rc6) should mostly improve cpu parallelism in smp and while waiting for I/O, the amount of stuff in the I/O queue and the overall fairness shouldn't change very significantly with this new completely fair FIFO request allocator. Andrea ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) 2003-06-11 18:12 ` Andrea Arcangeli @ 2003-06-11 18:27 ` Chris Mason 2003-06-11 18:35 ` Andrea Arcangeli 0 siblings, 1 reply; 114+ messages in thread From: Chris Mason @ 2003-06-11 18:27 UTC (permalink / raw) To: Andrea Arcangeli Cc: Nick Piggin, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Wed, 2003-06-11 at 14:12, Andrea Arcangeli wrote: > On Wed, Jun 11, 2003 at 01:42:41PM -0400, Chris Mason wrote: > > + if (q->rq[rw].count >= q->batch_requests) { > > + smp_mb(); > > + if (waitqueue_active(&q->wait_for_requests[rw])) > > + wake_up(&q->wait_for_requests[rw]); > > in my tree I also changed this to: > > wake_up_nr(&q->wait_for_requests[rw], q->rq[rw].count); > > otherwise only one waiter will eat the requests, while multiple waiters > can eat requests in parallel instead because we freed not just 1 request > but many of them. I tried a few variations of this yesterday and they all led to horrible latencies, but I couldn't really explain why. I had a bunch of other stuff in at the time to try and improve throughput though, so I'll try it again. I think part of the problem is the cascading wakeups from get_request_wait_wakeup(). So if we wakeup 32 procs they in turn wakeup another 32, etc. -chris ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) 2003-06-11 18:27 ` Chris Mason @ 2003-06-11 18:35 ` Andrea Arcangeli 2003-06-12 1:04 ` [PATCH] io stalls Nick Piggin 0 siblings, 1 reply; 114+ messages in thread From: Andrea Arcangeli @ 2003-06-11 18:35 UTC (permalink / raw) To: Chris Mason Cc: Nick Piggin, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Wed, Jun 11, 2003 at 02:27:13PM -0400, Chris Mason wrote: > On Wed, 2003-06-11 at 14:12, Andrea Arcangeli wrote: > > On Wed, Jun 11, 2003 at 01:42:41PM -0400, Chris Mason wrote: > > > + if (q->rq[rw].count >= q->batch_requests) { > > > + smp_mb(); > > > + if (waitqueue_active(&q->wait_for_requests[rw])) > > > + wake_up(&q->wait_for_requests[rw]); > > > > in my tree I also changed this to: > > > > wake_up_nr(&q->wait_for_requests[rw], q->rq[rw].count); > > > > otherwise only one waiter will eat the requests, while multiple waiters > > can eat requests in parallel instead because we freed not just 1 request > > but many of them. > > I tried a few variations of this yesterday and they all led to horrible > latencies, but I couldn't really explain why. I had a bunch of other the I/O latency in theory shouldn't change, we're not reordering the queue at all, they'll go to sleep immediately again if __get_request returns null. > stuff in at the time to try and improve throughput though, so I'll try > it again. > > I think part of the problem is the cascading wakeups from > get_request_wait_wakeup(). So if we wakeup 32 procs they in turn wakeup > another 32, etc. so maybe it's enough to wakeup count / 2 to account for the double wakeup? that will avoid some overscheduling indeed. Andrea ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-11 18:35 ` Andrea Arcangeli @ 2003-06-12 1:04 ` Nick Piggin 2003-06-12 1:12 ` Chris Mason 2003-06-12 1:29 ` Andrea Arcangeli 0 siblings, 2 replies; 114+ messages in thread From: Nick Piggin @ 2003-06-12 1:04 UTC (permalink / raw) To: Andrea Arcangeli Cc: Chris Mason, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller Andrea Arcangeli wrote: >On Wed, Jun 11, 2003 at 02:27:13PM -0400, Chris Mason wrote: > >>On Wed, 2003-06-11 at 14:12, Andrea Arcangeli wrote: >> >>>On Wed, Jun 11, 2003 at 01:42:41PM -0400, Chris Mason wrote: >>> >>>>+ if (q->rq[rw].count >= q->batch_requests) { >>>>+ smp_mb(); >>>>+ if (waitqueue_active(&q->wait_for_requests[rw])) >>>>+ wake_up(&q->wait_for_requests[rw]); >>>> >>>in my tree I also changed this to: >>> >>> wake_up_nr(&q->wait_for_requests[rw], q->rq[rw].count); >>> >>>otherwise only one waiter will eat the requests, while multiple waiters >>>can eat requests in parallel instead because we freed not just 1 request >>>but many of them. >>> >>I tried a few variations of this yesterday and they all led to horrible >>latencies, but I couldn't really explain why. I had a bunch of other >> > >the I/O latency in theory shouldn't change, we're not reordering the >queue at all, they'll go to sleep immediatly again if __get_request >returns null. > And go to the end of the queue? > >>stuff in at the time to try and improve throughput though, so I'll try >>it again. >> >>I think part of the problem is the cascading wakeups from >>get_request_wait_wakeup(). So if we wakeup 32 procs they in turn wakeup >>another 32, etc. >> > >so maybe it's enough to wakeup count / 2 to account for the double >wakeup? that will avoid some overscheduling indeed. > > Andrea, this isn't needed because when the queue falls below the batch limit, every retiring request will do a wake up and another request will be put on (as long as the waitqueue is active). 
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-12 1:04 ` [PATCH] io stalls Nick Piggin @ 2003-06-12 1:12 ` Chris Mason 2003-06-12 1:29 ` Andrea Arcangeli 1 sibling, 0 replies; 114+ messages in thread From: Chris Mason @ 2003-06-12 1:12 UTC (permalink / raw) To: Nick Piggin Cc: Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Wed, 2003-06-11 at 21:04, Nick Piggin wrote: > Andrea Arcangeli wrote: > > >On Wed, Jun 11, 2003 at 02:27:13PM -0400, Chris Mason wrote: > > > >>On Wed, 2003-06-11 at 14:12, Andrea Arcangeli wrote: > >> > >>>On Wed, Jun 11, 2003 at 01:42:41PM -0400, Chris Mason wrote: > >>> > >>>>+ if (q->rq[rw].count >= q->batch_requests) { > >>>>+ smp_mb(); > >>>>+ if (waitqueue_active(&q->wait_for_requests[rw])) > >>>>+ wake_up(&q->wait_for_requests[rw]); > >>>> > >>>in my tree I also changed this to: > >>> > >>> wake_up_nr(&q->wait_for_requests[rw], q->rq[rw].count); > >>> > >>>otherwise only one waiter will eat the requests, while multiple waiters > >>>can eat requests in parallel instead because we freed not just 1 request > >>>but many of them. > >>> > >>I tried a few variations of this yesterday and they all led to horrible > >>latencies, but I couldn't really explain why. I had a bunch of other > >> > > > >the I/O latency in theory shouldn't change, we're not reordering the > >queue at all, they'll go to sleep immediatly again if __get_request > >returns null. > > > > And go to the end of the queue? > This got dragged into private mail for a few messages, but we figured out the problem turns into scheduling fairness with wake_up_nr() 32 procs might get woken, but when the first of those procs gets a request, he'll wake another, and so on. But there's no promise that getting woken fairly means you'll get scheduled fairly, so you might not get scheduled in for quite a while, perhaps even after new requests have gone onto the wait queue and gotten woken up. 
The real problem is get_request_wait_wakeup, andrea is working on a few changes to that. -chris ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-12 1:04 ` [PATCH] io stalls Nick Piggin 2003-06-12 1:12 ` Chris Mason @ 2003-06-12 1:29 ` Andrea Arcangeli 2003-06-12 1:37 ` Andrea Arcangeli 2003-06-12 2:22 ` Chris Mason 1 sibling, 2 replies; 114+ messages in thread From: Andrea Arcangeli @ 2003-06-12 1:29 UTC (permalink / raw) To: Nick Piggin Cc: Chris Mason, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Thu, Jun 12, 2003 at 11:04:42AM +1000, Nick Piggin wrote: > > > Andrea Arcangeli wrote: > > >On Wed, Jun 11, 2003 at 02:27:13PM -0400, Chris Mason wrote: > > > >>On Wed, 2003-06-11 at 14:12, Andrea Arcangeli wrote: > >> > >>>On Wed, Jun 11, 2003 at 01:42:41PM -0400, Chris Mason wrote: > >>> > >>>>+ if (q->rq[rw].count >= q->batch_requests) { > >>>>+ smp_mb(); > >>>>+ if (waitqueue_active(&q->wait_for_requests[rw])) > >>>>+ wake_up(&q->wait_for_requests[rw]); > >>>> > >>>in my tree I also changed this to: > >>> > >>> wake_up_nr(&q->wait_for_requests[rw], > >>> q->rq[rw].count); > >>> > >>>otherwise only one waiter will eat the requests, while multiple waiters > >>>can eat requests in parallel instead because we freed not just 1 request > >>>but many of them. > >>> > >>I tried a few variations of this yesterday and they all led to horrible > >>latencies, but I couldn't really explain why. I had a bunch of other > >> > > > >the I/O latency in theory shouldn't change, we're not reordering the > >queue at all, they'll go to sleep immediatly again if __get_request > >returns null. > > > > And go to the end of the queue? they stay in queue, so they don't go to the end. but as Chris found since we've the get_request_wait_wakeup, even waking free-requests/2 isn't enough since that will generate free-requests *1.5 of wakeups where the last free-requests/2 (implicitly generated by the get_request_wait_wakeup) will become runnable and they will race with the other tasks later waken by another request release. 
I sort of fixed that by remembering an old suggestion from Andrew: static void get_request_wait_wakeup(request_queue_t *q, int rw) { /* * avoid losing an unplug if a second __get_request_wait did the * generic_unplug_device while our __get_request_wait was * running * w/o the queue_lock held and w/ our request out of the queue. */ if (waitqueue_active(&q->wait_for_requests)) run_task_queue(&tq_disk); } this will prevent get_request_wait_wakeup from messing up the wakeup, so we can wake_up_nr(rq.count) safely. then there's the last issue raised by Chris, that is, if requests get released faster than the tasks can run, we can still end up with less than perfect fairness. My solution to that is to change wake_up to have a nr_exclusive not obeying the try_to_wakeup retval. that should guarantee exact FIFO then, but it's a minor issue because the requests shouldn't be released systematically in a flood. So I'm leaving it open for now, the others already addressed should be the major ones. > >>stuff in at the time to try and improve throughput though, so I'll try > >>it again. > >> > >>I think part of the problem is the cascading wakeups from > >>get_request_wait_wakeup(). So if we wakeup 32 procs they in turn wakeup > >>another 32, etc. > >> > > > >so maybe it's enough to wakeup count / 2 to account for the double > >wakeup? that will avoid some overscheduling indeed. > > > > > > Andrea, this isn't needed because when the queue falls below actually the problem is that it isn't enough, not that it isn't needed. I had to stop get_request_wait_wakeup from messing with the wakeups, so now I can go back to doing /2. > the batch limit, every retiring request will do a wake up and > another request will be put on (as long as the waitqueue is > active). this was my argument for doing /2, but that will lead to count * 1.5 of wakeups, where the last count /2 will race with further wakeups screwing the FIFO ordering. 
As said, that's fixed completely now; the last issue is the one where a flood of request releases comes faster than the woken tasks can become runnable, but it's hopefully a minor issue (I'm not going to change how wake_up_nr works right now, maybe later). Andrea ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-12 1:29 ` Andrea Arcangeli @ 2003-06-12 1:37 ` Andrea Arcangeli 2003-06-12 2:22 ` Chris Mason 1 sibling, 0 replies; 114+ messages in thread From: Andrea Arcangeli @ 2003-06-12 1:37 UTC (permalink / raw) To: Nick Piggin Cc: Chris Mason, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Thu, Jun 12, 2003 at 03:29:51AM +0200, Andrea Arcangeli wrote: > static void get_request_wait_wakeup(request_queue_t *q, int rw) > { > /* > * avoid losing an unplug if a second __get_request_wait did the > * generic_unplug_device while our __get_request_wait was > * running > * w/o the queue_lock held and w/ our request out of the queue. > */ > if (waitqueue_active(&q->wait_for_requests)) > run_task_queue(&tq_disk); btw, that was the old version, Chris did it right s/run_task_queue(&tq_disk)/__generic_unplug_device(q)/ Andrea ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-12 1:29 ` Andrea Arcangeli 2003-06-12 1:37 ` Andrea Arcangeli @ 2003-06-12 2:22 ` Chris Mason 2003-06-12 2:41 ` Nick Piggin 1 sibling, 1 reply; 114+ messages in thread From: Chris Mason @ 2003-06-12 2:22 UTC (permalink / raw) To: Andrea Arcangeli Cc: Nick Piggin, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Wed, 2003-06-11 at 21:29, Andrea Arcangeli wrote: > this will avoid get_request_wait_wakeup to mess the wakeup, so we can > wakep_nr(rq.count) safely. > > then there's the last issue raised by Chris, that is if we get request > released faster than the tasks can run, still we can generate a not > perfect fairness. My solution to that is to change wake_up to have a > nr_exclusive not obeying to the try_to_wakeup retval. that should > guarantee exact FIFO then, but it's a minor issue because the requests > shouldn't be released systematically in a flood. So I'm leaving it > opened for now, the others already addressed should be the major ones. I think the only time we really need to wakeup more than one waiter is when we hit the q->batch_request mark. After that, each new request that is freed can be matched with a single waiter, and we know that any previously finished requests have probably already been matched to their own waiter. -chris ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-12 2:22 ` Chris Mason @ 2003-06-12 2:41 ` Nick Piggin 2003-06-12 2:46 ` Andrea Arcangeli 2003-06-12 11:57 ` Chris Mason 0 siblings, 2 replies; 114+ messages in thread From: Nick Piggin @ 2003-06-12 2:41 UTC (permalink / raw) To: Chris Mason Cc: Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller Chris Mason wrote: >On Wed, 2003-06-11 at 21:29, Andrea Arcangeli wrote: > > >>this will avoid get_request_wait_wakeup to mess the wakeup, so we can >>wakep_nr(rq.count) safely. >> >>then there's the last issue raised by Chris, that is if we get request >>released faster than the tasks can run, still we can generate a not >>perfect fairness. My solution to that is to change wake_up to have a >>nr_exclusive not obeying to the try_to_wakeup retval. that should >>guarantee exact FIFO then, but it's a minor issue because the requests >>shouldn't be released systematically in a flood. So I'm leaving it >>opened for now, the others already addressed should be the major ones. >> > >I think the only time we really need to wakeup more than one waiter is >when we hit the q->batch_request mark. After that, each new request >that is freed can be matched with a single waiter, and we know that any >previously finished requests have probably already been matched to their >own waiter. > > Nope. Not even then. Each retiring request should submit a wake up, and the process will submit another request. So the number of requests will be held at the batch_request mark until no more waiters. Now that begs the question, why have batch_requests anymore? It no longer does anything. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-12 2:41 ` Nick Piggin @ 2003-06-12 2:46 ` Andrea Arcangeli 2003-06-12 2:49 ` Nick Piggin 2003-06-25 19:03 ` Chris Mason 2003-06-12 11:57 ` Chris Mason 1 sibling, 2 replies; 114+ messages in thread From: Andrea Arcangeli @ 2003-06-12 2:46 UTC (permalink / raw) To: Nick Piggin Cc: Chris Mason, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Thu, Jun 12, 2003 at 12:41:58PM +1000, Nick Piggin wrote: > > > Chris Mason wrote: > > >On Wed, 2003-06-11 at 21:29, Andrea Arcangeli wrote: > > > > > >>this will avoid get_request_wait_wakeup to mess the wakeup, so we can > >>wakep_nr(rq.count) safely. > >> > >>then there's the last issue raised by Chris, that is if we get request > >>released faster than the tasks can run, still we can generate a not > >>perfect fairness. My solution to that is to change wake_up to have a > >>nr_exclusive not obeying to the try_to_wakeup retval. that should > >>guarantee exact FIFO then, but it's a minor issue because the requests > >>shouldn't be released systematically in a flood. So I'm leaving it > >>opened for now, the others already addressed should be the major ones. > >> > > > >I think the only time we really need to wakeup more than one waiter is > >when we hit the q->batch_request mark. After that, each new request > >that is freed can be matched with a single waiter, and we know that any > >previously finished requests have probably already been matched to their > >own waiter. > > > > > Nope. Not even then. Each retiring request should submit > a wake up, and the process will submit another request. > So the number of requests will be held at the batch_request > mark until no more waiters. > > Now that begs the question, why have batch_requests anymore? > It no longer does anything. it does nothing w/ _exclusive and w/o the wake_up_nr, that's why I added the wake_up_nr. Andrea ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-12 2:46 ` Andrea Arcangeli @ 2003-06-12 2:49 ` Nick Piggin 2003-06-12 2:51 ` Nick Piggin 2003-06-12 2:58 ` Andrea Arcangeli 2003-06-25 19:03 ` Chris Mason 1 sibling, 2 replies; 114+ messages in thread From: Nick Piggin @ 2003-06-12 2:49 UTC (permalink / raw) To: Andrea Arcangeli Cc: Chris Mason, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller Andrea Arcangeli wrote: >On Thu, Jun 12, 2003 at 12:41:58PM +1000, Nick Piggin wrote: > >> >>Chris Mason wrote: >> >> >>>On Wed, 2003-06-11 at 21:29, Andrea Arcangeli wrote: >>> >>> >>> >>>>this will avoid get_request_wait_wakeup to mess the wakeup, so we can >>>>wakep_nr(rq.count) safely. >>>> >>>>then there's the last issue raised by Chris, that is if we get request >>>>released faster than the tasks can run, still we can generate a not >>>>perfect fairness. My solution to that is to change wake_up to have a >>>>nr_exclusive not obeying to the try_to_wakeup retval. that should >>>>guarantee exact FIFO then, but it's a minor issue because the requests >>>>shouldn't be released systematically in a flood. So I'm leaving it >>>>opened for now, the others already addressed should be the major ones. >>>> >>>> >>>I think the only time we really need to wakeup more than one waiter is >>>when we hit the q->batch_request mark. After that, each new request >>>that is freed can be matched with a single waiter, and we know that any >>>previously finished requests have probably already been matched to their >>>own waiter. >>> >>> >>> >>Nope. Not even then. Each retiring request should submit >>a wake up, and the process will submit another request. >>So the number of requests will be held at the batch_request >>mark until no more waiters. >> >>Now that begs the question, why have batch_requests anymore? >>It no longer does anything. >> > >it does nothing w/ _exclusive and w/o the wake_up_nr, that's why I added >the wake_up_nr. > > That is pretty pointless as well. 
You might as well just start waking up at the queue full limit, and wake one at a time. The purpose for batch_requests was I think for devices with a very small request size, to reduce context switches. >Andrea > > > ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-12 2:49 ` Nick Piggin @ 2003-06-12 2:51 ` Nick Piggin 2003-06-12 2:52 ` Nick Piggin 2003-06-12 3:04 ` Andrea Arcangeli 2003-06-12 2:58 ` Andrea Arcangeli 1 sibling, 2 replies; 114+ messages in thread From: Nick Piggin @ 2003-06-12 2:51 UTC (permalink / raw) To: Nick Piggin Cc: Andrea Arcangeli, Chris Mason, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller Nick Piggin wrote: > > > Andrea Arcangeli wrote: > >> On Thu, Jun 12, 2003 at 12:41:58PM +1000, Nick Piggin wrote: >> >>> >>> Chris Mason wrote: >>> >>> >>>> On Wed, 2003-06-11 at 21:29, Andrea Arcangeli wrote: >>>> >>>> >>>> >>>>> this will avoid get_request_wait_wakeup to mess the wakeup, so we can >>>>> wakep_nr(rq.count) safely. >>>>> >>>>> then there's the last issue raised by Chris, that is if we get >>>>> request >>>>> released faster than the tasks can run, still we can generate a not >>>>> perfect fairness. My solution to that is to change wake_up to have a >>>>> nr_exclusive not obeying to the try_to_wakeup retval. that should >>>>> guarantee exact FIFO then, but it's a minor issue because the >>>>> requests >>>>> shouldn't be released systematically in a flood. So I'm leaving it >>>>> opened for now, the others already addressed should be the major >>>>> ones. >>>>> >>>>> >>>> I think the only time we really need to wakeup more than one waiter is >>>> when we hit the q->batch_request mark. After that, each new request >>>> that is freed can be matched with a single waiter, and we know that >>>> any >>>> previously finished requests have probably already been matched to >>>> their >>>> own waiter. >>>> >>>> >>>> >>> Nope. Not even then. Each retiring request should submit >>> a wake up, and the process will submit another request. >>> So the number of requests will be held at the batch_request >>> mark until no more waiters. >>> >>> Now that begs the question, why have batch_requests anymore? >>> It no longer does anything. 
>>> >> >> it does nothing w/ _exclusive and w/o the wake_up_nr, that's why I added >> the wake_up_nr. >> >> > That is pretty pointless as well. You might as well just start > waking up at the queue full limit, and wake one at a time. > > The purpose for batch_requests was I think for devices with a > very small request size, to reduce context switches. I guess you could fix this by having a "last woken" flag, and allow that process to allocate requests without blocking from the batch limit until the queue full limit. That is how batch_requests is supposed to work. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-12 2:51 ` Nick Piggin @ 2003-06-12 2:52 ` Nick Piggin 2003-06-12 3:04 ` Andrea Arcangeli 1 sibling, 0 replies; 114+ messages in thread From: Nick Piggin @ 2003-06-12 2:52 UTC (permalink / raw) To: Nick Piggin Cc: Andrea Arcangeli, Chris Mason, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller Nick Piggin wrote: > > I guess you could fix this by having a "last woken" flag, and > allow that process to allocate requests without blocking from > the batch limit until the queue full limit. That is how > batch_requests is supposed to work. s/flag/pid maybe? ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-12 2:51 ` Nick Piggin 2003-06-12 2:52 ` Nick Piggin @ 2003-06-12 3:04 ` Andrea Arcangeli 1 sibling, 0 replies; 114+ messages in thread From: Andrea Arcangeli @ 2003-06-12 3:04 UTC (permalink / raw) To: Nick Piggin Cc: Chris Mason, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Thu, Jun 12, 2003 at 12:51:30PM +1000, Nick Piggin wrote: > I guess you could fix this by having a "last woken" flag, and > allow that process to allocate requests without blocking from > the batch limit until the queue full limit. That is how > batch_requests is supposed to work. I see what you mean, I did care about the case of each request belonging to a different task, but of course this doesn't work if there's just one task. In such a case there will be a single wakeup for each freed request, so the task won't be able to eat all the requests and it'll keep hanging on the full bitflag. So yes, the ->full bit partly disabled the batch sectors in the presence of only 1 task. With multiple tasks and the wake_up_nr, batch_sectors will still work. However I don't care about that right now ;), it's a minor issue I guess, single task I/O normally doesn't seek heavily so more likely it will run into the oversized queue before being able to take advantage of the batch sectors. Andrea ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-12 2:49 ` Nick Piggin 2003-06-12 2:51 ` Nick Piggin @ 2003-06-12 2:58 ` Andrea Arcangeli 2003-06-12 3:04 ` Nick Piggin 1 sibling, 1 reply; 114+ messages in thread From: Andrea Arcangeli @ 2003-06-12 2:58 UTC (permalink / raw) To: Nick Piggin Cc: Chris Mason, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Thu, Jun 12, 2003 at 12:49:46PM +1000, Nick Piggin wrote: > > > Andrea Arcangeli wrote: > > >On Thu, Jun 12, 2003 at 12:41:58PM +1000, Nick Piggin wrote: > > > >> > >>Chris Mason wrote: > >> > >> > >>>On Wed, 2003-06-11 at 21:29, Andrea Arcangeli wrote: > >>> > >>> > >>> > >>>>this will avoid get_request_wait_wakeup to mess the wakeup, so we can > >>>>wakep_nr(rq.count) safely. > >>>> > >>>>then there's the last issue raised by Chris, that is if we get request > >>>>released faster than the tasks can run, still we can generate a not > >>>>perfect fairness. My solution to that is to change wake_up to have a > >>>>nr_exclusive not obeying to the try_to_wakeup retval. that should > >>>>guarantee exact FIFO then, but it's a minor issue because the requests > >>>>shouldn't be released systematically in a flood. So I'm leaving it > >>>>opened for now, the others already addressed should be the major ones. > >>>> > >>>> > >>>I think the only time we really need to wakeup more than one waiter is > >>>when we hit the q->batch_request mark. After that, each new request > >>>that is freed can be matched with a single waiter, and we know that any > >>>previously finished requests have probably already been matched to their > >>>own waiter. > >>> > >>> > >>> > >>Nope. Not even then. Each retiring request should submit > >>a wake up, and the process will submit another request. > >>So the number of requests will be held at the batch_request > >>mark until no more waiters. > >> > >>Now that begs the question, why have batch_requests anymore? > >>It no longer does anything. 
> >> > > > >it does nothing w/ _exclusive and w/o the wake_up_nr, that's why I added > >the wake_up_nr. > > > > > That is pretty pointless as well. You might as well just start > waking up at the queue full limit, and wake one at a time. > > The purpose for batch_requests was I think for devices with a > very small request size, to reduce context switches. batch_requests at least in my tree matters only when each request is 512bytes and you've some thousand of them to compose a 4M queue or so. To maximize cpu cache usage etc.. I try not to wakeup a task every 512bytes written, but only every 32*512bytes written or so. Of course w/o the wake_up_nr that I added, that wasn't really working w/ the _exclusive wakeup. if you check my tree you'll see that for sequential I/O with 512k in each request (not 512bytes!) batch_requests is already a noop. Andrea ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-12 2:58 ` Andrea Arcangeli @ 2003-06-12 3:04 ` Nick Piggin 2003-06-12 3:12 ` Andrea Arcangeli 0 siblings, 1 reply; 114+ messages in thread From: Nick Piggin @ 2003-06-12 3:04 UTC (permalink / raw) To: Andrea Arcangeli Cc: Chris Mason, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller Andrea Arcangeli wrote: >On Thu, Jun 12, 2003 at 12:49:46PM +1000, Nick Piggin wrote: > >> >>Andrea Arcangeli wrote: >> >>>it does nothing w/ _exclusive and w/o the wake_up_nr, that's why I added >>>the wake_up_nr. >>> >>> >>> >>That is pretty pointless as well. You might as well just start >>waking up at the queue full limit, and wake one at a time. >> >>The purpose for batch_requests was I think for devices with a >>very small request size, to reduce context switches. >> > >batch_requests at least in my tree matters only when each request is >512btyes and you've some thousand of them to compose a 4M queue or so. >To maximize cpu cache usage etc.. I try to wakeup a task every 512bytes >written, but every 32*512bytes written or so. Of course w/o the >wake_up_nr that I added, that wasn't really working w/ the _exlusive >wakeup. > >if you check my tree you'll see that for sequential I/O with 512k in >each request (not 512bytes!) batch_requests is already a noop. > You are waking up multiple tasks which will each submit 1 request. You want to be waking up 1 task which will submit multiple requests - that is how you will save context switches, cpu cache, etc, and that task's requests will have a much better chance of being merged, or at least serviced as a nice batch than unrelated tasks. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-12 3:04 ` Nick Piggin @ 2003-06-12 3:12 ` Andrea Arcangeli 2003-06-12 3:20 ` Nick Piggin 0 siblings, 1 reply; 114+ messages in thread From: Andrea Arcangeli @ 2003-06-12 3:12 UTC (permalink / raw) To: Nick Piggin Cc: Chris Mason, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Thu, Jun 12, 2003 at 01:04:27PM +1000, Nick Piggin wrote: > > > Andrea Arcangeli wrote: > > >On Thu, Jun 12, 2003 at 12:49:46PM +1000, Nick Piggin wrote: > > > >> > >>Andrea Arcangeli wrote: > >> > >>>it does nothing w/ _exclusive and w/o the wake_up_nr, that's why I added > >>>the wake_up_nr. > >>> > >>> > >>> > >>That is pretty pointless as well. You might as well just start > >>waking up at the queue full limit, and wake one at a time. > >> > >>The purpose for batch_requests was I think for devices with a > >>very small request size, to reduce context switches. > >> > > > >batch_requests at least in my tree matters only when each request is > >512btyes and you've some thousand of them to compose a 4M queue or so. > >To maximize cpu cache usage etc.. I try to wakeup a task every 512bytes > >written, but every 32*512bytes written or so. Of course w/o the > >wake_up_nr that I added, that wasn't really working w/ the _exlusive > >wakeup. > > > >if you check my tree you'll see that for sequential I/O with 512k in > >each request (not 512bytes!) batch_requests is already a noop. > > > > > You are waking up multiple tasks which will each submit > 1 request. You want to be waking up 1 task which will > submit multiple requests - that is how you will save > context switches, cpu cache, etc, and that task's requests > will have a much better chance of being merged, or at > least serviced as a nice batch than unrelated tasks. for fairness reasons if there are multiple tasks, I want to wake them all and let the others be able to eat requests before the first allocates all the batch_sectors. 
So the current code is fine and batch_sectors still works fine with multiple tasks queued in the waitqueue, it still makes sense to wake more than one of them at the same time to improve cpu utilization (regardless of them being different tasks; for instance we take the waitqueue spinlocks less frequently etc..). What we disabled is only the batch_sectors as a function of the single task, so if for example there's just 1 single task, we could let it go, but it's quite a special case, if for example there were two tasks, we wouldn't want to let them go ahead (unless we can distribute exactly count/2 requests to each task w/o reentering __get_request_wait, which is unlikely). So the current code looks ok to me with the wake_up_nr to take advantage of the batch_sectors against different tasks, still w/o penalizing fairness. Andrea ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-12 3:12 ` Andrea Arcangeli @ 2003-06-12 3:20 ` Nick Piggin 2003-06-12 3:33 ` Andrea Arcangeli 2003-06-12 16:06 ` Chris Mason 0 siblings, 2 replies; 114+ messages in thread From: Nick Piggin @ 2003-06-12 3:20 UTC (permalink / raw) To: Andrea Arcangeli Cc: Chris Mason, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller Andrea Arcangeli wrote: >On Thu, Jun 12, 2003 at 01:04:27PM +1000, Nick Piggin wrote: > >> >>Andrea Arcangeli wrote: >> >> >>>On Thu, Jun 12, 2003 at 12:49:46PM +1000, Nick Piggin wrote: >>> >>> >>>>Andrea Arcangeli wrote: >>>> >>>> >>>>>it does nothing w/ _exclusive and w/o the wake_up_nr, that's why I added >>>>>the wake_up_nr. >>>>> >>>>> >>>>> >>>>> >>>>That is pretty pointless as well. You might as well just start >>>>waking up at the queue full limit, and wake one at a time. >>>> >>>>The purpose for batch_requests was I think for devices with a >>>>very small request size, to reduce context switches. >>>> >>>> >>>batch_requests at least in my tree matters only when each request is >>>512btyes and you've some thousand of them to compose a 4M queue or so. >>>To maximize cpu cache usage etc.. I try to wakeup a task every 512bytes >>>written, but every 32*512bytes written or so. Of course w/o the >>>wake_up_nr that I added, that wasn't really working w/ the _exlusive >>>wakeup. >>> >>>if you check my tree you'll see that for sequential I/O with 512k in >>>each request (not 512bytes!) batch_requests is already a noop. >>> >>> >> >>You are waking up multiple tasks which will each submit >>1 request. You want to be waking up 1 task which will >>submit multiple requests - that is how you will save >>context switches, cpu cache, etc, and that task's requests >>will have a much better chance of being merged, or at >>least serviced as a nice batch than unrelated tasks. 
>> > >for fairness reasons if there are multiple tasks, I want to wake them >all and let the others be able to eat requests before the first >allocates all the batch_sectors. So the current code is fine and >batch_sectors still works fine with multiple tasks queued in the >waitqueue, it still makes sense to wake more than one of them at the >same time to improve cpu utilization (regardless they're different >tasks, for istance we take less frequently the waitqueue spinlocks >etc..). > It's no less fair this way, tasks will still be woken in FIFO order. They will just be given the chance to submit a batch of requests. I think the cpu utilization gain of waking a number of tasks at once would be outweighed by the advantage of waking 1 task and not putting it to sleep again for a number of requests. You obviously are not claiming concurrency improvements, as your method would also increase contention on the io lock (or the queue lock in 2.5). Then you have the cache gains of running each task for a longer period of time. You also get possible IO scheduling improvements. Consider 8 requests, batch_requests at 4, 10 tasks writing to different areas of disk. Your method still only allows each task to have 1 request in the elevator at once. Mine allows each to have a run of 4 requests in the elevator. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-12 3:20 ` Nick Piggin @ 2003-06-12 3:33 ` Andrea Arcangeli 2003-06-12 3:48 ` Nick Piggin 2003-06-12 16:06 ` Chris Mason 1 sibling, 1 reply; 114+ messages in thread From: Andrea Arcangeli @ 2003-06-12 3:33 UTC (permalink / raw) To: Nick Piggin Cc: Chris Mason, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Thu, Jun 12, 2003 at 01:20:44PM +1000, Nick Piggin wrote: > It's no less fair this way, tasks will still be woken in fifo > order. They will just be given the chance to submit a batch > of requests. If you change the behaviour with queued_task_nr > batch_requests it is less fair, period. Anything else I don't care about right now because it is a minor cpu improvement anyway. I'm not talking about performance, I'm talking about latency and fairness only. This is the whole point of the ->full logic. > I think the cpu utilization gain of waking a number of tasks > at once would be outweighed by the advantage of waking 1 task > and not putting it to sleep again for a number of requests. > You obviously are not claiming concurrency improvements, as > your method would also increase contention on the io lock > (or the queue lock in 2.5). I'm claiming that with queued_task_nr > batch_requests the batch_requests logic still has a chance to save some cpu; this is the only reason I didn't nuke it completely as you suggested some emails ago. > Then you have the cache gains of running each task for a > longer period of time. You also get possible IO scheduling > improvements. > > Consider 8 requests, batch_requests at 4, 10 tasks writing > to different areas of disk. > > Your method still only allows each task to have 1 request in > the elevator at once. Mine allows each to have a run of 4 > requests in the elevator. I definitely want 1 request in the elevator at once or we may as well drop your ->full and go back to being unfair. 
The whole point of ->full is to get total fairness, across the tasks in the wait queue, and for tasks outside the queue calling get_request too. Since not all tasks will fit in the I/O queue, providing a very fair FIFO in wait_for_request is fundamental to providing any sort of latency guarantee IMHO (the fact that an _exclusive wakeup removal, which mixes things up and probably has the side effect of being more fair, made that much difference to mainline users kind of confirms that). Andrea ^ permalink raw reply [flat|nested] 114+ messages in thread
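The ->full discipline Andrea is defending can be sketched in userspace C. This is a simplified model under invented names (`toy_queue` and friends), not the 2.4 implementation: the point it illustrates is that once the flag is set, newcomers cannot steal requests from tasks already waiting in FIFO order, and the flag only clears when the wait queue has drained.

```c
#include <assert.h>

/* Toy model of the queue-full discipline (invented names). */
struct toy_queue {
    int free_requests;
    int waiters;
    int full;
};

/* returns 1 if the caller gets a request immediately, 0 if it must wait */
int toy_get_request(struct toy_queue *q)
{
    if (q->full || q->free_requests == 0) {
        q->full = 1;        /* late arrivals now queue behind the waiters */
        q->waiters++;
        return 0;
    }
    q->free_requests--;
    return 1;
}

/* a freed request goes to the longest-standing waiter (FIFO); `full`
 * is cleared only once the wait queue has drained completely */
void toy_release_request(struct toy_queue *q)
{
    q->free_requests++;
    if (q->waiters > 0) {
        q->waiters--;       /* the FIFO wakeup consumes the request */
        q->free_requests--;
    }
    if (q->waiters == 0)
        q->full = 0;
}

/* one request, two contenders: the second must wait, and `full`
 * stays set until the release serves it */
int toy_demo(void)
{
    struct toy_queue q = { 1, 0, 0 };
    int got1 = toy_get_request(&q);
    int got2 = toy_get_request(&q);
    int full_while_waiting = q.full;
    toy_release_request(&q);
    return got1 == 1 && got2 == 0 && full_while_waiting == 1 && q.full == 0;
}
```

The real code adds memory barriers and the exclusive waitqueue, but the fairness argument rests on exactly this set/drain behaviour of the flag.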
* Re: [PATCH] io stalls 2003-06-12 3:33 ` Andrea Arcangeli @ 2003-06-12 3:48 ` Nick Piggin 2003-06-12 4:17 ` Andrea Arcangeli 0 siblings, 1 reply; 114+ messages in thread From: Nick Piggin @ 2003-06-12 3:48 UTC (permalink / raw) To: Andrea Arcangeli Cc: Chris Mason, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller Andrea Arcangeli wrote: >On Thu, Jun 12, 2003 at 01:20:44PM +1000, Nick Piggin wrote: > >>It's no less fair this way, tasks will still be woken in fifo >>order. They will just be given the chance to submit a batch >>of requests. >> > >If you change the behaviour with queued_task_nr > batch_requests it is >less fair, period. Anything else I don't care about right now >because it is a minor cpu improvement anyway. > >I'm not talking about performance, I'm talking about latency and >fairness only. This is the whole point of the ->full logic. > I say each task getting 8 requests at a time is as fair as each getting 1 request at a time. Yes, you may get a worse latency, but _fairness_ is the same. > >>I think the cpu utilization gain of waking a number of tasks >>at once would be outweighed by the advantage of waking 1 task >>and not putting it to sleep again for a number of requests. >>You obviously are not claiming concurrency improvements, as >>your method would also increase contention on the io lock >>(or the queue lock in 2.5). >> > >I'm claiming that with queued_task_nr > batch_requests the >batch_requests logic still has a chance to save some cpu; this is the >only reason I didn't nuke it completely as you suggested some emails ago. > Well I'm not so sure that your method will do a great deal of good. On SMP you would increase contention on the spinlock. IMO it would be better to serialise them on the waitqueue instead of a spinlock, seeing as they are already sleeping. > >>Then you have the cache gains of running each task for a >>longer period of time. You also get possible IO scheduling >>improvements. 
>> >>Consider 8 requests, batch_requests at 4, 10 tasks writing >>to different areas of disk. >> >>Your method still only allows each task to have 1 request in >>the elevator at once. Mine allows each to have a run of 4 >>requests in the elevator. >> > >I definitely want 1 request in the elevator at once or we may as well >drop your ->full and go back to being unfair. The whole point of ->full is >to get total fairness, across the tasks in the wait queue, and for >tasks outside the queue calling get_request too. Since not all tasks >will fit in the I/O queue, providing a very fair FIFO in >wait_for_request is fundamental to providing any sort of latency >guarantee IMHO (the fact that an _exclusive wakeup removal, which mixes >things up and probably has the side effect of being more fair, made that much >difference to mainline users kind of confirms that). > > I think we'll just have to agree to disagree here. This sort of thing came up in our CFQ discussion as well. It's not that I think your way is without merits, but I think in an overload situation it's a better aim to attempt to keep throughput up rather than attempt to provide the lowest possible latency. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-12 3:48 ` Nick Piggin @ 2003-06-12 4:17 ` Andrea Arcangeli 2003-06-12 4:41 ` Nick Piggin 0 siblings, 1 reply; 114+ messages in thread From: Andrea Arcangeli @ 2003-06-12 4:17 UTC (permalink / raw) To: Nick Piggin Cc: Chris Mason, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Thu, Jun 12, 2003 at 01:48:04PM +1000, Nick Piggin wrote: > > > Andrea Arcangeli wrote: > > >On Thu, Jun 12, 2003 at 01:20:44PM +1000, Nick Piggin wrote: > > > >>It's no less fair this way, tasks will still be woken in fifo > >>order. They will just be given the chance to submit a batch > >>of requests. > >> > > > >If you change the behaviour with queued_task_nr > batch_requests it is > >less fair, period. Anything else I don't care about right now > >because it is a minor cpu improvement anyway. > > > >I'm not talking about performance, I'm talking about latency and > >fairness only. This is the whole point of the ->full logic. > > > > I say each task getting 8 requests at a time is as fair > as each getting 1 request at a time. Yes, you may get a > worse latency, but _fairness_ is the same. It is the worse latency that is the problem of course. Fairness in this case isn't affected (assuming you would implement the batch_sectors fairly), but latency would definitely be affected. > Well I'm not so sure that your method will do a great deal > of good. On SMP you would increase contention on the spinlock. > IMO it would be better to serialise them on the waitqueue > instead of a spinlock, seeing as they are already sleeping. I think the worst part is the cacheline bouncing, but that might happen anyway under load. At least it certainly makes sense for UP, or if you're lucky and the tasks run serially (possible if all cpus are running). > I think we'll just have to agree to disagree here. This > sort of thing came up in our CFQ discussion as well. 
It's > not that I think your way is without merits, but I think > in an overload situation it's a better aim to attempt to > keep throughput up rather than attempt to provide the > lowest possible latency. Those changes (like the CFQ I/O scheduler in 2.5) are being developed mostly due to the latency complaints we get as feedback on l-k. That's why I care about latency first here. But we have to care about throughput too of course. This isn't CFQ; it just tries to provide new requests to different tasks with the minimal possible latency, which in turn also guarantees fairness. That sounds like a good default to me, especially when I see the removal of the _exclusive wakeup in mainline taken as a major fairness/latency improvement. Andrea ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-12 4:17 ` Andrea Arcangeli @ 2003-06-12 4:41 ` Nick Piggin 0 siblings, 0 replies; 114+ messages in thread From: Nick Piggin @ 2003-06-12 4:41 UTC (permalink / raw) To: Andrea Arcangeli Cc: Chris Mason, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller Andrea Arcangeli wrote: >On Thu, Jun 12, 2003 at 01:48:04PM +1000, Nick Piggin wrote: > >> >>Andrea Arcangeli wrote: >> >> >>>On Thu, Jun 12, 2003 at 01:20:44PM +1000, Nick Piggin wrote: >>> >>> >>>>It's no less fair this way, tasks will still be woken in fifo >>>>order. They will just be given the chance to submit a batch >>>>of requests. >>>> >>>> >>>If you change the behaviour with queued_task_nr > batch_requests it is >>>less fair, period. Anything else I don't care about right now >>>because it is a minor cpu improvement anyway. >>> >>>I'm not talking about performance, I'm talking about latency and >>>fairness only. This is the whole point of the ->full logic. >>> >>> >>I say each task getting 8 requests at a time is as fair >>as each getting 1 request at a time. Yes, you may get a >>worse latency, but _fairness_ is the same. >> > >It is the worse latency that is the problem of course. Fairness in this >case isn't affected (assuming you would implement the batch_sectors fairly), >but latency would definitely be affected. > Yep. > >>Well I'm not so sure that your method will do a great deal >>of good. On SMP you would increase contention on the spinlock. >>IMO it would be better to serialise them on the waitqueue >>instead of a spinlock, seeing as they are already sleeping. >> > >I think the worst part is the cacheline bouncing, but that might happen >anyway under load. At least it certainly makes sense for UP, or if >you're lucky and the tasks run serially (possible if all cpus are >running). > > >>I think we'll just have to agree to disagree here. This >>sort of thing came up in our CFQ discussion as well. 
It's >>not that I think your way is without merits, but I think >>in an overload situation it's a better aim to attempt to >>keep throughput up rather than attempt to provide the >>lowest possible latency. >> > >Those changes (like the CFQ I/O scheduler in 2.5) are being developed >mostly due to the latency complaints we get as feedback on l-k. That's why >I care about latency first here. But we have to care about throughput too >of course. This isn't CFQ; it just tries to provide new requests to >different tasks with the minimal possible latency, which in turn also >guarantees fairness. That sounds like a good default to me, especially when I >see the removal of the _exclusive wakeup in mainline taken as a major >fairness/latency improvement. > Throughput vs latency is always difficult I guess. In this case, I think when there are few waiters, latency should not be much worse. When there are a lot of waiters it is probably not an interactive load to start with and throughput is more important. Anyway, the ideas you are following are interesting and worthwhile, so we'll each try our own thing :) ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-12 3:20 ` Nick Piggin 2003-06-12 3:33 ` Andrea Arcangeli @ 2003-06-12 16:06 ` Chris Mason 2003-06-12 16:16 ` Nick Piggin 1 sibling, 1 reply; 114+ messages in thread From: Chris Mason @ 2003-06-12 16:06 UTC (permalink / raw) To: Nick Piggin Cc: Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller [-- Attachment #1: Type: text/plain, Size: 1782 bytes --] On Wed, 2003-06-11 at 23:20, Nick Piggin wrote: > > I think the cpu utilization gain of waking a number of tasks > at once would be outweighed by the advantage of waking 1 task > and not putting it to sleep again for a number of requests. > You obviously are not claiming concurrency improvements, as > your method would also increase contention on the io lock > (or the queue lock in 2.5). I've been trying variations on this for a few days; none have been thrilling, but the end result is better dbench and iozone throughput overall. For the 20 writer iozone test, rc7 got an average throughput of 3MB/s, and yesterday's latency patch got 500k/s or so. Ouch. This gets us up to 1.2MB/s. I'm keeping yesterday's get_request_wait_wake, which wakes up a waiter instead of unplugging. The basic idea here is that after a process is woken up and grabs a request, he becomes the batch owner. Batch owners get to ignore the q->full flag for either 1/5 second or 32 requests, whichever comes first. The timer part is an attempt at preventing memory pressure writers (who go 1 req at a time) from holding onto batch ownership for too long. Latency stats after dbench 50: device 08:01: num_req 120077, total jiffies waited 663231 65538 forced to wait 1 min wait, 175 max wait 10 average wait 65296 < 100, 242 < 200, 0 < 300, 0 < 400, 0 < 500 0 waits longer than 500 jiffies Good latency system-wide comes from fair waiting, but it also comes from how fast we can run write_some_buffers(), since that is the unit of throttling. 
Hopefully this patch decreases the time it takes for write_some_buffers over the past latency patches, or gives someone else a better idea ;-) Attached is an incremental over yesterday's io-stalls-5.diff. -chris [-- Attachment #2: io-stalls-6-inc.diff --] [-- Type: text/plain, Size: 3421 bytes --] diff -u edited/drivers/block/ll_rw_blk.c edited/drivers/block/ll_rw_blk.c --- edited/drivers/block/ll_rw_blk.c Wed Jun 11 13:36:10 2003 +++ edited/drivers/block/ll_rw_blk.c Thu Jun 12 11:53:03 2003 @@ -437,6 +437,12 @@ nr_requests = 128; if (megs < 32) nr_requests /= 2; + q->batch_owner[0] = NULL; + q->batch_owner[1] = NULL; + q->batch_remaining[0] = 0; + q->batch_remaining[1] = 0; + q->batch_jiffies[0] = 0; + q->batch_jiffies[1] = 0; blk_grow_request_list(q, nr_requests); init_waitqueue_head(&q->wait_for_requests[0]); @@ -558,6 +564,31 @@ blk_queue_bounce_limit(q, BLK_BOUNCE_HIGH); } +#define BATCH_JIFFIES (HZ/5) +static void check_batch_owner(request_queue_t *q, int rw) +{ + if (q->batch_owner[rw] != current) + return; + if (--q->batch_remaining[rw] > 0 && + jiffies - q->batch_jiffies[rw] < BATCH_JIFFIES) { + return; + } + q->batch_owner[rw] = NULL; +} + +static void set_batch_owner(request_queue_t *q, int rw) +{ + struct task_struct *tsk = current; + if (q->batch_owner[rw] == tsk) + return; + if (q->batch_owner[rw] && + jiffies - q->batch_jiffies[rw] < BATCH_JIFFIES) + return; + q->batch_jiffies[rw] = jiffies; + q->batch_owner[rw] = current; + q->batch_remaining[rw] = q->batch_requests; +} + #define blkdev_free_rq(list) list_entry((list)->next, struct request, queue); /* * Get a free request. 
io_request_lock must be held and interrupts @@ -587,9 +618,13 @@ */ static inline struct request *get_request(request_queue_t *q, int rw) { - if (queue_full(q, rw)) + struct request *rq; + if (queue_full(q, rw) && q->batch_owner[rw] != current) return NULL; - return __get_request(q, rw); + rq = __get_request(q, rw); + if (rq) + check_batch_owner(q, rw); + return rq; } /* @@ -657,9 +692,9 @@ add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait); + spin_lock_irq(&io_request_lock); do { set_current_state(TASK_UNINTERRUPTIBLE); - spin_lock_irq(&io_request_lock); if (queue_full(q, rw) || q->rq[rw].count == 0) { if (q->rq[rw].count == 0) __generic_unplug_device(q); @@ -668,8 +703,9 @@ spin_lock_irq(&io_request_lock); } rq = __get_request(q, rw); - spin_unlock_irq(&io_request_lock); } while (rq == NULL); + set_batch_owner(q, rw); + spin_unlock_irq(&io_request_lock); remove_wait_queue(&q->wait_for_requests[rw], &wait); current->state = TASK_RUNNING; @@ -1010,6 +1046,7 @@ struct list_head *head, *insert_here; int latency; elevator_t *elevator = &q->elevator; + int need_unplug = 0; count = bh->b_size >> 9; sector = bh->b_rsector; @@ -1145,8 +1182,8 @@ spin_unlock_irq(&io_request_lock); freereq = __get_request_wait(q, rw); head = &q->queue_head; + need_unplug = 1; spin_lock_irq(&io_request_lock); - get_request_wait_wakeup(q, rw); goto again; } } @@ -1174,6 +1211,8 @@ out: if (freereq) blkdev_release_request(freereq); + if (need_unplug) + get_request_wait_wakeup(q, rw); spin_unlock_irq(&io_request_lock); return 0; end_io: diff -u edited/include/linux/blkdev.h edited/include/linux/blkdev.h --- edited/include/linux/blkdev.h Wed Jun 11 09:56:55 2003 +++ edited/include/linux/blkdev.h Thu Jun 12 09:44:26 2003 @@ -92,6 +92,10 @@ */ int batch_requests; + struct task_struct *batch_owner[2]; + int batch_remaining[2]; + unsigned long batch_jiffies[2]; + /* * Together with queue_head for cacheline sharing */ ^ permalink raw reply [flat|nested] 114+ messages in thread
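The batch-owner rule in Chris's patch above (set_batch_owner/check_batch_owner: ownership lapses after batch_requests allocations or BATCH_JIFFIES ticks, whichever comes first) can be modeled standalone. This is an illustrative userspace sketch with invented names, not the patch itself:

```c
#include <assert.h>

/* Standalone model of the batch-owner expiry rule. */
struct toy_owner {
    unsigned long start_jiffies;
    int remaining;
};

void owner_start(struct toy_owner *o, unsigned long now, int max_requests)
{
    o->start_jiffies = now;
    o->remaining = max_requests;   /* budget, counting the grab that won ownership */
}

/* returns 1 while the owner may keep ignoring the full flag */
int owner_may_continue(struct toy_owner *o, unsigned long now,
                       unsigned long max_jiffies)
{
    if (--o->remaining <= 0)
        return 0;                  /* request budget exhausted */
    if (now - o->start_jiffies >= max_jiffies)
        return 0;                  /* time window expired */
    return 1;
}

int owner_demo(void)
{
    struct toy_owner o;
    int served = 0, i;

    /* request limit fires first: a budget of 4 allows 3 further grabs */
    owner_start(&o, 1000, 4);
    for (i = 0; i < 10 && owner_may_continue(&o, 1000 + i, 100); i++)
        served++;

    /* time limit fires first: plenty of budget, but the window closes */
    owner_start(&o, 0, 1000);
    if (!owner_may_continue(&o, 50, 100))
        return -1;
    if (owner_may_continue(&o, 200, 100))
        return -2;

    return served;
}
```

The two expiry conditions are exactly what keeps a memory-pressure writer, who trickles one request at a time, from sitting on ownership indefinitely.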
* Re: [PATCH] io stalls 2003-06-12 16:06 ` Chris Mason @ 2003-06-12 16:16 ` Nick Piggin 0 siblings, 0 replies; 114+ messages in thread From: Nick Piggin @ 2003-06-12 16:16 UTC (permalink / raw) To: Chris Mason Cc: Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller Chris Mason wrote: >On Wed, 2003-06-11 at 23:20, Nick Piggin wrote: > > >>I think the cpu utilization gain of waking a number of tasks >>at once would be outweighed by the advantage of waking 1 task >>and not putting it to sleep again for a number of requests. >>You obviously are not claiming concurrency improvements, as >>your method would also increase contention on the io lock >>(or the queue lock in 2.5). >> > >I've been trying variations on this for a few days; none have been >thrilling, but the end result is better dbench and iozone throughput >overall. For the 20 writer iozone test, rc7 got an average throughput >of 3MB/s, and yesterday's latency patch got 500k/s or so. Ouch. > >This gets us up to 1.2MB/s. I'm keeping yesterday's >get_request_wait_wake, which wakes up a waiter instead of unplugging. > >The basic idea here is that after a process is woken up and grabs a >request, he becomes the batch owner. Batch owners get to ignore the >q->full flag for either 1/5 second or 32 requests, whichever comes >first. The timer part is an attempt at preventing memory pressure >writers (who go 1 req at a time) from holding onto batch ownership for >too long. Latency stats after dbench 50: > Yeah, I get ~50% more throughput and up to 20% better CPU efficiency on tiobench 256 for sequential and random writers by doing something similar. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-12 2:46 ` Andrea Arcangeli 2003-06-12 2:49 ` Nick Piggin @ 2003-06-25 19:03 ` Chris Mason 2003-06-25 19:25 ` Andrea Arcangeli 2003-06-26 5:48 ` [PATCH] io stalls Nick Piggin 1 sibling, 2 replies; 114+ messages in thread From: Chris Mason @ 2003-06-25 19:03 UTC (permalink / raw) To: Andrea Arcangeli Cc: Nick Piggin, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller [-- Attachment #1: Type: text/plain, Size: 2624 bytes --] Hello all, [ short version: the patch attached should fix io latencies in 2.4.21. Please review and/or give it a try ] My last set of patches was directed at reducing the latencies in __get_request_wait, which really helped reduce stalls when you had lots of io to one device and balance_dirty() was causing pauses while you tried to do io to other devices. But, a streaming write could still starve reads to the same device, mostly because the read would have to send down any huge merged writes that were before it in the queue. Andrea's kernel has a fix for this too: he limits the total number of sectors that can be in the request queue at any given time. But, his patches change blk_finished_io, both in the arguments it takes and the side effects of calling it. I don't think we can merge his current form without breaking external drivers. So, I added a can_throttle flag to the queue struct; drivers can enable it if they are going to call the new blk_started_sectors and blk_finished_sectors funcs any time they call blk_{started,finished}_io, and these do all the -aa style sector throttling. There were a few other small changes to Andrea's patch: he wasn't setting q->full when get_request decided there were too many sectors in flight. This resulted in large latencies in __get_request_wait. He was also unconditionally clearing q->full in blkdev_release_request; my code only clears q->full when all the waiters are gone. 
I changed generic_unplug_device to zero the elevator_sequence field of the last request on the queue. This means there won't be any merges with requests pending once an unplug is done, and helps limit the number of sectors that need to be sent down during the run_task_queue(&tq_disk) in wait_on_buffer. I lowered the -aa default limit on sectors in flight from 4MB to 2MB. We probably want an elvtune for it; large arrays with writeback cache should be able to tolerate larger values. There's still a little work left to do: this patch enables sector throttling for scsi and IDE. cciss, DAC960 and cpqarray need modification too (99% done already in -aa). No sense in doing that until after the bulk of the patch is reviewed though. As before, most of the code here is from Andrea and Nick; I've just wrapped a lot of duct tape around it and done some tweaking. The primary pieces are: fix-pausing (andrea, corner cases where wakeups are missed) elevator-low-latency (andrea, limit sectors in flight) queue_full (Nick, fairness in __get_request_wait) I've removed my latency stats for __get_request_wait in hopes of making it a better merging candidate. 
-chris [-- Attachment #2: io-stalls-7.diff --] [-- Type: text/plain, Size: 24826 bytes --] diff -urN --exclude '*.orig' --exclude '*.rej' parent/drivers/block/ll_rw_blk.c comp/drivers/block/ll_rw_blk.c --- parent/drivers/block/ll_rw_blk.c 2003-06-25 14:12:09.000000000 -0400 +++ comp/drivers/block/ll_rw_blk.c 2003-06-25 14:11:56.000000000 -0400 @@ -176,11 +176,12 @@ { int count = q->nr_requests; - count -= __blk_cleanup_queue(&q->rq[READ]); - count -= __blk_cleanup_queue(&q->rq[WRITE]); + count -= __blk_cleanup_queue(&q->rq); if (count) printk("blk_cleanup_queue: leaked requests (%d)\n", count); + if (atomic_read(&q->nr_sectors)) + printk("blk_cleanup_queue: leaked sectors (%d)\n", atomic_read(&q->nr_sectors)); memset(q, 0, sizeof(*q)); } @@ -215,6 +216,24 @@ } /** + * blk_queue_throttle_sectors - indicates you will call sector throttling funcs + * @q: The queue which this applies to. + * @active: A flag indication if you want sector throttling on + * + * Description: + * The sector throttling code allows us to put a limit on the number of + * sectors pending io to the disk at a given time, sending @active nonzero + * indicates you will call blk_started_sectors and blk_finished_sectors in + * addition to calling blk_started_io and blk_finished_io in order to + * keep track of the number of sectors in flight. 
+ **/ + +void blk_queue_throttle_sectors(request_queue_t * q, int active) +{ + q->can_throttle = active; +} + +/** * blk_queue_make_request - define an alternate make_request function for a device * @q: the request queue for the device to be affected * @mfn: the alternate make_request function @@ -360,8 +379,20 @@ { if (q->plugged) { q->plugged = 0; - if (!list_empty(&q->queue_head)) + if (!list_empty(&q->queue_head)) { + struct request *rq; + + /* we don't want merges later on to come in + * and significantly increase the amount of + * work during an unplug, it can lead to high + * latencies while some poor waiter tries to + * run an ever increasing chunk of io. + * This does lower throughput some though. + */ + rq = blkdev_entry_prev_request(&q->queue_head), + rq->elevator_sequence = 0; q->request_fn(q); + } } } @@ -389,7 +420,7 @@ * * Returns the (new) number of requests which the queue has available. */ -int blk_grow_request_list(request_queue_t *q, int nr_requests) +int blk_grow_request_list(request_queue_t *q, int nr_requests, int max_queue_sectors) { unsigned long flags; /* Several broken drivers assume that this function doesn't sleep, @@ -399,21 +430,34 @@ spin_lock_irqsave(&io_request_lock, flags); while (q->nr_requests < nr_requests) { struct request *rq; - int rw; rq = kmem_cache_alloc(request_cachep, SLAB_ATOMIC); if (rq == NULL) break; memset(rq, 0, sizeof(*rq)); rq->rq_status = RQ_INACTIVE; - rw = q->nr_requests & 1; - list_add(&rq->queue, &q->rq[rw].free); - q->rq[rw].count++; + list_add(&rq->queue, &q->rq.free); + q->rq.count++; + q->nr_requests++; } + + /* + * Wakeup waiters after both one quarter of the + * max-in-fligh queue and one quarter of the requests + * are available again. 
+ */ + q->batch_requests = q->nr_requests / 4; if (q->batch_requests > 32) q->batch_requests = 32; + q->batch_sectors = max_queue_sectors / 4; + + q->max_queue_sectors = max_queue_sectors; + + BUG_ON(!q->batch_sectors); + atomic_set(&q->nr_sectors, 0); + spin_unlock_irqrestore(&io_request_lock, flags); return q->nr_requests; } @@ -422,23 +466,27 @@ { struct sysinfo si; int megs; /* Total memory, in megabytes */ - int nr_requests; - - INIT_LIST_HEAD(&q->rq[READ].free); - INIT_LIST_HEAD(&q->rq[WRITE].free); - q->rq[READ].count = 0; - q->rq[WRITE].count = 0; + int nr_requests, max_queue_sectors = MAX_QUEUE_SECTORS; + + INIT_LIST_HEAD(&q->rq.free); + q->rq.count = 0; q->nr_requests = 0; si_meminfo(&si); megs = si.totalram >> (20 - PAGE_SHIFT); - nr_requests = 128; - if (megs < 32) - nr_requests /= 2; - blk_grow_request_list(q, nr_requests); + nr_requests = MAX_NR_REQUESTS; + if (megs < 30) { + nr_requests /= 2; + max_queue_sectors /= 2; + } + /* notice early if anybody screwed the defaults */ + BUG_ON(!nr_requests); + BUG_ON(!max_queue_sectors); + + blk_grow_request_list(q, nr_requests, max_queue_sectors); + + init_waitqueue_head(&q->wait_for_requests); - init_waitqueue_head(&q->wait_for_requests[0]); - init_waitqueue_head(&q->wait_for_requests[1]); spin_lock_init(&q->queue_lock); } @@ -491,6 +539,9 @@ q->plug_tq.routine = &generic_unplug_device; q->plug_tq.data = q; q->plugged = 0; + q->full = 0; + q->can_throttle = 0; + /* * These booleans describe the queue properties. We set the * default (and most common) values here. Other drivers can @@ -508,12 +559,13 @@ * Get a free request. io_request_lock must be held and interrupts * disabled on the way in. Returns NULL if there are no free requests. 
*/ -static struct request *get_request(request_queue_t *q, int rw) +static struct request *__get_request(request_queue_t *q, int rw) { struct request *rq = NULL; - struct request_list *rl = q->rq + rw; + struct request_list *rl; - if (!list_empty(&rl->free)) { + rl = &q->rq; + if (!list_empty(&rl->free) && !blk_oversized_queue(q)) { rq = blkdev_free_rq(&rl->free); list_del(&rq->queue); rl->count--; @@ -521,35 +573,47 @@ rq->cmd = rw; rq->special = NULL; rq->q = q; - } - + } else + q->full = 1; return rq; } /* - * Here's the request allocation design: + * get a free request, honoring the queue_full condition + */ +static inline struct request *get_request(request_queue_t *q, int rw) +{ + if (q->full) + return NULL; + return __get_request(q, rw); +} + +/* + * helper func to do memory barriers and wakeups when we finally decide + * to clear the queue full condition + */ +static inline void clear_full_and_wake(request_queue_t *q) +{ + q->full = 0; + mb(); + if (waitqueue_active(&q->wait_for_requests)) + wake_up(&q->wait_for_requests); +} + +/* + * Here's the request allocation design, low latency version: * * 1: Blocking on request exhaustion is a key part of I/O throttling. * * 2: We want to be `fair' to all requesters. We must avoid starvation, and * attempt to ensure that all requesters sleep for a similar duration. Hence * no stealing requests when there are other processes waiting. - * - * 3: We also wish to support `batching' of requests. So when a process is - * woken, we want to allow it to allocate a decent number of requests - * before it blocks again, so they can be nicely merged (this only really - * matters if the process happens to be adding requests near the head of - * the queue). - * - * 4: We want to avoid scheduling storms. This isn't really important, because - * the system will be I/O bound anyway. But it's easy. - * - * There is tension between requirements 2 and 3. 
Once a task has woken, - * we don't want to allow it to sleep as soon as it takes its second request. - * But we don't want currently-running tasks to steal all the requests - * from the sleepers. We handle this with wakeup hysteresis around - * 0 .. batch_requests and with the assumption that request taking is much, - * much faster than request freeing. + * + * There used to be more here, attempting to allow a process to send in a + * number of requests once it has woken up. But, there's no way to + * tell if a process has just been woken up, or if it is a new process + * coming in to steal requests from the waiters. So, we give up and force + * everyone to wait fairly. * * So here's what we do: * @@ -561,28 +625,23 @@ * * When a process wants a new request: * - * b) If free_requests == 0, the requester sleeps in FIFO manner. - * - * b) If 0 < free_requests < batch_requests and there are waiters, - * we still take a request non-blockingly. This provides batching. - * - * c) If free_requests >= batch_requests, the caller is immediately - * granted a new request. + * b) If free_requests == 0, the requester sleeps in FIFO manner, and + * the queue full condition is set. The full condition is not + * cleared until there are no longer any waiters. Once the full + * condition is set, all new io must wait, hopefully for a very + * short period of time. * * When a request is released: * - * d) If free_requests < batch_requests, do nothing. - * - * f) If free_requests >= batch_requests, wake up a single waiter. + * c) If free_requests < batch_requests, do nothing. * - * The net effect is that when a process is woken at the batch_requests level, - * it will be able to take approximately (batch_requests) requests before - * blocking again (at the tail of the queue). - * - * This all assumes that the rate of taking requests is much, much higher - * than the rate of releasing them. Which is very true. + * d) If free_requests >= batch_requests, wake up a single waiter. 
* - * -akpm, Feb 2002. + * As each waiter gets a request, he wakes another waiter. We do this + * to prevent a race where an unplug might get run before a request makes + * it's way onto the queue. The result is a cascade of wakeups, so delaying + * the initial wakeup until we've got batch_requests available helps avoid + * wakeups where there aren't any requests available yet. */ static struct request *__get_request_wait(request_queue_t *q, int rw) @@ -590,21 +649,40 @@ register struct request *rq; DECLARE_WAITQUEUE(wait, current); - add_wait_queue(&q->wait_for_requests[rw], &wait); + add_wait_queue_exclusive(&q->wait_for_requests, &wait); + do { set_current_state(TASK_UNINTERRUPTIBLE); - generic_unplug_device(q); - if (q->rq[rw].count == 0) - schedule(); spin_lock_irq(&io_request_lock); - rq = get_request(q, rw); + if (q->full || blk_oversized_queue(q)) { + __generic_unplug_device(q); + spin_unlock_irq(&io_request_lock); + schedule(); + spin_lock_irq(&io_request_lock); + } + rq = __get_request(q, rw); spin_unlock_irq(&io_request_lock); } while (rq == NULL); - remove_wait_queue(&q->wait_for_requests[rw], &wait); + remove_wait_queue(&q->wait_for_requests, &wait); current->state = TASK_RUNNING; + + if (!waitqueue_active(&q->wait_for_requests)) + clear_full_and_wake(q); + return rq; } +static void get_request_wait_wakeup(request_queue_t *q, int rw) +{ + /* + * avoid losing an unplug if a second __get_request_wait did the + * generic_unplug_device while our __get_request_wait was running + * w/o the queue_lock held and w/ our request out of the queue. 
+ */ + if (waitqueue_active(&q->wait_for_requests)) + wake_up(&q->wait_for_requests); +} + /* RO fail safe mechanism */ static long ro_bits[MAX_BLKDEV][8]; @@ -818,7 +896,6 @@ void blkdev_release_request(struct request *req) { request_queue_t *q = req->q; - int rw = req->cmd; req->rq_status = RQ_INACTIVE; req->q = NULL; @@ -828,9 +905,19 @@ * assume it has free buffers and check waiters */ if (q) { - list_add(&req->queue, &q->rq[rw].free); - if (++q->rq[rw].count >= q->batch_requests) - wake_up(&q->wait_for_requests[rw]); + int oversized_batch = 0; + + if (q->can_throttle) + oversized_batch = blk_oversized_queue_batch(q); + q->rq.count++; + list_add(&req->queue, &q->rq.free); + if (q->rq.count >= q->batch_requests && !oversized_batch) { + smp_mb(); + if (waitqueue_active(&q->wait_for_requests)) + wake_up(&q->wait_for_requests); + else + clear_full_and_wake(q); + } } } @@ -908,6 +995,7 @@ struct list_head *head, *insert_here; int latency; elevator_t *elevator = &q->elevator; + int should_wake = 0; count = bh->b_size >> 9; sector = bh->b_rsector; @@ -948,7 +1036,6 @@ */ max_sectors = get_max_sectors(bh->b_rdev); -again: req = NULL; head = &q->queue_head; /* @@ -957,7 +1044,9 @@ */ spin_lock_irq(&io_request_lock); +again: insert_here = head->prev; + if (list_empty(head)) { q->plug_device_fn(q, bh->b_rdev); /* is atomic */ goto get_rq; @@ -976,6 +1065,7 @@ req->bhtail = bh; req->nr_sectors = req->hard_nr_sectors += count; blk_started_io(count); + blk_started_sectors(req, count); drive_stat_acct(req->rq_dev, req->cmd, count, 0); req_new_io(req, 1, count); attempt_back_merge(q, req, max_sectors, max_segments); @@ -998,6 +1088,7 @@ req->sector = req->hard_sector = sector; req->nr_sectors = req->hard_nr_sectors += count; blk_started_io(count); + blk_started_sectors(req, count); drive_stat_acct(req->rq_dev, req->cmd, count, 0); req_new_io(req, 1, count); attempt_front_merge(q, head, req, max_sectors, max_segments); @@ -1030,7 +1121,7 @@ * See description above 
__get_request_wait() */ if (rw_ahead) { - if (q->rq[rw].count < q->batch_requests) { + if (q->rq.count < q->batch_requests || blk_oversized_queue_batch(q)) { spin_unlock_irq(&io_request_lock); goto end_io; } @@ -1042,6 +1133,9 @@ if (req == NULL) { spin_unlock_irq(&io_request_lock); freereq = __get_request_wait(q, rw); + head = &q->queue_head; + spin_lock_irq(&io_request_lock); + should_wake = 1; goto again; } } @@ -1064,10 +1158,13 @@ req->start_time = jiffies; req_new_io(req, 0, count); blk_started_io(count); + blk_started_sectors(req, count); add_request(q, req, insert_here); out: if (freereq) blkdev_release_request(freereq); + if (should_wake) + get_request_wait_wakeup(q, rw); spin_unlock_irq(&io_request_lock); return 0; end_io: @@ -1196,8 +1293,15 @@ bh->b_rdev = bh->b_dev; bh->b_rsector = bh->b_blocknr * count; + get_bh(bh); generic_make_request(rw, bh); + /* fix race condition with wait_on_buffer() */ + smp_mb(); /* spin_unlock may have inclusive semantics */ + if (waitqueue_active(&bh->b_wait)) + wake_up(&bh->b_wait); + + put_bh(bh); switch (rw) { case WRITE: kstat.pgpgout += count; @@ -1350,6 +1454,7 @@ if ((bh = req->bh) != NULL) { nsect = bh->b_size >> 9; blk_finished_io(nsect); + blk_finished_sectors(req, nsect); req->bh = bh->b_reqnext; bh->b_reqnext = NULL; bh->b_end_io(bh, uptodate); @@ -1509,6 +1614,7 @@ EXPORT_SYMBOL(blk_get_queue); EXPORT_SYMBOL(blk_cleanup_queue); EXPORT_SYMBOL(blk_queue_headactive); +EXPORT_SYMBOL(blk_queue_throttle_sectors); EXPORT_SYMBOL(blk_queue_make_request); EXPORT_SYMBOL(generic_make_request); EXPORT_SYMBOL(blkdev_release_request); diff -urN --exclude '*.orig' --exclude '*.rej' parent/drivers/ide/ide-probe.c comp/drivers/ide/ide-probe.c --- parent/drivers/ide/ide-probe.c 2003-06-25 14:12:09.000000000 -0400 +++ comp/drivers/ide/ide-probe.c 2003-06-25 14:11:55.000000000 -0400 @@ -971,6 +971,7 @@ q->queuedata = HWGROUP(drive); blk_init_queue(q, do_ide_request); + blk_queue_throttle_sectors(q, 1); } #undef __IRQ_HELL_SPIN 
diff -urN --exclude '*.orig' --exclude '*.rej' parent/drivers/scsi/scsi.c comp/drivers/scsi/scsi.c --- parent/drivers/scsi/scsi.c 2003-06-25 14:12:09.000000000 -0400 +++ comp/drivers/scsi/scsi.c 2003-06-25 14:11:55.000000000 -0400 @@ -197,6 +197,7 @@ blk_init_queue(q, scsi_request_fn); blk_queue_headactive(q, 0); + blk_queue_throttle_sectors(q, 1); q->queuedata = (void *) SDpnt; } diff -urN --exclude '*.orig' --exclude '*.rej' parent/drivers/scsi/scsi_lib.c comp/drivers/scsi/scsi_lib.c --- parent/drivers/scsi/scsi_lib.c 2003-06-25 14:12:09.000000000 -0400 +++ comp/drivers/scsi/scsi_lib.c 2003-06-25 14:11:55.000000000 -0400 @@ -378,6 +378,7 @@ if ((bh = req->bh) != NULL) { nsect = bh->b_size >> 9; blk_finished_io(nsect); + blk_finished_sectors(req, nsect); req->bh = bh->b_reqnext; bh->b_reqnext = NULL; sectors -= nsect; diff -urN --exclude '*.orig' --exclude '*.rej' parent/fs/buffer.c comp/fs/buffer.c --- parent/fs/buffer.c 2003-06-25 14:12:09.000000000 -0400 +++ comp/fs/buffer.c 2003-06-25 14:11:53.000000000 -0400 @@ -153,10 +153,23 @@ get_bh(bh); add_wait_queue(&bh->b_wait, &wait); do { - run_task_queue(&tq_disk); set_task_state(tsk, TASK_UNINTERRUPTIBLE); if (!buffer_locked(bh)) break; + /* + * We must read tq_disk in TQ_ACTIVE after the + * add_wait_queue effect is visible to other cpus. + * We could unplug some line above it wouldn't matter + * but we can't do that right after add_wait_queue + * without an smp_mb() in between because spin_unlock + * has inclusive semantics. + * Doing it here is the most efficient place so we + * don't do a suprious unplug if we get a racy + * wakeup that make buffer_locked to return 0, and + * doing it here avoids an explicit smp_mb() we + * rely on the implicit one in set_task_state. 
+ */ + run_task_queue(&tq_disk); schedule(); } while (buffer_locked(bh)); tsk->state = TASK_RUNNING; @@ -1523,6 +1536,9 @@ /* Done - end_buffer_io_async will unlock */ SetPageUptodate(page); + + wakeup_page_waiters(page); + return 0; out: @@ -1554,6 +1570,7 @@ } while (bh != head); if (need_unlock) UnlockPage(page); + wakeup_page_waiters(page); return err; } @@ -1781,6 +1798,8 @@ else submit_bh(READ, bh); } + + wakeup_page_waiters(page); return 0; } @@ -2394,6 +2413,7 @@ submit_bh(rw, bh); bh = next; } while (bh != head); + wakeup_page_waiters(page); return 0; } diff -urN --exclude '*.orig' --exclude '*.rej' parent/fs/reiserfs/inode.c comp/fs/reiserfs/inode.c --- parent/fs/reiserfs/inode.c 2003-06-25 14:12:09.000000000 -0400 +++ comp/fs/reiserfs/inode.c 2003-06-25 14:11:53.000000000 -0400 @@ -2209,6 +2209,7 @@ */ if (nr) { submit_bh_for_writepage(arr, nr) ; + wakeup_page_waiters(page); } else { UnlockPage(page) ; } diff -urN --exclude '*.orig' --exclude '*.rej' parent/include/linux/blkdev.h comp/include/linux/blkdev.h --- parent/include/linux/blkdev.h 2003-06-25 14:12:09.000000000 -0400 +++ comp/include/linux/blkdev.h 2003-06-25 14:11:56.000000000 -0400 @@ -64,12 +64,6 @@ typedef void (plug_device_fn) (request_queue_t *q, kdev_t device); typedef void (unplug_device_fn) (void *q); -/* - * Default nr free requests per queue, ll_rw_blk will scale it down - * according to available RAM at init time - */ -#define QUEUE_NR_REQUESTS 8192 - struct request_list { unsigned int count; struct list_head free; @@ -80,7 +74,7 @@ /* * the queue request freelist, one for reads and one for writes */ - struct request_list rq[2]; + struct request_list rq; /* * The total number of requests on each queue @@ -93,6 +87,21 @@ int batch_requests; /* + * The total number of 512byte blocks on each queue + */ + atomic_t nr_sectors; + + /* + * Batching threshold for sleep/wakeup decisions + */ + int batch_sectors; + + /* + * The max number of 512byte blocks on each queue + */ + int 
max_queue_sectors; + + /* * Together with queue_head for cacheline sharing */ struct list_head queue_head; @@ -118,13 +127,28 @@ /* * Boolean that indicates whether this queue is plugged or not. */ - char plugged; + int plugged:1; /* * Boolean that indicates whether current_request is active or * not. */ - char head_active; + int head_active:1; + + /* + * Booleans that indicate whether the queue's free requests have + * been exhausted and is waiting to drop below the batch_requests + * threshold + */ + int full:1; + + /* + * Boolean that indicates you will use blk_started_sectors + * and blk_finished_sectors in addition to blk_started_io + * and blk_finished_io. It enables the throttling code to + * help keep the size of the in sectors to a reasonable number + */ + int can_throttle:1; unsigned long bounce_pfn; @@ -137,7 +161,7 @@ /* * Tasks wait here for free read and write requests */ - wait_queue_head_t wait_for_requests[2]; + wait_queue_head_t wait_for_requests; }; #define blk_queue_plugged(q) (q)->plugged @@ -217,14 +241,16 @@ extern void generic_make_request(int rw, struct buffer_head * bh); extern inline request_queue_t *blk_get_queue(kdev_t dev); extern void blkdev_release_request(struct request *); +extern void blk_print_stats(kdev_t dev); /* * Access functions for manipulating queue properties */ -extern int blk_grow_request_list(request_queue_t *q, int nr_requests); +extern int blk_grow_request_list(request_queue_t *q, int nr_requests, int max_queue_sectors); extern void blk_init_queue(request_queue_t *, request_fn_proc *); extern void blk_cleanup_queue(request_queue_t *); extern void blk_queue_headactive(request_queue_t *, int); +extern void blk_queue_throttle_sectors(request_queue_t *, int); extern void blk_queue_make_request(request_queue_t *, make_request_fn *); extern void generic_unplug_device(void *); extern inline int blk_seg_merge_ok(struct buffer_head *, struct buffer_head *); @@ -243,6 +269,8 @@ #define MAX_SEGMENTS 128 #define MAX_SECTORS 255 
+#define MAX_QUEUE_SECTORS (2 << (20 - 9)) /* 2 mbytes when full sized */ +#define MAX_NR_REQUESTS 1024 /* 1024k when in 512 units, normally min is 1M in 1k units */ #define PageAlignSize(size) (((size) + PAGE_SIZE -1) & PAGE_MASK) @@ -268,9 +296,51 @@ return retval; } +static inline int blk_oversized_queue(request_queue_t * q) +{ + if (q->can_throttle) + return atomic_read(&q->nr_sectors) > q->max_queue_sectors; + return q->rq.count == 0; +} + +static inline int blk_oversized_queue_batch(request_queue_t * q) +{ + return atomic_read(&q->nr_sectors) > q->max_queue_sectors - q->batch_sectors; +} + #define blk_finished_io(nsects) do { } while (0) #define blk_started_io(nsects) do { } while (0) +static inline void blk_started_sectors(struct request *rq, int count) +{ + request_queue_t *q = rq->q; + if (q && q->can_throttle) { + atomic_add(count, &q->nr_sectors); + if (atomic_read(&q->nr_sectors) < 0) { + printk("nr_sectors is %d\n", atomic_read(&q->nr_sectors)); + BUG(); + } + } +} + +static inline void blk_finished_sectors(struct request *rq, int count) +{ + request_queue_t *q = rq->q; + if (q && q->can_throttle) { + atomic_sub(count, &q->nr_sectors); + + smp_mb(); + if (q->rq.count >= q->batch_requests && !blk_oversized_queue_batch(q)) { + if (waitqueue_active(&q->wait_for_requests)) + wake_up(&q->wait_for_requests); + } + if (atomic_read(&q->nr_sectors) < 0) { + printk("nr_sectors is %d\n", atomic_read(&q->nr_sectors)); + BUG(); + } + } +} + static inline unsigned int blksize_bits(unsigned int size) { unsigned int bits = 8; diff -urN --exclude '*.orig' --exclude '*.rej' parent/include/linux/elevator.h comp/include/linux/elevator.h --- parent/include/linux/elevator.h 2003-06-25 14:12:09.000000000 -0400 +++ comp/include/linux/elevator.h 2003-06-25 14:11:55.000000000 -0400 @@ -80,7 +80,7 @@ return latency; } -#define ELV_LINUS_SEEK_COST 16 +#define ELV_LINUS_SEEK_COST 1 #define ELEVATOR_NOOP \ ((elevator_t) { \ @@ -93,8 +93,8 @@ #define ELEVATOR_LINUS \ ((elevator_t) { 
\ - 2048, /* read passovers */ \ - 8192, /* write passovers */ \ + 128, /* read passovers */ \ + 512, /* write passovers */ \ \ elevator_linus_merge, /* elevator_merge_fn */ \ elevator_linus_merge_req, /* elevator_merge_req_fn */ \ diff -urN --exclude '*.orig' --exclude '*.rej' parent/include/linux/pagemap.h comp/include/linux/pagemap.h --- parent/include/linux/pagemap.h 2003-06-25 14:12:09.000000000 -0400 +++ comp/include/linux/pagemap.h 2003-06-25 14:11:53.000000000 -0400 @@ -97,6 +97,8 @@ ___wait_on_page(page); } +extern void FASTCALL(wakeup_page_waiters(struct page * page)); + /* * Returns locked page at given index in given cache, creating it if needed. */ diff -urN --exclude '*.orig' --exclude '*.rej' parent/kernel/ksyms.c comp/kernel/ksyms.c --- parent/kernel/ksyms.c 2003-06-25 14:12:09.000000000 -0400 +++ comp/kernel/ksyms.c 2003-06-25 14:11:53.000000000 -0400 @@ -296,6 +296,7 @@ EXPORT_SYMBOL(filemap_fdatawait); EXPORT_SYMBOL(lock_page); EXPORT_SYMBOL(unlock_page); +EXPORT_SYMBOL(wakeup_page_waiters); /* device registration */ EXPORT_SYMBOL(register_chrdev); diff -urN --exclude '*.orig' --exclude '*.rej' parent/mm/filemap.c comp/mm/filemap.c --- parent/mm/filemap.c 2003-06-25 14:12:09.000000000 -0400 +++ comp/mm/filemap.c 2003-06-25 14:11:53.000000000 -0400 @@ -812,6 +812,20 @@ return &wait[hash]; } +/* + * This must be called after every submit_bh with end_io + * callbacks that would result into the blkdev layer waking + * up the page after a queue unplug. + */ +void wakeup_page_waiters(struct page * page) +{ + wait_queue_head_t * head; + + head = page_waitqueue(page); + if (waitqueue_active(head)) + wake_up(head); +} + /* * Wait for a page to get unlocked. * ^ permalink raw reply [flat|nested] 114+ messages in thread
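The sector-throttling arithmetic the patch adds can be modelled in a few lines of user-space C. This is a simplified sketch, not the kernel code: `struct queue_model` is an invented stand-in for the handful of request_queue_t fields the checks read, and the 2MB/batch numbers are examples.

```c
#include <assert.h>

/*
 * User-space model of the throttling checks above.  Field names and
 * thresholds are illustrative stand-ins, not the kernel definitions.
 */
struct queue_model {
    int can_throttle;      /* set via blk_queue_throttle_sectors(q, 1) */
    int nr_sectors;        /* 512-byte sectors currently in flight */
    int max_queue_sectors; /* cap, e.g. 2MB = 4096 sectors */
    int batch_sectors;     /* hysteresis: drain this much before waking */
    int rq_count;          /* free requests left (old-style limit) */
};

/* Mirrors blk_oversized_queue(): throttled queues are "full" based on
 * in-flight sectors; drivers that did not opt in fall back to the old
 * free-request test. */
static int oversized(const struct queue_model *q)
{
    if (q->can_throttle)
        return q->nr_sectors > q->max_queue_sectors;
    return q->rq_count == 0;
}

/* Mirrors blk_oversized_queue_batch(): stay "full" until a whole batch
 * of sectors has drained, so sleepers wake in bursts rather than once
 * per completed sector. */
static int oversized_batch(const struct queue_model *q)
{
    return q->nr_sectors > q->max_queue_sectors - q->batch_sectors;
}
```

With throttling enabled the queue is considered full on in-flight sectors rather than on the free-request count alone, and the batch variant keeps it "full" until a batch of sectors has drained, giving the wakeup hysteresis described in the comment block at the top of the patch.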
* Re: [PATCH] io stalls 2003-06-25 19:03 ` Chris Mason @ 2003-06-25 19:25 ` Andrea Arcangeli 2003-06-25 20:18 ` Chris Mason 2003-06-26 5:48 ` [PATCH] io stalls Nick Piggin 1 sibling, 1 reply; 114+ messages in thread From: Andrea Arcangeli @ 2003-06-25 19:25 UTC (permalink / raw) To: Chris Mason Cc: Nick Piggin, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Wed, Jun 25, 2003 at 03:03:43PM -0400, Chris Mason wrote: > Hello all, > > [ short version, the patch attached should fix io latencies in 2.4.21. > Please review and/or give it a try ] > > My last set of patches was directed at reducing the latencies in > __get_request_wait, which really helped reduce stalls when you had lots > of io to one device and balance_dirty() was causing pauses while you > tried to do io to other devices. > > But, a streaming write could still starve reads to the same device, > mostly because the read would have to send down any huge merged writes > that were before it in the queue. > > Andrea's kernel has a fix for this too, he limits the total number of > sectors that can be in the request queue at any given time. But, his > patches change blk_finished_io, both in the arguments it takes and the > side effects of calling it. I don't think we can merge his current form > without breaking external drivers. > > So, I added a can_throttle flag to the queue struct, drivers can enable > it if they are going to call the new blk_started_sectors and > blk_finished_sectors funcs any time they call blk_{started,finished}_io, > and these do all the -aa style sector throttling. > > There were a few other small changes to Andrea's patch, he wasn't > setting q->full when get_request decided there were too many sectors in wasn't is really in the past, because I'm doing it in 2.4.21rc8aa1 and in my latest status. > flight. This resulted in large latencies in __get_request_wait. 
He was
> also unconditionally clearing q->full in blkdev_release_request, my code
> only clears q->full when all the waiters are gone.

My current code, including the older 2.4.21rc8aa1, does that too, merged from your previous patches.

> I changed generic_unplug_device to zero the elevator_sequence field of
> the last request on the queue. This means there won't be any merges
> with requests pending once an unplug is done, and helps limit the number
> of sectors that need to be sent down during the run_task_queue(&tq_disk)
> in wait_on_buffer.

This sounds like a hack: forbidding merges is pretty bad for performance in general. Of course most of the merging happens in between the unplugs, but during heavy I/O with frequent unplugs from many readers this may hurt performance. And as you said, this mostly has the advantage of limiting the size of the queue, like I enforce in my tree with the elevator-lowlatency patch. I doubt this is right.

> I lowered the -aa default limit on sectors in flight from 4MB to 2MB.

I got a few complaints about performance slowdown; originally it was 2MB, so I increased it to 4MB, which should be enough for most hardware.

> We probably want an elvtune for it, large arrays with writeback cache
> should be able to tolerate larger values.

Yes, it largely depends on the speed of the device.

> There's still a little work left to do, this patch enables sector
> throttling for scsi and IDE. cciss, DAC960 and cpqarray need
> modification too (99% done already in -aa). No sense in doing that
> until after the bulk of the patch is reviewed though.
>
> As before, most of the code here is from Andrea and Nick, I've just
> wrapped a lot of duct tape around it and done some tweaking.
The
> primary pieces are:
>
> fix-pausing (andrea, corner cases where wakeups are missed)
> elevator-low-latency (andrea, limit sectors in flight)
> queue_full (Nick, fairness in __get_request_wait)
>
> I've removed my latency stats for __get_request_wait in hopes of making
> it a better merging candidate.

This is very similar to my status in -aa, except for the hack that forbids merging, which I think is wrong; the fact that you miss the wake_up_nr that I added to give the batching a meaning again; and the fact that you don't avoid the unplugs in get_request_wait_wakeup until the queue is empty. I mean this:

+static void get_request_wait_wakeup(request_queue_t *q, int rw)
+{
+	/*
+	 * avoid losing an unplug if a second __get_request_wait did the
+	 * generic_unplug_device while our __get_request_wait was running
+	 * w/o the queue_lock held and w/ our request out of the queue.
+	 */
+	if (q->rq[rw].count == 0 && waitqueue_active(&q->wait_for_requests[rw]))
+		__generic_unplug_device(q);
+}
+

You said last week the above is racy and that it even hung your box; could you elaborate? The above is in 2.4.21rc8aa1 and it works fine so far (though the race in wait_for_request in particular has never been known to be reproducible).

thanks,

Andrea

^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-25 19:25 ` Andrea Arcangeli @ 2003-06-25 20:18 ` Chris Mason 2003-06-27 8:41 ` write-caches, I/O stalls: MUST-FIX (was: [PATCH] io stalls) Matthias Andree 0 siblings, 1 reply; 114+ messages in thread From: Chris Mason @ 2003-06-25 20:18 UTC (permalink / raw) To: Andrea Arcangeli Cc: Nick Piggin, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Wed, 2003-06-25 at 15:25, Andrea Arcangeli wrote: > > There were a few other small changes to Andrea's patch, he wasn't > > setting q->full when get_request decided there were too many sectors in > > wasn't is really in the past, because I'm doing it in 2.4.21rc8aa1 and > in my latest status. > Hmm, I thought I grabbed the patch from rc8aa1, clearly not though, sorry about that. > > I changed generic_unplug_device to zero the elevator_sequence field of > > the last request on the queue. This means there won't be any merges > > with requests pending once an unplug is done, and helps limit the number > > of sectors that need to be sent down during the run_task_queue(&tq_disk) > > in wait_on_buffer. > > this sounds like an hack, forbidding merges is pretty bad for > performance in general, of course most of the merging happens in between > the unplugs, but during heavy I/O with frequent unplugs from many > readers this may hurt performance. And as you said this mostly has the > advantage of limiting the size of the queue, like I enforce in my tree > with the elevator-lowlatency. I doubt this is right. > Well, I would hit sysrq-t when I noticed read stalls, and the reader was frequently in run_task_queue. I kept the hunk because it made a noticeable difference. I agree there's a throughput tradeoff here, my goal for the patch was to find the major places I could improve latency and change them, then go back later and decide if each one was worth it. 
Your elevator-lowlatency patch doesn't enforce sector limits for merged requests, so a merger could constantly come in and steal space in the sector limit from other waiters. This led to high latency in __get_request_wait. That hunk for generic_unplug_device solves both of those problems.

> > I lowered the -aa default limit on sectors in flight from 4MB to 2MB.
>
> I got a few complains for performance slowdown, originally it was 2MB,
> so I increased it to 4, from 4M it should be enough for most hardware.
>

I've no preference really. I didn't notice a throughput difference, but my scsi drives only have 2MB of cache.

> this is very similar to my status in -aa, exept for the hack that
> forbids merging which I think is wrong and the fact you miss the
> wake_up_nr that I added to give a meaning to the batching again and that
> you don't avoid the unplugs in get_request_wait_wakeup until the queue
> is empty. I mean this:
>
> +static void get_request_wait_wakeup(request_queue_t *q, int rw)
> +{
> +	/*
> +	 * avoid losing an unplug if a second __get_request_wait did the
> +	 * generic_unplug_device while our __get_request_wait was
> running
> +	 * w/o the queue_lock held and w/ our request out of the queue.
> +	 */
> +	if (q->rq[rw].count == 0 && waitqueue_active(&q->wait_for_requests[rw]))
> +		__generic_unplug_device(q);
> +}
> +
>
> you said last week the above is racy and it even hanged your box, could
> you elaborate? The above is in 2.4.21rc8aa1 and it works fine so far
> (though especially the race in wait_for_request is never been known to
> be reproducible)

It caused hangs/stalls, but I didn't have the sector throttling code at the time, and it really changes the interaction of things. I think the hang went a little like this:

Let's say all the pending io is done, but the wait queue isn't empty yet because all the waiting tasks haven't yet been scheduled in.
Also, we have fewer than nr_requests processes waiting to start io, so when they do all get scheduled in they won't generate an unplug. q->rq.count = q->nr_requests, q->full = 1 new io comes in, sees q->full = 1, unplugs and sleeps. No io is done because the queue is empty. All the old waiters finally get scheduled in and grab their requests, but get_request_wait_wakeup doesn't unplug because q->rq.count != 0. If no additional io comes in, the queue never gets unplugged, and our waiter never gets woken. With the sector throttling on, we've got additional wakeups coming from blk_finished_io (or blk_finished_sectors in my patch). I kept out the wakeup_nr idea because I couldn't figure out how to keep __get_request_wait fair with it in. -chris ^ permalink raw reply [flat|nested] 114+ messages in thread
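The difference between the two wakeup-time checks in this scenario can be reduced to a pair of predicates. This is a toy model with invented names; the real functions also unplug the device under io_request_lock.

```c
#include <assert.h>

/* The quoted -aa check: unplug only when the free list is empty AND
 * someone is still waiting on it. */
static int unplug_if_empty(int free_requests, int waiters)
{
    return free_requests == 0 && waiters > 0;
}

/* The alternative in Chris's patch: poke the waitqueue whenever anyone
 * still waits, regardless of the free-request count. */
static int wake_if_waiters(int waiters)
{
    return waiters > 0;
}
```

In the stall scenario described above, the old waiters have re-taken requests (so the free list is non-empty) while one task still sleeps and no further io will arrive: the count-based check leaves that sleeper stranded, the waiter-based one does not.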
* write-caches, I/O stalls: MUST-FIX (was: [PATCH] io stalls) 2003-06-25 20:18 ` Chris Mason @ 2003-06-27 8:41 ` Matthias Andree 0 siblings, 0 replies; 114+ messages in thread From: Matthias Andree @ 2003-06-27 8:41 UTC (permalink / raw) To: lkml On Wed, 25 Jun 2003, Chris Mason wrote: > I've no preference really. I didn't notice a throughput difference but > my scsi drives only have 2MB of cache. You shouldn't be using the drive's write cache in the first place! The write cache, regardless of ATA or SCSI, can, as far as I know, not be used safely with any Linux file systems (and my questions whether 2.6 will finally change that went unanswered so far), because the write reordering the write cache can do can seriously damage file systems, whether journalling or not. Please conduct all your tests with write caches turned off because that's what matters in REAL systems; in that case, these latencies become a REAL pain in the back because writing is so much slower because of all the seeks. Optimizing for write cached behaviour can happen not a single second before: 1. the file systems know how to queue "ordered tags" in the right places (write barrier to enforce proper ordering for on-disk consistency, which I assume will make for a lot of ordered tags for writing to the journal itself) 2. the device driver knows how to map "ordered tags" to flush or whatever operations for drives that don't do tagged command queueing (ATA mostly, or SCSI when TCQ is switched off). All these "0-bytes in file" problems with XFS, ReiserFS, JFS, ext2 and ext3 in data=writeback mode happen because the kernel doesn't care about write ordering, and these broken files are a) occasionally hard to find, b) another PITA. I consider proper write ordering and enforcing thereof a MUST-FIX. This is much more important than getting some extra latencies squished. It must do the right thing in the first place, and then it can do the right thing faster. 
I am aware that you're not the only person responsible for the state Linux is in, and I'd like to see the write barriers revived ASAP for at least ext2/ext3/reiserfs, sym53c8xx, aic7xxx, tmscsim and IDE. I am sorry not being able to offer any help on that, I'm not acquainted with the kernel stuff and I can't donate money to anyone for me to do it. SCNR. -- Matthias Andree ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-25 19:03 ` Chris Mason 2003-06-25 19:25 ` Andrea Arcangeli @ 2003-06-26 5:48 ` Nick Piggin 2003-06-26 11:48 ` Chris Mason 1 sibling, 1 reply; 114+ messages in thread From: Nick Piggin @ 2003-06-26 5:48 UTC (permalink / raw) To: Chris Mason Cc: Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller Chris Mason wrote: >Hello all, > >[ short version, the patch attached should fix io latencies in 2.4.21. >Please review and/or give it a try ] > >My last set of patches was directed at reducing the latencies in >__get_request_wait, which really helped reduce stalls when you had lots >of io to one device and balance_dirty() was causing pauses while you >tried to do io to other devices. > >But, a streaming write could still starve reads to the same device, >mostly because the read would have to send down any huge merged writes >that were before it in the queue. > >Andrea's kernel has a fix for this too, he limits the total number of >sectors that can be in the request queue at any given time. But, his >patches change blk_finished_io, both in the arguments it takes and the >side effects of calling it. I don't think we can merge his current form >without breaking external drivers. > >So, I added a can_throttle flag to the queue struct, drivers can enable >it if they are going to call the new blk_started_sectors and >blk_finished_sectors funcs any time they call blk_{started,finished}_io, >and these do all the -aa style sector throttling. > >There were a few other small changes to Andrea's patch, he wasn't >setting q->full when get_request decided there were too many sectors in >flight. This resulted in large latencies in __get_request_wait. He was >also unconditionally clearing q->full in blkdev_release_request, my code >only clears q->full when all the waiters are gone. > >I changed generic_unplug_device to zero the elevator_sequence field of >the last request on the queue. 
This means there won't be any merges >with requests pending once an unplug is done, and helps limit the number >of sectors that need to be sent down during the run_task_queue(&tq_disk) >in wait_on_buffer. > >I lowered the -aa default limit on sectors in flight from 4MB to 2MB. >We probably want an elvtune for it, large arrays with writeback cache >should be able to tolerate larger values. > >There's still a little work left to do, this patch enables sector >throttling for scsi and IDE. cciss, DAC960 and cpqarray need >modification too (99% done already in -aa). No sense in doing that >until after the bulk of the patch is reviewed though. > >As before, most of the code here is from Andrea and Nick, I've just >wrapped a lot of duct tape around it and done some tweaking. The >primary pieces are: > >fix-pausing (andrea, corner cases where wakeups are missed) >elevator-low-latency (andrea, limit sectors in flight) >queue_full (Nick, fairness in __get_request_wait) > I am hoping to go a slightly different way in 2.5 pending inclusion of process io contexts. If you had time to look over my changes there (in current mm tree) it would be appreciated, but they don't help your problem for 2.4. I found that my queue full fairness for 2.4 didn't address the batching issue well. It does, guarantee lowest possible maximum latency for singular requests, but due to lowered throughput this can cause worse "high level" latency. I couldn't find a really good, comprehensive method of allowing processes to batch without resorting to very complex wakeup methods unless process io contexts are used. The other possibility would be to keep a list of "batching" processes which should achieve the same as io contexts. An easier approach would be to just allow the last woken process to submit a batch of requests. This wouldn't have as good guaranteed fairness, but not to say that it would have starvation issues. I'll help you implement it if you are interested. 
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-26 5:48 ` [PATCH] io stalls Nick Piggin @ 2003-06-26 11:48 ` Chris Mason 2003-06-26 13:04 ` Nick Piggin 0 siblings, 1 reply; 114+ messages in thread From: Chris Mason @ 2003-06-26 11:48 UTC (permalink / raw) To: Nick Piggin Cc: Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Thu, 2003-06-26 at 01:48, Nick Piggin wrote: > I am hoping to go a slightly different way in 2.5 pending > inclusion of process io contexts. If you had time to look > over my changes there (in current mm tree) it would be > appreciated, but they don't help your problem for 2.4. > > I found that my queue full fairness for 2.4 didn't address > the batching issue well. It does, guarantee lowest possible > maximum latency for singular requests, but due to lowered > throughput this can cause worse "high level" latency. > > I couldn't find a really good, comprehensive method of > allowing processes to batch without resorting to very > complex wakeup methods unless process io contexts are used. > The other possibility would be to keep a list of "batching" > processes which should achieve the same as io contexts. > > An easier approach would be to just allow the last woken > process to submit a batch of requests. This wouldn't have > as good guaranteed fairness, but not to say that it would > have starvation issues. I'll help you implement it if you > are interested. One of the things I tried in this area was basically queue ownership. When each process woke up, he was given strict ownership of the queue and could submit up to N number of requests. One process waited for ownership in a yield loop for a max limit of a certain number of jiffies, all the others waited on the request queue. It generally increased the latency in __get_request wait by a multiple of N. 
I didn't keep it because the current patch is already full of subtle interactions; I didn't want to make things more confusing than they already were ;-)

The real problem with this approach is that we're guessing about the number of requests a given process wants to submit, and we're assuming those requests are going to be highly mergable. If the higher levels pass these hints down to the elevator, we should be able to do a better job of giving both low latency and high throughput. Between bios and the pdflush daemons, I think 2.5 is in pretty good shape to do what we need. I'm not 100% sure we need batching when the requests being submitted are not highly mergable, but I haven't put lots of thought into that part yet.

Anyway, for 2.4 I'm not sure there's much more we can do. I'd like to add tunables to the current patch, so userland can control the max io in flight and a simple toggle between throughput mode and latency mode on a per-device basis. It's not perfect but should tide us over until 2.6.

-chris

^ permalink raw reply [flat|nested] 114+ messages in thread
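The abandoned queue-ownership experiment Chris describes can be sketched as a small state machine. Everything below is invented for illustration (the experiment was never merged): an owner gets a budget of N requests and a deadline in jiffies, and loses the queue when either runs out.

```c
#include <assert.h>

#define BATCH 8  /* example value of "N requests" */

struct owner_state {
    int pid;        /* current owner, -1 if none */
    int submitted;  /* requests submitted under this ownership */
    long deadline;  /* fake-jiffy time when ownership lapses */
};

/* Try to submit one request as 'pid' at time 'now'; returns 1 if the
 * request may be allocated, 0 if the caller must wait. */
static int owner_submit(struct owner_state *o, int pid, long now, long hold)
{
    if (o->pid != pid) {
        /* someone else owns the queue and is still within both limits */
        if (o->pid != -1 && now < o->deadline && o->submitted < BATCH)
            return 0;
        o->pid = pid;          /* take over (fresh or lapsed ownership) */
        o->submitted = 0;
        o->deadline = now + hold;
    }
    if (o->submitted >= BATCH || now >= o->deadline)
        return 0;              /* budget or time exhausted: go wait */
    o->submitted++;
    return 1;
}
```

This also shows the latency multiplication Chris mentions: every other task sees up to BATCH submissions (or a full hold period) before it can even contend for the queue.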
* Re: [PATCH] io stalls 2003-06-26 11:48 ` Chris Mason @ 2003-06-26 13:04 ` Nick Piggin 2003-06-26 13:18 ` Nick Piggin 2003-06-26 15:55 ` Chris Mason 0 siblings, 2 replies; 114+ messages in thread From: Nick Piggin @ 2003-06-26 13:04 UTC (permalink / raw) To: Chris Mason Cc: Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller Chris Mason wrote: >On Thu, 2003-06-26 at 01:48, Nick Piggin wrote: > > >>I am hoping to go a slightly different way in 2.5 pending >>inclusion of process io contexts. If you had time to look >>over my changes there (in current mm tree) it would be >>appreciated, but they don't help your problem for 2.4. >> >>I found that my queue full fairness for 2.4 didn't address >>the batching issue well. It does, guarantee lowest possible >>maximum latency for singular requests, but due to lowered >>throughput this can cause worse "high level" latency. >> >>I couldn't find a really good, comprehensive method of >>allowing processes to batch without resorting to very >>complex wakeup methods unless process io contexts are used. >>The other possibility would be to keep a list of "batching" >>processes which should achieve the same as io contexts. >> >>An easier approach would be to just allow the last woken >>process to submit a batch of requests. This wouldn't have >>as good guaranteed fairness, but not to say that it would >>have starvation issues. I'll help you implement it if you >>are interested. >> > >One of the things I tried in this area was basically queue ownership. >When each process woke up, he was given strict ownership of the queue >and could submit up to N number of requests. One process waited for >ownership in a yield loop for a max limit of a certain number of >jiffies, all the others waited on the request queue. 
> Not sure exactly what you mean by one process waiting for ownership in a yield loop, but why don't you simply allow the queue "owner" to submit up to a maximum of N requests within a time limit? Once either limit expires (or, rarely, another might become owner) the process would just be put to sleep by the normal queue_full mechanism. > >It generally increased the latency in __get_request wait by a multiple >of N. I didn't keep it because the current patch is already full of >subtle interactions, I didn't want to make things more confusing than >they already were ;-) > Yeah, something like that. I think that in a queue full situation, the processes are wanting to submit more than 1 request though. So the better throughput you can achieve by batching translates to better effective throughput. Read my recent debate with Andrea about this though - I couldn't convince him! I have seen much better maximum latencies, 2-3 times the throughput, and an order of magnitude less context switching on many threaded tiobench write loads when using batching. In short, measuring get_request latency won't give you the full story. > >The real problem with this approach is that we're guessing about the >number of requests a given process wants to submit, and we're assuming >those requests are going to be highly mergable. If the higher levels >pass these hints down to the elevator, we should be able to do a better >job of giving both low latency and high throughput. > No, the numbers (batch # requests, batch time) are not highly scientific. Simply when a process wakes up, we'll let them submit a small burst of requests before they go back to sleep. Now in 2.5 (mm) we can cheat and make this more effective, fair, and without possible missed wakes because io contexts mean that multiple processes can be batching at the same time, and dynamically allocated requests mean it doesn't matter if we go a bit over the queue limit. 
I think a decent solution for 2.4 would be to simply have the one queue owner, but once he has allowed the queue to fall below the batch limit, wake someone else and make them the owner. It can be a bit less fair, and it doesn't work across queues, but they're less important cases. > >Between bios and the pdflush daemons, I think 2.5 is in pretty good >shape to do what we need. I'm not 100% sure we need batching when the >requests being submitted are not highly mergable, but I haven't put lots >of thought into that part yet. > No, there are a couple of problems here. First, good locality != sequential. I saw tiobench 256 random write throughput _doubled_ because each process is writing within its own file. Second, mergeable doesn't mean anything if your request size only grows to say 128KB (IDE). I saw tiobench 256 sequential writes on IDE go from ~ 25% peak throughput to ~70% (4.85->14.11 from 20MB/s disk). Third, context switch rate. In the latest IBM regression tests, tiobench 64 on ext2, 8xSMP (so don't look at throughput!), average cs/s was about 2500 with mainline (FIFO request allocation), and 140 in mm (batching allocation). So nearly 20x better. This might not be due to batching alone, but I didn't see any other obvious change in mm. > >Anyway for 2.4 I'm not sure there's much more we can do. I'd like to >add tunables to the current patch, so userland can control the max io in >flight and a simple toggle between throughput mode and latency mode on a >per device basis. It's not perfect but should tide us over until 2.6. > > The changes do seem to be a critical fix due to the starvation issue, but I'm worried that they take a big step back in performance under high disk load. I found my FIFO mechanism to be unacceptably slow for 2.5. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-26 13:04 ` Nick Piggin @ 2003-06-26 13:18 ` Nick Piggin 2003-06-26 15:55 ` Chris Mason 1 sibling, 0 replies; 114+ messages in thread From: Nick Piggin @ 2003-06-26 13:18 UTC (permalink / raw) To: Nick Piggin Cc: Chris Mason, Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller Nick Piggin wrote: snip > > Yeah, something like that. I think that in a queue full situation, > the processes are wanting to submit more than 1 request though. So > the better thoughput you can achieve by batching translates to > better effective throughput. Read my recent debate with Andrea ^^^^^^^^^^ Err, latency snip > > No, the numbers (batch # requests, batch time) are not highly scientific. > Simply when a process wakes up, we'll let them submit a small burst of > requests before they go back to sleep. By this, I mean that it's not a big problem that we don't know how many requests a process wants to submit. snip > > The changes do seem to be a critical fix due to the starvation issue, > but I'm worried that they take a big step back in performance under > high disk load. I found my FIFO mechanism to be unacceptably slow for > 2.5. BTW, sorry for the lack of better benchmark numbers. I couldn't find good ones lying around. I found uniprocessor tiobench to be quite helpful at queue_nr_requests * 0.5, 2 threads to measure different types of overloadedness. Also, I didn't see much gain in read performance in my testing - probably due to AS. I expect 2.4 and 2.5 non-AS read performance to show bigger improvements from batching (i.e. regressions). ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-26 13:04 ` Nick Piggin 2003-06-26 13:18 ` Nick Piggin @ 2003-06-26 15:55 ` Chris Mason 2003-06-27 1:21 ` Nick Piggin 1 sibling, 1 reply; 114+ messages in thread From: Chris Mason @ 2003-06-26 15:55 UTC (permalink / raw) To: Nick Piggin Cc: Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller [-- Attachment #1: Type: text/plain, Size: 6641 bytes --] On Thu, 2003-06-26 at 09:04, Nick Piggin wrote: > >One of the things I tried in this area was basically queue ownership. > >When each process woke up, he was given strict ownership of the queue > >and could submit up to N number of requests. One process waited for > >ownership in a yield loop for a max limit of a certain number of > >jiffies, all the others waited on the request queue. > > > > Not sure exactly what you mean by one process waiting for ownership > in a yield loop, but why don't you simply allow the queue "owner" to > submit up to a maximum of N requests within a time limit. Once either > limit expires (or, rarely, another might become owner -) the process > would just be put to sleep by the normal queue_full mechanism. > You need some way to wakeup the queue after that time limit has expired, in case the owner never submits another request. This can either be a timer or a process in a yield loop. Given that very short expire time I set (10 jiffies), I went for the yield loop. > > > >It generally increased the latency in __get_request wait by a multiple > >of N. I didn't keep it because the current patch is already full of > >subtle interactions, I didn't want to make things more confusing than > >they already were ;-) > > > > Yeah, something like that. I think that in a queue full situation, > the processes are wanting to submit more than 1 request though. So > the better thoughput you can achieve by batching translates to > better effective throughput. 
Read my recent debate with Andrea > about this though - I couldn't convince him! > Well, it depends ;-) I think we've got 3 basic kinds of procs during a q->full condition: 1) wants to submit lots of somewhat contiguous io 2) wants to submit a single io 3) wants to submit lots of random io From a throughput point of view, we only care about giving batch ownership to #1. Giving batch ownership to #3 will help reduce context switches, but if it helps throughput then the io wasn't really random (you've got a good point about locality below, drive write caches make a huge difference there). The problem I see in 2.4 is the elevator can't tell any of these cases apart, so any attempt at batch ownership is certain to be wrong at least part of the time. > I have seen much better maximum latencies, 2-3 times the > throughput, and an order of magnitude less context switching on > many threaded tiobench write loads when using batching. > > In short, measuring get_request latency won't give you the full > story. > Very true. But get_request latency is the minimum amount of time a single read is going to wait (in 2.4.x anyway), and that is what we need to focus on when we're trying to fix interactive performance. > > > >The real problem with this approach is that we're guessing about the > >number of requests a given process wants to submit, and we're assuming > >those requests are going to be highly mergable. If the higher levels > >pass these hints down to the elevator, we should be able to do a better > >job of giving both low latency and high throughput. > > > > No, the numbers (batch # requests, batch time) are not highly scientific. > Simply when a process wakes up, we'll let them submit a small burst of > requests before they go back to sleep. 
Now in 2.5 (mm) we can cheat and > make this more effective, fair, and without possible missed wakes because > io contexts means that multiple processes can be batching at the same > time, and dynamically allocated requests means it doesn't matter if we > go a bit over the queue limit. > I agree 2.5 has a lot more room for the contexts to be effective, and I think they are a really good idea. > I think a decent solution for 2.4 would be to simply have the one queue > owner, but he allowed the queue to fall below the batch limit, wake > someone else and make them the owner. It can be a bit less fair, and > it doesn't work across queues, but they're less important cases. > > > > >Between bios and the pdflush daemons, I think 2.5 is in pretty good > >shape to do what we need. I'm not 100% sure we need batching when the > >requests being submitted are not highly mergable, but I haven't put lots > >of thought into that part yet. > > > > No, there are a couple of problems here. > First, good locality != sequential. I saw tiobench 256 random write > throughput _doubled_ because each process is writing within its own > file. > > Second, mergeable doesn't mean anything if your request size only > grows to say 128KB (IDE). I saw tiobench 256 sequential writes on IDE > go from ~ 25% peak throughput to ~70% (4.85->14.11 from 20MB/s disk) Well, play around with raw io, my box writes at roughly disk speed with 128k synchronous requests (contiguous writes). > Third, context switch rate. In the latest IBM regression tests, > tiobench 64 on ext2, 8xSMP (so don't look at throughput!), average > cs/s was about 2500 with mainline (FIFO request allocation), and > 140 in mm (batching allocation). So nearly 20x better. This might > not be due to batching alone, but I didn't see any other obvious > change in mm. > Makes sense. > > > >Anyway for 2.4 I'm not sure there's much more we can do. 
I'd like to > >add tunables to the current patch, so userland can control the max io in > >flight and a simple toggle between throughput mode and latency mode on a > >per device basis. It's not perfect but should tide us over until 2.6. > > > > > > The changes do seem to be a critical fix due to the starvation issue, > but I'm worried that they take a big step back in performance under > high disk load. I found my FIFO mechanism to be unacceptably slow for > 2.5. Me too, but I'm not sure how to fix it other than a userspace knob to turn off the q->full checks for server workloads. Andrea's elevator-lowlatency alone has pretty good throughput numbers, since it still allows request stealing. Its get_request_wait latency numbers aren't horrible either, it only suffers in a few corner cases. But, if someone wants to play with this more, I've attached a quick remerge of my batch ownership code. I made a read and write owner, so that a reader doing a single request doesn't grab ownership and make all the writes wait. It does make throughput better overall, and it also makes latencies worse overall. We'll probably get similar results just by disabling q->full in io-stalls-7, but the batch ownership does a better job of limiting get_request latencies at a fixed (although potentially large) number. lat-stat-5.diff goes on top of io-stalls-7.diff from yesterday batch_owner.diff goes on top of lat-stat-5.diff. 
-chris [-- Attachment #2: batch_owner.diff --] [-- Type: text/plain, Size: 4506 bytes --] ===== drivers/block/ll_rw_blk.c 1.47 vs edited ===== --- 1.47/drivers/block/ll_rw_blk.c Thu Jun 26 09:20:08 2003 +++ edited/drivers/block/ll_rw_blk.c Thu Jun 26 10:52:17 2003 @@ -592,6 +592,8 @@ q->full = 0; q->can_throttle = 0; + memset(q->batch, 0, sizeof(struct queue_batch) * 2); + reset_stats(q); /* @@ -606,6 +608,48 @@ blk_queue_bounce_limit(q, BLK_BOUNCE_HIGH); } +#define MSEC(x) ((x) * 1000 / HZ) +#define BATCH_MAX_AGE 100 +int grab_batch_ownership(request_queue_t *q, int rw) +{ + struct task_struct *tsk = current; + unsigned long age; + struct queue_batch *batch = q->batch + rw; + + if (batch->batch_waiter) + return 0; + if (!batch->batch_owner) + goto grab; + batch->batch_waiter = tsk; + while(1) { + age = jiffies - batch->batch_jiffies; + if (!batch->batch_owner || MSEC(age) > BATCH_MAX_AGE) + break; + set_current_state(TASK_RUNNING); + spin_unlock_irq(&io_request_lock); + schedule(); + spin_lock_irq(&io_request_lock); + } + batch->batch_waiter = NULL; +grab: + batch->batch_owner = tsk; + batch->batch_jiffies = jiffies; + batch->batch_remaining = q->batch_requests; + return 1; +} + +void decrement_batch_request(request_queue_t *q, int rw) +{ + struct queue_batch *batch = q->batch + rw; + if (batch->batch_owner == current) { + batch->batch_remaining--; + if (!batch->batch_remaining || + MSEC(jiffies - batch->batch_jiffies) > BATCH_MAX_AGE) { + batch->batch_owner = NULL; + } + } +} + #define blkdev_free_rq(list) list_entry((list)->next, struct request, queue); /* * Get a free request. 
io_request_lock must be held and interrupts @@ -625,6 +669,7 @@ rq->cmd = rw; rq->special = NULL; rq->q = q; + decrement_batch_request(q, rw); } else q->full = 1; return rq; @@ -635,7 +680,7 @@ */ static inline struct request *get_request(request_queue_t *q, int rw) { - if (q->full) + if (q->full && q->batch[rw].batch_owner != current) return NULL; return __get_request(q, rw); } @@ -698,25 +743,28 @@ static struct request *__get_request_wait(request_queue_t *q, int rw) { - register struct request *rq; + register struct request *rq = NULL; unsigned long wait_start = jiffies; unsigned long time_waited; DECLARE_WAITQUEUE(wait, current); add_wait_queue_exclusive(&q->wait_for_requests, &wait); + spin_lock_irq(&io_request_lock); do { set_current_state(TASK_UNINTERRUPTIBLE); - spin_lock_irq(&io_request_lock); if (q->full || blk_oversized_queue(q)) { - __generic_unplug_device(q); + if (blk_oversized_queue(q)) + __generic_unplug_device(q); spin_unlock_irq(&io_request_lock); schedule(); spin_lock_irq(&io_request_lock); + if (!grab_batch_ownership(q, rw)) + continue; } rq = __get_request(q, rw); - spin_unlock_irq(&io_request_lock); } while (rq == NULL); + spin_unlock_irq(&io_request_lock); remove_wait_queue(&q->wait_for_requests, &wait); current->state = TASK_RUNNING; @@ -978,9 +1026,9 @@ list_add(&req->queue, &q->rq.free); if (q->rq.count >= q->batch_requests && !oversized_batch) { smp_mb(); - if (waitqueue_active(&q->wait_for_requests)) + if (waitqueue_active(&q->wait_for_requests)) { wake_up(&q->wait_for_requests); - else + } else clear_full_and_wake(q); } } ===== include/linux/blkdev.h 1.25 vs edited ===== --- 1.25/include/linux/blkdev.h Thu Jun 26 09:20:08 2003 +++ edited/include/linux/blkdev.h Thu Jun 26 10:50:17 2003 @@ -69,6 +69,15 @@ struct list_head free; }; +struct queue_batch +{ + struct task_struct *batch_owner; + struct task_struct *batch_waiter; + unsigned long batch_jiffies; + int batch_remaining; + +}; + struct request_queue { /* @@ -141,7 +150,7 @@ * 
threshold */ int full:1; - + /* * Boolean that indicates you will use blk_started_sectors * and blk_finished_sectors in addition to blk_started_io @@ -162,6 +171,9 @@ * Tasks wait here for free read and write requests */ wait_queue_head_t wait_for_requests; + + struct queue_batch batch[2]; + unsigned long max_wait; unsigned long min_wait; unsigned long total_wait; @@ -278,7 +290,7 @@ #define MAX_SEGMENTS 128 #define MAX_SECTORS 255 -#define MAX_QUEUE_SECTORS (2 << (20 - 9)) /* 4 mbytes when full sized */ +#define MAX_QUEUE_SECTORS (4 << (20 - 9)) /* 4 mbytes when full sized */ #define MAX_NR_REQUESTS 1024 /* 1024k when in 512 units, normally min is 1M in 1k units */ #define PageAlignSize(size) (((size) + PAGE_SIZE -1) & PAGE_MASK) [-- Attachment #3: lat-stat-5.diff --] [-- Type: text/plain, Size: 4473 bytes --] reverted: --- b/drivers/block/blkpg.c Thu Jun 26 09:12:14 2003 +++ a/drivers/block/blkpg.c Thu Jun 26 09:12:14 2003 @@ -261,6 +261,7 @@ return blkpg_ioctl(dev, (struct blkpg_ioctl_arg *) arg); case BLKELVGET: + blk_print_stats(dev); return blkelvget_ioctl(&blk_get_queue(dev)->elevator, (blkelv_ioctl_arg_t *) arg); case BLKELVSET: reverted: --- b/drivers/block/ll_rw_blk.c Thu Jun 26 09:12:14 2003 +++ a/drivers/block/ll_rw_blk.c Thu Jun 26 09:12:14 2003 @@ -490,6 +490,56 @@ spin_lock_init(&q->queue_lock); } +void blk_print_stats(kdev_t dev) +{ + request_queue_t *q; + unsigned long avg_wait; + unsigned long min_wait; + unsigned long high_wait; + unsigned long *d; + + q = blk_get_queue(dev); + if (!q) + return; + + min_wait = q->min_wait; + if (min_wait == ~0UL) + min_wait = 0; + if (q->num_wait) + avg_wait = q->total_wait / q->num_wait; + else + avg_wait = 0; + printk("device %s: num_req %lu, total jiffies waited %lu\n", + kdevname(dev), q->num_req, q->total_wait); + printk("\t%lu forced to wait\n", q->num_wait); + printk("\t%lu min wait, %lu max wait\n", min_wait, q->max_wait); + printk("\t%lu average wait\n", avg_wait); + d = q->deviation; + printk("\t%lu < 
100, %lu < 200, %lu < 300, %lu < 400, %lu < 500\n", + d[0], d[1], d[2], d[3], d[4]); + high_wait = d[0] + d[1] + d[2] + d[3] + d[4]; + high_wait = q->num_wait - high_wait; + printk("\t%lu waits longer than 500 jiffies\n", high_wait); +} + +static void reset_stats(request_queue_t *q) +{ + q->max_wait = 0; + q->min_wait = ~0UL; + q->total_wait = 0; + q->num_req = 0; + q->num_wait = 0; + memset(q->deviation, 0, sizeof(q->deviation)); +} +void blk_reset_stats(kdev_t dev) +{ + request_queue_t *q; + q = blk_get_queue(dev); + if (!q) + return; + printk("reset latency stats on device %s\n", kdevname(dev)); + reset_stats(q); +} static int __make_request(request_queue_t * q, int rw, struct buffer_head * bh); /** @@ -542,6 +592,8 @@ q->full = 0; q->can_throttle = 0; + reset_stats(q); + /* * These booleans describe the queue properties. We set the * default (and most common) values here. Other drivers can @@ -647,6 +699,8 @@ static struct request *__get_request_wait(request_queue_t *q, int rw) { register struct request *rq; + unsigned long wait_start = jiffies; + unsigned long time_waited; DECLARE_WAITQUEUE(wait, current); add_wait_queue_exclusive(&q->wait_for_requests, &wait); @@ -669,6 +723,17 @@ if (!waitqueue_active(&q->wait_for_requests)) clear_full_and_wake(q); + time_waited = jiffies - wait_start; + if (time_waited > q->max_wait) + q->max_wait = time_waited; + if (time_waited && time_waited < q->min_wait) + q->min_wait = time_waited; + q->total_wait += time_waited; + q->num_wait++; + if (time_waited < 500) { + q->deviation[time_waited/100]++; + } + return rq; } @@ -1157,6 +1222,7 @@ req->rq_dev = bh->b_rdev; req->start_time = jiffies; req_new_io(req, 0, count); + q->num_req++; blk_started_io(count); blk_started_sectors(req, count); add_request(q, req, insert_here); reverted: --- b/fs/super.c Thu Jun 26 09:12:14 2003 +++ a/fs/super.c Thu Jun 26 09:12:14 2003 @@ -726,6 +726,7 @@ if (!fs_type->read_super(s, data, flags & MS_VERBOSE ? 
1 : 0)) goto Einval; s->s_flags |= MS_ACTIVE; + blk_reset_stats(dev); path_release(&nd); return s; reverted: --- b/include/linux/blkdev.h Thu Jun 26 09:12:14 2003 +++ a/include/linux/blkdev.h Thu Jun 26 09:12:14 2003 @@ -162,8 +162,17 @@ * Tasks wait here for free read and write requests */ wait_queue_head_t wait_for_requests; + unsigned long max_wait; + unsigned long min_wait; + unsigned long total_wait; + unsigned long num_req; + unsigned long num_wait; + unsigned long deviation[5]; }; +void blk_reset_stats(kdev_t dev); +void blk_print_stats(kdev_t dev); + #define blk_queue_plugged(q) (q)->plugged #define blk_fs_request(rq) ((rq)->cmd == READ || (rq)->cmd == WRITE) #define blk_queue_empty(q) list_empty(&(q)->queue_head) @@ -269,7 +278,7 @@ #define MAX_SEGMENTS 128 #define MAX_SECTORS 255 +#define MAX_QUEUE_SECTORS (2 << (20 - 9)) /* 4 mbytes when full sized */ -#define MAX_QUEUE_SECTORS (2 << (20 - 9)) /* 2 mbytes when full sized */ #define MAX_NR_REQUESTS 1024 /* 1024k when in 512 units, normally min is 1M in 1k units */ #define PageAlignSize(size) (((size) + PAGE_SIZE -1) & PAGE_MASK) ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-26 15:55 ` Chris Mason @ 2003-06-27 1:21 ` Nick Piggin 2003-06-27 1:39 ` Chris Mason 0 siblings, 1 reply; 114+ messages in thread From: Nick Piggin @ 2003-06-27 1:21 UTC (permalink / raw) To: Chris Mason Cc: Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller Chris Mason wrote: >On Thu, 2003-06-26 at 09:04, Nick Piggin wrote: > > >>>One of the things I tried in this area was basically queue ownership. >>>When each process woke up, he was given strict ownership of the queue >>>and could submit up to N number of requests. One process waited for >>>ownership in a yield loop for a max limit of a certain number of >>>jiffies, all the others waited on the request queue. >>> >>> >>Not sure exactly what you mean by one process waiting for ownership >>in a yield loop, but why don't you simply allow the queue "owner" to >>submit up to a maximum of N requests within a time limit. Once either >>limit expires (or, rarely, another might become owner -) the process >>would just be put to sleep by the normal queue_full mechanism. >> >> > >You need some way to wakeup the queue after that time limit has expired, >in case the owner never submits another request. This can either be a >timer or a process in a yield loop. Given that very short expire time I >set (10 jiffies), I went for the yield loop. > > >>>It generally increased the latency in __get_request wait by a multiple >>>of N. I didn't keep it because the current patch is already full of >>>subtle interactions, I didn't want to make things more confusing than >>>they already were ;-) >>> >>> >>Yeah, something like that. I think that in a queue full situation, >>the processes are wanting to submit more than 1 request though. So >>the better thoughput you can achieve by batching translates to >>better effective throughput. Read my recent debate with Andrea >>about this though - I couldn't convince him! 
>> >> > >Well, it depends ;-) I think we've got 3 basic kinds of procs during a >q->full condition: > >1) wants to submit lots of somewhat contiguous io >2) wants to submit a single io >3) wants to submit lots of random io > >>From a throughput point of view, we only care about giving batch >ownership to #1. giving batch ownership to #3 will help reduce context >switches, but if it helps throughput than the io wasn't really random >(you've got a good point about locality below, drive write caches make a >huge difference there). > >The problem I see in 2.4 is the elevator can't tell any of these cases >apart, so any attempt at batch ownership is certain to be wrong at least >part of the time. > I think though, for fairness, if we allow one to submit a batch of requests, we have to give that opportunity to the others. And yeah, it does reduce context switches, and it does improve throughput for "random" localised IO. > > >>I have seen much better maximum latencies, 2-3 times the >>throughput, and an order of magnitude less context switching on >>many threaded tiobench write loads when using batching. >> >>In short, measuring get_request latency won't give you the full >>story. >> >> > >Very true. But get_request latency is the minimum amount of time a >single read is going to wait (in 2.4.x anyway), and that is what we need >to focus on when we're trying to fix interactive performance. > The read situation is different to write. To fill the read queue, you need queue_nr_requests / 2-3 (for readahead) reading processes to fill the queue, more if the reads are random. If this kernel is being used interactively, it's not our fault we might not give quite as good interactive performance. I'm sure the fileserver admin would rather take the tripled bandwidth ;) That said, I think a lot of interactive programs will want to do more than 1 request at a time anyway. 
> >>>The real problem with this approach is that we're guessing about the >>>number of requests a given process wants to submit, and we're assuming >>>those requests are going to be highly mergable. If the higher levels >>>pass these hints down to the elevator, we should be able to do a better >>>job of giving both low latency and high throughput. >>> >>> >>No, the numbers (batch # requests, batch time) are not highly scientific. >>Simply when a process wakes up, we'll let them submit a small burst of >>requests before they go back to sleep. Now in 2.5 (mm) we can cheat and >>make this more effective, fair, and without possible missed wakes because >>io contexts means that multiple processes can be batching at the same >>time, and dynamically allocated requests means it doesn't matter if we >>go a bit over the queue limit. >> >> > >I agree 2.5 has a lot more room for the contexts to be effective, and I >think they are a really good idea. > > >>I think a decent solution for 2.4 would be to simply have the one queue >>owner, but he allowed the queue to fall below the batch limit, wake >>someone else and make them the owner. It can be a bit less fair, and >>it doesn't work across queues, but they're less important cases. >> >> >>>Between bios and the pdflush daemons, I think 2.5 is in pretty good >>>shape to do what we need. I'm not 100% sure we need batching when the >>>requests being submitted are not highly mergable, but I haven't put lots >>>of thought into that part yet. >>> >>> >>No, there are a couple of problems here. >>First, good locality != sequential. I saw tiobench 256 random write >>throughput _doubled_ because each process is writing within its own >>file. >> >>Second, mergeable doesn't mean anything if your request size only >>grows to say 128KB (IDE). 
I saw tiobench 256 sequential writes on IDE >>go from ~ 25% peak throughput to ~70% (4.85->14.11 from 20MB/s disk) >> > >Well, play around with raw io, my box writes at roughly disk speed with >128k synchronous requests (contiguous writes). > Yeah, I'm not talking about request overhead - I think a 128K sized request is just fine. But when there are 256 threads writing, with FIFO method, 128 threads will each have 1 request in the queue. If they are sequential writers, each request will probably be 128K. That isn't enough to get good disk bandwidth. The elevator _has_ to make a suboptimal decision. With batching, say 8 processes have 16 sequential requests on the queue each. The elevator can make good choices. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-27 1:21 ` Nick Piggin @ 2003-06-27 1:39 ` Chris Mason 2003-06-27 9:45 ` Nick Piggin 0 siblings, 1 reply; 114+ messages in thread From: Chris Mason @ 2003-06-27 1:39 UTC (permalink / raw) To: Nick Piggin Cc: Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Thu, 2003-06-26 at 21:21, Nick Piggin wrote: > >Very true. But get_request latency is the minimum amount of time a > >single read is going to wait (in 2.4.x anyway), and that is what we need > >to focus on when we're trying to fix interactive performance. > > > > The read situation is different to write. To fill the read queue, > you need queue_nr_requests / 2-3 (for readahead) reading processes > to fill the queue, more if the reads are random. > If this kernel is being used interactively, its not our fault we > might not give quite as good interactive performance. I'm sure > the fileserver admin would rather take the tripled bandwidth ;) > > That said, I think a lot of interactive programs will want to do > more than 1 request at a time anyway. > My intuition agrees with yours, but if this is true then Andrea's old elevator-lowlatency patch alone is enough, and we don't need q->full at all. Users continued to complain of bad latencies even with his code applied. From a practical point of view his old code is the same as the batch wakeup code for get_request latencies and provides good throughput. There are a few cases where batch wakeup has shorter overall latencies, but I don't think people were in those heavy workloads while they were complaining of stalls in -aa. > >>Second, mergeable doesn't mean anything if your request size only > >>grows to say 128KB (IDE). I saw tiobench 256 sequential writes on IDE > >>go from ~ 25% peak throughput to ~70% (4.85->14.11 from 20MB/s disk) > >> > > > >Well, play around with raw io, my box writes at roughly disk speed with > >128k synchronous requests (contiguous writes). 
> > > > Yeah, I'm not talking about request overhead - I think a 128K sized > request is just fine. But when there are 256 threads writing, with > FIFO method, 128 threads will each have 1 request in the queue. If > they are sequential writers, each request will probably be 128K. > That isn't enough to get good disk bandwidth. The elevator _has_ to > make a suboptimal decision. > > With batching, say 8 processes have 16 sequential requests on the > queue each. The elevator can make good choices. I agree here too, it just doesn't match the user reports we've been getting in 2.4 ;-) If 2.5 can dynamically allocate requests now and then you can get much better results with io contexts/dynamic wakeups, but I can't see how to make it work in 2.4 without larger backports. So, the way I see things, we've got a few choices. 1) do nothing. 2.6 isn't that far off. 2) add elevator-lowlatency without q->full. It solves 90% of the problem 3) add q->full as well and make it the default. Great latencies, not so good throughput. Add userland tunables so people can switch. 4) back port some larger chunk of 2.5 and find a better overall solution. I vote for #3, don't care much if q->full is on or off by default, as long as we make an easy way for people to set it. -chris ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-27 1:39 ` Chris Mason @ 2003-06-27 9:45 ` Nick Piggin 2003-06-27 12:41 ` Chris Mason 0 siblings, 1 reply; 114+ messages in thread From: Nick Piggin @ 2003-06-27 9:45 UTC (permalink / raw) To: Chris Mason Cc: Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller Chris Mason wrote: >On Thu, 2003-06-26 at 21:21, Nick Piggin wrote: > > >>>Very true. But get_request latency is the minimum amount of time a >>>single read is going to wait (in 2.4.x anyway), and that is what we need >>>to focus on when we're trying to fix interactive performance. >>> >>> >>The read situation is different to write. To fill the read queue, >>you need queue_nr_requests / 2-3 (for readahead) reading processes >>to fill the queue, more if the reads are random. >>If this kernel is being used interactively, its not our fault we >>might not give quite as good interactive performance. I'm sure >>the fileserver admin would rather take the tripled bandwidth ;) >> >>That said, I think a lot of interactive programs will want to do >>more than 1 request at a time anyway. >> >> > >My intuition agrees with yours, but if this is true then andrea's old >elevator-lowlatency patch alone is enough, and we don't need q->full at >all. Users continued to complain of bad latencies even with his code >applied. > Didn't that still have the starvation issues in get_request that my patch addressed though? This batching is needed due to the strict FIFO behaviour that my "q->full" thing did. > >>From a practical point of view his old code is the same as the batch >wakeup code for get_request latencies and provides good throughput. >There are a few cases where batch wakeup has shorter overall latencies, >but I don't think people were in those heavy workloads while they were >complaining of stalls in -aa. > > >>>>Second, mergeable doesn't mean anything if your request size only >>>>grows to say 128KB (IDE). 
I saw tiobench 256 sequential writes on IDE >>>>go from ~ 25% peak throughput to ~70% (4.85->14.11 from 20MB/s disk) >>>> >>>> >>>Well, play around with raw io, my box writes at roughly disk speed with >>>128k synchronous requests (contiguous writes). >>> >>> >>Yeah, I'm not talking about request overhead - I think a 128K sized >>request is just fine. But when there are 256 threads writing, with >>FIFO method, 128 threads will each have 1 request in the queue. If >>they are sequential writers, each request will probably be 128K. >>That isn't enough to get good disk bandwidth. The elevator _has_ to >>make a suboptimal decision. >> >>With batching, say 8 processes have 16 sequential requests on the >>queue each. The elevator can make good choices. >> > >I agree here too, it just doesn't match the user reports we've been >getting in 2.4 ;-) If 2.5 can dynamically allocate requests now and >then you can get much better results with io contexts/dynamic wakeups, >but I can't see how to make it work in 2.4 without larger backports. > >So, the way I see things, we've got a few choices. > >1) do nothing. 2.6 isn't that far off. > >2) add elevator-lowlatency without q->full. It solves 90% of the >problem > >3) add q->full as well and make it the default. Great latencies, not so >good throughput. Add userland tunables so people can switch. > >4) back port some larger chunk of 2.5 and find a better overall >solution. > >I vote for #3, don't care much if q->full is on or off by default, as >long as we make an easy way for people to set it. > 5) include the "q->full" starvation fix; add the concept of a queue owner, the batching process. I'm a bit busy at the moment and so I won't test this, unfortunately. I would prefer that if something like #5 doesn't get in, then nothing be done for .22 unless its backed up by a few decent benchmarks. But its not my call anyway. Cheers, Nick ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-27 9:45 ` Nick Piggin @ 2003-06-27 12:41 ` Chris Mason 0 siblings, 0 replies; 114+ messages in thread From: Chris Mason @ 2003-06-27 12:41 UTC (permalink / raw) To: Nick Piggin Cc: Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Fri, 2003-06-27 at 05:45, Nick Piggin wrote: > Chris Mason wrote: > >>> > >>The read situation is different to write. To fill the read queue, > >>you need queue_nr_requests / 2-3 (for readahead) reading processes > >>to fill the queue, more if the reads are random. > >>If this kernel is being used interactively, its not our fault we > >>might not give quite as good interactive performance. I'm sure > >>the fileserver admin would rather take the tripled bandwidth ;) > >> > >>That said, I think a lot of interactive programs will want to do > >>more than 1 request at a time anyway. > >> > >> > > > >My intuition agrees with yours, but if this is true then andrea's old > >elevator-lowlatency patch alone is enough, and we don't need q->full at > >all. Users continued to complain of bad latencies even with his code > >applied. > > > > Didn't that still have the starvation issues in get_request that > my patch addressed though? This batching is needed due to the > strict FIFO behaviour that my "q->full" thing did. > Sure, but even though the batch wakeup code didn't have starvation issues, the overall get_request latency was still high. The end result was basically the same, without q->full we've got a higher max wait and a lower average wait. With batch wakeup we've got a higher average (300-400 jiffies) and a lower max (800-900 jiffies). Especially for things like directory listings, where 2.4 generally does io a few blocks at a time, the get_request latency is a big part of the latency an interactive user sees. > >So, the way I see things, we've got a few choices. > > > >1) do nothing. 2.6 isn't that far off. 
> > > >2) add elevator-lowlatency without q->full. It solves 90% of the > >problem > > > >3) add q->full as well and make it the default. Great latencies, not so > >good throughput. Add userland tunables so people can switch. > > > >4) back port some larger chunk of 2.5 and find a better overall > >solution. > > > >I vote for #3, don't care much if q->full is on or off by default, as > >long as we make an easy way for people to set it. > > > > 5) include the "q->full" starvation fix; add the concept of a > queue owner, the batching process. > I've tried two different approaches to #5. The first is just a batch_owner where other procs were still allowed to grab requests and the owner was allowed to ignore q->full. The end result was low latencies but not much better throughput. With a small number of procs, you've got a good chance bdflush is going to get ownership and the throughput is pretty good. With more procs the probability of that goes down and the throughput benefit goes away. My second attempt was the batch wakeup patch from yesterday. Overall I don't feel the latencies are significantly better with that patch than with Andrea's elevator-lowlatency and q->full disabled. > I'm a bit busy at the moment and so I won't test this, unfortunately. > I would prefer that if something like #5 doesn't get in, then nothing > be done for .22 unless its backed up by a few decent benchmarks. But > its not my call anyway. > Andrea's code without q->full is a good starting point regardless. The throughput is good and the latencies are better overall. q->full is simple enough that making it available via a tunable is pretty easy. I really do wish I could make one patch that works well for both, but I've honestly run out of ideas ;-) -chris ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] io stalls 2003-06-12 2:41 ` Nick Piggin 2003-06-12 2:46 ` Andrea Arcangeli @ 2003-06-12 11:57 ` Chris Mason 1 sibling, 0 replies; 114+ messages in thread From: Chris Mason @ 2003-06-12 11:57 UTC (permalink / raw) To: Nick Piggin Cc: Andrea Arcangeli, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti, Georg Nikodym, lkml, Matthias Mueller On Wed, 2003-06-11 at 22:41, Nick Piggin wrote: > >I think the only time we really need to wakeup more than one waiter is > >when we hit the q->batch_request mark. After that, each new request > >that is freed can be matched with a single waiter, and we know that any > >previously finished requests have probably already been matched to their > >own waiter. > > > > > Nope. Not even then. Each retiring request should submit > a wake up, and the process will submit another request. > So the number of requests will be held at the batch_request > mark until no more waiters. > > Now that begs the question, why have batch_requests anymore? > It no longer does anything. > We've got many flavors of the patch discussed in this thread, so this needs a little qualification. When get_request_wait_wakeup wakes one of the waiters (as in the patch I sent yesterday), you want to make sure that after you wake the first waiter there is a request available for the process he is going to wake up, and so on for each other waiter. I did a quick test of this yesterday, and under the 20 proc iozone test, turning off batch_requests more than doubled the number of context switches hit during the run; I'm assuming this was from wakeups that failed to find requests. I'm doing a few tests with Andrea's new get_request_wait_wakeup ideas and wake_up_nr. -chris ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: -rc7 Re: Linux 2.4.21-rc6 2003-06-04 10:35 ` Marc-Christian Petersen 2003-06-04 10:42 ` Jens Axboe @ 2003-06-04 10:43 ` Andrea Arcangeli 2003-06-04 11:01 ` Marc-Christian Petersen 1 sibling, 1 reply; 114+ messages in thread From: Andrea Arcangeli @ 2003-06-04 10:43 UTC (permalink / raw) To: Marc-Christian Petersen; +Cc: Marcelo Tosatti, Georg Nikodym, lkml On Wed, Jun 04, 2003 at 12:35:07PM +0200, Marc-Christian Petersen wrote: > On Wednesday 04 June 2003 12:22, Andrea Arcangeli wrote: > > Hi Andrea, > > > are you really sure that it is the right fix? > > I mean, the batching has a basic problem (I was discussing it with Jens > > two days ago and he said it's already addressed in 2.5, I wonder if that > > could also have an influence on the fact 2.5 is so much better in > > fairness) > > the issue with batching in 2.4 is that it is blocking at 0 and waking > > at batch_requests. But it's not blocking new get_request to eat requests > > on the way back from 0 to batch_requests. I mean, there are two > > directions: when we move from batch_requests to 0, get_requests should > > return requests. On the way back from 0 to batch_requests the > > get_request should block (and it doesn't in 2.4, that is the problem) > do you see a chance to fix this up in 2.4? sure, it's just a matter of adding a bit to the blkdev structure. However I'm not 100% sure that it is the real thing that could make the difference, but overall the exclusive wakeup FIFO in theory should provide an even higher degree of fairness, so at the very least the "fix" 2 from Andrew makes very little sense to me, and it seems just a hack meant to hide a real problem in the algorithm. I mean, going wakeall (LIFO btw) rather than wake-one FIFO if anything should make things worse, unless it is hiding some other issue. As for 1 and 3 they were just included in my tree for ages. 
BTW, Chris recently spotted a nearly impossible to trigger SMP-only race in the fix pausing patch [great spotting Chris] (to trigger it would need an intersection of two races at the same time), it'll be fixed in my next tree, however nobody ever reproduced it and you certainly can ignore it in practice so it can't explain any issue. Andrea ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: -rc7 Re: Linux 2.4.21-rc6 2003-06-04 10:43 ` -rc7 Re: Linux 2.4.21-rc6 Andrea Arcangeli @ 2003-06-04 11:01 ` Marc-Christian Petersen 0 siblings, 0 replies; 114+ messages in thread From: Marc-Christian Petersen @ 2003-06-04 11:01 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Marcelo Tosatti, Georg Nikodym, lkml On Wednesday 04 June 2003 12:43, Andrea Arcangeli wrote: Hi Andrea, > sure, it's just a matter of adding a bit to the blkdev structure. > However I'm not 100% sure that it is the real thing that could make the > difference, but overall the exclusive wakeup FIFO in theory should > provide an even higher degree of fairness, so at the very least the > "fix" 2 from Andrew makes very little sense to me, and it seems just a > hack meant to hide a real problem in the algorithm. well, at least it reduces pauses/stops ;) > As for 1 and 3 they were just included in my tree for ages. err, 1 yes, but I don't see that 3 is in your tree. Well, ok, a bit different. But hey, your 1+3 are still having pauses ;) > BTW, Chris recently spotted a nearly impossible to trigger SMP-only race > in the fix pausing patch [great spotting Chris] (to trigger it would Cool Chris! > need an intersection of two races at the same time), it'll be fixed in > my next tree, however nobody ever reproduced it and you certainly can > ignore it in practice so it can't explain any issue. Good to know. Thanks. ciao, Marc ^ permalink raw reply [flat|nested] 114+ messages in thread
* Config issue (CONFIG_X86_TSC) Re: Linux 2.4.21-rc6 2003-05-29 0:55 Linux 2.4.21-rc6 Marcelo Tosatti ` (2 preceding siblings ...) 2003-05-29 18:00 ` Georg Nikodym @ 2003-06-03 19:45 ` Paul 2003-06-03 20:18 ` Jan-Benedict Glaw 3 siblings, 1 reply; 114+ messages in thread From: Paul @ 2003-06-03 19:45 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: lkml Marcelo Tosatti <marcelo@conectiva.com.br>, on Wed May 28, 2003 [09:55:39 PM] said: > > Hi, > > Here goes -rc6. I've decided to delay 2.4.21 a bit and try Andrew's fix > for the IO stalls/deadlocks. > > Please test it. > Hi; It seems if I run 'make menuconfig', and the only change I make is to change the processor type from its default to 486, "CONFIG_X86_TSC=y" remains in the .config, which results in a kernel that won't boot on a 486. Running 'make oldconfig' seems to fix it up, though... Paul set@pobox.com ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: Config issue (CONFIG_X86_TSC) Re: Linux 2.4.21-rc6 2003-06-03 19:45 ` Config issue (CONFIG_X86_TSC) " Paul @ 2003-06-03 20:18 ` Jan-Benedict Glaw 0 siblings, 0 replies; 114+ messages in thread From: Jan-Benedict Glaw @ 2003-06-03 20:18 UTC (permalink / raw) To: lkml; +Cc: Paul, Marcelo Tosatti On Tue, 2003-06-03 15:45:37 -0400, Paul <set@pobox.com> wrote in message <20030603194537.GO22874@squish.home.loc>: > Marcelo Tosatti <marcelo@conectiva.com.br>, on Wed May 28, 2003 [09:55:39 PM] said: > > Here goes -rc6. I've decided to delay 2.4.21 a bit and try Andrew's fix > > for the IO stalls/deadlocks. > > > > Please test it. > > > Hi; > > It seems if I run 'make menuconfig', and the only change > I make is to change the processor type from its default to > 486, "CONFIG_X86_TSC=y" remains in the .config, which results > in a kernel that won't boot on a 486. > Running 'make oldconfig' seems to fix it up, though... Yeah, that's a bug hitting i80386 also; I had sent a patch for that to LKML some time ago. There's simply a CONFIG_X86_TSC=n missing in the case of i386 and i486. MfG, JBG -- Jan-Benedict Glaw jbglaw@lug-owl.de . +49-172-7608481 "Eine Freie Meinung in einem Freien Kopf | Gegen Zensur | Gegen Krieg fuer einen Freien Staat voll Freier Bürger" | im Internet! | im Irak! ret = do_actions((curr | FREE_SPEECH) & ~(IRAQ_WAR_2 | DRM | TCPA)); ^ permalink raw reply [flat|nested] 114+ messages in thread
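The shape such a fix would take in 2.4's arch/i386/config.in is roughly the following. This is a sketch only: the guard list is a guess for illustration (the real file's per-CPU case analysis is longer), not the actual patch that was posted:

```sh
# Sketch: explicitly clear CONFIG_X86_TSC for TSC-less CPU choices so a
# stale =y from a previous .config cannot survive 'make menuconfig'.
if [ "$CONFIG_M386" = "y" -o "$CONFIG_M486" = "y" ]; then
   define_bool CONFIG_X86_TSC n
fi
```

'make oldconfig' works around the problem because it re-derives the defined symbols from the chosen processor type rather than keeping the old value.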
* Re: Linux 2.4.21-rc6 @ 2003-05-30 19:08 Daniel Goller 2003-05-30 20:52 ` Mike Fedyk 0 siblings, 1 reply; 114+ messages in thread From: Daniel Goller @ 2003-05-30 19:08 UTC (permalink / raw) To: linux-kernel i tried 2.4.21-rc6 as i was told it might fix the mouse stalling on heavy disk IO problem and i would like to report that it DOES fix them for the most part, even certain compiles/benchmarks/stress tests that could stall my pc for seconds now affect the mouse for mere fractions of one second, situations that used to cause short stalls are now a thing of the past 2.4.21-rc6 is the best kernel i have tried to date and i have tried many on my quest to get a smooth mouse i dont subscribe to lkml, so if you have questions please CC me personally hope this input is helpful Daniel morfic ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: Linux 2.4.21-rc6 2003-05-30 19:08 Daniel Goller @ 2003-05-30 20:52 ` Mike Fedyk 2003-05-31 7:06 ` Daniel Goller 0 siblings, 1 reply; 114+ messages in thread From: Mike Fedyk @ 2003-05-30 20:52 UTC (permalink / raw) To: Daniel Goller; +Cc: linux-kernel On Fri, May 30, 2003 at 02:08:51PM -0500, Daniel Goller wrote: > i tried 2.4.21-rc6 as i was told it might fix the mouse stalling on > heavy disk IO problem and i would like to report that it DOES fix them > for the most part, even certain compiles/benchmarks/stress tests that > could stall my pc for seconds now affect the mouse for mere fractions of > one second, situations that used to cause short stalls are now a thing > of the past > > 2.4.21-rc6 is the best kernel i have tried to date and i have tried many > on my quest to get a smooth mouse There are reports that 2.4.18 also "fixed" the problems with the mouse. Can you verify? ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: Linux 2.4.21-rc6 2003-05-30 20:52 ` Mike Fedyk @ 2003-05-31 7:06 ` Daniel Goller 2003-05-31 11:12 ` Michael Frank 0 siblings, 1 reply; 114+ messages in thread From: Daniel Goller @ 2003-05-31 7:06 UTC (permalink / raw) To: Mike Fedyk; +Cc: linux-kernel On Fri, 2003-05-30 at 15:52, Mike Fedyk wrote: > On Fri, May 30, 2003 at 02:08:51PM -0500, Daniel Goller wrote: > > i tried 2.4.21-rc6 as i was told it might fix the mouse stalling on > > heavy disk IO problem and i would like to report that it DOES fix them > > for the most part, even certain compiles/benchmarks/stress tests that > > could stall my pc for seconds now affect the mouse for mere fractions of > > one second, situations that used to cause short stalls are now a thing > > of the past > > > > 2.4.21-rc6 is the best kernel i have tried to date and i have tried many > > on my quest to get a smooth mouse > > There are reports that 2.4.18 also "fixed" the problems with the mouse. Can > you verify? sorry i never ran a 2.4.18 kernel, can't comment on that ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: Linux 2.4.21-rc6 2003-05-31 7:06 ` Daniel Goller @ 2003-05-31 11:12 ` Michael Frank 2003-06-01 0:39 ` Daniel Goller 0 siblings, 1 reply; 114+ messages in thread From: Michael Frank @ 2003-05-31 11:12 UTC (permalink / raw) To: Daniel Goller, Mike Fedyk; +Cc: linux-kernel On Saturday 31 May 2003 15:06, Daniel Goller wrote: > On Fri, 2003-05-30 at 15:52, Mike Fedyk wrote: > > On Fri, May 30, 2003 at 02:08:51PM -0500, Daniel Goller wrote: > > > i tried 2.4.21-rc6 as i was told it might fix the > > > mouse stalling on heavy disk IO problem and i would > > > like to report that it DOES fix them for the most > > > part, even certain compiles/benchmarks/stress tests > > > that could stall my pc for seconds now affect the > > > mouse for mere fractions of one second, situations > > > that used to cause short stalls are now a thing of > > > the past > > > > > > 2.4.21-rc6 is the best kernel i have tried to date > > > and i have tried many on my quest to get a smooth > > > mouse > > > > There are reports that 2.4.18 also "fixed" the problems > > with the mouse. Can you verify? > Yes, it performs similarly to -rc6 but not nearly as well as 2.5.70. On 2.5.70 the mouse is really smooth all the time; scrolling of large pages in opera is fairly smooth most of the time, even with large disk io loads such as the script I posted earlier. Regards Michael ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: Linux 2.4.21-rc6 2003-05-31 11:12 ` Michael Frank @ 2003-06-01 0:39 ` Daniel Goller 0 siblings, 0 replies; 114+ messages in thread From: Daniel Goller @ 2003-06-01 0:39 UTC (permalink / raw) To: Michael Frank; +Cc: Mike Fedyk, linux-kernel On Sat, 2003-05-31 at 06:12, Michael Frank wrote: > On Saturday 31 May 2003 15:06, Daniel Goller wrote: > > On Fri, 2003-05-30 at 15:52, Mike Fedyk wrote: > > > On Fri, May 30, 2003 at 02:08:51PM -0500, Daniel Goller > wrote: > > > > i tried 2.4.21-rc6 as i was told it might fix the > > > > mouse stalling on heavy disk IO problem and i would > > > > like to report that it DOES fix them for the most > > > > part, even certain compiles/benchmarks/stress tests > > > > that could stall my pc for seconds now affect the > > > > mouse for mere fractions of one second, situations > > > > that used to cause short stalls are now a thing of > > > > the past > > > > > > > > 2.4.21-rc6 is the best kernel i have tried to date > > > > and i have tried many on my quest to get a smooth > > > > mouse > > > > > > There are reports that 2.4.18 also "fixed" the problems > > > with the mouse. Can you verify? > > > > Yes, it performs similarly to -rc6 but not nearly as well as > 2.5.70. > > On 2.5.70 the mouse is really smooth all the time; scrolling > of large pages in opera is fairly smooth most of the time, even > with large disk io loads such as the script I posted > earlier. > > Regards > Michael > unfortunately the radeon dri is broken in 2.5.70 so i haven't tried that much; need to see if someone has already suggested a fix for this unused int (it seems unused to me, after a *quick* look through the file). i guess i will have to subscribe to lkml now ^ permalink raw reply [flat|nested] 114+ messages in thread
end of thread, other threads:[~2003-06-27 12:28 UTC | newest] Thread overview: 114+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2003-05-29 0:55 Linux 2.4.21-rc6 Marcelo Tosatti 2003-05-29 1:22 ` Con Kolivas 2003-05-29 5:24 ` Marc Wilson 2003-05-29 5:34 ` Riley Williams 2003-05-29 5:57 ` Marc Wilson 2003-05-29 7:15 ` Riley Williams 2003-05-29 8:38 ` Willy Tarreau 2003-05-29 8:40 ` Willy Tarreau 2003-06-03 16:02 ` Marcelo Tosatti 2003-06-03 16:13 ` Marc-Christian Petersen 2003-06-04 21:54 ` Pavel Machek 2003-06-05 2:10 ` Michael Frank 2003-06-03 16:30 ` Michael Frank 2003-06-03 16:53 ` Matthias Mueller 2003-06-03 16:59 ` Marc-Christian Petersen 2003-06-03 17:03 ` Marc-Christian Petersen 2003-06-03 18:02 ` Anders Karlsson 2003-06-03 21:12 ` J.A. Magallon 2003-06-03 21:18 ` Marc-Christian Petersen 2003-06-03 17:23 ` Michael Frank 2003-06-04 14:56 ` Jakob Oestergaard 2003-06-04 4:04 ` Marc Wilson 2003-05-29 10:02 ` Con Kolivas 2003-05-29 18:00 ` Georg Nikodym 2003-05-29 19:11 ` -rc7 " Marcelo Tosatti 2003-05-29 19:56 ` Krzysiek Taraszka 2003-05-29 20:18 ` Krzysiek Taraszka 2003-06-04 18:17 ` Marcelo Tosatti 2003-06-04 21:41 ` Krzysiek Taraszka 2003-06-04 22:37 ` Alan Cox 2003-06-04 10:22 ` Andrea Arcangeli 2003-06-04 10:35 ` Marc-Christian Petersen 2003-06-04 10:42 ` Jens Axboe 2003-06-04 10:46 ` Marc-Christian Petersen 2003-06-04 10:48 ` Andrea Arcangeli 2003-06-04 11:57 ` Nick Piggin 2003-06-04 12:00 ` Jens Axboe 2003-06-04 12:09 ` Andrea Arcangeli 2003-06-04 12:20 ` Jens Axboe 2003-06-04 20:50 ` Rob Landley 2003-06-04 12:11 ` Nick Piggin 2003-06-04 12:35 ` Miquel van Smoorenburg 2003-06-09 21:39 ` [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) Chris Mason 2003-06-09 22:19 ` Andrea Arcangeli 2003-06-10 0:27 ` Chris Mason 2003-06-10 23:13 ` Chris Mason 2003-06-11 0:16 ` Andrea Arcangeli 2003-06-11 0:44 ` Chris Mason 2003-06-09 23:51 ` [PATCH] io stalls Nick Piggin 2003-06-10 0:32 ` Chris Mason 2003-06-10 0:47 
` Nick Piggin 2003-06-10 1:48 ` Robert White 2003-06-10 2:13 ` Chris Mason 2003-06-10 23:04 ` Robert White 2003-06-11 0:58 ` Chris Mason 2003-06-10 3:22 ` Nick Piggin 2003-06-10 21:17 ` Robert White 2003-06-11 0:40 ` Nick Piggin 2003-06-11 0:33 ` [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) Andrea Arcangeli 2003-06-11 0:48 ` [PATCH] io stalls Nick Piggin 2003-06-11 1:07 ` Andrea Arcangeli 2003-06-11 0:54 ` [PATCH] io stalls (was: -rc7 Re: Linux 2.4.21-rc6) Chris Mason 2003-06-11 1:06 ` Andrea Arcangeli 2003-06-11 1:57 ` Chris Mason 2003-06-11 2:10 ` Andrea Arcangeli 2003-06-11 12:24 ` Chris Mason 2003-06-11 17:42 ` Chris Mason 2003-06-11 18:12 ` Andrea Arcangeli 2003-06-11 18:27 ` Chris Mason 2003-06-11 18:35 ` Andrea Arcangeli 2003-06-12 1:04 ` [PATCH] io stalls Nick Piggin 2003-06-12 1:12 ` Chris Mason 2003-06-12 1:29 ` Andrea Arcangeli 2003-06-12 1:37 ` Andrea Arcangeli 2003-06-12 2:22 ` Chris Mason 2003-06-12 2:41 ` Nick Piggin 2003-06-12 2:46 ` Andrea Arcangeli 2003-06-12 2:49 ` Nick Piggin 2003-06-12 2:51 ` Nick Piggin 2003-06-12 2:52 ` Nick Piggin 2003-06-12 3:04 ` Andrea Arcangeli 2003-06-12 2:58 ` Andrea Arcangeli 2003-06-12 3:04 ` Nick Piggin 2003-06-12 3:12 ` Andrea Arcangeli 2003-06-12 3:20 ` Nick Piggin 2003-06-12 3:33 ` Andrea Arcangeli 2003-06-12 3:48 ` Nick Piggin 2003-06-12 4:17 ` Andrea Arcangeli 2003-06-12 4:41 ` Nick Piggin 2003-06-12 16:06 ` Chris Mason 2003-06-12 16:16 ` Nick Piggin 2003-06-25 19:03 ` Chris Mason 2003-06-25 19:25 ` Andrea Arcangeli 2003-06-25 20:18 ` Chris Mason 2003-06-27 8:41 ` write-caches, I/O stalls: MUST-FIX (was: [PATCH] io stalls) Matthias Andree 2003-06-26 5:48 ` [PATCH] io stalls Nick Piggin 2003-06-26 11:48 ` Chris Mason 2003-06-26 13:04 ` Nick Piggin 2003-06-26 13:18 ` Nick Piggin 2003-06-26 15:55 ` Chris Mason 2003-06-27 1:21 ` Nick Piggin 2003-06-27 1:39 ` Chris Mason 2003-06-27 9:45 ` Nick Piggin 2003-06-27 12:41 ` Chris Mason 2003-06-12 11:57 ` Chris Mason 2003-06-04 10:43 ` -rc7 Re: Linux 2.4.21-rc6 
Andrea Arcangeli 2003-06-04 11:01 ` Marc-Christian Petersen 2003-06-03 19:45 ` Config issue (CONFIG_X86_TSC) " Paul 2003-06-03 20:18 ` Jan-Benedict Glaw 2003-05-30 19:08 Daniel Goller 2003-05-30 20:52 ` Mike Fedyk 2003-05-31 7:06 ` Daniel Goller 2003-05-31 11:12 ` Michael Frank 2003-06-01 0:39 ` Daniel Goller