* Linux v2.4.19-rc5
@ 2002-08-01  6:38 Marcelo Tosatti
  2002-08-01  7:49 ` Jens Axboe
  ` (4 more replies)
  0 siblings, 5 replies; 56+ messages in thread
From: Marcelo Tosatti @ 2002-08-01  6:38 UTC (permalink / raw)
  To: lkml

One of the -rc4 fixes was not correct and -rc4 missed an important SMP
race "fix" on the block layer.

Summary of changes from v2.4.19-rc4 to v2.4.19-rc5
============================================

<davem@redhat.com> (02/08/01 1.662)
    [PATCH] Correct openprom fix

<davem@redhat.com> (02/07/31 1.661)
    [PATCH] Add missing check to openprom driver

<akpm@zip.com.au> (02/08/01 1.663)
    [PATCH] disable READA

<marcelo@plucky.distro.conectiva> (02/08/01 1.664)
    Change EXTRAVERSION to -rc5

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5 2002-08-01 6:38 Linux v2.4.19-rc5 Marcelo Tosatti @ 2002-08-01 7:49 ` Jens Axboe 2002-08-01 7:14 ` Marcelo Tosatti 2002-08-01 7:55 ` Keith Owens ` (3 subsequent siblings) 4 siblings, 1 reply; 56+ messages in thread From: Jens Axboe @ 2002-08-01 7:49 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: lkml, Andrew Morton On Thu, Aug 01 2002, Marcelo Tosatti wrote: > <akpm@zip.com.au> (02/08/01 1.663) > [PATCH] disable READA Since -rc5 is not to be found yet, I don't know what version of this made it in. Is READA just being disabled on SMP, or was it the general #if 0 change that got included? I'm asking since plain disabling READA might have nasty performance effects. Andrew, I bet you did some numbers on this, care to share? -- Jens Axboe ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5 2002-08-01 7:49 ` Jens Axboe @ 2002-08-01 7:14 ` Marcelo Tosatti 2002-08-01 8:10 ` Jens Axboe 2002-08-01 20:15 ` Steven Cole 0 siblings, 2 replies; 56+ messages in thread From: Marcelo Tosatti @ 2002-08-01 7:14 UTC (permalink / raw) To: Jens Axboe; +Cc: lkml, Andrew Morton On Thu, 1 Aug 2002, Jens Axboe wrote: > On Thu, Aug 01 2002, Marcelo Tosatti wrote: > > <akpm@zip.com.au> (02/08/01 1.663) > > [PATCH] disable READA > > Since -rc5 is not to be found yet, I don't know what version of this > made it in. Is READA just being disabled on SMP, or was it the general > #if 0 change that got included? Its being disabled on UP and SMP. I dont like having such readahead IO mode working only for UP. > I'm asking since plain disabling READA might have nasty performance > effects. Andrew, I bet you did some numbers on this, care to share? If thats true (the performance effects) I'll release -final with IMO not very coherent READA semantics :) Anyway, lets wait for the numbers. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5 2002-08-01 7:14 ` Marcelo Tosatti @ 2002-08-01 8:10 ` Jens Axboe 2002-08-01 9:02 ` Andrew Morton 2002-08-01 20:15 ` Steven Cole 1 sibling, 1 reply; 56+ messages in thread From: Jens Axboe @ 2002-08-01 8:10 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: lkml, Andrew Morton On Thu, Aug 01 2002, Marcelo Tosatti wrote: > > On Thu, 1 Aug 2002, Jens Axboe wrote: > > > On Thu, Aug 01 2002, Marcelo Tosatti wrote: > > > <akpm@zip.com.au> (02/08/01 1.663) > > > [PATCH] disable READA > > > > Since -rc5 is not to be found yet, I don't know what version of this > > made it in. Is READA just being disabled on SMP, or was it the general > > #if 0 change that got included? > > Its being disabled on UP and SMP. I dont like having such readahead IO > mode working only for UP. You are right, that would be ugly. Should only be the last resort. > > I'm asking since plain disabling READA might have nasty performance > > effects. Andrew, I bet you did some numbers on this, care to share? > > If thats true (the performance effects) I'll release -final with IMO not > very coherent READA semantics :) > > Anyway, lets wait for the numbers. It just 'feels' like the sort of change that might have odd side effects. -- Jens Axboe ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5 2002-08-01 8:10 ` Jens Axboe @ 2002-08-01 9:02 ` Andrew Morton 2002-08-01 8:58 ` Jens Axboe 2002-08-01 14:45 ` Steven Cole 0 siblings, 2 replies; 56+ messages in thread From: Andrew Morton @ 2002-08-01 9:02 UTC (permalink / raw) To: Jens Axboe; +Cc: Marcelo Tosatti, lkml Jens Axboe wrote: > > ... > > Anyway, lets wait for the numbers. > > It just 'feels' like the sort of change that might have odd side > effects. It's almost impossible to get READA to do anything. For example, in current 2.5, if a READA attempt is actually aborted, end_buffer_io_sync reports a "buffer I/O error". Every time. And nobody has reported this. It _is_ possible to hit this in 2.5, because of ext2_preread_inode(). Probably, also it's possible to hit it in 2.4 with hundreds of processes all issuing ext3 directory readahead. But it's pretty remote. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5 2002-08-01 9:02 ` Andrew Morton @ 2002-08-01 8:58 ` Jens Axboe 2002-08-01 14:45 ` Steven Cole 1 sibling, 0 replies; 56+ messages in thread From: Jens Axboe @ 2002-08-01 8:58 UTC (permalink / raw) To: Andrew Morton; +Cc: Marcelo Tosatti, lkml On Thu, Aug 01 2002, Andrew Morton wrote: > Jens Axboe wrote: > > > > ... > > > Anyway, lets wait for the numbers. > > > > It just 'feels' like the sort of change that might have odd side > > effects. > > It's almost impossible to get READA to do anything. For example, in > current 2.5, if a READA attempt is actually aborted, end_buffer_io_sync > reports a "buffer I/O error". Every time. And nobody has reported this. Ahem, I've actually seen that happen :-). But maybe a total of 20 times or so. > It _is_ possible to hit this in 2.5, because of ext2_preread_inode(). > > Probably, also it's possible to hit it in 2.4 with hundreds of processes > all issuing ext3 directory readahead. But it's pretty remote. Alright, I'm happy then. -- Jens Axboe ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5
  2002-08-01  9:02 ` Andrew Morton
  2002-08-01  8:58 ` Jens Axboe
@ 2002-08-01 14:45 ` Steven Cole
  2002-08-01 18:57 ` Andrew Morton
  1 sibling, 1 reply; 56+ messages in thread
From: Steven Cole @ 2002-08-01 14:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jens Axboe, Marcelo Tosatti, lkml, Steven Cole

On Thu, 2002-08-01 at 03:02, Andrew Morton wrote:
> Jens Axboe wrote:
> >
> > ...
> > > Anyway, lets wait for the numbers.
> >
> > It just 'feels' like the sort of change that might have odd side
> > effects.
>
> It's almost impossible to get READA to do anything. For example, in
> current 2.5, if a READA attempt is actually aborted, end_buffer_io_sync
> reports a "buffer I/O error". Every time. And nobody has reported this.
>
> It _is_ possible to hit this in 2.5, because of ext2_preread_inode().
>
> Probably, also it's possible to hit it in 2.4 with hundreds of processes
> all issuing ext3 directory readahead. But it's pretty remote.

I've never seen this on 2.4.19-rc3 and I've been beating on it pretty
hard, running dbench 128 many times. However, 2.5 is another story.

This might not be the best thread to report this, but since the subject
came up, I'm getting the following message with recent 2.5.x kernels
whenever I run relatively large numbers of dbench clients.

Buffer I/O error on device sd(8,8), logical block XXXXXXX

where logical block repeats 0-6 times. This behavior is repeatable, but
only occurs under fairly high load. I ran dbench with increasing numbers
of clients, with the following results:

dbench clients    Buffer I/O error messages
>=48              0
52                1
56                0
64                0
80                11
96                9
112               7
128               4

This particular run was with 2.5.29 with rmap13b and slabLRU patches,
but the behavior with 2.5.29-vanilla was similar. Kernel is SMP, no
preempt, and /dev/sda8 where dbench was running was mounted ext2.
The test box is 2-way p3, SCSI, 1GB memory.

Time to go beat on -rc5 and see if anything falls out.

Steven

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5
  2002-08-01 14:45 ` Steven Cole
@ 2002-08-01 18:57 ` Andrew Morton
  0 siblings, 0 replies; 56+ messages in thread
From: Andrew Morton @ 2002-08-01 18:57 UTC (permalink / raw)
  To: Steven Cole; +Cc: Jens Axboe, Marcelo Tosatti, lkml, Steven Cole

Steven Cole wrote:
>
> ...
> I've never seen this on 2.4.19-rc3 and I've been beating on it pretty
> hard, running dbench 128 many times. However, 2.5 is another story.
>
> This might not be the best thread to report this, but since the subject
> came up, I'm getting the following message with recent 2.5.x kernels
> whenever I run relatively large numbers of dbench clients.
>
> Buffer I/O error on device sd(8,8), logical block XXXXXXX
>
> where logical block repeats 0-6 times. This behavior is repeatable, but
> only occurs under fairly high load. I ran dbench with increasing numbers
> of clients, with the following results:
>
> dbench clients    Buffer I/O error messages
> >=48              0
> 52                1
> 56                0
> 64                0
> 80                11
> 96                9
> 112               7
> 128               4

Yup. The printk is bogus - I thought I'd removed it a couple
of kernels ago.

It's a bit sad that an abandoned readahead attempt is indistinguishable
from a dead disk.

-

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5
  2002-08-01  7:14 ` Marcelo Tosatti
  2002-08-01  8:10 ` Jens Axboe
@ 2002-08-01 20:15 ` Steven Cole
  2002-08-06  3:46 ` Bill Davidsen
  1 sibling, 1 reply; 56+ messages in thread
From: Steven Cole @ 2002-08-01 20:15 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Jens Axboe, lkml, Andrew Morton, Steven Cole

On Thu, 2002-08-01 at 01:14, Marcelo Tosatti wrote:
>
> On Thu, 1 Aug 2002, Jens Axboe wrote:
>
> > On Thu, Aug 01 2002, Marcelo Tosatti wrote:
> > > <akpm@zip.com.au> (02/08/01 1.663)
> > > [PATCH] disable READA
> >
> > Since -rc5 is not to be found yet, I don't know what version of this
> > made it in. Is READA just being disabled on SMP, or was it the general
> > #if 0 change that got included?
>
> Its being disabled on UP and SMP. I dont like having such readahead IO
> mode working only for UP.
>
> > I'm asking since plain disabling READA might have nasty performance
> > effects. Andrew, I bet you did some numbers on this, care to share?
>
> If thats true (the performance effects) I'll release -final with IMO not
> very coherent READA semantics :)
>
> Anyway, lets wait for the numbers.

Marcelo,

Here are some dbench numbers, from the "for what it's worth" department.
This was done with SMP kernels, on a dual p3 box, SCSI disk, ext2.
The first column is dbench clients. The numbers are throughput
in MB/sec. The 2.5.29 kernel had a few RR-supplied smp fixes.
Looks like for this limited test, 2.4.19-rc5 holds up pretty well.
I've also ran this set of tests several times on -rc5 using ext3
and data=writeback, and everything looks fine.

Steven

clients   2.4.19-rc2   2.4.19-rc5   2.5.29

1         114.616      113.402      112.668
2         173.234      183.829      175.148
3         185.995      187.411      184.63
4         185.447      186.891      188.199
6         191.115      191.439      191.787
8         191.962      191.551      191.53
10        192.984      194.036      194.923
12        183.847      185.73       195.328
16        183.609      183.439      196.224
20        181.519      179.956      193.681
24        183.509      183.387      194.09
28        176.04       175.832      169.326
32        174.583      163.09       137.815
36        155.04       164.154      121.861
40        155.37       156.028      102.014
44        152.546      138.171      91.6088
48        146.419      135.447      84.3884
52        139.788      125.968      89.2374
56        113.933      122.592      81.021
64        110.792      106.484      84.648

[only two values survive for the rows below; their column placement is
not recoverable from this copy]
80        87.4692  60.6054
96        87.7201  57.9622
112       74.9503  49.468
128       67.2649  47.0254

^ permalink raw reply [flat|nested] 56+ messages in thread
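The follow-ups read this table as "2.5.29 is slower under high load". A minimal sketch of that comparison, assuming Python and using only the rows where all three kernels have a value; the names `RESULTS`, `relative_change` and `first_regression` and the 10% threshold are illustrative choices, not anything from the thread:

```python
# dbench throughput in MB/sec, copied from the table above.
# Key: dbench clients. Value: (2.4.19-rc2, 2.4.19-rc5, 2.5.29).
# Rows for 80-128 clients are omitted because only two values survive there.
RESULTS = {
    1: (114.616, 113.402, 112.668),
    2: (173.234, 183.829, 175.148),
    3: (185.995, 187.411, 184.63),
    4: (185.447, 186.891, 188.199),
    6: (191.115, 191.439, 191.787),
    8: (191.962, 191.551, 191.53),
    10: (192.984, 194.036, 194.923),
    12: (183.847, 185.73, 195.328),
    16: (183.609, 183.439, 196.224),
    20: (181.519, 179.956, 193.681),
    24: (183.509, 183.387, 194.09),
    28: (176.04, 175.832, 169.326),
    32: (174.583, 163.09, 137.815),
    36: (155.04, 164.154, 121.861),
    40: (155.37, 156.028, 102.014),
    44: (152.546, 138.171, 91.6088),
    48: (146.419, 135.447, 84.3884),
    52: (139.788, 125.968, 89.2374),
    56: (113.933, 122.592, 81.021),
    64: (110.792, 106.484, 84.648),
}


def relative_change(new, old):
    """Percent change of `new` relative to `old` (negative means slower)."""
    return (new - old) / old * 100.0


def first_regression(results, threshold_pct):
    """Smallest client count where 2.5.29 trails 2.4.19-rc5 by more than
    `threshold_pct` percent, or None if that never happens."""
    for clients in sorted(results):
        _rc2, rc5, dev = results[clients]
        if relative_change(dev, rc5) < -threshold_pct:
            return clients
    return None


if __name__ == "__main__":
    for clients in sorted(RESULTS):
        _rc2, rc5, dev = RESULTS[clients]
        print(f"{clients:4d} clients: 2.5.29 vs rc5 {relative_change(dev, rc5):+6.1f}%")
    print("first drop of more than 10%:", first_regression(RESULTS, 10.0))
```

As the later replies argue, a per-row percentage like this only describes the dbench runs themselves; it says little about which subsystem caused the difference.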
* Re: Linux v2.4.19-rc5
  2002-08-01 20:15 ` Steven Cole
@ 2002-08-06  3:46 ` Bill Davidsen
  2002-08-06  4:30 ` Andrew Morton
  ` (2 more replies)
  0 siblings, 3 replies; 56+ messages in thread
From: Bill Davidsen @ 2002-08-06  3:46 UTC (permalink / raw)
  To: Steven Cole; +Cc: Marcelo Tosatti, Jens Axboe, lkml, Andrew Morton, Steven Cole

On 1 Aug 2002, Steven Cole wrote:

> Here are some dbench numbers, from the "for what it's worth" department.
> This was done with SMP kernels, on a dual p3 box, SCSI disk, ext2.
> The first column is dbench clients. The numbers are throughput
> in MB/sec. The 2.5.29 kernel had a few RR-supplied smp fixes.
> Looks like for this limited test, 2.4.19-rc5 holds up pretty well.
> I've also ran this set of tests several times on -rc5 using ext3
> and data=writeback, and everything looks fine.
>
> Steven

Call me an optimist, but after all the reliability problems we had win the
2.5 series, I sort of hoped it would be better in performance, not
increasingly worse. Am I misreading this? Can we fall back to the faster
2.4 code :-(

> clients   2.4.19-rc2   2.4.19-rc5   2.5.29
>
> 1         114.616      113.402      112.668
> 2         173.234      183.829      175.148
> 3         185.995      187.411      184.63
> 4         185.447      186.891      188.199
> 6         191.115      191.439      191.787
> 8         191.962      191.551      191.53
> 10        192.984      194.036      194.923
> 12        183.847      185.73       195.328
> 16        183.609      183.439      196.224
> 20        181.519      179.956      193.681
> 24        183.509      183.387      194.09
> 28        176.04       175.832      169.326
> 32        174.583      163.09       137.815
> 36        155.04       164.154      121.861
> 40        155.37       156.028      102.014
> 44        152.546      138.171      91.6088
> 48        146.419      135.447      84.3884
> 52        139.788      125.968      89.2374
> 56        113.933      122.592      81.021
> 64        110.792      106.484      84.648
>
> [only two values survive for the rows below; their column placement is
> not recoverable from this copy]
> 80        87.4692  60.6054
> 96        87.7201  57.9622
> 112       74.9503  49.468
> 128       67.2649  47.0254

--
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5
  2002-08-06  3:46 ` Bill Davidsen
@ 2002-08-06  4:30 ` Andrew Morton
  2002-08-06 14:07 ` Steven Cole
  2002-08-06  5:42 ` Jens Axboe
  2002-08-06 12:59 ` Rik van Riel
  2 siblings, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2002-08-06  4:30 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Steven Cole, Marcelo Tosatti, Jens Axboe, lkml, Steven Cole

Bill Davidsen wrote:
>
> On 1 Aug 2002, Steven Cole wrote:
>
> > Here are some dbench numbers, from the "for what it's worth" department.
> > This was done with SMP kernels, on a dual p3 box, SCSI disk, ext2.
> > The first column is dbench clients. The numbers are throughput
> > in MB/sec. The 2.5.29 kernel had a few RR-supplied smp fixes.
> > Looks like for this limited test, 2.4.19-rc5 holds up pretty well.
> > I've also ran this set of tests several times on -rc5 using ext3
> > and data=writeback, and everything looks fine.
> >
> > Steven
>
> Call me an optimist, but after all the reliability problems we had win the
> 2.5 series, I sort of hoped it would be better in performance, not
> increasingly worse. Am I misreading this? Can we fall back to the faster
> 2.4 code :-(

IO in 2.5 is much more CPU efficient than in 2.4, and straight-line
bandwidth is better as well.

The scheduling of that IO has a few problems, so in wildly seeky loads
like dbench the kernel still falls over its own feet a bit. The
two main culprits here are the lock_buffer() in block_write_full_page()
against the blockdev mapping, and the writeback of dirty pages from the
tail of the LRU in page reclaim.

And no, the eventual dbench numbers will not be a measure of the success
of the tuning which will happen on the run in to 2.6. Dbench throughput
may well be lower, because we probably should be starting writeback
at lower dirty thresholds.

If you want good dbench numbers:

    echo 70 > /proc/sys/vm/dirty_background_ratio
    echo 75 > /proc/sys/vm/dirty_async_ratio
    echo 80 > /proc/sys/vm/dirty_sync_ratio
    echo 30000 > /proc/sys/vm/dirty_expire_centisecs

^ permalink raw reply [flat|nested] 56+ messages in thread
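The four `echo` lines above can be wrapped in a small helper that records the previous values so they can be restored after a benchmark run. This is only a sketch, assuming Python: the helper name `apply_tuning`, the `base` parameter and the skip-if-missing behaviour are illustrative, and the tunable names are taken verbatim from the mail, so a given kernel may not expose all of them.

```python
import os

# Values from the mail above. Raising the dirty thresholds this far favours
# dbench throughput, not real workloads (see the surrounding discussion).
DBENCH_TUNING = {
    "dirty_background_ratio": 70,
    "dirty_async_ratio": 75,
    "dirty_sync_ratio": 80,
    "dirty_expire_centisecs": 30000,
}


def apply_tuning(settings, base="/proc/sys/vm"):
    """Write each value to <base>/<name>.

    Returns (old_values, skipped): the previous contents of every file that
    was written, and the names that do not exist under `base`.
    Writing under /proc/sys normally requires root.
    """
    old_values = {}
    skipped = []
    for name, value in settings.items():
        path = os.path.join(base, name)
        if not os.path.isfile(path):
            # This kernel does not expose the tunable; leave it alone.
            skipped.append(name)
            continue
        with open(path) as f:
            old_values[name] = f.read().strip()
        with open(path, "w") as f:
            f.write(f"{value}\n")
    return old_values, skipped
```

Calling `apply_tuning(old_values)` afterwards puts the saved settings back, since the function accepts any name-to-value mapping.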
* Re: Linux v2.4.19-rc5
  2002-08-06  4:30 ` Andrew Morton
@ 2002-08-06 14:07 ` Steven Cole
  2002-08-06 14:20 ` Rik van Riel
  2002-08-06 17:12 ` Andrew Morton
  0 siblings, 2 replies; 56+ messages in thread
From: Steven Cole @ 2002-08-06 14:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Bill Davidsen, Marcelo Tosatti, Jens Axboe, lkml

On Mon, 2002-08-05 at 22:30, Andrew Morton wrote:
[snipped]
>
> IO in 2.5 is much more CPU efficient than in 2.4, and straight-line
> bandwidth is better as well.
>
> The scheduling of that IO has a few problems, so in wildly seeky loads
> like dbench the kernel still falls over its own feet a bit. The
> two main culprits here are the lock_buffer() in block_write_full_page()
> against the blockdev mapping, and the writeback of dirty pages from the
> tail of the LRU in page reclaim.
>
> And no, the eventual dbench numbers will not be a measure of the success
> of the tuning which will happen on the run in to 2.6. Dbench throughput
> may well be lower, because we probably should be starting writeback
> at lower dirty thresholds.
>
> If you want good dbench numbers:
>
>     echo 70 > /proc/sys/vm/dirty_background_ratio
>     echo 75 > /proc/sys/vm/dirty_async_ratio
>     echo 80 > /proc/sys/vm/dirty_sync_ratio
>     echo 30000 > /proc/sys/vm/dirty_expire_centisecs

That last one looks like the biggest cheat. Rather than optimizing for
dbench, is there a set of pessimizing numbers which would optimally turn
dbench into a semi-useful tool for measuring meaningful IO performance?
Or is dbench really only useful for stress testing?

Thanks for the explanations.

Steven

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5
  2002-08-06 14:07 ` Steven Cole
@ 2002-08-06 14:20 ` Rik van Riel
  2002-08-06 17:12 ` Andrew Morton
  1 sibling, 0 replies; 56+ messages in thread
From: Rik van Riel @ 2002-08-06 14:20 UTC (permalink / raw)
  To: Steven Cole
  Cc: Andrew Morton, Bill Davidsen, Marcelo Tosatti, Jens Axboe, lkml

On 6 Aug 2002, Steven Cole wrote:

> That last one looks like the biggest cheat. Rather than optimizing for
> dbench, is there a set of pessimizing numbers which would optimally turn
> dbench into a semi-useful tool for measuring meaningful IO performance?
> Or is dbench really only useful for stress testing?

Yes, dbench is only useful as a stress testing tool.

A minor variation in kernel behaviour can change dbench throughput
by an order of magnitude and I'm not talking about any specific
kernel component here ... ANY kernel component could trigger it.

While it is easy to measure dbench throughput, it is nearly
impossible to:

1) analyse why dbench throughput changed from kernel to kernel

2) predict the relation (if any) these changes in dbench
   throughput have with changes in performance of real
   applications, if any

3) identify which kernel subsystem was responsible for the
   change in dbench performance

regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5
  2002-08-06 14:07 ` Steven Cole
  2002-08-06 14:20 ` Rik van Riel
@ 2002-08-06 17:12 ` Andrew Morton
  1 sibling, 0 replies; 56+ messages in thread
From: Andrew Morton @ 2002-08-06 17:12 UTC (permalink / raw)
  To: Steven Cole; +Cc: Bill Davidsen, Marcelo Tosatti, Jens Axboe, lkml

Steven Cole wrote:
>
> ...
> > If you want good dbench numbers:
> >
> >     echo 70 > /proc/sys/vm/dirty_background_ratio
> >     echo 75 > /proc/sys/vm/dirty_async_ratio
> >     echo 80 > /proc/sys/vm/dirty_sync_ratio
> >     echo 30000 > /proc/sys/vm/dirty_expire_centisecs
>
> That last one looks like the biggest cheat. Rather than optimizing for
> dbench, is there a set of pessimizing numbers which would optimally turn
> dbench into a semi-useful tool for measuring meaningful IO performance?
> Or is dbench really only useful for stress testing?
>

We tend to use dbench in two modes nowadays. One is the "RAM only" mode,
where the run completes before hitting disk at all. That's a very
useful and repeatable test for CPU efficiency and lock contention.

The other mode is of course when there are enough clients and enough
dirty data for the test to go to disk. As Rik says, this tends to be
subject to chaotic effects, and it is also extremely non linear. Because
when the run slows down a little bit, it takes longer, so more data
becomes eligible for time-expiry-based writeback, which causes more IO,
which causes the run to take longer, etc, etc.

Yes, one does tend still to keep one's eye on the "heavy" dbench
throughput, but I suspect that tuning for this workload is a bad
thing overall.

This is because good dbench numbers come from allowing a large amount
of dirty data to float about in memory (it will never get written out).
But for real workloads which don't delete their own output 30 seconds
later, we want to start writeback earlier. To use the disk bandwidth
more smoothly and to decrease memory allocation latency.

^ permalink raw reply [flat|nested] 56+ messages in thread
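The feedback loop described above (a slower run lets more data reach its expiry age, which causes more IO, which slows the run further) can be illustrated with a toy fixed-point model. This is a deliberately crude sketch in Python, not the kernel's writeback logic: it assumes dirty data is produced uniformly over the run, that only data older than the expiry time is written, and every number and name in it (`settled_run_time`, `base_s`, `dirty_mb`, `disk_mb_s`, `expire_s`) is invented for illustration.

```python
def settled_run_time(base_s, dirty_mb, disk_mb_s, expire_s, iterations=200):
    """Iterate a toy model of a dbench-like run until it settles.

    base_s     run time in seconds if nothing is ever written to disk
    dirty_mb   total dirty data the run produces
    disk_mb_s  disk bandwidth available for writeback
    expire_s   age at which dirty data becomes eligible for writeback

    Data produced during the first (t - expire_s) seconds of a run of
    length t gets old enough to be written; the time spent writing it
    is added back onto the run time, and the estimate is repeated.
    """
    t = float(base_s)
    for _ in range(iterations):
        expired_fraction = max(0.0, (t - expire_s) / t)
        t = base_s + expired_fraction * dirty_mb / disk_mb_s
    return t


if __name__ == "__main__":
    # Two runs that differ by two seconds of base time, on either side of
    # a 30 second expiry, with 2000 MB of dirty data and a 20 MB/s disk.
    for base in (29.0, 31.0):
        t = settled_run_time(base, dirty_mb=2000.0, disk_mb_s=20.0, expire_s=30.0)
        print(f"base {base:.0f}s -> settled run time {t:.1f}s")
```

In this model a run that finishes just inside the expiry time never touches the disk, while one that is slightly slower settles at several times its base duration, which is one way to picture why heavy dbench results are so sensitive to small changes.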
* Re: Linux v2.4.19-rc5
  2002-08-06  3:46 ` Bill Davidsen
  2002-08-06  4:30 ` Andrew Morton
@ 2002-08-06  5:42 ` Jens Axboe
  2002-08-06  8:30 ` Adrian Bunk
  2002-08-06 10:31 ` Lincoln Dale
  2002-08-06 12:59 ` Rik van Riel
  2 siblings, 2 replies; 56+ messages in thread
From: Jens Axboe @ 2002-08-06  5:42 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Steven Cole, Marcelo Tosatti, lkml, Andrew Morton, Steven Cole

On Mon, Aug 05 2002, Bill Davidsen wrote:
> On 1 Aug 2002, Steven Cole wrote:
>
> > Here are some dbench numbers, from the "for what it's worth" department.
> > This was done with SMP kernels, on a dual p3 box, SCSI disk, ext2.
> > The first column is dbench clients. The numbers are throughput
> > in MB/sec. The 2.5.29 kernel had a few RR-supplied smp fixes.
> > Looks like for this limited test, 2.4.19-rc5 holds up pretty well.
> > I've also ran this set of tests several times on -rc5 using ext3
> > and data=writeback, and everything looks fine.
> >
> > Steven
>
> Call me an optimist, but after all the reliability problems we had win the
> 2.5 series, I sort of hoped it would be better in performance, not
> increasingly worse. Am I misreading this? Can we fall back to the faster
> 2.4 code :-(

try a work load that exercises the block i/o layer alone (O_DIRECT,
raw, whatnot) and then compare 2.4 and 2.5. ibm had some slides on this
from ols, unfortunately I don't know if they have them online.

please don't put too much weight in dbench numbers for this sort of
thing :-)

--
Jens Axboe

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5 2002-08-06 5:42 ` Jens Axboe @ 2002-08-06 8:30 ` Adrian Bunk 2002-08-06 8:48 ` Jens Axboe 2002-08-06 10:31 ` Lincoln Dale 1 sibling, 1 reply; 56+ messages in thread From: Adrian Bunk @ 2002-08-06 8:30 UTC (permalink / raw) To: Jens Axboe; +Cc: Bill Davidsen, lkml On Tue, 6 Aug 2002, Jens Axboe wrote: >... > try a work load that excercises the block i/o layer alone (O_DIRECT, > raw, whatnot) and then compare 2.4 and 2.5. ibm had some slides on this > from ols, unfortunately I don't know if they have then online. >... Pages 390-406 in http://www.linux.org.uk/~ajh/ols2002_proceedings.pdf.gz or are you talking about something different? cu Adrian -- You only think this is a free country. Like the US the UK spends a lot of time explaining its a free country because its a police state. Alan Cox ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5 2002-08-06 8:30 ` Adrian Bunk @ 2002-08-06 8:48 ` Jens Axboe 0 siblings, 0 replies; 56+ messages in thread From: Jens Axboe @ 2002-08-06 8:48 UTC (permalink / raw) To: Adrian Bunk; +Cc: Bill Davidsen, lkml On Tue, Aug 06 2002, Adrian Bunk wrote: > On Tue, 6 Aug 2002, Jens Axboe wrote: > > >... > > try a work load that excercises the block i/o layer alone (O_DIRECT, > > raw, whatnot) and then compare 2.4 and 2.5. ibm had some slides on this > > from ols, unfortunately I don't know if they have then online. > >... > > Pages 390-406 in > > http://www.linux.org.uk/~ajh/ols2002_proceedings.pdf.gz > > or are you talking about something different? Right thanks, exactly those. Table 3 on page 395 is the one I noted. Forget readv, as that hasn't been done in 2.5 yet. I'd say a 2.5.17 untweaked kernel beating 2.4 tweaked beyond recognition isn't too shabby for a devel series kernel. -- Jens Axboe ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5 2002-08-06 5:42 ` Jens Axboe 2002-08-06 8:30 ` Adrian Bunk @ 2002-08-06 10:31 ` Lincoln Dale 1 sibling, 0 replies; 56+ messages in thread From: Lincoln Dale @ 2002-08-06 10:31 UTC (permalink / raw) To: Jens Axboe Cc: Bill Davidsen, Steven Cole, Marcelo Tosatti, lkml, Andrew Morton, Steven Cole At 07:42 AM 6/08/2002 +0200, Jens Axboe wrote: > > Call me an optimist, but after all the reliability problems we had win the > > 2.5 series, I sort of hoped it would be better in performance, not > > increasingly worse. Am I misreading this? Can we fall back to the faster > > 2.4 code :-( > >try a work load that excercises the block i/o layer alone (O_DIRECT, >raw, whatnot) and then compare 2.4 and 2.5. ibm had some slides on this >from ols, unfortunately I don't know if they have then online. the BIO in 2.5 kicks butt over the 2.4 BIO - both in terms of increased throughput and decreased cpu utilization. see some testing i previously did: http://marc.theaimsgroup.com/?l=linux-kernel&m=102635456620627&w=2 cheers, lincoln. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5
  2002-08-06  3:46 ` Bill Davidsen
  2002-08-06  4:30 ` Andrew Morton
  2002-08-06  5:42 ` Jens Axboe
@ 2002-08-06 12:59 ` Rik van Riel
  2002-08-07  1:09 ` Bill Davidsen
  2 siblings, 1 reply; 56+ messages in thread
From: Rik van Riel @ 2002-08-06 12:59 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Steven Cole, Marcelo Tosatti, Jens Axboe, lkml, Andrew Morton,
      Steven Cole

On Mon, 5 Aug 2002, Bill Davidsen wrote:

> > Here are some dbench numbers, from the "for what it's worth" department.
>
> Call me an optimist, but after all the reliability problems we had win the
> 2.5 series, I sort of hoped it would be better in performance, not
> increasingly worse. Am I misreading this? Can we fall back to the faster
> 2.4 code :-(

Dbench is at its best when half (or more) of the dbench processes
are stuck semi-infinitely in __get_request_wait and the others can
operate in RAM without ever touching the disk.

In effect, if you want the best dbench throughput you should make
the system completely unsuitable for real world applications ;)

There are a few things that are good for both real world performance
and dbench performance, but those are easily dwarfed by random factors
like IO scheduling, timeslice length, etc...

regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5 2002-08-06 12:59 ` Rik van Riel @ 2002-08-07 1:09 ` Bill Davidsen 2002-08-07 2:54 ` Steven Cole 0 siblings, 1 reply; 56+ messages in thread From: Bill Davidsen @ 2002-08-07 1:09 UTC (permalink / raw) To: Rik van Riel Cc: Steven Cole, Marcelo Tosatti, Jens Axboe, lkml, Andrew Morton, Steven Cole On Tue, 6 Aug 2002, Rik van Riel wrote: > On Mon, 5 Aug 2002, Bill Davidsen wrote: > > > > Here are some dbench numbers, from the "for what it's worth" department. > > > > Call me an optimist, but after all the reliability problems we had win the > > 2.5 series, I sort of hoped it would be better in performance, not > > increasingly worse. Am I misreading this? Can we fall back to the faster > > 2.4 code :-( > > Dbench is at its best when half (or more) of the dbench processes > are stuck semi-infinitely in __get_request_wait and the others can > operate in RAM without ever touching the disk. > > In effect, if you want the best dbench throughput you should make > the system completely unsuitable for real world applications ;) I assumed that the posted results were apples and apples. That may not be the case. If this was one kernel tuned for dbench and one for something else, then the information content is pretty low, to me at least. But if it is both tuned or both stock, then I would hope 2.5 would be better. If the text said that and I read past it, I apologise. > There are a few things that are good for both real world performance > and dbench performance, but those are easily dwarved by random factors > like IO scheduling, timeslice length, etc... I confess to being a kernel junkie when I have the time, I have run into the limitation of 19 boot stanzas in LILO :-( I have a case statement in rc.local to tune -aa VM, stock, and -ac rmap a little differently, since this machine is fairly fast and has bigish memory (2GB this week) and getting several ISO images in RAM and then having bdflush kick them out is bad. Looking forward to the io scheduler. 
I like to see 2.4.19 vs. 2.5.{29+} both tuned and untuned, but I have no days off in the next ten. By then there will be more new stuff, but the fast machine will be several area codes away, perhaps one of the people who like to do benchmarks might be too curious to wait. -- bill davidsen <davidsen@tmr.com> CTO, TMR Associates, Inc Doing interesting things with little computers since 1979. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5 2002-08-07 1:09 ` Bill Davidsen @ 2002-08-07 2:54 ` Steven Cole 2002-08-07 22:30 ` Bill Davidsen 0 siblings, 1 reply; 56+ messages in thread From: Steven Cole @ 2002-08-07 2:54 UTC (permalink / raw) To: Bill Davidsen Cc: Rik van Riel, Marcelo Tosatti, Jens Axboe, lkml, Andrew Morton, Steven Cole On Tue, 2002-08-06 at 19:09, Bill Davidsen wrote: > On Tue, 6 Aug 2002, Rik van Riel wrote: > > > On Mon, 5 Aug 2002, Bill Davidsen wrote: > > > > > > Here are some dbench numbers, from the "for what it's worth" department. > > > > > > Call me an optimist, but after all the reliability problems we had win the > > > 2.5 series, I sort of hoped it would be better in performance, not > > > increasingly worse. Am I misreading this? Can we fall back to the faster > > > 2.4 code :-( > > > > Dbench is at its best when half (or more) of the dbench processes > > are stuck semi-infinitely in __get_request_wait and the others can > > operate in RAM without ever touching the disk. > > > > In effect, if you want the best dbench throughput you should make > > the system completely unsuitable for real world applications ;) > > I assumed that the posted results were apples and apples. That may not be Well, maybe Granny Smiths and Red Delicious. The problem with dbench is that it checks how well they roll and bounce. But even that can be important sometimes. ;) > the case. If this was one kernel tuned for dbench and one for something > else, then the information content is pretty low, to me at least. But if > it is both tuned or both stock, then I would hope 2.5 would be better. If > the text said that and I read past it, I apologise. All kernels were stock as patched with no special changes to /proc/sys/vm/bdflush for 2.4.x or to /proc/sys/vm/dirty* for 2.5.x. Sorry, I didn't explicitly state that in the initial report. Steven ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5
  2002-08-07  2:54 ` Steven Cole
@ 2002-08-07 22:30 ` Bill Davidsen
  2002-08-07 22:39 ` Rik van Riel
  0 siblings, 1 reply; 56+ messages in thread
From: Bill Davidsen @ 2002-08-07 22:30 UTC (permalink / raw)
  To: Steven Cole
  Cc: Rik van Riel, Marcelo Tosatti, Jens Axboe, lkml, Andrew Morton,
      Steven Cole

On 6 Aug 2002, Steven Cole wrote:

> On Tue, 2002-08-06 at 19:09, Bill Davidsen wrote:
> > I assumed that the posted results were apples and apples. That may not be
>
> Well, maybe Granny Smiths and Red Delicious. The problem with dbench is
> that it checks how well they roll and bounce. But even that can be
> important sometimes. ;)
>
> > the case. If this was one kernel tuned for dbench and one for something
> > else, then the information content is pretty low, to me at least. But if
> > it is both tuned or both stock, then I would hope 2.5 would be better. If
> > the text said that and I read past it, I apologise.
>
> All kernels were stock as patched with no special changes to
> /proc/sys/vm/bdflush for 2.4.x or to /proc/sys/vm/dirty* for 2.5.x.
> Sorry, I didn't explicitly state that in the initial report.

Actually that was what I was assuming when I noted that the 2.5 appeared
to be slower by a good bit for some high load values of dbench. In a
perfect world the kernel would hit the hardware speed, guess no one is
claiming that until 2.7 ;-) The initial results from the io scheduler, as
posted here, look as if there will be a way to "take it up another notch"
in the future.

Thanks much for the clarification, the data are useful even if they do
show room for improvement in the corner case.

--
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5 2002-08-07 22:30 ` Bill Davidsen @ 2002-08-07 22:39 ` Rik van Riel 2002-08-07 23:44 ` Bill Davidsen 0 siblings, 1 reply; 56+ messages in thread From: Rik van Riel @ 2002-08-07 22:39 UTC (permalink / raw) To: Bill Davidsen Cc: Steven Cole, Marcelo Tosatti, Jens Axboe, lkml, Andrew Morton, Steven Cole On Wed, 7 Aug 2002, Bill Davidsen wrote: > Thanks much for the clarification, the data are useful even if they do > show room for improvement in the corner case. If dbench numbers are meaningful to you, maybe you could translate them into something kernel developers can understand ? ;) Rik -- Bravely reimplemented by the knights who say "NIH". http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5 2002-08-07 22:39 ` Rik van Riel @ 2002-08-07 23:44 ` Bill Davidsen 2002-08-07 23:53 ` Rik van Riel 0 siblings, 1 reply; 56+ messages in thread From: Bill Davidsen @ 2002-08-07 23:44 UTC (permalink / raw) To: Rik van Riel Cc: Steven Cole, Marcelo Tosatti, Jens Axboe, lkml, Andrew Morton, Steven Cole On Wed, 7 Aug 2002, Rik van Riel wrote: > On Wed, 7 Aug 2002, Bill Davidsen wrote: > > > Thanks much for the clarification, the data are useful even if they do > > show room for improvement in the corner case. > > If dbench numbers are meaningful to you, maybe you could > translate them into something kernel developers can > understand ? ;) Sure, glad to. If the 2.5 numbers are much worse than 2.4, something isn't working as well, another problem, go have a beer to drown your sorrow. On the other hand if it runs much better, you have done a great job and can go have a beer to celebrate. Seriously, I would read the reasonably smooth curve of values as good sign, as opposed to "gets real bad and improves under more load" or similar. And the fact that it stayed up, and presumably didn't eat all the filesystems indicates that the system is getting more stable IDE. One more thing, if you have been fighting bad machines for 15 hours and no one is looking, it's time to go get a beer. And cashews, and cheddar. I am out of here (as in where I am working right now, not my office). -- bill davidsen <davidsen@tmr.com> CTO, TMR Associates, Inc Doing interesting things with little computers since 1979. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5 2002-08-07 23:44 ` Bill Davidsen @ 2002-08-07 23:53 ` Rik van Riel 2002-08-09 17:46 ` Bill Davidsen 0 siblings, 1 reply; 56+ messages in thread From: Rik van Riel @ 2002-08-07 23:53 UTC (permalink / raw) To: Bill Davidsen Cc: Steven Cole, Marcelo Tosatti, Jens Axboe, lkml, Andrew Morton, Steven Cole On Wed, 7 Aug 2002, Bill Davidsen wrote: > Sure, glad to. If the 2.5 numbers are much worse than 2.4, something > isn't working as well, Are you volunteering to identify that "something" for us ? regards, Rik -- Bravely reimplemented by the knights who say "NIH". http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5 2002-08-07 23:53 ` Rik van Riel @ 2002-08-09 17:46 ` Bill Davidsen 2002-08-09 19:27 ` Rik van Riel 0 siblings, 1 reply; 56+ messages in thread From: Bill Davidsen @ 2002-08-09 17:46 UTC (permalink / raw) To: Rik van Riel; +Cc: lkml On Wed, 7 Aug 2002, Rik van Riel wrote: > On Wed, 7 Aug 2002, Bill Davidsen wrote: > > > Sure, glad to. If the 2.5 numbers are much worse than 2.4, something > > isn't working as well, > > Are you volunteering to identify that "something" for us ? Hell no. I was simply commenting that there is some general qualitative information available from those numbers, even if it is hard to quantify them. Not working as well for a benchmark may indicate much better typical performance, and as I understand dbench the io scheduler may improve that significantly as well. No, clearly there are other, probably a lot more representative numbers, which show 2.5 is better. "Isn't working as well" for one thing doesn't mean "in general," but might be of interest to the primary developers. The fact that the curve doesn't end in a reload from backup tells me that the IDE code is much better than it was ;-) What time I have for diddling kernel code is spent on making network code changes, and is all against 2.4 base. -- bill davidsen <davidsen@tmr.com> CTO, TMR Associates, Inc Doing interesting things with little computers since 1979. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5 2002-08-09 17:46 ` Bill Davidsen @ 2002-08-09 19:27 ` Rik van Riel 0 siblings, 0 replies; 56+ messages in thread From: Rik van Riel @ 2002-08-09 19:27 UTC (permalink / raw) To: Bill Davidsen; +Cc: lkml On Fri, 9 Aug 2002, Bill Davidsen wrote: > On Wed, 7 Aug 2002, Rik van Riel wrote: > > On Wed, 7 Aug 2002, Bill Davidsen wrote: > > > > > Sure, glad to. If the 2.5 numbers are much worse than 2.4, something > > > isn't working as well, > > > > Are you volunteering to identify that "something" for us ? > > Hell no. I was simply commenting that there is some general qualitative > information available from those numbers, even if it is hard to quantify > them. As long as there is nobody to interpret what the dbench numbers actually mean, why are we treating them as the most important thing around ? ;) Rik -- Bravely reimplemented by the knights who say "NIH". http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5 2002-08-01 6:38 Linux v2.4.19-rc5 Marcelo Tosatti 2002-08-01 7:49 ` Jens Axboe @ 2002-08-01 7:55 ` Keith Owens 2002-08-01 8:10 ` Jens Axboe 2002-08-04 6:50 ` H. Peter Anvin 2002-08-01 11:32 ` Willy TARREAU ` (2 subsequent siblings) 4 siblings, 2 replies; 56+ messages in thread From: Keith Owens @ 2002-08-01 7:55 UTC (permalink / raw) To: Marcelo Tosatti, ftpadmin; +Cc: lkml patch-2.4.19-rc5.gz has been there for 25 minutes but the .bz2 file and the signature have not been created yet. Is there a problem with the automatic conversion and signing code on master? ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5 2002-08-01 7:55 ` Keith Owens @ 2002-08-01 8:10 ` Jens Axboe 2002-08-04 6:50 ` H. Peter Anvin 1 sibling, 0 replies; 56+ messages in thread From: Jens Axboe @ 2002-08-01 8:10 UTC (permalink / raw) To: Keith Owens; +Cc: Marcelo Tosatti, ftpadmin, lkml On Thu, Aug 01 2002, Keith Owens wrote: > patch-2.4.19-rc5.gz has been there for 25 minutes but the .bz2 file and > the signature have not been created yet. Is there a problem with the > automatic conversion and signing code on master? That is slow; however, it's there now. -- Jens Axboe ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5 2002-08-01 7:55 ` Keith Owens 2002-08-01 8:10 ` Jens Axboe @ 2002-08-04 6:50 ` H. Peter Anvin 1 sibling, 0 replies; 56+ messages in thread From: H. Peter Anvin @ 2002-08-04 6:50 UTC (permalink / raw) To: Keith Owens; +Cc: Marcelo Tosatti, ftpadmin, lkml Keith Owens wrote: > patch-2.4.19-rc5.gz has been there for 25 minutes but the .bz2 file and > the signature have not been created yet. Is there a problem with the > automatic conversion and signing code on master? The sign/convert/upload machinery is sometimes slow when it is either transferring large files, or doing its daily "rsync --checksum" for paranoia's sake. The latter happens at 00:00 local time, currently 17:00 UTC. -hpa ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5 2002-08-01 6:38 Linux v2.4.19-rc5 Marcelo Tosatti 2002-08-01 7:49 ` Jens Axboe 2002-08-01 7:55 ` Keith Owens @ 2002-08-01 11:32 ` Willy TARREAU 2002-08-01 13:54 ` Alan Cox 2002-08-01 12:12 ` Linux v2.4.19-rc5 - APM bug Willy TARREAU 2002-08-02 1:47 ` [PATCH] pdc20265 problem Nick Orlov 4 siblings, 1 reply; 56+ messages in thread From: Willy TARREAU @ 2002-08-01 11:32 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: lkml

Hi Marcello,

This is just a cleanup for the network devices configuration.
Basically, the TOSHIBA TC35815 configuration entry appears
just between DECchip Tulip, and the 2 Tulip-specific config lines
which are indented so we could think that they are related to
the TC35815 instead of the Tulip. You only see them when Tulip is
enabled though.

Here is the obvious fix against -rc5 which avoids this confusion :

Cheers,
Willy

--- linux-2.4.19-rc5/drivers/net/Config.in.orig Thu Aug 1 13:26:58 2002
+++ linux-2.4.19-rc5/drivers/net/Config.in Thu Aug 1 13:27:14 2002
@@ -162,8 +162,8 @@
 dep_tristate ' Apricot Xen-II on board Ethernet' CONFIG_APRICOT $CONFIG_ISA
 dep_tristate ' CS89x0 support' CONFIG_CS89x0 $CONFIG_ISA
- dep_tristate ' DECchip Tulip (dc21x4x) PCI support' CONFIG_TULIP $CONFIG_PCI
 dep_tristate ' TOSHIBA TC35815 Ethernet support' CONFIG_TC35815 $CONFIG_PCI
+ dep_tristate ' DECchip Tulip (dc21x4x) PCI support' CONFIG_TULIP $CONFIG_PCI
 if [ "$CONFIG_TULIP" = "y" -o "$CONFIG_TULIP" = "m" ]; then
 dep_bool ' New bus configuration (EXPERIMENTAL)' CONFIG_TULIP_MWI $CONFIG_EXPERIMENTAL
 bool ' Use PCI shared mem for NIC registers' CONFIG_TULIP_MMIO

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5 2002-08-01 11:32 ` Willy TARREAU @ 2002-08-01 13:54 ` Alan Cox 2002-08-01 12:48 ` Willy TARREAU 0 siblings, 1 reply; 56+ messages in thread From: Alan Cox @ 2002-08-01 13:54 UTC (permalink / raw) To: Willy TARREAU; +Cc: Marcelo Tosatti, lkml On Thu, 2002-08-01 at 12:32, Willy TARREAU wrote: > Hi Marcello, > > This is just a cleanup for the network devices configuration. > Basically, the TOSHIBA TC35815 configuration entry appears > just between DECchip Tulip, and the 2 Tulip-specific config lines > which are indented so we could think that they are related to > the TC35815 instead of the Tulip. This is true, but the fix wants tweaking - the file is supposed to be in basically alphabetical order. Can you move the toshiba one down instead ? ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5 2002-08-01 13:54 ` Alan Cox @ 2002-08-01 12:48 ` Willy TARREAU 0 siblings, 0 replies; 56+ messages in thread From: Willy TARREAU @ 2002-08-01 12:48 UTC (permalink / raw) To: Alan Cox; +Cc: Willy TARREAU, Marcelo Tosatti, lkml

On Thu, Aug 01, 2002 at 02:54:04PM +0100, Alan Cox wrote:
> On Thu, 2002-08-01 at 12:32, Willy TARREAU wrote:
> > Hi Marcello,
> >
> > This is just a cleanup for the network devices configuration.
> > Basically, the TOSHIBA TC35815 configuration entry appears
> > just between DECchip Tulip, and the 2 Tulip-specific config lines
> > which are indented so we could think that they are related to
> > the TC35815 instead of the Tulip.
>
> This is true, but the fix wants tweaking - the file is supposed to be in
> basically alphabetical order. Can you move the toshiba one down instead
> ?

OK, in this case it goes just before VIA rhine.
(BTW, [P]CI NE2000 is before [N]ovell, but I assume we're talking
about [N]E2000).

Marcelo, please ignore my previous patch in favor of this one.

Cheers,
Willy

--- linux-2.4.19-rc5/drivers/net/Config.in.orig Thu Aug 1 14:43:09 2002
+++ linux-2.4.19-rc5/drivers/net/Config.in Thu Aug 1 14:44:29 2002
@@ -163,7 +163,6 @@
 dep_tristate ' Apricot Xen-II on board Ethernet' CONFIG_APRICOT $CONFIG_ISA
 dep_tristate ' CS89x0 support' CONFIG_CS89x0 $CONFIG_ISA
 dep_tristate ' DECchip Tulip (dc21x4x) PCI support' CONFIG_TULIP $CONFIG_PCI
- dep_tristate ' TOSHIBA TC35815 Ethernet support' CONFIG_TC35815 $CONFIG_PCI
 if [ "$CONFIG_TULIP" = "y" -o "$CONFIG_TULIP" = "m" ]; then
 dep_bool ' New bus configuration (EXPERIMENTAL)' CONFIG_TULIP_MWI $CONFIG_EXPERIMENTAL
 bool ' Use PCI shared mem for NIC registers' CONFIG_TULIP_MMIO
@@ -195,6 +194,7 @@
 if [ "$CONFIG_PCI" = "y" -o "$CONFIG_EISA" = "y" ]; then
 tristate ' TI ThunderLAN support' CONFIG_TLAN
 fi
+ dep_tristate ' TOSHIBA TC35815 Ethernet support' CONFIG_TC35815 $CONFIG_PCI
 dep_tristate ' VIA Rhine support' CONFIG_VIA_RHINE $CONFIG_PCI
 dep_mbool ' Use MMIO instead of PIO (EXPERIMENTAL)' CONFIG_VIA_RHINE_MMIO $CONFIG_VIA_RHINE $CONFIG_EXPERIMENTAL
 dep_tristate ' Winbond W89c840 Ethernet support' CONFIG_WINBOND_840 $CONFIG_PCI

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5 - APM bug 2002-08-01 6:38 Linux v2.4.19-rc5 Marcelo Tosatti ` (2 preceding siblings ...) 2002-08-01 11:32 ` Willy TARREAU @ 2002-08-01 12:12 ` Willy TARREAU 2002-08-01 13:32 ` [PANIC] APM bug with -rc4 and -rc5 Willy TARREAU 2002-08-02 1:47 ` [PATCH] pdc20265 problem Nick Orlov 4 siblings, 1 reply; 56+ messages in thread From: Willy TARREAU @ 2002-08-01 12:12 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: lkml

Marcelo,

I observe a kernel panic at boot time if I set apm=power-off. OK with apm=off.
This is on an ASUS A7M266D with two Athlon XP 1800+. Since it works well on
2.4.19-pre10, I'm recompiling intermediate versions to check which one brought
the problem.

This is rather strange, since the crash occurs in do_softirq, but 2 bytes after
the beginning of an instruction :

c0120d09  fa                  cli
c0120d0a  8b b5 80 17 3c c0   mov 0xc03c1780(%ebp),%esi

The crash occurs at c0120d0c (80 17 3c c0 ...). Seems like a bad pointer
somewhere.

Regards,
Willy

^ permalink raw reply [flat|nested] 56+ messages in thread
* [PANIC] APM bug with -rc4 and -rc5 2002-08-01 12:12 ` Linux v2.4.19-rc5 - APM bug Willy TARREAU @ 2002-08-01 13:32 ` Willy TARREAU 2002-08-01 14:55 ` Alan Cox 0 siblings, 1 reply; 56+ messages in thread From: Willy TARREAU @ 2002-08-01 13:32 UTC (permalink / raw) To: Alan Cox; +Cc: Marcelo Tosatti, lkml

Marcelo,

I've narrowed down the APM problem encountered in -rc5. In fact, it also
affects -rc4, but not -rc3. I'm a bit stumped since the changes are not
too heavy...

The crash happens at the same place with -rc4 and -rc5 : 0xc0120d0c

c0120cec:  fb                  sti
c0120ced:  bb 40 a2 39 c0      mov $0xc039a240,%ebx
c0120cf2:  f7 c6 01 00 00 00   test $0x1,%esi
c0120cf8:  74 08               je c0120d02 <do_softirq+0x72>
c0120cfa:  53                  push %ebx
c0120cfb:  8b 03               mov (%ebx),%eax
c0120cfd:  ff d0               call *%eax
c0120cff:  83 c4 04            add $0x4,%esp
c0120d02:  83 c3 08            add $0x8,%ebx
c0120d05:  d1 ee               shr %esi
c0120d07:  75 e9               jne c0120cf2 <do_softirq+0x62>
c0120d09:  fa                  cli
c0120d0a:  8b b5 80 17 3c c0   mov 0xc03c1780(%ebp),%esi
                 ^^ ^^ ^^ ^^
    The processor branches here (2 bytes after local_irq_disable()) !!
c0120d10:  85 fe               test %edi,%esi
c0120d12:  74 0c               je c0120d20 <do_softirq+0x90>
c0120d14:  89 f0               mov %esi,%eax
c0120d16:  f7 d0               not %eax
c0120d18:  21 c7               and %eax,%edi
c0120d1a:  eb c6               jmp c0120ce2 <do_softirq+0x52>

This code is from do_softirq() in kernel/softirq.c, lines 84-95 :

	local_irq_enable();

	h = softirq_vec;

	do {
		if (pending & 1)
			h->action(h);
		h++;
		pending >>= 1;
	} while (pending);

	local_irq_disable();

The hand-written traces show that this function was correctly called by
ksoftirqd(), which in turn was called by kernel_thread().

Part of the hand-written oops shows :

EFLAGS=00010057
eax=00000900 ebx=c039a260 ecx=00000000 edx=c0390000
esi=00000000 edi=fffffff7 ebp=00000000 esp=c15b1fc8

Since softirq_vec is c039a240 in my System.map, I can deduce that
h->action(h) has been called 4 times because it's 8 bytes long.
<pending> is represented by %esi here, which is null.

So this implies that it's not the call to h->action(h) which branched to
this place. But in this case, I don't see how the CPU can branch here
(a ret perhaps ?). I don't see in what this can be related to the
"apm=power-off" case either.

Alan, I believe you have the same mobo, but with two MPs on it. Although
I've never had any SMP problem with XPs, did you notice anything strange
with APM on 2.4.19-rc[45] ?

I will check 2.4.19-rc3-ac5 to see if it hangs too...

Cheers,
Willy

> Marcelo,
>
> I observe a kernel panic at boot time if I set apm=power-off. OK with apm=off.
> This is on an ASUS A7M266D with two Athlon XP 1800+. Since it works well on
> 2.4.19-pre10, I'm recompiling intermediate versions to check which one brought
> the problem.
>
> This is rather strange, since the crash occurs in do_softirq, but 2 bytes after
> the beginning of an instruction :
> c0120d09 fa cli
> c0120d0a 8b b5 80 17 3c c0 mov 0xc03c1780(%ebp),%esi
>
> The crash occurs at c0120d0c (80 17 3c c0 ...). Seems like a bad pointer
> somewhere.

^ permalink raw reply [flat|nested] 56+ messages in thread
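The pointer arithmetic in the message above can be checked with a small user-space sketch. It is not kernel code: `softirq_iterations` and `walk_pending` are hypothetical helpers, the 8-byte entry size is taken from the message rather than from kernel headers, and the starting mask 0x8 is an assumption chosen for illustration (it would match edi=fffffff7 only if %edi holds the complement of the initial pending mask, which the message does not state). Note that the pointer delta counts the slots the loop stepped over; the handler itself is only called for slots whose bit is set.

```c
#include <assert.h>
#include <stdint.h>

/* Size of one softirq_vec entry as quoted in the message (8 bytes on
 * this 32-bit build). Assumed here, not read from kernel headers. */
#define SOFTIRQ_ENTRY_SIZE 8u

/* Hypothetical helper: number of times "h++" ran, given the address of
 * softirq_vec (from System.map) and the value of h (%ebx) in the oops. */
static unsigned int softirq_iterations(uint32_t vec_base, uint32_t h)
{
    assert(h >= vec_base);
    assert((h - vec_base) % SOFTIRQ_ENTRY_SIZE == 0);
    return (h - vec_base) / SOFTIRQ_ENTRY_SIZE;
}

/* Outcome of walking a pending mask the way the quoted loop does. */
struct walk_result {
    unsigned int visited;   /* slots stepped over (h++ count)      */
    unsigned int called;    /* slots whose handler would be called */
    uint32_t pending_left;  /* value of pending when the loop ends */
};

/* User-space model of the do { ... } while (pending) loop; the call to
 * h->action(h) is replaced by a counter. */
static struct walk_result walk_pending(uint32_t pending)
{
    struct walk_result r = { 0, 0, 0 };

    do {
        if (pending & 1)
            r.called++;
        r.visited++;
        pending >>= 1;
    } while (pending);

    r.pending_left = pending;
    return r;
}
```

With the addresses from the oops, `softirq_iterations(0xc039a240, 0xc039a260)` gives the number of slots the loop advanced over, and `walk_pending(0x8)` shows a mask for which the loop ends with pending equal to zero after the same number of steps, which agrees with esi=00000000 in the register dump.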
* Re: [PANIC] APM bug with -rc4 and -rc5 2002-08-01 13:32 ` [PANIC] APM bug with -rc4 and -rc5 Willy TARREAU @ 2002-08-01 14:55 ` Alan Cox 2002-08-01 13:56 ` Willy Tarreau 0 siblings, 1 reply; 56+ messages in thread From: Alan Cox @ 2002-08-01 14:55 UTC (permalink / raw) To: Willy TARREAU; +Cc: Marcelo Tosatti, lkml On Thu, 2002-08-01 at 14:32, Willy TARREAU wrote: > > I observe a kernel panic at boot time if I set apm=power-off. OK with apm=off. > > This is on an ASUS A7M266D with two Athlon XP 1800+. Since it works well on > > 2.4.19-pre10, I'm recompiling intermediate versions to check which one brought > > the problem. > > > > This is rather strange, since the crash occurs in do_softirq, but 2 bytes after I've only run -ac on the box (I need the IDE) and that has subtly different APM code. I do not however understand why it has changed behaviour. I could understand if it did it at the actual poweroff point but not earlier ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PANIC] APM bug with -rc4 and -rc5 2002-08-01 14:55 ` Alan Cox @ 2002-08-01 13:56 ` Willy Tarreau 2002-08-01 15:24 ` Willy Tarreau 0 siblings, 1 reply; 56+ messages in thread From: Willy Tarreau @ 2002-08-01 13:56 UTC (permalink / raw) To: Alan Cox; +Cc: Willy TARREAU, Marcelo Tosatti, lkml On Thu, Aug 01, 2002 at 03:55:32PM +0100, Alan Cox wrote: > I've only run -ac on the box (I need the IDE) and that has subtly > different APM code. I do not however understand why it has changed > behaviour. I could understand if it did it at the actual poweroff point > but not earlier Ok, thanks. I'll try to revert some patches from -rc4. But it looks more like a side effect IMHO. Perhaps the APM initialization code triggers one of the numerous bugs in the bios :-/ If I enable APM in the bios, the crash is somewhat different. I get about two pages of call traces looping back every 8 pointers. Seems like a memory corruption to me... 2.4.19-rc3-ac5 is OK, BTW. Cheers, Willy ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PANIC] APM bug with -rc4 and -rc5 2002-08-01 13:56 ` Willy Tarreau @ 2002-08-01 15:24 ` Willy Tarreau 2002-08-01 16:53 ` Alan Cox 0 siblings, 1 reply; 56+ messages in thread From: Willy Tarreau @ 2002-08-01 15:24 UTC (permalink / raw) To: Willy Tarreau; +Cc: Alan Cox, Marcelo Tosatti, lkml > Ok, thanks. I'll try to revert some patches from -rc4. But it looks > more like a side effect IMHO. Perhaps the APM initialization code > triggers one of the numerous bugs in the bios :-/ It seems that I cannot reproduce it anymore if I revert arch/i386/kernel/vm86.c to the state of -rc3. Reverting clear_AC doesn't change anything, but the rest of the patch does. I don't know why, it seems correct at first glance. Perhaps old code hides a bug in the bios... Well, i don't know, I'm not enough aware of apm or vm86 internals to understand what's happening. Cheers, Willy ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PANIC] APM bug with -rc4 and -rc5 2002-08-01 15:24 ` Willy Tarreau @ 2002-08-01 16:53 ` Alan Cox 2002-08-01 16:41 ` Willy Tarreau 2002-08-01 20:35 ` [PATCH] solved APM bug with -rc5 Willy TARREAU 0 siblings, 2 replies; 56+ messages in thread From: Alan Cox @ 2002-08-01 16:53 UTC (permalink / raw) To: Willy Tarreau; +Cc: Marcelo Tosatti, lkml On Thu, 2002-08-01 at 16:24, Willy Tarreau wrote: > > Ok, thanks. I'll try to revert some patches from -rc4. But it looks > > more like a side effect IMHO. Perhaps the APM initialization code > > triggers one of the numerous bugs in the bios :-/ > > It seems that I cannot reproduce it anymore if I revert arch/i386/kernel/vm86.c > to the state of -rc3. Reverting clear_AC doesn't change anything, but the > rest of the patch does. I don't know why, it seems correct at first glance. > Perhaps old code hides a bug in the bios... Well, i don't know, I'm not > enough aware of apm or vm86 internals to understand what's happening. Very curious indeed because someone else reported that rc3-ac5 works (which has the same vm86 code). In addition the vm86 handler in the kernel isnt actually used for APM. We make 32bit APM calls and the one 16bit case we do is a true return to real mode. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PANIC] APM bug with -rc4 and -rc5 2002-08-01 16:53 ` Alan Cox @ 2002-08-01 16:41 ` Willy Tarreau 2002-08-01 20:35 ` [PATCH] solved APM bug with -rc5 Willy TARREAU 1 sibling, 0 replies; 56+ messages in thread From: Willy Tarreau @ 2002-08-01 16:41 UTC (permalink / raw) To: Alan Cox; +Cc: Willy Tarreau, Marcelo Tosatti, lkml On Thu, Aug 01, 2002 at 05:53:46PM +0100, Alan Cox wrote: > Very curious indeed because someone else reported that rc3-ac5 works > (which has the same vm86 code). In addition the vm86 handler in the > kernel isnt actually used for APM. We make 32bit APM calls and the one > 16bit case we do is a true return to real mode. well, I saw it wrong. In fact, sometimes the system boots OK if it is after a warm boot, and it seems that all the tests I've done with "old" vm86 code were done from a warm boot. Now I can confirm that from a cold boot, it also panics. And you're right about rc3-ac5, since it also works for me. Still searching... Willy ^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCH] solved APM bug with -rc5 2002-08-01 16:53 ` Alan Cox 2002-08-01 16:41 ` Willy Tarreau @ 2002-08-01 20:35 ` Willy TARREAU 2002-08-01 20:52 ` Richard Gooch 2002-08-01 22:16 ` Alan Cox 1 sibling, 2 replies; 56+ messages in thread From: Willy TARREAU @ 2002-08-01 20:35 UTC (permalink / raw) To: Alan Cox; +Cc: Willy Tarreau, Marcelo Tosatti, lkml

On Thu, Aug 01, 2002 at 05:53:46PM +0100, Alan Cox wrote:
> Very curious indeed because someone else reported that rc3-ac5 works
> (which has the same vm86 code). In addition the vm86 handler in the
> kernel isnt actually used for APM. We make 32bit APM calls and the one
> 16bit case we do is a true return to real mode.

I finally got rid of it ! I now understand why it hanged randomly, and
why I spent lots of time adding/removing unrelated patches. It's because
in apm=power-off mode (SMP), a kernel thread is started for the apm()
function, which does bios calls. And sometimes, the bios is called from
CPU >0, which my bios doesn't like at all, thus explaining why the oopses
were corrupted.

By copying a piece of code somewhere else in the same file, I could force
apm() to be used only by CPU0. I could verify that it doesn't crash
anymore, and that I can also crash it on demand if I force CPU1.

The bonus is that I could re-enable the debug code in this function even
in SMP mode since we're sure that it runs on CPU0.

Here is the patch against 2.4.19-rc5. Marcelo, Alan, please review and
apply.

Cheers,
Willy

diff -urN linux-2.4.19-rc5/arch/i386/kernel/apm.c linux-2.4.19-rc5-fix/arch/i386/kernel/apm.c
--- linux-2.4.19-rc5/arch/i386/kernel/apm.c Thu Aug 1 22:07:39 2002
+++ linux-2.4.19-rc5-fix/arch/i386/kernel/apm.c Thu Aug 1 22:26:56 2002
@@ -1661,6 +1661,17 @@
 	strcpy(current->comm, "kapmd");
 	sigfillset(&current->blocked);
 
+#ifdef CONFIG_SMP
+	/* 2002/08/01 - WT
+	 * This is to avoid random crashes at boot time during initialization
+	 * on SMP systems in case of "apm=power-off" mode. Seen on ASUS A7M266D.
+	 * Some bioses don't like being called from CPU != 0.
+	 */
+	while (cpu_number_map(smp_processor_id()) != 0) {
+		schedule();
+	}
+#endif
+
 	if (apm_info.connection_version == 0) {
 		apm_info.connection_version = apm_info.bios.version;
 		if (apm_info.connection_version > 0x100) {
@@ -1707,7 +1718,7 @@
 		}
 	}
 
-	if (debug && (smp_num_cpus == 1)) {
+	if (debug) {
 		error = apm_get_power_status(&bx, &cx, &dx);
 		if (error)
 			printk(KERN_INFO "apm: power status not available\n");

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] solved APM bug with -rc5 2002-08-01 20:35 ` [PATCH] solved APM bug with -rc5 Willy TARREAU @ 2002-08-01 20:52 ` Richard Gooch 2002-08-01 20:54 ` Richard Gooch 2002-08-01 20:58 ` Dave Jones 2002-08-01 22:16 ` Alan Cox 1 sibling, 2 replies; 56+ messages in thread From: Richard Gooch @ 2002-08-01 20:52 UTC (permalink / raw) To: Willy TARREAU; +Cc: Alan Cox, Marcelo Tosatti, lkml

Willy TARREAU writes:
> I finally got rid of it ! I now understand why it hanged randomly, and
> why I spent lots of time adding/removing unrelated patches. It's because
> in apm=power-off mode (SMP), a kernel thread is started for the apm()
> function, which does bios calls. And sometimes, the bios is called from
> CPU >0, which my bios doesn't like at all, thus explaining why the oopses
> were corrupted.
[...]
> diff -urN linux-2.4.19-rc5/arch/i386/kernel/apm.c linux-2.4.19-rc5-fix/arch/i386/kernel/apm.c
> --- linux-2.4.19-rc5/arch/i386/kernel/apm.c Thu Aug 1 22:07:39 2002
> +++ linux-2.4.19-rc5-fix/arch/i386/kernel/apm.c Thu Aug 1 22:26:56 2002
> @@ -1661,6 +1661,17 @@
> 	strcpy(current->comm, "kapmd");
> 	sigfillset(&current->blocked);
>
> +#ifdef CONFIG_SMP
> +	/* 2002/08/01 - WT
> +	 * This is to avoid random crashes at boot time during initialization
> +	 * on SMP systems in case of "apm=power-off" mode. Seen on ASUS A7M266D.
> +	 * Some bioses don't like being called from CPU != 0.
> +	 */
> +	while (cpu_number_map(smp_processor_id()) != 0) {
> +		schedule();
> +	}
> +#endif
> +

Hm. I bet you didn't try this with CONFIG_PREEMPT=y, right? IIRC, the
wonderful world of preemption means that you can get rescheduled on
another CPU without warning, unless you take a lock or explicitely
disable preemption.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] solved APM bug with -rc5 2002-08-01 20:52 ` Richard Gooch @ 2002-08-01 20:54 ` Richard Gooch 2002-08-01 21:17 ` Willy TARREAU 2002-08-01 20:58 ` Dave Jones 1 sibling, 1 reply; 56+ messages in thread From: Richard Gooch @ 2002-08-01 20:54 UTC (permalink / raw) To: Willy TARREAU, Alan Cox, Marcelo Tosatti, lkml Richard Gooch writes: > Hm. I bet you didn't try this with CONFIG_PREEMPT=y, right? IIRC, the > wonderful world of preemption means that you can get rescheduled on > another CPU without warning, unless you take a lock or explicitely > disable preemption. Apologies. I forgot that CONFIG_PREEMPT is a 2.5.x feature, and doesn't exist on 2.4 (thankfully). Regards, Richard.... Permanent: rgooch@atnf.csiro.au Current: rgooch@ras.ucalgary.ca ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] solved APM bug with -rc5 2002-08-01 20:54 ` Richard Gooch @ 2002-08-01 21:17 ` Willy TARREAU 2002-08-01 22:37 ` Alan Cox 0 siblings, 1 reply; 56+ messages in thread From: Willy TARREAU @ 2002-08-01 21:17 UTC (permalink / raw) To: Richard Gooch; +Cc: lkml On Thu, Aug 01, 2002 at 02:54:08PM -0600, Richard Gooch wrote: > Richard Gooch writes: > > Hm. I bet you didn't try this with CONFIG_PREEMPT=y, right? IIRC, the > > wonderful world of preemption means that you can get rescheduled on > > another CPU without warning, unless you take a lock or explicitely > > disable preemption. > > Apologies. I forgot that CONFIG_PREEMPT is a 2.5.x feature, and > doesn't exist on 2.4 (thankfully). Never mind, your comment is interesting anyway because it shows that preemption patch for 2.4 needs to adapt to such updates. Thanks, willy ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] solved APM bug with -rc5 2002-08-01 21:17 ` Willy TARREAU @ 2002-08-01 22:37 ` Alan Cox 0 siblings, 0 replies; 56+ messages in thread From: Alan Cox @ 2002-08-01 22:37 UTC (permalink / raw) To: Willy TARREAU; +Cc: Richard Gooch, lkml On Thu, 2002-08-01 at 22:17, Willy TARREAU wrote: > On Thu, Aug 01, 2002 at 02:54:08PM -0600, Richard Gooch wrote: > > Richard Gooch writes: > > > Hm. I bet you didn't try this with CONFIG_PREEMPT=y, right? IIRC, the > > > wonderful world of preemption means that you can get rescheduled on > > > another CPU without warning, unless you take a lock or explicitely > > > disable preemption. > > > > Apologies. I forgot that CONFIG_PREEMPT is a 2.5.x feature, and > > doesn't exist on 2.4 (thankfully). > > Never mind, your comment is interesting anyway because it shows that > preemption patch for 2.4 needs to adapt to such updates. Pre-emption for 2.4 needs to do a lot of work on raid and even athlon compiles to fix the FPU stuff, let alone corner cases ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] solved APM bug with -rc5 2002-08-01 20:52 ` Richard Gooch 2002-08-01 20:54 ` Richard Gooch @ 2002-08-01 20:58 ` Dave Jones 1 sibling, 0 replies; 56+ messages in thread From: Dave Jones @ 2002-08-01 20:58 UTC (permalink / raw) To: Richard Gooch; +Cc: Willy TARREAU, Alan Cox, Marcelo Tosatti, lkml On Thu, Aug 01, 2002 at 02:52:16PM -0600, Richard Gooch wrote: > > diff -urN linux-2.4.19-rc5/arch/i386/kernel/apm.c linux-2.4.19-rc5-fix/arch/i386/kernel/apm.c > > --- linux-2.4.19-rc5/arch/i386/kernel/apm.c Thu Aug 1 22:07:39 2002 > > +++ linux-2.4.19-rc5-fix/arch/i386/kernel/apm.c Thu Aug 1 22:26:56 2002 > > Hm. I bet you didn't try this with CONFIG_PREEMPT=y, right? IIRC, the > wonderful world of preemption means that you can get rescheduled on > another CPU without warning, unless you take a lock or explicitely > disable preemption. It's a 2.4 patch. Leave preemption problems to those insane enough to run 2.4+preempt. Dave -- | Dave Jones. http://www.codemonkey.org.uk | SuSE Labs ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] solved APM bug with -rc5 2002-08-01 20:35 ` [PATCH] solved APM bug with -rc5 Willy TARREAU 2002-08-01 20:52 ` Richard Gooch @ 2002-08-01 22:16 ` Alan Cox 2002-08-01 21:07 ` Willy Tarreau 2002-08-02 0:12 ` [PATCH] solved APM bug with -rc5 (take 2) Willy TARREAU 1 sibling, 2 replies; 56+ messages in thread From: Alan Cox @ 2002-08-01 22:16 UTC (permalink / raw) To: Willy TARREAU; +Cc: Marcelo Tosatti, lkml On Thu, 2002-08-01 at 21:35, Willy TARREAU wrote: > > +#ifdef CONFIG_SMP > + /* 2002/08/01 - WT > + * This is to avoid random crashes at boot time during initialization > + * on SMP systems in case of "apm=power-off" mode. Seen on ASUS A7M266D. > + * Some bioses don't like being called from CPU != 0. > + */ > + while (cpu_number_map(smp_processor_id()) != 0) { > + schedule(); > + } > +#endif What guarantees that loop will ever exit ? ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] solved APM bug with -rc5 2002-08-01 22:16 ` Alan Cox @ 2002-08-01 21:07 ` Willy Tarreau 2002-08-01 21:47 ` Linus Torvalds 2002-08-02 0:12 ` [PATCH] solved APM bug with -rc5 (take 2) Willy TARREAU 1 sibling, 1 reply; 56+ messages in thread From: Willy Tarreau @ 2002-08-01 21:07 UTC (permalink / raw) To: Alan Cox; +Cc: Willy TARREAU, Marcelo Tosatti, lkml On Thu, Aug 01, 2002 at 11:16:23PM +0100, Alan Cox wrote: > On Thu, 2002-08-01 at 21:35, Willy TARREAU wrote: > > + while (cpu_number_map(smp_processor_id()) != 0) { > > + schedule(); > > + } > What guarantees that loop will ever exit ? none, as in the already existing other implementation. But at least, I'd prefer an infinite loop instead of some random code being executed without noticing it. Do you know a better way of doing that ? The other implementation used a fake thread which also did a schedule(). I wonder if this is to make the scheduler work a bit more so that we get more chances to swap the CPU. Cheers, Willy ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] solved APM bug with -rc5 2002-08-01 21:07 ` Willy Tarreau @ 2002-08-01 21:47 ` Linus Torvalds 0 siblings, 0 replies; 56+ messages in thread From: Linus Torvalds @ 2002-08-01 21:47 UTC (permalink / raw) To: linux-kernel In article <20020801210745.GA20387@alpha.home.local>, Willy Tarreau <willy@w.ods.org> wrote: >On Thu, Aug 01, 2002 at 11:16:23PM +0100, Alan Cox wrote: >> On Thu, 2002-08-01 at 21:35, Willy TARREAU wrote: >> > + while (cpu_number_map(smp_processor_id()) != 0) { >> > + schedule(); >> > + } > >> What guarantees that loop will ever exit ? > >none, as in the already existing other implementation. But at least, I'd >prefer an infinite loop instead of some random code being executed without >noticing it. > >Do you know a better way of doing that ? It should set its CPU affinity to be cpu0. I don't know how well that works in 2.4.x, though. Ask Ingo.. Linus ^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCH] solved APM bug with -rc5 (take 2)
  2002-08-01 22:16 ` Alan Cox
  2002-08-01 21:07   ` Willy Tarreau
@ 2002-08-02 0:12   ` Willy TARREAU
  1 sibling, 0 replies; 56+ messages in thread
From: Willy TARREAU @ 2002-08-02 0:12 UTC
To: Alan Cox, Marcelo Tosatti; +Cc: Linus Torvalds, Ingo Molnar, lkml

On Thu, Aug 01, 2002 at 11:16:23PM +0100, Alan Cox wrote:
> On Thu, 2002-08-01 at 21:35, Willy TARREAU wrote:
> > +#ifdef CONFIG_SMP
> > +	/* 2002/08/01 - WT
> > +	 * This is to avoid random crashes at boot time during initialization
> > +	 * on SMP systems in case of "apm=power-off" mode. Seen on ASUS A7M266D.
> > +	 * Some bioses don't like being called from CPU != 0.
> > +	 */
> > +	while (cpu_number_map(smp_processor_id()) != 0) {
> > +		schedule();
> > +	}
> > +#endif
>
> What guarantees that loop will ever exit ?

I asked Ingo for some advice, and he gently sent me a piece of code as an
example of how to reliably bind a task to a CPU. I tried it, and it's OK
here. I could reliably switch several times from cpu0 to cpu1, then back
to cpu0.

Since it was cleaner than the previous method, I also did the same for
apm_power_off(), thus getting rid of apm_magic() and its dedicated thread.
Then again, I tested with multiple cpu switches, and every time, my system
correctly handles the case. I'm writing this mail under 2.4.19-rc5.

So here is the patch against 2.4.19-rc5, hoping it will get in this time.
I think it should apply without a glitch to 2.4.19-rc5-ac1, but don't know
about 2.5, nor even if it is needed.

Feedback welcome, of course ;-)

Cheers,
Willy

--- linux-2.4.19-rc5/arch/i386/kernel/apm.c	Thu Aug 1 22:07:39 2002
+++ linux-2.4.19-rc5-fix/arch/i386/kernel/apm.c	Fri Aug 2 01:52:55 2002
@@ -862,14 +862,6 @@
 		apm_do_busy();
 }
 
-#ifdef CONFIG_SMP
-static int apm_magic(void * unused)
-{
-	while (1)
-		schedule();
-}
-#endif
-
 /**
  *	apm_power_off - ask the BIOS to power off
  *
@@ -897,10 +889,11 @@
 	 */
 #ifdef CONFIG_SMP
 	/* Some bioses don't like being called from CPU != 0 */
-	while (cpu_number_map(smp_processor_id()) != 0) {
-		kernel_thread(apm_magic, NULL,
-			CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD);
+	if (cpu_number_map(smp_processor_id()) != 0) {
+		current->cpus_allowed = 1;
 		schedule();
+		if (unlikely(cpu_number_map(smp_processor_id()) != 0))
+			BUG();
 	}
 #endif
 	if (apm_info.realmode_power_off)
@@ -1661,6 +1654,21 @@
 	strcpy(current->comm, "kapmd");
 	sigfillset(&current->blocked);
 
+#ifdef CONFIG_SMP
+	/* 2002/08/01 - WT
+	 * This is to avoid random crashes at boot time during initialization
+	 * on SMP systems in case of "apm=power-off" mode. Seen on ASUS A7M266D.
+	 * Some bioses don't like being called from CPU != 0.
+	 * Method suggested by Ingo Molnar.
+	 */
+	if (cpu_number_map(smp_processor_id()) != 0) {
+		current->cpus_allowed = 1;
+		schedule();
+		if (unlikely(cpu_number_map(smp_processor_id()) != 0))
+			BUG();
+	}
+#endif
+
 	if (apm_info.connection_version == 0) {
 		apm_info.connection_version = apm_info.bios.version;
 		if (apm_info.connection_version > 0x100) {
@@ -1707,7 +1715,7 @@
 		}
 	}
 
-	if (debug && (smp_num_cpus == 1)) {
+	if (debug) {
 		error = apm_get_power_status(&bx, &cx, &dx);
 		if (error)
 			printk(KERN_INFO "apm: power status not available\n");

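Editorial sketch: the idea in the patch above (restrict the task to CPU 0 once, instead of looping on schedule() and hoping to land there) can be tried from userspace. The block below is only an analogue, assuming a modern Linux with glibc; it uses the sched_setaffinity(2) system call rather than the 2.4 in-kernel current->cpus_allowed field, and pin_to_cpu() is a name made up for this illustration, not something from apm.c.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/*
 * Restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 on failure (errno is set by sched_setaffinity).
 * Hypothetical helper for illustration; not part of the apm.c patch.
 */
static int pin_to_cpu(int cpu)
{
	cpu_set_t mask;

	/* CPU_SET silently ignores out-of-range values, so reject them here. */
	assert(cpu >= 0 && cpu < CPU_SETSIZE);

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);

	/* A pid of 0 means "the calling thread". */
	return sched_setaffinity(0, sizeof(mask), &mask);
}
```

A caller would use pin_to_cpu(0) and treat a non-zero return as an error, which can happen when CPU 0 is not in the set of CPUs the process may use. Comparing sched_getcpu() with the requested CPU afterwards plays roughly the role of the smp_processor_id() check followed by BUG() in the patch.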
* [PATCH] pdc20265 problem.
  2002-08-01 6:38 Linux v2.4.19-rc5 Marcelo Tosatti
  ` (3 preceding siblings ...)
  2002-08-01 12:12 ` Linux v2.4.19-rc5 - APM bug Willy TARREAU
@ 2002-08-02 1:47 ` Nick Orlov
  2002-08-02 2:29   ` Nick Orlov
  2002-08-02 12:27   ` Alan Cox
  4 siblings, 2 replies; 56+ messages in thread
From: Nick Orlov @ 2002-08-02 1:47 UTC
To: lkml

[-- Attachment #1: Type: text/plain, Size: 329 bytes --]

> <marcelo@plucky.distro.conectiva> (02/07/19 1.646)
> 	Fix wrong #ifdef in ide-pci.c: Was causing problems with FastTrak

Because of this fix my Promise 20265 became ide0 instead of ide2.
Is there any reason to mark pdc20265 as ON_BOARD controller?

Anyway, attached patch fix it for me :)

--
With best wishes,
	Nick Orlov.

[-- Attachment #2: pcd20265.patch --]
[-- Type: text/plain, Size: 301 bytes --]

408c408
< 	{DEVID_PDC20265,"PDC20265", PCI_PDC202XX, ATA66_PDC202XX, INIT_PDC202XX, NULL, {{0x00,0x00,0x00}, {0x00,0x00,0x00}}, ON_BOARD, 48 },
---
> 	{DEVID_PDC20265,"PDC20265", PCI_PDC202XX, ATA66_PDC202XX, INIT_PDC202XX, NULL, {{0x00,0x00,0x00}, {0x00,0x00,0x00}}, OFF_BOARD, 48 },

* Re: [PATCH] pdc20265 problem.
  2002-08-02 1:47 ` [PATCH] pdc20265 problem Nick Orlov
@ 2002-08-02 2:29   ` Nick Orlov
  2002-08-02 12:27   ` Alan Cox
  1 sibling, 0 replies; 56+ messages in thread
From: Nick Orlov @ 2002-08-02 2:29 UTC
To: lkml

[-- Attachment #1: Type: text/plain, Size: 458 bytes --]

On Thu, Aug 01, 2002 at 09:47:28PM -0400, Nick Orlov wrote:
>
> > <marcelo@plucky.distro.conectiva> (02/07/19 1.646)
> > 	Fix wrong #ifdef in ide-pci.c: Was causing problems with FastTrak
>
> Because of this fix my Promise 20265 became ide0 instead of ide2.
> Is there any reason to mark pdc20265 as ON_BOARD controller?
>
> Anyway, attached patch fix it for me :)
>

Sorry, wrong diff format. Rediffed and attached.

--
With best wishes,
	Nick Orlov.

[-- Attachment #2: pdc20265.patch --]
[-- Type: text/plain, Size: 1067 bytes --]

--- linux/drivers/ide/ide-pci.c.orig	2002-08-01 21:41:29.000000000 -0400
+++ linux/drivers/ide/ide-pci.c	2002-08-01 21:10:27.000000000 -0400
@@ -405,7 +405,7 @@
 #ifndef CONFIG_PDC202XX_FORCE
 	{DEVID_PDC20246,"PDC20246", PCI_PDC202XX, NULL, INIT_PDC202XX, NULL, {{0x00,0x00,0x00}, {0x00,0x00,0x00}}, OFF_BOARD, 16 },
 	{DEVID_PDC20262,"PDC20262", PCI_PDC202XX, ATA66_PDC202XX, INIT_PDC202XX, NULL, {{0x00,0x00,0x00}, {0x00,0x00,0x00}}, OFF_BOARD, 48 },
-	{DEVID_PDC20265,"PDC20265", PCI_PDC202XX, ATA66_PDC202XX, INIT_PDC202XX, NULL, {{0x00,0x00,0x00}, {0x00,0x00,0x00}}, ON_BOARD, 48 },
+	{DEVID_PDC20265,"PDC20265", PCI_PDC202XX, ATA66_PDC202XX, INIT_PDC202XX, NULL, {{0x00,0x00,0x00}, {0x00,0x00,0x00}}, OFF_BOARD, 48 },
 	{DEVID_PDC20267,"PDC20267", PCI_PDC202XX, ATA66_PDC202XX, INIT_PDC202XX, NULL, {{0x00,0x00,0x00}, {0x00,0x00,0x00}}, OFF_BOARD, 48 },
 #else /* !CONFIG_PDC202XX_FORCE */
 	{DEVID_PDC20246,"PDC20246", PCI_PDC202XX, NULL, INIT_PDC202XX, NULL, {{0x50,0x02,0x02}, {0x50,0x04,0x04}}, OFF_BOARD, 16 },

* Re: [PATCH] pdc20265 problem.
  2002-08-02 1:47 ` [PATCH] pdc20265 problem Nick Orlov
  2002-08-02 2:29   ` Nick Orlov
@ 2002-08-02 12:27   ` Alan Cox
  2002-08-02 12:52     ` Nick Orlov
  1 sibling, 1 reply; 56+ messages in thread
From: Alan Cox @ 2002-08-02 12:27 UTC
To: Nick Orlov; +Cc: lkml

On Fri, 2002-08-02 at 02:47, Nick Orlov wrote:
>
> > <marcelo@plucky.distro.conectiva> (02/07/19 1.646)
> > 	Fix wrong #ifdef in ide-pci.c: Was causing problems with FastTrak
>
> Because of this fix my Promise 20265 became ide0 instead of ide2.
> Is there any reason to mark pdc20265 as ON_BOARD controller?

How about because it can be and it should be checked. I don't know what
is going on with the ifdef in your case to cause this but its not as
simple as it seems

* Re: [PATCH] pdc20265 problem.
  2002-08-02 12:27 ` Alan Cox
@ 2002-08-02 12:52   ` Nick Orlov
  2002-08-02 14:00     ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Orlov @ 2002-08-02 12:52 UTC
To: lkml

On Fri, Aug 02, 2002 at 01:27:25PM +0100, Alan Cox wrote:
> On Fri, 2002-08-02 at 02:47, Nick Orlov wrote:
> >
> > > <marcelo@plucky.distro.conectiva> (02/07/19 1.646)
> > > 	Fix wrong #ifdef in ide-pci.c: Was causing problems with FastTrak
> >
> > Because of this fix my Promise 20265 became ide0 instead of ide2.
> > Is there any reason to mark pdc20265 as ON_BOARD controller?
>
> How about because it can be and it should be checked. I don't know what
> is going on with the ifdef in your case to cause this but its not as
> simple as it seems

Why pdc20265 is so special ? All other Promises marked as OFF_BOARD...

And what determines how id will be assigned to controllers if both of
them are ON_BOARD ?

--
With best wishes,
	Nick Orlov.

* Re: [PATCH] pdc20265 problem.
  2002-08-02 12:52 ` Nick Orlov
@ 2002-08-02 14:00   ` Bartlomiej Zolnierkiewicz
  2002-08-02 14:45     ` Nick Orlov
  0 siblings, 1 reply; 56+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2002-08-02 14:00 UTC
To: Nick Orlov; +Cc: lkml

On Fri, 2 Aug 2002, Nick Orlov wrote:

> On Fri, Aug 02, 2002 at 01:27:25PM +0100, Alan Cox wrote:
> > On Fri, 2002-08-02 at 02:47, Nick Orlov wrote:
> > >
> > > > <marcelo@plucky.distro.conectiva> (02/07/19 1.646)
> > > > 	Fix wrong #ifdef in ide-pci.c: Was causing problems with FastTrak
> > >
> > > Because of this fix my Promise 20265 became ide0 instead of ide2.
> > > Is there any reason to mark pdc20265 as ON_BOARD controller?
> >
> > How about because it can be and it should be checked. I don't know what
> > is going on with the ifdef in your case to cause this but its not as
> > simple as it seems
>
> Why pdc20265 is so special ? All other Promises marked as OFF_BOARD...
>
> And what determines how id will be assigned to controllers if both of
> them are ON_BOARD ?

AFAIR problem is that some vendors included onboard 20265 as primary
device (playing tricks for that) and to be consistent we have to treat it as
onboard, we have right now no way to check if it is on or offboard.
EDD support will probably help here.

Regards
--
Bartlomiej

> --
> With best wishes,
> 	Nick Orlov.

* Re: [PATCH] pdc20265 problem.
  2002-08-02 14:00 ` Bartlomiej Zolnierkiewicz
@ 2002-08-02 14:45   ` Nick Orlov
  0 siblings, 0 replies; 56+ messages in thread
From: Nick Orlov @ 2002-08-02 14:45 UTC
To: lkml

On Fri, Aug 02, 2002 at 04:00:32PM +0200, Bartlomiej Zolnierkiewicz wrote:
>
> On Fri, 2 Aug 2002, Nick Orlov wrote:
> >
> > On Fri, Aug 02, 2002 at 01:27:25PM +0100, Alan Cox wrote:
> > > On Fri, 2002-08-02 at 02:47, Nick Orlov wrote:
> > > >
> > > > > <marcelo@plucky.distro.conectiva> (02/07/19 1.646)
> > > > > 	Fix wrong #ifdef in ide-pci.c: Was causing problems with FastTrak
> > > >
> > > > Because of this fix my Promise 20265 became ide0 instead of ide2.
> > > > Is there any reason to mark pdc20265 as ON_BOARD controller?
> > >
> > > How about because it can be and it should be checked. I don't know what
> > > is going on with the ifdef in your case to cause this but its not as
> > > simple as it seems
> >
> > Why pdc20265 is so special ? All other Promises marked as OFF_BOARD...
> >
> > And what determines how id will be assigned to controllers if both of
> > them are ON_BOARD ?
>
> AFAIR problem is that some vendors included onboard 20265 as primary
> device (playing tricks for that) and to be consistent we have to treat it as
> onboard, we have right now no way to check if it is on or offboard.
> EDD support will probably help here.
>

Just FYI, before these "#ifdef" fixes it was treated as OFF_BOARD unless
CONFIG_PDC202XX_FORCE is set. (now it's inverted)

And my point is that it does not matter how physically this controller
installed - onboard or offboard. Idea is that we should have control
which controller should be treated as "primary" (ide0/1) and which as
"secondary" (ide2/3).

I don't see/know how we can do it unless we mark one of controllers
ON_BOARD and another OFF_BOARD and play with CONFIG_BLK_DEV_OFFBOARD.

And also I don't believe that this is good idea to treat one of Promises
so differently.

--
With best wishes,
	Nick Orlov.

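Editorial sketch: Nick's point is that the ON_BOARD/OFF_BOARD marking, not the physical location of the chip, decides which controller ends up as ide0/1 and which as ide2/3. The toy model below illustrates that effect under stated assumptions: controllers are handled in probe order, each one takes two consecutive interface slots, on-board controllers search from slot 0 and off-board ones from slot 2. None of this is taken from ide-pci.c; struct ctrl, assign_slots() and the probe order are hypothetical, and the thread itself leaves open how two ON_BOARD controllers are really ordered.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 4	/* ide0..ide3 in this toy model */

enum board { ON_BOARD, OFF_BOARD };

struct ctrl {
	const char *name;
	enum board  where;
	int         first_slot;	/* set by assign_slots(), -1 if no room */
};

/*
 * Give each controller a pair of slots, in array (probe) order.
 * Assumption of this sketch: ON_BOARD searches from slot 0,
 * OFF_BOARD searches from slot 2.
 */
static void assign_slots(struct ctrl *c, size_t n)
{
	int used[MAX_SLOTS] = {0};

	assert(c != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		int start = (c[i].where == ON_BOARD) ? 0 : 2;

		c[i].first_slot = -1;
		for (int s = start; s + 1 < MAX_SLOTS; s += 2) {
			if (!used[s] && !used[s + 1]) {
				used[s] = used[s + 1] = 1;
				c[i].first_slot = s;
				break;
			}
		}
	}
}
```

In this model, if the PDC20265 is probed first and marked ON_BOARD it takes slots 0/1 and the chipset controller is pushed to 2/3; marking it OFF_BOARD gives it 2/3 and leaves 0/1 to the chipset controller, which matches the behaviour change Nick describes.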