* Linux v2.4.19-rc5
@ 2002-08-01 6:38 Marcelo Tosatti
2002-08-01 7:49 ` Jens Axboe
` (4 more replies)
0 siblings, 5 replies; 56+ messages in thread
From: Marcelo Tosatti @ 2002-08-01 6:38 UTC
To: lkml
One of the -rc4 fixes was not correct, and -rc4 missed an important SMP
race "fix" in the block layer.
Summary of changes from v2.4.19-rc4 to v2.4.19-rc5
============================================
<davem@redhat.com> (02/08/01 1.662)
[PATCH] Correct openprom fix
<davem@redhat.com> (02/07/31 1.661)
[PATCH] Add missing check to openprom driver
<akpm@zip.com.au> (02/08/01 1.663)
[PATCH] disable READA
<marcelo@plucky.distro.conectiva> (02/08/01 1.664)
Change EXTRAVERSION to -rc5
* Re: Linux v2.4.19-rc5
2002-08-01 7:49 ` Jens Axboe
@ 2002-08-01 7:14 ` Marcelo Tosatti
2002-08-01 8:10 ` Jens Axboe
2002-08-01 20:15 ` Steven Cole
0 siblings, 2 replies; 56+ messages in thread
From: Marcelo Tosatti @ 2002-08-01 7:14 UTC
To: Jens Axboe; +Cc: lkml, Andrew Morton
On Thu, 1 Aug 2002, Jens Axboe wrote:
> On Thu, Aug 01 2002, Marcelo Tosatti wrote:
> > <akpm@zip.com.au> (02/08/01 1.663)
> > [PATCH] disable READA
>
> Since -rc5 is not to be found yet, I don't know what version of this
> made it in. Is READA just being disabled on SMP, or was it the general
> #if 0 change that got included?
It's being disabled on UP and SMP. I don't like having such a readahead I/O
mode working only for UP.
> I'm asking since plain disabling READA might have nasty performance
> effects. Andrew, I bet you did some numbers on this, care to share?
If that's true (the performance effects), I'll release -final with, IMO, not
very coherent READA semantics :)
Anyway, let's wait for the numbers.
* Re: Linux v2.4.19-rc5
2002-08-01 6:38 Linux v2.4.19-rc5 Marcelo Tosatti
@ 2002-08-01 7:49 ` Jens Axboe
2002-08-01 7:14 ` Marcelo Tosatti
2002-08-01 7:55 ` Keith Owens
` (3 subsequent siblings)
4 siblings, 1 reply; 56+ messages in thread
From: Jens Axboe @ 2002-08-01 7:49 UTC
To: Marcelo Tosatti; +Cc: lkml, Andrew Morton
On Thu, Aug 01 2002, Marcelo Tosatti wrote:
> <akpm@zip.com.au> (02/08/01 1.663)
> [PATCH] disable READA
Since -rc5 is not to be found yet, I don't know what version of this
made it in. Is READA just being disabled on SMP, or was it the general
#if 0 change that got included? I'm asking since plain disabling READA
might have nasty performance effects. Andrew, I bet you did some numbers
on this, care to share?
--
Jens Axboe
* Re: Linux v2.4.19-rc5
2002-08-01 6:38 Linux v2.4.19-rc5 Marcelo Tosatti
2002-08-01 7:49 ` Jens Axboe
@ 2002-08-01 7:55 ` Keith Owens
2002-08-01 8:10 ` Jens Axboe
2002-08-04 6:50 ` H. Peter Anvin
2002-08-01 11:32 ` Willy TARREAU
` (2 subsequent siblings)
4 siblings, 2 replies; 56+ messages in thread
From: Keith Owens @ 2002-08-01 7:55 UTC
To: Marcelo Tosatti, ftpadmin; +Cc: lkml
patch-2.4.19-rc5.gz has been there for 25 minutes but the .bz2 file and
the signature have not been created yet. Is there a problem with the
automatic conversion and signing code on master?
* Re: Linux v2.4.19-rc5
2002-08-01 7:14 ` Marcelo Tosatti
@ 2002-08-01 8:10 ` Jens Axboe
2002-08-01 9:02 ` Andrew Morton
2002-08-01 20:15 ` Steven Cole
1 sibling, 1 reply; 56+ messages in thread
From: Jens Axboe @ 2002-08-01 8:10 UTC
To: Marcelo Tosatti; +Cc: lkml, Andrew Morton
On Thu, Aug 01 2002, Marcelo Tosatti wrote:
>
> On Thu, 1 Aug 2002, Jens Axboe wrote:
>
> > On Thu, Aug 01 2002, Marcelo Tosatti wrote:
> > > <akpm@zip.com.au> (02/08/01 1.663)
> > > [PATCH] disable READA
> >
> > Since -rc5 is not to be found yet, I don't know what version of this
> > made it in. Is READA just being disabled on SMP, or was it the general
> > #if 0 change that got included?
>
> It's being disabled on UP and SMP. I don't like having such a readahead I/O
> mode working only for UP.
You are right, that would be ugly. It should only be the last resort.
> > I'm asking since plain disabling READA might have nasty performance
> > effects. Andrew, I bet you did some numbers on this, care to share?
>
> If that's true (the performance effects), I'll release -final with, IMO, not
> very coherent READA semantics :)
>
> Anyway, let's wait for the numbers.
It just 'feels' like the sort of change that might have odd side
effects.
--
Jens Axboe
* Re: Linux v2.4.19-rc5
2002-08-01 7:55 ` Keith Owens
@ 2002-08-01 8:10 ` Jens Axboe
2002-08-04 6:50 ` H. Peter Anvin
1 sibling, 0 replies; 56+ messages in thread
From: Jens Axboe @ 2002-08-01 8:10 UTC
To: Keith Owens; +Cc: Marcelo Tosatti, ftpadmin, lkml
On Thu, Aug 01 2002, Keith Owens wrote:
> patch-2.4.19-rc5.gz has been there for 25 minutes but the .bz2 file and
> the signature have not been created yet. Is there a problem with the
> automatic conversion and signing code on master?
That is slow; however, it's there now.
--
Jens Axboe
* Re: Linux v2.4.19-rc5
2002-08-01 9:02 ` Andrew Morton
@ 2002-08-01 8:58 ` Jens Axboe
2002-08-01 14:45 ` Steven Cole
1 sibling, 0 replies; 56+ messages in thread
From: Jens Axboe @ 2002-08-01 8:58 UTC
To: Andrew Morton; +Cc: Marcelo Tosatti, lkml
On Thu, Aug 01 2002, Andrew Morton wrote:
> Jens Axboe wrote:
> >
> > ...
> > > Anyway, let's wait for the numbers.
> >
> > It just 'feels' like the sort of change that might have odd side
> > effects.
>
> It's almost impossible to get READA to do anything. For example, in
> current 2.5, if a READA attempt is actually aborted, end_buffer_io_sync
> reports a "buffer I/O error". Every time. And nobody has reported this.
Ahem, I've actually seen that happen :-). But maybe a total of 20 times
or so.
> It _is_ possible to hit this in 2.5, because of ext2_preread_inode().
>
> It's probably also possible to hit it in 2.4 with hundreds of processes
> all issuing ext3 directory readahead. But it's pretty remote.
Alright, I'm happy then.
--
Jens Axboe
* Re: Linux v2.4.19-rc5
2002-08-01 8:10 ` Jens Axboe
@ 2002-08-01 9:02 ` Andrew Morton
2002-08-01 8:58 ` Jens Axboe
2002-08-01 14:45 ` Steven Cole
0 siblings, 2 replies; 56+ messages in thread
From: Andrew Morton @ 2002-08-01 9:02 UTC
To: Jens Axboe; +Cc: Marcelo Tosatti, lkml
Jens Axboe wrote:
>
> ...
> > Anyway, let's wait for the numbers.
>
> It just 'feels' like the sort of change that might have odd side
> effects.
It's almost impossible to get READA to do anything. For example, in
current 2.5, if a READA attempt is actually aborted, end_buffer_io_sync
reports a "buffer I/O error". Every time. And nobody has reported this.
It _is_ possible to hit this in 2.5, because of ext2_preread_inode().
It's probably also possible to hit it in 2.4 with hundreds of processes
all issuing ext3 directory readahead. But it's pretty remote.
* Re: Linux v2.4.19-rc5
2002-08-01 6:38 Linux v2.4.19-rc5 Marcelo Tosatti
2002-08-01 7:49 ` Jens Axboe
2002-08-01 7:55 ` Keith Owens
@ 2002-08-01 11:32 ` Willy TARREAU
2002-08-01 13:54 ` Alan Cox
2002-08-01 12:12 ` Linux v2.4.19-rc5 - APM bug Willy TARREAU
2002-08-02 1:47 ` [PATCH] pdc20265 problem Nick Orlov
4 siblings, 1 reply; 56+ messages in thread
From: Willy TARREAU @ 2002-08-01 11:32 UTC
To: Marcelo Tosatti; +Cc: lkml
Hi Marcelo,
This is just a cleanup for the network devices configuration.
Basically, the TOSHIBA TC35815 configuration entry appears
right between the DECchip Tulip entry and the two Tulip-specific config
lines, which are indented, so one could think that they are related to
the TC35815 instead of the Tulip.
You only see them when Tulip is enabled, though.
Here is the obvious fix against -rc5, which avoids this confusion:
Cheers,
Willy
--- linux-2.4.19-rc5/drivers/net/Config.in.orig Thu Aug 1 13:26:58 2002
+++ linux-2.4.19-rc5/drivers/net/Config.in Thu Aug 1 13:27:14 2002
@@ -162,8 +162,8 @@
dep_tristate ' Apricot Xen-II on board Ethernet' CONFIG_APRICOT $CONFIG_ISA
dep_tristate ' CS89x0 support' CONFIG_CS89x0 $CONFIG_ISA
- dep_tristate ' DECchip Tulip (dc21x4x) PCI support' CONFIG_TULIP $CONFIG_PCI
dep_tristate ' TOSHIBA TC35815 Ethernet support' CONFIG_TC35815 $CONFIG_PCI
+ dep_tristate ' DECchip Tulip (dc21x4x) PCI support' CONFIG_TULIP $CONFIG_PCI
if [ "$CONFIG_TULIP" = "y" -o "$CONFIG_TULIP" = "m" ]; then
dep_bool ' New bus configuration (EXPERIMENTAL)' CONFIG_TULIP_MWI $CONFIG_EXPERIMENTAL
bool ' Use PCI shared mem for NIC registers' CONFIG_TULIP_MMIO
* Re: Linux v2.4.19-rc5 - APM bug
2002-08-01 6:38 Linux v2.4.19-rc5 Marcelo Tosatti
` (2 preceding siblings ...)
2002-08-01 11:32 ` Willy TARREAU
@ 2002-08-01 12:12 ` Willy TARREAU
2002-08-01 13:32 ` [PANIC] APM bug with -rc4 and -rc5 Willy TARREAU
2002-08-02 1:47 ` [PATCH] pdc20265 problem Nick Orlov
4 siblings, 1 reply; 56+ messages in thread
From: Willy TARREAU @ 2002-08-01 12:12 UTC
To: Marcelo Tosatti; +Cc: lkml
Marcelo,
I observe a kernel panic at boot time if I set apm=power-off. OK with apm=off.
This is on an ASUS A7M266D with two Athlon XP 1800+. Since it works well on
2.4.19-pre10, I'm recompiling intermediate versions to check which one brought
the problem.
This is rather strange, since the crash occurs in do_softirq, but 2 bytes after
the beginning of an instruction:
c0120d09 fa cli
c0120d0a 8b b5 80 17 3c c0 mov 0xc03c1780(%ebp),%esi
The crash occurs at c0120d0c (80 17 3c c0 ...). Seems like a bad pointer
somewhere.
Regards,
Willy
* Re: Linux v2.4.19-rc5
2002-08-01 13:54 ` Alan Cox
@ 2002-08-01 12:48 ` Willy TARREAU
0 siblings, 0 replies; 56+ messages in thread
From: Willy TARREAU @ 2002-08-01 12:48 UTC
To: Alan Cox; +Cc: Willy TARREAU, Marcelo Tosatti, lkml
On Thu, Aug 01, 2002 at 02:54:04PM +0100, Alan Cox wrote:
> On Thu, 2002-08-01 at 12:32, Willy TARREAU wrote:
> > Hi Marcelo,
> >
> > This is just a cleanup for the network devices configuration.
> > Basically, the TOSHIBA TC35815 configuration entry appears
> > right between the DECchip Tulip entry and the two Tulip-specific config
> > lines, which are indented, so one could think that they are related to
> > the TC35815 instead of the Tulip.
>
> This is true, but the fix wants tweaking - the file is supposed to be in
> basically alphabetical order. Can you move the Toshiba one down instead?
OK, in that case it goes just before VIA Rhine. (BTW, [P]CI NE2000 comes before
[N]ovell, but I assume we're sorting on [N]E2000.)
Marcelo, please ignore my previous patch in favor of this one.
Cheers,
Willy
--- linux-2.4.19-rc5/drivers/net/Config.in.orig Thu Aug 1 14:43:09 2002
+++ linux-2.4.19-rc5/drivers/net/Config.in Thu Aug 1 14:44:29 2002
@@ -163,7 +163,6 @@
dep_tristate ' Apricot Xen-II on board Ethernet' CONFIG_APRICOT $CONFIG_ISA
dep_tristate ' CS89x0 support' CONFIG_CS89x0 $CONFIG_ISA
dep_tristate ' DECchip Tulip (dc21x4x) PCI support' CONFIG_TULIP $CONFIG_PCI
- dep_tristate ' TOSHIBA TC35815 Ethernet support' CONFIG_TC35815 $CONFIG_PCI
if [ "$CONFIG_TULIP" = "y" -o "$CONFIG_TULIP" = "m" ]; then
dep_bool ' New bus configuration (EXPERIMENTAL)' CONFIG_TULIP_MWI $CONFIG_EXPERIMENTAL
bool ' Use PCI shared mem for NIC registers' CONFIG_TULIP_MMIO
@@ -195,6 +194,7 @@
if [ "$CONFIG_PCI" = "y" -o "$CONFIG_EISA" = "y" ]; then
tristate ' TI ThunderLAN support' CONFIG_TLAN
fi
+ dep_tristate ' TOSHIBA TC35815 Ethernet support' CONFIG_TC35815 $CONFIG_PCI
dep_tristate ' VIA Rhine support' CONFIG_VIA_RHINE $CONFIG_PCI
dep_mbool ' Use MMIO instead of PIO (EXPERIMENTAL)' CONFIG_VIA_RHINE_MMIO $CONFIG_VIA_RHINE $CONFIG_EXPERIMENTAL
dep_tristate ' Winbond W89c840 Ethernet support' CONFIG_WINBOND_840 $CONFIG_PCI
* [PANIC] APM bug with -rc4 and -rc5
2002-08-01 12:12 ` Linux v2.4.19-rc5 - APM bug Willy TARREAU
@ 2002-08-01 13:32 ` Willy TARREAU
2002-08-01 14:55 ` Alan Cox
0 siblings, 1 reply; 56+ messages in thread
From: Willy TARREAU @ 2002-08-01 13:32 UTC
To: Alan Cox; +Cc: Marcelo Tosatti, lkml
Marcelo,
I've narrowed down the APM problem encountered in -rc5. In fact, it also
affects -rc4, but not -rc3. I'm a bit stumped since the changes are not
too heavy...
The crash happens at the same place with -rc4 and -rc5: 0xc0120d0c
c0120cec: fb sti
c0120ced: bb 40 a2 39 c0 mov $0xc039a240,%ebx
c0120cf2: f7 c6 01 00 00 00 test $0x1,%esi
c0120cf8: 74 08 je c0120d02 <do_softirq+0x72>
c0120cfa: 53 push %ebx
c0120cfb: 8b 03 mov (%ebx),%eax
c0120cfd: ff d0 call *%eax
c0120cff: 83 c4 04 add $0x4,%esp
c0120d02: 83 c3 08 add $0x8,%ebx
c0120d05: d1 ee shr %esi
c0120d07: 75 e9 jne c0120cf2 <do_softirq+0x62>
c0120d09: fa cli
c0120d0a: 8b b5 80 17 3c c0 mov 0xc03c1780(%ebp),%esi
^^ ^^ ^^ ^^
The processor branches here (c0120d0c, 2 bytes into the mov that follows local_irq_disable()) !!
c0120d10: 85 fe test %edi,%esi
c0120d12: 74 0c je c0120d20 <do_softirq+0x90>
c0120d14: 89 f0 mov %esi,%eax
c0120d16: f7 d0 not %eax
c0120d18: 21 c7 and %eax,%edi
c0120d1a: eb c6 jmp c0120ce2 <do_softirq+0x52>
This code is from do_softirq() in kernel/softirq.c, lines 84-95:
local_irq_enable();
h = softirq_vec;
do {
if (pending & 1)
h->action(h);
h++;
pending >>= 1;
} while (pending);
local_irq_disable();
The hand-written traces show that this function was correctly called by
ksoftirqd(), which in turn was called by kernel_thread().
Part of the hand-written oops shows:
EFLAGS=00010057
eax=00000900 ebx=c039a260 ecx=00000000 edx=c0390000
esi=00000000 edi=fffffff7 ebp=00000000 esp=c15b1fc8
Since softirq_vec is c039a240 in my System.map and each entry is 8 bytes
long, %ebx=c039a260 means that the loop has run 4 times (h->action(h) itself
was only called for the bits that were set). <pending> is held in %esi here,
which is zero. So this implies that it's not the call to h->action(h)
which branched to this place. But in that case, I don't see how the CPU
can branch here (a ret perhaps?). I don't see how this could be related to
the "apm=power-off" case either.
Alan, I believe you have the same mobo, but with two MPs on it. Although I've
never had any SMP problem with XPs, did you notice anything strange with APM
on 2.4.19-rc[45]? I will check 2.4.19-rc3-ac5 to see if it hangs too...
Cheers,
Willy
> Marcelo,
>
> I observe a kernel panic at boot time if I set apm=power-off. OK with apm=off.
> This is on an ASUS A7M266D with two Athlon XP 1800+. Since it works well on
> 2.4.19-pre10, I'm recompiling intermediate versions to check which one brought
> the problem.
>
> This is rather strange, since the crash occurs in do_softirq, but 2 bytes after
> the beginning of an instruction:
> c0120d09 fa cli
> c0120d0a 8b b5 80 17 3c c0 mov 0xc03c1780(%ebp),%esi
>
> The crash occurs at c0120d0c (80 17 3c c0 ...). Seems like a bad pointer
> somewhere.
* Re: Linux v2.4.19-rc5
2002-08-01 11:32 ` Willy TARREAU
@ 2002-08-01 13:54 ` Alan Cox
2002-08-01 12:48 ` Willy TARREAU
0 siblings, 1 reply; 56+ messages in thread
From: Alan Cox @ 2002-08-01 13:54 UTC
To: Willy TARREAU; +Cc: Marcelo Tosatti, lkml
On Thu, 2002-08-01 at 12:32, Willy TARREAU wrote:
> Hi Marcelo,
>
> This is just a cleanup for the network devices configuration.
> Basically, the TOSHIBA TC35815 configuration entry appears
> right between the DECchip Tulip entry and the two Tulip-specific config
> lines, which are indented, so one could think that they are related to
> the TC35815 instead of the Tulip.
This is true, but the fix wants tweaking - the file is supposed to be in
basically alphabetical order. Can you move the Toshiba one down instead?
* Re: [PANIC] APM bug with -rc4 and -rc5
2002-08-01 14:55 ` Alan Cox
@ 2002-08-01 13:56 ` Willy Tarreau
2002-08-01 15:24 ` Willy Tarreau
0 siblings, 1 reply; 56+ messages in thread
From: Willy Tarreau @ 2002-08-01 13:56 UTC
To: Alan Cox; +Cc: Willy TARREAU, Marcelo Tosatti, lkml
On Thu, Aug 01, 2002 at 03:55:32PM +0100, Alan Cox wrote:
> I've only run -ac on the box (I need the IDE) and that has subtly
> different APM code. I do not however understand why it has changed
> behaviour. I could understand if it did it at the actual poweroff point
> but not earlier.
Ok, thanks. I'll try to revert some patches from -rc4. But it looks
more like a side effect IMHO. Perhaps the APM initialization code
triggers one of the numerous bugs in the bios :-/
If I enable APM in the bios, the crash is somewhat different. I get
about two pages of call traces looping back every 8 pointers.
Seems like a memory corruption to me...
2.4.19-rc3-ac5 is OK, BTW.
Cheers,
Willy
* Re: Linux v2.4.19-rc5
2002-08-01 9:02 ` Andrew Morton
2002-08-01 8:58 ` Jens Axboe
@ 2002-08-01 14:45 ` Steven Cole
2002-08-01 18:57 ` Andrew Morton
1 sibling, 1 reply; 56+ messages in thread
From: Steven Cole @ 2002-08-01 14:45 UTC
To: Andrew Morton; +Cc: Jens Axboe, Marcelo Tosatti, lkml, Steven Cole
On Thu, 2002-08-01 at 03:02, Andrew Morton wrote:
> Jens Axboe wrote:
> >
> > ...
> > > Anyway, let's wait for the numbers.
> >
> > It just 'feels' like the sort of change that might have odd side
> > effects.
>
> It's almost impossible to get READA to do anything. For example, in
> current 2.5, if a READA attempt is actually aborted, end_buffer_io_sync
> reports a "buffer I/O error". Every time. And nobody has reported this.
>
> It _is_ possible to hit this in 2.5, because of ext2_preread_inode().
>
> It's probably also possible to hit it in 2.4 with hundreds of processes
> all issuing ext3 directory readahead. But it's pretty remote.
I've never seen this on 2.4.19-rc3 and I've been beating on it pretty
hard, running dbench 128 many times. However, 2.5 is another story.
This might not be the best thread to report this, but since the subject
came up, I'm getting the following message with recent 2.5.x kernels
whenever I run relatively large numbers of dbench clients.
Buffer I/O error on device sd(8,8), logical block XXXXXXX
where logical block repeats 0-6 times. This behavior is repeatable, but
only occurs under fairly high load. I ran dbench with increasing numbers
of clients, with the following results:
dbench clients   Buffer I/O error messages
<=48             0
52               1
56               0
64               0
80               11
96               9
112              7
128              4
This particular run was with 2.5.29 with the rmap13b and slabLRU patches, but
the behavior with 2.5.29-vanilla was similar. Kernel is SMP, no preempt,
and /dev/sda8, where dbench was running, was mounted ext2.
The test box is 2-way p3, SCSI, 1GB memory.
Time to go beat on -rc5 and see if anything falls out.
Steven
* Re: [PANIC] APM bug with -rc4 and -rc5
2002-08-01 13:32 ` [PANIC] APM bug with -rc4 and -rc5 Willy TARREAU
@ 2002-08-01 14:55 ` Alan Cox
2002-08-01 13:56 ` Willy Tarreau
0 siblings, 1 reply; 56+ messages in thread
From: Alan Cox @ 2002-08-01 14:55 UTC
To: Willy TARREAU; +Cc: Marcelo Tosatti, lkml
On Thu, 2002-08-01 at 14:32, Willy TARREAU wrote:
> > I observe a kernel panic at boot time if I set apm=power-off. OK with apm=off.
> > This is on an ASUS A7M266D with two Athlon XP 1800+. Since it works well on
> > 2.4.19-pre10, I'm recompiling intermediate versions to check which one brought
> > the problem.
> >
> > This is rather strange, since the crash occurs in do_softirq, but 2 bytes after
I've only run -ac on the box (I need the IDE) and that has subtly
different APM code. I do not however understand why it has changed
behaviour. I could understand if it did it at the actual poweroff point
but not earlier.
* Re: [PANIC] APM bug with -rc4 and -rc5
2002-08-01 13:56 ` Willy Tarreau
@ 2002-08-01 15:24 ` Willy Tarreau
2002-08-01 16:53 ` Alan Cox
0 siblings, 1 reply; 56+ messages in thread
From: Willy Tarreau @ 2002-08-01 15:24 UTC
To: Willy Tarreau; +Cc: Alan Cox, Marcelo Tosatti, lkml
> Ok, thanks. I'll try to revert some patches from -rc4. But it looks
> more like a side effect IMHO. Perhaps the APM initialization code
> triggers one of the numerous bugs in the bios :-/
It seems that I cannot reproduce it anymore if I revert arch/i386/kernel/vm86.c
to the state of -rc3. Reverting clear_AC doesn't change anything, but the
rest of the patch does. I don't know why; it seems correct at first glance.
Perhaps the old code hides a bug in the bios... Well, I don't know, I'm not
familiar enough with apm or vm86 internals to understand what's happening.
Cheers,
Willy
* Re: [PANIC] APM bug with -rc4 and -rc5
2002-08-01 16:53 ` Alan Cox
@ 2002-08-01 16:41 ` Willy Tarreau
2002-08-01 20:35 ` [PATCH] solved APM bug with -rc5 Willy TARREAU
1 sibling, 0 replies; 56+ messages in thread
From: Willy Tarreau @ 2002-08-01 16:41 UTC
To: Alan Cox; +Cc: Willy Tarreau, Marcelo Tosatti, lkml
On Thu, Aug 01, 2002 at 05:53:46PM +0100, Alan Cox wrote:
> Very curious indeed because someone else reported that rc3-ac5 works
> (which has the same vm86 code). In addition the vm86 handler in the
> kernel isn't actually used for APM. We make 32bit APM calls and the one
> 16bit case we do is a true return to real mode.
Well, I got it wrong. In fact, the system sometimes boots OK after a
warm boot, and it seems that all the tests I did with the "old" vm86
code were done from a warm boot. Now I can confirm that it also panics
from a cold boot. And you're right about rc3-ac5: it works for me too.
Still searching...
Willy
* Re: [PANIC] APM bug with -rc4 and -rc5
2002-08-01 15:24 ` Willy Tarreau
@ 2002-08-01 16:53 ` Alan Cox
2002-08-01 16:41 ` Willy Tarreau
2002-08-01 20:35 ` [PATCH] solved APM bug with -rc5 Willy TARREAU
0 siblings, 2 replies; 56+ messages in thread
From: Alan Cox @ 2002-08-01 16:53 UTC
To: Willy Tarreau; +Cc: Marcelo Tosatti, lkml
On Thu, 2002-08-01 at 16:24, Willy Tarreau wrote:
> > Ok, thanks. I'll try to revert some patches from -rc4. But it looks
> > more like a side effect IMHO. Perhaps the APM initialization code
> > triggers one of the numerous bugs in the bios :-/
>
> It seems that I cannot reproduce it anymore if I revert arch/i386/kernel/vm86.c
> to the state of -rc3. Reverting clear_AC doesn't change anything, but the
> rest of the patch does. I don't know why; it seems correct at first glance.
> Perhaps the old code hides a bug in the bios... Well, I don't know, I'm not
> familiar enough with apm or vm86 internals to understand what's happening.
Very curious indeed because someone else reported that rc3-ac5 works
(which has the same vm86 code). In addition the vm86 handler in the
kernel isn't actually used for APM. We make 32bit APM calls and the one
16bit case we do is a true return to real mode.
* Re: Linux v2.4.19-rc5
2002-08-01 14:45 ` Steven Cole
@ 2002-08-01 18:57 ` Andrew Morton
0 siblings, 0 replies; 56+ messages in thread
From: Andrew Morton @ 2002-08-01 18:57 UTC
To: Steven Cole; +Cc: Jens Axboe, Marcelo Tosatti, lkml, Steven Cole
Steven Cole wrote:
>
> ...
> I've never seen this on 2.4.19-rc3 and I've been beating on it pretty
> hard, running dbench 128 many times. However, 2.5 is another story.
>
> This might not be the best thread to report this, but since the subject
> came up, I'm getting the following message with recent 2.5.x kernels
> whenever I run relatively large numbers of dbench clients.
>
> Buffer I/O error on device sd(8,8), logical block XXXXXXX
>
> where logical block repeats 0-6 times. This behavior is repeatable, but
> only occurs under fairly high load. I ran dbench with increasing numbers
> of clients, with the following results:
>
> dbench clients   Buffer I/O error messages
> <=48             0
> 52               1
> 56               0
> 64               0
> 80               11
> 96               9
> 112              7
> 128              4
Yup. The printk is bogus - I thought I'd removed it a couple of
kernels ago.
It's a bit sad that an abandoned readahead attempt is indistinguishable
from a dead disk.
-
* Re: Linux v2.4.19-rc5
2002-08-01 7:14 ` Marcelo Tosatti
2002-08-01 8:10 ` Jens Axboe
@ 2002-08-01 20:15 ` Steven Cole
2002-08-06 3:46 ` Bill Davidsen
1 sibling, 1 reply; 56+ messages in thread
From: Steven Cole @ 2002-08-01 20:15 UTC
To: Marcelo Tosatti; +Cc: Jens Axboe, lkml, Andrew Morton, Steven Cole
On Thu, 2002-08-01 at 01:14, Marcelo Tosatti wrote:
>
> On Thu, 1 Aug 2002, Jens Axboe wrote:
>
> > On Thu, Aug 01 2002, Marcelo Tosatti wrote:
> > > <akpm@zip.com.au> (02/08/01 1.663)
> > > [PATCH] disable READA
> >
> > Since -rc5 is not to be found yet, I don't know what version of this
> > made it in. Is READA just being disabled on SMP, or was it the general
> > #if 0 change that got included?
>
> It's being disabled on UP and SMP. I don't like having such a readahead I/O
> mode working only for UP.
>
> > I'm asking since plain disabling READA might have nasty performance
> > effects. Andrew, I bet you did some numbers on this, care to share?
>
> If that's true (the performance effects), I'll release -final with, IMO, not
> very coherent READA semantics :)
>
> Anyway, let's wait for the numbers.
Marcelo,
Here are some dbench numbers, from the "for what it's worth" department.
This was done with SMP kernels, on a dual p3 box, SCSI disk, ext2.
The first column is dbench clients. The numbers are throughput
in MB/sec. The 2.5.29 kernel had a few RR-supplied smp fixes.
Looks like for this limited test, 2.4.19-rc5 holds up pretty well.
I've also run this set of tests several times on -rc5 using ext3
and data=writeback, and everything looks fine.
Steven
2.4.19-rc2 2.4.19-rc5 2.5.29
1 114.616 113.402 112.668
2 173.234 183.829 175.148
3 185.995 187.411 184.63
4 185.447 186.891 188.199
6 191.115 191.439 191.787
8 191.962 191.551 191.53
10 192.984 194.036 194.923
12 183.847 185.73 195.328
16 183.609 183.439 196.224
20 181.519 179.956 193.681
24 183.509 183.387 194.09
28 176.04 175.832 169.326
32 174.583 163.09 137.815
36 155.04 164.154 121.861
40 155.37 156.028 102.014
44 152.546 138.171 91.6088
48 146.419 135.447 84.3884
52 139.788 125.968 89.2374
56 113.933 122.592 81.021
64 110.792 106.484 84.648
80 87.4692 60.6054
96 87.7201 57.9622
112 74.9503 49.468
128 67.2649 47.0254
* [PATCH] solved APM bug with -rc5
2002-08-01 16:53 ` Alan Cox
2002-08-01 16:41 ` Willy Tarreau
@ 2002-08-01 20:35 ` Willy TARREAU
2002-08-01 20:52 ` Richard Gooch
2002-08-01 22:16 ` Alan Cox
1 sibling, 2 replies; 56+ messages in thread
From: Willy TARREAU @ 2002-08-01 20:35 UTC
To: Alan Cox; +Cc: Willy Tarreau, Marcelo Tosatti, lkml
On Thu, Aug 01, 2002 at 05:53:46PM +0100, Alan Cox wrote:
> Very curious indeed because someone else reported that rc3-ac5 works
> (which has the same vm86 code). In addition the vm86 handler in the
> kernel isn't actually used for APM. We make 32bit APM calls and the one
> 16bit case we do is a true return to real mode.
I finally got rid of it! I now understand why it hung randomly, and
why I spent lots of time adding/removing unrelated patches. It's because
in apm=power-off mode (SMP), a kernel thread is started for the apm()
function, which does bios calls. And sometimes, the bios is called from
CPU >0, which my bios doesn't like at all, thus explaining why the oopses
were corrupted.
By copying a piece of code from elsewhere in the same file, I could force
apm() to run only on CPU0. I verified that it doesn't crash anymore,
and that I can still crash it on demand if I force CPU1.
The bonus is that I could re-enable the debug code in this function even
in SMP mode, since we're sure that it runs on CPU0.
Here is the patch against 2.4.19-rc5. Marcelo, Alan, please review and apply.
Cheers,
Willy
diff -urN linux-2.4.19-rc5/arch/i386/kernel/apm.c linux-2.4.19-rc5-fix/arch/i386/kernel/apm.c
--- linux-2.4.19-rc5/arch/i386/kernel/apm.c Thu Aug 1 22:07:39 2002
+++ linux-2.4.19-rc5-fix/arch/i386/kernel/apm.c Thu Aug 1 22:26:56 2002
@@ -1661,6 +1661,17 @@
strcpy(current->comm, "kapmd");
sigfillset(&current->blocked);
+#ifdef CONFIG_SMP
+ /* 2002/08/01 - WT
+ * This is to avoid random crashes at boot time during initialization
+ * on SMP systems in case of "apm=power-off" mode. Seen on ASUS A7M266D.
+ * Some bioses don't like being called from CPU != 0.
+ */
+ while (cpu_number_map(smp_processor_id()) != 0) {
+ schedule();
+ }
+#endif
+
if (apm_info.connection_version == 0) {
apm_info.connection_version = apm_info.bios.version;
if (apm_info.connection_version > 0x100) {
@@ -1707,7 +1718,7 @@
}
}
- if (debug && (smp_num_cpus == 1)) {
+ if (debug) {
error = apm_get_power_status(&bx, &cx, &dx);
if (error)
printk(KERN_INFO "apm: power status not available\n");
* Re: [PATCH] solved APM bug with -rc5
2002-08-01 20:35 ` [PATCH] solved APM bug with -rc5 Willy TARREAU
@ 2002-08-01 20:52 ` Richard Gooch
2002-08-01 20:54 ` Richard Gooch
2002-08-01 20:58 ` Dave Jones
2002-08-01 22:16 ` Alan Cox
1 sibling, 2 replies; 56+ messages in thread
From: Richard Gooch @ 2002-08-01 20:52 UTC
To: Willy TARREAU; +Cc: Alan Cox, Marcelo Tosatti, lkml
Willy TARREAU writes:
> I finally got rid of it! I now understand why it hung randomly, and
> why I spent lots of time adding/removing unrelated patches. It's because
> in apm=power-off mode (SMP), a kernel thread is started for the apm()
> function, which does bios calls. And sometimes, the bios is called from
> CPU >0, which my bios doesn't like at all, thus explaining why the oopses
> were corrupted.
[...]
> diff -urN linux-2.4.19-rc5/arch/i386/kernel/apm.c linux-2.4.19-rc5-fix/arch/i386/kernel/apm.c
> --- linux-2.4.19-rc5/arch/i386/kernel/apm.c Thu Aug 1 22:07:39 2002
> +++ linux-2.4.19-rc5-fix/arch/i386/kernel/apm.c Thu Aug 1 22:26:56 2002
> @@ -1661,6 +1661,17 @@
> strcpy(current->comm, "kapmd");
> sigfillset(&current->blocked);
>
> +#ifdef CONFIG_SMP
> + /* 2002/08/01 - WT
> + * This is to avoid random crashes at boot time during initialization
> + * on SMP systems in case of "apm=power-off" mode. Seen on ASUS A7M266D.
> + * Some bioses don't like being called from CPU != 0.
> + */
> + while (cpu_number_map(smp_processor_id()) != 0) {
> + schedule();
> + }
> +#endif
> +
Hm. I bet you didn't try this with CONFIG_PREEMPT=y, right? IIRC, the
wonderful world of preemption means that you can get rescheduled on
another CPU without warning, unless you take a lock or explicitly
disable preemption.
Regards,
Richard....
Permanent: rgooch@atnf.csiro.au
Current: rgooch@ras.ucalgary.ca
* Re: [PATCH] solved APM bug with -rc5
2002-08-01 20:52 ` Richard Gooch
@ 2002-08-01 20:54 ` Richard Gooch
2002-08-01 21:17 ` Willy TARREAU
2002-08-01 20:58 ` Dave Jones
1 sibling, 1 reply; 56+ messages in thread
From: Richard Gooch @ 2002-08-01 20:54 UTC
To: Willy TARREAU, Alan Cox, Marcelo Tosatti, lkml
Richard Gooch writes:
> Hm. I bet you didn't try this with CONFIG_PREEMPT=y, right? IIRC, the
> wonderful world of preemption means that you can get rescheduled on
> another CPU without warning, unless you take a lock or explicitly
> disable preemption.
Apologies. I forgot that CONFIG_PREEMPT is a 2.5.x feature, and
doesn't exist on 2.4 (thankfully).
Regards,
Richard....
Permanent: rgooch@atnf.csiro.au
Current: rgooch@ras.ucalgary.ca
* Re: [PATCH] solved APM bug with -rc5
2002-08-01 20:52 ` Richard Gooch
2002-08-01 20:54 ` Richard Gooch
@ 2002-08-01 20:58 ` Dave Jones
1 sibling, 0 replies; 56+ messages in thread
From: Dave Jones @ 2002-08-01 20:58 UTC (permalink / raw)
To: Richard Gooch; +Cc: Willy TARREAU, Alan Cox, Marcelo Tosatti, lkml
On Thu, Aug 01, 2002 at 02:52:16PM -0600, Richard Gooch wrote:
> > diff -urN linux-2.4.19-rc5/arch/i386/kernel/apm.c linux-2.4.19-rc5-fix/arch/i386/kernel/apm.c
> > --- linux-2.4.19-rc5/arch/i386/kernel/apm.c Thu Aug 1 22:07:39 2002
> > +++ linux-2.4.19-rc5-fix/arch/i386/kernel/apm.c Thu Aug 1 22:26:56 2002
>
> Hm. I bet you didn't try this with CONFIG_PREEMPT=y, right? IIRC, the
> wonderful world of preemption means that you can get rescheduled on
> another CPU without warning, unless you take a lock or explicitly
> disable preemption.
It's a 2.4 patch. Leave preemption problems to those insane
enough to run 2.4+preempt.
Dave
--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] solved APM bug with -rc5
2002-08-01 22:16 ` Alan Cox
@ 2002-08-01 21:07 ` Willy Tarreau
2002-08-01 21:47 ` Linus Torvalds
2002-08-02 0:12 ` [PATCH] solved APM bug with -rc5 (take 2) Willy TARREAU
1 sibling, 1 reply; 56+ messages in thread
From: Willy Tarreau @ 2002-08-01 21:07 UTC (permalink / raw)
To: Alan Cox; +Cc: Willy TARREAU, Marcelo Tosatti, lkml
On Thu, Aug 01, 2002 at 11:16:23PM +0100, Alan Cox wrote:
> On Thu, 2002-08-01 at 21:35, Willy TARREAU wrote:
> > + while (cpu_number_map(smp_processor_id()) != 0) {
> > + schedule();
> > + }
> What guarantees that loop will ever exit ?
None, as with the other, already existing implementation. But at least I'd
prefer an infinite loop to some random code being executed without anyone
noticing it.
Do you know a better way of doing that? The other implementation
used a fake thread which also called schedule(). I wonder if this
is meant to make the scheduler work a bit harder so that we get more
chances of switching CPUs.
Cheers,
Willy
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] solved APM bug with -rc5
2002-08-01 20:54 ` Richard Gooch
@ 2002-08-01 21:17 ` Willy TARREAU
2002-08-01 22:37 ` Alan Cox
0 siblings, 1 reply; 56+ messages in thread
From: Willy TARREAU @ 2002-08-01 21:17 UTC (permalink / raw)
To: Richard Gooch; +Cc: lkml
On Thu, Aug 01, 2002 at 02:54:08PM -0600, Richard Gooch wrote:
> Richard Gooch writes:
> > Hm. I bet you didn't try this with CONFIG_PREEMPT=y, right? IIRC, the
> > wonderful world of preemption means that you can get rescheduled on
> > another CPU without warning, unless you take a lock or explicitly
> > disable preemption.
>
> Apologies. I forgot that CONFIG_PREEMPT is a 2.5.x feature, and
> doesn't exist on 2.4 (thankfully).
Never mind, your comment is interesting anyway because it shows that
the preemption patch for 2.4 needs to adapt to such updates.
Thanks,
willy
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] solved APM bug with -rc5
2002-08-01 21:07 ` Willy Tarreau
@ 2002-08-01 21:47 ` Linus Torvalds
0 siblings, 0 replies; 56+ messages in thread
From: Linus Torvalds @ 2002-08-01 21:47 UTC (permalink / raw)
To: linux-kernel
In article <20020801210745.GA20387@alpha.home.local>,
Willy Tarreau <willy@w.ods.org> wrote:
>On Thu, Aug 01, 2002 at 11:16:23PM +0100, Alan Cox wrote:
>> On Thu, 2002-08-01 at 21:35, Willy TARREAU wrote:
>> > + while (cpu_number_map(smp_processor_id()) != 0) {
>> > + schedule();
>> > + }
>
>> What guarantees that loop will ever exit ?
>
>None, as with the other, already existing implementation. But at least I'd
>prefer an infinite loop to some random code being executed without anyone
>noticing it.
>
>Do you know a better way of doing that?
It should set its CPU affinity to be cpu0. I don't know how well that
works in 2.4.x, though. Ask Ingo.
Linus
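Linus's suggestion has a close userspace analogue on Linux that shows why pinning beats the spin-on-schedule() loop Alan questioned: restrict the task's affinity mask, and the kernel moves the task before the call returns, after which the CPU can simply be checked. The sketch below is illustrative only. It is not the in-kernel 2.4 mechanism (the take-2 patch later in this thread sets current->cpus_allowed instead), it assumes glibc's sched_setaffinity()/sched_getcpu() with _GNU_SOURCE, and it picks the lowest CPU in the current mask rather than hard-coding CPU 0 so that it also works inside a restricted cpuset. The helper names are hypothetical.

```c
#define _GNU_SOURCE            /* for sched_getcpu() and the CPU_* macros */
#include <assert.h>
#include <sched.h>

/* Lowest-numbered CPU in the caller's current affinity mask, or -1 on
 * error. On a machine with no cpuset restrictions this is CPU 0. */
static int lowest_allowed_cpu(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Restrict the calling thread to a single CPU and verify where it runs.
 * If the caller is on a CPU outside the new mask, the kernel migrates it
 * before sched_setaffinity() returns, so no schedule()/yield loop is
 * needed. Returns 0 on success, -1 if the call failed or the check did. */
static int bind_to_cpu(int cpu)
{
    cpu_set_t set;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return sched_getcpu() == cpu ? 0 : -1;
}
```

Calling bind_to_cpu(lowest_allowed_cpu()) on an unrestricted machine pins the caller to CPU 0. A -1 return means either the affinity call failed or the task is not where it was expected, which is roughly the userspace counterpart of the BUG() check in the take-2 patch.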
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] solved APM bug with -rc5
2002-08-01 20:35 ` [PATCH] solved APM bug with -rc5 Willy TARREAU
2002-08-01 20:52 ` Richard Gooch
@ 2002-08-01 22:16 ` Alan Cox
2002-08-01 21:07 ` Willy Tarreau
2002-08-02 0:12 ` [PATCH] solved APM bug with -rc5 (take 2) Willy TARREAU
1 sibling, 2 replies; 56+ messages in thread
From: Alan Cox @ 2002-08-01 22:16 UTC (permalink / raw)
To: Willy TARREAU; +Cc: Marcelo Tosatti, lkml
On Thu, 2002-08-01 at 21:35, Willy TARREAU wrote:
>
> +#ifdef CONFIG_SMP
> + /* 2002/08/01 - WT
> + * This is to avoid random crashes at boot time during initialization
> + * on SMP systems in case of "apm=power-off" mode. Seen on ASUS A7M266D.
> + * Some bioses don't like being called from CPU != 0.
> + */
> + while (cpu_number_map(smp_processor_id()) != 0) {
> + schedule();
> + }
> +#endif
What guarantees that loop will ever exit ?
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] solved APM bug with -rc5
2002-08-01 21:17 ` Willy TARREAU
@ 2002-08-01 22:37 ` Alan Cox
0 siblings, 0 replies; 56+ messages in thread
From: Alan Cox @ 2002-08-01 22:37 UTC (permalink / raw)
To: Willy TARREAU; +Cc: Richard Gooch, lkml
On Thu, 2002-08-01 at 22:17, Willy TARREAU wrote:
> On Thu, Aug 01, 2002 at 02:54:08PM -0600, Richard Gooch wrote:
> > Richard Gooch writes:
> > > Hm. I bet you didn't try this with CONFIG_PREEMPT=y, right? IIRC, the
> > > wonderful world of preemption means that you can get rescheduled on
> > > another CPU without warning, unless you take a lock or explicitly
> > > disable preemption.
> >
> > Apologies. I forgot that CONFIG_PREEMPT is a 2.5.x feature, and
> > doesn't exist on 2.4 (thankfully).
>
> Never mind, your comment is interesting anyway because it shows that
> the preemption patch for 2.4 needs to adapt to such updates.
Pre-emption for 2.4 needs a lot of work on RAID, and even on Athlon
builds to fix the FPU handling, let alone the corner cases.
^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCH] solved APM bug with -rc5 (take 2)
2002-08-01 22:16 ` Alan Cox
2002-08-01 21:07 ` Willy Tarreau
@ 2002-08-02 0:12 ` Willy TARREAU
1 sibling, 0 replies; 56+ messages in thread
From: Willy TARREAU @ 2002-08-02 0:12 UTC (permalink / raw)
To: Alan Cox, Marcelo Tosatti; +Cc: Linus Torvalds, Ingo Molnar, lkml
On Thu, Aug 01, 2002 at 11:16:23PM +0100, Alan Cox wrote:
> On Thu, 2002-08-01 at 21:35, Willy TARREAU wrote:
> > +#ifdef CONFIG_SMP
> > + /* 2002/08/01 - WT
> > + * This is to avoid random crashes at boot time during initialization
> > + * on SMP systems in case of "apm=power-off" mode. Seen on ASUS A7M266D.
> > + * Some bioses don't like being called from CPU != 0.
> > + */
> > + while (cpu_number_map(smp_processor_id()) != 0) {
> > + schedule();
> > + }
> > +#endif
>
> What guarantees that loop will ever exit ?
I asked Ingo for some advice, and he kindly sent me a piece of code as an
example of how to reliably bind a task to a CPU. I tried it, and it's OK here.
I could reliably switch from cpu0 to cpu1 and back to cpu0 several times.
Since it was cleaner than the previous method, I also did the same for
apm_power_off(), thus getting rid of apm_magic() and its dedicated thread.
Again, I tested with multiple CPU switches, and every time my system
handled the case correctly. I'm writing this mail under 2.4.19-rc5.
So here is the patch against 2.4.19-rc5, hoping it will get in this time.
I think it should apply without a glitch to 2.4.19-rc5-ac1, but I don't
know about 2.5, nor even whether it is needed there.
Feedback welcome, of course ;-)
Cheers,
Willy
--- linux-2.4.19-rc5/arch/i386/kernel/apm.c Thu Aug 1 22:07:39 2002
+++ linux-2.4.19-rc5-fix/arch/i386/kernel/apm.c Fri Aug 2 01:52:55 2002
@@ -862,14 +862,6 @@
apm_do_busy();
}
-#ifdef CONFIG_SMP
-static int apm_magic(void * unused)
-{
- while (1)
- schedule();
-}
-#endif
-
/**
* apm_power_off - ask the BIOS to power off
*
@@ -897,10 +889,11 @@
*/
#ifdef CONFIG_SMP
/* Some bioses don't like being called from CPU != 0 */
- while (cpu_number_map(smp_processor_id()) != 0) {
- kernel_thread(apm_magic, NULL,
- CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD);
+ if (cpu_number_map(smp_processor_id()) != 0) {
+ current->cpus_allowed = 1;
schedule();
+ if (unlikely(cpu_number_map(smp_processor_id()) != 0))
+ BUG();
}
#endif
if (apm_info.realmode_power_off)
@@ -1661,6 +1654,21 @@
strcpy(current->comm, "kapmd");
sigfillset(&current->blocked);
+#ifdef CONFIG_SMP
+ /* 2002/08/01 - WT
+ * This is to avoid random crashes at boot time during initialization
+ * on SMP systems in case of "apm=power-off" mode. Seen on ASUS A7M266D.
+ * Some bioses don't like being called from CPU != 0.
+ * Method suggested by Ingo Molnar.
+ */
+ if (cpu_number_map(smp_processor_id()) != 0) {
+ current->cpus_allowed = 1;
+ schedule();
+ if (unlikely(cpu_number_map(smp_processor_id()) != 0))
+ BUG();
+ }
+#endif
+
if (apm_info.connection_version == 0) {
apm_info.connection_version = apm_info.bios.version;
if (apm_info.connection_version > 0x100) {
@@ -1707,7 +1715,7 @@
}
}
- if (debug && (smp_num_cpus == 1)) {
+ if (debug) {
error = apm_get_power_status(&bx, &cx, &dx);
if (error)
printk(KERN_INFO "apm: power status not available\n");
^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCH] pdc20265 problem.
2002-08-01 6:38 Linux v2.4.19-rc5 Marcelo Tosatti
` (3 preceding siblings ...)
2002-08-01 12:12 ` Linux v2.4.19-rc5 - APM bug Willy TARREAU
@ 2002-08-02 1:47 ` Nick Orlov
2002-08-02 2:29 ` Nick Orlov
2002-08-02 12:27 ` Alan Cox
4 siblings, 2 replies; 56+ messages in thread
From: Nick Orlov @ 2002-08-02 1:47 UTC (permalink / raw)
To: lkml
[-- Attachment #1: Type: text/plain, Size: 329 bytes --]
> <marcelo@plucky.distro.conectiva> (02/07/19 1.646)
> Fix wrong #ifdef in ide-pci.c: Was causing problems with FastTrak
Because of this fix my Promise 20265 became ide0 instead of ide2.
Is there any reason to mark pdc20265 as ON_BOARD controller?
Anyway, the attached patch fixes it for me :)
--
With best wishes,
Nick Orlov.
[-- Attachment #2: pcd20265.patch --]
[-- Type: text/plain, Size: 301 bytes --]
408c408
< {DEVID_PDC20265,"PDC20265", PCI_PDC202XX, ATA66_PDC202XX, INIT_PDC202XX, NULL, {{0x00,0x00,0x00}, {0x00,0x00,0x00}}, ON_BOARD, 48 },
---
> {DEVID_PDC20265,"PDC20265", PCI_PDC202XX, ATA66_PDC202XX, INIT_PDC202XX, NULL, {{0x00,0x00,0x00}, {0x00,0x00,0x00}}, OFF_BOARD, 48 },
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] pdc20265 problem.
2002-08-02 1:47 ` [PATCH] pdc20265 problem Nick Orlov
@ 2002-08-02 2:29 ` Nick Orlov
2002-08-02 12:27 ` Alan Cox
1 sibling, 0 replies; 56+ messages in thread
From: Nick Orlov @ 2002-08-02 2:29 UTC (permalink / raw)
To: lkml
[-- Attachment #1: Type: text/plain, Size: 458 bytes --]
On Thu, Aug 01, 2002 at 09:47:28PM -0400, Nick Orlov wrote:
>
> > <marcelo@plucky.distro.conectiva> (02/07/19 1.646)
> > Fix wrong #ifdef in ide-pci.c: Was causing problems with FastTrak
>
> Because of this fix my Promise 20265 became ide0 instead of ide2.
> Is there any reason to mark pdc20265 as ON_BOARD controller?
>
> Anyway, the attached patch fixes it for me :)
>
Sorry, wrong diff format. Rediffed and attached.
--
With best wishes,
Nick Orlov.
[-- Attachment #2: pdc20265.patch --]
[-- Type: text/plain, Size: 1067 bytes --]
--- linux/drivers/ide/ide-pci.c.orig 2002-08-01 21:41:29.000000000 -0400
+++ linux/drivers/ide/ide-pci.c 2002-08-01 21:10:27.000000000 -0400
@@ -405,7 +405,7 @@
#ifndef CONFIG_PDC202XX_FORCE
{DEVID_PDC20246,"PDC20246", PCI_PDC202XX, NULL, INIT_PDC202XX, NULL, {{0x00,0x00,0x00}, {0x00,0x00,0x00}}, OFF_BOARD, 16 },
{DEVID_PDC20262,"PDC20262", PCI_PDC202XX, ATA66_PDC202XX, INIT_PDC202XX, NULL, {{0x00,0x00,0x00}, {0x00,0x00,0x00}}, OFF_BOARD, 48 },
- {DEVID_PDC20265,"PDC20265", PCI_PDC202XX, ATA66_PDC202XX, INIT_PDC202XX, NULL, {{0x00,0x00,0x00}, {0x00,0x00,0x00}}, ON_BOARD, 48 },
+ {DEVID_PDC20265,"PDC20265", PCI_PDC202XX, ATA66_PDC202XX, INIT_PDC202XX, NULL, {{0x00,0x00,0x00}, {0x00,0x00,0x00}}, OFF_BOARD, 48 },
{DEVID_PDC20267,"PDC20267", PCI_PDC202XX, ATA66_PDC202XX, INIT_PDC202XX, NULL, {{0x00,0x00,0x00}, {0x00,0x00,0x00}}, OFF_BOARD, 48 },
#else /* !CONFIG_PDC202XX_FORCE */
{DEVID_PDC20246,"PDC20246", PCI_PDC202XX, NULL, INIT_PDC202XX, NULL, {{0x50,0x02,0x02}, {0x50,0x04,0x04}}, OFF_BOARD, 16 },
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] pdc20265 problem.
2002-08-02 1:47 ` [PATCH] pdc20265 problem Nick Orlov
2002-08-02 2:29 ` Nick Orlov
@ 2002-08-02 12:27 ` Alan Cox
2002-08-02 12:52 ` Nick Orlov
1 sibling, 1 reply; 56+ messages in thread
From: Alan Cox @ 2002-08-02 12:27 UTC (permalink / raw)
To: Nick Orlov; +Cc: lkml
On Fri, 2002-08-02 at 02:47, Nick Orlov wrote:
>
> > <marcelo@plucky.distro.conectiva> (02/07/19 1.646)
> > Fix wrong #ifdef in ide-pci.c: Was causing problems with FastTrak
>
> Because of this fix my Promise 20265 became ide0 instead of ide2.
> Is there any reason to mark pdc20265 as ON_BOARD controller?
How about because it can be and it should be checked. I don't know what
is going on with the ifdef in your case to cause this, but it's not as
simple as it seems
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] pdc20265 problem.
2002-08-02 12:27 ` Alan Cox
@ 2002-08-02 12:52 ` Nick Orlov
2002-08-02 14:00 ` Bartlomiej Zolnierkiewicz
0 siblings, 1 reply; 56+ messages in thread
From: Nick Orlov @ 2002-08-02 12:52 UTC (permalink / raw)
To: lkml
On Fri, Aug 02, 2002 at 01:27:25PM +0100, Alan Cox wrote:
> On Fri, 2002-08-02 at 02:47, Nick Orlov wrote:
> >
> > > <marcelo@plucky.distro.conectiva> (02/07/19 1.646)
> > > Fix wrong #ifdef in ide-pci.c: Was causing problems with FastTrak
> >
> > Because of this fix my Promise 20265 became ide0 instead of ide2.
> > Is there any reason to mark pdc20265 as ON_BOARD controller?
>
> How about because it can be and it should be checked. I don't know what
> is going on with the ifdef in your case to cause this, but it's not as
> simple as it seems
Why is pdc20265 so special? All the other Promises are marked OFF_BOARD...
And what determines how ids are assigned to the controllers if both of
them are ON_BOARD?
--
With best wishes,
Nick Orlov.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] pdc20265 problem.
2002-08-02 12:52 ` Nick Orlov
@ 2002-08-02 14:00 ` Bartlomiej Zolnierkiewicz
2002-08-02 14:45 ` Nick Orlov
0 siblings, 1 reply; 56+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2002-08-02 14:00 UTC (permalink / raw)
To: Nick Orlov; +Cc: lkml
On Fri, 2 Aug 2002, Nick Orlov wrote:
> On Fri, Aug 02, 2002 at 01:27:25PM +0100, Alan Cox wrote:
> > On Fri, 2002-08-02 at 02:47, Nick Orlov wrote:
> > >
> > > > <marcelo@plucky.distro.conectiva> (02/07/19 1.646)
> > > > Fix wrong #ifdef in ide-pci.c: Was causing problems with FastTrak
> > >
> > > Because of this fix my Promise 20265 became ide0 instead of ide2.
> > > Is there any reason to mark pdc20265 as ON_BOARD controller?
> >
> > How about because it can be and it should be checked. I don't know what
> > is going on with the ifdef in your case to cause this, but it's not as
> > simple as it seems
>
> Why is pdc20265 so special? All the other Promises are marked OFF_BOARD...
>
> And what determines how ids are assigned to the controllers if both of
> them are ON_BOARD?
AFAIR the problem is that some vendors included an onboard 20265 as the primary
device (playing tricks to do so), and to be consistent we have to treat it as
onboard; right now we have no way to check whether it is on- or off-board.
EDD support will probably help here.
Regards
--
Bartlomiej
> --
> With best wishes,
> Nick Orlov.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] pdc20265 problem.
2002-08-02 14:00 ` Bartlomiej Zolnierkiewicz
@ 2002-08-02 14:45 ` Nick Orlov
0 siblings, 0 replies; 56+ messages in thread
From: Nick Orlov @ 2002-08-02 14:45 UTC (permalink / raw)
To: lkml
On Fri, Aug 02, 2002 at 04:00:32PM +0200, Bartlomiej Zolnierkiewicz wrote:
>
> On Fri, 2 Aug 2002, Nick Orlov wrote:
>
> > On Fri, Aug 02, 2002 at 01:27:25PM +0100, Alan Cox wrote:
> > > On Fri, 2002-08-02 at 02:47, Nick Orlov wrote:
> > > >
> > > > > <marcelo@plucky.distro.conectiva> (02/07/19 1.646)
> > > > > Fix wrong #ifdef in ide-pci.c: Was causing problems with FastTrak
> > > >
> > > > Because of this fix my Promise 20265 became ide0 instead of ide2.
> > > > Is there any reason to mark pdc20265 as ON_BOARD controller?
> > >
> > > How about because it can be and it should be checked. I don't know what
> > > is going on with the ifdef in your case to cause this, but it's not as
> > > simple as it seems
> >
> > Why is pdc20265 so special? All the other Promises are marked OFF_BOARD...
> >
> > And what determines how ids are assigned to the controllers if both of
> > them are ON_BOARD?
>
> AFAIR the problem is that some vendors included an onboard 20265 as the primary
> device (playing tricks to do so), and to be consistent we have to treat it as
> onboard; right now we have no way to check whether it is on- or off-board.
> EDD support will probably help here.
>
Just FYI,
before these "#ifdef" fixes it was treated as OFF_BOARD unless
CONFIG_PDC202XX_FORCE was set (now it's inverted).
And my point is that it does not matter how this controller is physically
installed, onboard or offboard. The idea is that we should have control
over which controller is treated as "primary" (ide0/1) and which as
"secondary" (ide2/3). I don't see how we can do that unless we mark
one of the controllers ON_BOARD and the other OFF_BOARD and play with
CONFIG_BLK_DEV_OFFBOARD.
Also, I don't believe it is a good idea to treat one of the Promises so
differently.
--
With best wishes,
Nick Orlov.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5
2002-08-01 7:55 ` Keith Owens
2002-08-01 8:10 ` Jens Axboe
@ 2002-08-04 6:50 ` H. Peter Anvin
1 sibling, 0 replies; 56+ messages in thread
From: H. Peter Anvin @ 2002-08-04 6:50 UTC (permalink / raw)
To: Keith Owens; +Cc: Marcelo Tosatti, ftpadmin, lkml
Keith Owens wrote:
> patch-2.4.19-rc5.gz has been there for 25 minutes but the .bz2 file and
> the signature have not been created yet. Is there a problem with the
> automatic conversion and signing code on master?
The sign/convert/upload machinery is sometimes slow when it is either
transferring large files, or doing its daily "rsync --checksum" for
paranoia's sake. The latter happens at 00:00 local time, currently
17:00 UTC.
-hpa
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5
2002-08-01 20:15 ` Steven Cole
@ 2002-08-06 3:46 ` Bill Davidsen
2002-08-06 4:30 ` Andrew Morton
` (2 more replies)
0 siblings, 3 replies; 56+ messages in thread
From: Bill Davidsen @ 2002-08-06 3:46 UTC (permalink / raw)
To: Steven Cole; +Cc: Marcelo Tosatti, Jens Axboe, lkml, Andrew Morton, Steven Cole
On 1 Aug 2002, Steven Cole wrote:
> Here are some dbench numbers, from the "for what it's worth" department.
> This was done with SMP kernels, on a dual p3 box, SCSI disk, ext2.
> The first column is dbench clients. The numbers are throughput
> in MB/sec. The 2.5.29 kernel had a few RR-supplied smp fixes.
> Looks like for this limited test, 2.4.19-rc5 holds up pretty well.
> I've also ran this set of tests several times on -rc5 using ext3
> and data=writeback, and everything looks fine.
>
> Steven
Call me an optimist, but after all the reliability problems we had with the
2.5 series, I sort of hoped it would be better in performance, not
increasingly worse. Am I misreading this? Can we fall back to the faster
2.4 code :-(
> 2.4.19-rc2 2.4.19-rc5 2.5.29
>
> 1 114.616 113.402 112.668
> 2 173.234 183.829 175.148
> 3 185.995 187.411 184.63
> 4 185.447 186.891 188.199
> 6 191.115 191.439 191.787
> 8 191.962 191.551 191.53
> 10 192.984 194.036 194.923
> 12 183.847 185.73 195.328
> 16 183.609 183.439 196.224
> 20 181.519 179.956 193.681
> 24 183.509 183.387 194.09
> 28 176.04 175.832 169.326
> 32 174.583 163.09 137.815
> 36 155.04 164.154 121.861
> 40 155.37 156.028 102.014
> 44 152.546 138.171 91.6088
> 48 146.419 135.447 84.3884
> 52 139.788 125.968 89.2374
> 56 113.933 122.592 81.021
> 64 110.792 106.484 84.648
> 80 87.4692 60.6054
> 96 87.7201 57.9622
> 112 74.9503 49.468
> 128 67.2649 47.0254
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
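Bill's reading can be checked against the table itself. The sketch below only re-uses the numbers quoted above; the rows for 80 to 128 clients list two values without saying which kernels they belong to, so they are left out rather than guessed. It computes the relative change of 2.5.29 against 2.4.19-rc5 per row and finds the client count from which 2.5.29 stays behind. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* dbench throughput (MB/sec) from Steven Cole's table, restricted to the
 * client counts for which all three kernels have a value. */
static const int    clients[] = { 1, 2, 3, 4, 6, 8, 10, 12, 16, 20,
                                  24, 28, 32, 36, 40, 44, 48, 52, 56, 64 };
static const double rc5[]     = { 113.402, 183.829, 187.411, 186.891,
                                  191.439, 191.551, 194.036, 185.73,
                                  183.439, 179.956, 183.387, 175.832,
                                  163.09, 164.154, 156.028, 138.171,
                                  135.447, 125.968, 122.592, 106.484 };
static const double v2529[]   = { 112.668, 175.148, 184.63, 188.199,
                                  191.787, 191.53, 194.923, 195.328,
                                  196.224, 193.681, 194.09, 169.326,
                                  137.815, 121.861, 102.014, 91.6088,
                                  84.3884, 89.2374, 81.021, 84.648 };

#define NROWS (sizeof(clients) / sizeof(clients[0]))
_Static_assert(sizeof(rc5) / sizeof(rc5[0]) == NROWS, "rc5 rows");
_Static_assert(sizeof(v2529) / sizeof(v2529[0]) == NROWS, "2.5.29 rows");

/* Row index for a given client count, or -1 if that count is not listed. */
static int row_of(int nclients)
{
    for (size_t i = 0; i < NROWS; i++)
        if (clients[i] == nclients)
            return (int)i;
    return -1;
}

/* Percent change of 2.5.29 relative to 2.4.19-rc5 on row i. */
static double pct_change(size_t i)
{
    assert(i < NROWS);
    return 100.0 * (v2529[i] - rc5[i]) / rc5[i];
}

/* Smallest client count from which 2.5.29 is slower than 2.4.19-rc5 on
 * every remaining row, or -1 if it is not slower on the last row. */
static int first_sustained_slowdown(void)
{
    size_t i = NROWS;

    while (i > 0 && v2529[i - 1] < rc5[i - 1])
        i--;
    return i == NROWS ? -1 : clients[i];
}
```

On these numbers 2.5.29 stays within about 5% of 2.4.19-rc5 up to 10 clients, is ahead from 12 to 24 clients (about +7% at 16), and is behind on every row from 28 clients up (about -20% at 64).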
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5
2002-08-06 3:46 ` Bill Davidsen
@ 2002-08-06 4:30 ` Andrew Morton
2002-08-06 14:07 ` Steven Cole
2002-08-06 5:42 ` Jens Axboe
2002-08-06 12:59 ` Rik van Riel
2 siblings, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2002-08-06 4:30 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Steven Cole, Marcelo Tosatti, Jens Axboe, lkml, Steven Cole
Bill Davidsen wrote:
>
> On 1 Aug 2002, Steven Cole wrote:
>
> > Here are some dbench numbers, from the "for what it's worth" department.
> > This was done with SMP kernels, on a dual p3 box, SCSI disk, ext2.
> > The first column is dbench clients. The numbers are throughput
> > in MB/sec. The 2.5.29 kernel had a few RR-supplied smp fixes.
> > Looks like for this limited test, 2.4.19-rc5 holds up pretty well.
> > I've also ran this set of tests several times on -rc5 using ext3
> > and data=writeback, and everything looks fine.
> >
> > Steven
>
> Call me an optimist, but after all the reliability problems we had with the
> 2.5 series, I sort of hoped it would be better in performance, not
> increasingly worse. Am I misreading this? Can we fall back to the faster
> 2.4 code :-(
IO in 2.5 is much more CPU efficient than in 2.4, and straight-line
bandwidth is better as well.
The scheduling of that IO has a few problems, so in wildly seeky loads
like dbench the kernel still falls over its own feet a bit. The
two main culprits here are the lock_buffer() in block_write_full_page()
against the blockdev mapping, and the writeback of dirty pages from the
tail of the LRU in page reclaim.
And no, the eventual dbench numbers will not be a measure of the success
of the tuning which will happen on the run in to 2.6. Dbench throughput
may well be lower, because we probably should be starting writeback
at lower dirty thresholds.
If you want good dbench numbers:
echo 70 > /proc/sys/vm/dirty_background_ratio
echo 75 > /proc/sys/vm/dirty_async_ratio
echo 80 > /proc/sys/vm/dirty_sync_ratio
echo 30000 > /proc/sys/vm/dirty_expire_centisecs
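Andrew's point about starting writeback at lower dirty thresholds can be pictured with a small decision function. This is a simplified model, not the kernel's writeback code: it assumes the three ratios mean, in increasing order, "start background writeback", "the writer starts asynchronous writeback itself" and "the writer writes and waits", and the lower limits used below (10/40/50) are illustrative values, not the 2.5 defaults. The last knob is separate: 30000 centiseconds is 300 seconds of allowed dirty age before expiry-based writeback. The type and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical model of the three 2.5-era dirty-memory knobs. */
enum wb_action { WB_NONE, WB_BACKGROUND, WB_ASYNC, WB_SYNC };

struct dirty_limits {
    int background_ratio;  /* % dirty: kick off background writeback   */
    int async_ratio;       /* % dirty: writer starts async writeback    */
    int sync_ratio;        /* % dirty: writer writes and waits for I/O  */
};

/* What a writer would do at dirty_pct percent dirty memory, under the
 * assumed semantics above. Thresholds must be in increasing order. */
static enum wb_action writeback_action(const struct dirty_limits *l,
                                       int dirty_pct)
{
    assert(l->background_ratio <= l->async_ratio &&
           l->async_ratio <= l->sync_ratio);
    if (dirty_pct >= l->sync_ratio)
        return WB_SYNC;
    if (dirty_pct >= l->async_ratio)
        return WB_ASYNC;
    if (dirty_pct >= l->background_ratio)
        return WB_BACKGROUND;
    return WB_NONE;
}
```

With the dbench-friendly 70/75/80 settings, a workload at 60% dirty memory triggers no writeback at all; with the illustrative 10/40/50 limits, the same level already forces synchronous writeback. That is the trade-off described here: fewer writes for a benchmark that deletes its own files, against earlier and smoother writeback for real workloads.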
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5
2002-08-06 3:46 ` Bill Davidsen
2002-08-06 4:30 ` Andrew Morton
@ 2002-08-06 5:42 ` Jens Axboe
2002-08-06 8:30 ` Adrian Bunk
2002-08-06 10:31 ` Lincoln Dale
2002-08-06 12:59 ` Rik van Riel
2 siblings, 2 replies; 56+ messages in thread
From: Jens Axboe @ 2002-08-06 5:42 UTC (permalink / raw)
To: Bill Davidsen
Cc: Steven Cole, Marcelo Tosatti, lkml, Andrew Morton, Steven Cole
On Mon, Aug 05 2002, Bill Davidsen wrote:
> On 1 Aug 2002, Steven Cole wrote:
>
> > Here are some dbench numbers, from the "for what it's worth" department.
> > This was done with SMP kernels, on a dual p3 box, SCSI disk, ext2.
> > The first column is dbench clients. The numbers are throughput
> > in MB/sec. The 2.5.29 kernel had a few RR-supplied smp fixes.
> > Looks like for this limited test, 2.4.19-rc5 holds up pretty well.
> > I've also ran this set of tests several times on -rc5 using ext3
> > and data=writeback, and everything looks fine.
> >
> > Steven
>
> Call me an optimist, but after all the reliability problems we had with the
> 2.5 series, I sort of hoped it would be better in performance, not
> increasingly worse. Am I misreading this? Can we fall back to the faster
> 2.4 code :-(
try a workload that exercises the block i/o layer alone (O_DIRECT,
raw, whatnot) and then compare 2.4 and 2.5. IBM had some slides on this
from OLS; unfortunately I don't know if they have them online.
please don't put too much weight on dbench numbers for this sort of thing
:-)
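For the kind of block-layer-only test suggested here, the core of an O_DIRECT read is short. This is a minimal sketch, not a benchmark. It assumes Linux with _GNU_SOURCE for O_DIRECT, assumes 4096 bytes is a sufficient alignment for buffer, length and offset (O_DIRECT generally requires alignment to the device's logical block size), and treats EINVAL from open() as "this filesystem does not support O_DIRECT". direct_read() is a hypothetical helper name.

```c
#define _GNU_SOURCE            /* for O_DIRECT */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Read len bytes from the start of path, bypassing the page cache with
 * O_DIRECT. Returns the byte count from read(), or -1 with errno set
 * (EINVAL from open() usually means no O_DIRECT support on that fs). */
static ssize_t direct_read(const char *path, size_t len)
{
    const size_t align = 4096;   /* assumption: >= logical block size */
    void *buf = NULL;
    ssize_t n;
    int fd, rc, saved;

    assert(len > 0 && len % align == 0);
    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return -1;
    rc = posix_memalign(&buf, align, len);   /* returns an error code */
    if (rc != 0) {
        close(fd);
        errno = rc;
        return -1;
    }
    n = read(fd, buf, len);
    saved = errno;               /* keep read()'s errno across cleanup */
    free(buf);
    close(fd);
    errno = saved;
    return n;
}
```

Timing many such reads (or the equivalent on a raw device) on 2.4 and 2.5 exercises the block layer without the page-cache and dirty-data effects that dominate dbench.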
--
Jens Axboe
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5
2002-08-06 5:42 ` Jens Axboe
@ 2002-08-06 8:30 ` Adrian Bunk
2002-08-06 8:48 ` Jens Axboe
2002-08-06 10:31 ` Lincoln Dale
1 sibling, 1 reply; 56+ messages in thread
From: Adrian Bunk @ 2002-08-06 8:30 UTC (permalink / raw)
To: Jens Axboe; +Cc: Bill Davidsen, lkml
On Tue, 6 Aug 2002, Jens Axboe wrote:
>...
> try a workload that exercises the block i/o layer alone (O_DIRECT,
> raw, whatnot) and then compare 2.4 and 2.5. IBM had some slides on this
> from OLS; unfortunately I don't know if they have them online.
>...
Pages 390-406 in
http://www.linux.org.uk/~ajh/ols2002_proceedings.pdf.gz
or are you talking about something different?
cu
Adrian
--
You only think this is a free country. Like the US the UK spends a lot of
time explaining its a free country because its a police state.
Alan Cox
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5
2002-08-06 8:30 ` Adrian Bunk
@ 2002-08-06 8:48 ` Jens Axboe
0 siblings, 0 replies; 56+ messages in thread
From: Jens Axboe @ 2002-08-06 8:48 UTC (permalink / raw)
To: Adrian Bunk; +Cc: Bill Davidsen, lkml
On Tue, Aug 06 2002, Adrian Bunk wrote:
> On Tue, 6 Aug 2002, Jens Axboe wrote:
>
> >...
> > try a workload that exercises the block i/o layer alone (O_DIRECT,
> > raw, whatnot) and then compare 2.4 and 2.5. IBM had some slides on this
> > from OLS; unfortunately I don't know if they have them online.
> >...
>
> Pages 390-406 in
>
> http://www.linux.org.uk/~ajh/ols2002_proceedings.pdf.gz
>
> or are you talking about something different?
Right thanks, exactly those. Table 3 on page 395 is the one I noted.
Forget readv, as that hasn't been done in 2.5 yet. I'd say a 2.5.17
untweaked kernel beating 2.4 tweaked beyond recognition isn't too shabby
for a devel series kernel.
--
Jens Axboe
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5
2002-08-06 5:42 ` Jens Axboe
2002-08-06 8:30 ` Adrian Bunk
@ 2002-08-06 10:31 ` Lincoln Dale
1 sibling, 0 replies; 56+ messages in thread
From: Lincoln Dale @ 2002-08-06 10:31 UTC (permalink / raw)
To: Jens Axboe
Cc: Bill Davidsen, Steven Cole, Marcelo Tosatti, lkml, Andrew Morton,
Steven Cole
At 07:42 AM 6/08/2002 +0200, Jens Axboe wrote:
> > Call me an optimist, but after all the reliability problems we had with the
> > 2.5 series, I sort of hoped it would be better in performance, not
> > increasingly worse. Am I misreading this? Can we fall back to the faster
> > 2.4 code :-(
>
>try a workload that exercises the block i/o layer alone (O_DIRECT,
>raw, whatnot) and then compare 2.4 and 2.5. IBM had some slides on this
>from OLS; unfortunately I don't know if they have them online.
the BIO in 2.5 kicks butt over the 2.4 BIO - both in terms of increased
throughput and decreased cpu utilization.
see some testing i previously did:
http://marc.theaimsgroup.com/?l=linux-kernel&m=102635456620627&w=2
cheers,
lincoln.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5
2002-08-06 3:46 ` Bill Davidsen
2002-08-06 4:30 ` Andrew Morton
2002-08-06 5:42 ` Jens Axboe
@ 2002-08-06 12:59 ` Rik van Riel
2002-08-07 1:09 ` Bill Davidsen
2 siblings, 1 reply; 56+ messages in thread
From: Rik van Riel @ 2002-08-06 12:59 UTC (permalink / raw)
To: Bill Davidsen
Cc: Steven Cole, Marcelo Tosatti, Jens Axboe, lkml, Andrew Morton,
Steven Cole
On Mon, 5 Aug 2002, Bill Davidsen wrote:
> > Here are some dbench numbers, from the "for what it's worth" department.
>
> Call me an optimist, but after all the reliability problems we had with the
> 2.5 series, I sort of hoped it would be better in performance, not
> increasingly worse. Am I misreading this? Can we fall back to the faster
> 2.4 code :-(
Dbench is at its best when half (or more) of the dbench processes
are stuck semi-infinitely in __get_request_wait and the others can
operate in RAM without ever touching the disk.
In effect, if you want the best dbench throughput you should make
the system completely unsuitable for real world applications ;)
There are a few things that are good for both real world performance
and dbench performance, but those are easily dwarved by random factors
like IO scheduling, timeslice length, etc...
regards,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5
2002-08-06 4:30 ` Andrew Morton
@ 2002-08-06 14:07 ` Steven Cole
2002-08-06 14:20 ` Rik van Riel
2002-08-06 17:12 ` Andrew Morton
0 siblings, 2 replies; 56+ messages in thread
From: Steven Cole @ 2002-08-06 14:07 UTC (permalink / raw)
To: Andrew Morton; +Cc: Bill Davidsen, Marcelo Tosatti, Jens Axboe, lkml
On Mon, 2002-08-05 at 22:30, Andrew Morton wrote:
[snipped]
>
> IO in 2.5 is much more CPU efficient than in 2.4, and straight-line
> bandwidth is better as well.
>
> The scheduling of that IO has a few problems, so in wildly seeky loads
> like dbench the kernel still falls over its own feet a bit. The
> two main culprits here are the lock_buffer() in block_write_full_page()
> against the blockdev mapping, and the writeback of dirty pages from the
> tail of the LRU in page reclaim.
>
> And no, the eventual dbench numbers will not be a measure of the success
> of the tuning which will happen on the run in to 2.6. Dbench throughput
> may well be lower, because we probably should be starting writeback
> at lower dirty thresholds.
>
> If you want good dbench numbers:
>
> echo 70 > /proc/sys/vm/dirty_background_ratio
> echo 75 > /proc/sys/vm/dirty_async_ratio
> echo 80 > /proc/sys/vm/dirty_sync_ratio
> echo 30000 > /proc/sys/vm/dirty_expire_centisecs
That last one looks like the biggest cheat. Rather than optimizing for
dbench, is there a set of pessimizing numbers which would optimally turn
dbench into a semi-useful tool for measuring meaningful IO performance?
Or is dbench really only useful for stress testing?
Thanks for the explanations.
Steven
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5
2002-08-06 14:07 ` Steven Cole
@ 2002-08-06 14:20 ` Rik van Riel
2002-08-06 17:12 ` Andrew Morton
1 sibling, 0 replies; 56+ messages in thread
From: Rik van Riel @ 2002-08-06 14:20 UTC (permalink / raw)
To: Steven Cole
Cc: Andrew Morton, Bill Davidsen, Marcelo Tosatti, Jens Axboe, lkml
On 6 Aug 2002, Steven Cole wrote:
> That last one looks like the biggest cheat. Rather than optimizing for
> dbench, is there a set of pessimizing numbers which would optimally turn
> dbench into a semi-useful tool for measuring meaningful IO performance?
> Or is dbench really only useful for stress testing?
Yes, dbench is only useful as a stress testing tool.
A minor variation in kernel behaviour can change dbench
throughput by an order of magnitude and I'm not talking
about any specific kernel component here ... ANY kernel
component could trigger it.
While it is easy to measure dbench throughput, it is
nearly impossible to:
1) analyse why dbench throughput changed from kernel to kernel
2) predict the relation (if any) these changes in dbench
throughput have with changes in performance of real
applications
3) identify which kernel subsystem was responsible for the
change in dbench performance
regards,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Linux v2.4.19-rc5
2002-08-06 14:07 ` Steven Cole
2002-08-06 14:20 ` Rik van Riel
@ 2002-08-06 17:12 ` Andrew Morton
1 sibling, 0 replies; 56+ messages in thread
From: Andrew Morton @ 2002-08-06 17:12 UTC (permalink / raw)
To: Steven Cole; +Cc: Bill Davidsen, Marcelo Tosatti, Jens Axboe, lkml
Steven Cole wrote:
>
> ...
> > If you want good dbench numbers:
> >
> > echo 70 > /proc/sys/vm/dirty_background_ratio
> > echo 75 > /proc/sys/vm/dirty_async_ratio
> > echo 80 > /proc/sys/vm/dirty_sync_ratio
> > echo 30000 > /proc/sys/vm/dirty_expire_centisecs
>
> That last one looks like the biggest cheat. Rather than optimizing for
> dbench, is there a set of pessimizing numbers which would optimally turn
> dbench into a semi-useful tool for measuring meaningful IO performance?
> Or is dbench really only useful for stress testing?
>
We tend to use dbench in two modes nowadays. One is the "RAM only"
mode, where the run completes before hitting disk at all. That's
a very useful and repeatable test for CPU efficiency and lock contention.
The other mode is of course when there are enough clients and
enough dirty data for the test to go to disk. As Rik says, this
tends to be subject to chaotic effects, and it is also extremely
non-linear.
Because when the run slows down a little bit, it takes longer, so
more data becomes eligible for time-expiry-based writeback, which
causes more IO, which causes the run to take longer, etc, etc.
Yes, one does still tend to keep one's eye on the "heavy" dbench
throughput, but I suspect that tuning for this workload is a bad
thing overall. This is because good dbench numbers come from
allowing a large amount of dirty data to float about in memory
(it will never get written out). But for real workloads which
don't delete their own output 30 seconds later, we want to start
writeback earlier, to use the disk bandwidth more smoothly
and to decrease memory allocation latency.
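The short-lived tuning quoted above (raising the dirty_* ratios and dirty_expire_centisecs for a dbench run) is easy to leave in place by accident, which then skews every later measurement. As a minimal sketch, assuming a POSIX shell and sysctl files that each hold a single value, a hypothetical helper called with_vm_tunables could apply new values for one command and then restore whatever was there before. The function name and its argument format are illustrative only, not something from this thread; writing to /proc/sys/vm requires root, and which dirty_* files exist depends on the kernel version.

```shell
#!/bin/sh
# Hypothetical helper: temporarily set sysctl-style tunables, run one
# command, then restore the previous values.
# Usage: with_vm_tunables DIR name=value ... -- command [args...]
# Assumes each tunable file holds a single value with no spaces
# (true of the dirty_* ratios, not of 2.4's multi-field bdflush file).
with_vm_tunables() {
    vm_dir=$1
    shift
    # First pass: refuse to change anything unless every file is writable.
    for arg in "$@"; do
        [ "$arg" = "--" ] && break
        if [ ! -w "$vm_dir/${arg%%=*}" ]; then
            echo "with_vm_tunables: cannot write $vm_dir/${arg%%=*}" >&2
            return 1
        fi
    done
    # Second pass: remember the old value, then write the new one.
    saved=""
    while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
        name=${1%%=*}
        saved="$saved $name=$(cat "$vm_dir/$name")"
        echo "${1#*=}" > "$vm_dir/$name"
        shift
    done
    [ "$#" -gt 0 ] && shift    # drop the "--" separator
    "$@"                       # run the benchmark itself
    status=$?
    # Put back the values that were in effect before the run.
    for pair in $saved; do
        echo "${pair#*=}" > "$vm_dir/${pair%%=*}"
    done
    return "$status"
}
```

For example, `with_vm_tunables /proc/sys/vm dirty_background_ratio=70 dirty_expire_centisecs=30000 -- dbench 32` would apply part of the "good dbench numbers" setup for a single 32-client run and put the previous values back afterwards. Checking every file before writing any of them avoids leaving the system half-tuned when a name is misspelled or absent on the running kernel.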
* Re: Linux v2.4.19-rc5
2002-08-06 12:59 ` Rik van Riel
@ 2002-08-07 1:09 ` Bill Davidsen
2002-08-07 2:54 ` Steven Cole
0 siblings, 1 reply; 56+ messages in thread
From: Bill Davidsen @ 2002-08-07 1:09 UTC (permalink / raw)
To: Rik van Riel
Cc: Steven Cole, Marcelo Tosatti, Jens Axboe, lkml, Andrew Morton,
Steven Cole
On Tue, 6 Aug 2002, Rik van Riel wrote:
> On Mon, 5 Aug 2002, Bill Davidsen wrote:
>
> > > Here are some dbench numbers, from the "for what it's worth" department.
> >
> > Call me an optimist, but after all the reliability problems we had with the
> > 2.5 series, I sort of hoped it would be better in performance, not
> > increasingly worse. Am I misreading this? Can we fall back to the faster
> > 2.4 code :-(
>
> Dbench is at its best when half (or more) of the dbench processes
> are stuck semi-infinitely in __get_request_wait and the others can
> operate in RAM without ever touching the disk.
>
> In effect, if you want the best dbench throughput you should make
> the system completely unsuitable for real world applications ;)
I assumed that the posted results were apples and apples. That may not be
the case. If this was one kernel tuned for dbench and one for something
else, then the information content is pretty low, to me at least. But if
it is both tuned or both stock, then I would hope 2.5 would be better. If
the text said that and I read past it, I apologise.
> There are a few things that are good for both real world performance
> and dbench performance, but those are easily dwarved by random factors
> like IO scheduling, timeslice length, etc...
I confess to being a kernel junkie when I have the time; I have run into
the limit of 19 boot stanzas in LILO :-( I have a case statement in
rc.local to tune the -aa VM, stock, and -ac rmap kernels a little
differently, since this machine is fairly fast and has biggish memory
(2GB this week), and getting several ISO images into RAM only to have
bdflush kick them out is bad. Looking forward to the IO scheduler.
I'd like to see 2.4.19 vs. 2.5.{29+} both tuned and untuned, but I have no
days off in the next ten. By then there will be more new stuff, but the
fast machine will be several area codes away. Perhaps one of the people
who like to do benchmarks will be too curious to wait.
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
* Re: Linux v2.4.19-rc5
2002-08-07 1:09 ` Bill Davidsen
@ 2002-08-07 2:54 ` Steven Cole
2002-08-07 22:30 ` Bill Davidsen
0 siblings, 1 reply; 56+ messages in thread
From: Steven Cole @ 2002-08-07 2:54 UTC (permalink / raw)
To: Bill Davidsen
Cc: Rik van Riel, Marcelo Tosatti, Jens Axboe, lkml, Andrew Morton,
Steven Cole
On Tue, 2002-08-06 at 19:09, Bill Davidsen wrote:
> On Tue, 6 Aug 2002, Rik van Riel wrote:
>
> > On Mon, 5 Aug 2002, Bill Davidsen wrote:
> >
> > > > Here are some dbench numbers, from the "for what it's worth" department.
> > >
> > > Call me an optimist, but after all the reliability problems we had with the
> > > 2.5 series, I sort of hoped it would be better in performance, not
> > > increasingly worse. Am I misreading this? Can we fall back to the faster
> > > 2.4 code :-(
> >
> > Dbench is at its best when half (or more) of the dbench processes
> > are stuck semi-infinitely in __get_request_wait and the others can
> > operate in RAM without ever touching the disk.
> >
> > In effect, if you want the best dbench throughput you should make
> > the system completely unsuitable for real world applications ;)
>
> I assumed that the posted results were apples and apples. That may not be
Well, maybe Granny Smiths and Red Delicious. The problem with dbench is
that it checks how well they roll and bounce. But even that can be
important sometimes. ;)
> the case. If this was one kernel tuned for dbench and one for something
> else, then the information content is pretty low, to me at least. But if
> it is both tuned or both stock, then I would hope 2.5 would be better. If
> the text said that and I read past it, I apologise.
All kernels were stock as patched with no special changes to
/proc/sys/vm/bdflush for 2.4.x or to /proc/sys/vm/dirty* for 2.5.x.
Sorry, I didn't explicitly state that in the initial report.
Steven
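Bill's "apples and apples" question suggests recording the writeback settings next to every set of results, so that "stock" is stated rather than assumed. A minimal sketch, assuming a POSIX shell: the hypothetical function snapshot_vm_tunables prints the kernel release plus the contents of 2.4's /proc/sys/vm/bdflush and any /proc/sys/vm/dirty_* files that exist, one name=value line each. The function name and output format are illustrative, not taken from the thread.

```shell
#!/bin/sh
# Hypothetical helper: print the writeback tunables in effect so they can
# be pasted next to benchmark results. Optional argument: the directory to
# read (default /proc/sys/vm). Files that do not exist on the running
# kernel are skipped, so the same script works on 2.4.x (bdflush) and on
# kernels that expose dirty_* files instead.
snapshot_vm_tunables() {
    dir=${1:-/proc/sys/vm}
    printf 'kernel=%s\n' "$(uname -r)"
    for f in "$dir"/bdflush "$dir"/dirty_*; do
        [ -r "$f" ] || continue    # absent file, or unmatched glob pattern
        # bdflush holds several space-separated fields on one line.
        printf '%s=%s\n' "$(basename "$f")" "$(cat "$f")"
    done
}
```

Running it once per boot and saving the output alongside the dbench figures makes it clear whether two result sets were gathered under the same tuning.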
* Re: Linux v2.4.19-rc5
2002-08-07 2:54 ` Steven Cole
@ 2002-08-07 22:30 ` Bill Davidsen
2002-08-07 22:39 ` Rik van Riel
0 siblings, 1 reply; 56+ messages in thread
From: Bill Davidsen @ 2002-08-07 22:30 UTC (permalink / raw)
To: Steven Cole
Cc: Rik van Riel, Marcelo Tosatti, Jens Axboe, lkml, Andrew Morton,
Steven Cole
On 6 Aug 2002, Steven Cole wrote:
> On Tue, 2002-08-06 at 19:09, Bill Davidsen wrote:
> > I assumed that the posted results were apples and apples. That may not be
>
> Well, maybe Granny Smiths and Red Delicious. The problem with dbench is
> that it checks how well they roll and bounce. But even that can be
> important sometimes. ;)
>
> > the case. If this was one kernel tuned for dbench and one for something
> > else, then the information content is pretty low, to me at least. But if
> > it is both tuned or both stock, then I would hope 2.5 would be better. If
> > the text said that and I read past it, I apologise.
>
> All kernels were stock as patched with no special changes to
> /proc/sys/vm/bdflush for 2.4.x or to /proc/sys/vm/dirty* for 2.5.x.
> Sorry, I didn't explicitly state that in the initial report.
Actually that was what I was assuming when I noted that 2.5 appeared
to be slower by a good bit for some high load values of dbench. In a
perfect world the kernel would hit the hardware speed; I guess no one is
claiming that until 2.7 ;-)
The initial results from the io scheduler, as posted here, look as if
there will be a way to "take it up another notch" in the future.
Thanks much for the clarification, the data are useful even if they do
show room for improvement in the corner case.
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
* Re: Linux v2.4.19-rc5
2002-08-07 22:30 ` Bill Davidsen
@ 2002-08-07 22:39 ` Rik van Riel
2002-08-07 23:44 ` Bill Davidsen
0 siblings, 1 reply; 56+ messages in thread
From: Rik van Riel @ 2002-08-07 22:39 UTC (permalink / raw)
To: Bill Davidsen
Cc: Steven Cole, Marcelo Tosatti, Jens Axboe, lkml, Andrew Morton,
Steven Cole
On Wed, 7 Aug 2002, Bill Davidsen wrote:
> Thanks much for the clarification, the data are useful even if they do
> show room for improvement in the corner case.
If dbench numbers are meaningful to you, maybe you could
translate them into something kernel developers can
understand ? ;)
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
* Re: Linux v2.4.19-rc5
2002-08-07 22:39 ` Rik van Riel
@ 2002-08-07 23:44 ` Bill Davidsen
2002-08-07 23:53 ` Rik van Riel
0 siblings, 1 reply; 56+ messages in thread
From: Bill Davidsen @ 2002-08-07 23:44 UTC (permalink / raw)
To: Rik van Riel
Cc: Steven Cole, Marcelo Tosatti, Jens Axboe, lkml, Andrew Morton,
Steven Cole
On Wed, 7 Aug 2002, Rik van Riel wrote:
> On Wed, 7 Aug 2002, Bill Davidsen wrote:
>
> > Thanks much for the clarification, the data are useful even if they do
> > show room for improvement in the corner case.
>
> If dbench numbers are meaningful to you, maybe you could
> translate them into something kernel developers can
> understand ? ;)
Sure, glad to. If the 2.5 numbers are much worse than 2.4, something isn't
working as well; that's another problem, so go have a beer to drown your
sorrows. On the other hand, if it runs much better, you have done a great
job and can go have a beer to celebrate.
Seriously, I would read the reasonably smooth curve of values as a good
sign, as opposed to "gets really bad and improves under more load" or
similar. And the fact that it stayed up, and presumably didn't eat all the
filesystems, indicates that the IDE code is getting more stable.
One more thing, if you have been fighting bad machines for 15 hours and no
one is looking, it's time to go get a beer. And cashews, and cheddar. I am
out of here (as in where I am working right now, not my office).
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
* Re: Linux v2.4.19-rc5
2002-08-07 23:44 ` Bill Davidsen
@ 2002-08-07 23:53 ` Rik van Riel
2002-08-09 17:46 ` Bill Davidsen
0 siblings, 1 reply; 56+ messages in thread
From: Rik van Riel @ 2002-08-07 23:53 UTC (permalink / raw)
To: Bill Davidsen
Cc: Steven Cole, Marcelo Tosatti, Jens Axboe, lkml, Andrew Morton,
Steven Cole
On Wed, 7 Aug 2002, Bill Davidsen wrote:
> Sure, glad to. If the 2.5 numbers are much worse than 2.4, something
> isn't working as well,
Are you volunteering to identify that "something" for us ?
regards,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
* Re: Linux v2.4.19-rc5
2002-08-07 23:53 ` Rik van Riel
@ 2002-08-09 17:46 ` Bill Davidsen
2002-08-09 19:27 ` Rik van Riel
0 siblings, 1 reply; 56+ messages in thread
From: Bill Davidsen @ 2002-08-09 17:46 UTC (permalink / raw)
To: Rik van Riel; +Cc: lkml
On Wed, 7 Aug 2002, Rik van Riel wrote:
> On Wed, 7 Aug 2002, Bill Davidsen wrote:
>
> > Sure, glad to. If the 2.5 numbers are much worse than 2.4, something
> > isn't working as well,
>
> Are you volunteering to identify that "something" for us ?
Hell no. I was simply commenting that there is some general qualitative
information available from those numbers, even if it is hard to quantify
them. Not working as well for a benchmark may indicate much better typical
performance, and as I understand dbench the io scheduler may improve that
significantly as well.
No doubt there are other, probably a lot more representative, numbers
which show 2.5 is better. "Isn't working as well" for one thing doesn't
mean "in general," but might be of interest to the primary developers.
The fact that the curve doesn't end in a reload from backup tells me that
the IDE code is much better than it was ;-)
What time I have for diddling kernel code is spent on making network code
changes, and is all against 2.4 base.
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
* Re: Linux v2.4.19-rc5
2002-08-09 17:46 ` Bill Davidsen
@ 2002-08-09 19:27 ` Rik van Riel
0 siblings, 0 replies; 56+ messages in thread
From: Rik van Riel @ 2002-08-09 19:27 UTC (permalink / raw)
To: Bill Davidsen; +Cc: lkml
On Fri, 9 Aug 2002, Bill Davidsen wrote:
> On Wed, 7 Aug 2002, Rik van Riel wrote:
> > On Wed, 7 Aug 2002, Bill Davidsen wrote:
> >
> > > Sure, glad to. If the 2.5 numbers are much worse than 2.4, something
> > > isn't working as well,
> >
> > Are you volunteering to identify that "something" for us ?
>
> Hell no. I was simply commenting that there is some general qualitative
> information available from those numbers, even if it is hard to quantify
> them.
As long as there is nobody to interpret what the dbench
numbers actually mean, why are we treating them as the
most important thing around ? ;)
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
Thread overview: 56+ messages
2002-08-01 6:38 Linux v2.4.19-rc5 Marcelo Tosatti
2002-08-01 7:49 ` Jens Axboe
2002-08-01 7:14 ` Marcelo Tosatti
2002-08-01 8:10 ` Jens Axboe
2002-08-01 9:02 ` Andrew Morton
2002-08-01 8:58 ` Jens Axboe
2002-08-01 14:45 ` Steven Cole
2002-08-01 18:57 ` Andrew Morton
2002-08-01 20:15 ` Steven Cole
2002-08-06 3:46 ` Bill Davidsen
2002-08-06 4:30 ` Andrew Morton
2002-08-06 14:07 ` Steven Cole
2002-08-06 14:20 ` Rik van Riel
2002-08-06 17:12 ` Andrew Morton
2002-08-06 5:42 ` Jens Axboe
2002-08-06 8:30 ` Adrian Bunk
2002-08-06 8:48 ` Jens Axboe
2002-08-06 10:31 ` Lincoln Dale
2002-08-06 12:59 ` Rik van Riel
2002-08-07 1:09 ` Bill Davidsen
2002-08-07 2:54 ` Steven Cole
2002-08-07 22:30 ` Bill Davidsen
2002-08-07 22:39 ` Rik van Riel
2002-08-07 23:44 ` Bill Davidsen
2002-08-07 23:53 ` Rik van Riel
2002-08-09 17:46 ` Bill Davidsen
2002-08-09 19:27 ` Rik van Riel
2002-08-01 7:55 ` Keith Owens
2002-08-01 8:10 ` Jens Axboe
2002-08-04 6:50 ` H. Peter Anvin
2002-08-01 11:32 ` Willy TARREAU
2002-08-01 13:54 ` Alan Cox
2002-08-01 12:48 ` Willy TARREAU
2002-08-01 12:12 ` Linux v2.4.19-rc5 - APM bug Willy TARREAU
2002-08-01 13:32 ` [PANIC] APM bug with -rc4 and -rc5 Willy TARREAU
2002-08-01 14:55 ` Alan Cox
2002-08-01 13:56 ` Willy Tarreau
2002-08-01 15:24 ` Willy Tarreau
2002-08-01 16:53 ` Alan Cox
2002-08-01 16:41 ` Willy Tarreau
2002-08-01 20:35 ` [PATCH] solved APM bug with -rc5 Willy TARREAU
2002-08-01 20:52 ` Richard Gooch
2002-08-01 20:54 ` Richard Gooch
2002-08-01 21:17 ` Willy TARREAU
2002-08-01 22:37 ` Alan Cox
2002-08-01 20:58 ` Dave Jones
2002-08-01 22:16 ` Alan Cox
2002-08-01 21:07 ` Willy Tarreau
2002-08-01 21:47 ` Linus Torvalds
2002-08-02 0:12 ` [PATCH] solved APM bug with -rc5 (take 2) Willy TARREAU
2002-08-02 1:47 ` [PATCH] pdc20265 problem Nick Orlov
2002-08-02 2:29 ` Nick Orlov
2002-08-02 12:27 ` Alan Cox
2002-08-02 12:52 ` Nick Orlov
2002-08-02 14:00 ` Bartlomiej Zolnierkiewicz
2002-08-02 14:45 ` Nick Orlov