linux-kernel.vger.kernel.org archive mirror
* Linux v2.4.19-rc5
@ 2002-08-01  6:38 Marcelo Tosatti
  2002-08-01  7:49 ` Jens Axboe
                   ` (4 more replies)
  0 siblings, 5 replies; 56+ messages in thread
From: Marcelo Tosatti @ 2002-08-01  6:38 UTC (permalink / raw)
  To: lkml


One of the -rc4 fixes was not correct and -rc4 missed an important SMP
race "fix" on the block layer.


Summary of changes from v2.4.19-rc4 to v2.4.19-rc5
============================================

<davem@redhat.com> (02/08/01 1.662)
	[PATCH] Correct openprom fix

	   <davem@redhat.com> (02/07/31 1.661)
	   	[PATCH] Add missing check to openprom driver

<akpm@zip.com.au> (02/08/01 1.663)
	[PATCH] disable READA

<marcelo@plucky.distro.conectiva> (02/08/01 1.664)
	Change EXTRAVERSION to -rc5


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Linux v2.4.19-rc5
  2002-08-01  7:49 ` Jens Axboe
@ 2002-08-01  7:14   ` Marcelo Tosatti
  2002-08-01  8:10     ` Jens Axboe
  2002-08-01 20:15     ` Steven Cole
  0 siblings, 2 replies; 56+ messages in thread
From: Marcelo Tosatti @ 2002-08-01  7:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: lkml, Andrew Morton


On Thu, 1 Aug 2002, Jens Axboe wrote:

> On Thu, Aug 01 2002, Marcelo Tosatti wrote:
> > <akpm@zip.com.au> (02/08/01 1.663)
> > 	[PATCH] disable READA
>
> Since -rc5 is not to be found yet, I don't know what version of this
> made it in. Is READA just being disabled on SMP, or was it the general
> #if 0 change that got included?

It's being disabled on UP and SMP. I don't like having such a readahead IO
mode working only for UP.

> I'm asking since plain disabling READA might have nasty performance
> effects. Andrew, I bet you did some numbers on this, care to share?

If that's true (the performance effects) I'll release -final with IMO not
very coherent READA semantics :)

Anyway, let's wait for the numbers.






* Re: Linux v2.4.19-rc5
  2002-08-01  6:38 Linux v2.4.19-rc5 Marcelo Tosatti
@ 2002-08-01  7:49 ` Jens Axboe
  2002-08-01  7:14   ` Marcelo Tosatti
  2002-08-01  7:55 ` Keith Owens
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 56+ messages in thread
From: Jens Axboe @ 2002-08-01  7:49 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: lkml, Andrew Morton

On Thu, Aug 01 2002, Marcelo Tosatti wrote:
> <akpm@zip.com.au> (02/08/01 1.663)
> 	[PATCH] disable READA

Since -rc5 is not to be found yet, I don't know what version of this
made it in. Is READA just being disabled on SMP, or was it the general
#if 0 change that got included? I'm asking since plain disabling READA
might have nasty performance effects. Andrew, I bet you did some numbers
on this, care to share?

-- 
Jens Axboe



* Re: Linux v2.4.19-rc5
  2002-08-01  6:38 Linux v2.4.19-rc5 Marcelo Tosatti
  2002-08-01  7:49 ` Jens Axboe
@ 2002-08-01  7:55 ` Keith Owens
  2002-08-01  8:10   ` Jens Axboe
  2002-08-04  6:50   ` H. Peter Anvin
  2002-08-01 11:32 ` Willy TARREAU
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 56+ messages in thread
From: Keith Owens @ 2002-08-01  7:55 UTC (permalink / raw)
  To: Marcelo Tosatti, ftpadmin; +Cc: lkml

patch-2.4.19-rc5.gz has been there for 25 minutes but the .bz2 file and
the signature have not been created yet.  Is there a problem with the
automatic conversion and signing code on master?



* Re: Linux v2.4.19-rc5
  2002-08-01  7:14   ` Marcelo Tosatti
@ 2002-08-01  8:10     ` Jens Axboe
  2002-08-01  9:02       ` Andrew Morton
  2002-08-01 20:15     ` Steven Cole
  1 sibling, 1 reply; 56+ messages in thread
From: Jens Axboe @ 2002-08-01  8:10 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: lkml, Andrew Morton

On Thu, Aug 01 2002, Marcelo Tosatti wrote:
> 
> On Thu, 1 Aug 2002, Jens Axboe wrote:
> 
> > On Thu, Aug 01 2002, Marcelo Tosatti wrote:
> > > <akpm@zip.com.au> (02/08/01 1.663)
> > > 	[PATCH] disable READA
> >
> > Since -rc5 is not to be found yet, I don't know what version of this
> > made it in. Is READA just being disabled on SMP, or was it the general
> > #if 0 change that got included?
> 
> It's being disabled on UP and SMP. I don't like having such a readahead IO
> mode working only for UP.

You are right, that would be ugly. Should only be the last resort.

> > I'm asking since plain disabling READA might have nasty performance
> > effects. Andrew, I bet you did some numbers on this, care to share?
> 
> If that's true (the performance effects) I'll release -final with IMO not
> very coherent READA semantics :)
> 
> Anyway, let's wait for the numbers.

It just 'feels' like the sort of change that might have odd side
effects.

-- 
Jens Axboe



* Re: Linux v2.4.19-rc5
  2002-08-01  7:55 ` Keith Owens
@ 2002-08-01  8:10   ` Jens Axboe
  2002-08-04  6:50   ` H. Peter Anvin
  1 sibling, 0 replies; 56+ messages in thread
From: Jens Axboe @ 2002-08-01  8:10 UTC (permalink / raw)
  To: Keith Owens; +Cc: Marcelo Tosatti, ftpadmin, lkml

On Thu, Aug 01 2002, Keith Owens wrote:
> patch-2.4.19-rc5.gz has been there for 25 minutes but the .bz2 file and
> the signature have not been created yet.  Is there a problem with the
> automatic conversion and signing code on master?

That is slow; however, it's there now.

-- 
Jens Axboe



* Re: Linux v2.4.19-rc5
  2002-08-01  9:02       ` Andrew Morton
@ 2002-08-01  8:58         ` Jens Axboe
  2002-08-01 14:45         ` Steven Cole
  1 sibling, 0 replies; 56+ messages in thread
From: Jens Axboe @ 2002-08-01  8:58 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Marcelo Tosatti, lkml

On Thu, Aug 01 2002, Andrew Morton wrote:
> Jens Axboe wrote:
> > 
> > ...
> > > Anyway, let's wait for the numbers.
> > 
> > It just 'feels' like the sort of change that might have odd side
> > effects.
> 
> It's almost impossible to get READA to do anything.  For example, in
> current 2.5, if a READA attempt is actually aborted, end_buffer_io_sync
> reports a "buffer I/O error".  Every time. And nobody has reported this.

Ahem, I've actually seen that happen :-). But maybe a total of 20 times
or so.

> It _is_ possible to hit this in 2.5, because of ext2_preread_inode().
> 
> Probably, also it's possible to hit it in 2.4 with hundreds of processes
> all issuing ext3 directory readahead.  But it's pretty remote.

Alright, I'm happy then.

-- 
Jens Axboe



* Re: Linux v2.4.19-rc5
  2002-08-01  8:10     ` Jens Axboe
@ 2002-08-01  9:02       ` Andrew Morton
  2002-08-01  8:58         ` Jens Axboe
  2002-08-01 14:45         ` Steven Cole
  0 siblings, 2 replies; 56+ messages in thread
From: Andrew Morton @ 2002-08-01  9:02 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Marcelo Tosatti, lkml

Jens Axboe wrote:
> 
> ...
> > Anyway, let's wait for the numbers.
> 
> It just 'feels' like the sort of change that might have odd side
> effects.

It's almost impossible to get READA to do anything.  For example, in
current 2.5, if a READA attempt is actually aborted, end_buffer_io_sync
reports a "buffer I/O error".  Every time. And nobody has reported this.

It _is_ possible to hit this in 2.5, because of ext2_preread_inode().

Probably, also it's possible to hit it in 2.4 with hundreds of processes
all issuing ext3 directory readahead.  But it's pretty remote.


* Re: Linux v2.4.19-rc5
  2002-08-01  6:38 Linux v2.4.19-rc5 Marcelo Tosatti
  2002-08-01  7:49 ` Jens Axboe
  2002-08-01  7:55 ` Keith Owens
@ 2002-08-01 11:32 ` Willy TARREAU
  2002-08-01 13:54   ` Alan Cox
  2002-08-01 12:12 ` Linux v2.4.19-rc5 - APM bug Willy TARREAU
  2002-08-02  1:47 ` [PATCH] pdc20265 problem Nick Orlov
  4 siblings, 1 reply; 56+ messages in thread
From: Willy TARREAU @ 2002-08-01 11:32 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: lkml

Hi Marcelo,

This is just a cleanup for the network devices configuration.
Basically, the TOSHIBA TC35815 configuration entry appears
just between DECchip Tulip, and the 2 Tulip-specific config lines
which are indented so we could think that they are related to
the TC35815 instead of the Tulip.

You only see them when Tulip is enabled though.

Here is the obvious fix against -rc5 which avoids this confusion :

Cheers,
Willy

--- linux-2.4.19-rc5/drivers/net/Config.in.orig	Thu Aug  1 13:26:58 2002
+++ linux-2.4.19-rc5/drivers/net/Config.in	Thu Aug  1 13:27:14 2002
@@ -162,8 +162,8 @@
 
       dep_tristate '    Apricot Xen-II on board Ethernet' CONFIG_APRICOT $CONFIG_ISA
       dep_tristate '    CS89x0 support' CONFIG_CS89x0 $CONFIG_ISA
-      dep_tristate '    DECchip Tulip (dc21x4x) PCI support' CONFIG_TULIP $CONFIG_PCI
       dep_tristate '    TOSHIBA TC35815 Ethernet support' CONFIG_TC35815 $CONFIG_PCI
+      dep_tristate '    DECchip Tulip (dc21x4x) PCI support' CONFIG_TULIP $CONFIG_PCI
       if [ "$CONFIG_TULIP" = "y" -o "$CONFIG_TULIP" = "m" ]; then
          dep_bool '      New bus configuration (EXPERIMENTAL)' CONFIG_TULIP_MWI $CONFIG_EXPERIMENTAL
          bool '      Use PCI shared mem for NIC registers' CONFIG_TULIP_MMIO



* Re: Linux v2.4.19-rc5 - APM bug
  2002-08-01  6:38 Linux v2.4.19-rc5 Marcelo Tosatti
                   ` (2 preceding siblings ...)
  2002-08-01 11:32 ` Willy TARREAU
@ 2002-08-01 12:12 ` Willy TARREAU
  2002-08-01 13:32   ` [PANIC] APM bug with -rc4 and -rc5 Willy TARREAU
  2002-08-02  1:47 ` [PATCH] pdc20265 problem Nick Orlov
  4 siblings, 1 reply; 56+ messages in thread
From: Willy TARREAU @ 2002-08-01 12:12 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: lkml

Marcelo,

I observe a kernel panic at boot time if I set apm=power-off. OK with apm=off.
This is on an ASUS A7M266D with two Athlon XP 1800+. Since it works well on
2.4.19-pre10, I'm recompiling intermediate versions to check which one brought
the problem.

This is rather strange, since the crash occurs in do_softirq, but 2 bytes after
the beginning of an instruction :
c0120d09 fa			cli
c0120d0a 8b b5 80 17 3c c0	mov 0xc03c1780(%ebp),%esi

The crash occurs at c0120d0c (80 17 3c c0 ...). Seems like a bad pointer
somewhere.

Regards,
Willy



* Re: Linux v2.4.19-rc5
  2002-08-01 13:54   ` Alan Cox
@ 2002-08-01 12:48     ` Willy TARREAU
  0 siblings, 0 replies; 56+ messages in thread
From: Willy TARREAU @ 2002-08-01 12:48 UTC (permalink / raw)
  To: Alan Cox; +Cc: Willy TARREAU, Marcelo Tosatti, lkml

On Thu, Aug 01, 2002 at 02:54:04PM +0100, Alan Cox wrote:
> On Thu, 2002-08-01 at 12:32, Willy TARREAU wrote:
> > Hi Marcelo,
> > 
> > This is just a cleanup for the network devices configuration.
> > Basically, the TOSHIBA TC35815 configuration entry appears
> > just between DECchip Tulip, and the 2 Tulip-specific config lines
> > which are indented so we could think that they are related to
> > the TC35815 instead of the Tulip.
> 
> This is true, but the fix wants tweaking - the file is supposed to be in
> basically alphabetical order. Can you move the Toshiba one down instead?

OK, in this case it goes just before VIA rhine. (BTW, [P]CI NE2000 is before
[N]ovell, but I assume we're talking about [N]E2000).

Marcelo, please ignore my previous patch in favor of this one.

Cheers,
Willy

--- linux-2.4.19-rc5/drivers/net/Config.in.orig	Thu Aug  1 14:43:09 2002
+++ linux-2.4.19-rc5/drivers/net/Config.in	Thu Aug  1 14:44:29 2002
@@ -163,7 +163,6 @@
       dep_tristate '    Apricot Xen-II on board Ethernet' CONFIG_APRICOT $CONFIG_ISA
       dep_tristate '    CS89x0 support' CONFIG_CS89x0 $CONFIG_ISA
       dep_tristate '    DECchip Tulip (dc21x4x) PCI support' CONFIG_TULIP $CONFIG_PCI
-      dep_tristate '    TOSHIBA TC35815 Ethernet support' CONFIG_TC35815 $CONFIG_PCI
       if [ "$CONFIG_TULIP" = "y" -o "$CONFIG_TULIP" = "m" ]; then
          dep_bool '      New bus configuration (EXPERIMENTAL)' CONFIG_TULIP_MWI $CONFIG_EXPERIMENTAL
          bool '      Use PCI shared mem for NIC registers' CONFIG_TULIP_MMIO
@@ -195,6 +194,7 @@
       if [ "$CONFIG_PCI" = "y" -o "$CONFIG_EISA" = "y" ]; then
          tristate '    TI ThunderLAN support' CONFIG_TLAN
       fi
+      dep_tristate '    TOSHIBA TC35815 Ethernet support' CONFIG_TC35815 $CONFIG_PCI
       dep_tristate '    VIA Rhine support' CONFIG_VIA_RHINE $CONFIG_PCI
       dep_mbool '      Use MMIO instead of PIO (EXPERIMENTAL)' CONFIG_VIA_RHINE_MMIO $CONFIG_VIA_RHINE $CONFIG_EXPERIMENTAL
       dep_tristate '    Winbond W89c840 Ethernet support' CONFIG_WINBOND_840 $CONFIG_PCI



* [PANIC] APM bug with -rc4 and -rc5
  2002-08-01 12:12 ` Linux v2.4.19-rc5 - APM bug Willy TARREAU
@ 2002-08-01 13:32   ` Willy TARREAU
  2002-08-01 14:55     ` Alan Cox
  0 siblings, 1 reply; 56+ messages in thread
From: Willy TARREAU @ 2002-08-01 13:32 UTC (permalink / raw)
  To: Alan Cox; +Cc: Marcelo Tosatti, lkml

Marcelo,

I've narrowed down the APM problem encountered in -rc5. In fact, it also
affects -rc4, but not -rc3. I'm a bit stumped since the changes are not
too heavy...

The crash happens at the same place with -rc4 and -rc5 : 0xc0120d0c

c0120cec:       fb                      sti
c0120ced:       bb 40 a2 39 c0          mov    $0xc039a240,%ebx
c0120cf2:       f7 c6 01 00 00 00       test   $0x1,%esi
c0120cf8:       74 08                   je     c0120d02 <do_softirq+0x72>
c0120cfa:       53                      push   %ebx
c0120cfb:       8b 03                   mov    (%ebx),%eax
c0120cfd:       ff d0                   call   *%eax
c0120cff:       83 c4 04                add    $0x4,%esp
c0120d02:       83 c3 08                add    $0x8,%ebx
c0120d05:       d1 ee                   shr    %esi
c0120d07:       75 e9                   jne    c0120cf2 <do_softirq+0x62>
c0120d09:       fa                      cli
c0120d0a:       8b b5 80 17 3c c0       mov    0xc03c1780(%ebp),%esi
                      ^^ ^^ ^^ ^^
The processor branches here (2 bytes after  local_irq_disable()) !!

c0120d10:       85 fe                   test   %edi,%esi
c0120d12:       74 0c                   je     c0120d20 <do_softirq+0x90>
c0120d14:       89 f0                   mov    %esi,%eax
c0120d16:       f7 d0                   not    %eax
c0120d18:       21 c7                   and    %eax,%edi
c0120d1a:       eb c6                   jmp    c0120ce2 <do_softirq+0x52>

This code is from do_softirq() in kernel/softirq.c, lines 84-95 :

                local_irq_enable();

                h = softirq_vec;

                do {
                        if (pending & 1)
                                h->action(h);
                        h++;
                        pending >>= 1;
                } while (pending);

                local_irq_disable();

The hand-written traces show that this function was correctly called by
ksoftirqd(), which in turn was called by kernel_thread().

Part of the hand-written oops shows :
EFLAGS=00010057
eax=00000900 ebx=c039a260 ecx=00000000 edx=c0390000
esi=00000000 edi=fffffff7 ebp=00000000 esp=c15b1fc8

Since softirq_vec is c039a240 in my System.map and ebx is c039a260, I can
deduce that the loop has gone through 4 iterations (0x20 bytes, each entry
being 8 bytes long). <pending> is represented by %esi here, which is null, so
the loop had finished. This implies that it's not the call to h->action(h)
which branched to this place. But in this case, I don't see how the CPU
can branch here (a ret perhaps ?). I don't see how this can be related to
the "apm=power-off" case either.


Alan, I believe you have the same mobo, but with two MPs on it. Although I've
never had any SMP problem with XPs, did you notice anything strange with APM
on 2.4.19-rc[45] ? I will check 2.4.19-rc3-ac5 to see if it hangs too...

Cheers,
Willy

> Marcelo,
> 
> I observe a kernel panic at boot time if I set apm=power-off. OK with apm=off.
> This is on an ASUS A7M266D with two Athlon XP 1800+. Since it works well on
> 2.4.19-pre10, I'm recompiling intermediate versions to check which one brought
> the problem.
> 
> This is rather strange, since the crash occurs in do_softirq, but 2 bytes after
> the beginning of an instruction :
> c0120d09 fa			cli
> c0120d0a 8b b5 80 17 3c c0	mov 0xc03c1780(%ebp),%esi
> 
> The crash occurs at c0120d0c (80 17 3c c0 ...). Seems like a bad pointer
> somewhere.



* Re: Linux v2.4.19-rc5
  2002-08-01 11:32 ` Willy TARREAU
@ 2002-08-01 13:54   ` Alan Cox
  2002-08-01 12:48     ` Willy TARREAU
  0 siblings, 1 reply; 56+ messages in thread
From: Alan Cox @ 2002-08-01 13:54 UTC (permalink / raw)
  To: Willy TARREAU; +Cc: Marcelo Tosatti, lkml

On Thu, 2002-08-01 at 12:32, Willy TARREAU wrote:
> Hi Marcelo,
> 
> This is just a cleanup for the network devices configuration.
> Basically, the TOSHIBA TC35815 configuration entry appears
> just between DECchip Tulip, and the 2 Tulip-specific config lines
> which are indented so we could think that they are related to
> the TC35815 instead of the Tulip.

This is true, but the fix wants tweaking - the file is supposed to be in
basically alphabetical order. Can you move the Toshiba one down instead?




* Re: [PANIC] APM bug with -rc4 and -rc5
  2002-08-01 14:55     ` Alan Cox
@ 2002-08-01 13:56       ` Willy Tarreau
  2002-08-01 15:24         ` Willy Tarreau
  0 siblings, 1 reply; 56+ messages in thread
From: Willy Tarreau @ 2002-08-01 13:56 UTC (permalink / raw)
  To: Alan Cox; +Cc: Willy TARREAU, Marcelo Tosatti, lkml

On Thu, Aug 01, 2002 at 03:55:32PM +0100, Alan Cox wrote:
> I've only run -ac on the box (I need the IDE) and that has subtly
> different APM code. I do not however understand why it has changed
> behaviour. I could understand if it did it at the actual poweroff point
> but not earlier

Ok, thanks. I'll try to revert some patches from -rc4. But it looks
more like a side effect IMHO. Perhaps the APM initialization code
triggers one of the numerous bugs in the bios :-/

If I enable APM in the bios, the crash is somewhat different. I get
about two pages of call traces looping back every 8 pointers.

Seems like a memory corruption to me...

2.4.19-rc3-ac5 is OK, BTW.

Cheers,
Willy


* Re: Linux v2.4.19-rc5
  2002-08-01  9:02       ` Andrew Morton
  2002-08-01  8:58         ` Jens Axboe
@ 2002-08-01 14:45         ` Steven Cole
  2002-08-01 18:57           ` Andrew Morton
  1 sibling, 1 reply; 56+ messages in thread
From: Steven Cole @ 2002-08-01 14:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jens Axboe, Marcelo Tosatti, lkml, Steven Cole

On Thu, 2002-08-01 at 03:02, Andrew Morton wrote:
> Jens Axboe wrote:
> > 
> > ...
> > > Anyway, let's wait for the numbers.
> > 
> > It just 'feels' like the sort of change that might have odd side
> > effects.
> 
> It's almost impossible to get READA to do anything.  For example, in
> current 2.5, if a READA attempt is actually aborted, end_buffer_io_sync
> reports a "buffer I/O error".  Every time. And nobody has reported this.
> 
> It _is_ possible to hit this in 2.5, because of ext2_preread_inode().
> 
> Probably, also it's possible to hit it in 2.4 with hundreds of processes
> all issuing ext3 directory readahead.  But it's pretty remote.

I've never seen this on 2.4.19-rc3 and I've been beating on it pretty
hard, running dbench 128 many times.  However, 2.5 is another story.

This might not be the best thread to report this, but since the subject 
came up, I'm getting the following message with recent 2.5.x kernels
whenever I run relatively large numbers of dbench clients.  

Buffer I/O error on device sd(8,8), logical block XXXXXXX

where logical block repeats 0-6 times.  This behavior is repeatable, but
only occurs under fairly high load.  I ran dbench with increasing numbers
of clients, with the following results:

dbench clients	Buffer I/O error messages
<=48		0
52		1
56		0
64		0
80		11
96		9
112		7
128		4

This particular run was with 2.5.29 with rmap13b and slabLRU patches, but
the behavior with 2.5.29-vanilla was similar.  Kernel is SMP, no preempt,
and /dev/sda8 where dbench was running was mounted ext2.
The test box is 2-way p3, SCSI, 1GB memory.

Time to go beat on -rc5 and see if anything falls out.

Steven






* Re: [PANIC] APM bug with -rc4 and -rc5
  2002-08-01 13:32   ` [PANIC] APM bug with -rc4 and -rc5 Willy TARREAU
@ 2002-08-01 14:55     ` Alan Cox
  2002-08-01 13:56       ` Willy Tarreau
  0 siblings, 1 reply; 56+ messages in thread
From: Alan Cox @ 2002-08-01 14:55 UTC (permalink / raw)
  To: Willy TARREAU; +Cc: Marcelo Tosatti, lkml

On Thu, 2002-08-01 at 14:32, Willy TARREAU wrote:
> > I observe a kernel panic at boot time if I set apm=power-off. OK with apm=off.
> > This is on an ASUS A7M266D with two Athlon XP 1800+. Since it works well on
> > 2.4.19-pre10, I'm recompiling intermediate versions to check which one brought
> > the problem.
> > 
> > This is rather strange, since the crash occurs in do_softirq, but 2 bytes after

I've only run -ac on the box (I need the IDE) and that has subtly
different APM code. I do not however understand why it has changed
behaviour. I could understand if it did it at the actual poweroff point
but not earlier



* Re: [PANIC] APM bug with -rc4 and -rc5
  2002-08-01 13:56       ` Willy Tarreau
@ 2002-08-01 15:24         ` Willy Tarreau
  2002-08-01 16:53           ` Alan Cox
  0 siblings, 1 reply; 56+ messages in thread
From: Willy Tarreau @ 2002-08-01 15:24 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Alan Cox, Marcelo Tosatti, lkml

> Ok, thanks. I'll try to revert some patches from -rc4. But it looks
> more like a side effect IMHO. Perhaps the APM initialization code
> triggers one of the numerous bugs in the bios :-/

It seems that I cannot reproduce it anymore if I revert arch/i386/kernel/vm86.c
to the state of -rc3. Reverting clear_AC doesn't change anything, but the
rest of the patch does. I don't know why, it seems correct at first glance.
Perhaps old code hides a bug in the bios... Well, I don't know; I'm not
familiar enough with apm or vm86 internals to understand what's happening.

Cheers,
Willy


* Re: [PANIC] APM bug with -rc4 and -rc5
  2002-08-01 16:53           ` Alan Cox
@ 2002-08-01 16:41             ` Willy Tarreau
  2002-08-01 20:35             ` [PATCH] solved APM bug with -rc5 Willy TARREAU
  1 sibling, 0 replies; 56+ messages in thread
From: Willy Tarreau @ 2002-08-01 16:41 UTC (permalink / raw)
  To: Alan Cox; +Cc: Willy Tarreau, Marcelo Tosatti, lkml

On Thu, Aug 01, 2002 at 05:53:46PM +0100, Alan Cox wrote:
> Very curious indeed because someone else reported that rc3-ac5 works
> (which has the same vm86 code). In addition the vm86 handler in the
> kernel isn't actually used for APM. We make 32bit APM calls and the one
> 16bit case we do is a true return to real mode.

Well, I got it wrong. In fact, sometimes the system boots OK if it
is after a warm boot, and it seems that all the tests I've done with
"old" vm86 code were done from a warm boot. Now I can confirm that
from a cold boot, it also panics. And you're right about rc3-ac5,
since it also works for me.

Still searching...
Willy


* Re: [PANIC] APM bug with -rc4 and -rc5
  2002-08-01 15:24         ` Willy Tarreau
@ 2002-08-01 16:53           ` Alan Cox
  2002-08-01 16:41             ` Willy Tarreau
  2002-08-01 20:35             ` [PATCH] solved APM bug with -rc5 Willy TARREAU
  0 siblings, 2 replies; 56+ messages in thread
From: Alan Cox @ 2002-08-01 16:53 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Marcelo Tosatti, lkml

On Thu, 2002-08-01 at 16:24, Willy Tarreau wrote:
> > Ok, thanks. I'll try to revert some patches from -rc4. But it looks
> > more like a side effect IMHO. Perhaps the APM initialization code
> > triggers one of the numerous bugs in the bios :-/
> 
> It seems that I cannot reproduce it anymore if I revert arch/i386/kernel/vm86.c
> to the state of -rc3. Reverting clear_AC doesn't change anything, but the
> rest of the patch does. I don't know why, it seems correct at first glance.
> Perhaps old code hides a bug in the bios... Well, I don't know; I'm not
> familiar enough with apm or vm86 internals to understand what's happening.

Very curious indeed because someone else reported that rc3-ac5 works
(which has the same vm86 code). In addition the vm86 handler in the
kernel isn't actually used for APM. We make 32bit APM calls and the one
16bit case we do is a true return to real mode.



* Re: Linux v2.4.19-rc5
  2002-08-01 14:45         ` Steven Cole
@ 2002-08-01 18:57           ` Andrew Morton
  0 siblings, 0 replies; 56+ messages in thread
From: Andrew Morton @ 2002-08-01 18:57 UTC (permalink / raw)
  To: Steven Cole; +Cc: Jens Axboe, Marcelo Tosatti, lkml, Steven Cole

Steven Cole wrote:
> 
> ...
> I've never seen this on 2.4.19-rc3 and I've been beating on it pretty
> hard, running dbench 128 many times.  However, 2.5 is another story.
> 
> This might not be the best thread to report this, but since the subject
> came up, I'm getting the following message with recent 2.5.x kernels
> whenever I run relatively large numbers of dbench clients.
> 
> Buffer I/O error on device sd(8,8), logical block XXXXXXX
> 
> where logical block repeats 0-6 times.  This behavior is repeatable, but
> only occurs under fairly high load.  I ran dbench with increasing numbers
> of clients, with the following results:
> 
> dbench clients  Buffer I/O error messages
> <=48            0
> 52              1
> 56              0
> 64              0
> 80              11
> 96              9
> 112             7
> 128             4

Yup.  The printk is bogus - I thought I'd removed it a couple of
kernels ago.

It's a bit sad that an abandoned readahead attempt is indistinguishable
from a dead disk.

-


* Re: Linux v2.4.19-rc5
  2002-08-01  7:14   ` Marcelo Tosatti
  2002-08-01  8:10     ` Jens Axboe
@ 2002-08-01 20:15     ` Steven Cole
  2002-08-06  3:46       ` Bill Davidsen
  1 sibling, 1 reply; 56+ messages in thread
From: Steven Cole @ 2002-08-01 20:15 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Jens Axboe, lkml, Andrew Morton, Steven Cole

On Thu, 2002-08-01 at 01:14, Marcelo Tosatti wrote:
> 
> On Thu, 1 Aug 2002, Jens Axboe wrote:
> 
> > On Thu, Aug 01 2002, Marcelo Tosatti wrote:
> > > <akpm@zip.com.au> (02/08/01 1.663)
> > > 	[PATCH] disable READA
> >
> > Since -rc5 is not to be found yet, I don't know what version of this
> > made it in. Is READA just being disabled on SMP, or was it the general
> > #if 0 change that got included?
> 
> It's being disabled on UP and SMP. I don't like having such a readahead IO
> mode working only for UP.
> 
> > I'm asking since plain disabling READA might have nasty performance
> > effects. Andrew, I bet you did some numbers on this, care to share?
> 
> If that's true (the performance effects) I'll release -final with IMO not
> very coherent READA semantics :)
> 
> Anyway, let's wait for the numbers.

Marcelo,

Here are some dbench numbers, from the "for what it's worth" department.
This was done with SMP kernels, on a dual p3 box, SCSI disk, ext2.
The first column is dbench clients.  The numbers are throughput
in MB/sec.  The 2.5.29 kernel had a few RR-supplied smp fixes.
Looks like for this limited test, 2.4.19-rc5 holds up pretty well.
I've also run this set of tests several times on -rc5 using ext3
and data=writeback, and everything looks fine.

Steven

		2.4.19-rc2	2.4.19-rc5	2.5.29

1		114.616		113.402		112.668
2		173.234		183.829		175.148
3		185.995		187.411		184.63
4		185.447		186.891		188.199
6		191.115		191.439		191.787
8		191.962		191.551		191.53
10		192.984		194.036		194.923
12		183.847		185.73		195.328
16		183.609		183.439		196.224
20		181.519		179.956		193.681
24		183.509		183.387		194.09
28		176.04		175.832		169.326
32		174.583		163.09		137.815
36		155.04		164.154		121.861
40		155.37		156.028		102.014
44		152.546		138.171		91.6088
48		146.419		135.447		84.3884
52		139.788		125.968		89.2374
56		113.933		122.592		81.021
64		110.792		106.484		84.648
80				87.4692		60.6054
96				87.7201		57.9622
112				74.9503		49.468
128				67.2649		47.0254





* [PATCH] solved APM bug with -rc5
  2002-08-01 16:53           ` Alan Cox
  2002-08-01 16:41             ` Willy Tarreau
@ 2002-08-01 20:35             ` Willy TARREAU
  2002-08-01 20:52               ` Richard Gooch
  2002-08-01 22:16               ` Alan Cox
  1 sibling, 2 replies; 56+ messages in thread
From: Willy TARREAU @ 2002-08-01 20:35 UTC (permalink / raw)
  To: Alan Cox; +Cc: Willy Tarreau, Marcelo Tosatti, lkml

On Thu, Aug 01, 2002 at 05:53:46PM +0100, Alan Cox wrote:
> Very curious indeed because someone else reported that rc3-ac5 works
> (which has the same vm86 code). In addition the vm86 handler in the
> kernel isn't actually used for APM. We make 32bit APM calls and the one
> 16bit case we do is a true return to real mode.

I finally got rid of it ! I now understand why it hung randomly, and
why I spent lots of time adding/removing unrelated patches. It's because
in apm=power-off mode (SMP), a kernel thread is started for the apm()
function, which does bios calls. And sometimes, the bios is called from
CPU >0, which my bios doesn't like at all, thus explaining why the oopses
were corrupted.

By copying a piece of code somewhere else in the same file, I could force
apm() to be used only by CPU0. I could verify that it doesn't crash anymore,
and that I can also crash it on demand if I force CPU1.

The bonus is that I could re-enable the debug code in this function even
in SMP mode since we're sure that it runs on CPU0.

Here is the patch against 2.4.19-rc5. Marcelo, Alan, please review and apply.

Cheers,
Willy


diff -urN linux-2.4.19-rc5/arch/i386/kernel/apm.c linux-2.4.19-rc5-fix/arch/i386/kernel/apm.c
--- linux-2.4.19-rc5/arch/i386/kernel/apm.c	Thu Aug  1 22:07:39 2002
+++ linux-2.4.19-rc5-fix/arch/i386/kernel/apm.c	Thu Aug  1 22:26:56 2002
@@ -1661,6 +1661,17 @@
 	strcpy(current->comm, "kapmd");
 	sigfillset(&current->blocked);
 
+#ifdef CONFIG_SMP
+	/* 2002/08/01 - WT
+	 * This is to avoid random crashes at boot time during initialization
+	 * on SMP systems in case of "apm=power-off" mode. Seen on ASUS A7M266D.
+	 * Some bioses don't like being called from CPU != 0.
+	 */
+	while (cpu_number_map(smp_processor_id()) != 0) {
+		schedule();
+	}
+#endif
+	
 	if (apm_info.connection_version == 0) {
 		apm_info.connection_version = apm_info.bios.version;
 		if (apm_info.connection_version > 0x100) {
@@ -1707,7 +1718,7 @@
 		}
 	}
 
-	if (debug && (smp_num_cpus == 1)) {
+	if (debug) {
 		error = apm_get_power_status(&bx, &cx, &dx);
 		if (error)
 			printk(KERN_INFO "apm: power status not available\n");


* Re: [PATCH] solved APM bug with -rc5
  2002-08-01 20:35             ` [PATCH] solved APM bug with -rc5 Willy TARREAU
@ 2002-08-01 20:52               ` Richard Gooch
  2002-08-01 20:54                 ` Richard Gooch
  2002-08-01 20:58                 ` Dave Jones
  2002-08-01 22:16               ` Alan Cox
  1 sibling, 2 replies; 56+ messages in thread
From: Richard Gooch @ 2002-08-01 20:52 UTC (permalink / raw)
  To: Willy TARREAU; +Cc: Alan Cox, Marcelo Tosatti, lkml

Willy TARREAU writes:
> I finally got rid of it ! I now understand why it hanged randomly, and
> why I spent lots of time adding/removing unrelated patches. It's because
> in apm=power-off mode (SMP), a kernel thread is started for the apm()
> function, which does bios calls. And sometimes, the bios is called from
> CPU >0, which my bios doesn't like at all, thus explaining why the oopses
> were corrupted.
[...]
> diff -urN linux-2.4.19-rc5/arch/i386/kernel/apm.c linux-2.4.19-rc5-fix/arch/i386/kernel/apm.c
> --- linux-2.4.19-rc5/arch/i386/kernel/apm.c	Thu Aug  1 22:07:39 2002
> +++ linux-2.4.19-rc5-fix/arch/i386/kernel/apm.c	Thu Aug  1 22:26:56 2002
> @@ -1661,6 +1661,17 @@
>  	strcpy(current->comm, "kapmd");
>  	sigfillset(&current->blocked);
>  
> +#ifdef CONFIG_SMP
> +	/* 2002/08/01 - WT
> +	 * This is to avoid random crashes at boot time during initialization
> +	 * on SMP systems in case of "apm=power-off" mode. Seen on ASUS A7M266D.
> +	 * Some bioses don't like being called from CPU != 0.
> +	 */
> +	while (cpu_number_map(smp_processor_id()) != 0) {
> +		schedule();
> +	}
> +#endif
> +	

Hm. I bet you didn't try this with CONFIG_PREEMPT=y, right? IIRC, the
wonderful world of preemption means that you can get rescheduled on
another CPU without warning, unless you take a lock or explicitly
disable preemption.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] solved APM bug with -rc5
  2002-08-01 20:52               ` Richard Gooch
@ 2002-08-01 20:54                 ` Richard Gooch
  2002-08-01 21:17                   ` Willy TARREAU
  2002-08-01 20:58                 ` Dave Jones
  1 sibling, 1 reply; 56+ messages in thread
From: Richard Gooch @ 2002-08-01 20:54 UTC (permalink / raw)
  To: Willy TARREAU, Alan Cox, Marcelo Tosatti, lkml

Richard Gooch writes:
> Hm. I bet you didn't try this with CONFIG_PREEMPT=y, right? IIRC, the
> wonderful world of preemption means that you can get rescheduled on
> another CPU without warning, unless you take a lock or explicitly
> disable preemption.

Apologies. I forgot that CONFIG_PREEMPT is a 2.5.x feature, and
doesn't exist on 2.4 (thankfully).

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] solved APM bug with -rc5
  2002-08-01 20:52               ` Richard Gooch
  2002-08-01 20:54                 ` Richard Gooch
@ 2002-08-01 20:58                 ` Dave Jones
  1 sibling, 0 replies; 56+ messages in thread
From: Dave Jones @ 2002-08-01 20:58 UTC (permalink / raw)
  To: Richard Gooch; +Cc: Willy TARREAU, Alan Cox, Marcelo Tosatti, lkml

On Thu, Aug 01, 2002 at 02:52:16PM -0600, Richard Gooch wrote:
 > > diff -urN linux-2.4.19-rc5/arch/i386/kernel/apm.c linux-2.4.19-rc5-fix/arch/i386/kernel/apm.c
 > > --- linux-2.4.19-rc5/arch/i386/kernel/apm.c	Thu Aug  1 22:07:39 2002
 > > +++ linux-2.4.19-rc5-fix/arch/i386/kernel/apm.c	Thu Aug  1 22:26:56 2002
 > 
 > Hm. I bet you didn't try this with CONFIG_PREEMPT=y, right? IIRC, the
 > wonderful world of preemption means that you can get rescheduled on
 > another CPU without warning, unless you take a lock or explicitly
 > disable preemption.

It's a 2.4 patch. Leave preemption problems to those insane
enough to run 2.4+preempt.

        Dave

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] solved APM bug with -rc5
  2002-08-01 22:16               ` Alan Cox
@ 2002-08-01 21:07                 ` Willy Tarreau
  2002-08-01 21:47                   ` Linus Torvalds
  2002-08-02  0:12                 ` [PATCH] solved APM bug with -rc5 (take 2) Willy TARREAU
  1 sibling, 1 reply; 56+ messages in thread
From: Willy Tarreau @ 2002-08-01 21:07 UTC (permalink / raw)
  To: Alan Cox; +Cc: Willy TARREAU, Marcelo Tosatti, lkml

On Thu, Aug 01, 2002 at 11:16:23PM +0100, Alan Cox wrote:
> On Thu, 2002-08-01 at 21:35, Willy TARREAU wrote:
> > +	while (cpu_number_map(smp_processor_id()) != 0) {
> > +		schedule();
> > +	}
 
> What guarantees that loop will ever exit ?

none, as in the already existing other implementation. But at least, I'd
prefer an infinite loop instead of some random code being executed without
noticing it.

Do you know a better way of doing that ? The other implementation
used a fake thread which also did a schedule(). I wonder if this
is to make the scheduler work a bit more so that we get more
chances to swap the CPU.

Cheers,
Willy


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] solved APM bug with -rc5
  2002-08-01 20:54                 ` Richard Gooch
@ 2002-08-01 21:17                   ` Willy TARREAU
  2002-08-01 22:37                     ` Alan Cox
  0 siblings, 1 reply; 56+ messages in thread
From: Willy TARREAU @ 2002-08-01 21:17 UTC (permalink / raw)
  To: Richard Gooch; +Cc: lkml

On Thu, Aug 01, 2002 at 02:54:08PM -0600, Richard Gooch wrote:
> Richard Gooch writes:
> > Hm. I bet you didn't try this with CONFIG_PREEMPT=y, right? IIRC, the
> > wonderful world of preemption means that you can get rescheduled on
> > another CPU without warning, unless you take a lock or explicitly
> > disable preemption.
> 
> Apologies. I forgot that CONFIG_PREEMPT is a 2.5.x feature, and
> doesn't exist on 2.4 (thankfully).

Never mind, your comment is interesting anyway because it shows that
the preemption patch for 2.4 needs to adapt to such updates.

Thanks,
willy

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] solved APM bug with -rc5
  2002-08-01 21:07                 ` Willy Tarreau
@ 2002-08-01 21:47                   ` Linus Torvalds
  0 siblings, 0 replies; 56+ messages in thread
From: Linus Torvalds @ 2002-08-01 21:47 UTC (permalink / raw)
  To: linux-kernel

In article <20020801210745.GA20387@alpha.home.local>,
Willy Tarreau  <willy@w.ods.org> wrote:
>On Thu, Aug 01, 2002 at 11:16:23PM +0100, Alan Cox wrote:
>> On Thu, 2002-08-01 at 21:35, Willy TARREAU wrote:
>> > +	while (cpu_number_map(smp_processor_id()) != 0) {
>> > +		schedule();
>> > +	}
> 
>> What guarantees that loop will ever exit ?
>
>none, as in the already existing other implementation. But at least, I'd
>prefer an infinite loop instead of some random code being executed without
>noticing it.
>
>Do you know a better way of doing that ?

It should set its CPU affinity to be cpu0. I don't know how well that
works in 2.4.x, though. Ask Ingo..

		Linus

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] solved APM bug with -rc5
  2002-08-01 20:35             ` [PATCH] solved APM bug with -rc5 Willy TARREAU
  2002-08-01 20:52               ` Richard Gooch
@ 2002-08-01 22:16               ` Alan Cox
  2002-08-01 21:07                 ` Willy Tarreau
  2002-08-02  0:12                 ` [PATCH] solved APM bug with -rc5 (take 2) Willy TARREAU
  1 sibling, 2 replies; 56+ messages in thread
From: Alan Cox @ 2002-08-01 22:16 UTC (permalink / raw)
  To: Willy TARREAU; +Cc: Marcelo Tosatti, lkml

On Thu, 2002-08-01 at 21:35, Willy TARREAU wrote:

>  
> +#ifdef CONFIG_SMP
> +	/* 2002/08/01 - WT
> +	 * This is to avoid random crashes at boot time during initialization
> +	 * on SMP systems in case of "apm=power-off" mode. Seen on ASUS A7M266D.
> +	 * Some bioses don't like being called from CPU != 0.
> +	 */
> +	while (cpu_number_map(smp_processor_id()) != 0) {
> +		schedule();
> +	}
> +#endif

What guarantees that loop will ever exit ?


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] solved APM bug with -rc5
  2002-08-01 21:17                   ` Willy TARREAU
@ 2002-08-01 22:37                     ` Alan Cox
  0 siblings, 0 replies; 56+ messages in thread
From: Alan Cox @ 2002-08-01 22:37 UTC (permalink / raw)
  To: Willy TARREAU; +Cc: Richard Gooch, lkml

On Thu, 2002-08-01 at 22:17, Willy TARREAU wrote:
> On Thu, Aug 01, 2002 at 02:54:08PM -0600, Richard Gooch wrote:
> > Richard Gooch writes:
> > > Hm. I bet you didn't try this with CONFIG_PREEMPT=y, right? IIRC, the
> > > wonderful world of preemption means that you can get rescheduled on
> > > another CPU without warning, unless you take a lock or explicitly
> > > disable preemption.
> > 
> > Apologies. I forgot that CONFIG_PREEMPT is a 2.5.x feature, and
> > doesn't exist on 2.4 (thankfully).
> 
> Never mind, your comment is interesting anyway because it shows that
> the preemption patch for 2.4 needs to adapt to such updates.

Pre-emption for 2.4 needs to do a lot of work on raid and even athlon
compiles to fix the FPU stuff, let alone corner cases


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH] solved APM bug with -rc5 (take 2)
  2002-08-01 22:16               ` Alan Cox
  2002-08-01 21:07                 ` Willy Tarreau
@ 2002-08-02  0:12                 ` Willy TARREAU
  1 sibling, 0 replies; 56+ messages in thread
From: Willy TARREAU @ 2002-08-02  0:12 UTC (permalink / raw)
  To: Alan Cox, Marcelo Tosatti; +Cc: Linus Torvalds, Ingo Molnar, lkml

On Thu, Aug 01, 2002 at 11:16:23PM +0100, Alan Cox wrote:
> On Thu, 2002-08-01 at 21:35, Willy TARREAU wrote:
> > +#ifdef CONFIG_SMP
> > +	/* 2002/08/01 - WT
> > +	 * This is to avoid random crashes at boot time during initialization
> > +	 * on SMP systems in case of "apm=power-off" mode. Seen on ASUS A7M266D.
> > +	 * Some bioses don't like being called from CPU != 0.
> > +	 */
> > +	while (cpu_number_map(smp_processor_id()) != 0) {
> > +		schedule();
> > +	}
> > +#endif
> 
> What guarantees that loop will ever exit ?

I asked Ingo for some advice, and he kindly sent me a piece of code as an
example of how to reliably bind a task to a CPU. I tried it, and it works here:
I could reliably switch several times from cpu0 to cpu1, then back to cpu0.
Since it was cleaner than the previous method, I also did the same for
apm_power_off(), thus getting rid of apm_magic() and its dedicated thread.
Then I tested again with multiple cpu switches, and every time my system
handled the case correctly. I'm writing this mail under 2.4.19-rc5.

So here is the patch against 2.4.19-rc5, hoping it will get in this time.
I think it should apply without a glitch to 2.4.19-rc5-ac1, but I don't
know about 2.5, or even whether it is needed there.
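
For readers who want to try the "bind, then verify" idea outside the kernel,
here is a minimal userspace sketch. It is only an analogue and rests on
assumptions: it uses the glibc sched_setaffinity()/sched_getcpu() calls rather
than the cpus_allowed field touched by the patch, and pin_to_cpu() is a
hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/*
 * Userspace analogue only (hypothetical helper): restrict the calling
 * thread to one CPU and confirm that it is really running there.
 * Returns 0 on success, -1 on failure.
 */
int pin_to_cpu(int cpu)
{
	cpu_set_t mask;

	assert(cpu >= 0 && cpu < CPU_SETSIZE);

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);

	/* pid 0 means "the calling thread"; the kernel migrates it if needed. */
	if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
		return -1;

	/* Same kind of sanity check as the BUG() test, reported to the caller. */
	return sched_getcpu() == cpu ? 0 : -1;
}
```

A caller that must run on CPU 0 would call pin_to_cpu(0) and treat a non-zero
return as fatal, instead of looping on schedule() and hoping to land there.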

Feedback welcome, of course ;-)

Cheers,
Willy


--- linux-2.4.19-rc5/arch/i386/kernel/apm.c	Thu Aug  1 22:07:39 2002
+++ linux-2.4.19-rc5-fix/arch/i386/kernel/apm.c	Fri Aug  2 01:52:55 2002
@@ -862,14 +862,6 @@
 		apm_do_busy();
 }
 
-#ifdef CONFIG_SMP
-static int apm_magic(void * unused)
-{
-	while (1)
-		schedule();
-}
-#endif
-
 /**
  *	apm_power_off	-	ask the BIOS to power off
  *
@@ -897,10 +889,11 @@
 	 */
 #ifdef CONFIG_SMP
 	/* Some bioses don't like being called from CPU != 0 */
-	while (cpu_number_map(smp_processor_id()) != 0) {
-		kernel_thread(apm_magic, NULL,
-			CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD);
+	if (cpu_number_map(smp_processor_id()) != 0) {
+		current->cpus_allowed = 1;
 		schedule();
+		if (unlikely(cpu_number_map(smp_processor_id()) != 0))
+			BUG();
 	}
 #endif
 	if (apm_info.realmode_power_off)
@@ -1661,6 +1654,21 @@
 	strcpy(current->comm, "kapmd");
 	sigfillset(&current->blocked);
 
+#ifdef CONFIG_SMP
+	/* 2002/08/01 - WT
+	 * This is to avoid random crashes at boot time during initialization
+	 * on SMP systems in case of "apm=power-off" mode. Seen on ASUS A7M266D.
+	 * Some bioses don't like being called from CPU != 0.
+	 * Method suggested by Ingo Molnar.
+	 */
+	if (cpu_number_map(smp_processor_id()) != 0) {
+		current->cpus_allowed = 1;
+		schedule();
+		if (unlikely(cpu_number_map(smp_processor_id()) != 0))
+			BUG();
+	}
+#endif
+	
 	if (apm_info.connection_version == 0) {
 		apm_info.connection_version = apm_info.bios.version;
 		if (apm_info.connection_version > 0x100) {
@@ -1707,7 +1715,7 @@
 		}
 	}
 
-	if (debug && (smp_num_cpus == 1)) {
+	if (debug) {
 		error = apm_get_power_status(&bx, &cx, &dx);
 		if (error)
 			printk(KERN_INFO "apm: power status not available\n");


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH] pdc20265 problem.
  2002-08-01  6:38 Linux v2.4.19-rc5 Marcelo Tosatti
                   ` (3 preceding siblings ...)
  2002-08-01 12:12 ` Linux v2.4.19-rc5 - APM bug Willy TARREAU
@ 2002-08-02  1:47 ` Nick Orlov
  2002-08-02  2:29   ` Nick Orlov
  2002-08-02 12:27   ` Alan Cox
  4 siblings, 2 replies; 56+ messages in thread
From: Nick Orlov @ 2002-08-02  1:47 UTC (permalink / raw)
  To: lkml

[-- Attachment #1: Type: text/plain, Size: 329 bytes --]


> <marcelo@plucky.distro.conectiva> (02/07/19 1.646)
> 	Fix wrong #ifdef in ide-pci.c: Was causing problems with FastTrak

Because of this fix my Promise 20265 became ide0 instead of ide2.
Is there any reason to mark pdc20265 as ON_BOARD controller?

Anyway, the attached patch fixes it for me :)

-- 
With best wishes,
	Nick Orlov.


[-- Attachment #2: pcd20265.patch --]
[-- Type: text/plain, Size: 301 bytes --]

408c408
<         {DEVID_PDC20265,"PDC20265",	PCI_PDC202XX,	ATA66_PDC202XX,	INIT_PDC202XX,	NULL,		{{0x00,0x00,0x00}, {0x00,0x00,0x00}},	ON_BOARD,	48 },
---
>         {DEVID_PDC20265,"PDC20265",	PCI_PDC202XX,	ATA66_PDC202XX,	INIT_PDC202XX,	NULL,		{{0x00,0x00,0x00}, {0x00,0x00,0x00}},	OFF_BOARD,	48 },

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] pdc20265 problem.
  2002-08-02  1:47 ` [PATCH] pdc20265 problem Nick Orlov
@ 2002-08-02  2:29   ` Nick Orlov
  2002-08-02 12:27   ` Alan Cox
  1 sibling, 0 replies; 56+ messages in thread
From: Nick Orlov @ 2002-08-02  2:29 UTC (permalink / raw)
  To: lkml

[-- Attachment #1: Type: text/plain, Size: 458 bytes --]

On Thu, Aug 01, 2002 at 09:47:28PM -0400, Nick Orlov wrote:
> 
> > <marcelo@plucky.distro.conectiva> (02/07/19 1.646)
> > 	Fix wrong #ifdef in ide-pci.c: Was causing problems with FastTrak
> 
> Because of this fix my Promise 20265 became ide0 instead of ide2.
> Is there any reason to mark pdc20265 as ON_BOARD controller?
> 
> Anyway, the attached patch fixes it for me :)
> 

Sorry, wrong diff format. Rediffed and attached.

-- 
With best wishes,
	Nick Orlov.


[-- Attachment #2: pdc20265.patch --]
[-- Type: text/plain, Size: 1067 bytes --]

--- linux/drivers/ide/ide-pci.c.orig	2002-08-01 21:41:29.000000000 -0400
+++ linux/drivers/ide/ide-pci.c	2002-08-01 21:10:27.000000000 -0400
@@ -405,7 +405,7 @@
 #ifndef CONFIG_PDC202XX_FORCE
         {DEVID_PDC20246,"PDC20246",	PCI_PDC202XX,	NULL,		INIT_PDC202XX,	NULL,		{{0x00,0x00,0x00}, {0x00,0x00,0x00}},	OFF_BOARD,	16 },
         {DEVID_PDC20262,"PDC20262",	PCI_PDC202XX,	ATA66_PDC202XX,	INIT_PDC202XX,	NULL,		{{0x00,0x00,0x00}, {0x00,0x00,0x00}},	OFF_BOARD,	48 },
-        {DEVID_PDC20265,"PDC20265",	PCI_PDC202XX,	ATA66_PDC202XX,	INIT_PDC202XX,	NULL,		{{0x00,0x00,0x00}, {0x00,0x00,0x00}},	ON_BOARD,	48 },
+        {DEVID_PDC20265,"PDC20265",	PCI_PDC202XX,	ATA66_PDC202XX,	INIT_PDC202XX,	NULL,		{{0x00,0x00,0x00}, {0x00,0x00,0x00}},	OFF_BOARD,	48 },
         {DEVID_PDC20267,"PDC20267",	PCI_PDC202XX,	ATA66_PDC202XX,	INIT_PDC202XX,	NULL,		{{0x00,0x00,0x00}, {0x00,0x00,0x00}},	OFF_BOARD,	48 },
 #else /* !CONFIG_PDC202XX_FORCE */
 	{DEVID_PDC20246,"PDC20246",	PCI_PDC202XX,	NULL,		INIT_PDC202XX,	NULL,		{{0x50,0x02,0x02}, {0x50,0x04,0x04}}, 	OFF_BOARD,	16 },

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] pdc20265 problem.
  2002-08-02  1:47 ` [PATCH] pdc20265 problem Nick Orlov
  2002-08-02  2:29   ` Nick Orlov
@ 2002-08-02 12:27   ` Alan Cox
  2002-08-02 12:52     ` Nick Orlov
  1 sibling, 1 reply; 56+ messages in thread
From: Alan Cox @ 2002-08-02 12:27 UTC (permalink / raw)
  To: Nick Orlov; +Cc: lkml

On Fri, 2002-08-02 at 02:47, Nick Orlov wrote:
> 
> > <marcelo@plucky.distro.conectiva> (02/07/19 1.646)
> > 	Fix wrong #ifdef in ide-pci.c: Was causing problems with FastTrak
> 
> Because of this fix my Promise 20265 became ide0 instead of ide2.
> Is there any reason to mark pdc20265 as ON_BOARD controller?

How about because it can be, and it should be checked. I don't know what
is going on with the ifdef in your case to cause this, but it's not as
simple as it seems.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] pdc20265 problem.
  2002-08-02 12:27   ` Alan Cox
@ 2002-08-02 12:52     ` Nick Orlov
  2002-08-02 14:00       ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Orlov @ 2002-08-02 12:52 UTC (permalink / raw)
  To: lkml

On Fri, Aug 02, 2002 at 01:27:25PM +0100, Alan Cox wrote:
> On Fri, 2002-08-02 at 02:47, Nick Orlov wrote:
> > 
> > > <marcelo@plucky.distro.conectiva> (02/07/19 1.646)
> > > 	Fix wrong #ifdef in ide-pci.c: Was causing problems with FastTrak
> > 
> > Because of this fix my Promise 20265 became ide0 instead of ide2.
> > Is there any reason to mark pdc20265 as ON_BOARD controller?
> 
> How about because it can be and it should be checked. I don't know what
> is going on with the ifdef in your case to cause this but its not as
> simple as it seems

Why is the pdc20265 so special? All the other Promise controllers are marked
OFF_BOARD...

And what determines how IDs are assigned to the controllers if both of
them are ON_BOARD?

-- 
With best wishes,
	Nick Orlov.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] pdc20265 problem.
  2002-08-02 12:52     ` Nick Orlov
@ 2002-08-02 14:00       ` Bartlomiej Zolnierkiewicz
  2002-08-02 14:45         ` Nick Orlov
  0 siblings, 1 reply; 56+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2002-08-02 14:00 UTC (permalink / raw)
  To: Nick Orlov; +Cc: lkml


On Fri, 2 Aug 2002, Nick Orlov wrote:

> On Fri, Aug 02, 2002 at 01:27:25PM +0100, Alan Cox wrote:
> > On Fri, 2002-08-02 at 02:47, Nick Orlov wrote:
> > >
> > > > <marcelo@plucky.distro.conectiva> (02/07/19 1.646)
> > > > 	Fix wrong #ifdef in ide-pci.c: Was causing problems with FastTrak
> > >
> > > Because of this fix my Promise 20265 became ide0 instead of ide2.
> > > Is there any reason to mark pdc20265 as ON_BOARD controller?
> >
> > How about because it can be and it should be checked. I don't know what
> > is going on with the ifdef in your case to cause this but its not as
> > simple as it seems
>
> Why pdc20265 is so special ? All other Promises marked as OFF_BOARD...
>
> And what determines how id will be assigned to controllers if both of
> them are ON_BOARD ?

AFAIR the problem is that some vendors shipped an onboard 20265 as the primary
device (playing tricks to do so), and to be consistent we have to treat it as
onboard; right now we have no way to check whether it is onboard or offboard.
EDD support will probably help here.

Regards
--
Bartlomiej

> --
> With best wishes,
> 	Nick Orlov.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] pdc20265 problem.
  2002-08-02 14:00       ` Bartlomiej Zolnierkiewicz
@ 2002-08-02 14:45         ` Nick Orlov
  0 siblings, 0 replies; 56+ messages in thread
From: Nick Orlov @ 2002-08-02 14:45 UTC (permalink / raw)
  To: lkml

On Fri, Aug 02, 2002 at 04:00:32PM +0200, Bartlomiej Zolnierkiewicz wrote:
> 
> On Fri, 2 Aug 2002, Nick Orlov wrote:
> 
> > On Fri, Aug 02, 2002 at 01:27:25PM +0100, Alan Cox wrote:
> > > On Fri, 2002-08-02 at 02:47, Nick Orlov wrote:
> > > >
> > > > > <marcelo@plucky.distro.conectiva> (02/07/19 1.646)
> > > > > 	Fix wrong #ifdef in ide-pci.c: Was causing problems with FastTrak
> > > >
> > > > Because of this fix my Promise 20265 became ide0 instead of ide2.
> > > > Is there any reason to mark pdc20265 as ON_BOARD controller?
> > >
> > > How about because it can be and it should be checked. I don't know what
> > > is going on with the ifdef in your case to cause this but its not as
> > > simple as it seems
> >
> > Why pdc20265 is so special ? All other Promises marked as OFF_BOARD...
> >
> > And what determines how id will be assigned to controllers if both of
> > them are ON_BOARD ?
> 
> AFAIR problem is that some vendors included onboard 20265 as primary
> device (playing tricks for that) and to be consistent we have to treat it as
> onboard, we have right now no way to check if it is on or offboard.
> EDD support will probably help here.
> 

Just FYI,

before these "#ifdef" fixes it was treated as OFF_BOARD unless
CONFIG_PDC202XX_FORCE was set (now it's inverted).

And my point is that it does not matter how this controller is physically
installed, onboard or offboard. The idea is that we should be able to control
which controller is treated as "primary" (ide0/1) and which as
"secondary" (ide2/3). I don't see how we can do that unless we mark
one of the controllers ON_BOARD and the other OFF_BOARD and play with
CONFIG_BLK_DEV_OFFBOARD.

And I also don't believe that it is a good idea to treat one of the Promises so
differently.
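
To make the ordering question concrete, here is a toy model of the slot
assignment as I understand it from this thread. It is not the ide-pci.c code:
assign_slots(), ON_BOARD_DEMO and OFF_BOARD_DEMO are made-up names, and the
rule that on-board controllers search from ide0 while off-board ones start at
ide2 is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 4	/* ide0..ide3, two channels per controller */

/* Hypothetical stand-ins for the real ON_BOARD/OFF_BOARD flags. */
enum placement { OFF_BOARD_DEMO, ON_BOARD_DEMO };

/*
 * Claim the first free pair of slots for one controller.
 * Returns the index of the first slot claimed, or -1 if none is free.
 */
int assign_slots(int used[MAX_SLOTS], enum placement p)
{
	/* Assumed rule: on-board searches from ide0, off-board from ide2. */
	int start = (p == ON_BOARD_DEMO) ? 0 : 2;
	int i;

	assert(used != NULL);
	for (i = start; i + 1 < MAX_SLOTS; i += 2) {
		if (!used[i] && !used[i + 1]) {
			used[i] = used[i + 1] = 1;
			return i;
		}
	}
	return -1;
}
```

In this model two ON_BOARD controllers get ide0/1 and ide2/3 purely in probe
order, while an OFF_BOARD one leaves ide0/1 free no matter when it is probed.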

-- 
With best wishes,
	Nick Orlov.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Linux v2.4.19-rc5
  2002-08-01  7:55 ` Keith Owens
  2002-08-01  8:10   ` Jens Axboe
@ 2002-08-04  6:50   ` H. Peter Anvin
  1 sibling, 0 replies; 56+ messages in thread
From: H. Peter Anvin @ 2002-08-04  6:50 UTC (permalink / raw)
  To: Keith Owens; +Cc: Marcelo Tosatti, ftpadmin, lkml

Keith Owens wrote:
> patch-2.4.19-rc5.gz has been there for 25 minutes but the .bz2 file and
> the signature have not been created yet.  Is there a problem with the
> automatic conversion and signing code on master?

The sign/convert/upload machinery is sometimes slow when it is either 
transferring large files, or doing its daily "rsync --checksum" for 
paranoia's sake.  The latter happens at 00:00 local time, currently 
17:00 UTC.

	-hpa


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Linux v2.4.19-rc5
  2002-08-01 20:15     ` Steven Cole
@ 2002-08-06  3:46       ` Bill Davidsen
  2002-08-06  4:30         ` Andrew Morton
                           ` (2 more replies)
  0 siblings, 3 replies; 56+ messages in thread
From: Bill Davidsen @ 2002-08-06  3:46 UTC (permalink / raw)
  To: Steven Cole; +Cc: Marcelo Tosatti, Jens Axboe, lkml, Andrew Morton, Steven Cole

On 1 Aug 2002, Steven Cole wrote:

> Here are some dbench numbers, from the "for what it's worth" department.
> This was done with SMP kernels, on a dual p3 box, SCSI disk, ext2.
> The first column is dbench clients.  The numbers are throughput
> in MB/sec.  The 2.5.29 kernel had a few RR-supplied smp fixes.
> Looks like for this limited test, 2.4.19-rc5 holds up pretty well.
> I've also ran this set of tests several times on -rc5 using ext3
> and data=writeback, and everything looks fine.
> 
> Steven

Call me an optimist, but after all the reliability problems we had with the
2.5 series, I sort of hoped it would be better in performance, not
increasingly worse. Am I misreading this? Can we fall back to the faster
2.4 code :-(
 
> 		2.4.19-rc2	2.4.19-rc5	2.5.29
> 
> 1		114.616		113.402		112.668
> 2		173.234		183.829		175.148
> 3		185.995		187.411		184.63
> 4		185.447		186.891		188.199
> 6		191.115		191.439		191.787
> 8		191.962		191.551		191.53
> 10		192.984		194.036		194.923
> 12		183.847		185.73		195.328
> 16		183.609		183.439		196.224
> 20		181.519		179.956		193.681
> 24		183.509		183.387		194.09
> 28		176.04		175.832		169.326
> 32		174.583		163.09		137.815
> 36		155.04		164.154		121.861
> 40		155.37		156.028		102.014
> 44		152.546		138.171		91.6088
> 48		146.419		135.447		84.3884
> 52		139.788		125.968		89.2374
> 56		113.933		122.592		81.021
> 64		110.792		106.484		84.648
> 80				87.4692		60.6054
> 96				87.7201		57.9622
> 112				74.9503		49.468
> 128				67.2649		47.0254
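
The table is easier to judge as relative differences between two columns. A
minimal sketch of that arithmetic follows; percent_change() is a hypothetical
helper, not something dbench provides.

```c
#include <assert.h>

/* Percentage change of `candidate` relative to `baseline` (positive = faster). */
double percent_change(double baseline, double candidate)
{
	assert(baseline > 0.0);
	return (candidate - baseline) / baseline * 100.0;
}
```

Fed with the 2.4.19-rc5 and 2.5.29 columns above, this gives roughly +7% for
2.5.29 at 16 clients (183.439 vs 196.224) and roughly -20% at 64 clients
(106.484 vs 84.648); the large gap only opens up from about 28 clients upward.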

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Linux v2.4.19-rc5
  2002-08-06  3:46       ` Bill Davidsen
@ 2002-08-06  4:30         ` Andrew Morton
  2002-08-06 14:07           ` Steven Cole
  2002-08-06  5:42         ` Jens Axboe
  2002-08-06 12:59         ` Rik van Riel
  2 siblings, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2002-08-06  4:30 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Steven Cole, Marcelo Tosatti, Jens Axboe, lkml, Steven Cole

Bill Davidsen wrote:
> 
> On 1 Aug 2002, Steven Cole wrote:
> 
> > Here are some dbench numbers, from the "for what it's worth" department.
> > This was done with SMP kernels, on a dual p3 box, SCSI disk, ext2.
> > The first column is dbench clients.  The numbers are throughput
> > in MB/sec.  The 2.5.29 kernel had a few RR-supplied smp fixes.
> > Looks like for this limited test, 2.4.19-rc5 holds up pretty well.
> > I've also ran this set of tests several times on -rc5 using ext3
> > and data=writeback, and everything looks fine.
> >
> > Steven
> 
> Call me an optimist, but after all the reliability problems we had with the
> 2.5 series, I sort of hoped it would be better in performance, not
> increasingly worse. Am I misreading this? Can we fall back to the faster
> 2.4 code :-(

IO in 2.5 is much more CPU efficient than in 2.4, and straight-line
bandwidth is better as well.

The scheduling of that IO has a few problems, so in wildly seeky loads
like dbench the kernel still falls over its own feet a bit.  The
two main culprits here are the lock_buffer() in block_write_full_page()
against the blockdev mapping, and the writeback of dirty pages from the
tail of the LRU in page reclaim.

And no, the eventual dbench numbers will not be a measure of the success
of the tuning which will happen on the run in to 2.6.  Dbench throughput
may well be lower, because we probably should be starting writeback
at lower dirty thresholds.

If you want good dbench numbers:

echo 70 > /proc/sys/vm/dirty_background_ratio
echo 75 > /proc/sys/vm/dirty_async_ratio
echo 80 > /proc/sys/vm/dirty_sync_ratio
echo 30000 > /proc/sys/vm/dirty_expire_centisecs

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Linux v2.4.19-rc5
  2002-08-06  3:46       ` Bill Davidsen
  2002-08-06  4:30         ` Andrew Morton
@ 2002-08-06  5:42         ` Jens Axboe
  2002-08-06  8:30           ` Adrian Bunk
  2002-08-06 10:31           ` Lincoln Dale
  2002-08-06 12:59         ` Rik van Riel
  2 siblings, 2 replies; 56+ messages in thread
From: Jens Axboe @ 2002-08-06  5:42 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Steven Cole, Marcelo Tosatti, lkml, Andrew Morton, Steven Cole

On Mon, Aug 05 2002, Bill Davidsen wrote:
> On 1 Aug 2002, Steven Cole wrote:
> 
> > Here are some dbench numbers, from the "for what it's worth" department.
> > This was done with SMP kernels, on a dual p3 box, SCSI disk, ext2.
> > The first column is dbench clients.  The numbers are throughput
> > in MB/sec.  The 2.5.29 kernel had a few RR-supplied smp fixes.
> > Looks like for this limited test, 2.4.19-rc5 holds up pretty well.
> > I've also ran this set of tests several times on -rc5 using ext3
> > and data=writeback, and everything looks fine.
> > 
> > Steven
> 
> Call me an optimist, but after all the reliability problems we had with the
> 2.5 series, I sort of hoped it would be better in performance, not
> increasingly worse. Am I misreading this? Can we fall back to the faster
> 2.4 code :-(

try a workload that exercises the block i/o layer alone (O_DIRECT,
raw, whatnot) and then compare 2.4 and 2.5. ibm had some slides on this
from ols, unfortunately I don't know if they have them online.

please don't put too much weight on dbench numbers for this sort of thing
:-)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Linux v2.4.19-rc5
  2002-08-06  5:42         ` Jens Axboe
@ 2002-08-06  8:30           ` Adrian Bunk
  2002-08-06  8:48             ` Jens Axboe
  2002-08-06 10:31           ` Lincoln Dale
  1 sibling, 1 reply; 56+ messages in thread
From: Adrian Bunk @ 2002-08-06  8:30 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Bill Davidsen, lkml

On Tue, 6 Aug 2002, Jens Axboe wrote:

>...
> try a workload that exercises the block i/o layer alone (O_DIRECT,
> raw, whatnot) and then compare 2.4 and 2.5. ibm had some slides on this
> from ols, unfortunately I don't know if they have them online.
>...

Pages 390-406 in

  http://www.linux.org.uk/~ajh/ols2002_proceedings.pdf.gz

or are you talking about something different?

cu
Adrian

-- 

You only think this is a free country. Like the US the UK spends a lot of
time explaining its a free country because its a police state.
								Alan Cox


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Linux v2.4.19-rc5
  2002-08-06  8:30           ` Adrian Bunk
@ 2002-08-06  8:48             ` Jens Axboe
  0 siblings, 0 replies; 56+ messages in thread
From: Jens Axboe @ 2002-08-06  8:48 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: Bill Davidsen, lkml

On Tue, Aug 06 2002, Adrian Bunk wrote:
> On Tue, 6 Aug 2002, Jens Axboe wrote:
> 
> >...
> > try a workload that exercises the block i/o layer alone (O_DIRECT,
> > raw, whatnot) and then compare 2.4 and 2.5. ibm had some slides on this
> > from ols, unfortunately I don't know if they have them online.
> >...
> 
> Pages 390-406 in
> 
>   http://www.linux.org.uk/~ajh/ols2002_proceedings.pdf.gz
> 
> or are you talking about something different?

Right thanks, exactly those. Table 3 on page 395 is the one I noted.
Forget readv, as that hasn't been done in 2.5 yet. I'd say a 2.5.17
untweaked kernel beating 2.4 tweaked beyond recognition isn't too shabby
for a devel series kernel.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Linux v2.4.19-rc5
  2002-08-06  5:42         ` Jens Axboe
  2002-08-06  8:30           ` Adrian Bunk
@ 2002-08-06 10:31           ` Lincoln Dale
  1 sibling, 0 replies; 56+ messages in thread
From: Lincoln Dale @ 2002-08-06 10:31 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Bill Davidsen, Steven Cole, Marcelo Tosatti, lkml, Andrew Morton,
	Steven Cole

At 07:42 AM 6/08/2002 +0200, Jens Axboe wrote:
> > Call me an optimist, but after all the reliability problems we had with the
> > 2.5 series, I sort of hoped it would be better in performance, not
> > increasingly worse. Am I misreading this? Can we fall back to the faster
> > 2.4 code :-(
>
>try a workload that exercises the block i/o layer alone (O_DIRECT,
>raw, whatnot) and then compare 2.4 and 2.5. ibm had some slides on this
>from ols, unfortunately I don't know if they have them online.

the BIO in 2.5 kicks butt over the 2.4 BIO - both in terms of increased 
throughput and decreased cpu utilization.
see some testing i previously did: 
http://marc.theaimsgroup.com/?l=linux-kernel&m=102635456620627&w=2


cheers,

lincoln.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Linux v2.4.19-rc5
  2002-08-06  3:46       ` Bill Davidsen
  2002-08-06  4:30         ` Andrew Morton
  2002-08-06  5:42         ` Jens Axboe
@ 2002-08-06 12:59         ` Rik van Riel
  2002-08-07  1:09           ` Bill Davidsen
  2 siblings, 1 reply; 56+ messages in thread
From: Rik van Riel @ 2002-08-06 12:59 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Steven Cole, Marcelo Tosatti, Jens Axboe, lkml, Andrew Morton,
	Steven Cole

On Mon, 5 Aug 2002, Bill Davidsen wrote:

> > Here are some dbench numbers, from the "for what it's worth" department.
>
> Call me an optimist, but after all the reliability problems we had with the
> 2.5 series, I sort of hoped it would be better in performance, not
> increasingly worse. Am I misreading this? Can we fall back to the faster
> 2.4 code :-(

Dbench is at its best when half (or more) of the dbench processes
are stuck semi-infinitely in __get_request_wait and the others can
operate in RAM without ever touching the disk.

In effect, if you want the best dbench throughput you should make
the system completely unsuitable for real world applications ;)

There are a few things that are good for both real world performance
and dbench performance, but those are easily dwarfed by random factors
like IO scheduling, timeslice length, etc...

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/



* Re: Linux v2.4.19-rc5
  2002-08-06  4:30         ` Andrew Morton
@ 2002-08-06 14:07           ` Steven Cole
  2002-08-06 14:20             ` Rik van Riel
  2002-08-06 17:12             ` Andrew Morton
  0 siblings, 2 replies; 56+ messages in thread
From: Steven Cole @ 2002-08-06 14:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Bill Davidsen, Marcelo Tosatti, Jens Axboe, lkml

On Mon, 2002-08-05 at 22:30, Andrew Morton wrote:
[snipped]
> 
> IO in 2.5 is much more CPU efficient than in 2.4, and straight-line
> bandwidth is better as well.
> 
> The scheduling of that IO has a few problems, so in wildly seeky loads
> like dbench the kernel still falls over its own feet a bit.  The
> two main culprits here are the lock_buffer() in block_write_full_page()
> against the blockdev mapping, and the writeback of dirty pages from the
> tail of the LRU in page reclaim.
> 
> And no, the eventual dbench numbers will not be a measure of the success
> of the tuning which will happen on the run in to 2.6.  Dbench throughput
> may well be lower, because we probably should be starting writeback
> at lower dirty thresholds.
> 
> If you want good dbench numbers:
> 
> echo 70 > /proc/sys/vm/dirty_background_ratio
> echo 75 > /proc/sys/vm/dirty_async_ratio
> echo 80 > /proc/sys/vm/dirty_sync_ratio
> echo 30000 > /proc/sys/vm/dirty_expire_centisecs

That last one looks like the biggest cheat.  Rather than optimizing for
dbench, is there a set of pessimizing numbers which would optimally turn
dbench into a semi-useful tool for measuring meaningful IO performance? 
Or is dbench really only useful for stress testing?

Thanks for the explanations.

Steven



* Re: Linux v2.4.19-rc5
  2002-08-06 14:07           ` Steven Cole
@ 2002-08-06 14:20             ` Rik van Riel
  2002-08-06 17:12             ` Andrew Morton
  1 sibling, 0 replies; 56+ messages in thread
From: Rik van Riel @ 2002-08-06 14:20 UTC (permalink / raw)
  To: Steven Cole
  Cc: Andrew Morton, Bill Davidsen, Marcelo Tosatti, Jens Axboe, lkml

On 6 Aug 2002, Steven Cole wrote:

> That last one looks like the biggest cheat.  Rather than optimizing for
> dbench, is there a set of pessimizing numbers which would optimally turn
> dbench into a semi-useful tool for measuring meaningful IO performance?
> Or is dbench really only useful for stress testing?

Yes, dbench is only useful as a stress testing tool.

A minor variation in kernel behaviour can change dbench
throughput by an order of magnitude and I'm not talking
about any specific kernel component here ... ANY kernel
component could trigger it.

While it is easy to measure dbench throughput, it is
nearly impossible to:

1) analyse why dbench throughput changed from kernel to kernel

2) predict the relation (if any) these changes in dbench
   throughput have with changes in performance of real
   applications

3) identify which kernel subsystem was responsible for the
   change in dbench performance

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/



* Re: Linux v2.4.19-rc5
  2002-08-06 14:07           ` Steven Cole
  2002-08-06 14:20             ` Rik van Riel
@ 2002-08-06 17:12             ` Andrew Morton
  1 sibling, 0 replies; 56+ messages in thread
From: Andrew Morton @ 2002-08-06 17:12 UTC (permalink / raw)
  To: Steven Cole; +Cc: Bill Davidsen, Marcelo Tosatti, Jens Axboe, lkml

Steven Cole wrote:
> 
> ...
> > If you want good dbench numbers:
> >
> > echo 70 > /proc/sys/vm/dirty_background_ratio
> > echo 75 > /proc/sys/vm/dirty_async_ratio
> > echo 80 > /proc/sys/vm/dirty_sync_ratio
> > echo 30000 > /proc/sys/vm/dirty_expire_centisecs
> 
> That last one looks like the biggest cheat.  Rather than optimizing for
> dbench, is there a set of pessimizing numbers which would optimally turn
> dbench into a semi-useful tool for measuring meaningful IO performance?
> Or is dbench really only useful for stress testing?
> 

We tend to use dbench in two modes nowadays.  One is the "RAM only"
mode, where the run completes before hitting disk at all.  That's
a very useful and repeatable test for CPU efficiency and lock contention.

The other mode is of course when there are enough clients and
enough dirty data for the test to go to disk.  As Rik says, this
tends to be subject to chaotic effects, and it is also extremely
non-linear.

That is because when the run slows down a little bit, it takes longer, so
more data becomes eligible for time-expiry-based writeback, which
causes more IO, which causes the run to take longer, etc, etc.

Yes, one does tend still to keep one's eye on the "heavy" dbench
throughput, but I suspect that tuning for this workload is a bad
thing overall.  This is because good dbench numbers come from
allowing a large amount of dirty data to float about in memory
(it will never get written out).  But for real workloads which
don't delete their own output 30 seconds later, we want to start
writeback earlier, to use the disk bandwidth more smoothly
and to decrease memory allocation latency.
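
A toy model of that feedback loop, as a sketch only. It is not taken from
the kernel: run_time, base_s, expire_s and io_penalty are made-up names, and
the lumped io_penalty constant is an assumption standing in for everything
the disk and the IO scheduler actually do. The model says the run takes
base_s seconds if nothing reaches disk, and every second the run lasts
beyond the expiry age adds io_penalty seconds of writeback stall, which in
turn lengthens the run.

```python
def run_time(base_s, expire_s, io_penalty, max_iter=1000, tol=1e-9):
    """Iterate T = base_s + io_penalty * max(0, T - expire_s) to a fixed point.

    base_s:     seconds the run would take with no disk IO at all (assumed)
    expire_s:   age at which dirty data becomes eligible for writeback
    io_penalty: extra seconds of run time per second the run lasts past
                expire_s (hypothetical lumped constant)
    Returns the settled run time, or infinity if the loop never settles.
    """
    t = base_s
    for _ in range(max_iter):
        # Only the part of the run beyond the expiry age triggers writeback.
        nxt = base_s + io_penalty * max(0.0, t - expire_s)
        if abs(nxt - t) < tol:
            return nxt
        t = nxt
    return float("inf")  # feedback did not settle within max_iter steps


if __name__ == "__main__":
    # Illustrative inputs only, not measurements.
    for base in (25, 40, 44):
        print(base, run_time(base, expire_s=30, io_penalty=0.5))
```

With these illustrative inputs, a run that would finish in 25 s against a
30 s expiry never touches disk and stays at 25 s (the "RAM only" mode). A
40 s run with io_penalty 0.5 settles near 50 s, and slowing the base by 4 s
to 44 s costs about 8 s at the end, which is the amplification described
above. At io_penalty 1.0 or more the loop never settles and the function
returns infinity.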


* Re: Linux v2.4.19-rc5
  2002-08-06 12:59         ` Rik van Riel
@ 2002-08-07  1:09           ` Bill Davidsen
  2002-08-07  2:54             ` Steven Cole
  0 siblings, 1 reply; 56+ messages in thread
From: Bill Davidsen @ 2002-08-07  1:09 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Steven Cole, Marcelo Tosatti, Jens Axboe, lkml, Andrew Morton,
	Steven Cole

On Tue, 6 Aug 2002, Rik van Riel wrote:

> On Mon, 5 Aug 2002, Bill Davidsen wrote:
> 
> > > Here are some dbench numbers, from the "for what it's worth" department.
> >
> > Call me an optimist, but after all the reliability problems we had in the
> > 2.5 series, I sort of hoped it would be better in performance, not
> > increasingly worse. Am I misreading this? Can we fall back to the faster
> > 2.4 code :-(
> 
> Dbench is at its best when half (or more) of the dbench processes
> are stuck semi-infinitely in __get_request_wait and the others can
> operate in RAM without ever touching the disk.
> 
> In effect, if you want the best dbench throughput you should make
> the system completely unsuitable for real world applications ;)

I assumed that the posted results were apples and apples. That may not be
the case. If this was one kernel tuned for dbench and one for something
else, then the information content is pretty low, to me at least. But if
it is both tuned or both stock, then I would hope 2.5 would be better. If
the text said that and I read past it, I apologise.
 
> There are a few things that are good for both real world performance
> and dbench performance, but those are easily dwarfed by random factors
> like IO scheduling, timeslice length, etc...

I confess to being a kernel junkie when I have the time; I have run into
the limitation of 19 boot stanzas in LILO :-( I have a case statement in
rc.local to tune -aa VM, stock, and -ac rmap a little differently, since
this machine is fairly fast and has biggish memory (2GB this week) and
getting several ISO images in RAM and then having bdflush kick them out is
bad. Looking forward to the io scheduler.

I'd like to see 2.4.19 vs. 2.5.{29+} both tuned and untuned, but I have no
days off in the next ten. By then there will be more new stuff, but the
fast machine will be several area codes away. Perhaps one of the people
who like to do benchmarks might be too curious to wait.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.



* Re: Linux v2.4.19-rc5
  2002-08-07  1:09           ` Bill Davidsen
@ 2002-08-07  2:54             ` Steven Cole
  2002-08-07 22:30               ` Bill Davidsen
  0 siblings, 1 reply; 56+ messages in thread
From: Steven Cole @ 2002-08-07  2:54 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Rik van Riel, Marcelo Tosatti, Jens Axboe, lkml, Andrew Morton,
	Steven Cole

On Tue, 2002-08-06 at 19:09, Bill Davidsen wrote:
> On Tue, 6 Aug 2002, Rik van Riel wrote:
> 
> > On Mon, 5 Aug 2002, Bill Davidsen wrote:
> > 
> > > > Here are some dbench numbers, from the "for what it's worth" department.
> > >
> > > Call me an optimist, but after all the reliability problems we had in the
> > > 2.5 series, I sort of hoped it would be better in performance, not
> > > increasingly worse. Am I misreading this? Can we fall back to the faster
> > > 2.4 code :-(
> > 
> > Dbench is at its best when half (or more) of the dbench processes
> > are stuck semi-infinitely in __get_request_wait and the others can
> > operate in RAM without ever touching the disk.
> > 
> > In effect, if you want the best dbench throughput you should make
> > the system completely unsuitable for real world applications ;)
> 
> I assumed that the posted results were apples and apples. That may not be

Well, maybe Granny Smiths and Red Delicious. The problem with dbench is
that it checks how well they roll and bounce.  But even that can be
important sometimes. ;)

> the case. If this was one kernel tuned for dbench and one for something
> else, then the information content is pretty low, to me at least. But if
> it is both tuned or both stock, then I would hope 2.5 would be better. If
> the text said that and I read past it, I apologise.

All kernels were stock as patched with no special changes to 
/proc/sys/vm/bdflush for 2.4.x or to /proc/sys/vm/dirty* for 2.5.x.
Sorry, I didn't explicitly state that in the initial report.
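
Since the tunables matter this much, it can help to record them next to
each benchmark result. A minimal sketch, assuming a Linux /proc layout like
the one named above; snapshot_tunables is a hypothetical helper, not part of
dbench or any kernel tool. On a 2.4.x kernel the pattern to pass would be
/proc/sys/vm/bdflush instead of the default.

```python
import glob
import os


def snapshot_tunables(pattern="/proc/sys/vm/dirty*"):
    """Return {file name: contents} for every readable file matching pattern."""
    values = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                values[os.path.basename(path)] = f.read().strip()
        except OSError:
            # Entries that cannot be read (permissions, directories) are skipped.
            continue
    return values


if __name__ == "__main__":
    # Print the current settings so they can be pasted into a report.
    for name, value in snapshot_tunables().items():
        print(f"{name} = {value}")
```

Saving this output with every dbench run makes it possible to say later
whether two sets of numbers were taken with the same writeback settings.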

Steven



* Re: Linux v2.4.19-rc5
  2002-08-07  2:54             ` Steven Cole
@ 2002-08-07 22:30               ` Bill Davidsen
  2002-08-07 22:39                 ` Rik van Riel
  0 siblings, 1 reply; 56+ messages in thread
From: Bill Davidsen @ 2002-08-07 22:30 UTC (permalink / raw)
  To: Steven Cole
  Cc: Rik van Riel, Marcelo Tosatti, Jens Axboe, lkml, Andrew Morton,
	Steven Cole

On 6 Aug 2002, Steven Cole wrote:

> On Tue, 2002-08-06 at 19:09, Bill Davidsen wrote:

> > I assumed that the posted results were apples and apples. That may not be
> 
> Well, maybe Granny Smiths and Red Delicious. The problem with dbench is
> that it checks how well they roll and bounce.  But even that can be
> important sometimes. ;)
> 
> > the case. If this was one kernel tuned for dbench and one for something
> > else, then the information content is pretty low, to me at least. But if
> > it is both tuned or both stock, then I would hope 2.5 would be better. If
> > the text said that and I read past it, I apologise.
> 
> All kernels were stock as patched with no special changes to 
> /proc/sys/vm/bdflush for 2.4.x or to /proc/sys/vm/dirty* for 2.5.x.
> Sorry, I didn't explicitly state that in the initial report.

Actually that was what I was assuming when I noted that the 2.5 appeared
to be slower by a good bit for some high load values of dbench. In a
perfect world the kernel would hit the hardware speed; I guess no one is
claiming that until 2.7 ;-)

The initial results from the io scheduler, as posted here, look as if
there will be a way to "take it up another notch" in the future.

Thanks much for the clarification, the data are useful even if they do
show room for improvement in the corner case.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.



* Re: Linux v2.4.19-rc5
  2002-08-07 22:30               ` Bill Davidsen
@ 2002-08-07 22:39                 ` Rik van Riel
  2002-08-07 23:44                   ` Bill Davidsen
  0 siblings, 1 reply; 56+ messages in thread
From: Rik van Riel @ 2002-08-07 22:39 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Steven Cole, Marcelo Tosatti, Jens Axboe, lkml, Andrew Morton,
	Steven Cole

On Wed, 7 Aug 2002, Bill Davidsen wrote:

> Thanks much for the clarification, the data are useful even if they do
> show room for improvement in the corner case.

If dbench numbers are meaningful to you, maybe you could
translate them into something kernel developers can
understand ? ;)

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/



* Re: Linux v2.4.19-rc5
  2002-08-07 22:39                 ` Rik van Riel
@ 2002-08-07 23:44                   ` Bill Davidsen
  2002-08-07 23:53                     ` Rik van Riel
  0 siblings, 1 reply; 56+ messages in thread
From: Bill Davidsen @ 2002-08-07 23:44 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Steven Cole, Marcelo Tosatti, Jens Axboe, lkml, Andrew Morton,
	Steven Cole

On Wed, 7 Aug 2002, Rik van Riel wrote:

> On Wed, 7 Aug 2002, Bill Davidsen wrote:
> 
> > Thanks much for the clarification, the data are useful even if they do
> > show room for improvement in the corner case.
> 
> If dbench numbers are meaningful to you, maybe you could
> translate them into something kernel developers can
> understand ? ;)

Sure, glad to. If the 2.5 numbers are much worse than 2.4, something isn't
working as well: another problem, go have a beer to drown your sorrow. On
the other hand if it runs much better, you have done a great job and can
go have a beer to celebrate.

Seriously, I would read the reasonably smooth curve of values as a good
sign, as opposed to "gets real bad and improves under more load" or
similar. And the fact that it stayed up, and presumably didn't eat all the
filesystems, indicates that the IDE code is getting more stable.

One more thing, if you have been fighting bad machines for 15 hours and no
one is looking, it's time to go get a beer. And cashews, and cheddar. I am
out of here (as in where I am working right now, not my office).

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.



* Re: Linux v2.4.19-rc5
  2002-08-07 23:44                   ` Bill Davidsen
@ 2002-08-07 23:53                     ` Rik van Riel
  2002-08-09 17:46                       ` Bill Davidsen
  0 siblings, 1 reply; 56+ messages in thread
From: Rik van Riel @ 2002-08-07 23:53 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Steven Cole, Marcelo Tosatti, Jens Axboe, lkml, Andrew Morton,
	Steven Cole

On Wed, 7 Aug 2002, Bill Davidsen wrote:

> Sure, glad to. If the 2.5 numbers are much worse than 2.4, something
> isn't working as well,

Are you volunteering to identify that "something" for us ?

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/



* Re: Linux v2.4.19-rc5
  2002-08-07 23:53                     ` Rik van Riel
@ 2002-08-09 17:46                       ` Bill Davidsen
  2002-08-09 19:27                         ` Rik van Riel
  0 siblings, 1 reply; 56+ messages in thread
From: Bill Davidsen @ 2002-08-09 17:46 UTC (permalink / raw)
  To: Rik van Riel; +Cc: lkml

On Wed, 7 Aug 2002, Rik van Riel wrote:

> On Wed, 7 Aug 2002, Bill Davidsen wrote:
> 
> > Sure, glad to. If the 2.5 numbers are much worse than 2.4, something
> > isn't working as well,
> 
> Are you volunteering to identify that "something" for us ?

Hell no. I was simply commenting that there is some general qualitative
information available from those numbers, even if it is hard to quantify
them. Not working as well for a benchmark may indicate much better typical
performance, and as I understand dbench, the io scheduler may improve that
significantly as well.

No, clearly there are other, probably a lot more representative numbers,
which show 2.5 is better. "Isn't working as well" for one thing doesn't
mean "in general," but might be of interest to the primary developers.

The fact that the curve doesn't end in a reload from backup tells me that
the IDE code is much better than it was ;-)

What time I have for diddling kernel code is spent on making network code
changes, and is all against 2.4 base. 

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.



* Re: Linux v2.4.19-rc5
  2002-08-09 17:46                       ` Bill Davidsen
@ 2002-08-09 19:27                         ` Rik van Riel
  0 siblings, 0 replies; 56+ messages in thread
From: Rik van Riel @ 2002-08-09 19:27 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: lkml

On Fri, 9 Aug 2002, Bill Davidsen wrote:
> On Wed, 7 Aug 2002, Rik van Riel wrote:
> > On Wed, 7 Aug 2002, Bill Davidsen wrote:
> >
> > > Sure, glad to. If the 2.5 numbers are much worse than 2.4, something
> > > isn't working as well,
> >
> > Are you volunteering to identify that "something" for us ?
>
> Hell no. I was simply commenting that there is some general qualitative
> information available from those numbers, even if it is hard to quantify
> them.

As long as there is nobody to interpret what the dbench
numbers actually mean, why are we treating them as the
most important thing around ? ;)

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/



Thread overview: 56+ messages
2002-08-01  6:38 Linux v2.4.19-rc5 Marcelo Tosatti
2002-08-01  7:49 ` Jens Axboe
2002-08-01  7:14   ` Marcelo Tosatti
2002-08-01  8:10     ` Jens Axboe
2002-08-01  9:02       ` Andrew Morton
2002-08-01  8:58         ` Jens Axboe
2002-08-01 14:45         ` Steven Cole
2002-08-01 18:57           ` Andrew Morton
2002-08-01 20:15     ` Steven Cole
2002-08-06  3:46       ` Bill Davidsen
2002-08-06  4:30         ` Andrew Morton
2002-08-06 14:07           ` Steven Cole
2002-08-06 14:20             ` Rik van Riel
2002-08-06 17:12             ` Andrew Morton
2002-08-06  5:42         ` Jens Axboe
2002-08-06  8:30           ` Adrian Bunk
2002-08-06  8:48             ` Jens Axboe
2002-08-06 10:31           ` Lincoln Dale
2002-08-06 12:59         ` Rik van Riel
2002-08-07  1:09           ` Bill Davidsen
2002-08-07  2:54             ` Steven Cole
2002-08-07 22:30               ` Bill Davidsen
2002-08-07 22:39                 ` Rik van Riel
2002-08-07 23:44                   ` Bill Davidsen
2002-08-07 23:53                     ` Rik van Riel
2002-08-09 17:46                       ` Bill Davidsen
2002-08-09 19:27                         ` Rik van Riel
2002-08-01  7:55 ` Keith Owens
2002-08-01  8:10   ` Jens Axboe
2002-08-04  6:50   ` H. Peter Anvin
2002-08-01 11:32 ` Willy TARREAU
2002-08-01 13:54   ` Alan Cox
2002-08-01 12:48     ` Willy TARREAU
2002-08-01 12:12 ` Linux v2.4.19-rc5 - APM bug Willy TARREAU
2002-08-01 13:32   ` [PANIC] APM bug with -rc4 and -rc5 Willy TARREAU
2002-08-01 14:55     ` Alan Cox
2002-08-01 13:56       ` Willy Tarreau
2002-08-01 15:24         ` Willy Tarreau
2002-08-01 16:53           ` Alan Cox
2002-08-01 16:41             ` Willy Tarreau
2002-08-01 20:35             ` [PATCH] solved APM bug with -rc5 Willy TARREAU
2002-08-01 20:52               ` Richard Gooch
2002-08-01 20:54                 ` Richard Gooch
2002-08-01 21:17                   ` Willy TARREAU
2002-08-01 22:37                     ` Alan Cox
2002-08-01 20:58                 ` Dave Jones
2002-08-01 22:16               ` Alan Cox
2002-08-01 21:07                 ` Willy Tarreau
2002-08-01 21:47                   ` Linus Torvalds
2002-08-02  0:12                 ` [PATCH] solved APM bug with -rc5 (take 2) Willy TARREAU
2002-08-02  1:47 ` [PATCH] pdc20265 problem Nick Orlov
2002-08-02  2:29   ` Nick Orlov
2002-08-02 12:27   ` Alan Cox
2002-08-02 12:52     ` Nick Orlov
2002-08-02 14:00       ` Bartlomiej Zolnierkiewicz
2002-08-02 14:45         ` Nick Orlov
