linux-kernel.vger.kernel.org archive mirror
* Linux 2.4.21-rc7
@ 2003-06-03 17:04 Marcelo Tosatti
  2003-06-03 18:02 ` Tomas Szepe
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Marcelo Tosatti @ 2003-06-03 17:04 UTC (permalink / raw)
  To: lkml


Hallo,

Now I really hope its the last one, all this rc's are making me mad.

Ok, here it is.


Summary of changes from v2.4.21-rc6 to v2.4.21-rc7
============================================

<ehabkost@conectiva.com.br>:
  o [SPARC]: Export phys_base on sparc32

<jgarzik@pobox.com>:
  o fix olympic driver build

<lethal@linux-sh.org>:
  o Fix Solution Engine 7751 Build
  o Define VM_DATA_DEFAULT_FLAGS for SH

<wesolows@foobazco.org>:
  o [sparc]: Attempt mul/div emulation handling on all cpus

David S. Miller <davem@nuts.ninka.net>:
  o [SPARC]: Fix sys_ipc to return ENOSYS instead of EINVAL as appropriate
  o [SPARC64]: Implement dump_stack in 2.4.x
  o [SPARC64]: Only use power interrupt when button property exists
  o [IPV4/IPV6]: Use Jenkins hash for fragment reassembly handling
  o [IPV6]: Input full addresses into TCP_SYNQ hash function
  o [IPV4]: Add sysctl to control ipfrag_secret_interval
  o [SPARC64]: Fix probe error handling in envctrl.c driver
  o [SPARC64]: Fix probe error handling in bbc_{envctrl,i2c}.c driver
  o [SPARC64]: Fix exploitable holes and bugs in ioctl32 translations

Douglas Gilbert <dougg@torque.net>:
  o sg: Fix side effect introduced by last "off by one" fix

Eric Brower <ebrower@usa.net>:
  o [SPARC]: Refactor AUXIO support

Marcelo Tosatti <marcelo@freak.distro.conectiva>:
  o Changed EXTRAVERSION to -rc7

Pete Zaitcev <zaitcev@redhat.com>:
  o [sparc] Force type in __put_user
  o [SPARC]: Fix gcc-3.x builds

Rob Radez <rob@osinvestor.com>:
  o [sparc]: Fix uninitialized spinlock in SRMMU code
  o [SPARC]: Kill initialize_secondary, unused



* Re: Linux 2.4.21-rc7
  2003-06-03 17:04 Linux 2.4.21-rc7 Marcelo Tosatti
@ 2003-06-03 18:02 ` Tomas Szepe
  2003-06-03 18:07   ` Marcelo Tosatti
  2003-06-03 18:30 ` Alex Romosan
  2003-06-05 12:09 ` Andreas Haumer
  2 siblings, 1 reply; 19+ messages in thread
From: Tomas Szepe @ 2003-06-03 18:02 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: lkml, alan

> [marcelo@conectiva.com.br]
> 
> Now I really hope its the last one, all this rc's are making me mad.

Are you quite sure you don't want Alan to get you the updates necessary
for IDE to build as modules for .21 final?

-- 
Tomas Szepe <szepe@pinerecords.com>


* Re: Linux 2.4.21-rc7
  2003-06-03 18:02 ` Tomas Szepe
@ 2003-06-03 18:07   ` Marcelo Tosatti
  2003-06-03 19:15     ` lk
  0 siblings, 1 reply; 19+ messages in thread
From: Marcelo Tosatti @ 2003-06-03 18:07 UTC (permalink / raw)
  To: Tomas Szepe; +Cc: lkml, alan



On Tue, 3 Jun 2003, Tomas Szepe wrote:

> > [marcelo@conectiva.com.br]
> >
> > Now I really hope its the last one, all this rc's are making me mad.
>
> Are you quite sure you don't want Alan to get you the updates necessary
> for IDE to build as modules for .21 final?

Well, I can for sure release -rc8 with that.

I just want this possible -rc8 to be released no later than tonight.

Alan?


* Re: Linux 2.4.21-rc7
  2003-06-03 17:04 Linux 2.4.21-rc7 Marcelo Tosatti
  2003-06-03 18:02 ` Tomas Szepe
@ 2003-06-03 18:30 ` Alex Romosan
  2003-06-03 19:27   ` Jeff Garzik
  2003-06-05 12:09 ` Andreas Haumer
  2 siblings, 1 reply; 19+ messages in thread
From: Alex Romosan @ 2003-06-03 18:30 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: lkml

Marcelo Tosatti <marcelo@conectiva.com.br> writes:

> Now I really hope its the last one, all this rc's are making me mad.

i still can't get it to compile for sparc32:

gcc -D__KERNEL__ -I/usr/src/linux/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -fomit-frame-pointer -m32 -pipe -mno-fpu -fcall-used-g5 -fcall-used-g7   -nostdinc -iwithprefix include -DKBUILD_BASENAME=ksyms  -DEXPORT_SYMTAB -c ksyms.c
/usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_nocheck':
/usr/src/linux/include/asm/checksum.h:59: error: asm-specifier for variable `d' conflicts with asm clobber list
/usr/src/linux/include/asm/checksum.h:59: error: asm-specifier for variable `l' conflicts with asm clobber list
/usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_from_user':
/usr/src/linux/include/asm/checksum.h:81: error: asm-specifier for variable `d' conflicts with asm clobber list
/usr/src/linux/include/asm/checksum.h:81: error: asm-specifier for variable `l' conflicts with asm clobber list
/usr/src/linux/include/asm/checksum.h:81: error: asm-specifier for variable `s' conflicts with asm clobber list
/usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_to_user':
/usr/src/linux/include/asm/checksum.h:108: error: asm-specifier for variable `d' conflicts with asm clobber list
/usr/src/linux/include/asm/checksum.h:108: error: asm-specifier for variable `l' conflicts with asm clobber list
/usr/src/linux/include/asm/checksum.h:108: error: asm-specifier for variable `s' conflicts with asm clobber list
make[3]: *** [ksyms.o] Error 1
make[3]: Leaving directory `/usr/src/linux/kernel'
make[2]: *** [first_rule] Error 2
make[2]: Leaving directory `/usr/src/linux/kernel'
make[1]: *** [_dir_kernel] Error 2
make[1]: Leaving directory `/usr/src/linux'
make: *** [stamp-build] Error 2

not sure when this started. the last kernel i managed to compile was
rc2 (skipped rc3 and rc4, rc5 didn't compile). the last one that will
boot was 2.4.21-pre1. this is on a sun4m Fujitsu TurboSparc.

--alex--

-- 
| I believe the moment is at hand when, by a paranoiac and active |
|  advance of the mind, it will be possible (simultaneously with  |
|  automatism and other passive states) to systematize confusion  |
|  and thus to help to discredit completely the world of reality. |


* Re: Linux 2.4.21-rc7
  2003-06-03 18:07   ` Marcelo Tosatti
@ 2003-06-03 19:15     ` lk
  2003-06-03 19:40       ` Alan Cox
  0 siblings, 1 reply; 19+ messages in thread
From: lk @ 2003-06-03 19:15 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: lkml

> > > Now I really hope its the last one, all this rc's are making me mad.
> >
> > Are you quite sure you don't want Alan to get you the updates necessary
> > for IDE to build as modules for .21 final?
> 
> Well, I can for sure release -rc8 with that.
> 
> I just want this possible -rc8 to be released no later than tonight.

Unfortunately I just committed my test box to production and can't test 
Alan's SiImage fixes in rc6-ac2, but if they pan out, please try to 
include them in -rc8 as well.




* Re: Linux 2.4.21-rc7
  2003-06-03 18:30 ` Alex Romosan
@ 2003-06-03 19:27   ` Jeff Garzik
  2003-06-03 19:58     ` Alex Romosan
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Garzik @ 2003-06-03 19:27 UTC (permalink / raw)
  To: Alex Romosan; +Cc: Marcelo Tosatti, lkml

On Tue, Jun 03, 2003 at 11:30:59AM -0700, Alex Romosan wrote:
> Marcelo Tosatti <marcelo@conectiva.com.br> writes:
> 
> > Now I really hope its the last one, all this rc's are making me mad.
> 
> i still can't get it to compile for sparc32:
> 
> gcc -D__KERNEL__ -I/usr/src/linux/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -fomit-frame-pointer -m32 -pipe -mno-fpu -fcall-used-g5 -fcall-used-g7   -nostdinc -iwithprefix include -DKBUILD_BASENAME=ksyms  -DEXPORT_SYMTAB -c ksyms.c
> /usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_nocheck':
> /usr/src/linux/include/asm/checksum.h:59: error: asm-specifier for variable `d' conflicts with asm clobber list
> /usr/src/linux/include/asm/checksum.h:59: error: asm-specifier for variable `l' conflicts with asm clobber list
> /usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_from_user':

That looks like you either need a different compiler version,
or different binutils version...

	Jeff





* Re: Linux 2.4.21-rc7
  2003-06-03 19:15     ` lk
@ 2003-06-03 19:40       ` Alan Cox
  0 siblings, 0 replies; 19+ messages in thread
From: Alan Cox @ 2003-06-03 19:40 UTC (permalink / raw)
  To: lk; +Cc: Marcelo Tosatti, lkml

On Tue, 2003-06-03 at 20:15, lk@trolloc.com wrote:
> Unfortunately I just committed my test box to production and can't test 
> Alan's SiImage fixes in rc6-ac2, but if they pan out, please try to 
> include them in -rc8 as well.

You could add the dma autoenable but the rest should be avoided



* Re: Linux 2.4.21-rc7
  2003-06-03 19:27   ` Jeff Garzik
@ 2003-06-03 19:58     ` Alex Romosan
  2003-06-03 20:14       ` Tom Rini
  0 siblings, 1 reply; 19+ messages in thread
From: Alex Romosan @ 2003-06-03 19:58 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Marcelo Tosatti, lkml

Jeff Garzik <jgarzik@pobox.com> writes:

> On Tue, Jun 03, 2003 at 11:30:59AM -0700, Alex Romosan wrote:
>> Marcelo Tosatti <marcelo@conectiva.com.br> writes:
>> 
>> > Now I really hope its the last one, all this rc's are making me mad.
>> 
>> i still can't get it to compile for sparc32:
>> 
>> gcc -D__KERNEL__ -I/usr/src/linux/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -fomit-frame-pointer -m32 -pipe -mno-fpu -fcall-used-g5 -fcall-used-g7   -nostdinc -iwithprefix include -DKBUILD_BASENAME=ksyms  -DEXPORT_SYMTAB -c ksyms.c
>> /usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_nocheck':
>> /usr/src/linux/include/asm/checksum.h:59: error: asm-specifier for variable `d' conflicts with asm clobber list
>> /usr/src/linux/include/asm/checksum.h:59: error: asm-specifier for variable `l' conflicts with asm clobber list
>> /usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_from_user':
>
> That looks like you either need a different compiler version,
> or different binutils version...

gcc (GCC) 3.3 (Debian)
GNU ld version 2.14.90.0.4 20030523 Debian GNU/Linux

the same versions work on i386 though...

--alex--

-- 
| I believe the moment is at hand when, by a paranoiac and active |
|  advance of the mind, it will be possible (simultaneously with  |
|  automatism and other passive states) to systematize confusion  |
|  and thus to help to discredit completely the world of reality. |


* Re: Linux 2.4.21-rc7
  2003-06-03 19:58     ` Alex Romosan
@ 2003-06-03 20:14       ` Tom Rini
  2003-06-04  3:35         ` David S. Miller
  0 siblings, 1 reply; 19+ messages in thread
From: Tom Rini @ 2003-06-03 20:14 UTC (permalink / raw)
  To: Alex Romosan; +Cc: Jeff Garzik, Marcelo Tosatti, lkml

On Tue, Jun 03, 2003 at 12:58:40PM -0700, Alex Romosan wrote:

> Jeff Garzik <jgarzik@pobox.com> writes:
> 
> > On Tue, Jun 03, 2003 at 11:30:59AM -0700, Alex Romosan wrote:
> >> Marcelo Tosatti <marcelo@conectiva.com.br> writes:
> >> 
> >> > Now I really hope its the last one, all this rc's are making me mad.
> >> 
> >> i still can't get it to compile for sparc32:
> >> 
> >> gcc -D__KERNEL__ -I/usr/src/linux/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -fomit-frame-pointer -m32 -pipe -mno-fpu -fcall-used-g5 -fcall-used-g7   -nostdinc -iwithprefix include -DKBUILD_BASENAME=ksyms  -DEXPORT_SYMTAB -c ksyms.c
> >> /usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_nocheck':
> >> /usr/src/linux/include/asm/checksum.h:59: error: asm-specifier for variable `d' conflicts with asm clobber list
> >> /usr/src/linux/include/asm/checksum.h:59: error: asm-specifier for variable `l' conflicts with asm clobber list
> >> /usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_from_user':
> >
> > That looks like you either need a different compiler version,
> > or different binutils version...
> 
> gcc (GCC) 3.3 (Debian)
> GNU ld version 2.14.90.0.4 20030523 Debian GNU/Linux

That would do it.

> the same versions work on i386 though...

Yes, but i386 either didn't have the now-invalid clobber lists, or they
were fixed in the -pre portion (as they were on PPC32 as well).

-- 
Tom Rini
http://gate.crashing.org/~trini/


* Re: Linux 2.4.21-rc7
  2003-06-03 20:14       ` Tom Rini
@ 2003-06-04  3:35         ` David S. Miller
  2003-06-04 15:09           ` Mr. James W. Laferriere
  2003-06-04 23:37           ` Alex Romosan
  0 siblings, 2 replies; 19+ messages in thread
From: David S. Miller @ 2003-06-04  3:35 UTC (permalink / raw)
  To: Tom Rini; +Cc: Alex Romosan, Jeff Garzik, Marcelo Tosatti, lkml

On Tue, 2003-06-03 at 13:14, Tom Rini wrote:
> > gcc (GCC) 3.3 (Debian)
> > GNU ld version 2.14.90.0.4 20030523 Debian GNU/Linux
> 
> That would do it.

I don't trust anything past gcc-3.2.x on sparc and sparc64.
Use 3.3.x and later at your own peril.

-- 
David S. Miller <davem@redhat.com>


* Re: Linux 2.4.21-rc7
  2003-06-04  3:35         ` David S. Miller
@ 2003-06-04 15:09           ` Mr. James W. Laferriere
  2003-06-04 23:37           ` Alex Romosan
  1 sibling, 0 replies; 19+ messages in thread
From: Mr. James W. Laferriere @ 2003-06-04 15:09 UTC (permalink / raw)
  To: David S. Miller
  Cc: Tom Rini, Alex Romosan, Jeff Garzik, Marcelo Tosatti, lkml

	Hello Dave, thank you for the warning. Now how about the why,
	in layman's terms? TIA, JimL

On Tue, 3 Jun 2003, David S. Miller wrote:
> On Tue, 2003-06-03 at 13:14, Tom Rini wrote:
> > > gcc (GCC) 3.3 (Debian)
> > > GNU ld version 2.14.90.0.4 20030523 Debian GNU/Linux
> > That would do it.
> I don't trust anything past gcc-3.2.x on sparc and sparc64.
> Use 3.3.x and later at your own peril.
-- 
       +------------------------------------------------------------------+
       | James   W.   Laferriere | System    Techniques | Give me VMS     |
       | Network        Engineer |     P.O. Box 854     |  Give me Linux  |
       | babydr@baby-dragons.com | Coudersport PA 16915 |   only  on  AXP |
       +------------------------------------------------------------------+


* Re: Linux 2.4.21-rc7
  2003-06-04  3:35         ` David S. Miller
  2003-06-04 15:09           ` Mr. James W. Laferriere
@ 2003-06-04 23:37           ` Alex Romosan
  1 sibling, 0 replies; 19+ messages in thread
From: Alex Romosan @ 2003-06-04 23:37 UTC (permalink / raw)
  To: David S. Miller; +Cc: Tom Rini, Jeff Garzik, Marcelo Tosatti, lkml

"David S. Miller" <davem@redhat.com> writes:

> On Tue, 2003-06-03 at 13:14, Tom Rini wrote:
>> > gcc (GCC) 3.3 (Debian)
>> > GNU ld version 2.14.90.0.4 20030523 Debian GNU/Linux
>> 
>> That would do it.
>
> I don't trust anything past gcc-3.2.x on sparc and sparc64.
> Use 3.3.x and later at your own peril.

recompiled with gcc-3.2.3 and the kernel not only compiled but also
booted. thank you.

--alex--

-- 
| I believe the moment is at hand when, by a paranoiac and active |
|  advance of the mind, it will be possible (simultaneously with  |
|  automatism and other passive states) to systematize confusion  |
|  and thus to help to discredit completely the world of reality. |
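
The advice in this subthread (gcc 3.3 failed on sparc32, gcc-3.2.3 built and booted) could be turned into a pre-build check. The following is only a sketch: the helper name `gcc_ok_for_sparc` is hypothetical, and the cutoff simply encodes "nothing past 3.2.x" as stated above, not any official kernel requirement.

```shell
# Hypothetical helper: succeed only if the given gcc version string
# is 3.2.x or older, following the "nothing past gcc-3.2.x" advice.
gcc_ok_for_sparc() {
    ver=$1
    major=${ver%%.*}          # text before the first dot
    rest=${ver#*.}            # text after the first dot
    minor=${rest%%.*}         # second version component
    if [ "$major" -lt 3 ]; then
        return 0
    fi
    if [ "$major" -eq 3 ] && [ "$minor" -le 2 ]; then
        return 0
    fi
    return 1
}
```

It might be called as `gcc_ok_for_sparc "$(gcc -dumpversion)" || echo "untrusted compiler for sparc"` before starting the build.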


* Re: Linux 2.4.21-rc7
  2003-06-03 17:04 Linux 2.4.21-rc7 Marcelo Tosatti
  2003-06-03 18:02 ` Tomas Szepe
  2003-06-03 18:30 ` Alex Romosan
@ 2003-06-05 12:09 ` Andreas Haumer
  2003-06-07 15:46   ` Andreas Haumer
  2 siblings, 1 reply; 19+ messages in thread
From: Andreas Haumer @ 2003-06-05 12:09 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: lkml

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

Marcelo Tosatti wrote:
> Hallo,
>
> Now I really hope its the last one, all this rc's are making me mad.
>
;-)

So, here's a report on the more positive side...

As I mentioned in some e-mails in the last few days,
I'm currently testing an Asus AP1700-S5 server with
a single Xeon 2.4GHz CPU (FSB533), 512MB RAM and
4x36GB U320SCSI drives (3 of them are assembled as RAID5),
connected via GBit Ethernet to our internal network

root@setup:~ {533} $ lspci
00:00.0 Host bridge: ServerWorks CNB20-HE Host Bridge (rev 31)
00:00.1 Host bridge: ServerWorks CNB20-HE Host Bridge
00:00.2 Host bridge: ServerWorks CNB20-HE Host Bridge
00:02.0 Ethernet controller: Intel Corp. 82540EM Gigabit Ethernet Controller (rev 02)
00:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:0f.0 ISA bridge: ServerWorks CSB5 South Bridge (rev 93)
00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93)
00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 05)
00:0f.3 Host bridge: ServerWorks GCLE Host Bridge
00:10.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:10.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:11.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:11.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
02:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)
02:04.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)
03:02.0 Ethernet controller: Intel Corp. 82544GC Gigabit Ethernet Controller (LOM) (rev 02)

root@setup:~ {538} $ uptime
  2:05pm  up 18:09, 11 users,  load average: 8.03, 8.45, 8.15

This system has been running 2.4.21-rc7 for more than 18 hours
now with the following load:

*) an endless loop to create and remove a large file on the
   RAID5 (ext3 filesystem):
   while true; do time dd if=/dev/zero of=/var/tmp/largefile bs=1M count=2000 ; rm -f /var/tmp/largefile; done

*) some commands to create additional load:
   cd /
   find . boot/ usr/ tmp/ opt/ var/ -xdev -type f -exec md5sum {} \;

*) NFS copy of a whole 40GB filesystem tree from a Linux NFS server
   to the RAID5 (in a loop)

*) the system is also NFS serving a Linux NFS client, which
   copies the whole server filesystem into /dev/null

*) Additionally, I have the following programs running:
   - Squid (currently used as proxy for our internal web browsers)
   - Apache
   - jedit (with j2sdk-1.4.1_01)
   - StarOffice-5.2
   - Mozilla-1.3.1
   - and lots of additional programs (shell, sshd, emacs), but
     no X server (we are using Linux workstations as X-Terminals)

All in all, there are more than 190 processes at any point in
time in the past 18 hours.
This all produces a permanent load between 7 and 9
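
For anyone wanting to repeat the file create/remove part of this load, here is a bounded variant of the dd loop as a sketch. The function name `stress_loop` and its arguments are assumptions made for illustration; unlike the original loop it stops after a fixed number of iterations and does not time each run.

```shell
# Hypothetical helper: stress_loop FILE BS COUNT ITERATIONS
# Writes COUNT blocks of BS bytes of zeros to FILE, removes the file,
# and repeats ITERATIONS times. Variables are global (plain POSIX sh).
stress_loop() {
    file=$1 bs=$2 count=$3 iterations=$4
    i=0
    while [ "$i" -lt "$iterations" ]; do
        # dd reports its statistics on stderr; discard them here
        dd if=/dev/zero of="$file" bs="$bs" count="$count" 2>/dev/null || return 1
        rm -f "$file"
        i=$((i + 1))
    done
}
```

With the sizes from the report, one invocation would be `stress_loop /var/tmp/largefile 1M 2000 10` (the `1M` suffix assumes GNU dd).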

vmstat 1
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  4  4 111720   3220  11344 423820   0   0     4 18976 4892  4273   2  68  30
 0  4  3 111720   3204  11352 423728  32   0    80 25216 1460  2095   0  15  85
 0  4  3 111716   3332  11352 423364  76   0    92 25796 1432  1895   2  14  84
 0  4  3 111716   3208  11372 423392  48   0   712 26336 1566  2346   4  14  81
 0  6  3 111716   3208  11412 423196 132   0   420 32820 1774  3113  12  19  69
 0  5  3 111716   3376  11440 422340 704   0   924 24444 1570  2811   3  17  79
 6  2  4 111716   2328  11560 423988 536   0   700 32088 2268  4590   6  73  21
11  3  4 111764  63352  11604 321148  16 308   310 36868 2267  5390  12  46  42

root@setup:~ {537} $ uptime
  1:37pm  up 17:41, 10 users,  load average: 7.94, 7.31, 7.18

Under these circumstances, I made the following observations:

a) The system has been running stably for more than 18 hours now

b) It seems to behave quite fine, given the load.
   Response time for all services (web-proxy, web-server)
   is reasonably low (you almost don't notice any delay)

c) Interactive programs (Mozilla, StarOffice, JEdit) are
   still quite usable. There is some delay when opening
   a file in SO (say, about 2-3 seconds), but that's fine

d) Sometimes (but not really reproducibly) I noticed a
   _big_ delay when connecting to the server using SSH
   (with "big", I mean 1 minute or so). I eventually
   get a connection, and then can work as normal.

e) The server uses a single, but hyperthreaded CPU.
   Hyperthreading is enabled, and Linux shows both
   logical CPUs:

root@setup:~ {529} $ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Xeon(TM) CPU 2.40GHz
stepping        : 7
cpu MHz         : 2392.169
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 4771.02

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Xeon(TM) CPU 2.40GHz
stepping        : 7
cpu MHz         : 2392.169
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 4771.02

   But interrupt distribution seems a little bit strange:

root@setup:~ {530} $ cat /proc/interrupts
           CPU0       CPU1
  0:    6318080          0    IO-APIC-edge  timer
  1:        967          0    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  4:      32477          0    IO-APIC-edge  serial
  5:   55629300          0   IO-APIC-level  eth0
  9:   85639064          0   IO-APIC-level  acpi, ioc0, ioc1
 11:          0          0   IO-APIC-level  usb-ohci
 15:          2          0    IO-APIC-edge  ide1
NMI:          0          0
LOC:    6318529    6318527
ERR:          0
MIS:          0

   With 2.4.21-rc6-ac1, interrupts were counted for both
   logical CPUs. Is this a bug or a feature?

HTH

- - andreas

- --
Andreas Haumer                     | mailto:andreas@xss.co.at
*x Software + Systeme              | http://www.xss.co.at/
Karmarschgasse 51/2/20             | Tel: +43-1-6060114-0
A-1100 Vienna, Austria             | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE+3zMOxJmyeGcXPhERAu6CAKCILyOUfPyGaKG8pvbl4droch6B+ACbBNB/
Dw1L/tRv2JSrOHA12B8BaHM=
=rWPF
-----END PGP SIGNATURE-----
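
To compare interrupt distribution between kernels, as in the question about /proc/interrupts in the message above, the per-CPU counts can be totalled with a few lines of awk. This is a sketch only: the function name `irq_totals` is hypothetical, and it assumes the layout shown in that message (a header row of CPU names, then rows whose first field is an IRQ number followed by a colon). Rows such as NMI, LOC, ERR and MIS are skipped.

```shell
# Hypothetical helper: irq_totals [FILE]
# Sums the per-CPU counters of the numbered IRQ rows.
# Reads /proc/interrupts when no file is given.
irq_totals() {
    awk '
        NR == 1 { ncpu = NF; next }        # header row: CPU0 CPU1 ...
        $1 ~ /^[0-9]+:$/ {                 # numbered IRQ rows only
            for (i = 1; i <= ncpu; i++)
                total[i] += $(i + 1)
        }
        END {
            for (i = 1; i <= ncpu; i++)
                printf "CPU%d %d\n", i - 1, total[i]
        }
    ' "${1:-/proc/interrupts}"
}
```

Running it on both kernels and comparing the two outputs would show at a glance whether all numbered IRQs land on CPU0.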



* Re: Linux 2.4.21-rc7
  2003-06-05 12:09 ` Andreas Haumer
@ 2003-06-07 15:46   ` Andreas Haumer
  2003-06-09 10:16     ` [2.4.21-rc7] AP1700-S5 system freeze :-(( Andreas Haumer
  2003-06-11 20:48     ` Linux 2.4.21-rc7 Marcelo Tosatti
  0 siblings, 2 replies; 19+ messages in thread
From: Andreas Haumer @ 2003-06-07 15:46 UTC (permalink / raw)
  To: Andreas Haumer; +Cc: Marcelo Tosatti, lkml

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

Andreas Haumer wrote:
> Hi!
>
> Marcelo Tosatti wrote:
>
>>Hallo,
>>
>>Now I really hope its the last one, all this rc's are making me mad.
>>
>
> ;-)
>
> So, here's a report on the more positive side...
>
I think I have to take that back... :-((

> As I mentioned in some e-mails in the last few days,
> I'm currently testing an Asus AP1700-S5 server with
> a single Xeon 2.4GHz CPU (FSB533), 512MB RAM and
> 4x36GB U320SCSI drives (3 of them are assembled as RAID5),
> connected via GBit Ethernet to our internal network
>
I had this system running under heavy load for about 24 hours
without problems. I then stopped the stress testing, and had
several system freezes since then.

By system freeze I mean:

*) machine doesn't answer to ping, no reaction to console
   keyboard, no message on the console screen, no message
   in logfile, no oops, no noticeable system activity

I changed several BIOS settings (disabled hyperthreading,
disabled USB, disabled power management) and tried to run
the kernel with "acpi=off" and "noapic".
I also changed root disk, because I found a SCSI error
message in the logs once.

Nothing seems to help. The system just freezes under light
load at some time between 1 and 8 hours uptime.
It's really strange that it survived heavy load for
more than 24 hours in the first place.

I found some problem reports from several people,
which sound quite similar to the freeze I see here.
These people all had motherboards with serverworks
chipset, GBit ethernet and noticed similar lockups
or system freeze symptoms. From the reports I'm not
sure if the problems still persist or if they should
be solved now. Can someone please comment on that?

Here are some infos from the system again:

root@server:~ {505} $ cat /proc/interrupts
           CPU0
  0:     118748    IO-APIC-edge  timer
  1:        274    IO-APIC-edge  keyboard
  2:          0          XT-PIC  cascade
  4:       7011    IO-APIC-edge  serial
  9:    1181037   IO-APIC-level  ioc0, ioc1
 14:       1685   IO-APIC-level  eth0
 15:          2    IO-APIC-edge  ide1
NMI:          0
LOC:     118700
ERR:          0
MIS:          0

root@server:~ {506} $ cat /proc/cmdline
auto BOOT_IMAGE=lx2421rc7 ro root=100 acpi=off

root@server:~ {507} $ uname -a
Linux server 2.4.21-rc7 #1 SMP Wed Jun 4 18:31:15 CEST 2003 i686 unknown

root@server:~ {508} $ lsmod
Module                  Size  Used by    Not tainted
af_packet              13256   1  (autoclean)
e1000                  50028   1  (autoclean)
ext3                   60832   2  (autoclean)
jbd                    40056   2  (autoclean) [ext3]
raid5                  17704   1  (autoclean)
md                     57472   2  (autoclean) [raid5]
xor                     8868   0  (autoclean) [raid5]
unix                   15664  38  (autoclean)
ext2                   33440   4  (autoclean)
sd_mod                 10652  18  (autoclean)
isense                 32404   0  (autoclean) (unused)
mptctl                 19116   0  (autoclean) (unused)
mptscsih               29696   9  (autoclean)
mptbase                32640   5  (autoclean) [isense mptctl mptscsih]
scsi_mod               95748   2  (autoclean) [sd_mod mptscsih]

root@server:~ {511} $ lspci -vvvv
00:00.0 Host bridge: ServerWorks CNB20-HE Host Bridge (rev 31)
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-

00:00.1 Host bridge: ServerWorks CNB20-HE Host Bridge
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-

00:00.2 Host bridge: ServerWorks CNB20-HE Host Bridge
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-

00:02.0 Ethernet controller: Intel Corp. 82540EM Gigabit Ethernet Controller (rev 02)
        Subsystem: Intel Corp. 82540EM Gigabit Ethernet Controller
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (63750ns min), cache line size 08
        Interrupt: pin A routed to IRQ 14
        Region 0: Memory at fd800000 (32-bit, non-prefetchable) [size=128K]
        Region 2: I/O ports at d800 [size=64]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [e4] PCI-X non-bridge device.
                Command: DPERE- ERO+ RBC=0 OST=0
                Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
        Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000  Data: 0000

00:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA])
        Subsystem: ATI Technologies Inc: Unknown device 8008
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (2000ns min), cache line size 08
        Interrupt: pin A routed to IRQ 10
        Region 0: Memory at fc000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: I/O ports at d400 [size=256]
        Region 2: Memory at fb800000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at febe0000 [disabled] [size=128K]
        Capabilities: [5c] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0f.0 ISA bridge: ServerWorks CSB5 South Bridge (rev 93)
        Subsystem: ServerWorks CSB5 South Bridge
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Latency: 32

00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93) (prog-if 88 [Master SecP])
        Subsystem: ServerWorks CSB5 IDE Controller
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32, cache line size 08
        Region 0: I/O ports at <ignored>
        Region 1: I/O ports at <ignored>
        Region 2: I/O ports at <ignored>
        Region 3: I/O ports at <ignored>
        Region 4: I/O ports at a800 [size=16]

00:0f.3 Host bridge: ServerWorks GCLE Host Bridge
        Subsystem: ServerWorks: Unknown device 0230
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0

00:10.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr+ DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Capabilities: [60]
00:10.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Capabilities: [60]
00:11.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr+ DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Capabilities: [60]
00:11.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr+ DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Capabilities: [60]
02:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)
        Subsystem: LSI Logic / Symbios Logic: Unknown device 1000
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 72 (4250ns min, 4500ns max), cache line size 08
        Interrupt: pin A routed to IRQ 9
        Region 0: I/O ports at a000 [size=256]
        Region 1: Memory at fa000000 (64-bit, non-prefetchable) [size=64K]
        Region 3: Memory at f9800000 (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at fe900000 [disabled] [size=1M]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [68]
02:04.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)
        Subsystem: LSI Logic / Symbios Logic: Unknown device 1000
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 72 (4250ns min, 4500ns max), cache line size 08
        Interrupt: pin B routed to IRQ 9
        Region 0: I/O ports at 9800 [size=256]
        Region 1: Memory at f9000000 (64-bit, non-prefetchable) [size=64K]
        Region 3: Memory at f8800000 (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at fe800000 [disabled] [size=1M]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [68]
03:02.0 Ethernet controller: Intel Corp. 82544GC Gigabit Ethernet Controller (LOM) (rev 02)
        Subsystem: Intel Corp.: Unknown device 110d
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (63750ns min), cache line size 08
        Interrupt: pin A routed to IRQ 5
        Region 0: Memory at f8000000 (64-bit, non-prefetchable) [size=128K]
        Region 2: Memory at f7800000 (64-bit, non-prefetchable) [size=128K]
        Region 4: I/O ports at 9400 [size=32]
        Expansion ROM at fe7e0000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [e4] PCI-X non-bridge device.
                Command: DPERE- ERO+ RBC=0 OST=0
                Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
        Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000  Data: 0000

Any idea how I should proceed now?
I really could use some help here, I'm running out
of ideas... :-((

- - andreas

- --
Andreas Haumer                     | mailto:andreas@xss.co.at
*x Software + Systeme              | http://www.xss.co.at/
Karmarschgasse 51/2/20             | Tel: +43-1-6060114-0
A-1100 Vienna, Austria             | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE+4gjsxJmyeGcXPhERAsT4AJ9sylkxso5kXO51+6c5bfskVV2meACgrF33
t8xXYpu6FGPsiQ9VBmnk6ek=
=Yov+
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [2.4.21-rc7] AP1700-S5 system freeze :-((
  2003-06-07 15:46   ` Andreas Haumer
@ 2003-06-09 10:16     ` Andreas Haumer
  2003-06-09 11:46       ` Stephan von Krawczynski
  2003-06-11 20:48     ` Linux 2.4.21-rc7 Marcelo Tosatti
  1 sibling, 1 reply; 19+ messages in thread
From: Andreas Haumer @ 2003-06-09 10:16 UTC (permalink / raw)
  To: Andreas Haumer; +Cc: lkml

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

Note: I'm reporting this with a different subject line now,
as I got zero replies to my first bugreport. This is still
the same Asus AP1700-S5 server as in my previous reports,
though:

Asus AP1700-S5 server, single Xeon 2.4GHz CPU (FSB533)
512MB registered DDR with ECC, Asus PR-DLS533 motherboard
with ServerWorks GCLE chipset

root@server:~ {535} $ lspci
00:00.0 Host bridge: ServerWorks CNB20-HE Host Bridge (rev 31)
00:00.1 Host bridge: ServerWorks CNB20-HE Host Bridge
00:00.2 Host bridge: ServerWorks CNB20-HE Host Bridge
00:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:0f.0 ISA bridge: ServerWorks CSB5 South Bridge (rev 93)
00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93)
00:0f.3 Host bridge: ServerWorks GCLE Host Bridge
00:10.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:10.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:11.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:11.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
01:02.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 74)
02:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)
02:04.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)

Andreas Haumer wrote:
[...]
> I had this system running under heavy load for about 24 hours
> without problems. I then stopped the stress testing, and had
> several system freezes since then.
>
> With system freeze I mean:
>
> *) machine doesn't answer to ping, no reaction to console
>    keyboard, no message on the console screen, no message
>    in logfile, no oops, no noticeable system activity
>
I just had another freeze or lockup of this system,
after 1 day and 14 hours uptime. :-(

This time the machine was running with a 3Com 3c905c
100MBit NIC, with the onboard e1000 GBit controllers disabled.
Obviously, this didn't help either...

When I noticed the freeze, I tried to ping the server,
and got a few replies back, but with a delay of more than
60 seconds! I didn't wait that long when I tried to ping
the server on the previous lockups, so maybe the "no answer
to ping" symptom I described is more a "big delay in
answering ping packets" symptom. Does that ring any bell?
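
To tell "no answer" apart from "very late answer", the ping output can be
scanned for replies with an unusually large round-trip time. The following
is only a rough sketch: it assumes the usual Linux ping reply format
("icmp_seq=N ... time=X ms"), the function name slow_replies and the
1000 ms threshold are made up for illustration, and the sample lines are
invented, not taken from the affected server.

```python
import re

# Matches the sequence number and the "time=<n> ms" field of a ping reply line.
RTT_RE = re.compile(r"icmp_seq=(\d+).*?time=([\d.]+)\s*ms")

def slow_replies(ping_output, threshold_ms=1000.0):
    """Return (icmp_seq, rtt_ms) for every reply slower than threshold_ms."""
    slow = []
    for line in ping_output.splitlines():
        m = RTT_RE.search(line)
        if m and float(m.group(2)) > threshold_ms:
            slow.append((int(m.group(1)), float(m.group(2))))
    return slow

# Invented sample output (192.0.2.10 is a documentation address).
sample = """\
64 bytes from 192.0.2.10: icmp_seq=1 ttl=64 time=0.412 ms
64 bytes from 192.0.2.10: icmp_seq=2 ttl=64 time=61250 ms
64 bytes from 192.0.2.10: icmp_seq=3 ttl=64 time=0.398 ms
"""

if __name__ == "__main__":
    for seq, rtt in slow_replies(sample):
        print(f"icmp_seq={seq} took {rtt / 1000:.1f} s")
```

Running ping from a second machine into a logfile and feeding that file to
slow_replies() afterwards would show whether replies were lost or merely
delayed by a minute or more.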

Any idea anyone?

- - andreas

- --
Andreas Haumer                     | mailto:andreas@xss.co.at
*x Software + Systeme              | http://www.xss.co.at/
Karmarschgasse 51/2/20             | Tel: +43-1-6060114-0
A-1100 Vienna, Austria             | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE+5F6HxJmyeGcXPhERApOfAJ4klAsR0lA8Zzk5s22quImzxud6agCgvAi1
FXZuNQV3C4UaKVi9gOvtJFM=
=qL4B
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [2.4.21-rc7] AP1700-S5 system freeze :-((
  2003-06-09 10:16     ` [2.4.21-rc7] AP1700-S5 system freeze :-(( Andreas Haumer
@ 2003-06-09 11:46       ` Stephan von Krawczynski
  2003-06-09 12:21         ` Andreas Haumer
  0 siblings, 1 reply; 19+ messages in thread
From: Stephan von Krawczynski @ 2003-06-09 11:46 UTC (permalink / raw)
  To: Andreas Haumer; +Cc: linux-kernel

Hello Andreas,

I am not quite sure if you are experiencing something similar to my problem.
Fact is this:

I have a serverworks based dual PIII board and I am experiencing freezes just
about every day. 

Equal setups:

Kernel 2.4.21-rc7
00:00.0 Host bridge: ServerWorks CNB20HE Host Bridge (me: rev 23 you: rev 31)
00:00.1 Host bridge: ServerWorks CNB20HE Host Bridge (rev 01)

Lockups during light load


Differing:

Just about everything else:
                       yours:            mine:
Storage System:        Symbios           AIC
VGA           :        ATI Rage XL       ATI Radeon RV200
Network       :        Intel/3com        Intel/Broadcom
Processor     :        Xeon UP           PIII SMP


I could already produce oops-messages on the problem and mine all come up in
kmem_cache_alloc_batch. It would be interesting where your box freezes. It
cannot be at this same place, because the code is not there in UP.
Try this (in case you are not working in front of the box):

Start box and switch to text console, enter "setterm -blank 0" to disable
screen blanker. Wait for oops. If we are lucky you will see something, get a
pencil then :-)

-- 
Regards,
Stephan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [2.4.21-rc7] AP1700-S5 system freeze :-((
  2003-06-09 11:46       ` Stephan von Krawczynski
@ 2003-06-09 12:21         ` Andreas Haumer
  0 siblings, 0 replies; 19+ messages in thread
From: Andreas Haumer @ 2003-06-09 12:21 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

Many thanks for your reply!

Stephan von Krawczynski wrote:
> Hello Andreas,
>
> I am not quite sure if you are experiencing something similar to my problem.
> Fact is this:
>
> I have a serverworks based dual PIII board and I am experiencing freezes just
> about every day.
>
> Equal setups:
>
> Kernel 2.4.21-rc7
> 00:00.0 Host bridge: ServerWorks CNB20HE Host Bridge (me: rev 23 you: rev 31)
> 00:00.1 Host bridge: ServerWorks CNB20HE Host Bridge (rev 01)
>
> Lockups during light load
>
Me too.
I had it running for 24 hours with heavy stress testing
and a load above 7 all the time without problems. I then
stopped this test, and the box locked up 2 hours later,
and locked up about 7 or 8 times in the past few days :-(

>
> Differing:
>
> Just about everything else:
>                        yours:            mine:
> Storage System:        Symbios           AIC

This is not a "normal" symbios logic "sym53c8xx"
storage controller, but a "Symbios Logic 53c1030",
which uses the Fusion MPT driver. This is the first
time I'm running this driver, so I don't know if it's
considered stable (but I guess so).
Unfortunately I can't replace it as I don't have any
spare SCSI controller which fits right now.

> VGA           :        ATI Rage XL       ATI Radeon RV200
> Network       :        Intel/3com        Intel/Broadcom
> Processor     :        Xeon UP           PIII SMP
>
>
> I could already produce oops-messages on the problem and mine all come up in
> kmem_cache_alloc_batch. It would be interesting where your box freezes. It
> cannot be at this same place, because the code is not there in UP.
> Try this (in case you are not working in front of the box):
>
> Start box and switch to text console, enter "setterm -blank 0" to disable
> screen blanker. Wait for oops. If we are lucky you will see something, get a
> pencil then :-)
>
I always have the system running with text console and
screen blanking disabled. Alas, I see no oops :-(

IMHO it doesn't look like the kernel crashes with an oops,
it does look more like it suddenly goes into an endless
loop or ridiculously high load somehow.
Last time I had this freeze, I noticed that the system
answered my ICMP ping messages with a delay of more than
60 seconds. This looked like the system was very busy
at that time.

I'm now running with 2.4.20rc2, and also have syslog
routed to another system on the network. We'll see if
I can get any more information out of this.

- - andreas

- --
Andreas Haumer                     | mailto:andreas@xss.co.at
*x Software + Systeme              | http://www.xss.co.at/
Karmarschgasse 51/2/20             | Tel: +43-1-6060114-0
A-1100 Vienna, Austria             | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE+5HvjxJmyeGcXPhERAvOvAJ94cQS4tlzylHiVU084v7FK/e/aowCgw4w9
M3YWSHXzx9IuKeU4Z6WicEk=
=8102
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Linux 2.4.21-rc7
  2003-06-07 15:46   ` Andreas Haumer
  2003-06-09 10:16     ` [2.4.21-rc7] AP1700-S5 system freeze :-(( Andreas Haumer
@ 2003-06-11 20:48     ` Marcelo Tosatti
       [not found]       ` <1055408183.2552.18.camel@tor.trudheim.com>
  1 sibling, 1 reply; 19+ messages in thread
From: Marcelo Tosatti @ 2003-06-11 20:48 UTC (permalink / raw)
  To: Andreas Haumer; +Cc: lkml



On Sat, 7 Jun 2003, Andreas Haumer wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi!
>
> Andreas Haumer wrote:
> > Hi!
> >
> > Marcelo Tosatti wrote:
> >
> >>Hallo,
> >>
> >>Now I really hope its the last one, all this rc's are making me mad.
> >>
> >
> > ;-)
> >
> > So, here's a report on the more positive side...
> >
> I think, I have to take that back... :-((
>
> > As I mentioned in some e-mails in the last few days,
> > I'm currently testing an Asus AP1700-S5 server with
> > a single Xeon 2.4GHz CPU (FSB533), 512MB RAM and
> > 4x36GB U320SCSI drives (3 of them are assembled as RAID5),
> > connected via GBit Ethernet to our internal network
> >
> I had this system running under heavy load for about 24 hours
> without problems. I then stopped the stress testing, and had
> several system freezes since then.
>
> With system freeze I mean:
>
> *) machine doesn't answer to ping, no reaction to console
>    keyboard, no message on the console screen, no message
>    in logfile, no oops, no noticeable system activity

Maybe the NMI oopser helps?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Linux 2.4.21-rc7
       [not found]       ` <1055408183.2552.18.camel@tor.trudheim.com>
@ 2003-06-12  9:35         ` Andreas Haumer
  0 siblings, 0 replies; 19+ messages in thread
From: Andreas Haumer @ 2003-06-12  9:35 UTC (permalink / raw)
  To: Anders Karlsson; +Cc: Marcelo Tosatti, Linux Kernel Mailing List

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

Anders Karlsson wrote:
> On Wed, 2003-06-11 at 21:48, Marcelo Tosatti wrote:
>
>>On Sat, 7 Jun 2003, Andreas Haumer wrote:
>
> [snip]
>
>>>I had this system running under heavy load for about 24 hours
>>>without problems. I then stopped the stress testing, and had
>>>several system freezes since then.
>>>
>>>With system freeze I mean:
>>>
>>>*) machine doesn't answer to ping, no reaction to console
>>>   keyboard, no message on the console screen, no message
>>>   in logfile, no oops, no noticeable system activity
>
>
> I have this problem without actually stressing the machine too hard. The
> average load on my Thinkpad over a weekend would perhaps be 0.05, yet I
> can have several hard hangs where there seems to be no trace of a hang
> at all in logfiles.
>
I have to admit that "system freeze" is a quite unspecific
symptom. It could have a zillion different reasons.

In my case I'm currently chasing SCSI errors which I think
could have something to do with it (besides, it's _not_ an Adaptec
controller, but an LSI 53c1030 with Fusion MPT driver... :-)

In my server logs I sometimes see SCSI timeouts like this:

[...]
scsi : aborting command due to timeout : pid 1148093, scsi0, channel 0, id 1, lun 0 Read (10) 00 00 00 0f af 00 00 10 00
mptscsih: OldAbort scheduling ABORT SCSI IO (sc=dfca8e00)
  IOs outstanding = 3
mptscsih: ioc0: Issue of TaskMgmt Successful!
SCSI host 0 abort (pid 1148093) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
mptscsih: OldReset scheduling BUS_RESET (sc=dfca8e00)
  IOs outstanding = 4
SCSI Error Report =-=-= (0:0:0)
  SCSI_Status=02h (CHECK CONDITION)
  Original_CDB[]: 2A 00 00 3C 4D 78 00 00 02 00 - "WRITE(10)"
  SenseData[20h]: 70 00 06 00 00 00 00 18 00 00 00 00 29 02 00 00 00 00 ...
  SenseKey=6h (UNIT ATTENTION); FRU=00h
  ASC/ASCQ=29h/02h "SCSI BUS RESET OCCURRED"
SCSI Error Report =-=-= (0:1:0)
  SCSI_Status=02h (CHECK CONDITION)
  Original_CDB[]: 28 00 00 00 0F AF 00 00 10 00 - "READ(10)"
  SenseData[20h]: 70 00 06 00 00 00 00 18 00 00 00 00 29 02 00 00 00 00 ...
  SenseKey=6h (UNIT ATTENTION); FRU=00h
  ASC/ASCQ=29h/02h "SCSI BUS RESET OCCURRED"
SCSI Error Report =-=-= (0:2:0)
  SCSI_Status=02h (CHECK CONDITION)
  Original_CDB[]: 28 00 00 4E 0A 37 00 00 08 00 - "READ(10)"
  SenseData[20h]: 70 00 06 00 00 00 00 18 00 00 00 00 29 02 00 00 00 00 ...
  SenseKey=6h (UNIT ATTENTION); FRU=00h
  ASC/ASCQ=29h/02h "SCSI BUS RESET OCCURRED"
SCSI Error Report =-=-= (0:3:0)
  SCSI_Status=02h (CHECK CONDITION)
  Original_CDB[]: 28 00 03 B0 08 6F 00 00 08 00 - "READ(10)"
  SenseData[20h]: 70 00 06 00 00 00 00 18 00 00 00 00 29 02 00 00 00 00 ...
  SenseKey=6h (UNIT ATTENTION); FRU=00h
  ASC/ASCQ=29h/02h "SCSI BUS RESET OCCURRED"
[...]

There are 4 hot swap SCSI disks in the server, and all of them
eventually report those timeouts (so it's not specific to a single
disk).
I already replaced cabling, tried a different hot swap (SCA)
cage, and I'm now trying to replace the disks one by one to
eventually find the culprit.
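
To see at a glance which disks show up in the log, the "SCSI Error Report"
blocks can be tallied per target. This is a minimal sketch, not part of the
mptscsih driver: the function name tally_reports is made up, and reading the
middle number of "(0:1:0)" as the SCSI id is an assumption based on the
"id 1" timeout line that is followed by a (0:1:0) report above.

```python
import re
from collections import Counter

# Header of one report block, e.g. "SCSI Error Report =-=-= (0:1:0)".
REPORT_RE = re.compile(r"SCSI Error Report =-=-= \((\d+):(\d+):(\d+)\)")
# Sense code line, e.g. ASC/ASCQ=29h/02h "SCSI BUS RESET OCCURRED".
ASC_RE = re.compile(r'ASC/ASCQ=([0-9A-Fa-f]+)h/([0-9A-Fa-f]+)h\s+"([^"]*)"')

def tally_reports(log_text):
    """Count (scsi_id, sense description) pairs found in the log text."""
    counts = Counter()
    current_id = None
    for line in log_text.splitlines():
        m = REPORT_RE.search(line)
        if m:
            # Assumption: the middle field is the SCSI id of the disk.
            current_id = int(m.group(2))
            continue
        m = ASC_RE.search(line)
        if m and current_id is not None:
            counts[(current_id, m.group(3))] += 1
            current_id = None
    return counts

# Two of the report blocks quoted above, shortened to the relevant lines.
sample_log = """\
SCSI Error Report =-=-= (0:0:0)
  SCSI_Status=02h (CHECK CONDITION)
  SenseKey=6h (UNIT ATTENTION); FRU=00h
  ASC/ASCQ=29h/02h "SCSI BUS RESET OCCURRED"
SCSI Error Report =-=-= (0:1:0)
  SCSI_Status=02h (CHECK CONDITION)
  SenseKey=6h (UNIT ATTENTION); FRU=00h
  ASC/ASCQ=29h/02h "SCSI BUS RESET OCCURRED"
"""

if __name__ == "__main__":
    for (scsi_id, text), n in sorted(tally_reports(sample_log).items()):
        print(f"id {scsi_id}: {n} x {text}")
```

Note that the "SCSI BUS RESET OCCURRED" reports follow the bus reset for
every disk, so the tally only shows which targets were affected; the disk
that timed out first is named in the "aborting command due to timeout" line.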

There are two problems with this approach:

1.) After each change I have to wait several hours up to two
    days for a SCSI timeout to occur as I can not reproduce
    the problem at will.

2.) I'm not _sure_ if those SCSI timeouts are related to the server
    freeze symptoms I see. It's just an assumption.
    IMHO it could work as follows: SCSI timeouts occur sometimes.
    The driver then aborts the command and resets the SCSI bus
    to get it into a sane state again. But what if the bus reset
    doesn't work as expected and the bus remains unusable for a
    while? Could this bring the whole system into this "freeze"
    state (the system is still running, but everything waits for
    the SCSI bus to recover)? Could this explain the symptom of
    those big delays of ICMP ping answer messages I saw?
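
One way to test the assumption in 2.) would be a small heartbeat process
that writes a timestamp to disk at a fixed interval and forces it out with
fsync. If the disk I/O really stalls, the timestamps should show a gap at
about the time of the SCSI timeout in the log. What follows is only a
sketch under that assumption: the names heartbeat and find_stalls, the file
path, and the tolerance values are all made up for illustration.

```python
import os
import time

def heartbeat(path, interval=1.0, count=None):
    """Append a timestamp to path every interval seconds, synced to disk.

    Runs forever when count is None, otherwise writes count timestamps.
    """
    written = 0
    with open(path, "a") as f:
        while count is None or written < count:
            f.write(f"{time.time():.3f}\n")
            f.flush()
            os.fsync(f.fileno())  # blocks while the disk does not respond
            written += 1
            time.sleep(interval)

def find_stalls(timestamps, interval=1.0, tolerance=5.0):
    """Return (start, end, gap) for every gap longer than interval + tolerance."""
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > interval + tolerance:
            stalls.append((prev, cur, gap))
    return stalls

if __name__ == "__main__":
    # Invented timestamps: one write was delayed by about a minute.
    print(find_stalls([0, 1, 2, 65, 66]))
```

A gap that lines up with a "SCSI bus is being reset" message would support
the theory; a hard freeze without any preceding gap would speak against it.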

So the most precious resource for chasing this problem is time,
and this is also the resource which I don't have available as
much as I'd like to... :-(

>
>>Maybe the NMI oopser helps?
>
>
> Marcelo, where can I get hold of this and would there be documentation
> included with it for how to install/use it?
>
Look at /usr/src/linux/Documentation/nmi_watchdog.txt
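
After booting with the nmi_watchdog= parameter described in that file, the
NMI line in /proc/interrupts should show a counter that keeps increasing.
A quick check could look like the sketch below. The function name
nmi_counts is made up, and the sample text is invented to resemble
/proc/interrupts; the exact layout differs between kernel versions.

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters from /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            # Keep only the numeric columns; newer kernels append a description.
            return [int(f) for f in fields[1:] if f.isdigit()]
    return []

# Invented sample resembling /proc/interrupts on a single-CPU box.
sample = """\
           CPU0
  0:     123456          XT-PIC  timer
NMI:       4711
ERR:          0
"""

if __name__ == "__main__":
    print(nmi_counts(sample))
```

Reading the real file twice, a few seconds apart, and comparing the sums
shows whether the watchdog is ticking. If the counters stay at zero, the
watchdog is not active and will not produce an oops on a lockup.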

Regards,

- - andreas

- --
Andreas Haumer                     | mailto:andreas@xss.co.at
*x Software + Systeme              | http://www.xss.co.at/
Karmarschgasse 51/2/20             | Tel: +43-1-6060114-0
A-1100 Vienna, Austria             | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE+6El7xJmyeGcXPhERAqykAKCumORTm/lDofkrg52FX33rOfgC/ACeNxR7
l9/znrbi0lZoR/zw+LTdNhI=
=W7Gt
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2003-06-12  9:24 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-06-03 17:04 Linux 2.4.21-rc7 Marcelo Tosatti
2003-06-03 18:02 ` Tomas Szepe
2003-06-03 18:07   ` Marcelo Tosatti
2003-06-03 19:15     ` lk
2003-06-03 19:40       ` Alan Cox
2003-06-03 18:30 ` Alex Romosan
2003-06-03 19:27   ` Jeff Garzik
2003-06-03 19:58     ` Alex Romosan
2003-06-03 20:14       ` Tom Rini
2003-06-04  3:35         ` David S. Miller
2003-06-04 15:09           ` Mr. James W. Laferriere
2003-06-04 23:37           ` Alex Romosan
2003-06-05 12:09 ` Andreas Haumer
2003-06-07 15:46   ` Andreas Haumer
2003-06-09 10:16     ` [2.4.21-rc7] AP1700-S5 system freeze :-(( Andreas Haumer
2003-06-09 11:46       ` Stephan von Krawczynski
2003-06-09 12:21         ` Andreas Haumer
2003-06-11 20:48     ` Linux 2.4.21-rc7 Marcelo Tosatti
     [not found]       ` <1055408183.2552.18.camel@tor.trudheim.com>
2003-06-12  9:35         ` Andreas Haumer
