* Linux 2.4.21-rc7
@ 2003-06-03 17:04 Marcelo Tosatti
2003-06-03 18:02 ` Tomas Szepe
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Marcelo Tosatti @ 2003-06-03 17:04 UTC (permalink / raw)
To: lkml
Hallo,
Now I really hope its the last one, all this rc's are making me mad.
Ok, here it is.
Summary of changes from v2.4.21-rc6 to v2.4.21-rc7
============================================
<ehabkost@conectiva.com.br>:
o [SPARC]: Export phys_base on sparc32
<jgarzik@pobox.com>:
o fix olympic driver build
<lethal@linux-sh.org>:
o Fix Solution Engine 7751 Build
o Define VM_DATA_DEFAULT_FLAGS for SH
<wesolows@foobazco.org>:
o [sparc]: Attempt mul/div emulation handling on all cpus
David S. Miller <davem@nuts.ninka.net>:
o [SPARC]: Fix sys_ipc to return ENOSYS instead of EINVAL as appropriate
o [SPARC64]: Implement dump_stack in 2.4.x
o [SPARC64]: Only use power interrupt when button property exists
o [IPV4/IPV6]: Use Jenkins hash for fragment reassembly handling
o [IPV6]: Input full addresses into TCP_SYNQ hash function
o [IPV4]: Add sysctl to control ipfrag_secret_interval
o [SPARC64]: Fix probe error handling in envctrl.c driver
o [SPARC64]: Fix probe error handling in bbc_{envctrl,i2c}.c driver
o [SPARC64]: Fix exploitable holes and bugs in ioctl32 translations
Douglas Gilbert <dougg@torque.net>:
o sg: Fix side effect introduced by last "off by one" fix
Eric Brower <ebrower@usa.net>:
o [SPARC]: Refactor AUXIO support
Marcelo Tosatti <marcelo@freak.distro.conectiva>:
o Changed EXTRAVERSION to -rc7
Pete Zaitcev <zaitcev@redhat.com>:
o [sparc] Force type in __put_user
o [SPARC]: Fix gcc-3.x builds
Rob Radez <rob@osinvestor.com>:
o [sparc]: Fix uninitialized spinlock in SRMMU code
o [SPARC]: Kill initialize_secondary, unused
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Linux 2.4.21-rc7
2003-06-03 17:04 Linux 2.4.21-rc7 Marcelo Tosatti
@ 2003-06-03 18:02 ` Tomas Szepe
2003-06-03 18:07 ` Marcelo Tosatti
2003-06-03 18:30 ` Alex Romosan
2003-06-05 12:09 ` Andreas Haumer
2 siblings, 1 reply; 19+ messages in thread
From: Tomas Szepe @ 2003-06-03 18:02 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: lkml, alan
> [marcelo@conectiva.com.br]
>
> Now I really hope its the last one, all this rc's are making me mad.
Are you quite sure you don't want Alan to get you the updates necessary
for IDE to build as modules for .21 final?
--
Tomas Szepe <szepe@pinerecords.com>
* Re: Linux 2.4.21-rc7
2003-06-03 18:02 ` Tomas Szepe
@ 2003-06-03 18:07 ` Marcelo Tosatti
2003-06-03 19:15 ` lk
0 siblings, 1 reply; 19+ messages in thread
From: Marcelo Tosatti @ 2003-06-03 18:07 UTC (permalink / raw)
To: Tomas Szepe; +Cc: lkml, alan
On Tue, 3 Jun 2003, Tomas Szepe wrote:
> > [marcelo@conectiva.com.br]
> >
> > Now I really hope its the last one, all this rc's are making me mad.
>
> Are you quite sure you don't want Alan to get you the updates necessary
> for IDE to build as modules for .21 final?
Well, I can for sure release -rc8 with that.
I just want this possible -rc8 to be released no later than tonight.
Alan?
* Re: Linux 2.4.21-rc7
2003-06-03 17:04 Linux 2.4.21-rc7 Marcelo Tosatti
2003-06-03 18:02 ` Tomas Szepe
@ 2003-06-03 18:30 ` Alex Romosan
2003-06-03 19:27 ` Jeff Garzik
2003-06-05 12:09 ` Andreas Haumer
2 siblings, 1 reply; 19+ messages in thread
From: Alex Romosan @ 2003-06-03 18:30 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: lkml
Marcelo Tosatti <marcelo@conectiva.com.br> writes:
> Now I really hope its the last one, all this rc's are making me mad.
i still can't get it to compile for sparc32:
gcc -D__KERNEL__ -I/usr/src/linux/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -fomit-frame-pointer -m32 -pipe -mno-fpu -fcall-used-g5 -fcall-used-g7 -nostdinc -iwithprefix include -DKBUILD_BASENAME=ksyms -DEXPORT_SYMTAB -c ksyms.c
/usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_nocheck':
/usr/src/linux/include/asm/checksum.h:59: error: asm-specifier for variable `d' conflicts with asm clobber list
/usr/src/linux/include/asm/checksum.h:59: error: asm-specifier for variable `l' conflicts with asm clobber list
/usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_from_user':
/usr/src/linux/include/asm/checksum.h:81: error: asm-specifier for variable `d' conflicts with asm clobber list
/usr/src/linux/include/asm/checksum.h:81: error: asm-specifier for variable `l' conflicts with asm clobber list
/usr/src/linux/include/asm/checksum.h:81: error: asm-specifier for variable `s' conflicts with asm clobber list
/usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_to_user':
/usr/src/linux/include/asm/checksum.h:108: error: asm-specifier for variable `d' conflicts with asm clobber list
/usr/src/linux/include/asm/checksum.h:108: error: asm-specifier for variable `l' conflicts with asm clobber list
/usr/src/linux/include/asm/checksum.h:108: error: asm-specifier for variable `s' conflicts with asm clobber list
make[3]: *** [ksyms.o] Error 1
make[3]: Leaving directory `/usr/src/linux/kernel'
make[2]: *** [first_rule] Error 2
make[2]: Leaving directory `/usr/src/linux/kernel'
make[1]: *** [_dir_kernel] Error 2
make[1]: Leaving directory `/usr/src/linux'
make: *** [stamp-build] Error 2
not sure when this started. the last kernel i managed to compile was
rc2 (skipped rc3 and rc4, rc5 didn't compile). the last one that will
boot was 2.4.21-pre1. this is on a sun4m Fujitsu TurboSparc.
--alex--
--
| I believe the moment is at hand when, by a paranoiac and active |
| advance of the mind, it will be possible (simultaneously with |
| automatism and other passive states) to systematize confusion |
| and thus to help to discredit completely the world of reality. |
* Re: Linux 2.4.21-rc7
2003-06-03 18:07 ` Marcelo Tosatti
@ 2003-06-03 19:15 ` lk
2003-06-03 19:40 ` Alan Cox
0 siblings, 1 reply; 19+ messages in thread
From: lk @ 2003-06-03 19:15 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: lkml
> > > Now I really hope its the last one, all this rc's are making me mad.
> >
> > Are you quite sure you don't want Alan to get you the updates necessary
> > for IDE to build as modules for .21 final?
>
> Well, I can for sure release -rc8 with that.
>
> I just want this possible -rc8 to be released no later than tonight.
Unfortunately I just committed my test box to production and can't test
Alan's SiImage fixes in rc6-ac2, but if they pan out, please try to
include them in -rc8 as well.
* Re: Linux 2.4.21-rc7
2003-06-03 18:30 ` Alex Romosan
@ 2003-06-03 19:27 ` Jeff Garzik
2003-06-03 19:58 ` Alex Romosan
0 siblings, 1 reply; 19+ messages in thread
From: Jeff Garzik @ 2003-06-03 19:27 UTC (permalink / raw)
To: Alex Romosan; +Cc: Marcelo Tosatti, lkml
On Tue, Jun 03, 2003 at 11:30:59AM -0700, Alex Romosan wrote:
> Marcelo Tosatti <marcelo@conectiva.com.br> writes:
>
> > Now I really hope its the last one, all this rc's are making me mad.
>
> i still can't get it to compile for sparc32:
>
> gcc -D__KERNEL__ -I/usr/src/linux/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -fomit-frame-pointer -m32 -pipe -mno-fpu -fcall-used-g5 -fcall-used-g7 -nostdinc -iwithprefix include -DKBUILD_BASENAME=ksyms -DEXPORT_SYMTAB -c ksyms.c
> /usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_nocheck':
> /usr/src/linux/include/asm/checksum.h:59: error: asm-specifier for variable `d' conflicts with asm clobber list
> /usr/src/linux/include/asm/checksum.h:59: error: asm-specifier for variable `l' conflicts with asm clobber list
> /usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_from_user':
That looks like you either need a different compiler version,
or different binutils version...
Jeff
* Re: Linux 2.4.21-rc7
2003-06-03 19:15 ` lk
@ 2003-06-03 19:40 ` Alan Cox
0 siblings, 0 replies; 19+ messages in thread
From: Alan Cox @ 2003-06-03 19:40 UTC (permalink / raw)
To: lk; +Cc: Marcelo Tosatti, lkml
On Maw, 2003-06-03 at 20:15, lk@trolloc.com wrote:
> Unfortunately I just committed my test box to production and can't test
> Alan's SiImage fixes in rc6-ac2, but if they pan out, please try to
> include them in -rc8 as well.
You could add the dma autoenable but the rest should be avoided
* Re: Linux 2.4.21-rc7
2003-06-03 19:27 ` Jeff Garzik
@ 2003-06-03 19:58 ` Alex Romosan
2003-06-03 20:14 ` Tom Rini
0 siblings, 1 reply; 19+ messages in thread
From: Alex Romosan @ 2003-06-03 19:58 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Marcelo Tosatti, lkml
Jeff Garzik <jgarzik@pobox.com> writes:
> On Tue, Jun 03, 2003 at 11:30:59AM -0700, Alex Romosan wrote:
>> Marcelo Tosatti <marcelo@conectiva.com.br> writes:
>>
>> > Now I really hope its the last one, all this rc's are making me mad.
>>
>> i still can't get it to compile for sparc32:
>>
>> gcc -D__KERNEL__ -I/usr/src/linux/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -fomit-frame-pointer -m32 -pipe -mno-fpu -fcall-used-g5 -fcall-used-g7 -nostdinc -iwithprefix include -DKBUILD_BASENAME=ksyms -DEXPORT_SYMTAB -c ksyms.c
>> /usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_nocheck':
>> /usr/src/linux/include/asm/checksum.h:59: error: asm-specifier for variable `d' conflicts with asm clobber list
>> /usr/src/linux/include/asm/checksum.h:59: error: asm-specifier for variable `l' conflicts with asm clobber list
>> /usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_from_user':
>
> That looks like you either need a different compiler version,
> or different binutils version...
gcc (GCC) 3.3 (Debian)
GNU ld version 2.14.90.0.4 20030523 Debian GNU/Linux
the same versions work on i386 though...
--alex--
--
| I believe the moment is at hand when, by a paranoiac and active |
| advance of the mind, it will be possible (simultaneously with |
| automatism and other passive states) to systematize confusion |
| and thus to help to discredit completely the world of reality. |
* Re: Linux 2.4.21-rc7
2003-06-03 19:58 ` Alex Romosan
@ 2003-06-03 20:14 ` Tom Rini
2003-06-04 3:35 ` David S. Miller
0 siblings, 1 reply; 19+ messages in thread
From: Tom Rini @ 2003-06-03 20:14 UTC (permalink / raw)
To: Alex Romosan; +Cc: Jeff Garzik, Marcelo Tosatti, lkml
On Tue, Jun 03, 2003 at 12:58:40PM -0700, Alex Romosan wrote:
> Jeff Garzik <jgarzik@pobox.com> writes:
>
> > On Tue, Jun 03, 2003 at 11:30:59AM -0700, Alex Romosan wrote:
> >> Marcelo Tosatti <marcelo@conectiva.com.br> writes:
> >>
> >> > Now I really hope its the last one, all this rc's are making me mad.
> >>
> >> i still can't get it to compile for sparc32:
> >>
> >> gcc -D__KERNEL__ -I/usr/src/linux/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -fomit-frame-pointer -m32 -pipe -mno-fpu -fcall-used-g5 -fcall-used-g7 -nostdinc -iwithprefix include -DKBUILD_BASENAME=ksyms -DEXPORT_SYMTAB -c ksyms.c
> >> /usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_nocheck':
> >> /usr/src/linux/include/asm/checksum.h:59: error: asm-specifier for variable `d' conflicts with asm clobber list
> >> /usr/src/linux/include/asm/checksum.h:59: error: asm-specifier for variable `l' conflicts with asm clobber list
> >> /usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_from_user':
> >
> > That looks like you either need a different compiler version,
> > or different binutils version...
>
> gcc (GCC) 3.3 (Debian)
> GNU ld version 2.14.90.0.4 20030523 Debian GNU/Linux
That would do it.
> the same versions work on i386 though...
Yes, but i386 either didn't have the now-invalid clobber lists, or they were
fixed in the -pre portion (as they were on PPC32 as well).
--
Tom Rini
http://gate.crashing.org/~trini/
* Re: Linux 2.4.21-rc7
2003-06-03 20:14 ` Tom Rini
@ 2003-06-04 3:35 ` David S. Miller
2003-06-04 15:09 ` Mr. James W. Laferriere
2003-06-04 23:37 ` Alex Romosan
0 siblings, 2 replies; 19+ messages in thread
From: David S. Miller @ 2003-06-04 3:35 UTC (permalink / raw)
To: Tom Rini; +Cc: Alex Romosan, Jeff Garzik, Marcelo Tosatti, lkml
On Tue, 2003-06-03 at 13:14, Tom Rini wrote:
> > gcc (GCC) 3.3 (Debian)
> > GNU ld version 2.14.90.0.4 20030523 Debian GNU/Linux
>
> That would do it.
I don't trust anything past gcc-3.2.x on sparc and sparc64.
Use 3.3.x and later at your own peril.
--
David S. Miller <davem@redhat.com>
* Re: Linux 2.4.21-rc7
2003-06-04 3:35 ` David S. Miller
@ 2003-06-04 15:09 ` Mr. James W. Laferriere
2003-06-04 23:37 ` Alex Romosan
1 sibling, 0 replies; 19+ messages in thread
From: Mr. James W. Laferriere @ 2003-06-04 15:09 UTC (permalink / raw)
To: David S. Miller
Cc: Tom Rini, Alex Romosan, Jeff Garzik, Marcelo Tosatti, lkml
Hello Dave, thank you for the warning. Now how about explaining why,
in layman's terms? TIA, JimL
On Tue, 3 Jun 2003, David S. Miller wrote:
> On Tue, 2003-06-03 at 13:14, Tom Rini wrote:
> > > gcc (GCC) 3.3 (Debian)
> > > GNU ld version 2.14.90.0.4 20030523 Debian GNU/Linux
> > That would do it.
> I don't trust anything past gcc-3.2.x on sparc and sparc64.
> Use 3.3.x and later at your own peril.
--
+------------------------------------------------------------------+
| James W. Laferriere | System Techniques | Give me VMS |
| Network Engineer | P.O. Box 854 | Give me Linux |
| babydr@baby-dragons.com | Coudersport PA 16915 | only on AXP |
+------------------------------------------------------------------+
* Re: Linux 2.4.21-rc7
2003-06-04 3:35 ` David S. Miller
2003-06-04 15:09 ` Mr. James W. Laferriere
@ 2003-06-04 23:37 ` Alex Romosan
1 sibling, 0 replies; 19+ messages in thread
From: Alex Romosan @ 2003-06-04 23:37 UTC (permalink / raw)
To: David S. Miller; +Cc: Tom Rini, Jeff Garzik, Marcelo Tosatti, lkml
"David S. Miller" <davem@redhat.com> writes:
> On Tue, 2003-06-03 at 13:14, Tom Rini wrote:
>> > gcc (GCC) 3.3 (Debian)
>> > GNU ld version 2.14.90.0.4 20030523 Debian GNU/Linux
>>
>> That would do it.
>
> I don't trust anything past gcc-3.2.x on sparc and sparc64.
> Use 3.3.x and later at your own peril.
recompiled with gcc-3.2.3 and the kernel not only compiled but also
booted. thank you.
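For anyone scripting sparc builds, the compiler advice above could be turned into a guard like the following. This is only a sketch: the function name gcc_trusted_for_sparc is made up for illustration, and the 3.2.x cutoff is simply the rule of thumb quoted above, not an official limit.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for gcc versions up to and including 3.2.x.
# The cutoff mirrors the advice quoted in this thread; it is an assumption,
# not something enforced by the kernel build system.
gcc_trusted_for_sparc() {
    ver=$1
    major=${ver%%.*}        # text before the first dot
    rest=${ver#*.}          # text after the first dot
    minor=${rest%%.*}       # second version component
    [ "$major" -lt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -le 2 ]; }
}

# Possible use before starting a build:
#   gcc_trusted_for_sparc "$(gcc -dumpversion)" || echo "untested gcc for sparc" >&2
```

The version string is passed in as an argument rather than read from gcc directly, so the same check can be applied to a cross-compiler.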
--alex--
--
| I believe the moment is at hand when, by a paranoiac and active |
| advance of the mind, it will be possible (simultaneously with |
| automatism and other passive states) to systematize confusion |
| and thus to help to discredit completely the world of reality. |
* Re: Linux 2.4.21-rc7
2003-06-03 17:04 Linux 2.4.21-rc7 Marcelo Tosatti
2003-06-03 18:02 ` Tomas Szepe
2003-06-03 18:30 ` Alex Romosan
@ 2003-06-05 12:09 ` Andreas Haumer
2003-06-07 15:46 ` Andreas Haumer
2 siblings, 1 reply; 19+ messages in thread
From: Andreas Haumer @ 2003-06-05 12:09 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: lkml
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi!
Marcelo Tosatti wrote:
> Hallo,
>
> Now I really hope its the last one, all this rc's are making me mad.
>
;-)
So, here's a report on the more positive side...
As I mentioned in some e-mails in the last few days,
I'm currently testing an Asus AP1700-S5 server with
a single Xeon 2.4GHz CPU (FSB533), 512MB RAM and
4x36GB U320SCSI drives (3 of them are assembled as RAID5),
connected via GBit Ethernet to our internal network
root@setup:~ {533} $ lspci
00:00.0 Host bridge: ServerWorks CNB20-HE Host Bridge (rev 31)
00:00.1 Host bridge: ServerWorks CNB20-HE Host Bridge
00:00.2 Host bridge: ServerWorks CNB20-HE Host Bridge
00:02.0 Ethernet controller: Intel Corp. 82540EM Gigabit Ethernet Controller (rev 02)
00:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:0f.0 ISA bridge: ServerWorks CSB5 South Bridge (rev 93)
00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93)
00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 05)
00:0f.3 Host bridge: ServerWorks GCLE Host Bridge
00:10.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:10.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:11.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:11.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
02:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)
02:04.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)
03:02.0 Ethernet controller: Intel Corp. 82544GC Gigabit Ethernet Controller (LOM) (rev 02)
root@setup:~ {538} $ uptime
2:05pm up 18:09, 11 users, load average: 8.03, 8.45, 8.15
This system has been running 2.4.21-rc7 for more than 18 hours
now with the following load:
*) an endless loop to create and remove a large file on the
RAID5 (ext3 filesystem):
while true; do time dd if=/dev/zero of=/var/tmp/largefile bs=1M count=2000 ; rm -f /var/tmp/largefile; done
*) some commands to create additional load:
cd /
find . boot/ usr/ tmp/ opt/ var/ -xdev -type f -exec md5sum {} \;
*) NFS copy of a whole 40GB filesystem tree from a Linux NFS server
to the RAID5 (in a loop)
*) the system is also NFS serving a Linux NFS client, which
copies the whole server filesystem into /dev/null
*) Additionally, I have the following programs running:
- Squid (currently used as proxy for our internal web browsers)
- Apache
- jedit (with j2sdk-1.4.1_01)
- StarOffice-5.2
- Mozilla-1.3.1
- and lots of additional programs (shell, sshd, emacs), but
no X server (we are using Linux workstations as X-Terminals)
All in all, there are more than 190 processes at any point in
time in the past 18 hours.
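The create/remove loop above can also be run for a fixed number of cycles, which makes it easier to script. A minimal sketch, assuming a POSIX shell; the function name stress_cycles and its arguments are hypothetical, and the block size is spelled out in bytes so it does not depend on dd's suffix handling.

```shell
#!/bin/sh
# Hypothetical helper: write a zero-filled file of MB megabytes to FILE,
# remove it again, and repeat CYCLES times. Prints the number of cycles done.
stress_cycles() {
    file=$1
    mb=$2
    cycles=$3
    i=0
    while [ "$i" -lt "$cycles" ]; do
        # 1048576 bytes = 1 MiB per block
        dd if=/dev/zero of="$file" bs=1048576 count="$mb" 2>/dev/null || return 1
        rm -f "$file"
        i=$((i + 1))
    done
    echo "$i"
}

# Roughly the load described above, limited to ten rounds:
#   stress_cycles /var/tmp/largefile 2000 10
```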
All of this produces a constant load between 7 and 9:
vmstat 1
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 4 4 111720 3220 11344 423820 0 0 4 18976 4892 4273 2 68 30
0 4 3 111720 3204 11352 423728 32 0 80 25216 1460 2095 0 15 85
0 4 3 111716 3332 11352 423364 76 0 92 25796 1432 1895 2 14 84
0 4 3 111716 3208 11372 423392 48 0 712 26336 1566 2346 4 14 81
0 6 3 111716 3208 11412 423196 132 0 420 32820 1774 3113 12 19 69
0 5 3 111716 3376 11440 422340 704 0 924 24444 1570 2811 3 17 79
6 2 4 111716 2328 11560 423988 536 0 700 32088 2268 4590 6 73 21
11 3 4 111764 63352 11604 321148 16 308 310 36868 2267 5390 12 46 42
root@setup:~ {537} $ uptime
1:37pm up 17:41, 10 users, load average: 7.94, 7.31, 7.18
Under these circumstances, I made the following observations:
a) The system has been running stably for more than 18 hours now
b) It seems to behave quite well, given the load.
Response time for all services (web-proxy, web-server)
is reasonably low (you almost don't notice any delay)
c) Interactive programs (Mozilla, StarOffice, JEdit) are
still quite usable. There is some delay when opening
a file in SO (say, about 2-3 seconds), but that's fine
d) Sometimes (but not really reproducibly) I noticed a
_big_ delay when connecting to the server using SSH
(with "big", I mean 1 minute or so). I eventually
get a connection, and then can work as normal.
e) The server uses a single, but hyperthreaded CPU.
Hyperthreading is enabled, and Linux shows both
logical CPUs:
root@setup:~ {529} $ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
stepping : 7
cpu MHz : 2392.169
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 4771.02
processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
stepping : 7
cpu MHz : 2392.169
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 4771.02
But interrupt distribution seems a little bit strange:
root@setup:~ {530} $ cat /proc/interrupts
CPU0 CPU1
0: 6318080 0 IO-APIC-edge timer
1: 967 0 IO-APIC-edge keyboard
2: 0 0 XT-PIC cascade
4: 32477 0 IO-APIC-edge serial
5: 55629300 0 IO-APIC-level eth0
9: 85639064 0 IO-APIC-level acpi, ioc0, ioc1
11: 0 0 IO-APIC-level usb-ohci
15: 2 0 IO-APIC-edge ide1
NMI: 0 0
LOC: 6318529 6318527
ERR: 0
MIS: 0
With 2.4.21-rc6-ac1, interrupts were counted for both
logical CPUs. Is this a bug or a feature?
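To compare kernels on this point, the per-CPU columns of /proc/interrupts can be summed with a little awk. A rough sketch, assuming the layout shown above (a CPUn header row followed by numbered IRQ rows); irq_totals is a made-up helper name, and it deliberately ignores the NMI/LOC/ERR/MIS rows.

```shell
#!/bin/sh
# Hypothetical helper: read text in /proc/interrupts format on stdin and
# print one "CPUn total" line per CPU column, counting numbered IRQs only.
irq_totals() {
    awk '
        NR == 1 { ncpu = NF; next }      # header row: one field per CPU
        $1 ~ /^[0-9]+:$/ {               # numbered IRQ rows such as "5:"
            for (c = 1; c <= ncpu; c++) total[c] += $(c + 1)
        }
        END {
            for (c = 1; c <= ncpu; c++) printf "CPU%d %d\n", c - 1, total[c]
        }
    '
}

# Typical use on a live system:
#   irq_totals < /proc/interrupts
```

If every total except the one for CPU0 is zero, all device interrupts are being delivered to the first logical CPU, which is the pattern reported here.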
HTH
- - andreas
- --
Andreas Haumer | mailto:andreas@xss.co.at
*x Software + Systeme | http://www.xss.co.at/
Karmarschgasse 51/2/20 | Tel: +43-1-6060114-0
A-1100 Vienna, Austria | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQE+3zMOxJmyeGcXPhERAu6CAKCILyOUfPyGaKG8pvbl4droch6B+ACbBNB/
Dw1L/tRv2JSrOHA12B8BaHM=
=rWPF
-----END PGP SIGNATURE-----
* Re: Linux 2.4.21-rc7
2003-06-05 12:09 ` Andreas Haumer
@ 2003-06-07 15:46 ` Andreas Haumer
2003-06-09 10:16 ` [2.4.21-rc7] AP1700-S5 system freeze :-(( Andreas Haumer
2003-06-11 20:48 ` Linux 2.4.21-rc7 Marcelo Tosatti
0 siblings, 2 replies; 19+ messages in thread
From: Andreas Haumer @ 2003-06-07 15:46 UTC (permalink / raw)
To: Andreas Haumer; +Cc: Marcelo Tosatti, lkml
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi!
Andreas Haumer wrote:
> Hi!
>
> Marcelo Tosatti wrote:
>
>>Hallo,
>>
>>Now I really hope its the last one, all this rc's are making me mad.
>>
>
> ;-)
>
> So, here's a report on the more positive side...
>
I think I have to take that back... :-((
> As I mentioned in some e-mails in the last few days,
> I'm currently testing an Asus AP1700-S5 server with
> a single Xeon 2.4GHz CPU (FSB533), 512MB RAM and
> 4x36GB U320SCSI drives (3 of them are assembled as RAID5),
> connected via GBit Ethernet to our internal network
>
I had this system running under heavy load for about 24 hours
without problems. I then stopped the stress testing, and had
several system freezes since then.
By system freeze I mean:
*) machine doesn't answer pings, no reaction to console
keyboard, no message on the console screen, no message
in logfile, no oops, no noticeable system activity
I changed several BIOS settings (disabled hyperthreading,
disabled USB, disabled power management) and tried to run
the kernel with "acpi=off" and "noapic".
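When juggling boot options like these, it helps to confirm which ones the running kernel was actually booted with. A small sketch that checks a command-line string for a whole-word option; has_boot_flag is a hypothetical name, and the string would normally come from /proc/cmdline.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FLAG appears as a complete,
# space-separated word in the kernel command line string CMDLINE.
has_boot_flag() {
    flag=$1
    cmdline=$2
    case " $cmdline " in
        *" $flag "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use on a live system:
#   has_boot_flag noapic "$(cat /proc/cmdline)" && echo "booted with noapic"
```

Padding the string with spaces on both sides means a partial match such as "acpi" does not count as a hit for "acpi=off".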
I also changed root disk, because I found a SCSI error
message in the logs once.
Nothing seems to help. The system just freezes under light
load at some time between 1 and 8 hours uptime.
It's really strange that it survived heavy load for
more than 24 hours in the first place.
I found some problem reports from several people,
which sound quite similar to the freeze I see here.
These people all had motherboards with serverworks
chipset, GBit ethernet and noticed similar lockups
or system freeze symptoms. From the reports I'm not
sure if the problems still persist or if they should
be solved now. Can someone please comment on that?
Here is some info from the system again:
root@server:~ {505} $ cat /proc/interrupts
CPU0
0: 118748 IO-APIC-edge timer
1: 274 IO-APIC-edge keyboard
2: 0 XT-PIC cascade
4: 7011 IO-APIC-edge serial
9: 1181037 IO-APIC-level ioc0, ioc1
14: 1685 IO-APIC-level eth0
15: 2 IO-APIC-edge ide1
NMI: 0
LOC: 118700
ERR: 0
MIS: 0
root@server:~ {506} $ cat /proc/cmdline
auto BOOT_IMAGE=lx2421rc7 ro root=100 acpi=off
root@server:~ {507} $ uname -a
Linux server 2.4.21-rc7 #1 SMP Wed Jun 4 18:31:15 CEST 2003 i686 unknown
root@server:~ {508} $ lsmod
Module Size Used by Not tainted
af_packet 13256 1 (autoclean)
e1000 50028 1 (autoclean)
ext3 60832 2 (autoclean)
jbd 40056 2 (autoclean) [ext3]
raid5 17704 1 (autoclean)
md 57472 2 (autoclean) [raid5]
xor 8868 0 (autoclean) [raid5]
unix 15664 38 (autoclean)
ext2 33440 4 (autoclean)
sd_mod 10652 18 (autoclean)
isense 32404 0 (autoclean) (unused)
mptctl 19116 0 (autoclean) (unused)
mptscsih 29696 9 (autoclean)
mptbase 32640 5 (autoclean) [isense mptctl mptscsih]
scsi_mod 95748 2 (autoclean) [sd_mod mptscsih]
root@server:~ {511} $ lspci -vvvv
00:00.0 Host bridge: ServerWorks CNB20-HE Host Bridge (rev 31)
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
00:00.1 Host bridge: ServerWorks CNB20-HE Host Bridge
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
00:00.2 Host bridge: ServerWorks CNB20-HE Host Bridge
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
00:02.0 Ethernet controller: Intel Corp. 82540EM Gigabit Ethernet Controller (rev 02)
Subsystem: Intel Corp. 82540EM Gigabit Ethernet Controller
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (63750ns min), cache line size 08
Interrupt: pin A routed to IRQ 14
Region 0: Memory at fd800000 (32-bit, non-prefetchable) [size=128K]
Region 2: I/O ports at d800 [size=64]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [e4] PCI-X non-bridge device.
Command: DPERE- ERO+ RBC=0 OST=0
Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000
00:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA])
Subsystem: ATI Technologies Inc: Unknown device 8008
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (2000ns min), cache line size 08
Interrupt: pin A routed to IRQ 10
Region 0: Memory at fc000000 (32-bit, non-prefetchable) [size=16M]
Region 1: I/O ports at d400 [size=256]
Region 2: Memory at fb800000 (32-bit, non-prefetchable) [size=4K]
Expansion ROM at febe0000 [disabled] [size=128K]
Capabilities: [5c] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00:0f.0 ISA bridge: ServerWorks CSB5 South Bridge (rev 93)
Subsystem: ServerWorks CSB5 South Bridge
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 32
00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93) (prog-if 88 [Master SecP])
Subsystem: ServerWorks CSB5 IDE Controller
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32, cache line size 08
Region 0: I/O ports at <ignored>
Region 1: I/O ports at <ignored>
Region 2: I/O ports at <ignored>
Region 3: I/O ports at <ignored>
Region 4: I/O ports at a800 [size=16]
00:0f.3 Host bridge: ServerWorks GCLE Host Bridge
Subsystem: ServerWorks: Unknown device 0230
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
00:10.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr+ DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Capabilities: [60]
00:10.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Capabilities: [60]
00:11.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr+ DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Capabilities: [60]
00:11.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr+ DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Capabilities: [60]
02:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)
Subsystem: LSI Logic / Symbios Logic: Unknown device 1000
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 72 (4250ns min, 4500ns max), cache line size 08
Interrupt: pin A routed to IRQ 9
Region 0: I/O ports at a000 [size=256]
Region 1: Memory at fa000000 (64-bit, non-prefetchable) [size=64K]
Region 3: Memory at f9800000 (64-bit, non-prefetchable) [size=64K]
Expansion ROM at fe900000 [disabled] [size=1M]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [68]
02:04.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)
Subsystem: LSI Logic / Symbios Logic: Unknown device 1000
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 72 (4250ns min, 4500ns max), cache line size 08
Interrupt: pin B routed to IRQ 9
Region 0: I/O ports at 9800 [size=256]
Region 1: Memory at f9000000 (64-bit, non-prefetchable) [size=64K]
Region 3: Memory at f8800000 (64-bit, non-prefetchable) [size=64K]
Expansion ROM at fe800000 [disabled] [size=1M]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [68]
03:02.0 Ethernet controller: Intel Corp. 82544GC Gigabit Ethernet Controller (LOM) (rev 02)
Subsystem: Intel Corp.: Unknown device 110d
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (63750ns min), cache line size 08
Interrupt: pin A routed to IRQ 5
Region 0: Memory at f8000000 (64-bit, non-prefetchable) [size=128K]
Region 2: Memory at f7800000 (64-bit, non-prefetchable) [size=128K]
Region 4: I/O ports at 9400 [size=32]
Expansion ROM at fe7e0000 [disabled] [size=128K]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [e4] PCI-X non-bridge device.
Command: DPERE- ERO+ RBC=0 OST=0
Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000
Any idea how I should proceed now?
I really could use some help here, I'm running out
of ideas... :-((
- - andreas
- --
Andreas Haumer | mailto:andreas@xss.co.at
*x Software + Systeme | http://www.xss.co.at/
Karmarschgasse 51/2/20 | Tel: +43-1-6060114-0
A-1100 Vienna, Austria | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQE+4gjsxJmyeGcXPhERAsT4AJ9sylkxso5kXO51+6c5bfskVV2meACgrF33
t8xXYpu6FGPsiQ9VBmnk6ek=
=Yov+
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 19+ messages in thread
* [2.4.21-rc7] AP1700-S5 system freeze :-((
2003-06-07 15:46 ` Andreas Haumer
@ 2003-06-09 10:16 ` Andreas Haumer
2003-06-09 11:46 ` Stephan von Krawczynski
2003-06-11 20:48 ` Linux 2.4.21-rc7 Marcelo Tosatti
1 sibling, 1 reply; 19+ messages in thread
From: Andreas Haumer @ 2003-06-09 10:16 UTC (permalink / raw)
To: Andreas Haumer; +Cc: lkml
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi!
Note: I'm reporting this with a different subject line now,
as I got zero replies to my first bug report. This is still
the same Asus AP1700-S5 server as in my previous reports,
though:
Asus AP1700-S5 server, single Xeon 2.4GHz CPU (FSB533)
512MB registered DDR with ECC, Asus PR-DLS533 motherboard
with ServerWorks GCLE chipset
root@server:~ {535} $ lspci
00:00.0 Host bridge: ServerWorks CNB20-HE Host Bridge (rev 31)
00:00.1 Host bridge: ServerWorks CNB20-HE Host Bridge
00:00.2 Host bridge: ServerWorks CNB20-HE Host Bridge
00:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:0f.0 ISA bridge: ServerWorks CSB5 South Bridge (rev 93)
00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93)
00:0f.3 Host bridge: ServerWorks GCLE Host Bridge
00:10.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:10.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:11.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:11.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
01:02.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 74)
02:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)
02:04.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)
Andreas Haumer wrote:
[...]
> I had this system running under heavy load for about 24 hours
> without problems. I then stopped the stress testing, and had
> several system freezes since then.
>
> With system freeze I mean:
>
> *) machine doesn't answer to ping, no reaction to console
> keyboard, no message on the console screen, no message
> in logfile, no oops, no noticeable system activity
>
I just had another freeze or lockup of this system,
after 1 day and 14 hours uptime. :-(
This time the machine was running with a 3Com 3c905C
100MBit NIC, with the onboard e1000 GBit controllers disabled.
Obviously, this didn't help either...
When I noticed the freeze, I tried to ping the server,
and got a few replies back, but with a delay of more than
60 seconds! I didn't wait that long when I tried to ping
the server on the previous lockups, so maybe the "no answer
to ping" symptom I described is more a "big delay in
answering ping packets" symptom. Does that ring a bell?
Any ideas, anyone?
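To illustrate why a short wait can make a very slow reply look like no reply at all, here is a minimal Python sketch. The function name, the 10 and 120 second waits and the 1 second "slow" threshold are assumptions made up for illustration; only the "more than 60 seconds" delay comes from the observation above.

```python
def observed(rtt, wait, slow=1.0):
    """Classify one ping as seen by an observer who waits `wait` seconds.

    rtt  -- round-trip time in seconds, or None if no reply ever arrives
    wait -- how long the observer is willing to wait for the reply
    slow -- hypothetical threshold above which a reply counts as delayed
    """
    if rtt is None or rtt > wait:
        # The observer gives up first, so a late reply and a lost
        # packet are indistinguishable.
        return "no reply"
    if rtt > slow:
        return "delayed"
    return "ok"


# A reply that takes 65 seconds, seen with two different patience levels.
print(observed(65.0, wait=10.0))   # impatient observer
print(observed(65.0, wait=120.0))  # patient observer
```

With the impatient observer the 65 second reply is reported as missing, with the patient one as delayed, which matches the two descriptions of the symptom given in these reports.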
- - andreas
- --
Andreas Haumer | mailto:andreas@xss.co.at
*x Software + Systeme | http://www.xss.co.at/
Karmarschgasse 51/2/20 | Tel: +43-1-6060114-0
A-1100 Vienna, Austria | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQE+5F6HxJmyeGcXPhERApOfAJ4klAsR0lA8Zzk5s22quImzxud6agCgvAi1
FXZuNQV3C4UaKVi9gOvtJFM=
=qL4B
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [2.4.21-rc7] AP1700-S5 system freeze :-((
2003-06-09 10:16 ` [2.4.21-rc7] AP1700-S5 system freeze :-(( Andreas Haumer
@ 2003-06-09 11:46 ` Stephan von Krawczynski
2003-06-09 12:21 ` Andreas Haumer
0 siblings, 1 reply; 19+ messages in thread
From: Stephan von Krawczynski @ 2003-06-09 11:46 UTC (permalink / raw)
To: Andreas Haumer; +Cc: linux-kernel
Hello Andreas,
I am not quite sure if you are experiencing something similar to my problem.
Fact is this:
I have a serverworks based dual PIII board and I am experiencing freezes just
about every day.
Equal setups:
Kernel 2.4.21-rc7
00:00.0 Host bridge: ServerWorks CNB20HE Host Bridge (me: rev 23 you: rev 31)
00:00.1 Host bridge: ServerWorks CNB20HE Host Bridge (rev 01)
Lockups during light load
Differing:
Just about everything else:
                 yours:         mine:
Storage System:  Symbios        AIC
VGA           :  ATI Rage XL    ATI Radeon RV200
Network       :  Intel/3com     Intel/Broadcom
Processor     :  Xeon UP        PIII SMP
I could already produce oops-messages on the problem and mine all come up in
kmem_cache_alloc_batch. It would be interesting where your box freezes. It
cannot be at this same place, because the code is not there in UP.
Try this (in case you are not working in front of the box):
Start box and switch to text console, enter "setterm -blank 0" to disable
screen blanker. Wait for oops. If we are lucky you will see something, get a
pencil then :-)
--
Regards,
Stephan
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [2.4.21-rc7] AP1700-S5 system freeze :-((
2003-06-09 11:46 ` Stephan von Krawczynski
@ 2003-06-09 12:21 ` Andreas Haumer
0 siblings, 0 replies; 19+ messages in thread
From: Andreas Haumer @ 2003-06-09 12:21 UTC (permalink / raw)
To: Stephan von Krawczynski; +Cc: linux-kernel
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi!
Many thanks for your reply!
Stephan von Krawczynski wrote:
> Hello Andreas,
>
> I am not quite sure if you are experiencing something similar to my problem.
> Fact is this:
>
> I have a serverworks based dual PIII board and I am experiencing freezes just
> about every day.
>
> Equal setups:
>
> Kernel 2.4.21-rc7
> 00:00.0 Host bridge: ServerWorks CNB20HE Host Bridge (me: rev 23 you: rev 31)
> 00:00.1 Host bridge: ServerWorks CNB20HE Host Bridge (rev 01)
>
> Lockups during light load
>
Me too.
I had it running for 24 hours with heavy stress testing
and a load above 7 all the time without problems. I then
stopped this test, and the box locked up 2 hours later,
and locked up about 7 or 8 times in the past few days :-(
>
> Differing:
>
> Just about everything else:
> yours: mine:
> Storage System: Symbios AIC
This is not a "normal" symbios logic "sym53c8xx"
storage controller, but a "Symbios Logic 53c1030",
which uses the Fusion MPT driver. This is the first
time I'm running this driver, so I don't know if it's
considered stable (but I guess so)
Unfortunately I can't replace it as I don't have any
spare SCSI controller which fits right now.
> VGA : ATI Rage XL ATI Radeon RV200
> Network : Intel/3com Intel/Broadcom
> Processor : Xeon UP PIII SMP
>
>
> I could already produce oops-messages on the problem and mine all come up in
> kmem_cache_alloc_batch. It would be interesting where your box freezes. It
> cannot be at this same place, because the code is not there in UP.
> Try this (in case you are not working in front of the box):
>
> Start box and switch to text console, enter "setterm -blank 0" to disable
> screen blanker. Wait for oops. If we are lucky you will see something, get a
> pencil then :-)
>
I always have the system running with text console and
screen blanking disabled. Alas, I see no oops :-(
IMHO it doesn't look like the kernel crashes with an oops,
it does look more like it suddenly goes into an endless
loop or ridiculously high load somehow.
Last time I had this freeze, I noticed that the system
answered my ICMP ping messages with a delay of more than
60 seconds. This looked like the system was very busy
at that time.
I'm now running with 2.4.20rc2, and also have syslog
routed to another system on the network. We'll see if
I can get any more information out of this.
- - andreas
- --
Andreas Haumer | mailto:andreas@xss.co.at
*x Software + Systeme | http://www.xss.co.at/
Karmarschgasse 51/2/20 | Tel: +43-1-6060114-0
A-1100 Vienna, Austria | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQE+5HvjxJmyeGcXPhERAvOvAJ94cQS4tlzylHiVU084v7FK/e/aowCgw4w9
M3YWSHXzx9IuKeU4Z6WicEk=
=8102
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Linux 2.4.21-rc7
2003-06-07 15:46 ` Andreas Haumer
2003-06-09 10:16 ` [2.4.21-rc7] AP1700-S5 system freeze :-(( Andreas Haumer
@ 2003-06-11 20:48 ` Marcelo Tosatti
[not found] ` <1055408183.2552.18.camel@tor.trudheim.com>
1 sibling, 1 reply; 19+ messages in thread
From: Marcelo Tosatti @ 2003-06-11 20:48 UTC (permalink / raw)
To: Andreas Haumer; +Cc: lkml
On Sat, 7 Jun 2003, Andreas Haumer wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi!
>
> Andreas Haumer wrote:
> > Hi!
> >
> > Marcelo Tosatti wrote:
> >
> >>Hallo,
> >>
> >>Now I really hope its the last one, all this rc's are making me mad.
> >>
> >
> > ;-)
> >
> > So, here's a report on the more positive side...
> >
> I think, I have to take that back... :-((
>
> > As I mentioned in some e-mails in the last few days,
> > I'm currently testing an Asus AP1700-S5 server with
> > a single Xeon 2.4GHz CPU (FSB533), 512MB RAM and
> > 4x36GB U320SCSI drives (3 of them are assembled as RAID5),
> > connected via GBit Ethernet to our internal network
> >
> I had this system running under heavy load for about 24 hours
> without problems. I then stopped the stress testing, and had
> several system freezes since then.
>
> With system freeze I mean:
>
> *) machine doesn't answer to ping, no reaction to console
> keyboard, no message on the console screen, no message
> in logfile, no oops, no noticeable system activity
Maybe the NMI oopser helps?
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Linux 2.4.21-rc7
[not found] ` <1055408183.2552.18.camel@tor.trudheim.com>
@ 2003-06-12 9:35 ` Andreas Haumer
0 siblings, 0 replies; 19+ messages in thread
From: Andreas Haumer @ 2003-06-12 9:35 UTC (permalink / raw)
To: Anders Karlsson; +Cc: Marcelo Tosatti, Linux Kernel Mailing List
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi!
Anders Karlsson wrote:
> On Wed, 2003-06-11 at 21:48, Marcelo Tosatti wrote:
>
>>On Sat, 7 Jun 2003, Andreas Haumer wrote:
>
> [snip]
>
>>>I had this system running under heavy load for about 24 hours
>>>without problems. I then stopped the stress testing, and had
>>>several system freezes since then.
>>>
>>>With system freeze I mean:
>>>
>>>*) machine doesn't answer to ping, no reaction to console
>>> keyboard, no message on the console screen, no message
>>> in logfile, no oops, no noticeable system activity
>
>
> I have this problem without actually stressing the machine too hard. The
> average load on my Thinkpad over a weekend would perhaps be 0.05, yet I
> can have several hard hangs where there seems to be no trace of a hang
> at all in logfiles.
>
I have to admit that "system freeze" is quite an unspecific
symptom. It could have a zillion different causes.
In my case I'm currently chasing SCSI errors which I think
could have something to do with it (besides, it's _not_ an Adaptec
controller, but an LSI 53c1030 with Fusion MPT driver... :-)
In my server logs I sometimes see SCSI timeouts like this:
[...]
scsi : aborting command due to timeout : pid 1148093, scsi0, channel 0, id 1, lun 0 Read (10) 00 00 00 0f af 00 00 10 00
mptscsih: OldAbort scheduling ABORT SCSI IO (sc=dfca8e00)
IOs outstanding = 3
mptscsih: ioc0: Issue of TaskMgmt Successful!
SCSI host 0 abort (pid 1148093) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
mptscsih: OldReset scheduling BUS_RESET (sc=dfca8e00)
IOs outstanding = 4
SCSI Error Report =-=-= (0:0:0)
SCSI_Status=02h (CHECK CONDITION)
Original_CDB[]: 2A 00 00 3C 4D 78 00 00 02 00 - "WRITE(10)"
SenseData[20h]: 70 00 06 00 00 00 00 18 00 00 00 00 29 02 00 00 00 00 ...
SenseKey=6h (UNIT ATTENTION); FRU=00h
ASC/ASCQ=29h/02h "SCSI BUS RESET OCCURRED"
SCSI Error Report =-=-= (0:1:0)
SCSI_Status=02h (CHECK CONDITION)
Original_CDB[]: 28 00 00 00 0F AF 00 00 10 00 - "READ(10)"
SenseData[20h]: 70 00 06 00 00 00 00 18 00 00 00 00 29 02 00 00 00 00 ...
SenseKey=6h (UNIT ATTENTION); FRU=00h
ASC/ASCQ=29h/02h "SCSI BUS RESET OCCURRED"
SCSI Error Report =-=-= (0:2:0)
SCSI_Status=02h (CHECK CONDITION)
Original_CDB[]: 28 00 00 4E 0A 37 00 00 08 00 - "READ(10)"
SenseData[20h]: 70 00 06 00 00 00 00 18 00 00 00 00 29 02 00 00 00 00 ...
SenseKey=6h (UNIT ATTENTION); FRU=00h
ASC/ASCQ=29h/02h "SCSI BUS RESET OCCURRED"
SCSI Error Report =-=-= (0:3:0)
SCSI_Status=02h (CHECK CONDITION)
Original_CDB[]: 28 00 03 B0 08 6F 00 00 08 00 - "READ(10)"
SenseData[20h]: 70 00 06 00 00 00 00 18 00 00 00 00 29 02 00 00 00 00 ...
SenseKey=6h (UNIT ATTENTION); FRU=00h
ASC/ASCQ=29h/02h "SCSI BUS RESET OCCURRED"
[...]
There are 4 hot swap SCSI disks in the server, and all of them
eventually report those timeouts (so it's not specific to a single
disk).
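For readers who want to check the hex dumps in the log above by hand, the following Python sketch decodes a 10-byte READ(10)/WRITE(10) CDB and the fixed-format sense data. It is an illustration only, not part of the mptscsih driver; the function names are made up, and the byte offsets assume the standard 10-byte CDB layout and fixed-format (70h) sense data.

```python
def decode_rw10(cdb_hex):
    """Decode a 10-byte READ(10)/WRITE(10) CDB given as a hex string."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    opcodes = {0x28: "READ(10)", 0x2A: "WRITE(10)"}
    return {
        "op": opcodes.get(cdb[0], "unknown"),
        # Bytes 2..5: logical block address, big-endian
        "lba": int.from_bytes(cdb[2:6], "big"),
        # Bytes 7..8: transfer length in blocks, big-endian
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


def decode_fixed_sense(sense_hex):
    """Pull the key fields out of fixed-format sense data."""
    sense = bytes.fromhex(sense_hex.replace(" ", ""))
    if len(sense) < 14:
        raise ValueError("need at least 14 bytes of sense data")
    return {
        "response_code": sense[0] & 0x7F,  # 70h = current error, fixed format
        "sense_key": sense[2] & 0x0F,      # 6h = UNIT ATTENTION
        "asc": sense[12],                  # additional sense code
        "ascq": sense[13],                 # additional sense code qualifier
    }


# Values copied from the (0:1:0) error report in the log.
print(decode_rw10("28 00 00 00 0F AF 00 00 10 00"))
print(decode_fixed_sense(
    "70 00 06 00 00 00 00 18 00 00 00 00 29 02 00 00 00 00"))
```

Decoded this way, the command that timed out on id 1 is a 16-block read starting at block 4015, and every disk reports sense key 6h with ASC/ASCQ 29h/02h, i.e. the unit attention raised by the bus reset itself rather than a media or hardware error on the disk.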
I already replaced cabling, tried a different hot swap (SCA)
cage, and I'm now trying to replace the disks one by one to
eventually find the culprit.
There are two problems with this approach:
1.) After each change I have to wait several hours up to two
days for a SCSI timeout to occur as I can not reproduce
the problem at will.
2.) I'm not _sure_ if those SCSI timeouts are related to the server
freeze symptoms I see. It's just an assumption.
IMHO it could work as follows: SCSI timeouts occur sometimes.
The driver then aborts the command and resets the SCSI bus
to get it into a sane state again. But what if the bus reset
doesn't work as expected and the bus remains unusable for a
while? Could this bring the whole system into this "freeze"
state (the system is still running, but everything waits for
the SCSI bus to recover)? Could this explain the symptom of
those big delays of ICMP ping answer messages I saw?
So the most precious resource for chasing this problem is time,
and this is also the resource which I don't have available as
much as I'd like to... :-(
>
>>Maybe the NMI oopser helps?
>
>
> Marcelo, where can I get hold of this and would there be documentation
> included with it for how to install/use it?
>
Look at /usr/src/linux/Documentation/nmi_watchdog.txt
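The watchdog is enabled with the nmi_watchdog= boot parameter (see that file for the value that fits your APIC setup), and a simple sanity check is whether the NMI line in /proc/interrupts keeps increasing. A small Python sketch of that check follows; the helper names and the sample text are hypothetical, and the sample only mimics the general shape of /proc/interrupts on a single-CPU box.

```python
def nmi_counts(proc_interrupts_text):
    """Return the per-CPU NMI counters found in /proc/interrupts text."""
    for line in proc_interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            # Keep only the numeric columns; some kernels append a label.
            return [int(f) for f in fields[1:] if f.isdigit()]
    return []


def watchdog_ticking(before, after):
    """True if at least one CPU's NMI counter went up between two reads."""
    return any(new > old for old, new in zip(before, after))


# Hypothetical snapshots taken a few seconds apart.
SAMPLE_T0 = """           CPU0
  0:     123456          XT-PIC  timer
NMI:        100
ERR:          0
"""
SAMPLE_T1 = SAMPLE_T0.replace("NMI:        100", "NMI:        160")

print(watchdog_ticking(nmi_counts(SAMPLE_T0), nmi_counts(SAMPLE_T1)))
```

On a real machine the two snapshots would come from reading /proc/interrupts twice with a short sleep in between; a counter that stays at zero suggests the watchdog is not active.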
Regards,
- - andreas
- --
Andreas Haumer | mailto:andreas@xss.co.at
*x Software + Systeme | http://www.xss.co.at/
Karmarschgasse 51/2/20 | Tel: +43-1-6060114-0
A-1100 Vienna, Austria | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQE+6El7xJmyeGcXPhERAqykAKCumORTm/lDofkrg52FX33rOfgC/ACeNxR7
l9/znrbi0lZoR/zw+LTdNhI=
=W7Gt
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 19+ messages in thread
Thread overview: 19+ messages
2003-06-03 17:04 Linux 2.4.21-rc7 Marcelo Tosatti
2003-06-03 18:02 ` Tomas Szepe
2003-06-03 18:07 ` Marcelo Tosatti
2003-06-03 19:15 ` lk
2003-06-03 19:40 ` Alan Cox
2003-06-03 18:30 ` Alex Romosan
2003-06-03 19:27 ` Jeff Garzik
2003-06-03 19:58 ` Alex Romosan
2003-06-03 20:14 ` Tom Rini
2003-06-04 3:35 ` David S. Miller
2003-06-04 15:09 ` Mr. James W. Laferriere
2003-06-04 23:37 ` Alex Romosan
2003-06-05 12:09 ` Andreas Haumer
2003-06-07 15:46 ` Andreas Haumer
2003-06-09 10:16 ` [2.4.21-rc7] AP1700-S5 system freeze :-(( Andreas Haumer
2003-06-09 11:46 ` Stephan von Krawczynski
2003-06-09 12:21 ` Andreas Haumer
2003-06-11 20:48 ` Linux 2.4.21-rc7 Marcelo Tosatti
[not found] ` <1055408183.2552.18.camel@tor.trudheim.com>
2003-06-12 9:35 ` Andreas Haumer