* Ultra 2 Enterprise: All 2.6.xx SMP kernels hardlock
From: Matt R Hall @ 2004-09-21  1:40 UTC
  To: sparclinux

I filed a report about this in kernel.org's bugzilla and it has not
received any comments. I would appreciate it if people could read this
and provide input. Here is a shortened version of the full report:

Distribution: Debian Unstable (Sid) 2004-09-20
Hardware Environment:
Sun Ultra Enterprise II
2x200 MHz UltraSPARC I Processor
512MB RAM
2x4.3GB 7200RPM SCSI Disc
Toshiba SCSI CD-ROM

Kernel configuration and dmesg output are pasted below for reference purposes.

Software Environment:
Standard Debian Unstable System

Problem Description:
The Sun Ultra Enterprise 2 hardlocks under 2.6.xx. It never responds to keyboard
input. It sometimes responds to pings and SSH connections for a while, but it
always locks hard eventually, with no errors. Note that this exact same
system works perfectly under 2.4.xx; I reiterate, SMP functions correctly there.
Obviously something has changed and broken functionality, which is why I
have marked this bug high severity. IMHO, a "stable" kernel series should not
take a step backward in hardware compatibility.

Disabling SMP resulted in a working system.

Steps to reproduce:
Compile with the following kernel configuration, but with the SMP option enabled.
Watch the machine eventually stop functioning.


* Re: Ultra 2 Enterprise: All 2.6.xx SMP kernels hardlock
From: David S. Miller @ 2004-09-21  6:34 UTC
  To: sparclinux

On Mon, 20 Sep 2004 18:40:39 -0700
Matt R Hall <mhcomputing@gmail.com> wrote:

> I filed a report about this in kernel.org's bugzilla and it has not
> received any comments.

Because the sparc64 kernel maintainer, me, doesn't look at
kernel.org's bugzilla ever. :-)

Can you make sure 2.6.9-rc2 has the same problem?

I don't have any of my SBUS SMP systems active in my array
of machines lately, which is why stability on them has fallen
by the wayside; they're just too damn slow to work with :(


* Re: Ultra 2 Enterprise: All 2.6.xx SMP kernels hardlock
From: Matt R Hall @ 2004-09-21  9:41 UTC
  To: sparclinux

I specifically tested _every single_ kernel version since 2.6.0. It
took me an excruciating several weeks to cycle through that many
dot releases. 2.6.9-rc2 is stable and working as a uniprocessor
kernel, but it specifically fails in SMP mode, whereas SMP works
in 2.4.xx.

I would not mind if only 2.4.xx worked, but I am building a public
access UNIX server, so I want a clear upgrade path and the newer,
better packet-matching support so I can do some sophisticated egress
filtering to prevent abuse. I need a dependable upgrade path
into the future, which is something 2.4.xx cannot offer in the long
term.

On Mon, 20 Sep 2004 23:34:10 -0700, David S. Miller <davem@davemloft.net> wrote:
> On Mon, 20 Sep 2004 18:40:39 -0700
> Matt R Hall <mhcomputing@gmail.com> wrote:
> 
> > I filed a report about this in kernel.org's bugzilla and it has not
> > received any comments.
> 
> Because the sparc64 kernel maintainer, me, doesn't look at
> kernel.org's bugzilla ever. :-)
> 
> Can you make sure 2.6.9-rc2 has the same problem?
> 
> I don't have any of my SBUS SMP systems active in my array
> of machines lately, which is why stability on them has fallen
> by the wayside; they're just too damn slow to work with :(
>


* Re: Ultra 2 Enterprise: All 2.6.xx SMP kernels hardlock
From: Jason Wever @ 2004-09-21 12:01 UTC
  To: sparclinux

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mon, 20 Sep 2004, David S. Miller wrote:

> On Mon, 20 Sep 2004 18:40:39 -0700
> Matt R Hall <mhcomputing@gmail.com> wrote:
>
>> I filed a report about this in kernel.org's bugzilla and it has not
>> received any comments.
>
> Because the sparc64 kernel maintainer, me, doesn't look at
> kernel.org's bugzilla ever. :-)
>
> Can you make sure 2.6.9-rc2 has the same problem?

I haven't tried anything > 2.6.8.1 yet, but at last look, anything > 2.6.6
on a Blade 100 would lock up under high disk activity, possibly in
combination with high CPU load (in my cases, both were typically high at
the time of lockup).

I've definitely seen this on 2.6 SMP on the Ultra 2 as well.  I'll try to 
test 2.6.9-rc2 on both machines tonight and see if I can still replicate.

- -- 
Jason Wever
Gentoo/Sparc Co-Team Lead
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFBUBgxdKvgdVioq28RAjndAJ4gJ3Pol6tweg0+D5LjQWKNOHL0rACfctWX
c7lnf22iSeGb7qYM2gJ9Eic=6ydb
-----END PGP SIGNATURE-----


* Re: Ultra 2 Enterprise: All 2.6.xx SMP kernels hardlock
From: Meelis Roos @ 2004-09-21 13:00 UTC
  To: sparclinux

MRH> Sun Ultra Enterprise 2 was hardlocking in 2.6.xx. Never responded to keyboard
MRH> input. Sometimes would respond to pings and SSH connections for a while, but
MRH> would always lock hard eventually with no errors. Note that this exact same
MRH> system works perfectly in 2.4.xx. I reiterate, SMP functions correctly there.
MRH> Obviously something has changed and caused functionality to break, hence why I
MRH> have marked this bug high severity. IMHO, "stable" kernel series should not
MRH> result in a step back in hardware compatibility.
MRH> 
MRH> Disabling SMP resulted in a working system.

Exactly the same here. Dual 168 MHz UE2, with CG6 card. Network activity
hangs the machine after a while (for example, before I get a prompt from
ssh, or while doing a test ls -alR over ssh). The keyboard produces no input,
even when configured as a kernel keymap. Additionally, the serial console
behaves badly: kernel messages come through fine, but user-level messages
are very slow and are only output character by character when there is SCSI
disk activity, and all serial console output ceases when the disk goes idle.
I tried loading and unloading the CD tray, hoping that it would cause
SCSI interrupts so I would see some more output, but it didn't seem to work
(maybe a media load does not cause a unit attention; I'm not sure).

I got a working U2 config from Dustin Marquess; it still hangs my
machine.

Additionally, with the recent 2.6.9-rc* BK snapshots, the display stays
black and no framebuffer console is seen. The network works until it hangs.

Here is the full dmesg (using console=ttyS0); the lines beginning with udev
are the slow, character-at-a-time ones.

PROMLIB: Sun IEEE Boot Prom 3.25.0 1999/12/03 11:35
Linux version 2.6.9-rc2 (mroos@kookos) (gcc version 3.3.4 (Debian 1:3.3.4-6sarg4
ARCH: SUN4U
Ethernet address: 08:00:20:86:56:4a
Built 1 zonelists
Kernel command line: root=/dev/sda1 ro console=ttyS0
PID hash table entries: 4096 (order: 12, 131072 bytes)
Console: colour dummy device 80x25
Dentry cache hash table entries: 131072 (order: 7, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 6, 524288 bytes)
Memory: 1035840k available (1768k kernel code, 536k data, 112k init) [fffff8000]
Mount-cache hash table entries: 512 (order: 0, 8192 bytes)
CPU 1: synchronized TICK with master CPU (last diff 0 cycles,maxerr 384 cycles)
Brought up 2 CPUs
Total of 2 processors activated (665.60 BogoMIPS).
SMP: Calibrating ecache flush... Using heuristic of 186099 cycles, 1 ticks.
NET: Registered protocol family 16
SCSI subsystem initialized
SYSIO: UPA portID 1f, at 000001fe00000000
sbus0: Clock 25.0 MHz
dma0: HME DVMA gate array
Console: switching to colour frame buffer device 144x56
cg6: CGsix [TGX+ sparc] at 1ff:10000000
Software Watchdog Timer: 0.07 initialized. soft_noboot=0 soft_margin=60 sec (no)
zs2 at 0x000001fff1000004 (irq = 12,7e8) is a SunZilog
zs3 at 0x000001fff1000000 (irq = 12,7e8) is a SunZilog
ttyS0 at MMIO 0x0 (irq = 6777600) is a SunZilog
ttyS1 at MMIO 0x0 (irq = 6777600) is a SunZilog
Console: ttyS0 (SunZilog zs0)
sunhme.c:v2.02 24/Aug/2003 David S. Miller (davem@redhat.com)
eth0: HAPPY MEAL (SBUS) 10/100baseT Ethernet 08:00:20:86:56:4a
esp0: IRQ 4,7e0 SCSI ID 7 Clk 40MHz CCYC=25000 CCF=8 TOut 167
NCR53C9XF(espfast)
ESP: Total of 1 ESP hosts found, 1 actually in use.
scsi0 : Sparc ESP366-HME
Using anticipatory io scheduler
  Vendor: IBM-PSG   Model: DNES-309170Y  !#  Rev: SAHR
  Type:   Direct-Access                      ANSI SCSI revision: 03
esp0: target 1 [period 100ns offset 15 20.00MHz FAST-WIDE SCSI-II]
  Vendor: TOSHIBA   Model: XM-5401TASUN4XCD  Rev: 1036
  Type:   CD-ROM                             ANSI SCSI revision: 02
SCSI device sda: 17774160 512-byte hdwr sectors (9100 MB)
SCSI device sda: drive cache: write through
 sda: sda1 sda2 sda3
Attached scsi disk sda at scsi0, channel 0, id 1, lun 0
esp0: target 6 asynchronous
sr0: scsi-1 drive
Uniform CD-ROM driver Revision: 3.20
mice: PS/2 mouse device common for all mice
NET: Registered protocol family 2
IP: routing cache hash table of 8192 buckets, 128Kbytes
TCP: Hash tables configured (established 65536 bind 65536)
NET: Registered protocol family 1
NET: Registered protocol family 17
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
udev Adding 399328k swap on /dev/sda2.  Priority:-1 extents:1
requiEXT3 FS on sda1, internal journal
res hotplug support, not started.


* Re: Ultra 2 Enterprise: All 2.6.xx SMP kernels hardlock
From: Stephen P. Becker @ 2004-09-21 13:24 UTC
  To: sparclinux

For what it's worth, I used to have 2x200 MHz UltraSPARC I processors in my
Ultra 2. I could not get it to boot via nfsroot using a 2.6.7 kernel
(HME started resetting itself over and over just as it mounted the
nfsroot) unless I disabled SMP. I didn't have any such problems once I
got my 2x300 MHz UltraSPARC II processors.

Steve

Matt R Hall wrote:
> I filed a report about this in kernel.org's bugzilla and it has not
> received any comments. I would appreciate it if people could read this
> and provide input. Here is a shortened version of the full report:
> 
> Distribution: Debian Unstable (Sid) 2004-09-20
> Hardware Environment:
> Sun Ultra Enterprise II
> 2x200 MHz UltraSPARC I Processor
> 512MB RAM
> 2x4.3GB 7200RPM SCSI Disc
> Toshiba SCSI CD-ROM
> 
> Kernel configuration and dmesg output are pasted below for reference purposes.
> 
> Software Environment:
> Standard Debian Unstable System
> 
> Problem Description:
> Sun Ultra Enterprise 2 was hardlocking in 2.6.xx. Never responded to keyboard
> input. Sometimes would respond to pings and SSH connections for a while, but
> would always lock hard eventually with no errors. Note that this exact same
> system works perfectly in 2.4.xx. I reiterate, SMP functions correctly there.
> Obviously something has changed and caused functionality to break, hence why I
> have marked this bug high severity. IMHO, "stable" kernel series should not
> result in a step back in hardware compatibility.
> 
> Disabling SMP resulted in a working system.
> 
> Steps to reproduce:
> Compile with the following kernel configuration, except add the SMP option.
> Witness the machine fail to continue functioning.



* Re: Ultra 2 Enterprise: All 2.6.xx SMP kernels hardlock
From: Jason Wever @ 2004-09-22  3:08 UTC
  To: sparclinux

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, 21 Sep 2004, Jason Wever wrote:

> I've definitely seen this on 2.6 SMP on the Ultra 2 as well.  I'll try to
> test 2.6.9-rc2 on both machines tonight and see if I can still replicate.

OK, the Ultra 2 is up and running so far. I've got it generating some good
loads and will keep beating up on it (sometimes it can take a day or so
for me to see this).

However, 2.6.9-rc2 on the Blade 100 stops at the point where it would normally
switch from the "Loading Linux..." prompt to the framebuffer. I've tried
letting it run, but it doesn't seem to go anywhere, as I never get any
network connectivity. Is there a way I can generate some
semi-useful output here to help troubleshoot what is going on?

Thanks.
- -- 
Jason Wever
Gentoo/Sparc Co-Team Lead
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFBUOyXdKvgdVioq28RAqMxAJ0UIQ9sLP3BjBmlbRhyoGQlh3c4XgCfbzRL
wiTmpJNkObURQ8pZiwR3gww=mWCR
-----END PGP SIGNATURE-----


* Re: Ultra 2 Enterprise: All 2.6.xx SMP kernels hardlock
From: David S. Miller @ 2004-09-22  3:20 UTC
  To: sparclinux

On Tue, 21 Sep 2004 21:08:05 -0600 (MDT)
Jason Wever <weeve@gentoo.org> wrote:

> However 2.6.9-rc2 on the Blade 100 stops at the point it would normally 
> switch from the "Loading Linux..." prompt to the framebuffer.  I've tried 
> letting it go but it doesn't seem to go anywhere as I never get any 
> network connectivity.  Is there a way I can try to generate some 
> semi-useful output here to help troubleshoot what is going on?

I see this too. Try two things (which I haven't gotten around
to trying yet):

1) boot with the "-p" option; this will log early
   console messages
2) also try "console=prom", and make sure CONFIG_PROM_CONSOLE
   is enabled as well


* Re: Ultra 2 Enterprise: All 2.6.xx SMP kernels hardlock
From: Jason Wever @ 2004-09-23  0:26 UTC
  To: sparclinux

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, 21 Sep 2004, David S. Miller wrote:

> I see this too, try two things (which I haven't gotten around
> to trying yet):
>
> 1) boot with the "-p" option, this will log early
>   console messages

This does in fact reveal the problem, which is a kernel panic or two. The text
is as follows:

Allocated 8 Megs of memory at 0x40000000 for kernel
Loaded kernel version 2.6.9

PROMLIB: Sun IEEE Boot Prom 4.13.4 2004/05/19 18:35
Linux version 2.6.9-rc2 (root@excelsior.weeve.org) (gcc version 3.3.4 20040623 (Gentoo Linux 3.3.4)) #7 Tue Sep 21 20:53:14 MDT 2004
ARCH: SUN4U
Ethernet address: 00:03:ba:0e:60:2e
Remapping the kernel... done.
Booting Linux...
Built 1 zonelists
Kernel command line: root=/dev/hda4 ro -p
PID hash table entries: 4096 (order: 12, 131072 bytes)
Unable to handle kernel NULL pointer dereference
tsk->{mm,active_mm}->context = 0000000000000000
tsk->{mm,active_mm}->pgd = fffff8007f00ec00
               \|/ ____ \|/
               "@'/ .. \`@"
               /_| \__/ |_\
                  \__U_/
swapper(0): Oops [#1]
TSTATE: 0000009980f09602 TPC: 00000000004507ac TNPC: 00000000004507b0 Y: 00000000    Not tainted
TPC: <time_interpolator_update+0xc/0x260>
g0: 0000000000000000 g1: 00000000000f41a8 g2: 0000000000000000 g3: 00000000000f41a8
g4: 000000000062a140 g5: 0000000000000000 g6: 0000000000626140 g7: 0000000000000000
o0: 00000000246f0d06 o1: 000000000044bc20 o2: 0000000000000000 o3: 0000000000020000
o4: 0000000000000080 o5: 0000000000000000 sp: 0000000000629041 ret_pc: 000000000041ca64
RPC: <hbtick_get_tick+0x4/0x20>
l0: 0000000000001000 l1: 0000000000000018 l2: 00000000006fd000 l3: 0000000000000004
l4: 00000000006fe3b0 l5: ffffffffffffffff l6: 0000000000704800 l7: 00000000003a5400
i0: 00000000000f41a8 i1: 000000000044bcc0 i2: 0000000000000000 i3: 000000000000000b
i4: 00000000006c2200 i5: 00000000006fd000 i6: 0000000000629101 i7: 000000000044f988
I7: <update_wall_time_one_tick+0xa8/0x120>
Instruction DUMP: 9de3bf40  25001bf4  c45ca358 <c208a003> 80a06000  0248006f  e0108000  c25ca358  80a42002
Kernel panic - not syncing: Aiee, killing interrupt handler!
  <0>Press L1-A to return to the boot prom
Unable to handle kernel NULL pointer dereference
tsk->{mm,active_mm}->context = 0000000000000000
tsk->{mm,active_mm}->pgd = fffff8007f00ec00
               \|/ ____ \|/
               "@'/ .. \`@"
               /_| \__/ |_\
                  \__U_/
swapper(0): Oops [#2]
TSTATE: 0000009980f09600 TPC: 00000000004507ac TNPC: 00000000004507b0 Y: 00000000    Not tainted
TPC: <time_interpolator_update+0xc/0x260>
g0: 8000000000000000 g1: 00000000001e8350 g2: 0000000000000000 g3: 00000000000f41a8
g4: 000000000062a140 g5: 0000000000000000 g6: 0000000000626140 g7: 0000000000000000
o0: 0000000024f67d21 o1: 00000000006f6861 o2: 00000000006f6800 o3: 0000000000000020
o4: 000000000044f988 o5: 00000000006f6818 sp: 0000000000628411 ret_pc: 000000000041ca64
RPC: <hbtick_get_tick+0x4/0x20>
l0: 0000000000000006 l1: 000000000062a140 l2: 00000000006fd000 l3: fffffffffffffff0
l4: 0000000000000004 l5: 0000000000000004 l6: 00000000fff59701 l7: 00000000f0055310
i0: 00000000000f41a8 i1: 0000000000000001 i2: 0000000000629420 i3: 00000000005eeb00
i4: 0000000000000001 i5: 00000000006fd000 i6: 00000000006284d1 i7: 000000000044f988
I7: <update_wall_time_one_tick+0xa8/0x120>
Instruction DUMP: 9de3bf40  25001bf4  c45ca358 <c208a003> 80a06000  0248006f  e0108000  c25ca358  80a42002
Kernel panic - not syncing: Aiee, killing interrupt handler!
  <0>Press L1-A to return to the boot prom


> 2) try also "console=prom" and make sure CONFIG_PROM_CONSOLE

This had the same result as a regular boot, where it just appeared to stop
with no useful information.

- -- 
Jason Wever
Gentoo/Sparc Co-Team Lead
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFBUhgpdKvgdVioq28RAihPAJ0V+nnJuIiJhH9bVF4sR583kgQPIgCeJHVe
ReMIYbhoPY9fRLXyjxvreME=EKxN
-----END PGP SIGNATURE-----


* Re: Ultra 2 Enterprise: All 2.6.xx SMP kernels hardlock
From: David S. Miller @ 2004-09-23  4:46 UTC
  To: sparclinux

On Wed, 22 Sep 2004 18:26:14 -0600 (MDT)
Jason Wever <weeve@gentoo.org> wrote:

> tsk->{mm,active_mm}->context = 0000000000000000
> tsk->{mm,active_mm}->pgd = fffff8007f00ec00
>                \|/ ____ \|/
>                "@'/ .. \`@"
>                /_| \__/ |_\
>                   \__U_/
> swapper(0): Oops [#1]
> TSTATE: 0000009980f09602 TPC: 00000000004507ac TNPC: 00000000004507b0 Y: 
> 00000000    Not tainted
> TPC: <time_interpolator_update+0xc/0x260>

We're getting timer interrupts firing before the
register_time_interpolator() call in
arch/sparc64/kernel/time.c:time_init(), so the tick path
(update_wall_time_one_tick() -> time_interpolator_update())
dereferences a NULL time_interpolator pointer
and everything goes tits up.

I'll try to fix this tomorrow.
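A minimal userspace sketch of the ordering problem described above. It assumes nothing about the real kernel structures: `struct interpolator`, `register_interpolator()`, `timer_tick()`, `start_timer()`, `init_buggy()` and `init_fixed()` are hypothetical stand-ins. Instead of oopsing, the simulated tick returns -1 where the kernel would have dereferenced the NULL interpolator pointer, and it assumes a tick is already pending and fires the moment the timer is started.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's time interpolator. */
struct interpolator {
	unsigned long frequency;
	unsigned long ticks_seen;
};

/* NULL until registered, like the pointer the tick handler dereferenced. */
static struct interpolator *registered_interpolator;

static void register_interpolator(struct interpolator *ip)
{
	assert(ip != NULL);
	registered_interpolator = ip;
}

/* Simulated tick handler. Returns -1 where the real handler would
 * have dereferenced a NULL pointer, 0 otherwise. */
static int timer_tick(void)
{
	if (registered_interpolator == NULL)
		return -1;
	registered_interpolator->ticks_seen++;
	return 0;
}

/* Assumption: a tick is already pending and fires as soon as the
 * timer interrupt is enabled. */
static int start_timer(void)
{
	return timer_tick();
}

/* Old order: start the timer, then register the interpolator. */
static int init_buggy(struct interpolator *ip)
{
	int rc = start_timer();
	register_interpolator(ip);
	return rc;
}

/* Patched order: register first, then start the timer. */
static int init_fixed(struct interpolator *ip)
{
	register_interpolator(ip);
	return start_timer();
}
```

Starting from a fresh (NULL) state, init_buggy() returns -1 and init_fixed() returns 0. The general rule it illustrates is that anything an interrupt handler dereferences must be set up before the interrupt source is enabled.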


* Re: Ultra 2 Enterprise: All 2.6.xx SMP kernels hardlock
From: David S. Miller @ 2004-09-24 20:44 UTC
  To: sparclinux


Jason, see if this patch gets your SB100 working again.
Thanks.

===== arch/sparc64/kernel/time.c 1.60 vs edited =====
--- 1.60/arch/sparc64/kernel/time.c	2004-09-10 17:51:01 -07:00
+++ edited/arch/sparc64/kernel/time.c	2004-09-24 12:41:11 -07:00
@@ -910,10 +910,10 @@
 }
 
 /* This is gets the master TICK_INT timer going. */
-static unsigned long sparc64_init_timers(irqreturn_t (*cfunc)(int, void *, struct pt_regs *))
+static unsigned long sparc64_init_timers(void)
 {
-	unsigned long pstate, clock;
-	int node, err;
+	unsigned long clock;
+	int node;
 #ifdef CONFIG_SMP
 	extern void smp_tick_init(void);
 #endif
@@ -946,6 +946,14 @@
 	smp_tick_init();
 #endif
 
+	return clock;
+}
+
+static void sparc64_start_timers(irqreturn_t (*cfunc)(int, void *, struct pt_regs *))
+{
+	unsigned long pstate;
+	int err;
+
 	/* Register IRQ handler. */
 	err = request_irq(build_irq(0, 0, 0UL, 0UL), cfunc, SA_STATIC_ALLOC,
 			  "timer", NULL);
@@ -971,8 +979,6 @@
 			     : "r" (pstate));
 
 	local_irq_enable();
-
-	return clock;
 }
 
 struct freq_table {
@@ -1036,10 +1042,15 @@
 #define SPARC64_NSEC_PER_CYC_SHIFT	30UL
 void __init time_init(void)
 {
-	unsigned long clock = sparc64_init_timers(timer_interrupt);
+	unsigned long clock = sparc64_init_timers();
 
 	sparc64_cpu_interpolator.frequency = clock;
 	register_time_interpolator(&sparc64_cpu_interpolator);
+
+	/* Now that the interpolator is registered, it is
+	 * safe to start the timer ticking.
+	 */
+	sparc64_start_timers(timer_interrupt);
 
 	timer_ticks_per_nsec_quotient =
 		(((NSEC_PER_SEC << SPARC64_NSEC_PER_CYC_SHIFT) +
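The last context lines of the hunk begin computing `timer_ticks_per_nsec_quotient` from `NSEC_PER_SEC << SPARC64_NSEC_PER_CYC_SHIFT`, a fixed-point factor for turning tick counts into nanoseconds, but the expression is cut off here. The sketch below assumes a round-to-nearest form, quotient = ((NSEC_PER_SEC << 30) + clock / 2) / clock, and a conversion ns = (ticks * quotient) >> 30. The function names are hypothetical and the rounding term is an assumption, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define NSEC_PER_CYC_SHIFT 30 /* same value as SPARC64_NSEC_PER_CYC_SHIFT */

/* Nanoseconds per tick, scaled by 2^30 and rounded to nearest
 * (the clock_hz / 2 rounding term is an assumption). */
static uint64_t ns_per_tick_scaled(uint64_t clock_hz)
{
	assert(clock_hz > 0);
	return ((NSEC_PER_SEC << NSEC_PER_CYC_SHIFT) + clock_hz / 2) / clock_hz;
}

/* Convert a raw tick count to nanoseconds. The 64-bit product
 * overflows once ticks * quotient exceeds 2^64. */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t quotient)
{
	return (ticks * quotient) >> NSEC_PER_CYC_SHIFT;
}
```

With a 200 MHz clock the quotient is 5 << 30, so 200,000,000 ticks map to exactly 1,000,000,000 ns. The power-of-two shift turns each conversion into a multiply and a shift instead of a division; the trade-off in this naive form is that the product overflows after roughly 3.4e9 ticks, about 17 seconds at 200 MHz.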


* Re: Ultra 2 Enterprise: All 2.6.xx SMP kernels hardlock
From: Jason Wever @ 2004-09-24 22:56 UTC
  To: sparclinux


On Fri, 24 Sep 2004 13:44:53 -0700
"David S. Miller" <davem@davemloft.net> wrote:

> Jason, see if this patch gets your SB100 working again.
> Thanks.

This definitely did the trick here.  The system now boots normally.  Off to
stress-test the disk I/O issue.

Thanks!

-- 
Jason Wever
Gentoo/Sparc Team Co-Lead



* Re: Ultra 2 Enterprise: All 2.6.xx SMP kernels hardlock
From: Jason Wever @ 2004-09-27 12:03 UTC
  To: sparclinux

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, 21 Sep 2004, Jason Wever wrote:

> I've definitely seen this on 2.6 SMP on the Ultra 2 as well.  I'll try to
> test 2.6.9-rc2 on both machines tonight and see if I can still replicate.


After beating up an Ultra 2 for a while by having it repeatedly unpack a
tarball of several thousand small files (via a never-ending while loop), as
well as build kdepim-3.3.0, here is what I get:

kernel BUG at arch/sparc64/mm/fault.c:336!
TRAPLOG: Error at trap level 0x2, dumping track stack.
TRAPLOG: Trap level 1 TSTATE[0000000080009602] TPC[000000000042220c] TNPC[00000]
TRAPLOG: Trap level 2 TSTATE[0000004480049402] TPC[0000000000408d2c] TNPC[00000]
TRAPLOG: Trap level 3 TSTATE[0000009911049403] TPC[0000000000408d20] TNPC[00000]
TRAPLOG: Trap level 4 TSTATE[0000009911049403] TPC[0000000000408d20] TNPC[00000]
               \|/ ____ \|/
               "@'/ .. \`@"
               /_| \__/ |_\
                  \__U_/
swapper(0): TL1: Data Access Exception [#1]
TSTATE: 0000004480049402 TPC: 0000000000408d2c TNPC: 00000000004520e0 Y: 000000d
TPC: <from_tl1_trap+0x20/0x34>
g0: fffff80001700120 g1: 0000000007ffffff g2: 00000000000ffc00 g3: 0000000000000
g4: 0000000000629280 g5: 000000000000007f g6: 0000000000625280 g7: 0000000000004
o0: fffff8007fe3b400 o1: 00000000fff00000 o2: 0007fffffff80080 o3: 000007ffffe0f
o4: 0000000000000007 o5: fffff80080400000 sp: 0000000000628181 ret_pc: 000000000
RPC: <sbus_unmap_sg+0xc0/0x100>
l0: 0007fffffff80080 l1: 00000000fff00000 l2: 00000000006a00c0 l3: 0000000000630
l4: 00000000006c7400 l5: 00000000006c7400 l6: 000000000069ac00 l7: 0000000000630
i0: fffff8007fe3b400 i1: 000000000000000f i2: 0000000000000080 i3: 0000000000001
i4: 0000000000686380 i5: 00000000006bb800 i6: 0000000000628241 i7: 0000000000560
I7: <esp_done+0x10/0x60>
Instruction DUMP: cad8d026  0ec14007  01000000 <caf00b80> 83f00000  06f93ffe  8
TSTATE: 0000000011f09602 TPC: 000000000050e9a0 TNPC: 000000000050e9a4 Y: 000000d
Kernel panic - not syncing: Aiee, killing interrupt handler!
  <0>Press L1-A to return to the boot prom
TPC: <__delay+0x20/0x40>
g0: fffff8007fe43091 g1: 0000000000548a40 g2: fffff8000000e020 g3: 0000000000040
g4: 00000000006a6c00 g5: 00000005e0b7d000 g6: fffff8007fe44000 g7: fffff7ffff960
o0: 000000000000031c o1: 00000000006bb000 o2: 00000000004162d8 o3: 00000000006c0
o4: 00000000006a1740 o5: 0000000000ffff00 sp: fffff8007fe430e1 ret_pc: 000000000
RPC: <sunzilog_put_char+0x20/0x60>
l0: 0000000000000798 l1: 00000000014f8c00 l2: 0000000000408cc4 l3: 000000000000b
l4: 0000000000000002 l5: 0000000000000000 l6: fffff8007fe44000 l7: 0000000000008
i0: 000001fff1100004 i1: 000000000000000a i2: 00000000000081a4 i3: 00000000005e0
i4: fffff800017f5000 i5: 0000000000000000 i6: fffff8007fe431a1 i7: 000000000054c
I7: <sunzilog_console_write+0x5c/0xa0>
Unable to handle kernel NULL pointer dereference
tsk->{mm,active_mm}->context = 00000000000007a1
tsk->{mm,active_mm}->pgd = fffff80045320000
               \|/ ____ \|/
               "@'/ .. \`@"
               /_| \__/ |_\
                  \__U_/
swapper(0): Oops [#2]
TSTATE: 0000009980f09604 TPC: 0000000000422c14 TNPC: 0000000000422c18 Y: 000000d
TPC: <sbus_dma_sync_single_for_cpu+0x14/0x80>
g0: 00000000005e9a8c g1: 0000000000000000 g2: 00000000c238d800 g3: 000000000000f
g4: 0000000000629280 g5: fffff8007fd88b12 g6: 0000000000625280 g7: 0000000000500
o0: 0000000000000000 o1: 0000000000000004 o2: 0000000000000020 o3: 0000000000000
o4: 000000000050c3f0 o5: 0000000000000000 sp: 00000000006277f9 ret_pc: 000000008
RPC: <alloc_skb+0x48/0xc0>
l0: ffffffffffffffff l1: 00000000000000c0 l2: 00000000006d5800 l3: 0000000000000
l4: 00000000006a17b8 l5: 00000000006a17b8 l6: 00000000006a6018 l7: 0000000000620
i0: fffff8007fcee000 i1: 00000000c238d800 i2: 000000000000008d i3: 0000000000002
i4: 0000000000000000 i5: 0000000000000000 i6: 00000000006278b9 i7: 0000000000550
I7: <happy_meal_rx+0x310/0x380>
Instruction DUMP: 85366000  a0103fff  87343033 <f0584000> a12c300d  8200801a  8
TSTATE: 0000009911f09602 TPC: 0000000000548a48 TNPC: 0000000000548a4c Y: 000000d
Kernel panic - not syncing: Aiee, killing interrupt handler!
  <0>Press L1-A to return to the boot prom
TPC: <sunzilog_put_char+0x28/0x60>
g0: fffff8007fe43091 g1: 0000000000548a40 g2: fffff8000000e020 g3: 0000000000040
g4: 00000000006a6c00 g5: 00000005e0b7d000 g6: fffff8007fe44000 g7: fffff7ffff960
o0: ffffffffffffffff o1: 00000000006bb000 o2: 00000000004162d8 o3: 00000000006c0
o4: 00000000006a1740 o5: 0000000000ffff00 sp: fffff8007fe430e1 ret_pc: 000000000
RPC: <sunzilog_put_char+0x20/0x60>
l0: 000000000000079f l1: 00000000014f8c00 l2: 0000000000408cc4 l3: 000000000000b
l4: 0000000000000002 l5: 0000000000000000 l6: fffff8007fe44000 l7: 0000000000008
i0: 000001fff1100004 i1: 0000000000000030 i2: 00000000000081a4 i3: 00000000005e0
i4: fffff800017f5000 i5: 0000000000000000 i6: fffff8007fe431a1 i7: 000000000054c
I7: <sunzilog_console_write+0x5c/0xa0>
               \|/ ____ \|/
               "@'/ .. \`@"
               /_| \__/ |_\
                  \__U_/
(0): Kernel bad sw trap 5 [#3]
TSTATE: 0000004411f09603 TPC: 0000000000436084 TNPC: 0000000000436088 Y: 000000d
TPC: <do_sparc64_fault+0x244/0x480>
g0: 0000000000000002 g1: 0000000000630120 g2: 0000000000630120 g3: 0000000000000
g4: 00000000006a6c00 g5: 0000000000000000 g6: fffff8007fe44000 g7: 0000000000000
o0: 000000000000002b o1: 00000000005f2a08 o2: 0000000000000150 o3: fffff8007a984
o4: 0000000000000000 o5: 0000000000000074 sp: fffff8007fe43621 ret_pc: 00000000c
RPC: <do_sparc64_fault+0x23c/0x480>
l0: 00000000000000ff l1: 0000000000000096 l2: 00000000000000ff l3: fffff800017f0
l4: 0000000000000000 l5: 0000000000000000 l6: 0000000000000000 l7: 0000000000031
i0: fffff8007fe43fc0 i1: 00000000006bb000 i2: 00000000004162d8 i3: 00000000006c0
i4: 00000000006a1740 i5: 0000000000ffff00 i6: fffff8007fe43701 i7: 000000000040c
I7: <sparc64_realfault_common+0x14/0x24>
Instruction DUMP: 110017ca  7fff87e9  90122208 <91d02005> 968c2001  02480054  8
TSTATE: 0000001111f09600 TPC: 0000000000424208 TNPC: 000000000042420c Y: 000000d
TPC: <spitfire_xcall_helper+0xc8/0xe0>
g0: 0000000000000000 g1: 0000000000000020 g2: 0000000000000050 g3: 0000000000000
g4: 00000000006a6c00 g5: 000000000001869f g6: fffff8007fe44000 g7: 0000000000003
o0: 0000000000630118 o1: 0000000000000001 o2: 0000000000002a5e o3: 000000000000e
o4: ffffffffffffaa5e o5: 0000000000008000 sp: fffff8007fe43091 ret_pc: 000000008
RPC: <vprintk+0x158/0x1e0>
l0: 0000000000000000 l1: 00000000006bbef9 l2: 000000000000000f l3: 00000000006b0
l4: 0000000000000002 l5: 0000000000000000 l6: fffff8007fe44000 l7: fffff8007a890
i0: 0000000000435430 i1: 0000000000000000 i2: 0000000000000000 i3: 0000000000006
i4: 0000000000004070 i5: 00000000006bbea8 i6: fffff8007fe43151 i7: 0000000000420
I7: <smp_cross_call_masked+0x90/0x100>
Kernel panic - not syncing: Attempted to kill the idle task!
  <0>Press L1-A to return to the boot prom


- -- 
Jason Wever
Gentoo/Sparc Co-Team Lead
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFBWAGjdKvgdVioq28RArW0AJ9SPZOJjuE8IBiX34Feqr/5B04VVACcCp+B
Vj4lh7uQA7FoFUJSeOE5jiA=3NV4
-----END PGP SIGNATURE-----


* Re: Ultra 2 Enterprise: All 2.6.xx SMP kernels hardlock
From: David S. Miller @ 2004-09-27 18:17 UTC
  To: sparclinux

On Mon, 27 Sep 2004 06:03:45 -0600 (MDT)
Jason Wever <weeve@gentoo.org> wrote:

> After beating up an Ultra 2 for a while by having it repeatedly unpack a 
> tarball of several thousand small files (via a never-ending while loop) as 
> well as building kdepim-3.3.0, here is what i get;

I have to warn you now that I'll not have time to look into this
kind of report until about late October.  All of my current
effort needs to be concentrated on networking and general
sparc64 port maintenance, sorry :(
