linux-kernel.vger.kernel.org archive mirror
* 2.4.20 CPU lockup - Now with OOPS message
From: Daniel Khan @ 2003-01-23 23:30 UTC
  To: linux-kernel

Hello List,

I reported frequent system lockups earlier today.
After some experimenting (I don't know anything about kernel
debugging; thanks to Mark Hahn for the tips),
I found a way to reproduce the lockup and capture the OOPS.
Dan Kegel told me after my last post that this list can only support
kernels built from the kernel.org sources. I am now using the
2.4.20-2.25smp kernel from RawHide; I have not built a kernel from
kernel.org.
O.K., here's the deal:
The OOPS below looks like Spanish to me, but to the hackers it may
be very clear.
So if you think this is a general kernel issue, please help. Otherwise I'll
report it to RedHat immediately.

Scenario:
2.4.20-2.25smp from RawHide

Running rsync from the crashing host _to_ another host over a 1000 Mbit 3com
(TG3).
The rsynced files include larger files of about 1.5 GB.
Heartbeat is running.

The OOPS is below.
Please CC dk@webcluster.at if you want to help.

Thanks a lot

Daniel Khan

<------------------------CUT---------------------------->
NMI Watchdog detected LOCKUP on CPU0, eip c02499ac, registers:
via686a eeprom lm80 i2c-proc i2c-isa i2c-viapro i2c-core tg3 eepro100 mii
ipt_LOG ipt_limit ipt_state ipt_REJECT iptable_nat ip_cona
CPU:    0
EIP:    0060:[<c02499ac>]    Not tainted
EFLAGS: 00000086

EIP is at .text.lock.tcp_ipv4 [kernel] 0x182 (2.4.20-2.25smp)
eax: 00000001   ebx: d400010a   ecx: 00000000   edx: f78837d8
esi: f6f22ae0   edi: c3d3ad40   ebp: f74939f4   esp: f1335d8c
ds: 0068   es: 0068   ss: 0068
Process rsync (pid: 3151, stackpage=f1335000)
Stack: c3d3ad40 f3121f38 00000001 f1335e28 00000000 03ff0202 00000004 000003ff
       00000000 00000006 c3d3ad40 f74939e0 c022d67e c3d3ad40 f1335e28 c3d5a000
       00000000 00000006 00000000 00000001 00000000 c022d530 c021ce67 c3d3ad40
Call Trace:   [<c022d67e>] ip_local_deliver_finish [kernel] 0x14e (0xf1335dbc))
[<c022d530>] ip_local_deliver_finish [kernel] 0x0 (0xf1335de0))
[<c021ce67>] nf_hook_slow [kernel] 0x107 (0xf1335de4))
[<c022d530>] ip_local_deliver_finish [kernel] 0x0 (0xf1335e00))
[<c022d2b3>] ip_local_deliver [kernel] 0x53 (0xf1335e1c))
[<c022d530>] ip_local_deliver_finish [kernel] 0x0 (0xf1335e34))
[<c022d8b9>] ip_rcv_finish [kernel] 0x219 (0xf1335e38))
[<c022d6a0>] ip_rcv_finish [kernel] 0x0 (0xf1335e5c))
[<c022d6a0>] ip_rcv_finish [kernel] 0x0 (0xf1335e6c))
[<c021ce67>] nf_hook_slow [kernel] 0x107 (0xf1335e70))
[<c022d6a0>] ip_rcv_finish [kernel] 0x0 (0xf1335e8c))
[<c022d480>] ip_rcv [kernel] 0x1a0 (0xf1335ea8))
[<c022d6a0>] ip_rcv_finish [kernel] 0x0 (0xf1335ec0))
[<c021566e>] netif_receive_skb [kernel] 0x14e (0xf1335ed8))
[<f89d2c7c>] tg3_rx [tg3] 0x27c (0xf1335ef8))
[<f89d2e71>] tg3_poll [tg3] 0x81 (0xf1335f38))
[<c0215917>] net_rx_action [kernel] 0xa7 (0xf1335f58))
[<c01289f9>] do_softirq [kernel] 0xd9 (0xf1335f80))
[<c010b81b>] do_IRQ [kernel] 0xfb (0xf1335f9c))
[<c010e7c8>] call_do_IRQ [kernel] 0x5 (0xf1335fc0))


Code: 7e f8 e9 68 e5 ff ff e8 2c ed eb ff e9 c3 ee ff ff e8 22 ed
console shuts up ...
 NMI Watchdog detected LOCKUP on CPU1, eip f89d9f3b, registers:
<------------------------CUT---------------------------->
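
For readers skimming the trace above, here is a minimal sketch of a filter that lists the symbols in a 2.4-style call trace. It is not from the thread: the function name trace_symbols is hypothetical, and it assumes each entry has the form "[<address>] symbol [module] offset". Entries at offset 0x0 are skipped on the assumption that they are function pointers left on the stack (for example netfilter callbacks) rather than return addresses.

```shell
#!/bin/sh
# trace_symbols: hypothetical helper, reads an oops on stdin.
# Assumed entry format: [<address>] symbol [module] offset (stack-slot))
# Prints one symbol per line, innermost frame first, and drops entries
# whose offset is 0x0 (assumed to be stray function pointers).
trace_symbols() {
    sed -n 's/^.*\[<[0-9a-f]*>\] \([A-Za-z0-9_.]*\) \[[a-z0-9_]*\] \(0x[0-9a-f]*\).*$/\1 \2/p' |
    awk '$2 != "0x0" { print $1 }'
}

# Example with three entries copied from the trace in this message.
trace_symbols <<'EOF'
[<c022d6a0>] ip_rcv_finish [kernel] 0x0 (0xf1335ec0))
[<f89d2e71>] tg3_poll [tg3] 0x81 (0xf1335f38))
[<c0215917>] net_rx_action [kernel] 0xa7 (0xf1335f58))
EOF
# prints:
# tg3_poll
# net_rx_action
```

Applied to the full oops, this gives the receive path from do_IRQ through tg3_poll up to ip_local_deliver_finish, which is where the CPU was spinning on a lock in tcp_ipv4.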



* Re: 2.4.20 CPU lockup - Now with OOPS message
From: GrandMasterLee @ 2003-01-24  0:05 UTC
  To: dk; +Cc: linux-kernel

Can I ask how you reproduced this? I've got several systems with TG3's
and they only get lockups during network backups.


On Thu, 2003-01-23 at 17:30, Daniel Khan wrote:
> [..]
-- 
GrandMasterLee <masterlee@digitalroadkill.net>


* AW: 2.4.20 CPU lockup - Now with OOPS message
From: Daniel Khan @ 2003-01-24  0:11 UTC
  To: GrandMasterLee; +Cc: linux-kernel

Hi,

> > I reported frequent system lockups today.
> > Now after some playing around (cause I don't know anything about kernel
> > debugging - Thanks to Mark Hahn for the tips)
> > I found a way to reproduce the lock and to get the OOPS.
[..]

> Can I ask how you reproduced this? I've got several systems with TG3's
> and they only get lockups during network backups.

I keep an httpd session going on the host with the big logfiles so that they
change, then start rsync to sync the logfiles and other stuff to the backup host.

Sometimes I have to retry 2-3 times, but it crashes very reliably.
It's much the same as the network backups you mention.
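
Since the lockup may need two or three attempts, the retries can be scripted. The following is only a sketch: repeat_cmd is a hypothetical helper that is not part of the thread, and the `true` in the example stands in for the real transfer command.

```shell
#!/bin/sh
# repeat_cmd N CMD [ARGS...]: hypothetical helper that runs CMD N times
# in a row, logging each attempt and its exit status to stderr.
repeat_cmd() {
    n=$1
    shift
    i=1
    while [ "$i" -le "$n" ]; do
        echo "attempt $i of $n: $*" >&2
        "$@"
        echo "attempt $i exit status: $?" >&2
        i=$((i + 1))
    done
}

# Placeholder run; replace `true` with the actual rsync invocation.
repeat_cmd 3 true
```

If the machine locks up partway through, the last "attempt" line on the console shows how many runs it took.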

Daniel Khan


* Re: AW: 2.4.20 CPU lockup - Now with OOPS message
From: GrandMasterLee @ 2003-01-24  1:40 UTC
  To: dk; +Cc: linux-kernel

On Thu, 2003-01-23 at 18:11, Daniel Khan wrote:
> [..]

We use rsync to do our backups. I've been getting lines like this in my
backup server's kernel log and dmesg:

TCP: Treason uncloaked! Peer 10.1.1.40:37859/873 shrinks window 2430745930:2430747378. Repaired.
TCP: Treason uncloaked! Peer 10.1.1.40:37859/873 shrinks window 2430745930:2430747378. Repaired.
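
To see which peers trigger these messages most often, the log can be tallied per peer address. This is a rough sketch, not something posted in the thread: count_treason is a hypothetical helper, and it assumes each message sits on a single line, as it would in real dmesg output.

```shell
#!/bin/sh
# count_treason: hypothetical helper, reads kernel log text on stdin and
# prints "peer-address count" for every peer named in a
# "Treason uncloaked" message. Assumes one message per line.
count_treason() {
    grep 'Treason uncloaked' |
    sed 's/^.*Peer \([0-9.]*\):.*$/\1/' |
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Example using the two lines quoted in this message.
count_treason <<'EOF'
TCP: Treason uncloaked! Peer 10.1.1.40:37859/873 shrinks window 2430745930:2430747378. Repaired.
TCP: Treason uncloaked! Peer 10.1.1.40:37859/873 shrinks window 2430745930:2430747378. Repaired.
EOF
# prints: 10.1.1.40 2
```

On a live system the input would come from `dmesg | count_treason`.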


I was able to reproduce this error in a test setup, but not the crashes.
I'm wondering whether I should just start up a lot of rsync instances
and see what happens.

Any details on the method, file sizes, etc. you used to reproduce this
would be greatly appreciated. TIA

> Daniel Khan


* AW: AW: 2.4.20 CPU lockup - Now with OOPS message
From: Daniel Khan @ 2003-01-24  2:33 UTC
  To: GrandMasterLee; +Cc: linux-kernel

Hi,

[..]
> I was able to successfully reproduce this error in a test setup, but not
> the crashes. I'm curious if maybe I just start up too many instances of
> rsync and see what happens.
>
> Any particular method or size of files, etc, in reproducing this would
> be greatly beneficial. TIA

Here is the command:
/usr/local/bin/nice-rsync --rsync-path=/usr/local/bin/nice-rsync \
    --whole-file -auq --delete /var/log/httpd/ 10.1.0.212:/var/log/httpd

/usr/local/bin/nice-rsync:

#!/bin/sh
  exec nice -n 19 rsync $*
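
A side note on the wrapper above: unquoted $* re-splits its arguments on whitespace, which does not matter for the command shown but would break paths that contain spaces. The sketch below shows the difference using a stand-in command instead of rsync. The helper names count_args, with_star and with_at are hypothetical, and this is a suggested variant, not the script that was run on the crashing host.

```shell
#!/bin/sh
# Stand-in for the real command so the effect is visible without rsync:
# it just reports how many arguments it received.
count_args() { echo "$#"; }

# Same argument passing as the wrapper in this message: unquoted $*.
with_star() { count_args $*; }

# Suggested variant: "$@" passes each argument through unchanged.
with_at() { count_args "$@"; }

with_star "/var/log/my logs/" host:/backup   # prints 3
with_at   "/var/log/my logs/" host:/backup   # prints 2

# With that change the wrapper body would read:
#   exec nice -n 19 rsync "$@"
```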

Best regards

Daniel Khan



* Re: AW: AW: 2.4.20 CPU lockup - Now with OOPS message
From: GrandMasterLee @ 2003-01-24  6:10 UTC
  To: dk; +Cc: linux-kernel

On Thu, 2003-01-23 at 20:33, Daniel Khan wrote:
> [..]


Kewl. Thanks, I will try this out tomorrow and let you know.
