* ATM (LANE) - related Kernel-Crashes
@ 2004-03-16 12:08 Peter Daum
  2004-03-16 15:28 ` chas williams (contractor)
  2004-03-16 17:49 ` Bug in ForeRunner LE (cache line settings) (was ATM (LANE) - related Kernel-Crashes) chas williams (contractor)
  0 siblings, 2 replies; 6+ messages in thread
From: Peter Daum @ 2004-03-16 12:08 UTC (permalink / raw)
  To: linux-kernel, linux-atm-general

I have a bunch of machines with Forerunner LE ATM NICs running LANE.
For years (the first such occurrence was back with kernel 2.2.x) I
have been struggling with kernel crashes that occur at irregular
intervals and without any obvious pattern.

Further information can be found at:
http://sourceforge.net/tracker/index.php?func=detail&aid=917247&group_id=7812&atid=107812
http://sourceforge.net/tracker/index.php?func=detail&aid=445059&group_id=7812&atid=107812

Unfortunately, I usually have no reasonable way to capture crash dump
data. This time, for the first time in a while, I managed to get a
stack trace (kernel 2.4.25):

Oops: 0000
CPU:    0
EIP:    0010:[<c02a4f5b>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010082
eax: c5934800   ebx: c645a080   ecx: c645a080   edx: 00000000
esi: 00000000   edi: 0000000e   ebp: c7fc0000   esp: c680fdf4
ds: 0018   es: 0018   ss: 0018
Process lt-zeppelin (pid: 107, stackpage=c680f000)
Stack: c7fc0000 c01ea242 c7fc0000 00000000 00000000 c5934800 c7fe6384 00000000
       0000000e c7fc0000 c01e984a c5934800 c645a080 ffffffff 0009f25f 00030002
       0000000a ffffffff 00000005 00000000 0000006c 00000246 ffffffff c56145a0
Call Trace:    [<c01ea242>] [<c01e984a>] [<c01e8f3c>] [<c01e7e0d>] [<c010a6f5>]
  [<c010a8b9>] [<c010cfc8>] [<c02a51e3>] [<c02a2380>] [<c02a044e>] [<c02a16b8>]
  [<c02460f6>] [<c014c2e0>] [<c012c0f2>] [<c010911f>]
Code: 8b 6a 6c 0f 84 70 01 00 00 fc 8b b3 84 00 00 00 bf c0 ed 31


>>EIP; c02a4f5b <lec_push+2b/220>   <=====

Trace; c01ea242 <dequeue_sm_buf+142/170>
Trace; c01e984a <dequeue_rx+8da/e90>
Trace; c01e8f3c <process_rsq+3c/70>
Trace; c01e7e0d <ns_irq_handler+36d/430>
Trace; c010a6f5 <handle_IRQ_event+45/70>
Trace; c010a8b9 <do_IRQ+69/b0>
Trace; c010cfc8 <call_do_IRQ+5/d>
Trace; c02a51e3 <lec_vcc_attach+93/b0>
Trace; c02a2380 <atm_push_raw+0/70>
Trace; c02a044e <try_atm_lane_ops+3e/60>
Trace; c02a16b8 <vcc_ioctl+258/350>
Trace; c02460f6 <sock_ioctl+26/30>
Trace; c014c2e0 <sys_ioctl+b0/260>
Trace; c012c0f2 <sys_munmap+42/70>
Trace; c010911f <system_call+33/38>

Code;  c02a4f5b <lec_push+2b/220>
00000000 <_EIP>:
Code;  c02a4f5b <lec_push+2b/220>   <=====
   0:   8b 6a 6c                  mov    0x6c(%edx),%ebp   <=====
Code;  c02a4f5e <lec_push+2e/220>
   3:   0f 84 70 01 00 00         je     179 <_EIP+0x179> c02a50d4 <lec_push+1a4/220>
Code;  c02a4f64 <lec_push+34/220>
   9:   fc                        cld
Code;  c02a4f65 <lec_push+35/220>
   a:   8b b3 84 00 00 00         mov    0x84(%ebx),%esi
Code;  c02a4f6b <lec_push+3b/220>
  10:   bf c0 ed 31 00            mov    $0x31edc0,%edi

 <0>Kernel panic: Aiee, killing interrupt handler!



* Re: ATM (LANE) - related Kernel-Crashes
  2004-03-16 12:08 ATM (LANE) - related Kernel-Crashes Peter Daum
@ 2004-03-16 15:28 ` chas williams (contractor)
  2004-03-16 19:24   ` Peter Daum
  2004-03-16 17:49 ` Bug in ForeRunner LE (cache line settings) (was ATM (LANE) - related Kernel-Crashes) chas williams (contractor)
  1 sibling, 1 reply; 6+ messages in thread
From: chas williams (contractor) @ 2004-03-16 15:28 UTC (permalink / raw)
  To: Peter Daum; +Cc: linux-kernel, linux-atm-general

In message <Pine.LNX.4.30.0403161249270.9408-100000@swamp.bayern.net>, Peter Daum writes:
>eax: c5934800   ebx: c645a080   ecx: c645a080   edx: 00000000
>esi: 00000000   edi: 0000000e   ebp: c7fc0000   esp: c680fdf4
>...
>Code;  c02a4f5b <lec_push+2b/220>
>00000000 <_EIP>:
>Code;  c02a4f5b <lec_push+2b/220>   <=====
>   0:   8b 6a 6c                  mov    0x6c(%edx),%ebp   <=====
>   3:   0f 84 70 01 00 00         je     179 <_EIP+0x179> c02a50d4
>   9:   fc                        cld
>   a:   8b b3 84 00 00 00         mov    0x84(%ebx),%esi
>  10:   bf c0 ed 31 00            mov    $0x31edc0,%edi
>
> <0>Kernel panic: Aiee, killing interrupt handler!

well this is pretty useful.  just curious--which gcc are you using to
build your kernels?  i have slightly different assembly for this bit
of code but it seems to point to:

Line 657 of "net/atm/lec.c" starts at address 0xe4a <lec_push+26> and ends at 0xe50 <lec_push+32>.

which is:

void
lec_push(struct atm_vcc *vcc, struct sk_buff *skb)
{
        struct net_device *dev = (struct net_device *)vcc->proto_data;
        struct lec_priv *priv = (struct lec_priv *)dev->priv;		<=====

%edx holds the result of vcc->proto_data (or dev) which seems to be 0.
this is bad.  since you died in an interrupt handler, it is a fairly
safe guess that this is a race.  a quick check of where proto_data gets
assigned shows:

lec_vcc_attach(struct atm_vcc *vcc, void *arg)
{
	...
        vcc->push = lec_push;
        vcc->proto_data = dev_lec[ioc_data.dev_num];
	...

this is bad.  these two lines should be reversed!  lec_push() is
not safe until vcc->proto_data is setup.  could you swap the order of
those two lines and give that a try?
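
The ordering problem described above can be sketched in user space. This is only an illustration under stated assumptions, not the kernel code: the struct layouts and the names mock_lec_push(), simulate_rx_irq(), attach_racy() and attach_fixed() are hypothetical stand-ins, and the "interrupt" is an ordinary function call placed in the window between the two assignments. A real kernel fix might additionally need a memory barrier or locking to keep the compiler and other CPUs from reordering the stores; that is not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct sk_buff { int len; };
struct net_device { void *priv; };

struct atm_vcc {
    void (*push)(struct atm_vcc *vcc, struct sk_buff *skb);
    void *proto_data;
};

int pushed_ok;     /* packets handled with a valid device pointer */
int pushed_early;  /* packets that arrived before proto_data was set */

/* Mock of lec_push(). The real function dereferences vcc->proto_data
 * unconditionally; here the NULL case is counted instead of crashing. */
void mock_lec_push(struct atm_vcc *vcc, struct sk_buff *skb)
{
    struct net_device *dev = vcc->proto_data;

    (void)skb;
    if (dev == NULL) {
        pushed_early++;  /* in the kernel this is where the oops happens */
        return;
    }
    pushed_ok++;
}

/* Simulated receive interrupt: delivers one packet if a handler is set. */
void simulate_rx_irq(struct atm_vcc *vcc)
{
    struct sk_buff skb = { 0 };

    if (vcc->push != NULL)
        vcc->push(vcc, &skb);
}

/* Original order: the handler becomes visible before its data. */
void attach_racy(struct atm_vcc *vcc, struct net_device *dev)
{
    assert(vcc != NULL && dev != NULL);
    vcc->push = mock_lec_push;
    simulate_rx_irq(vcc);      /* interrupt lands inside the window */
    vcc->proto_data = dev;
}

/* Swapped order: the data is in place before the handler is published. */
void attach_fixed(struct atm_vcc *vcc, struct net_device *dev)
{
    assert(vcc != NULL && dev != NULL);
    vcc->proto_data = dev;
    simulate_rx_irq(vcc);      /* no handler yet, so nothing is delivered */
    vcc->push = mock_lec_push;
}
```

With attach_racy() the simulated interrupt reaches mock_lec_push() while proto_data is still NULL, which corresponds to the crash in the trace. With attach_fixed() the same interrupt finds no handler installed, and packets arriving afterwards see a valid device pointer.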


* Bug in ForeRunner LE (cache line settings) (was ATM (LANE) - related Kernel-Crashes)
  2004-03-16 12:08 ATM (LANE) - related Kernel-Crashes Peter Daum
  2004-03-16 15:28 ` chas williams (contractor)
@ 2004-03-16 17:49 ` chas williams (contractor)
  2004-03-16 22:24   ` Peter Daum
  1 sibling, 1 reply; 6+ messages in thread
From: chas williams (contractor) @ 2004-03-16 17:49 UTC (permalink / raw)
  To: Peter Daum; +Cc: linux-kernel, linux-atm-general

while i was looking at the bug report on the sourceforge site, i decided
to take a quick look at your nicstar problem.  can you try the following
patch (apply with patch -p1).


===== drivers/atm/nicstar.c 1.14 vs edited =====
--- 1.14/drivers/atm/nicstar.c	Sun Feb 29 13:53:50 2004
+++ edited/drivers/atm/nicstar.c	Tue Mar 16 12:43:19 2004
@@ -467,7 +467,7 @@
 {
    int j;
    struct ns_dev *card = NULL;
-   unsigned char pci_latency;
+   unsigned char pci_latency, cache_size;
    unsigned error;
    u32 data;
    u32 u32d[4];
@@ -512,6 +512,21 @@
    PRINTK("nicstar%d: membase at 0x%x.\n", i, card->membase);
 
    pci_set_master(pcidev);
+
+   if (pci_read_config_byte(pcidev, PCI_CACHE_LINE_SIZE, &cache_size)) {
+	printk("nicstar%d: can't read cache line size?\n", i);
+	error = 6;
+	ns_init_card_error(card, error);
+	return error;
+   }
+
+   if ((cache_size << 2) != L1_CACHE_BYTES) {
+	printk("nicstar%d: PCI cache line size set incorrectly (%d), ", i, cache_size);
+	cache_size = L1_CACHE_BYTES >> 2;
+	printk("setting cache line size to %d\n", cache_size);
+	if (pci_write_config_byte(pcidev, PCI_CACHE_LINE_SIZE, cache_size))
+	    printk("nicstar%d: can't set cache line size to %d\n", i, cache_size);
+   }
 
    if (pci_read_config_byte(pcidev, PCI_LATENCY_TIMER, &pci_latency) != 0)
    {
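
The shifts in the patch above follow from the PCI cache line size register counting 32-bit dwords rather than bytes. A small stand-alone sketch of that conversion, assuming a 32-byte L1 cache line (as on a Pentium II); EXAMPLE_L1_CACHE_BYTES and the cls_* helpers are hypothetical names used only for this example and do not touch real PCI configuration space:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for the example: 32-byte L1 cache lines. */
#define EXAMPLE_L1_CACHE_BYTES 32u

/* The register holds the line size in 32-bit dwords, so bytes = value * 4. */
unsigned cls_reg_to_bytes(uint8_t reg)
{
    return (unsigned)reg << 2;
}

/* Inverse conversion; the register is only 8 bits wide. */
uint8_t cls_bytes_to_reg(unsigned bytes)
{
    assert(bytes % 4 == 0 && bytes / 4 <= 255);
    return (uint8_t)(bytes >> 2);
}

/* Mirrors the check in the patch: keep the current register value if it
 * already matches the CPU's line size, otherwise return the corrected one. */
uint8_t cls_fixup(uint8_t current, unsigned l1_bytes)
{
    if (cls_reg_to_bytes(current) != l1_bytes)
        return cls_bytes_to_reg(l1_bytes);
    return current;
}
```

Under that assumption, a register that reads 0 would be rewritten to 8, since 8 dwords are 32 bytes.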


* Re: ATM (LANE) - related Kernel-Crashes
  2004-03-16 15:28 ` chas williams (contractor)
@ 2004-03-16 19:24   ` Peter Daum
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Daum @ 2004-03-16 19:24 UTC (permalink / raw)
  To: chas williams (contractor); +Cc: linux-kernel, linux-atm-general

Hi,

On Tue, 16 Mar 2004, chas williams (contractor) wrote:

> well this is pretty useful.  just curious--which gcc are you using to
> build your kernels?  i have slightly different assembly for this bit
> of code but it seems to point to:
>
> Line 657 of "net/atm/lec.c" starts at address 0xe4a <lec_push+26> and ends at 0xe50 <lec_push+32>.

This particular kernel had been built using gcc 3.3. The problem,
however, is certainly not related to any particular compiler
version.

> lec_vcc_attach(struct atm_vcc *vcc, void *arg)
> {
> 	...
>         vcc->push = lec_push;
>         vcc->proto_data = dev_lec[ioc_data.dev_num];
> 	...
>
> this is bad.  these two lines should be reversed!  lec_push() is
> not safe until vcc->proto_data is setup.  could you swap the order of
> those two lines and give that a try?

I will. Unfortunately, it will take a while to find out whether
this makes any difference. I currently have ~8 machines with the
same Forerunner LE ATM adapters and LANE. All of them crash
sporadically without any recognizable pattern, but they also
sometimes run stably for months. In all cases where I could find
out any details, the problem seemed to be LANE-related.
Unfortunately, the crashes did not always occur at the same place
in the kernel code. lec_push() seems to be pretty frequent, but I
also had 2 "documented" crashes in lec_send_packet() and one in
bind_vcc(). (Some more "historic" crash dumps are in the bug
report at
http://sourceforge.net/tracker/index.php?func=detail&aid=445059&group_id=7812&atid=107812)

(Sadly, all the information I could gather about what is going on
is very fragmentary. I tried to capture the crash dumps with "lkcd"
but this also doesn't work when the kernel crashes within an
interrupt handler; all the data was captured by attaching an old
laptop as a serial console, but this method doesn't scale well
;-)

Thanks a lot for your efforts,

Peter Daum



* Re: Bug in ForeRunner LE (cache line settings) (was ATM (LANE) - related Kernel-Crashes)
  2004-03-16 17:49 ` Bug in ForeRunner LE (cache line settings) (was ATM (LANE) - related Kernel-Crashes) chas williams (contractor)
@ 2004-03-16 22:24   ` Peter Daum
  2004-03-16 23:04     ` chas williams (contractor)
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Daum @ 2004-03-16 22:24 UTC (permalink / raw)
  To: chas williams (contractor); +Cc: linux-kernel, linux-atm-general

Hi,

On Tue, 16 Mar 2004, chas williams (contractor) wrote:

> while i was looking at the bug report on the sourceforge site, i decided
> to take a quick look at your nicstar problem.  can you try the following
> patch (apply with patch -p1).

I applied your patch. The machine I checked (HP NetServer LC3, Pentium II)
now prints:
  nicstar0: PCI cache line size set incorrectly (0),
  setting cache line size to 8
when initializing the card.

Unfortunately, the patch does not fix the problem: My usual test
case, transferring data from Mcafee's ftp server, still doesn't
work.

Regards,
                Peter Daum



* Re: Bug in ForeRunner LE (cache line settings) (was ATM (LANE) - related Kernel-Crashes)
  2004-03-16 22:24   ` Peter Daum
@ 2004-03-16 23:04     ` chas williams (contractor)
  0 siblings, 0 replies; 6+ messages in thread
From: chas williams (contractor) @ 2004-03-16 23:04 UTC (permalink / raw)
  To: Peter Daum; +Cc: linux-kernel, linux-atm-general

In message <Pine.LNX.4.30.0403162025460.17727-100000@swamp.bayern.net>, Peter Daum writes:
>Unfortunately, the patch does not fix the problem: My usual test
>case, transferring data from Mcafee's ftp server, still doesn't
>work.

and i am not surprised.  just read the manual for the card and it
doesn't support mwi (memory write invalidate) so the cache line
setting of the nicstar is meaningless.  i will need to think about
this some more.
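
For reference, memory write invalidate is controlled by bit 4 of the PCI command register. A rough sketch of how that bit could be tested, assuming a device without MWI support reads the bit back as zero; the EX_ constant is a local copy for this example, and ex_write_command() only simulates a register with hardwired bits rather than accessing real hardware:

```c
#include <assert.h>
#include <stdint.h>

/* Bit 4 of the PCI command register: Memory Write and Invalidate enable.
 * Defined locally so the example is self-contained. */
#define EX_PCI_COMMAND_INVALIDATE 0x10u

/* Returns 1 if the given command register value has MWI enabled. */
int ex_mwi_enabled(uint16_t command)
{
    return (command & EX_PCI_COMMAND_INVALIDATE) != 0;
}

/* Simulates writing the command register of a device whose writable bits
 * are given by writable_mask; all other bits read back as zero. */
uint16_t ex_write_command(uint16_t value, uint16_t writable_mask)
{
    return (uint16_t)(value & writable_mask);
}

/* Usage example: a device that hardwires bit 4 to zero never reports MWI,
 * no matter what is written to it. */
void ex_demo(void)
{
    uint16_t readback = ex_write_command(0x0017, 0x0007);

    assert(!ex_mwi_enabled(readback));
}
```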

