linux-kernel.vger.kernel.org archive mirror
* Re: 21041 transmit timed out
       [not found] <Pine.LNX.4.05.10012201318030.1508-100000@callisto.of.borg>
@ 2001-03-12 13:19 ` Geert Uytterhoeven
  2001-03-12 17:34   ` Geert Uytterhoeven
  0 siblings, 1 reply; 6+ messages in thread
From: Geert Uytterhoeven @ 2001-03-12 13:19 UTC
  To: Jeff Garzik; +Cc: tulip-users, Linux Kernel Development

On Wed, 20 Dec 2000, Geert Uytterhoeven wrote:
> Since I switched from the de4x5 driver to the tulip driver several months ago
> (before that, tulip didn't work on PPC), I never saw 21041 transmit timed out
> messages again, until today:
> 
> | Dec 20 12:13:29 callisto kernel: Linux Tulip driver version 0.9.11 (November 3, 2000)
> | Dec 20 12:13:29 callisto kernel: PCI: Enabling device 00:04.0 (0000 -> 0003)
> | Dec 20 12:13:29 callisto kernel: eth0: Digital DC21041 Tulip rev 33 at 0x1080, 21041 mode, 00:80:C8:5A:F8:5B, IRQ 29.
> | Dec 20 12:13:29 callisto kernel: eth0: 21041 Media table, default media 0800 (Autosense).
> | Dec 20 12:13:29 callisto kernel: eth0:  21041 media #0, 10baseT.
> | Dec 20 12:13:29 callisto kernel: eth0:  21041 media #4, 10baseT-FD.
> | Dec 20 12:13:29 callisto kernel: eth0:  21041 media #1, 10base2.
> | Dec 20 12:55:27 callisto kernel: NETDEV WATCHDOG: eth0: transmit timed out
> | Dec 20 12:55:27 callisto kernel: eth0: 21041 transmit timed out, status fc660000, CSR12 000001c8, CSR13 ffffef05, CSR14 ffffff3f, resetting...
> | Dec 20 12:55:27 callisto kernel: eth0: 21143 100baseTx sensed media.
> | Dec 20 12:55:35 callisto kernel: NETDEV WATCHDOG: eth0: transmit timed out
> | Dec 20 12:55:35 callisto kernel: eth0: 21041 transmit timed out, status fc660010, CSR12 000002c8, CSR13 ffffef0d, CSR14 fffff73d, resetting...
> | Dec 20 12:55:35 callisto kernel: eth0: 21143 100baseTx sensed media.

    [...]

> Then I tried ifconfig down/up, which didn't work. An additional rmmod/insmod
> pair for the tulip module woke up the card again:
> 
> | Dec 20 13:00:43 callisto kernel: Linux Tulip driver version 0.9.11 (November 3, 2000)
> | Dec 20 13:00:43 callisto kernel: eth0: Digital DC21041 Tulip rev 33 at 0x1080, 21041 mode, 00:80:C8:5A:F8:5B, IRQ 29.
> | Dec 20 13:00:43 callisto kernel: eth0: 21041 Media table, default media 0800 (Autosense).
> | Dec 20 13:00:43 callisto kernel: eth0:  21041 media #0, 10baseT.
> | Dec 20 13:00:43 callisto kernel: eth0:  21041 media #4, 10baseT-FD.
> | Dec 20 13:00:43 callisto kernel: eth0:  21041 media #1, 10base2.
> 
> The machine is a CHRP LongTrail (PPC 604e), running some kind of 2.4.0-test11
> kernel.
> 
> The card is a D-Link DE-530CT, with a 21041 on a 10baseT network (hence not
> 100baseTx, as detected during the problem phase!).

lspci output:

| 00:04.0 Ethernet controller: Digital Equipment Corporation DECchip 21041 [Tulip Pass 3] (rev 21)
|         Subsystem: D-Link System Inc DE-530+
|         Flags: bus master, medium devsel, latency 0, IRQ 29
|         I/O ports at 1080 [size=128]
|         Memory at c1080000 (32-bit, non-prefetchable) [size=128]
|         Expansion ROM at c11c0000 [disabled] [size=256K]

I made a list of driver versions that showed the problem so far:

| Tulip driver version 0.9.11 (November 3, 2000)
| Tulip driver version 0.9.13 (January 2, 2001)
| Tulip driver version 0.9.13a (January 20, 2001)
| Tulip driver version 0.9.14 (February 20, 2001)

There's one version I never had problems with:

| Tulip driver version 0.9.5 (May 30, 2000) (from 2.4.0-test1-ac10)

Of course I'm not 100% sure, but I ran 2.4.0-test1-ac10 for nearly 6 months on
that machine, without a single Tulip problem. All other versions caused
problems in a much shorter timeframe.

BTW, do you want me to try 1.1.2?

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds



* Re: 21041 transmit timed out
  2001-03-12 13:19 ` 21041 transmit timed out Geert Uytterhoeven
@ 2001-03-12 17:34   ` Geert Uytterhoeven
  0 siblings, 0 replies; 6+ messages in thread
From: Geert Uytterhoeven @ 2001-03-12 17:34 UTC
  To: Jeff Garzik; +Cc: tulip-users, Linux Kernel Development

On Mon, 12 Mar 2001, Geert Uytterhoeven wrote:
> I made a list of driver versions that showed the problem so far:
> 
> | Tulip driver version 0.9.14 (February 20, 2001)

Wow! 0.9.14 in my 2.4.3-pre2 kernel just seems to have recovered from the
problem:

| NETDEV WATCHDOG: eth0: transmit timed out
| eth0: 21041 transmit timed out, status fc660000, CSR12 000001c8, CSR13 ffffef05, CSR14 ffffff3f, resetting...
| eth0: 21143 100baseTx sensed media.
| NETDEV WATCHDOG: eth0: transmit timed out
| eth0: 21041 transmit timed out, status fc260010, CSR12 000002c8, CSR13 ffffef0d, CSR14 fffff73d, resetting...
| NETDEV WATCHDOG: eth0: transmit timed out
| eth0: 21041 transmit timed out, status fc260010, CSR12 000002c8, CSR13 ffffef0d, CSR14 fffff73d, resetting...
| eth0: Out-of-sync dirty pointer, 243506 vs. 243524.

And eth0 is back online.

The next few lines in the syslog do look suspicious, though:

| release_dev: driver.table[1] not tty for ()
| release_dev: driver.table[7] not tty for (FikXZdoGQcCWQFGjV5evDzG+mv
| Q1jmH5buYh7LWpmXr8mfjykOCoR2Ry+NmzL3sE49mLozzdT22tUJKu6ztÄPÂÜð)
| Warning: dev (04:08) tty->count(2) != #fd's(1) in release_dev
| release_dev: driver.table[7] not tty for ()
| Warning: dev (04:41) tty->count(2) != #fd's(1) in tty_open
| Warning: dev (04:08) tty->count(3) != #fd's(1) in tty_open
| Warning: dev (04:41) tty->count(3) != #fd's(2) in tty_open
| Warning: dev (04:41) tty->count(3) != #fd's(2) in release_dev
| Warning: dev (04:41) tty->count(3) != #fd's(2) in tty_open

Gr{oetje,eeting}s,

						Geert


* Re: 21041 transmit timed out
  2003-05-11  9:53 ` Geert Uytterhoeven
@ 2003-06-13 10:55   ` Geert Uytterhoeven
  0 siblings, 0 replies; 6+ messages in thread
From: Geert Uytterhoeven @ 2003-06-13 10:55 UTC
  To: Jeff Garzik; +Cc: tulip-users, Linux Kernel Development

On Sun, 11 May 2003, Geert Uytterhoeven wrote:
> On Sun, 20 Apr 2003, Geert Uytterhoeven wrote:
> > Under heavy network activity (e.g. downloading ISOs), my Tulip card (D-Link
> > DE-530+ with DECchip 21041) still goes down with 2.4.20.
> >
> > Suddenly I start getting messages of the form:
> > | NETDEV WATCHDOG: eth0: transmit timed out
> > | eth0: 21041 transmit timed out, status fc660000, CSR12 000051c8, CSR13 ffffef01, CSR14 ffffffff, resetting...
> > | eth0: 21143 100baseTx sensed media.
> >
> > and the network no longer works. Sometimes it automatically recovers after a
> > while (without printing additional messages), but usually I need to do a manual
> > ifconfig down/up sequence to revive the network.

    [...]

> It still takes a few minutes to recover, but at least it recovers
> automatically, and within the TCP timeout limit, so no connections are lost.
> 
> I tested this by transferring several gigabytes across the network using netcat,
> and it always recovered without needing manual intervention.
> 
> So I suggest the following patch:
> 
> --- linux-2.4.20/drivers/net/tulip/tulip_core.c.orig	Tue Oct 29 18:41:48 2002
> +++ linux-2.4.20/drivers/net/tulip/tulip_core.c	Sun May 11 11:31:29 2003
> @@ -575,7 +575,7 @@
>  					dev->if_port = 2 - dev->if_port;
>  				} else
>  					dev->if_port = 0;
> -			else
> +			else if (dev->if_port != 0 || (csr12 & 0x0004) != 0)
>  				dev->if_port = 1;
>  			tulip_select_media(dev, 0);
>  		}
> 
> Any comments?

I can confirm that this patch fixed my problem. Sometimes transmission still
times out, but the driver recovers automatically within 3 minutes. The machine
has been up for more than one month.

Still no comments?

Gr{oetje,eeting}s,

						Geert


* Re: 21041 transmit timed out
  2003-04-20 11:12 Geert Uytterhoeven
@ 2003-05-11  9:53 ` Geert Uytterhoeven
  2003-06-13 10:55   ` Geert Uytterhoeven
  0 siblings, 1 reply; 6+ messages in thread
From: Geert Uytterhoeven @ 2003-05-11  9:53 UTC
  To: Jeff Garzik; +Cc: tulip-users, Linux Kernel Development

On Sun, 20 Apr 2003, Geert Uytterhoeven wrote:
> Under heavy network activity (e.g. downloading ISOs), my Tulip card (D-Link
> DE-530+ with DECchip 21041) still goes down with 2.4.20.
>
> Suddenly I start getting messages of the form:
> | NETDEV WATCHDOG: eth0: transmit timed out
> | eth0: 21041 transmit timed out, status fc660000, CSR12 000051c8, CSR13 ffffef01, CSR14 ffffffff, resetting...
> | eth0: 21143 100baseTx sensed media.
>
> and the network no longer works. Sometimes it automatically recovers after a
> while (without printing additional messages), but usually I need to do a manual
> ifconfig down/up sequence to revive the network.
>
> The register values may differ. Last time I saw these, with their respective
> numbers of occurrences:
>
>  1741 x status fc260010, CSR12 000000c8, CSR13 ffffef09, CSR14 ffffff7f
>   581 x status fc260010, CSR12 000002c8, CSR13 ffffef09, CSR14 ffffff7f
>    20 x status fc260010, CSR12 000050c8, CSR13 ffffef09, CSR14 fffff7fd
>     6 x status fc260010, CSR12 000052c8, CSR13 ffffef09, CSR14 fffff7fd
>    28 x status fc660000, CSR12 000051c8, CSR13 ffffef01, CSR14 ffffffff
>
> Once it printed
>
> | eth0: 21041 media switched to 10baseT.
>
> after which the network seemed to work again for a few minutes.
>
> I also got 27 times
>
> | eth0: No 21041 10baseT link beat, Media switched to 10base2.
>
> which I find, together with the zillions of `21143 100baseTx sensed media'
> messages, very strange, since the card is 10 Mbps-only and connected using UTP
> to a 10 Mbps-only Ethernet hub (D-Link DE816-TP).

I did some more investigation...

If a transmit timeout happens on a 21041, the current code changes the
interface medium:
 1. If 10base2 (if_port = 1) and no 10baseT link beat, stay on 10base2 (1)
 2. If AUI (2) and no 10baseT link beat, switch to 10baseT (0)
 3. If 10base2 (1) or AUI (2) and the 10baseT link beat is present, switch to
    10baseT (0)
 4. If 10baseT (0), switch to 10base2 (1)
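
The four rules can be written down as a small pure function, which makes it
easier to see what each (port, link beat) combination leads to. This is only a
sketch of the logic as described above, not the driver code itself: the name
next_port_on_timeout and the PORT_* and CSR12_NO_LINK_BEAT names are made up
for illustration, and the meaning of CSR12 bit 2 (set means no 10baseT link
beat) is the one assumed in this message.

```c
#include <assert.h>

/* Port numbers as used in the rules above (dev->if_port in the driver). */
enum { PORT_10BASET = 0, PORT_10BASE2 = 1, PORT_AUI = 2 };

/* Assumption: CSR12 bit 2 is set when there is no 10baseT link beat. */
#define CSR12_NO_LINK_BEAT 0x0004u

/*
 * Hypothetical helper: returns the port that the rules above select
 * after a transmit timeout, given the current port and CSR12.
 */
int next_port_on_timeout(int if_port, unsigned int csr12)
{
	assert(if_port >= PORT_10BASET && if_port <= PORT_AUI);

	if (if_port == PORT_10BASE2 || if_port == PORT_AUI) {
		if (csr12 & CSR12_NO_LINK_BEAT)
			return 2 - if_port;	/* rules 1 and 2 */
		return PORT_10BASET;		/* rule 3 */
	}
	return PORT_10BASE2;			/* rule 4 */
}
```

Calling next_port_on_timeout(PORT_10BASET, 0x000051c8) yields PORT_10BASE2,
even though bit 2 of that CSR12 value is clear.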

To me it looks like these rules are flawed. Apart from the fishiness of the
first two rules, rule 4 unconditionally switches my interface to 10base2, while
I still have a 10baseT link beat ((csr12 & 4) == 0)!

As removing this media change code didn't help, I modified the code in the
following way:

--- linux-2.4.20/drivers/net/tulip/tulip_core.c.orig	Tue Oct 29 18:41:48 2002
+++ linux-2.4.20/drivers/net/tulip/tulip_core.c	Sun May  4 11:51:36 2003
@@ -570,13 +570,20 @@
 			   inl(ioaddr + CSR13), inl(ioaddr + CSR14));
 		tp->mediasense = 1;
 		if ( ! tp->medialock) {
-			if (dev->if_port == 1 || dev->if_port == 2)
+			if (dev->if_port == 1 || dev->if_port == 2) {
+printk("Switching from port %d", dev->if_port);
 				if (csr12 & 0x0004) {
 					dev->if_port = 2 - dev->if_port;
 				} else
 					dev->if_port = 0;
-			else
+printk(" to port %d", dev->if_port);
+			} else if (dev->if_port == 0 && !(csr12 & 0x0004)) {
+printk("Staying on 10baseT\n");
+			} else {
+printk("Switching from port %d", dev->if_port);
 				dev->if_port = 1;
+printk(" to 10base2\n");
+			}
 			tulip_select_media(dev, 0);
 		}
 	} else if (tp->chip_id == DC21140 || tp->chip_id == DC21142


And now it recovers when a transmit timeout is detected:

| May  4 14:36:00 callisto kernel: eth0: Media selection tick, 10baseT, status fc2e0000 mode fffe2202 SIA 00005108 ffffef01 ffffffff ffff0008.
| May  4 14:36:00 callisto kernel: eth0: 21041 media tick  CSR12 00005108.
| May  4 14:36:30 callisto kernel: eth0: Media selection tick, 10baseT, status fc660000 mode fffe2202 SIA 000051c8 ffffef01 ffffffff ffff0008.
| May  4 14:36:30 callisto kernel: eth0: 21041 media tick  CSR12 000051c8.

----> Here it goes wrong

| May  4 14:36:37 callisto kernel: NETDEV WATCHDOG: eth0: transmit timed out
| May  4 14:36:37 callisto kernel: eth0: 21041 transmit timed out, status fc660000, CSR12 000051c8, CSR13 ffffef01, CSR14 ffffffff, resetting...

(csr12 & 4) == 0, so there's still a 10baseT link beat

| May  4 14:36:37 callisto kernel: Staying on 10baseT
| May  4 14:36:37 callisto kernel: eth0: 21041 using media 10baseT, CSR12 is 51c8.
| May  4 14:36:37 callisto kernel: eth0: The transmitter stopped.  CSR5 is fc678106, CSR6 fffe2002, new CSR6 80020000.
| May  4 14:36:37 callisto kernel: eth0: 21143 link status interrupt 000011c4, CSR5 fc678106, ffffffff.
| May  4 14:36:45 callisto kernel: NETDEV WATCHDOG: eth0: transmit timed out
| May  4 14:36:45 callisto kernel: eth0: 21041 transmit timed out, status fc660000, CSR12 000051c8, CSR13 ffffef01, CSR14 ffffffff, resetting...
| May  4 14:36:45 callisto kernel: Staying on 10baseT
| May  4 14:36:45 callisto kernel: eth0: 21041 using media 10baseT, CSR12 is 51c8.
| May  4 14:36:45 callisto kernel: eth0: The transmitter stopped.  CSR5 is fc678106, CSR6 fffe2002, new CSR6 80020000.
| May  4 14:36:45 callisto kernel: eth0: 21143 link status interrupt 000011c4, CSR5 fc678106, ffffffff.
| May  4 14:36:54 callisto kernel: NETDEV WATCHDOG: eth0: transmit timed out
| May  4 14:36:54 callisto kernel: eth0: 21041 transmit timed out, status fc660000, CSR12 000051c8, CSR13 ffffef01, CSR14 ffffffff, resetting...
| May  4 14:36:54 callisto kernel: Staying on 10baseT
| May  4 14:36:54 callisto kernel: eth0: 21041 using media 10baseT, CSR12 is 51c8.
| May  4 14:36:54 callisto kernel: eth0: The transmitter stopped.  CSR5 is fc678106, CSR6 fffe2002, new CSR6 80020000.
| May  4 14:36:54 callisto kernel: eth0: 21143 link status interrupt 000011c4, CSR5 fc678106, ffffffff.
| May  4 14:37:00 callisto kernel: eth0: Media selection tick, 10baseT, status fc660000 mode fffe2002 SIA 000051c8 ffffef01 ffffffff ffff0008.
| May  4 14:37:00 callisto kernel: eth0: 21041 media tick  CSR12 000051c8.
| May  4 14:37:02 callisto kernel: NETDEV WATCHDOG: eth0: transmit timed out
| May  4 14:37:02 callisto kernel: eth0: 21041 transmit timed out, status fc660000, CSR12 000051c8, CSR13 ffffef01, CSR14 ffffffff, resetting...
| May  4 14:37:02 callisto kernel: Staying on 10baseT
| May  4 14:37:02 callisto kernel: eth0: 21041 using media 10baseT, CSR12 is 51c8.
| May  4 14:37:02 callisto kernel: eth0: The transmitter stopped.  CSR5 is fc678106, CSR6 fffe2002, new CSR6 80020000.
| May  4 14:37:02 callisto kernel: eth0: 21143 link status interrupt 000011c4, CSR5 fc678106, ffffffff.
| May  4 14:37:11 callisto kernel: NETDEV WATCHDOG: eth0: transmit timed out
| May  4 14:37:11 callisto kernel: eth0: 21041 transmit timed out, status fc660000, CSR12 000051c8, CSR13 ffffef01, CSR14 ffffffff, resetting...
| May  4 14:37:11 callisto kernel: Staying on 10baseT
| May  4 14:37:11 callisto kernel: eth0: 21041 using media 10baseT, CSR12 is 51c8.
| May  4 14:37:11 callisto kernel: eth0: The transmitter stopped.  CSR5 is fc678106, CSR6 fffe2002, new CSR6 80020000.
| May  4 14:37:11 callisto kernel: eth0: 21143 link status interrupt 000011c4, CSR5 fc678106, ffffffff.
| May  4 14:37:19 callisto kernel: NETDEV WATCHDOG: eth0: transmit timed out
| May  4 14:37:19 callisto kernel: eth0: 21041 transmit timed out, status fc660000, CSR12 000051c8, CSR13 ffffef01, CSR14 ffffffff, resetting...
| May  4 14:37:19 callisto kernel: Staying on 10baseT
| May  4 14:37:19 callisto kernel: eth0: 21041 using media 10baseT, CSR12 is 51c8.
| May  4 14:37:19 callisto kernel: eth0: The transmitter stopped.  CSR5 is fc678106, CSR6 fffe2002, new CSR6 80020000.
| May  4 14:37:19 callisto kernel: eth0: 21143 link status interrupt 000011c4, CSR5 fc678106, ffffffff.
| May  4 14:37:27 callisto kernel: NETDEV WATCHDOG: eth0: transmit timed out
| May  4 14:37:27 callisto kernel: eth0: 21041 transmit timed out, status fc660000, CSR12 000051c8, CSR13 ffffef01, CSR14 ffffffff, resetting...
| May  4 14:37:27 callisto kernel: Staying on 10baseT
| May  4 14:37:27 callisto kernel: eth0: 21041 using media 10baseT, CSR12 is 51c8.
| May  4 14:37:27 callisto kernel: eth0: The transmitter stopped.  CSR5 is fc678106, CSR6 fffe2002, new CSR6 80020000.
| May  4 14:37:27 callisto kernel: eth0: 21143 link status interrupt 000011c4, CSR5 fc678106, ffffffff.
| May  4 14:37:30 callisto kernel: eth0: Media selection tick, 10baseT, status fc660000 mode fffe2002 SIA 000051c8 ffffef01 ffffffff ffff0008.
| May  4 14:37:30 callisto kernel: eth0: 21041 media tick  CSR12 000051c8.
| May  4 14:37:36 callisto kernel: NETDEV WATCHDOG: eth0: transmit timed out
| May  4 14:37:36 callisto kernel: eth0: 21041 transmit timed out, status fc660000, CSR12 000051c8, CSR13 ffffef01, CSR14 ffffffff, resetting...
| May  4 14:37:36 callisto kernel: Staying on 10baseT
| May  4 14:37:36 callisto kernel: eth0: 21041 using media 10baseT, CSR12 is 51c8.
| May  4 14:37:36 callisto kernel: eth0: The transmitter stopped.  CSR5 is fc678106, CSR6 fffe2002, new CSR6 80020000.
| May  4 14:37:36 callisto kernel: eth0: 21143 link status interrupt 000011c4, CSR5 fc678106, ffffffff.
| May  4 14:37:44 callisto kernel: NETDEV WATCHDOG: eth0: transmit timed out
| May  4 14:37:44 callisto kernel: eth0: 21041 transmit timed out, status fc660000, CSR12 000051c8, CSR13 ffffef01, CSR14 ffffffff, resetting...
| May  4 14:37:44 callisto kernel: Staying on 10baseT
| May  4 14:37:44 callisto kernel: eth0: 21041 using media 10baseT, CSR12 is 51c8.
| May  4 14:37:44 callisto kernel: eth0: The transmitter stopped.  CSR5 is fc678106, CSR6 fffe2002, new CSR6 80020000.
| May  4 14:37:44 callisto kernel: eth0: 21143 link status interrupt 000011c4, CSR5 fc678106, ffffffff.
| May  4 14:37:52 callisto kernel: NETDEV WATCHDOG: eth0: transmit timed out
| May  4 14:37:52 callisto kernel: eth0: 21041 transmit timed out, status fc660000, CSR12 000051c8, CSR13 ffffef01, CSR14 ffffffff, resetting...
| May  4 14:37:52 callisto kernel: Staying on 10baseT
| May  4 14:37:52 callisto kernel: eth0: 21041 using media 10baseT, CSR12 is 51c8.
| May  4 14:37:52 callisto kernel: eth0: The transmitter stopped.  CSR5 is fc678106, CSR6 fffe2002, new CSR6 80020000.
| May  4 14:37:52 callisto kernel: eth0: 21143 link status interrupt 000011c4, CSR5 fc678106, ffffffff.
| May  4 14:38:00 callisto kernel: eth0: Media selection tick, 10baseT, status fc660000 mode fffe2002 SIA 000051c8 ffffef01 ffffffff ffff0008.
| May  4 14:38:00 callisto kernel: eth0: 21041 media tick  CSR12 000051c8.
| May  4 14:38:01 callisto kernel: NETDEV WATCHDOG: eth0: transmit timed out
| May  4 14:38:01 callisto kernel: eth0: 21041 transmit timed out, status fc660000, CSR12 000051c8, CSR13 ffffef01, CSR14 ffffffff, resetting...
| May  4 14:38:01 callisto kernel: Staying on 10baseT
| May  4 14:38:01 callisto kernel: eth0: 21041 using media 10baseT, CSR12 is 51c8.
| May  4 14:38:01 callisto kernel: eth0: The transmitter stopped.  CSR5 is fc678106, CSR6 fffe2002, new CSR6 80020000.
| May  4 14:38:01 callisto kernel: eth0: 21143 link status interrupt 000011c4, CSR5 fc678106, ffffffff.
| May  4 14:38:09 callisto kernel: NETDEV WATCHDOG: eth0: transmit timed out
| May  4 14:38:09 callisto kernel: eth0: 21041 transmit timed out, status fc660000, CSR12 00005108, CSR13 ffffef01, CSR14 ffffffff, resetting...
| May  4 14:38:09 callisto kernel: Staying on 10baseT
| May  4 14:38:09 callisto kernel: eth0: 21041 using media 10baseT, CSR12 is 51c8.
| May  4 14:38:09 callisto kernel: eth0: The transmitter stopped.  CSR5 is fc678106, CSR6 fffe2002, new CSR6 80020000.
| May  4 14:38:09 callisto kernel: eth0: 21143 link status interrupt 000011c4, CSR5 fc678106, ffffffff.
| May  4 14:38:18 callisto kernel: NETDEV WATCHDOG: eth0: transmit timed out
| May  4 14:38:18 callisto kernel: eth0: 21041 transmit timed out, status fc660000, CSR12 000051c8, CSR13 ffffef01, CSR14 ffffffff, resetting...
| May  4 14:38:18 callisto kernel: Staying on 10baseT
| May  4 14:38:18 callisto kernel: eth0: 21041 using media 10baseT, CSR12 is 51c8.
| May  4 14:38:18 callisto kernel: eth0: The transmitter stopped.  CSR5 is fc678106, CSR6 fffe2002, new CSR6 80020000.
| May  4 14:38:18 callisto kernel: eth0: 21143 link status interrupt 000011c4, CSR5 fc678106, ffffffff.
| May  4 14:38:26 callisto kernel: NETDEV WATCHDOG: eth0: transmit timed out
| May  4 14:38:26 callisto kernel: eth0: 21041 transmit timed out, status fc660000, CSR12 000051c8, CSR13 ffffef01, CSR14 ffffffff, resetting...
| May  4 14:38:26 callisto kernel: Staying on 10baseT
| May  4 14:38:26 callisto kernel: eth0: 21041 using media 10baseT, CSR12 is 51c8.
| May  4 14:38:26 callisto kernel: eth0: The transmitter stopped.  CSR5 is fc678106, CSR6 fffe2002, new CSR6 80020000.
| May  4 14:38:26 callisto kernel: eth0: 21143 link status interrupt 000011c4, CSR5 fc678106, ffffffff.
| May  4 14:38:30 callisto kernel: eth0: Media selection tick, 10baseT, status fc660000 mode fffe2002 SIA 000051c8 ffffef01 ffffffff ffff0008.
| May  4 14:38:30 callisto kernel: eth0: 21041 media tick  CSR12 000051c8.
| May  4 14:38:34 callisto kernel: NETDEV WATCHDOG: eth0: transmit timed out
| May  4 14:38:34 callisto kernel: eth0: 21041 transmit timed out, status fc660000, CSR12 000051c8, CSR13 ffffef01, CSR14 ffffffff, resetting...
| May  4 14:38:34 callisto kernel: Staying on 10baseT
| May  4 14:38:34 callisto kernel: eth0: 21041 using media 10baseT, CSR12 is 51c8.
| May  4 14:38:34 callisto kernel: eth0: The transmitter stopped.  CSR5 is fc678106, CSR6 fffe2002, new CSR6 80020000.
| May  4 14:38:34 callisto kernel: eth0: 21143 link status interrupt 000011c4, CSR5 fc678106, ffffffff.
| May  4 14:38:43 callisto kernel: NETDEV WATCHDOG: eth0: transmit timed out
| May  4 14:38:43 callisto kernel: eth0: 21041 transmit timed out, status fc660000, CSR12 000051c8, CSR13 ffffef01, CSR14 ffffffff, resetting...
| May  4 14:38:43 callisto kernel: Staying on 10baseT
| May  4 14:38:43 callisto kernel: eth0: 21041 using media 10baseT, CSR12 is 51c8.
| May  4 14:38:43 callisto kernel: eth0: The transmitter stopped.  CSR5 is fc678106, CSR6 fffe2002, new CSR6 80020000.
| May  4 14:38:43 callisto kernel: eth0: 21143 link status interrupt 000011c4, CSR5 fc678106, ffffffff.
| May  4 14:38:51 callisto kernel: NETDEV WATCHDOG: eth0: transmit timed out
| May  4 14:38:51 callisto kernel: eth0: 21041 transmit timed out, status fc660000, CSR12 000051c8, CSR13 ffffef01, CSR14 ffffffff, resetting...
| May  4 14:38:51 callisto kernel: Staying on 10baseT
| May  4 14:38:51 callisto kernel: eth0: 21041 using media 10baseT, CSR12 is 51c8.
| May  4 14:38:51 callisto kernel: eth0: The transmitter stopped.  CSR5 is fc678106, CSR6 fffe2002, new CSR6 80020000.
| May  4 14:38:51 callisto kernel: eth0: 21143 link status interrupt 000011c4, CSR5 fc678106, ffffffff.
| May  4 14:38:59 callisto kernel: NETDEV WATCHDOG: eth0: transmit timed out
| May  4 14:38:59 callisto kernel: eth0: 21041 transmit timed out, status fc660000, CSR12 00005108, CSR13 ffffef01, CSR14 ffffffff, resetting...
| May  4 14:38:59 callisto kernel: Staying on 10baseT
| May  4 14:38:59 callisto kernel: eth0: 21041 using media 10baseT, CSR12 is 51c8.
| May  4 14:38:59 callisto kernel: eth0: The transmitter stopped.  CSR5 is fc678106, CSR6 fffe2002, new CSR6 80020000.
| May  4 14:38:59 callisto kernel: eth0: 21143 link status interrupt 000011c4, CSR5 fc678106, ffffffff.
| May  4 14:39:00 callisto kernel: eth0: Media selection tick, 10baseT, status fc260000 mode fffe2002 SIA 000011c4 ffffef01 ffffffff ffff0008.
| May  4 14:39:00 callisto kernel: eth0: 21041 media tick  CSR12 000011c4.
| May  4 14:39:00 callisto kernel: eth0: No 21041 10baseT link beat, Media switched to 10base2.

----> Here it lost the 10baseT link beat (why?), so it switches to 10base2

| May  4 14:39:08 callisto kernel: NETDEV WATCHDOG: eth0: transmit timed out
| May  4 14:39:08 callisto kernel: eth0: 21041 transmit timed out, status fc260010, CSR12 000052c8, CSR13 ffffef09, CSR14 fffff7fd, resetting...
| May  4 14:39:08 callisto kernel: Switching from port 1 to port 0<7>eth0: 21041 using media 10baseT, CSR12 is 52c8.

---> Here it switches back to 10baseT

| May  4 14:39:08 callisto kernel: eth0: 21143 link status interrupt 000010c4, CSR5 fc268110, ffffffff.
| May  4 14:39:10 callisto kernel: eth0: Media selection tick, 10baseT, status fc260000 mode fffe2002 SIA 000061c8 ffffef01 ffffffff ffff0008.
| May  4 14:39:10 callisto kernel: eth0: 21041 media tick  CSR12 000061c8.
| May  4 14:39:10 callisto kernel: eth0: Out-of-sync dirty pointer, 2922990 vs. 2923007.

----> Here the card has recovered

| May  4 14:39:40 callisto kernel: eth0: Media selection tick, 10baseT, status fc360000 mode fffe2002 SIA 00005148 ffffef01 ffffffff ffff0008.
| May  4 14:39:40 callisto kernel: eth0: 21041 media tick  CSR12 00005148.

It still takes a few minutes to recover, but at least it recovers
automatically, and within the TCP timeout limit, so no connections are lost.

I tested this by transferring several gigabytes across the network using netcat,
and it always recovered without needing manual intervention.

So I suggest the following patch:

--- linux-2.4.20/drivers/net/tulip/tulip_core.c.orig	Tue Oct 29 18:41:48 2002
+++ linux-2.4.20/drivers/net/tulip/tulip_core.c	Sun May 11 11:31:29 2003
@@ -575,7 +575,7 @@
 					dev->if_port = 2 - dev->if_port;
 				} else
 					dev->if_port = 0;
-			else
+			else if (dev->if_port != 0 || (csr12 & 0x0004) != 0)
 				dev->if_port = 1;
 			tulip_select_media(dev, 0);
 		}
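
For reference, the media choice with this patch applied can be modelled as a
pure function. This is a sketch for reasoning about the cases only: the name
next_port_patched and the CSR12_NO_LINK_BEAT macro are invented here, and in
the driver tulip_select_media() is still called afterwards in every case.

```c
#include <assert.h>

/* Assumption: CSR12 bit 2 is set when there is no 10baseT link beat. */
#define CSR12_NO_LINK_BEAT 0x0004u

/* Hypothetical model of the port selection with the patch applied. */
int next_port_patched(int if_port, unsigned int csr12)
{
	assert(if_port >= 0);

	if (if_port == 1 || if_port == 2) {
		/* This branch is not touched by the patch. */
		if (csr12 & CSR12_NO_LINK_BEAT)
			return 2 - if_port;
		return 0;
	}
	if (if_port != 0 || (csr12 & CSR12_NO_LINK_BEAT) != 0)
		return 1;	/* not on 10baseT, or 10baseT without link beat */
	return if_port;		/* 10baseT with link beat: leave it alone */
}
```

The only case that changes is if_port == 0 with bit 2 of CSR12 clear, for
example next_port_patched(0, 0x000051c8), which now stays on port 0.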

Any comments?

BTW, is there a version of the new de2104x driver in 2.5.x available for
2.4.20?

FYI, the debug code commented out with `#if defined(way_too_many_messages)'
doesn't work: it crashes the kernel, but I haven't found out exactly why yet.


Gr{oetje,eeting}s,

						Geert


* 21041 transmit timed out
@ 2003-04-20 11:12 Geert Uytterhoeven
  2003-05-11  9:53 ` Geert Uytterhoeven
  0 siblings, 1 reply; 6+ messages in thread
From: Geert Uytterhoeven @ 2003-04-20 11:12 UTC
  To: Jeff Garzik; +Cc: tulip-users, Linux Kernel Development

	Hi Jeff,

Under heavy network activity (e.g. downloading ISOs), my Tulip card (D-Link
DE-530+ with DECchip 21041) still goes down with 2.4.20.

Suddenly I start getting messages of the form:
| NETDEV WATCHDOG: eth0: transmit timed out
| eth0: 21041 transmit timed out, status fc660000, CSR12 000051c8, CSR13 ffffef01, CSR14 ffffffff, resetting...
| eth0: 21143 100baseTx sensed media.

and the network no longer works. Sometimes it automatically recovers after a
while (without printing additional messages), but usually I need to do a manual
ifconfig down/up sequence to revive the network.

The register values may differ. Last time I saw these, with their respective
numbers of occurrences:

 1741 x status fc260010, CSR12 000000c8, CSR13 ffffef09, CSR14 ffffff7f
  581 x status fc260010, CSR12 000002c8, CSR13 ffffef09, CSR14 ffffff7f
   20 x status fc260010, CSR12 000050c8, CSR13 ffffef09, CSR14 fffff7fd
    6 x status fc260010, CSR12 000052c8, CSR13 ffffef09, CSR14 fffff7fd
   28 x status fc660000, CSR12 000051c8, CSR13 ffffef01, CSR14 ffffffff
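
Elsewhere in this thread the test (csr12 & 4) == 0 is used to mean that a
10baseT link beat is present. Assuming that reading of the bit is right, a
tiny helper can be applied to the CSR12 values listed above; the name
has_10baset_link_beat is made up for this sketch.

```c
#include <stdbool.h>

/*
 * Hypothetical helper. Assumption: CSR12 bit 2 is set when the 10baseT
 * link beat is missing, so a clear bit means the link beat is present.
 */
bool has_10baset_link_beat(unsigned int csr12)
{
	return (csr12 & 0x0004u) == 0;
}
```

All five CSR12 values in the list end in c8, so bit 2 is clear and the helper
reports a link beat for each of them; a value ending in c4 would report none.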

Once it printed

| eth0: 21041 media switched to 10baseT.

after which the network seemed to work again for a few minutes.

I also got 27 times

| eth0: No 21041 10baseT link beat, Media switched to 10base2.

which I find, together with the zillions of `21143 100baseTx sensed media'
messages, very strange, since the card is 10 Mbps-only and connected using UTP
to a 10 Mbps-only Ethernet hub (D-Link DE816-TP).

Driver startup:

| Tulip driver version 0.9.15-pre12 (Aug 9, 2002)
| PCI: Enabling device 00:04.0 (0000 -> 0003)
| tulip0: 21041 Media table, default media 0800 (Autosense).
| tulip0:  21041 media #0, 10baseT.
| tulip0:  21041 media #4, 10baseT-FDX.
| tulip0:  21041 media #1, 10base2.
| eth0: Digital DC21041 Tulip rev 33 at 0xc9855000, 21041 mode, 00:80:C8:5A:F8:5B, IRQ 29.

lspci output:

| 00:04.0 Ethernet controller: Digital Equipment Corporation DECchip 21041 [Tulip Pass 3] (rev 21)
| 	Subsystem: D-Link System Inc DE-530+
| 	Flags: bus master, medium devsel, latency 0, IRQ 29
| 	I/O ports at 1080 [size=128]
| 	Memory at c1080000 (32-bit, non-prefetchable) [size=128]
| 	Expansion ROM at c11c0000 [disabled] [size=256K]

The machine is a PPC box (CHRP LongTrail).

Is there _anything_ I can do to help resolve this problem? Which debug options
do I have to enable to give you more information?

Thanks!

Gr{oetje,eeting}s,

						Geert


* 21041 transmit timed out
@ 2002-12-29 14:37 Geert Uytterhoeven
  0 siblings, 0 replies; 6+ messages in thread
From: Geert Uytterhoeven @ 2002-12-29 14:37 UTC
  To: Jeff Garzik; +Cc: tulip-users, Linux Kernel Development


Apparently this problem is still present in 2.4.20. It has been many months
since I last saw it, though.

| NETDEV WATCHDOG: eth0: transmit timed out
| eth0: 21041 transmit timed out, status fc6908c5, CSR12 000051c8, CSR13 ffff0001, CSR14 ffffffff, resetting...
| eth0: 21143 100baseTx sensed media.
        ^^^^^^^^^^^^^^^
ifconfig down/up fixed the problem.

lspci output:

| 00:04.0 Ethernet controller: Digital Equipment Corporation DECchip 21041 [Tulip Pass 3] (rev 21)
| 	Subsystem: D-Link System Inc DE-530+
| 	Flags: bus master, medium devsel, latency 0, IRQ 29
| 	I/O ports at 1080 [size=128]
| 	Memory at c1080000 (32-bit, non-prefetchable) [size=128]
| 	Expansion ROM at c11c0000 [disabled] [size=256K]

Driver startup output:

| Linux Tulip driver version 0.9.15-pre12 (Aug 9, 2002)
| PCI: Enabling device 00:04.0 (0000 -> 0003)
| tulip0: 21041 Media table, default media 0800 (Autosense).
| tulip0:  21041 media #0, 10baseT.
| tulip0:  21041 media #4, 10baseT-FDX.
| tulip0:  21041 media #1, 10base2.
| eth0: Digital DC21041 Tulip rev 33 at 0xc9855000, 21041 mode, 00:80:C8:5A:F8:5B, IRQ 29.

Machine is a PPC box (CHRP LongTrail).

Gr{oetje,eeting}s,

						Geert


end of thread, other threads:[~2003-06-13 10:43 UTC | newest]

Thread overview: 6+ messages
     [not found] <Pine.LNX.4.05.10012201318030.1508-100000@callisto.of.borg>
2001-03-12 13:19 ` 21041 transmit timed out Geert Uytterhoeven
2001-03-12 17:34   ` Geert Uytterhoeven
2002-12-29 14:37 Geert Uytterhoeven
2003-04-20 11:12 Geert Uytterhoeven
2003-05-11  9:53 ` Geert Uytterhoeven
2003-06-13 10:55   ` Geert Uytterhoeven
