linux-kernel.vger.kernel.org archive mirror
* ISSUE: DFE530-TX REV-A3-1 times out on transmit
@ 2001-08-24 14:24 David Schmitt
  2001-08-25 17:05 ` Urban Widmark
  0 siblings, 1 reply; 12+ messages in thread
From: David Schmitt @ 2001-08-24 14:24 UTC (permalink / raw)
  To: linux-kernel


[1.] One line summary of the problem:
	DFE530-TX REV-A3-1 times out on transmit

[2.] Full description of the problem/report:
	After receiving ~50 MB of network traffic the DFE530-TX
	(REV-A3-1) starts emitting 

Aug 24 11:13:57 cheesy kernel: NETDEV WATCHDOG: eth0: transmit timed out
Aug 24 11:13:57 cheesy kernel: eth0: Transmit timed out, status 0000, PHY status 782d, resetting...
Aug 24 11:13:57 cheesy kernel: eth0: reset finished after 5 microseconds.

	After some more traffic the resets stop working and the card
	cannot transmit or receive anymore.

Aug 24 11:15:07 cheesy kernel: NETDEV WATCHDOG: eth0: transmit timed out
Aug 24 11:15:07 cheesy kernel: eth0: Transmit timed out, status 0000, PHY status 782d, resetting...
Aug 24 11:15:07 cheesy kernel: eth0: reset did not complete in 10 ms.
Aug 24 11:15:07 cheesy kernel: eth0: reset finished after 10005 microseconds.
Aug 24 11:15:07 cheesy kernel: eth0: Transmit frame #1 queued in slot 0.
Aug 24 11:15:07 cheesy kernel: eth0: Transmit frame #2 queued in slot 1.
Aug 24 11:15:07 cheesy kernel: eth0: Transmit frame #3 queued in slot 2.
Aug 24 11:15:07 cheesy kernel: eth0: Transmit frame #4 queued in slot 3.
Aug 24 11:15:07 cheesy kernel: eth0: Transmit frame #5 queued in slot 4.
Aug 24 11:15:07 cheesy kernel: eth0: Transmit frame #6 queued in slot 5.
Aug 24 11:15:07 cheesy kernel: eth0: Transmit frame #7 queued in slot 6.
Aug 24 11:15:07 cheesy kernel: eth0: Transmit frame #8 queued in slot 7.
Aug 24 11:15:07 cheesy kernel: eth0: Transmit frame #9 queued in slot 8.
Aug 24 11:15:07 cheesy kernel: eth0: Transmit frame #10 queued in slot 9.
Aug 24 11:15:09 cheesy kernel: eth0: VIA Rhine monitor tick, status 0000.
Aug 24 11:15:11 cheesy kernel: NETDEV WATCHDOG: eth0: transmit timed out
Aug 24 11:15:11 cheesy kernel: eth0: Transmit timed out, status 0000, PHY status 782d, resetting...
Aug 24 11:15:11 cheesy kernel: eth0: reset did not complete in 10 ms.
Aug 24 11:15:11 cheesy kernel: eth0: reset finished after 10005 microseconds.

	Reloading the module doesn't help either. Only a reboot
	reenables network connectivity.

[3.] Keywords (i.e., modules, networking, kernel):
	d-link, dfe530-tx rev-a3-1, networking, transmit
	NETDEV WATCHDOG: transmit timed out
	nic network card pci

[4.] Kernel version (from /proc/version):

Linux version 2.4.9-cheesy-1 (david@cheesy) (gcc version 2.95.4 20010319 (Debian prerelease)) #1 Wed Aug 22 17:21:16 CEST 2001

[6.] A small shell script or example program which triggers the
	problem (if possible)

	Downloading large amounts of data (>50 MB) will eventually trigger
	the problem. Transferring data at less than full speed does
	not trigger it (or perhaps I haven't waited long enough).

[7.] Environment
[7.1.] Software (add the output of the ver_linux script here)

Linux cheesy 2.4.9-cheesy-1 #1 Wed Aug 22 17:21:16 CEST 2001 i686 unknown
Kernel modules         2.4.6
Gnu C                  2.95.4
Binutils               2.11.90.0.27
Linux C Library        2.2.3
Dynamic Linker (ld.so) 2.2.3
Procps                 2.0.7
Mount                  2.11h
Net-tools              1.60
Kbd                    0.2.3
Sh-utils               2.0.11

[7.2.] Processor information (from /proc/cpuinfo):

cheesy:~# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 4
model name      : AMD Athlon(tm) Processor
stepping        : 2
cpu MHz         : 1199.699
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow
bogomips        : 2392.06

cheesy:~#


[7.3.] Module information (from /proc/modules):

cheesy:~# cat /proc/modules
serial                 42816   0 (autoclean)
via-rhine              10704   1
unix                   15008   4 (autoclean)
ide-disk                6912   4 (autoclean)
ide-probe-mod           8592   0 (autoclean)
ide-mod                68432   4 (autoclean) [ide-disk ide-probe-mod]
ext2                   33424   2 (autoclean)
cheesy:~#

[7.4.] SCSI information (from /proc/scsi/scsi):

No SCSI.

[7.5.] Other information that might be relevant to the problem
	(please look in /proc and include all information that you
	think to be relevant):

cheesy:/proc# cat /proc/iomem
00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000f0000-000fffff : System ROM
00100000-1ffebfff : System RAM
  00100000-001b6e77 : Kernel code
  001b6e78-001f3a7f : Kernel data
1ffec000-1ffeefff : ACPI Tables
1ffef000-1fffefff : reserved
1ffff000-1fffffff : ACPI Non-volatile Storage
dd000000-dd0000ff : VIA Technologies, Inc. Ethernet Controller
  dd000000-dd0000ff : via-rhine
dd800000-dfdfffff : PCI Bus #01
  dd800000-dd800fff : ATI Technologies Inc Rage XL AGP
  de000000-deffffff : ATI Technologies Inc Rage XL AGP
dff00000-dfffffff : PCI Bus #01
e0000000-e7ffffff : VIA Technologies, Inc. VT8363/8365 [KT133/KM133]
ffff0000-ffffffff : reserved
cheesy:/proc# cat /proc/ioports
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
02f8-02ff : serial(set)
0376-0376 : ide1
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial(set)
0cf8-0cff : PCI conf1
9400-94ff : VIA Technologies, Inc. Ethernet Controller
  9400-94ff : via-rhine
a000-a003 : VIA Technologies, Inc. AC97 Audio Controller
a400-a403 : VIA Technologies, Inc. AC97 Audio Controller
a800-a8ff : VIA Technologies, Inc. AC97 Audio Controller
b000-b01f : VIA Technologies, Inc. UHCI USB (#2)
b400-b41f : VIA Technologies, Inc. UHCI USB
b800-b80f : VIA Technologies, Inc. Bus Master IDE
  b800-b807 : ide0
  b808-b80f : ide1
d000-dfff : PCI Bus #01
  d800-d8ff : ATI Technologies Inc Rage XL AGP
e200-e27f : VIA Technologies, Inc. VT82C686 [Apollo Super ACPI]
e800-e80f : VIA Technologies, Inc. VT82C686 [Apollo Super ACPI]

[X.] Other notes, patches, fixes, workarounds

Further information from lspci, via-diag and ifconfig output as well
as the complete kernel syslog from boot to network-lock can be
found at http://www.heureka.co.at/~david/dfe530tx/


Short lspci output:

00:00.0 Host bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133] (rev 03)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP]
00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
00:04.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
00:04.2 USB Controller: VIA Technologies, Inc. UHCI USB (rev 16)
00:04.3 USB Controller: VIA Technologies, Inc. UHCI USB (rev 16)
00:04.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
00:04.5 Multimedia audio controller: VIA Technologies, Inc. AC97 Audio Controller (rev 50)
00:0b.0 Ethernet controller: VIA Technologies, Inc. Ethernet Controller (rev 43)
01:00.0 VGA compatible controller: ATI Technologies Inc Rage XL AGP (rev 27)


Thank you for your time and work!

Regards, David Schmitt
-- 
Report, Hardware and Bandwidth provided by Heureka GesmbH, Austria.


* Re: ISSUE: DFE530-TX REV-A3-1 times out on transmit
  2001-08-24 14:24 ISSUE: DFE530-TX REV-A3-1 times out on transmit David Schmitt
@ 2001-08-25 17:05 ` Urban Widmark
  2001-08-27  8:27   ` David Schmitt
  0 siblings, 1 reply; 12+ messages in thread
From: Urban Widmark @ 2001-08-25 17:05 UTC (permalink / raw)
  To: David Schmitt; +Cc: linux-kernel

On Fri, 24 Aug 2001, David Schmitt wrote:

> Aug 24 11:15:07 cheesy kernel: NETDEV WATCHDOG: eth0: transmit timed out
> Aug 24 11:15:07 cheesy kernel: eth0: Transmit timed out, status 0000, PHY status 782d, resetting...
> Aug 24 11:15:07 cheesy kernel: eth0: reset did not complete in 10 ms.
> Aug 24 11:15:07 cheesy kernel: eth0: reset finished after 10005 microseconds.
> Aug 24 11:15:07 cheesy kernel: eth0: Transmit frame #1 queued in slot 0.
[snip]
> Aug 24 11:15:07 cheesy kernel: eth0: Transmit frame #10 queued in slot 9.
> Aug 24 11:15:09 cheesy kernel: eth0: VIA Rhine monitor tick, status 0000.
> Aug 24 11:15:11 cheesy kernel: NETDEV WATCHDOG: eth0: transmit timed out
> Aug 24 11:15:11 cheesy kernel: eth0: Transmit timed out, status 0000, PHY status 782d, resetting...
> Aug 24 11:15:11 cheesy kernel: eth0: reset did not complete in 10 ms.
> Aug 24 11:15:11 cheesy kernel: eth0: reset finished after 10005 microseconds.
> 
> 	Reloading the module doesn't help either. Only a reboot
> 	reenables network connectivity.

There is a patch in the 2.4.8-acX kernels that fixes a problem with
resetting the card when it is first used. I can't say that I know that it
fixes anything you are seeing, but it could be worth trying.

Did this start with recent versions, or have you never run older kernels
on this hw?

Reloading the module is to the hardware about the same as the watchdog
reset.

Rebooting obviously triggers something else too ... perhaps the BIOS talks
some sense to the card.

> [6.] A small shell script or example program which triggers the
> 	problem (if possible)
> 
> 	Downloading amounts of data (>50MB) will eventually trigger
> 	the problem. Transmitting data at less than full speed will
> 	not trigger it (or at least I haven't waited long enough?)

What do you use to download? Is it from a server on the LAN or something
remote? And how do you slow down the speed of your transmission? How fast
is it when it is fast, and how much do you slow it down?

My other machine does not have anything useful installed, but it did have
chargen and discard open.

nc other.machine chargen > /dev/null
	iptraf says about 64Mbps
nc other.machine discard < /dev/zero
	iptraf says about 44Mbps

Sending about 1.5G in both directions, without problems. I used to have a
netperf setup and that would (more or less) fill the 100Mbps.


> [X.] Other notes, patches, fixes, workarounds
> 
> Further information from lspci, via-diag and ifconfig output as well
> as the complete kernel syslog from boot to network-lock can be
> found on http://www.heureka.co.at/~david/dfe530tx/

The syslog gives a few hints that something is wrong ...

eth0: Transmit error, Tx status 00008100.
	8 - transmit error
	1 - transmit aborted after excessive collisions

but at the same time the 00 part means that the "collision retry count" is
0 and that it hasn't set a flag that it "experienced collisions in this
transmit event".
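
The reading of that status word can be written down as a small decoder.
This is only a sketch: the bit masks are taken from the description in
this message (the "8", the "1" and the "00" part), not checked against
the datasheet, and the function and struct names are made up for
illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Bit masks as read off the message above (assumptions, not datasheet values). */
#define TX_STATUS_ERROR    0x8000u  /* the "8": transmit error */
#define TX_STATUS_ABORTED  0x0100u  /* the "1": aborted after excessive collisions */
#define TX_STATUS_COLLBITS 0x00ffu  /* the "00": retry count and collision flag */

struct tx_status_info {
    int error;                /* nonzero if the error bit is set */
    int aborted;              /* nonzero if the abort bit is set */
    unsigned collision_bits;  /* raw low byte, not split further here */
};

/* Split a Tx status word into the three parts discussed above. */
struct tx_status_info decode_tx_status(uint32_t status)
{
    struct tx_status_info info;

    info.error = (status & TX_STATUS_ERROR) != 0;
    info.aborted = (status & TX_STATUS_ABORTED) != 0;
    info.collision_bits = status & TX_STATUS_COLLBITS;
    return info;
}

/* An abort for "excessive collisions" with no collision recorded at all
 * is the contradiction pointed out above. */
int tx_status_is_contradictory(uint32_t status)
{
    struct tx_status_info info = decode_tx_status(status);

    return info.aborted && info.collision_bits == 0;
}

/* Print one line per status word, flagging the contradictory case. */
void print_tx_status(uint32_t status)
{
    struct tx_status_info info = decode_tx_status(status);

    printf("status %08lx: error=%d aborted=%d collision bits=%02x%s\n",
           (unsigned long)status, info.error, info.aborted,
           info.collision_bits,
           tx_status_is_contradictory(status) ? " (contradictory)" : "");
}
```

Calling print_tx_status(0x00008100) flags the word from the syslog as
contradictory: error and abort bits set, collision bits all zero.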

I think there were 3 of these, and from all but the last it recovers by
itself. Perhaps the collisions (or whatever it is that the card sees as
collisions) continued for a longer period.

It ends up in "eth0: transmit timed out" and the driver tries to reset the
card. That does not appear to work at all.
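
The "10005 microseconds" in the log is consistent with a bounded polling
loop that simply ran out of budget. The sketch below is a userspace
simulation, not driver code. It assumes the 2.4.9 loop has the same shape
as wait_for_reset() in the backport patch later in this thread (poll
every 5 us, give up after 2000 polls); the pretend chip and the function
names are hypothetical.

```c
#include <stdio.h>

#define POLL_INTERVAL_US 5     /* delay between reads of the reset bit */
#define MAX_POLLS        2000  /* 2000 polls * 5 us = 10 ms budget */

/*
 * Simulate the polling loop. The pretend chip clears its reset bit on
 * read number clears_on_read; a value <= 0 means it never clears.
 * Returns the elapsed time the loop would report, in microseconds.
 */
int simulated_reset_wait_us(int clears_on_read)
{
    int polls = 0;
    int still_in_reset = 1;

    do {
        /* the real loop would udelay(POLL_INTERVAL_US) here */
        polls++;
        if (polls > MAX_POLLS)
            break;  /* budget exhausted: give up without another read */
        still_in_reset = (clears_on_read <= 0) || (polls < clears_on_read);
    } while (still_in_reset);

    return POLL_INTERVAL_US * polls;
}

/* The loop overran its budget if it reports more than 10 ms. */
int reset_timed_out(int elapsed_us)
{
    return elapsed_us > POLL_INTERVAL_US * MAX_POLLS;
}

/* Print messages in the style of the syslog lines quoted above. */
void report_reset(const char *name, int clears_on_read)
{
    int elapsed_us = simulated_reset_wait_us(clears_on_read);

    if (reset_timed_out(elapsed_us))
        printf("%s: reset did not complete in 10 ms.\n", name);
    printf("%s: reset finished after %d microseconds.\n", name, elapsed_us);
}
```

A chip that clears the bit on the first read gives 5 microseconds, as in
the early, successful resets. A chip that never clears it leaves the loop
on poll 2001, which is reported as 5 * 2001 = 10005 microseconds. So
"finished after 10005 microseconds" here means the wait was abandoned,
not that the reset completed.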


It's a nice report, I wish I had something more useful to reply with.

The driver source has links to some datasheets. They might be useful in
improving the reset code.
(Hmm, the tx_timeout code does: reset -> initialise ring -> wait for hw
 but initialise ring talks to the hw, perhaps it should wait for hw first
 ...)
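
To make the ordering concern concrete, here is a toy model. It assumes,
purely for illustration, a chip that drops register writes while its
reset bit is still set. Whether the Rhine actually behaves like that is
not established anywhere in this thread, and every name below is
hypothetical.

```c
#include <stdint.h>

/* Toy chip: one reset flag and one register standing in for a ring pointer. */
struct mock_chip {
    int in_reset;
    uint32_t tx_ring_ptr;
};

/* Start a reset: the register is cleared and the chip becomes busy. */
void mock_start_reset(struct mock_chip *chip)
{
    chip->in_reset = 1;
    chip->tx_ring_ptr = 0;
}

/* Stand-in for polling until the reset bit clears. */
void mock_wait_for_reset(struct mock_chip *chip)
{
    chip->in_reset = 0;
}

/* Assumed behaviour: writes made during reset are silently dropped. */
void mock_write_ring_ptr(struct mock_chip *chip, uint32_t addr)
{
    if (!chip->in_reset)
        chip->tx_ring_ptr = addr;
}

/* Order described above: reset -> initialise ring -> wait for hw. */
uint32_t timeout_init_before_wait(uint32_t ring_addr)
{
    struct mock_chip chip = { 0, 0 };

    mock_start_reset(&chip);
    mock_write_ring_ptr(&chip, ring_addr);  /* lost: chip still in reset */
    mock_wait_for_reset(&chip);
    return chip.tx_ring_ptr;
}

/* Suggested order: reset -> wait for hw -> initialise ring. */
uint32_t timeout_wait_before_init(uint32_t ring_addr)
{
    struct mock_chip chip = { 0, 0 };

    mock_start_reset(&chip);
    mock_wait_for_reset(&chip);
    mock_write_ring_ptr(&chip, ring_addr);  /* accepted: reset is over */
    return chip.tx_ring_ptr;
}
```

Under that assumption the first order leaves the ring pointer at 0 and
the second leaves it at ring_addr. The model says nothing about a chip
whose reset never completes, which is what the log actually shows.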

/Urban



* Re: ISSUE: DFE530-TX REV-A3-1 times out on transmit
  2001-08-25 17:05 ` Urban Widmark
@ 2001-08-27  8:27   ` David Schmitt
  2001-08-27  9:13     ` David Schmitt
                       ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: David Schmitt @ 2001-08-27  8:27 UTC (permalink / raw)
  To: linux-kernel

On Sat, Aug 25, 2001 at 07:05:26PM +0200, Urban Widmark wrote:
> On Fri, 24 Aug 2001, David Schmitt wrote:
> 
> > Aug 24 11:15:07 cheesy kernel: NETDEV WATCHDOG: eth0: transmit timed out
> > Aug 24 11:15:07 cheesy kernel: eth0: Transmit timed out, status 0000, PHY status 782d, resetting...
> > Aug 24 11:15:07 cheesy kernel: eth0: reset did not complete in 10 ms.
> > Aug 24 11:15:07 cheesy kernel: eth0: reset finished after 10005 microseconds.
> > Aug 24 11:15:07 cheesy kernel: eth0: Transmit frame #1 queued in slot 0.
> [snip]
> > Aug 24 11:15:07 cheesy kernel: eth0: Transmit frame #10 queued in slot 9.
> > Aug 24 11:15:09 cheesy kernel: eth0: VIA Rhine monitor tick, status 0000.
> > Aug 24 11:15:11 cheesy kernel: NETDEV WATCHDOG: eth0: transmit timed out
> > Aug 24 11:15:11 cheesy kernel: eth0: Transmit timed out, status 0000, PHY status 782d, resetting...
> > Aug 24 11:15:11 cheesy kernel: eth0: reset did not complete in 10 ms.
> > Aug 24 11:15:11 cheesy kernel: eth0: reset finished after 10005 microseconds.
> > 
> > 	Reloading the module doesn't help either. Only a reboot
> > 	reenables network connectivity.
> 
> There is a patch in the 2.4.8-acX kernels that fixes a problem with
> resetting the card when it is first used. I can't say that I know that it
> fixes anything you are seeing, but it could be worth trying.

Ok, I will try that too and report back.

> Did this start with recent versions, or have you never run older kernels
> on this hw?

I tried it now with 2.2.19 and killed it too. See below for details.

> Reloading the module is to the hardware about the same as the watchdog
> reset.

Good news: Under 2.2.19, reloading the module indeed reset the card,
so that it worked again. 

I will upload debug output from 2.2.19 too
(http://www.heureka.co.at/~david/dfe530tx/)

> Rebooting obviously triggers something else too ... perhaps the BIOS talks
> some sense to the card.

As mentioned above, it seems like the 2.2.19 version does the Right
Thing (but doesn't recover automatically).

> > [6.] A small shell script or example program which triggers the
> > 	problem (if possible)
> > 
> > 	Downloading amounts of data (>50MB) will eventually trigger
> > 	the problem. Transmitting data at less than full speed will
> > 	not trigger it (or at least I haven't waited long enough?)
> 
> What do you use to download? from a server on the LAN or something remote?
> and how do you slow down the speed of your transmission? How fast is it
> when it is fast, and how much do you slow it down?

Ok, I could reproduce it kinda more systematically:

# ssh other.machine cat /dev/zero

This generates about 2 Mbit/s of incoming traffic and doesn't trigger the
problem.

But running one or two parallel ping -f other.machine locks up the NIC for
good.


> > [X.] Other notes, patches, fixes, workarounds
> > 
> > Further information from lspci, via-diag and ifconfig output as well
> > as the complete kernel syslog from boot to network-lock can be
> > found on http://www.heureka.co.at/~david/dfe530tx/
> 
> The syslog gives a few hints that something is wrong ...
> 
> eth0: Transmit error, Tx status 00008100.
> 	8 - transmit error
> 	1 - transmit aborted after excessive collisions
> 
> but at the same time the 00 part means that the "collision retry count" is
> 0 and that it hasn't set a flag that it "experienced collisions in this
> transmit event".

The network to which the DFE530TX (and other.machine) are attached
contains some 20-30 Windows PCs and some Novell servers, which all seem
quite broadcast-happy. The network itself is (mostly) unswitched and
10 Mbit half-duplex, so I guess this really is connected to the
collisions.
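
For background on why a busy half-duplex segment can end in "aborted
after excessive collisions": standard Ethernet CSMA/CD gives up on a
frame after 16 attempts, backing off after each collision by a random
number of slot times drawn from a range that doubles up to the 10th
collision. The sketch below computes the worst-case backoff for a
10 Mbit/s segment (slot time 512 bit times = 51.2 us). It illustrates
the standard mechanism only. The function names are made up, and since
the card reported a retry count of 0, it is not clear that this
mechanism is what really fired here.

```c
#include <assert.h>
#include <stdint.h>

#define ATTEMPT_LIMIT 16     /* attempts before the frame is dropped */
#define BACKOFF_LIMIT 10     /* the backoff range stops doubling here */
#define SLOT_TIME_NS  51200  /* 512 bit times at 10 Mbit/s */

/* Largest backoff, in slot times, after the given collision (1..15). */
uint32_t max_backoff_slots(int collision)
{
    int k;

    assert(collision >= 1 && collision < ATTEMPT_LIMIT);
    k = collision < BACKOFF_LIMIT ? collision : BACKOFF_LIMIT;
    return ((uint32_t)1 << k) - 1;
}

/* Sum of the largest backoffs over all 15 retries of one frame. */
uint32_t worst_case_backoff_slots(void)
{
    uint32_t total = 0;
    int n;

    for (n = 1; n < ATTEMPT_LIMIT; n++)
        total += max_backoff_slots(n);
    return total;
}

/* The same worst case expressed in nanoseconds. */
uint64_t worst_case_backoff_ns(void)
{
    return (uint64_t)worst_case_backoff_slots() * SLOT_TIME_NS;
}
```

The worst case comes to 7151 slot times, roughly 366 ms of backoff
before a single frame is abandoned. That is well below a watchdog
interval measured in seconds, so one aborted frame alone should not
cause a transmit timeout.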


> I think there were 3 of these, and from all but the last it recovers by
> itself. Perhaps the collisions (or whatever it is that the card sees as
> collisions) continued for a longer period.




> It ends up in "eth0: transmit timed out" and the driver tries to reset the
> card. That does not appear to work at all.

Under 2.2.19 the card doesn't recover automatically (lacking the
watchdog) but manually reloading the module works.

> It's a nice report, I wish I had something more useful to reply with.

Well, you pointed me in the right direction :-)

> The driver source has links to some datasheets. They might be useful in
> improving the reset code.
> (Hmm, the tx_timeout code does: reset -> initialise ring -> wait for hw
>  but initialise ring talks to the hw, perhaps it should wait for hw first
>  ...)

I'm not really into hacking C, but I will try to provide as much info
as possible.


Regards, David
-- 
Sponsored by heureKA, Austria (http://www.heureka.co.at)


* Re: ISSUE: DFE530-TX REV-A3-1 times out on transmit
  2001-08-27  8:27   ` David Schmitt
@ 2001-08-27  9:13     ` David Schmitt
  2001-08-27 19:02     ` Urban Widmark
  2001-08-28  7:26     ` Dennis Bjorklund
  2 siblings, 0 replies; 12+ messages in thread
From: David Schmitt @ 2001-08-27  9:13 UTC (permalink / raw)
  To: linux-kernel

Hi!

Sorry for replying to my own message.

On Mon, Aug 27, 2001 at 10:27:40AM +0200, David Schmitt wrote:
> On Sat, Aug 25, 2001 at 07:05:26PM +0200, Urban Widmark wrote:
> > On Fri, 24 Aug 2001, David Schmitt wrote:
> > > 	Reloading the module doesn't help either. Only a reboot
> > > 	reenables network connectivity.
> > 
> > There is a patch in the 2.4.8-acX kernels that fixes a problem with
> > resetting the card when it is first used. I can't say that I know that it
> > fixes anything you are seeing, but it could be worth trying.
> 
> Ok, I will try that too and report back.

Nope. Using the patched via-rhine.c from 2.4.8-ac12 didn't help.


Regards, David Schmitt
-- 
Sponsored by heureKA, Austria (http://www.heureka.co.at)


* Re: ISSUE: DFE530-TX REV-A3-1 times out on transmit
  2001-08-27  8:27   ` David Schmitt
  2001-08-27  9:13     ` David Schmitt
@ 2001-08-27 19:02     ` Urban Widmark
  2001-08-28 13:45       ` David Schmitt
  2001-08-28  7:26     ` Dennis Bjorklund
  2 siblings, 1 reply; 12+ messages in thread
From: Urban Widmark @ 2001-08-27 19:02 UTC (permalink / raw)
  To: David Schmitt; +Cc: linux-kernel

On Mon, 27 Aug 2001, David Schmitt wrote:

> > Reloading the module is to the hardware about the same as the watchdog
> > reset.
> 
> Good news: Under 2.2.19, reloading the module indeed reset the card,
> so that it worked again. 

Interesting ...

> > Rebooting obviously triggers something else too ... perhaps the BIOS talks
> > some sense to the card.
> 
> As mentioned above, it seems like the 2.2.19 version does the Right
> Thing (but doesn't recover automatically).

The 2.2.19 version doesn't do anything on timeout (except print a message
that it is resetting, which it isn't :). The driver has seen a few changes
during the 2.4 series:

2.4.3 was patched to actually reset things on tx_timeout, but that also
changed the startup sequence.

2.4.6 got changes to reload certain things from eeprom when the driver is
loaded (fix a problem with booting from win98 that does a power down).

2.4.7 changes to the transmit code to use "singlecopy" for unaligned
buffers.

2.4.8-acX fixed a bug in the startup code from 2.4.3

Testing the 2.4.2 and 2.4.3 drivers could give something (should work to
simply copy the drivers/net/via-rhine.c from the different versions into a
2.4.9 source tree).


> but doing one or two parallel ping -f other.machine locks the NIC for
> good.

Good (that you have a reliable way to trigger this). For about how long do
you need to run this?

> The network where the DFE530TX (and the other.machine) are attached
> contains some 20-30 Windows PCs and some Novell Servers which all seem
> quite broadcast-happy. The network itself is (mostly) unswitched and
> 10Mbit halfduplex, so I guess this really is connected to the
> collisions.

Depending on the sort of access you have, you could test unplugging
everyone else and repeat the 'ping -f' test.

I don't have the hardware to test now, but when time permits ...

/Urban



* Re: ISSUE: DFE530-TX REV-A3-1 times out on transmit
  2001-08-27  8:27   ` David Schmitt
  2001-08-27  9:13     ` David Schmitt
  2001-08-27 19:02     ` Urban Widmark
@ 2001-08-28  7:26     ` Dennis Bjorklund
  2 siblings, 0 replies; 12+ messages in thread
From: Dennis Bjorklund @ 2001-08-28  7:26 UTC (permalink / raw)
  To: David Schmitt; +Cc: linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 522 bytes --]

On Mon, 27 Aug 2001, David Schmitt wrote:

> As mentioned above, it seems like the 2.2.19 version does the Right
> Thing (but doesn't recover automatically).

I backported the recovery code from 2.4.x to 2.2.20-pre9 and it works nicely
for me. I've been running it for a couple of weeks without problems, and
where it previously locked up it now resets (a clear improvement for me).

I sent it to Alan in the hope that it could make it into 2.2.20, but I got no
reply. I don't know whether I should keep sending it to him or what.

-- 
/Dennis

[-- Attachment #2: Type: TEXT/PLAIN, Size: 12736 bytes --]

--- linux/drivers/net/via-rhine.c	Sat Aug 18 14:10:07 2001
+++ linux-2.2.19/drivers/net/via-rhine.c	Sat Aug 18 14:23:38 2001
@@ -25,11 +25,15 @@
 
 	LK1.0.0:
 	- Urban Widmark: merges from Beckers 1.08b version and 2.4.0 (VT6102)
+
+	LK1.0.1
+	- Dennis Björklund: backport tx_timeout from 2.4.x to reset on timeout
+                        instead of stop working...
 */
 
 /* These identify the driver base version and may not be removed. */
 static const char version1[] =
-"via-rhine.c:v1.08b-LK1.0.0 12/14/2000  Written by Donald Becker\n";
+"via-rhine.c:v1.08b-LK1.0.1 12/14/2000  Written by Donald Becker\n";
 static const char version2[] =
 "  http://www.scyld.com/network/via-rhine.html\n";
 
@@ -95,9 +99,11 @@
 #include <linux/etherdevice.h>
 #include <linux/skbuff.h>
 #include <linux/init.h>
+#include <linux/delay.h>
 #include <asm/processor.h>		/* Processor type for cache alignment. */
 #include <asm/bitops.h>
 #include <asm/io.h>
+#include <asm/irq.h>
 
 /* Condensed bus+endian portability operations. */
 #define virt_to_le32desc(addr) cpu_to_le32(virt_to_bus(addr))
@@ -256,6 +262,13 @@
 								 struct device *dev, long ioaddr, int irq,
 								 int chp_idx, int fnd_cnt);
 
+enum via_rhine_chips {
+	VT86C100A = 0,
+	VT6102,
+	VT3043,
+};
+
+/* directly indexed by enum via_rhine_chips, above */
 static struct pci_id_info pci_tbl[] __initdata = {
 	{ "VIA VT86C100A Rhine-II", 0x1106, 0x6100, 0xffff,
 	  RHINE_IOTYPE, 128, via_probe1},
@@ -379,7 +392,6 @@
 static void check_duplex(struct device *dev);
 static void netdev_timer(unsigned long data);
 static void tx_timeout(struct device *dev);
-static void init_ring(struct device *dev);
 static int  start_tx(struct sk_buff *skb, struct device *dev);
 static void intr_handler(int irq, void *dev_instance, struct pt_regs *regs);
 static int  netdev_rx(struct device *dev);
@@ -395,6 +407,31 @@
 /* A list of our installed devices, for removing the driver module. */
 static struct device *root_net_dev = NULL;
 
+static void wait_for_reset(struct device *dev, char *name)
+{
+	struct netdev_private *np = dev->priv;
+	long ioaddr = dev->base_addr;
+	int chip_id = np->chip_id;
+	int i;
+
+	/* 3043 may need long delay after reset (dlink) */
+	if (chip_id == VT3043 || chip_id == VT86C100A)
+		udelay(100);
+
+	i = 0;
+	do {
+		udelay(5);
+		i++;
+		if(i > 2000) {
+			printk(KERN_ERR "%s: reset did not complete in 10 ms.\n", name);
+			break;
+		}
+	} while(readw(ioaddr + ChipCmd) & CmdReset);
+	if (debug > 1)
+		printk(KERN_INFO "%s: reset finished after %d microseconds.\n",
+			   name, 5*i);
+}
+
 /* Ideally we would detect all network cards in slot order.  That would
    be best done a central PCI probe dispatch, which wouldn't work
    well when dynamically adding drivers.  So instead we detect just the
@@ -594,6 +631,141 @@
 	return dev;
 }
 
+static void alloc_rbufs(struct device *dev)
+{
+	struct netdev_private *np = (struct netdev_private *)dev->priv;
+	int i;
+
+	np->cur_rx = 0;
+	np->dirty_rx = 0;
+
+	np->rx_buf_sz = (dev->mtu <= 1500 ? PKT_BUF_SZ : dev->mtu + 32);
+	np->rx_head_desc = &np->rx_ring[0];
+
+	for (i = 0; i < RX_RING_SIZE; i++) {
+		np->rx_ring[i].rx_status = 0;
+		np->rx_ring[i].desc_length = cpu_to_le32(np->rx_buf_sz);
+		np->rx_ring[i].next_desc = virt_to_le32desc(&np->rx_ring[i+1]);
+		np->rx_skbuff[i] = 0;
+	}
+	/* Mark the last entry as wrapping the ring. */
+	np->rx_ring[i-1].next_desc = virt_to_le32desc(&np->rx_ring[0]);
+
+	/* Fill in the Rx buffers.  Handle allocation failure gracefully. */
+	for (i = 0; i < RX_RING_SIZE; i++) {
+		struct sk_buff *skb = dev_alloc_skb(np->rx_buf_sz);
+		np->rx_skbuff[i] = skb;
+		if (skb == NULL)
+			break;
+		skb->dev = dev;			/* Mark as being used by this device. */
+		np->rx_ring[i].addr = virt_to_le32desc(skb->tail);
+		np->rx_ring[i].rx_status = cpu_to_le32(DescOwn);
+	}
+	np->dirty_rx = (unsigned int)(i - RX_RING_SIZE);
+}
+
+static void free_rbufs(struct device* dev)
+{
+	struct netdev_private *np = (struct netdev_private *)dev->priv;
+	int i;
+
+	/* Free all the skbuffs in the Rx queue. */
+	for (i = 0; i < RX_RING_SIZE; i++) {
+		np->rx_ring[i].rx_status = 0;
+		np->rx_ring[i].addr = 0xBADF00D0; /* An invalid address. */
+		if (np->rx_skbuff[i]) {
+#if LINUX_VERSION_CODE < 0x20100
+			np->rx_skbuff[i]->free = 1;
+#endif
+			dev_kfree_skb(np->rx_skbuff[i]);
+		}
+		np->rx_skbuff[i] = 0;
+	}
+}
+
+static void alloc_tbufs(struct device* dev)
+{
+	struct netdev_private *np = (struct netdev_private *)dev->priv;
+	int i;
+
+	np->tx_full = 0;
+	np->cur_tx = 0;
+	np->dirty_tx = 0;
+
+	for (i = 0; i < TX_RING_SIZE; i++) {
+		np->tx_skbuff[i] = 0;
+		np->tx_ring[i].tx_status = 0;
+		np->tx_ring[i].desc_length = cpu_to_le32(0x00e08000);
+		np->tx_ring[i].next_desc = virt_to_le32desc(&np->tx_ring[i+1]);
+		np->tx_buf[i] = kmalloc(PKT_BUF_SZ, GFP_KERNEL);
+	}
+	np->tx_ring[i-1].next_desc = virt_to_le32desc(&np->tx_ring[0]);
+}
+
+static void free_tbufs(struct device *dev)
+{
+	struct netdev_private *np = (struct netdev_private *)dev->priv;
+	int i;
+
+	for (i = 0; i < TX_RING_SIZE; i++) {
+		if (np->tx_skbuff[i])
+			dev_kfree_skb(np->tx_skbuff[i]);
+		np->tx_skbuff[i] = 0;
+		if (np->tx_buf[i]) {
+			kfree(np->tx_buf[i]);
+			np->tx_buf[i] = 0;
+		}
+	}
+}
+
+static void init_registers(struct device *dev)
+{
+	struct netdev_private *np = (struct netdev_private *)dev->priv;
+	long ioaddr = dev->base_addr;
+	int i;
+
+	for (i = 0; i < 6; i++)
+		writeb(dev->dev_addr[i], ioaddr + StationAddr + i);
+
+	/* Initialize other registers. */
+	writew(0x0006, ioaddr + PCIBusConfig);	/* Tune configuration??? */
+	/* Configure the FIFO thresholds. */
+	writeb(0x20, ioaddr + TxConfig);	/* Initial threshold 32 bytes */
+	np->tx_thresh = 0x20;
+	np->rx_thresh = 0x60;				/* Written in set_rx_mode(). */
+
+	if (dev->if_port == 0)
+		dev->if_port = np->default_port;
+
+	dev->tbusy = 0;
+	dev->interrupt = 0;
+
+	writel(virt_to_bus(np->rx_ring), ioaddr + RxRingPtr);
+	writel(virt_to_bus(np->tx_ring), ioaddr + TxRingPtr);
+
+	set_rx_mode(dev);
+
+	dev->start = 1;
+
+	/* Enable interrupts by setting the interrupt mask. */
+	writew(IntrRxDone | IntrRxErr | IntrRxEmpty| IntrRxOverflow| IntrRxDropped|
+		   IntrTxDone | IntrTxAbort | IntrTxUnderrun |
+		   IntrPCIErr | IntrStatsMax | IntrLinkChange | IntrMIIChange,
+		   ioaddr + IntrEnable);
+
+	np->chip_cmd = CmdStart|CmdTxOn|CmdRxOn|CmdNoTxPoll;
+	if (np->duplex_lock)
+		np->chip_cmd |= CmdFDuplex;
+	writew(np->chip_cmd, ioaddr + ChipCmd);
+
+	check_duplex(dev);
+	/* The LED outputs of various MII xcvrs should be configured.  */
+	/* For NS or Mison phys, turn on bit 1 in register 0x17 */
+	/* For ESI phys, turn on bit 7 in register 0x17. */
+	mdio_write(dev, np->phys[0], 0x17, mdio_read(dev, np->phys[0], 0x17) |
+			   (np->drv_flags & HasESIPhy) ? 0x0080 : 0x0001);
+}
+
 \f
 /* Read and write over the MII Management Data I/O (MDIO) interface. */
 
@@ -650,7 +822,6 @@
 {
 	struct netdev_private *np = (struct netdev_private *)dev->priv;
 	long ioaddr = dev->base_addr;
-	int i;
 
 	/* Reset the chip. */
 	writew(CmdReset, ioaddr + ChipCmd);
@@ -666,48 +837,10 @@
 		printk(KERN_DEBUG "%s: netdev_open() irq %d.\n",
 			   dev->name, dev->irq);
 
-	init_ring(dev);
-
-	writel(virt_to_bus(np->rx_ring), ioaddr + RxRingPtr);
-	writel(virt_to_bus(np->tx_ring), ioaddr + TxRingPtr);
-
-	for (i = 0; i < 6; i++)
-		writeb(dev->dev_addr[i], ioaddr + StationAddr + i);
-
-	/* Initialize other registers. */
-	writew(0x0006, ioaddr + PCIBusConfig);	/* Tune configuration??? */
-	/* Configure the FIFO thresholds. */
-	writeb(0x20, ioaddr + TxConfig);	/* Initial threshold 32 bytes */
-	np->tx_thresh = 0x20;
-	np->rx_thresh = 0x60;				/* Written in set_rx_mode(). */
-
-	if (dev->if_port == 0)
-		dev->if_port = np->default_port;
-
-	dev->tbusy = 0;
-	dev->interrupt = 0;
-
-	set_rx_mode(dev);
-
-	dev->start = 1;
+	alloc_rbufs(dev);
+	alloc_tbufs(dev);
 
-	/* Enable interrupts by setting the interrupt mask. */
-	writew(IntrRxDone | IntrRxErr | IntrRxEmpty| IntrRxOverflow| IntrRxDropped|
-		   IntrTxDone | IntrTxAbort | IntrTxUnderrun |
-		   IntrPCIErr | IntrStatsMax | IntrLinkChange | IntrMIIChange,
-		   ioaddr + IntrEnable);
-
-	np->chip_cmd = CmdStart|CmdTxOn|CmdRxOn|CmdNoTxPoll;
-	if (np->duplex_lock)
-		np->chip_cmd |= CmdFDuplex;
-	writew(np->chip_cmd, ioaddr + ChipCmd);
-
-	check_duplex(dev);
-	/* The LED outputs of various MII xcvrs should be configured.  */
-	/* For NS or Mison phys, turn on bit 1 in register 0x17 */
-	/* For ESI phys, turn on bit 7 in register 0x17. */
-	mdio_write(dev, np->phys[0], 0x17, mdio_read(dev, np->phys[0], 0x17) |
-			   (np->drv_flags & HasESIPhy) ? 0x0080 : 0x0001);
+	init_registers(dev);
 
 	if (debug > 2)
 		printk(KERN_DEBUG "%s: Done netdev_open(), status %4.4x "
@@ -775,6 +908,7 @@
 static void tx_timeout(struct device *dev)
 {
 	struct netdev_private *np = (struct netdev_private *)dev->priv;
+	struct pci_dev *pdev = pci_find_slot(np->pci_bus, np->pci_devfn);
 	long ioaddr = dev->base_addr;
 
 	printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
@@ -782,60 +916,32 @@
 		   dev->name, readw(ioaddr + IntrStatus),
 		   mdio_read(dev, np->phys[0], 1));
 
-	/* Perhaps we should reinitialize the hardware here. */
 	dev->if_port = 0;
-	/* Stop and restart the chip's Tx processes . */
 
-	/* Trigger an immediate transmit demand. */
-
-	dev->trans_start = jiffies;
-	np->stats.tx_errors++;
-	return;
-}
+	/* protect against concurrent rx interrupts */
+	disable_irq(pdev->irq);
 
+	/* Reset the chip. */
+	writew(CmdReset, ioaddr + ChipCmd);
 
-/* Initialize the Rx and Tx rings, along with various 'dev' bits. */
-static void init_ring(struct device *dev)
-{
-	struct netdev_private *np = (struct netdev_private *)dev->priv;
-	int i;
-
-	np->tx_full = 0;
-	np->cur_rx = np->cur_tx = 0;
-	np->dirty_rx = np->dirty_tx = 0;
+	/* clear all descriptors */
+	free_tbufs(dev);
+	free_rbufs(dev);
+	alloc_tbufs(dev);
+	alloc_rbufs(dev);
+
+	/* Reinitialize the hardware. */
+	wait_for_reset(dev, dev->name);
+	init_registers(dev);
 
-	np->rx_buf_sz = (dev->mtu <= 1500 ? PKT_BUF_SZ : dev->mtu + 32);
-	np->rx_head_desc = &np->rx_ring[0];
+	enable_irq(pdev->irq);
 
-	for (i = 0; i < RX_RING_SIZE; i++) {
-		np->rx_ring[i].rx_status = 0;
-		np->rx_ring[i].desc_length = cpu_to_le32(np->rx_buf_sz);
-		np->rx_ring[i].next_desc = virt_to_le32desc(&np->rx_ring[i+1]);
-		np->rx_skbuff[i] = 0;
-	}
-	/* Mark the last entry as wrapping the ring. */
-	np->rx_ring[i-1].next_desc = virt_to_le32desc(&np->rx_ring[0]);
+	dev->trans_start = jiffies;
+	np->stats.tx_errors++;
 
-	/* Fill in the Rx buffers.  Handle allocation failure gracefully. */
-	for (i = 0; i < RX_RING_SIZE; i++) {
-		struct sk_buff *skb = dev_alloc_skb(np->rx_buf_sz);
-		np->rx_skbuff[i] = skb;
-		if (skb == NULL)
-			break;
-		skb->dev = dev;			/* Mark as being used by this device. */
-		np->rx_ring[i].addr = virt_to_le32desc(skb->tail);
-		np->rx_ring[i].rx_status = cpu_to_le32(DescOwn);
-	}
-	np->dirty_rx = (unsigned int)(i - RX_RING_SIZE);
-
-	for (i = 0; i < TX_RING_SIZE; i++) {
-		np->tx_skbuff[i] = 0;
-		np->tx_ring[i].tx_status = 0;
-		np->tx_ring[i].desc_length = cpu_to_le32(0x00e08000);
-		np->tx_ring[i].next_desc = virt_to_le32desc(&np->tx_ring[i+1]);
-		np->tx_buf[i] = kmalloc(PKT_BUF_SZ, GFP_KERNEL);
-	}
-	np->tx_ring[i-1].next_desc = virt_to_le32desc(&np->tx_ring[0]);
+	/* wake queue */
+	clear_bit(0, (void*)&dev->tbusy);
+	mark_bh(NET_BH);
 
 	return;
 }
@@ -1233,7 +1339,6 @@
 {
 	long ioaddr = dev->base_addr;
 	struct netdev_private *np = (struct netdev_private *)dev->priv;
-	int i;
 
 	dev->start = 0;
 	dev->tbusy = 1;
@@ -1255,27 +1360,8 @@
 
 	free_irq(dev->irq, dev);
 
-	/* Free all the skbuffs in the Rx queue. */
-	for (i = 0; i < RX_RING_SIZE; i++) {
-		np->rx_ring[i].rx_status = 0;
-		np->rx_ring[i].addr = 0xBADF00D0; /* An invalid address. */
-		if (np->rx_skbuff[i]) {
-#if LINUX_VERSION_CODE < 0x20100
-			np->rx_skbuff[i]->free = 1;
-#endif
-			dev_kfree_skb(np->rx_skbuff[i]);
-		}
-		np->rx_skbuff[i] = 0;
-	}
-	for (i = 0; i < TX_RING_SIZE; i++) {
-		if (np->tx_skbuff[i])
-			dev_kfree_skb(np->tx_skbuff[i]);
-		np->tx_skbuff[i] = 0;
-		if (np->tx_buf[i]) {
-			kfree(np->tx_buf[i]);
-			np->tx_buf[i] = 0;
-		}
-	}
+	free_rbufs(dev);
+	free_tbufs(dev);
 
 	MOD_DEC_USE_COUNT;
 


* Re: ISSUE: DFE530-TX REV-A3-1 times out on transmit
  2001-08-27 19:02     ` Urban Widmark
@ 2001-08-28 13:45       ` David Schmitt
  2001-08-28 19:46         ` Urban Widmark
  0 siblings, 1 reply; 12+ messages in thread
From: David Schmitt @ 2001-08-28 13:45 UTC (permalink / raw)
  To: linux-kernel

Re!

After fooling around a bit more I can now distinguish three different
states (with 2.4.9):

1) Working (very easy):

Aug 28 13:45:01 cheesy kernel:  In via_rhine_rx(), entry XX status 00668f00.

2) Recoverable troubles (nasty but bearable):

Aug 28 13:45:04 cheesy kernel:  In via_rhine_rx(), entry XX status 005e9700.
Aug 28 13:45:05 cheesy kernel: NETDEV WATCHDOG: eth0: transmit timed out
Aug 28 13:45:05 cheesy kernel: eth0: reset finished after 5 microseconds.
Aug 28 13:45:05 cheesy kernel:  In via_rhine_rx(), entry XX status 00668f00.

Aug 28 13:45:32 cheesy kernel:  In via_rhine_rx(), entry XX status 015a9700.
Aug 28 13:45:33 cheesy kernel: NETDEV WATCHDOG: eth0: transmit timed out
Aug 28 13:45:33 cheesy kernel: eth0: reset finished after 105 microseconds.
Aug 28 13:45:33 cheesy kernel:  In via_rhine_rx(), entry XX status 00668f00.

Note 1: '005e9700' doesn't always cause a timeout.
Note 2: note the longer reset delay in the second excerpt (105
microseconds instead of 5).

and

3) Unrecoverable (really bad):

Aug 28 13:45:51 cheesy kernel:  In via_rhine_rx(), entry XX status 00668f00.
Aug 28 13:45:52 cheesy kernel:  In via_rhine_rx(), entry XX status 00668f00.
Aug 28 13:45:52 cheesy kernel:  In via_rhine_rx(), entry XX status 00729700.
Aug 28 13:45:54 cheesy kernel:  In via_rhine_rx(), entry XX status 00669700.
Aug 28 13:45:54 cheesy kernel:  In via_rhine_rx(), entry XX status 00ee9700.
Aug 28 13:45:54 cheesy kernel:  In via_rhine_rx(), entry XX status 00f79700.
Aug 28 13:45:54 cheesy kernel:  In via_rhine_rx(), entry XX status 00669700.
Aug 28 13:45:55 cheesy kernel:  In via_rhine_rx(), entry XX status 005e9700.
Aug 28 13:45:55 cheesy kernel: NETDEV WATCHDOG: eth0: transmit timed out
Aug 28 13:45:55 cheesy kernel: eth0: reset finished after 10005 microseconds.
Aug 28 13:45:59 cheesy kernel: NETDEV WATCHDOG: eth0: transmit timed out
Aug 28 13:45:59 cheesy kernel: eth0: reset finished after 10005 microseconds.

Then I noticed the following: Upon unload the driver emits some kind
of exit status:

Correct shutdown:
Aug 28 14:53:34 cheesy kernel: eth0: Shutting down ethercard, status was 085a.

After first resets:
Aug 28 14:54:11 cheesy kernel: eth0: Shutting down ethercard, status was 081a.

After total NIC lockup:
Aug 28 14:56:24 cheesy kernel: eth0: Shutting down ethercard, status was 883a.


Wondering if via-diag shows some differences I got this:

root@cheesy # via-diag -mm -aa -ee
via-diag.c:v2.06 5/22/2001 Donald Becker (becker@scyld.com)
 http://www.scyld.com/diag/index.html
 Index #1: Found a VIA VT3065 Rhine-II adapter at 0x9400.
 Station address 00:05:5d:09:90:1f.
 Tx disabled, Rx enabled, half-duplex (0x800c).
  Receive  mode is 0x6c: Normal unicast and hashed multicast.
  Transmit mode is 0x22: Transmitter set to INTERNAL LOOPBACK!.
[..]

This seems to be the problem. Reloading the module does not help
anymore. I'd guess forcing the transmit mode in the reset to something
sane would help??

Any ideas on what can be done to debug this problem further?



On Mon, Aug 27, 2001 at 09:02:29PM +0200, Urban Widmark wrote:
[kernels]

I tried it with these kernels:

2.2.19

Resets correctly with rm-/insmod

2.2.19 with Dennis' patch

Resets often, but no lock up.

2.4.9 with via-rhine.c from 2.4.2

Resets correctly with rm-/insmod

2.4.9 with via-rhine.c from 2.4.3

Resets correctly at first. After the third or fourth reset it locks up,
with transmit mode 0x21 (which via-diag says is the same as 0x20)

2.4.9

Resets a few times correctly, then see above.


> > but doing one or two parallel ping -f other.machine locks the NIC for
> > good.
> 
> Good (that you have a reliable way to trigger this). For about how long do
> you need to run this?

10-20 seconds.

> > The network where the DFE530TX (and the other.machine) are attached
> > contains some 20-30 Windows PCs and some Novell Servers which all seem
> > quite broadcast-happy. The network itself is (mostly) unswitched and
> > 10Mbit halfduplex, so I guess this really is connected to the
> > collisions.
> 
> Depending on the sort of access you have, you could test unplugging
> everyone else and repeat the 'ping -f' test.

I will try to take the card home; there I have some more possibilities
for testing.

Regards, David
-- 
Signatures are like women. One rarely finds a sensible one
	-- seen in at.linux


* Re: ISSUE: DFE530-TX REV-A3-1 times out on transmit
  2001-08-28 13:45       ` David Schmitt
@ 2001-08-28 19:46         ` Urban Widmark
  2001-08-29  6:42           ` Dennis Bjorklund
  2001-08-29 12:48           ` David Schmitt
  0 siblings, 2 replies; 12+ messages in thread
From: Urban Widmark @ 2001-08-28 19:46 UTC (permalink / raw)
  To: David Schmitt; +Cc: linux-kernel, Dennis Bjorklund

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1901 bytes --]

On Tue, 28 Aug 2001, David Schmitt wrote:

> Note 1: '005e9700' doesn't always cause a timeout.

That status is from the Rx, not Tx. I think they are all ok.
005e9700 - length=0x5e, RxOK, accept broadcast, single buffer, end buffer
00668f00 - length=0x66, RxOK, chain buffer, single buffer, end buffer
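
A minimal sketch of how such a status word splits into a length and flag bits, for anyone following along. It assumes the layout used by via-rhine.c: frame length in the upper 16 bits, flags in the lower 16, RxOK = 0x8000, whole-packet bits = 0x0300, ownership flag in bit 31. These values and the helper names are assumptions for illustration and are not quoted from the datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit values, following the rx_status_bits enum in via-rhine.c. */
#define RX_OK        0x8000u  /* frame received without error */
#define RX_WHOLE_PKT 0x0300u  /* frame starts and ends in this buffer */

/* Hypothetical helper: length field from the upper half of the word.
 * Bit 31 is masked off because the driver uses it as the ownership flag. */
static unsigned rx_length(uint32_t status)
{
    return (status >> 16) & 0x7FFFu;
}

/* Hypothetical helper: true if the RxOK bit is set. */
static int rx_ok(uint32_t status)
{
    return (status & RX_OK) != 0;
}

/* Hypothetical helper: true if both whole-packet bits are set. */
static int rx_whole_packet(uint32_t status)
{
    return (status & RX_WHOLE_PKT) == RX_WHOLE_PKT;
}
```

With the values from the log, rx_length(0x005e9700) gives 0x5e, and rx_ok() is true for both 005e9700 and 00668f00, which agrees with the reading above that these frames were received fine.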


> Correct shutdown:
> Aug 28 14:53:34 cheesy kernel: eth0: Shutting down ethercard, status was 085a.
> 
> After first resets:
> Aug 28 14:54:11 cheesy kernel: eth0: Shutting down ethercard, status was 081a.
> 
> After total NIC lockup:
> Aug 28 14:56:24 cheesy kernel: eth0: Shutting down ethercard, status was 883a.

8000 means that the chip is still doing a software reset. I think the
difference between 085a/081a is simply that you caught it in different
states.
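
As a rough illustration, the shutdown status words can be checked against the few command-register bits mentioned in this thread: 0x8000 for a reset in progress, and 0x10 / 0x08 for Tx on / Rx on (the latter two as used in the "Tx Off" and "Rx Off" steps of the attached patch). The macro, struct and function names below are made up for this sketch; other bits are deliberately left undecoded.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as used in this thread; the names are illustrative only. */
#define CMD_RX_ON 0x0008u
#define CMD_TX_ON 0x0010u
#define CMD_RESET 0x8000u

struct cmd_state {
    int in_reset;  /* software reset still pending */
    int tx_on;     /* transmitter enabled */
    int rx_on;     /* receiver enabled */
};

/* Decode only the three bits discussed above. */
static struct cmd_state decode_chip_cmd(uint16_t cmd)
{
    struct cmd_state s;

    s.in_reset = (cmd & CMD_RESET) != 0;
    s.tx_on    = (cmd & CMD_TX_ON) != 0;
    s.rx_on    = (cmd & CMD_RX_ON) != 0;
    return s;
}
```

Under these assumptions 085a and 081a both decode as "not in reset, Tx on, Rx on", while 883a has the reset bit set. The 0x800c reported by via-diag decodes as "in reset, Tx off, Rx on", consistent with its "Tx disabled, Rx enabled" line.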


>  Tx disabled, Rx enabled, half-duplex (0x800c).
>   Receive  mode is 0x6c: Normal unicast and hashed multicast.
>   Transmit mode is 0x22: Transmitter set to INTERNAL LOOPBACK!.

Is this after unloading the module? The via_rhine_close sets the
transmitter to loopback mode (comment says to avoid hardware races ...).
The reset code does not.

That should not be a problem, when loading the module (and on reset) it
changes this to normal mode.

> 2.2.19 with Dennis' patch
> 
> Resets often, but no lock up.

That is interesting. This code should be almost identical to 2.4.x (or
not, Dennis?). The way the timeout code is run may be different of course,
but the driver part is the same.

I'm ignoring that for now (if you don't mind) and have made a patch with
some possible improvements. Someone found a modified driver on some dlink
server that contains (claimed) workarounds for various chip peculiarities
(bugs).

I also added a "force software reset" that is described in the datasheet.
Not sure what the difference is, but it can't hurt trying that if the
normal reset fails.
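
The retry in the patch is a bounded poll: read the command register until the reset bit clears or a fixed number of polls is used up (2000 polls with a 5 microsecond delay each is roughly a 10 ms budget). A user-space sketch of that pattern follows; fake_chip and fake_read_cmd are hypothetical stand-ins for the real readw(ioaddr + ChipCmd), so this shows only the loop structure, not actual driver code.

```c
#include <assert.h>
#include <stdint.h>

#define CMD_RESET 0x8000u

/* Hypothetical stand-in for the hardware: the reset bit stays set
 * for a fixed number of register reads, then clears. */
struct fake_chip {
    int reads_until_done;
};

static uint16_t fake_read_cmd(struct fake_chip *chip)
{
    if (chip->reads_until_done > 0) {
        chip->reads_until_done--;
        return CMD_RESET;
    }
    return 0;
}

/* Poll at most max_polls times. Returns the number of polls that still
 * saw the reset bit set, or -1 if the bit never cleared in time. */
static int poll_reset(struct fake_chip *chip, int max_polls)
{
    int i;

    for (i = 0; i < max_polls; i++) {
        if (!(fake_read_cmd(chip) & CMD_RESET))
            return i;
        /* the driver would udelay(5) here */
    }
    return -1;
}
```

A chip that needs one extra read returns 1 here, which would correspond to the "reset finished after 5 microseconds" case; a chip that never clears the bit returns -1, which is the point where a forced reset could be tried.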

Perhaps this helps, probably not.

/Urban

[-- Attachment #2: Type: TEXT/PLAIN, Size: 2699 bytes --]

--- linux-2.4.9-orig/drivers/net/via-rhine.c	Sun Aug 19 12:08:22 2001
+++ linux-2.4.9-00/drivers/net/via-rhine.c	Tue Aug 28 21:37:18 2001
@@ -497,6 +497,14 @@
 	if (debug > 1)
 		printk(KERN_INFO "%s: reset finished after %d microseconds.\n",
 			   name, 5*i);
+
+	if (chip_id == VT6102 && readw(ioaddr + ChipCmd) & CmdReset) {
+		/* Try to force software reset (we are dead anyway ...) */
+		writeb(0x40, ioaddr + 0x81);
+		for (i=0; i<2000 && (readw(ioaddr + ChipCmd) & CmdReset); i++)
+			udelay(5);
+	}
+
 }
 
 static int __devinit via_rhine_init_one (struct pci_dev *pdev,
@@ -1078,8 +1086,50 @@
 
 	spin_lock(&np->lock);
 
+	/* Disable interrupts by clearing the interrupt mask. */
+	writew(0x0000, ioaddr + IntrEnable);
+
+	/* shutdown code from the driver supposedly modified by D-Link. */
+	if (np->drv_flags & HasWOL) { 
+		int ww;
+
+		/* FIXME: 0x01 isn't loopback according to the docs, it is reserved! */
+		/* Nic Loop Back On */
+		writeb(readb(ioaddr + TxConfig) | 0x01, ioaddr + TxConfig);
+
+		/* Tx Off */
+		writeb(readb(ioaddr + ChipCmd) ^ 0x10, ioaddr + ChipCmd);
+		for (ww = 0; ww < W_MAX_TIMEOUT; ww++) {
+			if ((readb(ioaddr + ChipCmd) & 0x10) == 0)
+				break;
+		}
+
+		/* Rx Off */
+		writeb(readb(ioaddr + ChipCmd) ^ 0x08, ioaddr + ChipCmd);
+		for (ww = 0; ww < W_MAX_TIMEOUT; ww++) {
+			if ((readb(ioaddr + ChipCmd) & 0x08) == 0)
+				break; 
+		} 
+
+		if (ww == W_MAX_TIMEOUT) {
+			/* Turn on fifo test */
+			writew(readw(ioaddr + GFIFOTest) | 0x0001, ioaddr + GFIFOTest);
+			/* Turn on fifo reject */
+			writew(readw(ioaddr + GFIFOTest) | 0x0800, ioaddr + GFIFOTest);
+			/* Turn off fifo test */
+			writew(readw(ioaddr + GFIFOTest) & 0xFFFE, ioaddr + GFIFOTest);
+		}
+
+		/* Nic Loop Back Off */
+		writeb(readb(ioaddr + TxConfig) & 0xFE, ioaddr + TxConfig);
+	}
+
+	/* Stop the chip's Tx and Rx processes. */
+	writew(CmdStop, ioaddr + ChipCmd);
+
 	/* Reset the chip. */
 	writew(CmdReset, ioaddr + ChipCmd);
+	wait_for_reset(dev, dev->name);
 
 	/* clear all descriptors */
 	free_tbufs(dev);
@@ -1088,7 +1138,6 @@
 	alloc_rbufs(dev);
 
 	/* Reinitialize the hardware. */
-	wait_for_reset(dev, dev->name);
 	init_registers(dev);
 	
 	spin_unlock(&np->lock);
@@ -1554,7 +1603,8 @@
 			   dev->name, readw(ioaddr + ChipCmd));
 
 	/* Switch to loopback mode to avoid hardware races. */
-	writeb(np->tx_thresh | 0x02, ioaddr + TxConfig);
+	/* FIXME: docs say 0x01 is reserved! Becker version set this and not 0x02 */
+	writeb(np->tx_thresh | 0x01, ioaddr + TxConfig);
 
 	/* Disable interrupts by clearing the interrupt mask. */
 	writew(0x0000, ioaddr + IntrEnable);


* Re: ISSUE: DFE530-TX REV-A3-1 times out on transmit
  2001-08-28 19:46         ` Urban Widmark
@ 2001-08-29  6:42           ` Dennis Bjorklund
  2001-08-29 12:48           ` David Schmitt
  1 sibling, 0 replies; 12+ messages in thread
From: Dennis Bjorklund @ 2001-08-29  6:42 UTC (permalink / raw)
  To: Urban Widmark; +Cc: David Schmitt, linux-kernel

On Tue, 28 Aug 2001, Urban Widmark wrote:

> > 2.2.19 with Dennis' patch
> >
> > Resets often, but no lock up.
>
> That is interesting. This code should be almost identical to 2.4.x (or
> not, Dennis?). The way the timeout code is run may be different of course,
> but the driver part is the same.

Well, I took (part of) the init code from 2.2.19 and used that for both
init and reset, so there might be differences from 2.4.x. But the only
real difference I remember was some extra spinlocks.

-- 
/Dennis



* Re: ISSUE: DFE530-TX REV-A3-1 times out on transmit
  2001-08-28 19:46         ` Urban Widmark
  2001-08-29  6:42           ` Dennis Bjorklund
@ 2001-08-29 12:48           ` David Schmitt
  2001-08-29 18:45             ` Urban Widmark
  1 sibling, 1 reply; 12+ messages in thread
From: David Schmitt @ 2001-08-29 12:48 UTC (permalink / raw)
  Cc: linux-kernel

On Tue, Aug 28, 2001 at 09:46:18PM +0200, Urban Widmark wrote:
> I'm ignoring that for now (if you don't mind) and have made a patch with
> some possible improvements. Someone found a modified driver on some dlink
> server that contains (claimed) workarounds for various chip peculiarities
> (bugs).
> 
> I also added a "force software reset" that is described in the datasheet.
> Not sure what the difference is, but it can't hurt trying that if the
> normal reset fails.
> 
> Perhaps this helps, probably not.

Under 'normal loads' (i.e. one TCP download at most, little other traffic)
the situation didn't get better; it hangs as often as with the original
via-rhine, at least it feels so. No hard figures here. But even
writing this mail (via ssh) here parallel to a download over the lan
(from the same server) triggers resets.

under heavy loads (ie with multiple flood pings) it resets often but I
couldn't push it over the edge anymore. I have it running now for
several minutes under multiple pingfloods and it always recovered
(from quite a number of resets).

At least it recovers now. Thank you for your time and work!



Regards, David Schmitt
-- 
Signatures are like women. One rarely finds a sensible one
	-- seen in at.linux


* Re: ISSUE: DFE530-TX REV-A3-1 times out on transmit
  2001-08-29 12:48           ` David Schmitt
@ 2001-08-29 18:45             ` Urban Widmark
  2001-08-31 12:18               ` David Schmitt
  0 siblings, 1 reply; 12+ messages in thread
From: Urban Widmark @ 2001-08-29 18:45 UTC (permalink / raw)
  To: David Schmitt; +Cc: linux-kernel

On Wed, 29 Aug 2001, David Schmitt wrote:

> Under 'normal loads' (i.e. one TCP download at most, little other traffic)
> the situation didn't get better; it hangs as often as with the original
> via-rhine, at least it feels so. No hard figures here. But even
> writing this mail (via ssh) here parallel to a download over the lan
> (from the same server) triggers resets.

That is still pretty awful ... but it doesn't stop working?
(you say hangs, but then resets)

> under heavy loads (ie with multiple flood pings) it resets often but I
> couldn't push it over the edge anymore. I have it running now for
> several minutes under multiple pingfloods and it always recovered
> (from quite a number of resets).

Ok, that means the "D-Link magic" does improve reset.

It may be interesting to find out which parts help. I simply added
things that looked good ... Lacking information on what the bit-flipping
is supposed to do, one way to try and do that is to remove code and see
how much can be removed without breaking anything.
(Sounds like a children's game, except for programmers ...)

I'll still try generating collisions and see what happens. If I can't
reproduce this perhaps you would test a different patch to see which
change made a difference?

/Urban



* Re: ISSUE: DFE530-TX REV-A3-1 times out on transmit
  2001-08-29 18:45             ` Urban Widmark
@ 2001-08-31 12:18               ` David Schmitt
  0 siblings, 0 replies; 12+ messages in thread
From: David Schmitt @ 2001-08-31 12:18 UTC (permalink / raw)
  To: linux-kernel

On Wed, Aug 29, 2001 at 08:45:34PM +0200, Urban Widmark wrote:
> On Wed, 29 Aug 2001, David Schmitt wrote:
> > Under 'normal loads' (i.e. one TCP download at most, little other traffic)
> > the situation didn't get better; it hangs as often as with the original
> > via-rhine, at least it feels so. No hard figures here. But even
> > writing this mail (via ssh) here parallel to a download over the lan
> > (from the same server) triggers resets.
> 
> That is still pretty awful ... but it doesn't stop working?
> (you say hangs, but then resets)

sorry, sloppy language. hang == reset (in this case)

> > under heavy loads (ie with multiple flood pings) it resets often but I
> > couldn't push it over the edge anymore. I have it running now for
> > several minutes under multiple pingfloods and it always recovered
> > (from quite a number of resets).
> 
> Ok, that means the "D-Link magic" does improve reset.

Yes. Before your patch, 2.4.9 reset successfully three or four times
and then the resets stopped working. With your patch it resets as
often, but the resets no longer fail.

> It may be interesting to find out which parts help. I simply added
> things that looked good ... Lacking information on what the bit-flipping
> is supposed to do, one way to try and do that is to remove code and see
> how much can be removed without breaking anything.
> (Sounds like a children's game, except for programmers ...)

Hehe, bruteforcing it :-))

> I'll still try generating collisions and see what happens. If I can't
> reproduce this perhaps you would test a different patch to see which
> change made a difference?

Sure, the machine is fast enough to handle another kernel recompile or
two :^))



Regards, David
-- 
Signatures are like women. One rarely finds a sensible one
	-- seen in at.linux


end of thread, other threads:[~2001-08-31 12:18 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
2001-08-24 14:24 ISSUE: DFE530-TX REV-A3-1 times out on transmit David Schmitt
2001-08-25 17:05 ` Urban Widmark
2001-08-27  8:27   ` David Schmitt
2001-08-27  9:13     ` David Schmitt
2001-08-27 19:02     ` Urban Widmark
2001-08-28 13:45       ` David Schmitt
2001-08-28 19:46         ` Urban Widmark
2001-08-29  6:42           ` Dennis Bjorklund
2001-08-29 12:48           ` David Schmitt
2001-08-29 18:45             ` Urban Widmark
2001-08-31 12:18               ` David Schmitt
2001-08-28  7:26     ` Dennis Bjorklund
