linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: r8169 GigE driver problem, locks up 2.4.23 NFS subsystem
@ 2003-12-13 13:01 Francois Romieu
  2003-12-13 13:54 ` Marcus Blomenkamp
  0 siblings, 1 reply; 7+ messages in thread
From: Francois Romieu @ 2003-12-13 13:01 UTC (permalink / raw)
  To: Marcus Blomenkamp; +Cc: linux-kernel

In-Reply-To: <200312131300.05847.Marcus.Blomenkamp@epost.de>; from
Marcus.Blomenkamp@epost.de on Sat, Dec 13, 2003 at 01:00:05PM +0100
[nfs+r8169 foobar]
> If anyone is interested, i have dmesg output and ethereal log files handy.

Yes.

A few questions:
- the client behaves correctly with a 8139, be it with 2.4.23-pre9 or
  2.6.0-test11 ?
- could you be more specific wrt "special realtek supplied version" ?

--
Ueimor

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: r8169 GigE driver problem, locks up 2.4.23 NFS subsystem
  2003-12-13 13:01 r8169 GigE driver problem, locks up 2.4.23 NFS subsystem Francois Romieu
@ 2003-12-13 13:54 ` Marcus Blomenkamp
       [not found]   ` <20031214144055.A4664@electric-eye.fr.zoreil.com>
  0 siblings, 1 reply; 7+ messages in thread
From: Marcus Blomenkamp @ 2003-12-13 13:54 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 581 bytes --]

Am Samstag, 13. Dezember 2003 14:01 schrieb Francois Romieu:
> A few questions:
> - the client behaves correctly with a 8139, be it with 2.4.23-pre9 or
>   2.6.0-test11 ?

I just swapped the cards from 8139 to 8169 and recompiled drivers. Never had 
any problem with the old one.

> - could you be more specific wrt "special realtek supplied version" ?

Have a look at 
http://www.realtek.com.tw/downloads/downloads1-3.aspx?lineid=1&famid=4&series=2003072&Software=True
This driver claims to be of revision 1.6 while vanilla ships with 1.2, so i 
gave it a try.

regards, Marcus



[-- Attachment #2: problem-ethereal-udp-fail.gz --]
[-- Type: application/x-gzip, Size: 2134 bytes --]

[-- Attachment #3: problem-lspci --]
[-- Type: text/plain, Size: 5514 bytes --]

00:00.0 Host bridge: Intel Corp. 440BX/ZX - 82443BX/ZX Host bridge (rev 02)
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Latency: 64
	Region 0: Memory at e4000000 (32-bit, prefetchable) [size=64M]
	Capabilities: [a0] AGP version 1.0
		Status: RQ=31 SBA+ 64bit- FW- Rate=x1,x2
		Command: RQ=0 SBA+ AGP+ 64bit- FW- Rate=x1

00:01.0 PCI bridge: Intel Corp. 440BX/ZX - 82443BX/ZX AGP bridge (rev 02) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 64
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
	I/O behind bridge: 0000d000-0000dfff
	Memory behind bridge: d7000000-d7dfffff
	Prefetchable memory behind bridge: d7f00000-e3ffffff
	BridgeCtl: Parity- SERR- NoISA- VGA+ MAbort- >Reset- FastB2B+

00:04.0 ISA bridge: Intel Corp. 82371AB PIIX4 ISA (rev 02)
	Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0

00:04.1 IDE interface: Intel Corp. 82371AB PIIX4 IDE (rev 01) (prog-if 80 [Master])
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32
	Region 4: I/O ports at b800 [size=16]

00:04.2 USB Controller: Intel Corp. 82371AB PIIX4 USB (rev 01) (prog-if 00 [UHCI])
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32
	Interrupt: pin D routed to IRQ 9
	Region 4: I/O ports at b400 [size=32]

00:04.3 Bridge: Intel Corp. 82371AB PIIX4 ACPI (rev 02)
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Interrupt: pin ? routed to IRQ 9

00:09.0 Multimedia audio controller: C-Media Electronics Inc CM8738 (rev 10)
	Subsystem: C-Media Electronics Inc CMI8738/C3DX PCI Audio Device
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (500ns min, 6000ns max)
	Interrupt: pin A routed to IRQ 9
	Region 0: I/O ports at b000 [size=256]
	Capabilities: [c0] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0a.0 Ethernet controller: Realtek Semiconductor Co., Ltd.: Unknown device 8169 (rev 10)
	Subsystem: Realtek Semiconductor Co., Ltd.: Unknown device 8169
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (8000ns min, 16000ns max), cache line size 08
	Interrupt: pin A routed to IRQ 5
	Region 0: I/O ports at a800 [size=256]
	Region 1: Memory at d6800000 (32-bit, non-prefetchable) [size=256]
	Expansion ROM at <unassigned> [disabled] [size=128K]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0b.0 Multimedia video controller: Zoran Corporation ZR36120 (rev 03)
	Subsystem: Unknown device 93c2:2000
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (500ns min, 4000ns max)
	Interrupt: pin A routed to IRQ 10
	Region 0: Memory at d6000000 (32-bit, non-prefetchable) [size=4K]

00:0c.0 Network controller: Techsan Electronics Co Ltd: Unknown device 2103 (rev 01)
	Subsystem: Techsan Electronics Co Ltd: Unknown device 2103
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32
	Interrupt: pin A routed to IRQ 11
	Region 0: Memory at d5800000 (32-bit, non-prefetchable) [size=64K]
	Region 1: I/O ports at a400 [size=32]

01:00.0 VGA compatible controller: ATI Technologies Inc Radeon VE QY (prog-if 00 [VGA])
	Subsystem: Unknown device 174b:7112
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
	Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 64 (2000ns min), cache line size 08
	Interrupt: pin A routed to IRQ 11
	Region 0: Memory at d8000000 (32-bit, prefetchable) [size=128M]
	Region 1: I/O ports at d800 [size=256]
	Region 2: Memory at d7000000 (32-bit, non-prefetchable) [size=64K]
	Expansion ROM at d7fe0000 [disabled] [size=128K]
	Capabilities: [58] AGP version 2.0
		Status: RQ=47 SBA+ 64bit- FW- Rate=x1,x2
		Command: RQ=31 SBA+ AGP+ 64bit- FW- Rate=x1
	Capabilities: [50] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-


[-- Attachment #4: problem-dmesg --]
[-- Type: text/plain, Size: 8023 bytes --]

Linux version 2.4.23-pre9 (root@zwiebel) (gcc version 2.95.4 20011002 (Debian prerelease)) #6 Sam Dez 13 09:43:15 CET 2003
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 0000000017ffd000 (usable)
 BIOS-e820: 0000000017ffd000 - 0000000017fff000 (ACPI data)
 BIOS-e820: 0000000017fff000 - 0000000018000000 (ACPI NVS)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
383MB LOWMEM available.
ACPI: have wakeup address 0xc0001000
On node 0 totalpages: 98301
zone(0): 4096 pages.
zone(1): 94205 pages.
zone(2): 0 pages.
ACPI: RSDP (v000 ASUS                                      ) @ 0x000f7f80
ACPI: RSDT (v001 ASUS   P2B      0x42302e31 MSFT 0x31313031) @ 0x17ffd000
ACPI: FADT (v001 ASUS   P2B      0x42302e31 MSFT 0x31313031) @ 0x17ffd080
ACPI: BOOT (v001 ASUS   P2B      0x42302e31 MSFT 0x31313031) @ 0x17ffd040
ACPI: DSDT (v001   ASUS P2B      0x00001000 MSFT 0x01000001) @ 0x00000000
Kernel command line: auto BOOT_IMAGE=Linux-2.4 ro root=301
Initializing CPU#0
Detected 467.734 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 933.88 BogoMIPS
Memory: 385696k/393204k available (1824k kernel code, 7120k reserved, 696k data, 96k init, 0k highmem)
Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
Inode cache hash table entries: 32768 (order: 6, 262144 bytes)
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer cache hash table entries: 32768 (order: 5, 131072 bytes)
Page-cache hash table entries: 131072 (order: 7, 524288 bytes)
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 128K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU:     After generic, caps: 0183f9ff 00000000 00000000 00000000
CPU:             Common caps: 0183f9ff 00000000 00000000 00000000
CPU: Intel Celeron (Mendocino) stepping 05
Enabling fast FPU save and restore... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
ACPI: Subsystem revision 20031002
PCI: PCI BIOS revision 2.10 entry at 0xf0720, last bus=1
PCI: Using configuration type 1
ACPI: Interpreter enabled
ACPI: Using PIC for interrupt routing
ACPI: System [ACPI] (supports S0 S1 S4 S5)
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 *5 6 7 9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 *9 10 11 12 14 15)
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
PCI: Probing PCI hardware
ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 9
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 5
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 11
PCI: Using ACPI for IRQ routing
PCI: if you experience problems, try using option 'pci=noacpi' or even 'acpi=off'
Limiting direct PCI/PCI transfers.
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
VFS: Disk quotas vdquot_6.5.1
Journalled Block Device driver loaded
udf: registering filesystem
ACPI: Power Button (FF) [PWRF]
ACPI: Processor [CPU0] (supports C1 C2)
i2c-core.o: i2c core module version 2.7.0 (20021208)
i2c-algo-bit.o: i2c bit algorithm module version 2.7.0 (20021208)
i2c-proc.o version 2.7.0 (20021208)
i2c-piix4.o version 2.7.0 (20021208)
i2c-piix4.o: Found PIIX4 device
dmi_scan.o version 2.7.0 (20021208)
dmi_scan.o: SM BIOS found
i2c-piix4.o: SMBus detected and initialized
radeonfb: ref_clk=2700, ref_div=60, xclk=15000 from BIOS
radeonfb: detected DFP panel size from BIOS: 1024x768
Console: switching to colour frame buffer device 128x48
radeonfb: ATI Radeon VE QY DDR SGRAM 32 MB
radeonfb: DVI port DFP monitor connected
radeonfb: CRT port no monitor connected
pty: 256 Unix98 ptys configured
w83781d.o version 2.7.0 (20021208)
Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: loaded (max 8 devices)
Universal TUN/TAP device driver 1.5 (C)1999-2002 Maxim Krasnyansky
Linux agpgart interface v0.99 (c) Jeff Hartmann
agpgart: Maximum main memory to use for agp memory: 321M
agpgart: Detected Intel 440BX chipset
agpgart: AGP aperture is 64M @ 0xe4000000
[drm] AGP 0.99 Aperture @ 0xe4000000 64MB
[drm] Initialized radeon 1.7.0 20020828 on minor 0
Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PIIX4: IDE controller at PCI slot 00:04.1
PIIX4: chipset revision 1
PIIX4: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xb800-0xb807, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xb808-0xb80f, BIOS settings: hdc:pio, hdd:pio
hda: Maxtor 92041U4, ATA DISK drive
hdb: LITEON DVD-ROM LTD163D, ATAPI CD/DVD-ROM drive
blk: queue c03bbb20, I/O limit 4095Mb (mask 0xffffffff)
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: attached ide-disk driver.
hda: host protected area => 1
hda: 40020624 sectors (20491 MB) w/512KiB Cache, CHS=2491/255/63, UDMA(33)
hdb: attached ide-cdrom driver.
hdb: ATAPI 48X DVD-ROM drive, 512kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.12
Partition check:
 hda: hda1 hda2 hda3 hda4 < hda5 hda6 >
SCSI subsystem driver Revision: 1.00
kmod: failed to exec /sbin/modprobe -s -k scsi_hostadapter, errno = 2
usb.c: registered new driver hub
host/uhci.c: USB Universal Host Controller Interface driver v1.1
host/uhci.c: USB UHCI at I/O 0xb400, IRQ 9
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
hub.c: 2 ports detected
usb.c: registered new driver hid
hid-core.c: v1.8.1 Andreas Gal, Vojtech Pavlik <vojtech@suse.cz>
hid-core.c: USB HID support drivers
Initializing USB Mass Storage driver...
usb.c: registered new driver usb-storage
USB Mass Storage support registered.
Linux video capture interface: v1.00
mice: PS/2 mouse device common for all mice
i2c-core.o: i2c core module version 2.7.0 (20021208)
i2c-proc.o version 2.7.0 (20021208)
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
IP: routing cache hash table of 4096 buckets, 32Kbytes
TCP: Hash tables configured (established 32768 bind 32768)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
NET4: Ethernet Bridge 008 for NET4.0
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 96k freed
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,1), internal journal
Real Time Clock Driver v1.10e
r8169 Gigabit Ethernet driver 1.2 loaded
r8169: PCI device 00:0a.0: unknown chip version, assuming RTL-8169
r8169: PCI device 00:0a.0: TxConfig = 0x4000000
eth0: Identified chip type is 'RTL-8169'.
eth0: RealTek RTL8169 Gigabit Ethernet at 0xda83a000, 00:08:54:d0:e4:70, IRQ 5
eth0: Auto-negotiation Enabled.
eth0: 100Mbps Full-duplex operation.
8139too Fast Ethernet driver 0.9.26
skystar2.c: FlexCopII(rev.130) chip found
DVB: registering new adapter (Technisat SkyStar2 driver).
DVB: registering frontend 0:0 (Zarlink MT312)...
kjournald starting.  Commit interval 5 seconds
EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,6), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
nfs: server kartoffel not responding, still trying
nfs: server kartoffel not responding, still trying
nfs: server kartoffel OK
nfs: server kartoffel not responding, still trying
nfs: server kartoffel not responding, still trying
nfs: server kartoffel OK

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: r8169 GigE driver problem, locks up 2.4.23 NFS subsystem
       [not found]   ` <20031214144055.A4664@electric-eye.fr.zoreil.com>
@ 2003-12-15 10:24     ` Marcus Blomenkamp
  2003-12-15 18:58       ` Francois Romieu
  2003-12-18  0:45       ` Francois Romieu
  0 siblings, 2 replies; 7+ messages in thread
From: Marcus Blomenkamp @ 2003-12-15 10:24 UTC (permalink / raw)
  To: Francois Romieu; +Cc: linux-kernel

Am Sonntag, 14. Dezember 2003 14:40 schrieben Sie:
>
> Ok, this one is merged into Jeff Garzik's patchset. If you try it and it
> does not work, please report if it does not work in a different manner
> (because it is still possible that I have broken something during the
> merge).

Hi.

First of all, i did not feel any difference in behaviour for the different 
ACPI options, so I ran all these tests with acpi=off explicitly. Mainboard is 
Asus P2B with recent ACPI beta bios.

Ok, from this patchset i extracted and updated the 8169.c diff only. And bad 
news: With this driver object built-in the kernel does not boot at all - 
locks up on configuring card 100% reproducible.

I enabled debug options in 8169 and got this output: (manually written...)

r8169 Gigabit Ethernet driver 1.2 loaded
mac_version == RTL_GIGA_MAC_VER_E (0002)
phy_version == RTL_GIGA_PHY_VER_E (0000)
eth0: RTL8169 at 0xd882a00, 00:08:54:d0:e4:70, IRQ 5
mac_version == RTL_GIGA_MAC_VER_E (0002)
phy_version == RTL_GIGA_PHY_VER_E (0000)
Do final_reg2.cfg

I added more dummy printks and intermediate result: it does not return from 
function 'rtl8169_hw_phy_config()' 

This routine messes up the card itself, as a reset/reboot into 2.4 does not 
revitalize it. I definitively have to power-cycle the machine.

>
> Is it possible to get an ethereal dump for a normal nfs operation as well
> as for a failed one on both sides of the link (client + server) ?
> The size of the dump should not be an issue on my side.

I tcpdump'ed both sides on transferring 1 Megabyte to the 8139 based machine 
for both low level transfers (UDP, TCP) and on filesystem level (NFS).

NFS: dd 1M to remote file
UDP/TCP: dd 1M trough netcat
TCP: scp 1M file

Very interestingly i could not reproduce the NFS lockup during this double 
monitoring setup. So i ran it twice - once for each machine in promiscious 
mode. And guess: If the 8169 NIC is in monitoring mode, NFS writes do not 
lock up. I can even recover the machine from stalling by explicitly entering 
promiscious mode and SIGINT'ing the writing process.


>
> A /proc/interrupts of the working/failing client after the nfs operation
> as well as after a normal scp (if possible) could help too.

Everything normal - no interrupt storm visible. On NFS stalling NICs interrupt 
counter does not increase, however explicit pings work fine and do increment 
the counter.

>
> Some 'ifconfig' output will be welcome too.
>
> If the problem is more or less trivially related to the length of the
> frames, it should be possible to notice a change of behavior while
> increasing the size of simple ping (ping -n -c 1 -s _size 192.168.1.254
> where _size ~= 1460...1480). Do you notice something here ?

This one ist interesting too. After having found NFS locking above 
datagramsize==4k i tried to find the exact boundary in this range. And there 
is no. First impressions (will do a deeper analysis later):

send 4k		OK
send 5k		-

IIRC there is a region around 4750 bytes at which it seems to works like a 
Schmitt trigger. If i am coming from below (aka can ping) i can ping with 
dsize=X but if i am coming from above (aka no ping answer) i can not ping 
with the same dsize=X.

I'll copy-mail this to lk, so i'll send the logs to you in a separate mail.

Best regards, Marcus


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: r8169 GigE driver problem, locks up 2.4.23 NFS subsystem
  2003-12-15 10:24     ` Marcus Blomenkamp
@ 2003-12-15 18:58       ` Francois Romieu
  2003-12-18  0:45       ` Francois Romieu
  1 sibling, 0 replies; 7+ messages in thread
From: Francois Romieu @ 2003-12-15 18:58 UTC (permalink / raw)
  To: Marcus Blomenkamp; +Cc: linux-kernel

Marcus Blomenkamp <Marcus.Blomenkamp@epost.de> :
[...]
> I added more dummy printks and intermediate result: it does not return from 
> function 'rtl8169_hw_phy_config()' 
> 
> This routine messes up the card itself, as a reset/reboot into 2.4 does not 
> revitalize it. I definitively have to power-cycle the machine.

Daube. Thanks for the work.

If you have some spare time, you can try the patches available at:
<URL:http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.0-test11-bk5>
(a README describes the order).
They will not fix the issue but I would not be too surprized if something
breaks before "rtl8169_hw_phy_config" appears.

I'll be away from my computer until wednesday evening.

--
Ueimor

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: r8169 GigE driver problem, locks up 2.4.23 NFS subsystem
  2003-12-15 10:24     ` Marcus Blomenkamp
  2003-12-15 18:58       ` Francois Romieu
@ 2003-12-18  0:45       ` Francois Romieu
  1 sibling, 0 replies; 7+ messages in thread
From: Francois Romieu @ 2003-12-18  0:45 UTC (permalink / raw)
  To: Marcus Blomenkamp; +Cc: linux-kernel

Marcus Blomenkamp <Marcus.Blomenkamp@epost.de> :
[...]
> I tcpdump'ed both sides on transferring 1 Megabyte to the 8139 based machine 
> for both low level transfers (UDP, TCP) and on filesystem level (NFS).
> 
> NFS: dd 1M to remote file
> UDP/TCP: dd 1M trough netcat
> TCP: scp 1M file
> 
> Very interestingly i could not reproduce the NFS lockup during this double 
> monitoring setup. So i ran it twice - once for each machine in promiscious 
> mode. And guess: If the 8169 NIC is in monitoring mode, NFS writes do not 
> lock up. I can even recover the machine from stalling by explicitly entering 
> promiscious mode and SIGINT'ing the writing process.

The dumps show that the server misses frames and/or sees generous amount
of garbage during UDP transfer. Which transfer rate can you achieve for big
writes between 8129 ?

Can you:
- retrieve the ifconfig output on the client/server just before and after the
  write (8192 sized NFS from 8169 to 8139) ?
- send a dump of a 8192 bytes sized 1Mo write from 8139 to 8139 ?

--
Ueimor

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: r8169 GigE driver problem, locks up 2.4.23 NFS subsystem
  2003-12-13 12:00 Marcus Blomenkamp
@ 2003-12-14  2:02 ` Jeff Garzik
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff Garzik @ 2003-12-14  2:02 UTC (permalink / raw)
  To: Marcus Blomenkamp; +Cc: linux-kernel

Marcus Blomenkamp wrote:
> Hi all.
> 
> Since ugrading from a realtek-8139 based nic to a realtek-6139 gigabit one i 
> am experiencing strange network problems. Particularly it locks up the NFS 
> subsystem on writing to remote files.

Can you try the vastly updated r8169 driver in the following patch?

http://www.kernel.org/pub/linux/kernel/people/jgarzik/patchkits/2.6/2.6.0-test11-bk5-netdrvr-exp2.patch.bz2


^ permalink raw reply	[flat|nested] 7+ messages in thread

* r8169 GigE driver problem, locks up 2.4.23 NFS subsystem
@ 2003-12-13 12:00 Marcus Blomenkamp
  2003-12-14  2:02 ` Jeff Garzik
  0 siblings, 1 reply; 7+ messages in thread
From: Marcus Blomenkamp @ 2003-12-13 12:00 UTC (permalink / raw)
  To: linux-kernel

Hi all.

Since ugrading from a realtek-8139 based nic to a realtek-6139 gigabit one i 
am experiencing strange network problems. Particularly it locks up the NFS 
subsystem on writing to remote files.

I tried to narrow it down using several TCP/UDP traffic tools such as 
'netpipe' and 'netcat' and different kernel r8169 driver versions. 

I restricted the network to a single 100mbit crossover cable between a machine 
with said gig-nic and a file server which has not been modified yet.

File server:	'kartoffel', RTL-8139, linux-2.4.20
Client machine:	'zwiebel', RTL-8169S, linux-2.4.23-pre9, linux-2.6.0-test11

I tried 2.4.23-pre9 and 2.6.0-test11 vanilla drivers and a special realtek 
supplied version for linux-2.4. With respect to basic TCP and UDP transfer 
their behaviour was identical. However 2.6 NFS subsystem was able to recover 
from the network stalls, while 2.4 NFS did not release processes from 'D' 
state.

My suspicion is that something related to UDP datagram to IP-over-ethernet 
frame fragmentation is broken. TCP transfers in both directions work fine 
while UDP transmissions over a specific datagram size stall after sending a 
few k. 

These objections manifest into NFS running fine over TCP and running fine over 
UDP with wsize<=4096, while standard NFS mount option wsize=8192 fails.

If anyone is interested, i have dmesg output and ethereal log files handy.

Best regards, Marcus


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2003-12-18  0:48 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-12-13 13:01 r8169 GigE driver problem, locks up 2.4.23 NFS subsystem Francois Romieu
2003-12-13 13:54 ` Marcus Blomenkamp
     [not found]   ` <20031214144055.A4664@electric-eye.fr.zoreil.com>
2003-12-15 10:24     ` Marcus Blomenkamp
2003-12-15 18:58       ` Francois Romieu
2003-12-18  0:45       ` Francois Romieu
  -- strict thread matches above, loose matches on Subject: below --
2003-12-13 12:00 Marcus Blomenkamp
2003-12-14  2:02 ` Jeff Garzik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).