* Re: r8169 driver crashes in 2.6.32.43
       [not found] ` <20110724201626.GB24418@zoreil.com>
@ 2011-07-25 10:36   ` Kasper Dupont
  2011-07-28  7:04     ` Francois Romieu
  0 siblings, 1 reply; 13+ messages in thread
From: Kasper Dupont @ 2011-07-25 10:36 UTC (permalink / raw)
  To: François romieu; +Cc: ivecera, hayeswang, gregkh, netdev

On 24/07/11 22.16, François romieu wrote:
> The Sun, Jul 24, 2011 at 09:58:31PM +0200, Kasper Dupont wrote :
> [...]
> > Any idea how to fix this?
> 
> Apply 1519e57fe81c14bb8fa4855579f19264d1ef63b4 as well and
> eventually f60ac8e7ab7cbb413a0131d5665b053f9f386526.
> 
> Please send r8169 related lines from dmesg, especially the XID
> one and Cc: netdev.

These are the relevant lines from dmesg:

[    1.045727] pata_sch 0000:00:1f.1: setting latency timer to 64
[    1.045946] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[    1.046061] r8169 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[    1.046201] r8169 0000:02:00.0: setting latency timer to 64
[    1.046257]   alloc irq_desc for 24 on node -1
[    1.046263]   alloc kstat_irqs on node -1
[    1.046284] r8169 0000:02:00.0: irq 24 for MSI/MSI-X
[    1.048097] eth0: RTL8168c/8111c at 0xf8076000, 00:01:c0:09:a1:25, XID 1c4000c0 IRQ 24
[    1.051517] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[    1.051631] r8169 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[    1.051764] r8169 0000:03:00.0: setting latency timer to 64
[    1.051820]   alloc irq_desc for 25 on node -1
[    1.051825]   alloc kstat_irqs on node -1
[    1.051847] r8169 0000:03:00.0: irq 25 for MSI/MSI-X
[    1.053159] usb 1-7: new high speed USB device using ehci_hcd and address 5
[    1.056574] vga16fb: initializing
[    1.056584] vga16fb: mapped to 0xc00a0000
[    1.056819] fb0: VGA16 VGA frame buffer device
[    1.070138] scsi0 : pata_sch
[    1.078253] scsi1 : pata_sch
[    1.079216] ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0x1800 irq 14
[    1.079312] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0x1808 irq 15
[    1.082178] eth1: RTL8168c/8111c at 0xf8096000, 00:01:c0:09:a1:26, XID 1c4000c0 IRQ 25
[    1.205643] usb 1-7: configuration #1 chosen from 1 choice

It works on 2.6.32.32 but crashes on 2.6.32.33. I tried taking
2.6.32.43 and applying 1519e57fe81c14bb8fa4855579f19264d1ef63b4,
but that did not help. 2.6.32.43 crashes with and without that patch.

-- 
Kasper Dupont -- Rigtige mænd skriver deres egne backupprogrammer
#define _(_)"d.%.4s%."_"2s" /* This is my email address */
char*_="@2kaspner"_()"%03"_("4s%.")"t\n";printf(_+11,_+6,_,11,_+2,_+7,_+6);


* Re: r8169 driver crashes in 2.6.32.43
  2011-07-25 10:36   ` r8169 driver crashes in 2.6.32.43 Kasper Dupont
@ 2011-07-28  7:04     ` Francois Romieu
  2011-07-28  8:48       ` Kasper Dupont
  0 siblings, 1 reply; 13+ messages in thread
From: Francois Romieu @ 2011-07-28  7:04 UTC (permalink / raw)
  To: Kasper Dupont; +Cc: ivecera, hayeswang, gregkh, netdev

Kasper Dupont <kasperd@cpvhh.24.jul.2011.kasperd.net> :
[...]
> [    1.045727] pata_sch 0000:00:1f.1: setting latency timer to 64
> [    1.045946] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> [    1.046061] r8169 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
> [    1.046201] r8169 0000:02:00.0: setting latency timer to 64
> [    1.046257]   alloc irq_desc for 24 on node -1
> [    1.046263]   alloc kstat_irqs on node -1
> [    1.046284] r8169 0000:02:00.0: irq 24 for MSI/MSI-X
> [    1.048097] eth0: RTL8168c/8111c at 0xf8076000, 00:01:c0:09:a1:25, XID 1c4000c0 IRQ 24

RTL_GIGA_MAC_VER_22

[...]
> [    1.082178] eth1: RTL8168c/8111c at 0xf8096000, 00:01:c0:09:a1:26, XID 1c4000c0 IRQ 25

sic.

I miss it (the light fast crash prone motherboard from hell does not count).

[...]
> It works on 2.6.32.32 it crashes on 2.6.32.33. I tried to
> take 2.6.32.43 and apply 1519e57fe81c14bb8fa4855579f19264d1ef63b4,
> that did not help. 2.6.32.43 crashes with and without that patch.

1519e57fe81c14bb8fa4855579f19264d1ef63b4 does not help RTL_GIGA_MAC_VER_22
proper but you may apply it, then move the 'case RTL_GIGA_MAC_VER_22:'
statement a few lines below and see if it helps (assuming the fifo overflow
event may be ignored):

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 7d9c650..33c0ead 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -5383,7 +5383,6 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 			switch (tp->mac_version) {
 			/* Work around for rx fifo overflow */
 			case RTL_GIGA_MAC_VER_11:
-			case RTL_GIGA_MAC_VER_22:
 			case RTL_GIGA_MAC_VER_26:
 				netif_stop_queue(dev);
 				rtl8169_tx_timeout(dev);
@@ -5393,6 +5392,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 			case RTL_GIGA_MAC_VER_19:
 			case RTL_GIGA_MAC_VER_20:
 			case RTL_GIGA_MAC_VER_21:
+			case RTL_GIGA_MAC_VER_22:
 			case RTL_GIGA_MAC_VER_23:
 			case RTL_GIGA_MAC_VER_24:
 			case RTL_GIGA_MAC_VER_27:
-- 
Ueimor


* Re: r8169 driver crashes in 2.6.32.43
  2011-07-28  7:04     ` Francois Romieu
@ 2011-07-28  8:48       ` Kasper Dupont
  2011-07-28 10:58         ` Francois Romieu
  0 siblings, 1 reply; 13+ messages in thread
From: Kasper Dupont @ 2011-07-28  8:48 UTC (permalink / raw)
  To: Francois Romieu; +Cc: ivecera, hayeswang, gregkh, netdev

On 28/07/11 09.04, Francois Romieu wrote:
> Kasper Dupont <kasperd@cpvhh.24.jul.2011.kasperd.net> :
> [...]
> > [    1.045727] pata_sch 0000:00:1f.1: setting latency timer to 64
> > [    1.045946] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> > [    1.046061] r8169 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
> > [    1.046201] r8169 0000:02:00.0: setting latency timer to 64
> > [    1.046257]   alloc irq_desc for 24 on node -1
> > [    1.046263]   alloc kstat_irqs on node -1
> > [    1.046284] r8169 0000:02:00.0: irq 24 for MSI/MSI-X
> > [    1.048097] eth0: RTL8168c/8111c at 0xf8076000, 00:01:c0:09:a1:25, XID 1c4000c0 IRQ 24
> 
> RTL_GIGA_MAC_VER_22
> 
> [...]
> > [    1.082178] eth1: RTL8168c/8111c at 0xf8096000, 00:01:c0:09:a1:26, XID 1c4000c0 IRQ 25
> 
> sic.
> 
> I miss it (the light fast crash prone motherboard from hell does not count).
> 
> [...]
> > It works on 2.6.32.32 it crashes on 2.6.32.33. I tried to
> > take 2.6.32.43 and apply 1519e57fe81c14bb8fa4855579f19264d1ef63b4,
> > that did not help. 2.6.32.43 crashes with and without that patch.
> 
> 1519e57fe81c14bb8fa4855579f19264d1ef63b4 does not help RTL_GIGA_MAC_VER_22
> proper but you may apply it, then move the 'case RTL_GIGA_MAC_VER_22:'
> statement a few lines below and see if it helps (assuming the fifo overflow
> event may be ignored):

I tried applying both 1519e57fe81c14bb8fa4855579f19264d1ef63b4
and f60ac8e7ab7cbb413a0131d5665b053f9f386526. It still crashes.
The first two times I booted that exact build, it printed a
stack dump just before crashing. I have pictures of the two
stack dumps in case they are of any help, but unfortunately they
were too long to fit on 25 lines.

> 
> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
> index 7d9c650..33c0ead 100644
> --- a/drivers/net/r8169.c
> +++ b/drivers/net/r8169.c
> @@ -5383,7 +5383,6 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>  			switch (tp->mac_version) {
>  			/* Work around for rx fifo overflow */
>  			case RTL_GIGA_MAC_VER_11:
> -			case RTL_GIGA_MAC_VER_22:
>  			case RTL_GIGA_MAC_VER_26:
>  				netif_stop_queue(dev);
>  				rtl8169_tx_timeout(dev);
> @@ -5393,6 +5392,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>  			case RTL_GIGA_MAC_VER_19:
>  			case RTL_GIGA_MAC_VER_20:
>  			case RTL_GIGA_MAC_VER_21:
> +			case RTL_GIGA_MAC_VER_22:
>  			case RTL_GIGA_MAC_VER_23:
>  			case RTL_GIGA_MAC_VER_24:
>  			case RTL_GIGA_MAC_VER_27:

I tried applying this one as well (in addition to the previous
two). It no longer crashes, but now the network stops working
after the first few packets have been transmitted.

What exactly was 649f25c389e9498923b459bbffff41a2fd1d7a64
trying to fix in the first place? Before that patch the
network on this machine was running fast and stable.

-- 
Kasper Dupont -- Rigtige mænd skriver deres egne backupprogrammer
#define _(_)"d.%.4s%."_"2s" /* This is my email address */
char*_="@2kaspner"_()"%03"_("4s%.")"t\n";printf(_+11,_+6,_,11,_+2,_+7,_+6);


* Re: r8169 driver crashes in 2.6.32.43
  2011-07-28  8:48       ` Kasper Dupont
@ 2011-07-28 10:58         ` Francois Romieu
  2011-07-28 11:43           ` Kasper Dupont
  0 siblings, 1 reply; 13+ messages in thread
From: Francois Romieu @ 2011-07-28 10:58 UTC (permalink / raw)
  To: Kasper Dupont; +Cc: ivecera, hayeswang, gregkh, netdev

Kasper Dupont <kasperd@cpvhh.24.jul.2011.kasperd.net> :
[...]
> I tried to apply both 1519e57fe81c14bb8fa4855579f19264d1ef63b4
> and f60ac8e7ab7cbb413a0131d5665b053f9f386526. It still crashes,
> the first two times I booted that exact build it printed out a
> stackdump just before crashing. I have pictures of the two
> stack dumps in case that is any help, but unfortunately they
> were too deep to fit on 25 lines.

It will probably not help but I can hardly tell without seeing them. :o)
It will be ok if you publish them somewhere.

Does something prevent you from enabling a different video mode?

[...]
> I tried applying this one as well (in addition to the previous
> two). It no longer crashes, but now the network stops working
> after the first few packets have been transmitted.

Thanks for testing.

> What exactly was 649f25c389e9498923b459bbffff41a2fd1d7a64
> trying to fix in the first place?
>
> Before that patch the network on this machine was running fast and stable.

It's frustrating.

Either the hardware did not experience Rx FIFO overflow internally or it
was able to recover from it without driver intervention. Ivan's hardware
seems to behave differently so blindly reverting
649f25c389e9498923b459bbffff41a2fd1d7a64 is not an option.

Can you add the crap below on top of the pre "no longer crashes, no network"
one (it will work on top of plain -git driver as well) ?

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 7d9c650..f9f2044 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -395,6 +395,7 @@ enum rtl_register_content {
 	/* InterruptStatusBits */
 	SYSErr		= 0x8000,
 	PCSTimeout	= 0x4000,
+	RxFIFOEmpty	= 0x0200,	/* 816x something only ? */
 	SWInt		= 0x0100,
 	TxDescUnavail	= 0x0080,
 	RxFIFOOver	= 0x0040,
@@ -5381,9 +5382,10 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 
 		if (unlikely(status & RxFIFOOver)) {
 			switch (tp->mac_version) {
+				int i;
+
 			/* Work around for rx fifo overflow */
 			case RTL_GIGA_MAC_VER_11:
-			case RTL_GIGA_MAC_VER_22:
 			case RTL_GIGA_MAC_VER_26:
 				netif_stop_queue(dev);
 				rtl8169_tx_timeout(dev);
@@ -5399,6 +5401,21 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 			case RTL_GIGA_MAC_VER_28:
 			case RTL_GIGA_MAC_VER_31:
 			/* Experimental science. Pktgen proof. */
+			case RTL_GIGA_MAC_VER_22:
+				netif_info(tp, drv, dev, "S: %08x\n", status);
+				for (i = 0; i < 4000000; i++) {
+					if (RTL_R16(IntrStatus) & RxFIFOEmpty) {
+						RTL_W16(IntrStatus, RxFIFOOver);
+						if (net_ratelimit()) {
+							netif_info(tp, drv, dev,
+								   "FEmp\n");
+						}
+						break;
+					}
+					udelay(10);
+				}
+				if ((i >= 4000000) && net_ratelimit())
+					netif_info(tp, drv, dev, "no FEmp\n");
 			case RTL_GIGA_MAC_VER_12:
 			case RTL_GIGA_MAC_VER_25:
 				if (status == RxFIFOOver)
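
A note on the cost of the polling loop in the patch above: with a limit of
4000000 iterations and udelay(10) per iteration, the worst case is roughly
40 seconds of busy-waiting inside the interrupt handler if RxFIFOEmpty never
shows up. The userspace sketch below only illustrates that bounded-poll
pattern and the arithmetic. poll_rx_fifo_empty(), worst_case_us() and the
fake_nic register model are hypothetical names made up for this illustration,
not driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_FIFO_OVER  0x0040u	/* value taken from the patch context */
#define RX_FIFO_EMPTY 0x0200u	/* experimental bit added by the patch */

/* Hypothetical stand-in for RTL_R16(IntrStatus). */
typedef uint16_t (*read_status_fn)(void *ctx);

/*
 * Poll until RxFIFOEmpty is seen or max_polls reads have been made.
 * Returns the number of unsuccessful reads before success, or -1 on timeout.
 */
long poll_rx_fifo_empty(read_status_fn read_status, void *ctx, long max_polls)
{
	long i;

	assert(read_status != NULL && max_polls >= 0);
	for (i = 0; i < max_polls; i++) {
		if (read_status(ctx) & RX_FIFO_EMPTY)
			return i;
		/* the driver patch calls udelay(10) at this point */
	}
	return -1;
}

/* Worst-case busy-wait in microseconds, ignoring the register read cost. */
unsigned long long worst_case_us(unsigned long long max_polls,
				 unsigned long long delay_us)
{
	return max_polls * delay_us;
}

/* Hypothetical fake register: reports "empty" after a number of reads. */
struct fake_nic {
	long reads_until_empty;
};

uint16_t fake_read_status(void *ctx)
{
	struct fake_nic *nic = ctx;

	if (nic->reads_until_empty > 0) {
		nic->reads_until_empty--;
		return (uint16_t)RX_FIFO_OVER;
	}
	return (uint16_t)(RX_FIFO_OVER | RX_FIFO_EMPTY);
}
```

worst_case_us(4000000, 10) gives 40000000 microseconds, so a much smaller
limit is probably enough for an experiment like this one.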


* Re: r8169 driver crashes in 2.6.32.43
  2011-07-28 10:58         ` Francois Romieu
@ 2011-07-28 11:43           ` Kasper Dupont
  2011-07-28 11:59             ` Kasper Dupont
  0 siblings, 1 reply; 13+ messages in thread
From: Kasper Dupont @ 2011-07-28 11:43 UTC (permalink / raw)
  To: Francois Romieu; +Cc: ivecera, hayeswang, gregkh, netdev

On 28/07/11 12.58, Francois Romieu wrote:
> Kasper Dupont <kasperd@cpvhh.24.jul.2011.kasperd.net> :
> [...]
> > I tried to apply both 1519e57fe81c14bb8fa4855579f19264d1ef63b4
> > and f60ac8e7ab7cbb413a0131d5665b053f9f386526. It still crashes,
> > the first two times I booted that exact build it printed out a
> > stackdump just before crashing. I have pictures of the two
> > stack dumps in case that is any help, but unfortunately they
> > were too deep to fit on 25 lines.
> 
> It will probably not help but I can hardly tell without seeing them. :o)
> It will be ok if you publish them somewhere.

http://kasperd.net/~kasperd/r8169/

> 
> Does something prevent you from enabling a different video mode?

I have had some problems getting the video drivers to behave
properly. I managed to get X to use the correct screen
resolution, but I haven't figured out how to get the text console
to do so.

Netconsole over the network interface we are trying to debug
probably isn't the way to go. Do you think it would be feasible
to run netconsole over wifi?

> 
> [...]
> > I tried applying this one as well (in addition to the previous
> > two). It no longer crashes, but now the network stops working
> > after the first few packets have been transmitted.
> 
> Thanks for testing.
> 
> > What exactly was 649f25c389e9498923b459bbffff41a2fd1d7a64
> > trying to fix in the first place?
> >
> > Before that patch the network on this machine was running fast and stable.
> 
> It's frustrating.
> 
> Either the hardware did not experience Rx FIFO overflow internally or it
> was able to recover from it without driver intervention. Ivan's hardware
> seems to behave differently so blindly reverting
> 649f25c389e9498923b459bbffff41a2fd1d7a64 is not an option.
> 
> Can you add the crap below on top of the pre "no longer crashes, no network"
> one (it will work on top of plain -git driver as well) ?

I'll give it a try.

-- 
Kasper Dupont -- Rigtige mænd skriver deres egne backupprogrammer
#define _(_)"d.%.4s%."_"2s" /* This is my email address */
char*_="@2kaspner"_()"%03"_("4s%.")"t\n";printf(_+11,_+6,_,11,_+2,_+7,_+6);


* Re: r8169 driver crashes in 2.6.32.43
  2011-07-28 11:43           ` Kasper Dupont
@ 2011-07-28 11:59             ` Kasper Dupont
  2011-07-28 12:23               ` Francois Romieu
  0 siblings, 1 reply; 13+ messages in thread
From: Kasper Dupont @ 2011-07-28 11:59 UTC (permalink / raw)
  To: Francois Romieu; +Cc: ivecera, hayeswang, gregkh, netdev

On 28/07/11 13.43, Kasper Dupont wrote:
> Netconsole over the network interface we are trying to debug
> probably isn't the way to go. Do you think it would be feasible
> to run netconsole over wifi?

But maybe the two interfaces are independent enough that I
can run netconsole over one while triggering a crash on the
other.

> > 
> > Can you add the crap below on top of the pre "no longer crashes, no network"
> > one (it will work on top of plain -git driver as well) ?
> 
> I'll give it a try.

It doesn't compile. It complains about netif_info and drv
not being declared.

-- 
Kasper Dupont -- Rigtige mænd skriver deres egne backupprogrammer
#define _(_)"d.%.4s%."_"2s" /* This is my email address */
char*_="@2kaspner"_()"%03"_("4s%.")"t\n";printf(_+11,_+6,_,11,_+2,_+7,_+6);


* Re: r8169 driver crashes in 2.6.32.43
  2011-07-28 11:59             ` Kasper Dupont
@ 2011-07-28 12:23               ` Francois Romieu
  2011-07-28 12:45                 ` Kasper Dupont
  0 siblings, 1 reply; 13+ messages in thread
From: Francois Romieu @ 2011-07-28 12:23 UTC (permalink / raw)
  To: Kasper Dupont; +Cc: ivecera, hayeswang, gregkh, netdev

Kasper Dupont <kasperd@cpvhh.24.jul.2011.kasperd.net> :
[...]
> But maybe the two interfaces are independent enough that I
> can run netconsole over one while triggering a crash on the
> other.

I am not exactly biased in favor of netconsole yet but go for
it if you are comfortable with it.

[...]
> It doesn't compile. It complains about netif_info and drv
> not being declared.

?

You can replace them with plain printk(KERN_{INFO/ERR} "S: ...).

What are you compiling the patch against ?

-- 
Ueimor


* Re: r8169 driver crashes in 2.6.32.43
  2011-07-28 12:23               ` Francois Romieu
@ 2011-07-28 12:45                 ` Kasper Dupont
  2011-07-28 12:54                   ` Kasper Dupont
  0 siblings, 1 reply; 13+ messages in thread
From: Kasper Dupont @ 2011-07-28 12:45 UTC (permalink / raw)
  To: Francois Romieu; +Cc: ivecera, hayeswang, gregkh, netdev

On 28/07/11 14.23, Francois Romieu wrote:
> You can replace them with plain printk(KERN_{INFO/ERR} "S: ...).

With that change it compiles. I will let you know shortly how it works out.

> 
> What are you compiling the patch against ?

2.6.32.43. I picked that version because ultimately I want
it fixed in Ubuntu 10.04, which also uses a 2.6.32 kernel.

-- 
Kasper Dupont -- Rigtige mænd skriver deres egne backupprogrammer
#define _(_)"d.%.4s%."_"2s" /* This is my email address */
char*_="@2kaspner"_()"%03"_("4s%.")"t\n";printf(_+11,_+6,_,11,_+2,_+7,_+6);


* Re: r8169 driver crashes in 2.6.32.43
  2011-07-28 12:45                 ` Kasper Dupont
@ 2011-07-28 12:54                   ` Kasper Dupont
  2011-07-28 14:47                     ` Francois Romieu
  0 siblings, 1 reply; 13+ messages in thread
From: Kasper Dupont @ 2011-07-28 12:54 UTC (permalink / raw)
  To: Francois Romieu; +Cc: ivecera, hayeswang, gregkh, netdev

On 28/07/11 14.45, Kasper Dupont wrote:
> On 28/07/11 14.23, Francois Romieu wrote:
> > You can replace them with plain printk(KERN_{INFO/ERR} "S: ...).
> 
> Then it compiles, will let you know shortly how it works out.

The last 24 lines before network stopped working:
S: 00000044
FEmp
S: 000000c0
FEmp
S: 00000044
FEmp
S: 00000040
FEmp
S: 00000044
FEmp
S: 00000041
FEmp
S: 000000c0
FEmp
S: 00000040
FEmp
S: 000000c0
FEmp
S: 000000c0
FEmp
S: 00000040
S: 000000c4
S: 00000044
S: 00000040
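
To make the S: values easier to read, here is a minimal userspace decoder,
assuming the interrupt status bit layout of the r8169 driver of that era.
SYSErr, PCSTimeout, SWInt, TxDescUnavail and RxFIFOOver appear in the patch
context earlier in the thread, and RxFIFOEmpty is the experimental bit the
patch adds. The six low bits (LinkChg down to RxOK) are written from memory
of drivers/net/r8169.c and should be checked against the tree being tested.
decode_intr_status() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct isr_bit {
	unsigned int mask;
	const char *name;
};

/* Assumed IntrStatus bit layout; verify against the driver source. */
static const struct isr_bit isr_bits[] = {
	{ 0x8000, "SYSErr" },
	{ 0x4000, "PCSTimeout" },
	{ 0x0200, "RxFIFOEmpty" },	/* experimental, from the debug patch */
	{ 0x0100, "SWInt" },
	{ 0x0080, "TxDescUnavail" },
	{ 0x0040, "RxFIFOOver" },
	{ 0x0020, "LinkChg" },
	{ 0x0010, "RxOverflow" },
	{ 0x0008, "TxErr" },
	{ 0x0004, "TxOK" },
	{ 0x0002, "RxErr" },
	{ 0x0001, "RxOK" },
};

/* Write the '|'-separated names of all set bits into buf and return buf. */
char *decode_intr_status(unsigned int status, char *buf, size_t len)
{
	size_t i, used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(isr_bits) / sizeof(isr_bits[0]); i++) {
		int n;

		if (!(status & isr_bits[i].mask))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "|" : "", isr_bits[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated, stop here */
		used += (size_t)n;
	}
	return buf;
}
```

With this table, 0x44 reads as RxFIFOOver|TxOK, 0xc0 as
TxDescUnavail|RxFIFOOver and 0x41 as RxFIFOOver|RxOK, so every line in
the log above carries RxFIFOOver.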

-- 
Kasper Dupont -- Rigtige mænd skriver deres egne backupprogrammer
#define _(_)"d.%.4s%."_"2s" /* This is my email address */
char*_="@2kaspner"_()"%03"_("4s%.")"t\n";printf(_+11,_+6,_,11,_+2,_+7,_+6);


* Re: r8169 driver crashes in 2.6.32.43
  2011-07-28 12:54                   ` Kasper Dupont
@ 2011-07-28 14:47                     ` Francois Romieu
  2011-07-28 21:01                       ` Kasper Dupont
  0 siblings, 1 reply; 13+ messages in thread
From: Francois Romieu @ 2011-07-28 14:47 UTC (permalink / raw)
  To: Kasper Dupont; +Cc: ivecera, hayeswang, gregkh, netdev

Kasper Dupont <kasperd@cpvhh.24.jul.2011.kasperd.net> :
> On 28/07/11 14.45, Kasper Dupont wrote:
> > On 28/07/11 14.23, Francois Romieu wrote:
> > > You can replace them with plain printk(KERN_{INFO/ERR} "S: ...).
> > 
> > Then it compiles, will let you know shortly how it works out.
> 
> The last 24 lines before network stopped working:
[...]
> S: 000000c0
> FEmp
> S: 000000c0
> FEmp
> S: 00000040
> S: 000000c4
> S: 00000044
> S: 00000040

Can you revert the last patch and apply the one below ?

Network traffic capture at the remote end of the link would be welcome,
especially ethernet MAC control frames.

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 7d9c650..b79fa86 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -287,6 +287,7 @@ enum rtl_registers {
 					/* Unlimited maximum PCI burst. */
 #define	RX_DMA_BURST			(7 << RXCFG_DMA_SHIFT)
 
+	TimerCount	= 0x48,
 	RxMissed	= 0x4c,
 	Cfg9346		= 0x50,
 	Config0		= 0x51,
@@ -395,6 +396,7 @@ enum rtl_register_content {
 	/* InterruptStatusBits */
 	SYSErr		= 0x8000,
 	PCSTimeout	= 0x4000,
+	RxFIFOEmpty	= 0x0200,	/* 816x something only ? */
 	SWInt		= 0x0100,
 	TxDescUnavail	= 0x0080,
 	RxFIFOOver	= 0x0040,
@@ -5110,8 +5112,9 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 
 	tp->cur_tx += frags + 1;
 
-	wmb();
+	mmiowb();
 
+	RTL_W32(TimerCount, 12500);
 	RTL_W8(TxPoll, NPQ);
 
 	if (TX_BUFFS_AVAIL(tp) < MAX_SKB_FRAGS) {
@@ -5222,15 +5225,6 @@ static void rtl8169_tx_interrupt(struct net_device *dev,
 		    (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)) {
 			netif_wake_queue(dev);
 		}
-		/*
-		 * 8168 hack: TxPoll requests are lost when the Tx packets are
-		 * too close. Let's kick an extra TxPoll request when a burst
-		 * of start_xmit activity is detected (if it is not detected,
-		 * it is slow enough). -- FR
-		 */
-		smp_rmb();
-		if (tp->cur_tx != dirty_tx)
-			RTL_W8(TxPoll, NPQ);
 	}
 }
 
@@ -5379,11 +5373,21 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 			break;
 		}
 
+		if (unlikely(status & PCSTimeout)) {
+			printk(KERN_INFO "%08lx %08x %08x %08x %08x %08x\n",
+			       jiffies, status, tp->cur_rx, tp->dirty_rx,
+			       tp->cur_tx, tp->dirty_tx);
+			smp_rmb();
+			if (tp->cur_tx != tp->dirty_tx)
+				RTL_W8(TxPoll, NPQ);
+		}
+
 		if (unlikely(status & RxFIFOOver)) {
 			switch (tp->mac_version) {
+				int i;
+
 			/* Work around for rx fifo overflow */
 			case RTL_GIGA_MAC_VER_11:
-			case RTL_GIGA_MAC_VER_22:
 			case RTL_GIGA_MAC_VER_26:
 				netif_stop_queue(dev);
 				rtl8169_tx_timeout(dev);
@@ -5399,6 +5403,21 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 			case RTL_GIGA_MAC_VER_28:
 			case RTL_GIGA_MAC_VER_31:
 			/* Experimental science. Pktgen proof. */
+			case RTL_GIGA_MAC_VER_22:
+				printk(KERN_INFO "%08lx %08x %08x %08x %08x %08x\n",
+				       jiffies, status,
+				       tp->cur_rx, tp->dirty_rx,
+				       tp->cur_tx, tp->dirty_tx);
+				for (i = 0; i < 4000000; i++) {
+					if (RTL_R16(IntrStatus) & RxFIFOEmpty) {
+						RTL_W16(IntrStatus, RxFIFOOver);
+						printk(KERN_INFO "FEmp %d\n", i);
+						break;
+					}
+					udelay(10);
+				}
+				if (i >= 4000000)
+					printk(KERN_ERR "no FEmp\n");
 			case RTL_GIGA_MAC_VER_12:
 			case RTL_GIGA_MAC_VER_25:
 				if (status == RxFIFOOver)


* Re: r8169 driver crashes in 2.6.32.43
  2011-07-28 14:47                     ` Francois Romieu
@ 2011-07-28 21:01                       ` Kasper Dupont
  2011-08-05 14:08                         ` Kasper Dupont
  0 siblings, 1 reply; 13+ messages in thread
From: Kasper Dupont @ 2011-07-28 21:01 UTC (permalink / raw)
  To: Francois Romieu; +Cc: ivecera, hayeswang, gregkh, netdev

On 28/07/11 16.47, Francois Romieu wrote:
> Kasper Dupont <kasperd@cpvhh.24.jul.2011.kasperd.net> :
> > On 28/07/11 14.45, Kasper Dupont wrote:
> > > On 28/07/11 14.23, Francois Romieu wrote:
> > > > You can replace them with plain printk(KERN_{INFO/ERR} "S: ...).
> > > 
> > > Then it compiles, will let you know shortly how it works out.
> > 
> > The last 24 lines before network stopped working:
> [...]
> > S: 000000c0
> > FEmp
> > S: 000000c0
> > FEmp
> > S: 00000040
> > S: 000000c4
> > S: 00000044
> > S: 00000040
> 
> Can you revert the last patch and apply the one below ?

I tested it and put a picture of the output in the same place
as the error messages from before. If necessary I can try to
copy the dmesg output back over the other network interface
(assuming it is still up when the first one stops working).

> 
> Network traffic capture at the remote end of the link would be welcome,
> especially ethernet MAC control frames.

I did a dump, but it showed nothing useful. The last packet
from the machine before the network stopped was a TCP ACK.
The sending machine sent three more packets before it slowed
down, then sent another three packets at lower speed and then
started sending ARP requests, which were not answered.

There were three separate switches on the path between the
two machines. If necessary I can try to set up a network
with just a direct link between two hosts. I am not sure
which control frames you are referring to. Will tcpdump
capture them by default?
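
If the control frames meant here are IEEE 802.3x flow-control (PAUSE)
frames, which is an assumption, they carry EtherType 0x8808 with opcode
0x0001 and are sent to 01:80:c2:00:00:01. Frames to that address are
link-local and are not forwarded by standard bridges, and many NICs consume
them in the MAC without handing them to the host, so a capture taken behind
three switches may well show none even if they are being sent. A minimal
sketch for spotting such frames in raw capture data, assuming untagged
Ethernet II framing; parse_pause_frame() is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN        14		/* dst MAC + src MAC + EtherType */
#define ETHERTYPE_MAC_CTRL 0x8808	/* IEEE 802.3 MAC Control */
#define MAC_CTRL_OP_PAUSE  0x0001	/* 802.3x PAUSE opcode */

/*
 * Returns 1 if the frame is a MAC Control PAUSE frame and, when quanta is
 * not NULL, stores the requested pause time (in units of 512 bit times).
 * Returns 0 for any other frame or for a frame that is too short.
 */
int parse_pause_frame(const uint8_t *frame, size_t len, uint16_t *quanta)
{
	uint16_t ethertype, opcode;

	assert(frame != NULL);
	if (len < ETH_HDR_LEN + 4)	/* need opcode and pause time */
		return 0;

	ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
	if (ethertype != ETHERTYPE_MAC_CTRL)
		return 0;

	opcode = (uint16_t)((frame[14] << 8) | frame[15]);
	if (opcode != MAC_CTRL_OP_PAUSE)
		return 0;

	if (quanta != NULL)
		*quanta = (uint16_t)((frame[16] << 8) | frame[17]);
	return 1;
}
```

On the capture side, the pcap filter expression "ether proto 0x8808" selects
the same EtherType, provided the NIC and driver pass such frames up at all.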

-- 
Kasper Dupont -- Rigtige mænd skriver deres egne backupprogrammer
#define _(_)"d.%.4s%."_"2s" /* This is my email address */
char*_="@2kaspner"_()"%03"_("4s%.")"t\n";printf(_+11,_+6,_,11,_+2,_+7,_+6);


* Re: r8169 driver crashes in 2.6.32.43
  2011-07-28 21:01                       ` Kasper Dupont
@ 2011-08-05 14:08                         ` Kasper Dupont
  2011-08-05 14:40                           ` Kasper Dupont
  0 siblings, 1 reply; 13+ messages in thread
From: Kasper Dupont @ 2011-08-05 14:08 UTC (permalink / raw)
  To: Francois Romieu; +Cc: ivecera, hayeswang, gregkh, netdev

I did a few more experiments. I took the unmodified
2.6.32.43 kernel and added printk statements to see when
it entered the interrupt handler and when it left it.

That way I was able to confirm that the system locked
up inside the interrupt handler.

Next I added printk statements to see how many times the
loop in the interrupt handler was run. It seemed that
when it locked up inside the handler it would run the
loop just two times and then lock up before leaving the
handler.

I added more printk statements to see which branches were
taken inside the loop. Unfortunately those printk
statements changed the timing enough that the crashes
were no longer as reproducible.

I saw a pattern repeating. It would take the stop queue
branch, then leave the handler, and while outside this
interrupt handler a message would appear about the
interface coming up again. It seems it was stopping the
queue much more frequently than it should.

After a few attempts I managed to get it to lock up again
with all the printk statements in place. What I found was
that at the beginning of the loop status was 0x85. It
would then call the napi event code. At the end of the
first iteration of the loop status was 0.

At that point it did not iterate through the loop again
and it did not leave the interrupt handler either. I'll
power cycle the machine and take a closer look at the
source to see what could possibly be happening at that
point.

I also did a bit of testing with the patches that cause
it to drop the network instead of crashing. With those I
am able to bring up the second interface and get data off
the machine for debugging, so if there is any debug info
you think would be useful in those cases, let me know.

-- 
Kasper Dupont -- Rigtige mænd skriver deres egne backupprogrammer
#define _(_)"d.%.4s%."_"2s" /* This is my email address */
char*_="@2kaspner"_()"%03"_("4s%.")"t\n";printf(_+11,_+6,_,11,_+2,_+7,_+6);


* Re: r8169 driver crashes in 2.6.32.43
  2011-08-05 14:08                         ` Kasper Dupont
@ 2011-08-05 14:40                           ` Kasper Dupont
  0 siblings, 0 replies; 13+ messages in thread
From: Kasper Dupont @ 2011-08-05 14:40 UTC (permalink / raw)
  To: Francois Romieu; +Cc: ivecera, hayeswang, gregkh, netdev

On 05/08/11 16.08, Kasper Dupont wrote:
> At that point it did not iterate through the loop again
> and it did not leave the interrupt handler either. I'll
> power cycle the machine and take a closer look at the
> source to see what could possibly be happening at that
> point.

I looked at the source around the place where that lockup
happened. There was absolutely no I/O or loops happening
between the two printk calls. It seemed the only real
candidate for a culprit responsible for that lockup was
the printk calls themselves.

Is it plausible that in 2.6.32.43 it is not safe to call
printk from within an interrupt handler?

I added some more printk statements in an attempt to find
out how it was possible for the code to lock up between
the end of the loop and the exit from the interrupt handler.

I wasn't able to reproduce the lockup in the same spot, but
instead I saw a lockup inside the loop in the branch where
it does netif_stop_queue.

Right now I suspect that the builds where I added printk
statements lock up due to the printk statements. But does
the plain 2.6.32.43 kernel then also lock up due to printk
statements in the interrupt handler, or is it something
else?

-- 
Kasper Dupont -- Rigtige mænd skriver deres egne backupprogrammer
#define _(_)"d.%.4s%."_"2s" /* This is my email address */
char*_="@2kaspner"_()"%03"_("4s%.")"t\n";printf(_+11,_+6,_,11,_+2,_+7,_+6);


