All of lore.kernel.org
* r8169 OOPSen in rtl_rx
@ 2013-08-13  9:43 Peter Zijlstra
  2013-08-13 21:15 ` Francois Romieu
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2013-08-13  9:43 UTC (permalink / raw)
  To: nic_swsd, romieu; +Cc: netdev

Hi r8169 people,

I've got an AMD x86_64 machine with two realtek NICs:

01:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)

I currently run a 3.10.0.6-based kernel on the machine and frequently
(several times a week) get OOPSen in the rtl_rx path. Now the horribly
sad part is that this machine doesn't (currently) have a working serial
line -- it's got pins on the board but I need to go hunt for an expansion
bracket for it :/

I recently added the RTL8111 (rev 6) card so that this machine could
do firewall duties (it was a general server using the RTL-8169 for a
long time before that and always ran without problems).

I have tried netconsole, but that's not working, which leads me to
believe it's the inward-facing NIC that's buggered -- which would be the
RTL-8169 (rev 10). That's pure speculation though; it could just crash hard
enough for nothing to really work anymore.

The video card also doesn't support 80x50/60 text modes, and
KMS/framebuffer also didn't work (as in, I get graphics-based text at
high res, but OOPSen don't actually make it to the screen).

So all I've got to offer currently is a partial backtrace -- see
attached image. Partial transcribe:

  ? rtl8169_try_rx_copy.isra.77
  rtl_rx
  rtl8169_poll
  net_rx_action
  ? get_vtime_delta
  __do_softirq
  irq_exit
  do_IRQ
  common_interrupt
  ? native_safe_halt
  ? rcu_eqs_enter_common.isra.48
  default_idle
  amd_e400_idle
  arch_cpu_idle
  cpu_idle_loop
  ...

I did look at the r8169 log between 3.10 and current head, and there
wasn't anything obviously related to RX crashes, so I haven't upgraded to
3.11-rc; if you think I should try, please say so.

I'm also willing to try patches, although, as said, reproduction can
take a few days -- sometimes I'm 'lucky' and it crashes
multiple times a day :/

~ Peter

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: r8169 OOPSen in rtl_rx
  2013-08-13  9:43 r8169 OOPSen in rtl_rx Peter Zijlstra
@ 2013-08-13 21:15 ` Francois Romieu
  2013-08-14  9:29   ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Francois Romieu @ 2013-08-13 21:15 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: nic_swsd, netdev

Peter Zijlstra <peterz@infradead.org> :
[...]
> I've got an AMD x86_64 machine with two realtek NICs:
> 
> 01:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)

Which XID (see kernel dmesg) ?

[...]
> So all I've got to offer currently is a partial backtrace -- see
> attached image.

(no attachment)

> Partial transcribe:
> 
>   ? rtl8169_try_rx_copy.isra.77

/me scratches head.

You may check that pkt_size is > 0, <= ETH_FRAME_LEN (no jumbo ?) and
see if it helps.

-- 
Ueimor
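
A minimal userspace sketch of the check suggested above, assuming the received length sits in the low 14 bits of the descriptor status word (the `status & 0x00003fff` visible in the r8169 hunk further down) and assuming that, in the default path not shown in that hunk, the driver subtracts the 4-byte FCS before the size is used. The helpers `rx_pkt_size()` and `rx_pkt_size_ok()` are hypothetical names, not driver functions. ETH_FRAME_LEN (1514) is the 14-byte Ethernet header plus a 1500-byte MTU, so with jumbo frames off anything larger is suspect.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as in <linux/if_ether.h>. */
#define ETH_HLEN      14
#define ETH_DATA_LEN  1500
#define ETH_FRAME_LEN 1514   /* ETH_HLEN + ETH_DATA_LEN, excludes the FCS */
#define ETH_FCS_LEN   4

/* Low 14 bits of the status word hold the received length. */
#define RX_LEN_MASK   0x00003fffu

/*
 * Hypothetical helper: derive pkt_size from the status word.
 * strip_fcs models the (assumed) default branch that drops the 4-byte
 * FCS; strip_fcs == false matches the else-branch shown in the patch.
 * A tiny length with strip_fcs can go negative; the check below rejects it.
 */
static int rx_pkt_size(uint32_t status, bool strip_fcs)
{
	int len = (int)(status & RX_LEN_MASK);

	return strip_fcs ? len - ETH_FCS_LEN : len;
}

/* The sanity check from the thread: 0 < pkt_size <= ETH_FRAME_LEN. */
static bool rx_pkt_size_ok(int pkt_size)
{
	return pkt_size > 0 && pkt_size <= ETH_FRAME_LEN;
}
```

The flag bits in the upper part of the word are masked off, so only the length field matters. For example, a raw length field of 0x3ff0 (16368) becomes 16364 after the FCS subtraction and fails the check.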


* Re: r8169 OOPSen in rtl_rx
  2013-08-13 21:15 ` Francois Romieu
@ 2013-08-14  9:29   ` Peter Zijlstra
  2013-08-14  9:52     ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2013-08-14  9:29 UTC (permalink / raw)
  To: Francois Romieu; +Cc: nic_swsd, netdev

[-- Attachment #1: Type: text/plain, Size: 2937 bytes --]

On Tue, Aug 13, 2013 at 11:15:34PM +0200, Francois Romieu wrote:
> Peter Zijlstra <peterz@infradead.org> :
> [...]
> > I've got an AMD x86_64 machine with two realtek NICs:
> > 
> > 01:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
> > 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)
> 
> Which XID (see kernel dmesg) ?

$ dmesg | grep -i xid
[    1.706427] r8169 0000:01:08.0 eth0: RTL8110s at 0xffffc9000063ac00, 00:0e:2e:87:8b:70, XID 04000000 IRQ 18
[    1.717012] r8169 0000:03:00.0 eth1: RTL8168e/8111e at 0xffffc90000646000, a0:f3:c1:00:74:a3, XID 0c200000 IRQ 43

> > So all I've got to offer currently is a partial backtrace -- see
> > attached image.
> 
> (no attachment)

Oh, duh.. 

> > Partial transcribe:
> > 
> >   ? rtl8169_try_rx_copy.isra.77
> 
> /me scratches head.
> 
> You may check that pkt_size is > 0, <= ETH_FRAME_LEN (no jumbo ?) and
> see if it helps.

So eth0 runs at 1000Mb/s but has an MTU of 1500; eth1 runs at 100Mb/s,
also with an MTU of 1500.

I'll try a kernel with the below. Hopefully the change in dumpstack_64.c
will avoid printing the 'process' stack and give a little more useful
information.

Will let you know.

Thanks

---
 arch/x86/kernel/dumpstack_64.c       | 7 ++++---
 drivers/net/ethernet/realtek/r8169.c | 2 ++
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index addb207..f76e98f 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -182,7 +182,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 				stack = (unsigned long *) (irq_stack_end[-1]);
 				irq_stack_end = NULL;
 				ops->stack(data, "EOI");
-				continue;
+				goto out;
 			}
 		}
 		break;
@@ -192,6 +192,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	 * This handles the process stack:
 	 */
 	bp = ops->walk_stack(tinfo, stack, bp, ops, data, NULL, &graph);
+out:
 	put_cpu();
 }
 EXPORT_SYMBOL(dump_trace);
@@ -231,8 +232,8 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 				pr_cont(" <EOI> ");
 			}
 		} else {
-		if (((long) stack & (THREAD_SIZE-1)) == 0)
-			break;
+			if (((long) stack & (THREAD_SIZE-1)) == 0)
+				break;
 		}
 		if (i && ((i % STACKSLOTS_PER_LINE) == 0))
 			pr_cont("\n");
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 393f961..76d1c18 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6185,6 +6185,8 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, u32 budget
 			else
 				pkt_size = status & 0x00003fff;
 
+			WARN_ON(!(pkt_size > 0 && pkt_size <= ETH_FRAME_LEN));
+
 			/*
 			 * The driver does not support incoming fragmented
 			 * frames. They are seen as a symptom of over-mtu

[-- Attachment #2: IMG_20130810_195601.jpg --]
[-- Type: image/jpeg, Size: 301515 bytes --]


* Re: r8169 OOPSen in rtl_rx
  2013-08-14  9:29   ` Peter Zijlstra
@ 2013-08-14  9:52     ` Peter Zijlstra
  2013-09-05 15:20       ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2013-08-14  9:52 UTC (permalink / raw)
  To: Francois Romieu; +Cc: nic_swsd, netdev

On Wed, Aug 14, 2013 at 11:29:15AM +0200, Peter Zijlstra wrote:
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 393f961..76d1c18 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -6185,6 +6185,8 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, u32 budget
>  			else
>  				pkt_size = status & 0x00003fff;
>  
> +			WARN_ON(!(pkt_size > 0 && pkt_size <= ETH_FRAME_LEN));
> +
>  			/*
>  			 * The driver does not support incoming fragmented
>  			 * frames. They are seen as a symptom of over-mtu

OK, I changed that to:

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 393f961..81e0bf4 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6185,6 +6185,12 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, u32 budget
 			else
 				pkt_size = status & 0x00003fff;
 
+			if (!(pkt_size > 0 && pkt_size <= ETH_FRAME_LEN)) {
+				dev->stats.rx_dropped++;
+				printk("%s Funny sized packet: %d\n", dev->name, pkt_size);
+				goto release_descriptor;
+			}
+
 			/*
 			 * The driver does not support incoming fragmented
 			 * frames. They are seen as a symptom of over-mtu


* Re: r8169 OOPSen in rtl_rx
  2013-08-14  9:52     ` Peter Zijlstra
@ 2013-09-05 15:20       ` Peter Zijlstra
  2013-09-05 23:09         ` Francois Romieu
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2013-09-05 15:20 UTC (permalink / raw)
  To: Francois Romieu; +Cc: nic_swsd, netdev

On Wed, Aug 14, 2013 at 11:52:33AM +0200, Peter Zijlstra wrote:
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 393f961..81e0bf4 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -6185,6 +6185,12 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, u32 budget
>  			else
>  				pkt_size = status & 0x00003fff;
>  
> +			if (!(pkt_size > 0 && pkt_size <= ETH_FRAME_LEN)) {
> +				dev->stats.rx_dropped++;
> +				printk("%s Funny sized packet: %d\n", dev->name, pkt_size);
> +				goto release_descriptor;
> +			}
> +
>  			/*
>  			 * The driver does not support incoming fragmented
>  			 * frames. They are seen as a symptom of over-mtu

Yay, it triggered..

$ dmesg | awk '/Funny sized packet/ { t[$6]++ } END { for (i in t) {
printf "%d %d\n", t[i], i; } }' | sort -n
1 4237
1 4983
1 5811
1 6062
1 6594
2 10709
2 12073
2 9197
4 14624
4 14870
266 16364

dev->name is always the same: the internal NIC (eth0, RTL8110s).

When it happens, the NIC stops working because every packet is mis-sized;
however, an ifconfig down; ifconfig up restores it to working order.

It appears to happen when I saturate my outside link such that all
packets are forwarded to the internal network -- I've got a 30Mbit/s
downlink, which isn't all that much given it's a GbE-capable card.

When I try to saturate the internal NIC with traffic from the firewall
to an internal machine, we reach GbE speeds but nothing falls over.
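
For anyone repeating this measurement, a C sketch of the same tally as the awk one-liner above. It assumes log lines contain the debug printk's text ("<name> Funny sized packet: <n>"); `parse_funny_size()`, `tally_funny_sizes()` and `print_tally()` are hypothetical names. Matching on the message text rather than on a field number avoids depending on how many spaces the printk timestamp happens to contain.

```c
#include <stdio.h>
#include <string.h>

/* pkt_size comes from a 14-bit field, so valid values are below 16384. */
#define MAX_SIZE 0x4000

/*
 * Extract the size from a line produced by the debug printk
 * "%s Funny sized packet: %d\n". Returns 1 on success, 0 otherwise.
 */
static int parse_funny_size(const char *line, int *size)
{
	static const char tag[] = "Funny sized packet: ";
	const char *p = strstr(line, tag);

	if (!p)
		return 0;
	return sscanf(p + sizeof(tag) - 1, "%d", size) == 1;
}

/*
 * Count sizes read from a stream of dmesg lines (like t[$6]++ in awk).
 * Values outside 0..MAX_SIZE-1 (e.g. negative ones) are skipped here.
 */
static void tally_funny_sizes(FILE *in, unsigned counts[MAX_SIZE])
{
	char line[512];
	int size;

	while (fgets(line, sizeof(line), in))
		if (parse_funny_size(line, &size) && size >= 0 && size < MAX_SIZE)
			counts[size]++;
}

/* Print "count size" pairs in ascending size; pipe through sort -n. */
static void print_tally(FILE *out, const unsigned counts[MAX_SIZE])
{
	for (int i = 0; i < MAX_SIZE; i++)
		if (counts[i])
			fprintf(out, "%u %d\n", counts[i], i);
}
```

Usage would be along the lines of `dmesg | ./tally | sort -n`, with a `main()` that zero-initializes a counts array, calls `tally_funny_sizes(stdin, counts)` and then `print_tally(stdout, counts)`.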


* Re: r8169 OOPSen in rtl_rx
  2013-09-05 15:20       ` Peter Zijlstra
@ 2013-09-05 23:09         ` Francois Romieu
  0 siblings, 0 replies; 6+ messages in thread
From: Francois Romieu @ 2013-09-05 23:09 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: nic_swsd, netdev

Peter Zijlstra <peterz@infradead.org> :
[...]
> Yay, it triggered..

Bingo.

Can you display the whole descriptor entry (opts1 and opts2) and its
index (cur_rx) when abnormal packets are detected ?
We can always check the packet size but I'd welcome some more specific
pattern in the remaining bits of the descriptor.

Btw, you may try to revert aee77e4accbeb2c86b1d294cd84fec4a12dde3bd
("r8169: use unlimited DMA burst for TX") and see if it changes the
Rx / Tx balance. It would only be a bandaid though.

-- 
Ueimor
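
A sketch of the kind of dump asked for above, written as a userspace helper so it can be tested outside the driver. The descriptor layout (two 32-bit option words followed by a 64-bit buffer address), the flag-bit positions, and the 256-entry ring size are assumptions modeled on the driver's `struct RxDesc`, `DescOwn`/`RingEnd`/`FirstFrag`/`LastFrag` and `NUM_RX_DESC`; check them against your tree. `format_rx_desc()` is a hypothetical name. Inside the driver the words are little-endian and would need `le32_to_cpu()` first, and the message would more naturally go through `netdev_warn()`.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed RX descriptor layout, values already in CPU byte order. */
struct rx_desc {
	uint32_t opts1;   /* ownership/ring-end/fragment bits and length */
	uint32_t opts2;   /* chip-dependent extra status (VLAN, checksum) */
	uint64_t addr;    /* DMA address of the receive buffer */
};

/* Assumed bit positions and ring size; verify against r8169.c. */
#define DESC_OWN    (1u << 31)
#define RING_END    (1u << 30)
#define FIRST_FRAG  (1u << 29)
#define LAST_FRAG   (1u << 28)
#define NUM_RX_DESC 256u

/*
 * Hypothetical helper: format the ring index, both option words, the
 * raw 14-bit length and the decoded flag bits into buf.
 * Returns the snprintf() result.
 */
static int format_rx_desc(char *buf, size_t len, unsigned int cur_rx,
			  const struct rx_desc *d)
{
	uint32_t o1 = d->opts1;

	return snprintf(buf, len,
			"entry %u opts1 %08x opts2 %08x len %u%s%s%s%s",
			cur_rx % NUM_RX_DESC, (unsigned)o1, (unsigned)d->opts2,
			(unsigned)(o1 & 0x3fffu),
			(o1 & DESC_OWN) ? " OWN" : "",
			(o1 & RING_END) ? " END" : "",
			(o1 & FIRST_FRAG) ? " FS" : "",
			(o1 & LAST_FRAG) ? " LS" : "");
}
```

Printing the raw words in hex, rather than only the decoded length, keeps every remaining bit visible, which is what is needed to spot a pattern beyond the bogus size.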

