* r8169 OOPSen in rtl_rx
@ 2013-08-13 9:43 Peter Zijlstra
2013-08-13 21:15 ` Francois Romieu
0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2013-08-13 9:43 UTC (permalink / raw)
To: nic_swsd, romieu; +Cc: netdev
Hi r8169 people,
I've got an AMD x86_64 machine with two realtek NICs:
01:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)
I currently run a 3.10.0.6-based kernel on the machine and frequently
(several times a week) get OOPSen in the rtl_rx path. Now the horribly
sad part is that this machine doesn't (currently) have a working serial
line -- it's got pins on the board but I need to go hunt for an expansion
bracket for it :/
I recently added the RTL8111 (rev 6) card so that this machine could
do firewall duties (it was a general server using the RTL-8169 for a
long time before that and always ran without problems).
I have tried netconsole, but that's not working, which leads me to
believe it's the inward-facing NIC that's buggered -- which would be the
RTL-8169 (rev 10) -- pure speculation though, it could just crash hard
enough for nothing to really work anymore.
The video card also doesn't support 80x50/60 text modes, and
KMS/framebuffer didn't work either (as in, I get graphics-based text at
high res but OOPSen don't actually make it to the screen).
So all I've got to offer currently is a partial backtrace -- see
attached image. Partial transcribe:
? rtl8169_try_rx_copy.isra.77
rtl_rx
rtl8169_poll
net_rx_action
? get_vtime_delta
__do_softirq
irq_exit
do_IRQ
common_interrupt
? native_safe_halt
? rcu_eqs_enter_common.isra.48
default_idle
amd_e400_idle
arch_cpu_idle
cpu_idle_loop
...
I did look at the r8169 log between 3.10 and current head and there
wasn't anything obviously related to RX crashes so I haven't upgraded to
3.11-rc; if you think I should try please say so.
I'm also willing to try patches, but as said, reproduction can
take a few days -- although sometimes I'm 'lucky' and it crashes
multiple times a day :/
~ Peter
* Re: r8169 OOPSen in rtl_rx
From: Francois Romieu @ 2013-08-13 21:15 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: nic_swsd, netdev
Peter Zijlstra <peterz@infradead.org> :
[...]
> I've got an AMD x86_64 machine with two realtek NICs:
>
> 01:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)
Which XID (see kernel dmesg) ?
[...]
> So all I've got to offer currently is a partial backtrace -- see
> attached image.
(no attachment)
> Partial transcribe:
>
> ? rtl8169_try_rx_copy.isra.77
/me scratches head.
You may check that pkt_size is > 0, <= ETH_FRAME_LEN (no jumbo ?) and
see if it helps.
--
Ueimor
* Re: r8169 OOPSen in rtl_rx
From: Peter Zijlstra @ 2013-08-14 9:29 UTC (permalink / raw)
To: Francois Romieu; +Cc: nic_swsd, netdev
On Tue, Aug 13, 2013 at 11:15:34PM +0200, Francois Romieu wrote:
> Peter Zijlstra <peterz@infradead.org> :
> [...]
> > I've got an AMD x86_64 machine with two realtek NICs:
> >
> > 01:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
> > 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)
>
> Which XID (see kernel dmesg) ?
$ dmesg | grep -i xid
[ 1.706427] r8169 0000:01:08.0 eth0: RTL8110s at 0xffffc9000063ac00, 00:0e:2e:87:8b:70, XID 04000000 IRQ 18
[ 1.717012] r8169 0000:03:00.0 eth1: RTL8168e/8111e at 0xffffc90000646000, a0:f3:c1:00:74:a3, XID 0c200000 IRQ 43
> > So all I've got to offer currently is a partial backtrace -- see
> > attached image.
>
> (no attachment)
Oh, duh..
> > Partial transcribe:
> >
> > ? rtl8169_try_rx_copy.isra.77
>
> /me scratches head.
>
> You may check that pkt_size is > 0, <= ETH_FRAME_LEN (no jumbo ?) and
> see if it helps.
So eth0 runs at 1000Mb/s with MTU 1500; eth1 runs at 100Mb/s, also
with MTU 1500.
I'll try a kernel with the below. Hopefully the change in dumpstack_64.c
will avoid printing the 'process' stack and give a little more useful
information.
Will let you know.
Thanks
---
arch/x86/kernel/dumpstack_64.c | 7 ++++---
drivers/net/ethernet/realtek/r8169.c | 2 ++
2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index addb207..f76e98f 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -182,7 +182,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
stack = (unsigned long *) (irq_stack_end[-1]);
irq_stack_end = NULL;
ops->stack(data, "EOI");
- continue;
+ goto out;
}
}
break;
@@ -192,6 +192,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
* This handles the process stack:
*/
bp = ops->walk_stack(tinfo, stack, bp, ops, data, NULL, &graph);
+out:
put_cpu();
}
EXPORT_SYMBOL(dump_trace);
@@ -231,8 +232,8 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
pr_cont(" <EOI> ");
}
} else {
- if (((long) stack & (THREAD_SIZE-1)) == 0)
- break;
+ if (((long) stack & (THREAD_SIZE-1)) == 0)
+ break;
}
if (i && ((i % STACKSLOTS_PER_LINE) == 0))
pr_cont("\n");
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 393f961..76d1c18 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6185,6 +6185,8 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, u32 budget
else
pkt_size = status & 0x00003fff;
+ WARN_ON(!(pkt_size > 0 && pkt_size <= ETH_FRAME_LEN));
+
/*
* The driver does not support incoming fragmented
* frames. They are seen as a symptom of over-mtu
[-- Attachment #2: IMG_20130810_195601.jpg --]
[-- Type: image/jpeg, Size: 301515 bytes --]
* Re: r8169 OOPSen in rtl_rx
From: Peter Zijlstra @ 2013-08-14 9:52 UTC (permalink / raw)
To: Francois Romieu; +Cc: nic_swsd, netdev
On Wed, Aug 14, 2013 at 11:29:15AM +0200, Peter Zijlstra wrote:
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 393f961..76d1c18 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -6185,6 +6185,8 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, u32 budget
> else
> pkt_size = status & 0x00003fff;
>
> + WARN_ON(!(pkt_size > 0 && pkt_size <= ETH_FRAME_LEN));
> +
> /*
> * The driver does not support incoming fragmented
> * frames. They are seen as a symptom of over-mtu
OK, I changed that to:
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 393f961..81e0bf4 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6185,6 +6185,12 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, u32 budget
else
pkt_size = status & 0x00003fff;
+ if (!(pkt_size > 0 && pkt_size <= ETH_FRAME_LEN)) {
+ dev->stats.rx_dropped++;
+ printk("%s Funny sized packet: %d\n", dev->name, pkt_size);
+ goto release_descriptor;
+ }
+
/*
* The driver does not support incoming fragmented
* frames. They are seen as a symptom of over-mtu
* Re: r8169 OOPSen in rtl_rx
From: Peter Zijlstra @ 2013-09-05 15:20 UTC (permalink / raw)
To: Francois Romieu; +Cc: nic_swsd, netdev
On Wed, Aug 14, 2013 at 11:52:33AM +0200, Peter Zijlstra wrote:
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 393f961..81e0bf4 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -6185,6 +6185,12 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, u32 budget
> else
> pkt_size = status & 0x00003fff;
>
> + if (!(pkt_size > 0 && pkt_size <= ETH_FRAME_LEN)) {
> + dev->stats.rx_dropped++;
> + printk("%s Funny sized packet: %d\n", dev->name, pkt_size);
> + goto release_descriptor;
> + }
> +
> /*
> * The driver does not support incoming fragmented
> * frames. They are seen as a symptom of over-mtu
Yay, it triggered..
$ dmesg | awk '/Funny sized packet/ { t[$6]++ } END { for (i in t) {
printf "%d %d\n", t[i], i; } }' | sort -n
1 4237
1 4983
1 5811
1 6062
1 6594
2 10709
2 12073
2 9197
4 14624
4 14870
266 16364
dev->name is always the same: the internal NIC (eth0, RTL8110s).
When it happens the NIC stops working as every packet is mal-sized;
however, an ifconfig down; ifconfig up will restore it to working order.
It appears to happen when I saturate my outside link such that all
packets are forwarded to the internal network -- I've got a 30Mbit/s down
link, which isn't all that much given it's a GbE-capable card.
When I try to saturate the internal NIC with traffic from the firewall
to an internal machine, we reach GbE speeds but nothing falls over.
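For reference, the logged sizes can be rechecked offline. The following is
a user-space sketch, not driver code: the observed[] table is copied from
the awk output above, ETH_FRAME_LEN is 1514 as in <linux/if_ether.h>, and
RX_SIZE_MASK is the 0x00003fff mask from the patch. The names
size_is_sane(), count_packets() and print_report() are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define ETH_FRAME_LEN 1514u        /* max frame length without FCS */
#define RX_SIZE_MASK  0x00003fffu  /* size field mask used in rtl_rx() */

struct sample {
	unsigned int count;  /* how often this size was logged */
	uint32_t size;       /* logged pkt_size */
};

/* Copied from the "Funny sized packet" histogram in this message. */
static const struct sample observed[] = {
	{ 1, 4237 }, { 1, 4983 }, { 1, 5811 }, { 1, 6062 }, { 1, 6594 },
	{ 2, 10709 }, { 2, 12073 }, { 2, 9197 }, { 4, 14624 }, { 4, 14870 },
	{ 266, 16364 },
};
#define N_OBSERVED (sizeof(observed) / sizeof(observed[0]))

/* Same condition as the debug patch: 0 < pkt_size <= ETH_FRAME_LEN. */
int size_is_sane(uint32_t pkt_size)
{
	return pkt_size > 0 && pkt_size <= ETH_FRAME_LEN;
}

/* Sum the counts; with only_sane set, count only sizes passing the check. */
unsigned int count_packets(int only_sane)
{
	unsigned int n = 0;
	size_t i;

	for (i = 0; i < N_OBSERVED; i++) {
		/* Every logged value must fit the 14-bit size field. */
		assert(observed[i].size <= RX_SIZE_MASK);
		if (!only_sane || size_is_sane(observed[i].size))
			n += observed[i].count;
	}
	return n;
}

/* Print a one-line summary of the table. */
void print_report(void)
{
	printf("%u logged packets, %u within 1..%u bytes\n",
	       count_packets(0), count_packets(1), ETH_FRAME_LEN);
}
```

Every entry in the table is well above ETH_FRAME_LEN, and the most common
value, 16364, is 0x3fec, i.e. close to the 0x3fff maximum of the size field.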
* Re: r8169 OOPSen in rtl_rx
From: Francois Romieu @ 2013-09-05 23:09 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: nic_swsd, netdev
Peter Zijlstra <peterz@infradead.org> :
[...]
> Yay, it triggered..
Bingo.
Can you display the whole descriptor entry (opts1 and opts2) and its
index (cur_rx) when abnormal packets are detected ?
We can always check the packet size but I'd welcome some more specific
pattern in the remaining bits of the descriptor.
Btw, you may try to revert aee77e4accbeb2c86b1d294cd84fec4a12dde3bd
("r8169: use unlimited DMA burst for TX") and see if it changes the
Rx / Tx balance. It would only be a bandaid though.
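A rough idea of what such a dump could look like, as a user-space model
rather than a patch: the struct rx_desc_model type, the format_bad_desc()
helper and the message format are all hypothetical, and NUM_RX_DESC = 256
is an assumption about the ring size. In the driver itself the descriptor
fields are stored little-endian, so they would presumably need
le32_to_cpu() before printing.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define ETH_FRAME_LEN 1514u        /* max frame length without FCS */
#define RX_SIZE_MASK  0x00003fffu  /* size field mask from the patch */
#define NUM_RX_DESC   256u         /* assumption: Rx ring size */

/* Hypothetical stand-in for the two option words of an Rx descriptor. */
struct rx_desc_model {
	uint32_t opts1;
	uint32_t opts2;
};

/*
 * Format a debug line for a descriptor whose size field fails the
 * 0 < size <= ETH_FRAME_LEN check. Returns 0 (and leaves buf alone)
 * when the size is fine, otherwise the snprintf() return value.
 */
int format_bad_desc(char *buf, size_t len, const char *name,
		    uint32_t cur_rx, const struct rx_desc_model *d)
{
	uint32_t pkt_size;

	assert(buf != NULL && name != NULL && d != NULL);

	pkt_size = d->opts1 & RX_SIZE_MASK;
	if (pkt_size > 0 && pkt_size <= ETH_FRAME_LEN)
		return 0;

	return snprintf(buf, len,
			"%s: bad rx desc: cur_rx=%" PRIu32 " entry=%" PRIu32
			" opts1=%08" PRIx32 " opts2=%08" PRIx32
			" size=%" PRIu32,
			name, cur_rx, cur_rx % NUM_RX_DESC,
			d->opts1, d->opts2, pkt_size);
}
```

Logging the raw opts1/opts2 words next to the ring index is what would make
a pattern in the remaining descriptor bits visible, if there is one.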
--
Ueimor