* r8169 hard-freezes the system on big network loads
@ 2011-08-14 11:08 Kjun Chen
2011-08-21 12:33 ` Francois Romieu
0 siblings, 1 reply; 10+ messages in thread
From: Kjun Chen @ 2011-08-14 11:08 UTC (permalink / raw)
To: netdev; +Cc: romieu, nic_swsd
Hi,
as I have mentioned to linux-kernel, this is perfectly reproducible: receiving
70 MB/s or more freezes my laptop (Dell Vostro, amd64, 6 GB RAM, 8x Intel Core
i7 CPU Q 740 @ 1.73GHz) completely, sometimes within seconds, sometimes only
after a minute.
Watching the normal console I get loads of
r8169 0000:13:00.0: eth0: link up
r8169 0000:13:00.0: eth0: link up
[...]
one message about every 1-2 seconds (sometimes even 2 per second) while
network is active on 2.6.37.6. Up to the latest kernel (3.0.1) this freeze
happens. However, 2.6.32.28 works with no problems, and it doesn't show those
"eth0: link up" messages. I haven't tried kernels between .32 and .37.
lspci says:
13:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI
Express Gigabit Ethernet controller (rev 03)
The workaround for 2.6.37 and above: use the r8168 module from the Realtek
website. I have tested it with >30 GB at rates of 112 MB/s and experienced no
more freezes.
If you need any other information or help, please let me know.
cheers,
Michael
--
Sambodha: The Return of True Self-Knowledge
* Re: r8169 hard-freezes the system on big network loads
2011-08-14 11:08 r8169 hard-freezes the system on big network loads Kjun Chen
@ 2011-08-21 12:33 ` Francois Romieu
2011-08-21 13:20 ` Michael Brade
0 siblings, 1 reply; 10+ messages in thread
From: Francois Romieu @ 2011-08-21 12:33 UTC (permalink / raw)
To: Kjun Chen; +Cc: netdev, nic_swsd, Michael Brade
(Michael, please don't use the e-mail address of your Solar meditation teacher)
Kjun Chen <kjun-chen@sambodha.org> :
[...]
> If you need any other information or help, please let me know.
The XID line included in any recent kernel dmesg by the r8169 driver would
be welcome.
Thanks.
--
Ueimor
* Re: r8169 hard-freezes the system on big network loads
2011-08-21 12:33 ` Francois Romieu
@ 2011-08-21 13:20 ` Michael Brade
2011-08-21 22:11 ` Francois Romieu
0 siblings, 1 reply; 10+ messages in thread
From: Michael Brade @ 2011-08-21 13:20 UTC (permalink / raw)
To: Francois Romieu; +Cc: netdev, nic_swsd
On Sunday 21 August 2011 14:33:11 you wrote:
> (Michael, please don't use the e-mail address of your Solar meditation
> teacher)
Gee...?! That was an accident (and I thought I knew what I was doing...)
> > If you need any other information or help, please let me know.
>
> The XID line included in any recent kernel dmesg by the r8169 driver would
> be welcome.
r8169 0000:13:00.0: eth0: RTL8168d/8111d at 0xffffc90000c72000,
f0:4d:a2:b8:ce:62, XID 083000c0 IRQ 52
hope that helps,
Michael
* Re: r8169 hard-freezes the system on big network loads
2011-08-21 13:20 ` Michael Brade
@ 2011-08-21 22:11 ` Francois Romieu
2011-08-23 13:17 ` Francois Romieu
0 siblings, 1 reply; 10+ messages in thread
From: Francois Romieu @ 2011-08-21 22:11 UTC (permalink / raw)
To: Michael Brade; +Cc: netdev, nic_swsd
Michael Brade <brade@informatik.uni-muenchen.de> :
[...]
> r8169 0000:13:00.0: eth0: RTL8168d/8111d at 0xffffc90000c72000,
> f0:4d:a2:b8:ce:62, XID 083000c0 IRQ 52
RTL_GIGA_MAC_VER_26
> hope that helps,
Yes. There is enough data for me to reproduce the bug with the
exact same chipset.
--
Ueimor
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: r8169 hard-freezes the system on big network loads
2011-08-21 22:11 ` Francois Romieu
@ 2011-08-23 13:17 ` Francois Romieu
2011-09-11 20:16 ` Michael Brade
0 siblings, 1 reply; 10+ messages in thread
From: Francois Romieu @ 2011-08-23 13:17 UTC (permalink / raw)
To: Michael Brade; +Cc: netdev, nic_swsd
Francois Romieu <romieu@fr.zoreil.com> :
[...]
> Yes. There is enough data for me to reproduce the bug with the
> exact same chipset.
I can not generate a single rx error and the driver refuses to crash :o/
Can you apply the patch below on top of 3.1.0-rc3 and see if it makes
a difference ?
Thanks.
diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 02339b3..c54ed17 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -5326,10 +5326,6 @@ static int rtl8169_rx_interrupt(struct net_device *dev,
dev->stats.rx_length_errors++;
if (status & RxCRC)
dev->stats.rx_crc_errors++;
- if (status & RxFOVF) {
- rtl8169_schedule_work(dev, rtl8169_reset_task);
- dev->stats.rx_fifo_errors++;
- }
rtl8169_mark_to_asic(desc, rx_buf_sz);
} else {
struct sk_buff *skb;
* Re: r8169 hard-freezes the system on big network loads
2011-08-23 13:17 ` Francois Romieu
@ 2011-09-11 20:16 ` Michael Brade
2011-09-13 8:11 ` Francois Romieu
0 siblings, 1 reply; 10+ messages in thread
From: Michael Brade @ 2011-09-11 20:16 UTC (permalink / raw)
To: Francois Romieu; +Cc: netdev, nic_swsd
On Tuesday 23 August 2011 15:17:26 Francois Romieu wrote:
> Francois Romieu <romieu@fr.zoreil.com> :
> [...]
>
> > Yes. There is enough data for me to reproduce the bug with the
> > exact same chipset.
>
> I can not generate a single rx error and the driver refuses to crash :o/
>
> Can you apply the patch below on top of 3.1.0-rc3 and see if it makes
> a difference ?
Sorry for the delay, I have had only two days for email in the last few weeks
and additionally kernel.org was and still is down.
Does it have to be 3.1.0-rc3 or is 3.0.1 ok as well? If so, I have more bad
news: 3.0.1 still crashes with this patch. It took me a lot longer to crash it
but eventually it did happen. Not sure why it took longer, I guess I didn't
generate enough throughput.
If you want me to use 3.1.0 then we'll have to wait until git.kernel.org is
back...
thanks,
Michael
> Thanks.
>
> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
> index 02339b3..c54ed17 100644
> --- a/drivers/net/r8169.c
> +++ b/drivers/net/r8169.c
> @@ -5326,10 +5326,6 @@ static int rtl8169_rx_interrupt(struct net_device *dev,
> dev->stats.rx_length_errors++;
> if (status & RxCRC)
> dev->stats.rx_crc_errors++;
> - if (status & RxFOVF) {
> - rtl8169_schedule_work(dev, rtl8169_reset_task);
> - dev->stats.rx_fifo_errors++;
> - }
> rtl8169_mark_to_asic(desc, rx_buf_sz);
> } else {
> struct sk_buff *skb;
* Re: r8169 hard-freezes the system on big network loads
2011-09-11 20:16 ` Michael Brade
@ 2011-09-13 8:11 ` Francois Romieu
2011-09-14 21:36 ` Michael Brade
0 siblings, 1 reply; 10+ messages in thread
From: Francois Romieu @ 2011-09-13 8:11 UTC (permalink / raw)
To: Michael Brade; +Cc: netdev, nic_swsd, Hayes
[-- Attachment #1: Type: text/plain, Size: 3204 bytes --]
Michael Brade <brade@informatik.uni-muenchen.de> :
[...]
> Does it have to be 3.1.0-rc3 or is 3.0.1 ok as well ?
:o(
Almost any release may exhibit the bug. The attached patch (#0003)
should be a better candidate as an official fix though.
> If so, I have more bad news: 3.0.1 still crashes with this patch.
> It took me a lot longer to crash it but eventually it did happen.
> Not sure why it took longer, I guess I didn't generate enough throughput.
It sure sucks from a user experience viewpoint but it is not _that_ bad.
Are the symptoms in any way different or do you still notice more-or-less
periodic link-up messages and no real network traffic ?
> If you want me to use 3.1.0 then we'll have to wait until git.kernel.org is
> back...
https://github.com/torvalds/linux.git is available in the meantime.
You will want the patch below as well if you try 3.1-rc6.
[PATCH] r8169: don't reset software ring indexes after disabling hardware Rx.
Bad things happen when the driver resets ring indexes after disabling
hardware Rx (and Tx) in the RxFIFOOver event recovery path of the irq
handler while it races with the NAPI Rx processing method.
Ring indexes init is now done before enabling hardware Rx / Tx.
NB: this is not a straight candidate for -stable since it is coupled
with commit 92fc43b4159b518f5baae57301f26d770b0834c9 (July 11).
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Hayes <hayeswang@realtek.com>
---
drivers/net/r8169.c | 14 ++++++++------
1 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 05566b1..22b9c7a 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -717,7 +717,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
struct net_device *dev);
static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance);
static int rtl8169_init_ring(struct net_device *dev);
-static void rtl_hw_start(struct net_device *dev);
+static void rtl_start(struct net_device *dev);
static int rtl8169_close(struct net_device *dev);
static void rtl_set_rx_mode(struct net_device *dev);
static void rtl8169_tx_timeout(struct net_device *dev);
@@ -3589,8 +3589,6 @@ static void rtl_hw_reset(struct rtl8169_private *tp)
break;
udelay(100);
}
-
- rtl8169_init_ring_indexes(tp);
}
static int __devinit
@@ -3948,7 +3946,7 @@ static int rtl8169_open(struct net_device *dev)
rtl_pll_power_up(tp);
- rtl_hw_start(dev);
+ rtl_start(dev);
tp->saved_wolopts = 0;
pm_runtime_put_noidle(&pdev->dev);
@@ -4014,10 +4012,14 @@ static void rtl_set_rx_tx_config_registers(struct rtl8169_private *tp)
(InterFrameGap << TxInterFrameGapShift));
}
-static void rtl_hw_start(struct net_device *dev)
+static void rtl_start(struct net_device *dev)
{
struct rtl8169_private *tp = netdev_priv(dev);
+ rtl8169_init_ring_indexes(tp);
+
+ smp_mb();
+
tp->hw_start(dev);
netif_start_queue(dev);
@@ -4997,7 +4999,7 @@ static void rtl8169_reset_task(struct work_struct *work)
rtl8169_tx_clear(tp);
rtl8169_hw_reset(tp);
- rtl_hw_start(dev);
+ rtl_start(dev);
netif_wake_queue(dev);
rtl8169_check_link_status(dev, tp, tp->mmio_addr);
--
1.7.6
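The ordering that the commit message above describes (ring indexes initialised first, then a full barrier, then the hardware enabled) can be sketched outside the kernel as follows. This is only an illustration under assumptions, not the driver code: `struct ring`, its field names, the `hw_enabled` flag and `ring_start` are hypothetical, and a C11 sequentially consistent fence stands in for the kernel's `smp_mb()`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the software ring state kept by a NIC driver. */
struct ring {
	unsigned cur_rx, dirty_rx;
	unsigned cur_tx, dirty_tx;
	atomic_bool hw_enabled;   /* stands in for "hardware Rx/Tx is running" */
};

/* Reset the indexes BEFORE the hardware is (re)enabled, as the patch does. */
static void ring_start(struct ring *r)
{
	r->cur_rx = r->dirty_rx = 0;
	r->cur_tx = r->dirty_tx = 0;

	/* Full fence: the index writes are ordered before the enable below. */
	atomic_thread_fence(memory_order_seq_cst);

	atomic_store(&r->hw_enabled, true);
}
```

The point of the ordering is that any path that observes the hardware as enabled also observes freshly initialised indexes, instead of having them reset underneath it after Rx has been disabled.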
[-- Attachment #2: 0003-r8169-remove-erroneous-processing-of-always-set-bit.patch --]
[-- Type: text/plain, Size: 1589 bytes --]
From 44071c614418d9cae2faab8307307578d104065b Mon Sep 17 00:00:00 2001
From: Francois Romieu <romieu@fr.zoreil.com>
Date: Thu, 25 Aug 2011 18:47:24 +0200
Subject: [PATCH 3/3] r8169: remove erroneous processing of always set bit.
When set, RxFOVF (resp. RxBOVF) is always 1 (resp. 0).
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Hayes <hayeswang@realtek.com>
---
drivers/net/r8169.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)
diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 22b9c7a..19b91a8 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -407,6 +407,7 @@ enum rtl_register_content {
RxOK = 0x0001,
/* RxStatusDesc */
+ RxBOVF = (1 << 24),
RxFOVF = (1 << 23),
RxRWT = (1 << 22),
RxRES = (1 << 21),
@@ -682,6 +683,7 @@ struct rtl8169_private {
struct mii_if_info mii;
struct rtl8169_counters counters;
u32 saved_wolopts;
+ u32 opts1_mask;
struct rtl_fw {
const struct firmware *fw;
@@ -3782,6 +3784,9 @@ rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
tp->intr_event = cfg->intr_event;
tp->napi_event = cfg->napi_event;
+ tp->opts1_mask = (tp->mac_version != RTL_GIGA_MAC_VER_01) ?
+ ~(RxBOVF | RxFOVF) : ~0;
+
init_timer(&tp->timer);
tp->timer.data = (unsigned long) dev;
tp->timer.function = rtl8169_phy_timer;
@@ -5323,7 +5328,7 @@ static int rtl8169_rx_interrupt(struct net_device *dev,
u32 status;
rmb();
- status = le32_to_cpu(desc->opts1);
+ status = le32_to_cpu(desc->opts1) & tp->opts1_mask;
if (status & DescOwn)
break;
--
1.7.6
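As a standalone illustration of the masking idea in patch 0003 (again a sketch, not the driver code): the two bit positions are copied from the patch, while `opts1_mask_for`, `masked_status` and the `is_mac_ver_01` parameter are hypothetical names standing in for the `tp->mac_version != RTL_GIGA_MAC_VER_01` test and the `desc->opts1 & tp->opts1_mask` expression.

```c
#include <stdint.h>

/* Bit positions as defined in the patch above. */
#define RxBOVF (UINT32_C(1) << 24)
#define RxFOVF (UINT32_C(1) << 23)

/* Hypothetical helper: mirrors how the patch selects opts1_mask.
 * Only the first MAC version keeps the two overflow bits visible. */
static uint32_t opts1_mask_for(int is_mac_ver_01)
{
	return is_mac_ver_01 ? ~UINT32_C(0) : ~(RxBOVF | RxFOVF);
}

/* Hypothetical helper: apply the mask to a CPU-endian descriptor word. */
static uint32_t masked_status(uint32_t opts1, uint32_t mask)
{
	return opts1 & mask;
}
```

With the mask applied, a later `status & RxFOVF` test can only be true on a chip whose mask keeps that bit, so the reset path guarded by that test is never scheduled on the other chips.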
* Re: r8169 hard-freezes the system on big network loads
2011-09-13 8:11 ` Francois Romieu
@ 2011-09-14 21:36 ` Michael Brade
2011-09-15 0:03 ` Francois Romieu
0 siblings, 1 reply; 10+ messages in thread
From: Michael Brade @ 2011-09-14 21:36 UTC (permalink / raw)
To: Francois Romieu; +Cc: netdev, nic_swsd, Hayes
On Tuesday 13 September 2011 10:11:26 you wrote:
> Michael Brade <brade@informatik.uni-muenchen.de> :
> [...]
>
> > Does it have to be 3.1.0-rc3 or is 3.0.1 ok as well ?
> :o(
>
> Almost any release may exhibit the bug. The attached patch (#0003)
> should be a better candidate as an official fix though.
ok, good news: I did not experience any freeze anymore even though I
transferred 60 GB. And I applied both of your patches and also removed:
- if (status & RxFOVF) {
- rtl8169_schedule_work(dev, rtl8169_reset_task);
- dev->stats.rx_fifo_errors++;
- }
> > If so, I have more bad news: 3.0.1 still crashes with this patch.
> > It took me a lot longer to crash it but eventually it did happen.
> > Not sure why it took longer, I guess I didn't generate enough throughput.
>
> It sure sucks from a user experience viewpoint but it is not _that_ bad.
I disagree - I actually lose data because I mount my data and backups with
iSCSI and exactly then it crashes.
> Are the symptoms in any way different or do you still notice more-or-less
> periodic link-up messages and no real network traffic ?
dmesg looks like this:
[ 1611.380420] r8169 0000:13:00.0: eth0: link up
[ 1611.995417] r8169 0000:13:00.0: eth0: link up
[ 1612.323050] r8169 0000:13:00.0: eth0: link up
[ 1612.574016] r8169 0000:13:00.0: eth0: link up
[ 1613.450630] r8169 0000:13:00.0: eth0: link up
[ 1613.929383] r8169 0000:13:00.0: eth0: link up
[ 1614.950939] r8169 0000:13:00.0: eth0: link up
[ 1615.699660] r8169 0000:13:00.0: eth0: link up
[ 1616.005507] r8169 0000:13:00.0: eth0: link up
[ 1616.746199] r8169 0000:13:00.0: eth0: link up
[ 1617.879670] r8169 0000:13:00.0: eth0: link up
[ 1618.461433] r8169 0000:13:00.0: eth0: link up
so yes but what do you mean with "no real network traffic"? I still get
100 MB/s.
cheers,
Michael
* Re: r8169 hard-freezes the system on big network loads
2011-09-14 21:36 ` Michael Brade
@ 2011-09-15 0:03 ` Francois Romieu
2011-09-15 10:26 ` Michael Brade
0 siblings, 1 reply; 10+ messages in thread
From: Francois Romieu @ 2011-09-15 0:03 UTC (permalink / raw)
To: Michael Brade; +Cc: netdev, nic_swsd, Hayes
Michael Brade <brade@informatik.uni-muenchen.de> :
[...]
> ok, good news: I did not experience any freeze anymore even though I
> transferred 60 GB. And I applied both of your patches and also removed:
>
> - if (status & RxFOVF) {
> - rtl8169_schedule_work(dev, rtl8169_reset_task);
> - dev->stats.rx_fifo_errors++;
> - }
It should not be necessary to remove this part : the status mask is
supposed to take care of it. One of my patches is wrong if this part
needs to go away.
[...]
> > Are the symptoms in any way different or do you still notice more-or-less
> > periodic link-up messages and no real network traffic ?
>
> dmesg looks like this:
>
> [ 1611.380420] r8169 0000:13:00.0: eth0: link up
> [ 1611.995417] r8169 0000:13:00.0: eth0: link up
> [ 1612.323050] r8169 0000:13:00.0: eth0: link up
> [ 1612.574016] r8169 0000:13:00.0: eth0: link up
I will have to figure out why there are so many of these messages.
[...]
> so yes but what do you mean with "no real network traffic"? I still get
> 100 MB/s.
100 MB/s as 100 Mbyte/s on a gigabit link or 100 Mbit/s on a {gigabit / fast}
ethernet link ?
Thanks.
--
Ueimor
* Re: r8169 hard-freezes the system on big network loads
2011-09-15 0:03 ` Francois Romieu
@ 2011-09-15 10:26 ` Michael Brade
0 siblings, 0 replies; 10+ messages in thread
From: Michael Brade @ 2011-09-15 10:26 UTC (permalink / raw)
To: Francois Romieu; +Cc: netdev, nic_swsd, Hayes
On Thursday 15 September 2011 02:03:32 Francois Romieu wrote:
> Michael Brade <brade@informatik.uni-muenchen.de> :
> [...]
>
> > ok, good news: I did not experience any freeze anymore even though I
> > transferred 60 GB. And I applied both of your patches and also removed:
> >
> > - if (status & RxFOVF) {
> > - rtl8169_schedule_work(dev, rtl8169_reset_task);
> > - dev->stats.rx_fifo_errors++;
> > - }
>
> It should not be necessary to remove this part : the status mask is
> supposed to take care of it. One of my patches is wrong if this part
> needs to go away.
ok, I only removed it because you told me to the first time.
> [...]
>
> > so yes but what do you mean with "no real network traffic"? I still get
> > 100 MB/s.
>
> 100 MB/s as 100 Mbyte/s on a gigabit link or 100 Mbit/s on a {gigabit /
> fast} ethernet link ?
100 Mbyte/s on a gigabit link, so almost 100% usage (with ups and downs, of
course; maybe between 90 MB/s and 112 MB/s).
thanks,
Michael
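The Mbyte/s versus Mbit/s question above comes down to a factor of eight. A small sketch of the arithmetic, assuming a nominal gigabit line rate of 1000 Mbit/s and ignoring all framing and protocol overhead; `link_utilization_permille` is a hypothetical helper name, and it returns tenths of a percent to stay in integer arithmetic.

```c
/* Hypothetical helper: share of the nominal line rate in use,
 * expressed in tenths of a percent (1000 == 100%). */
static unsigned link_utilization_permille(unsigned mbyte_per_s,
					  unsigned link_mbit_per_s)
{
	/* 1 byte = 8 bits, so N Mbyte/s corresponds to 8*N Mbit/s. */
	return mbyte_per_s * 8u * 1000u / link_mbit_per_s;
}
```

For the figures quoted in the thread, `link_utilization_permille(112, 1000)` gives 896 and `link_utilization_permille(90, 1000)` gives 720, i.e. roughly 72% to 90% of the raw line rate. Since part of the raw rate is spent on Ethernet and protocol headers, the payload rate can never reach the full 125 Mbyte/s.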