* r8169 hard-freezes the system on big network loads
@ 2011-08-14 11:08 Kjun Chen
  2011-08-21 12:33 ` Francois Romieu
  0 siblings, 1 reply; 10+ messages in thread
From: Kjun Chen @ 2011-08-14 11:08 UTC (permalink / raw)
  To: netdev; +Cc: romieu, nic_swsd

Hi,

as I have mentioned to linux-kernel, this is perfectly reproducible: receiving 
70 MB/s or more freezes my laptop (Dell Vostro, amd64, 6 GB RAM, 8x Intel Core 
i7 CPU Q 740 @ 1.73GHz) completely, sometimes within seconds, sometimes only 
after a minute.

Watching the normal console I get loads of

r8169 0000:13:00.0: eth0: link up
r8169 0000:13:00.0: eth0: link up
[...]

about one message every 1-2 seconds (sometimes even 2 per second) while the 
network is active on 2.6.37.6. The freeze still happens up to the latest 
kernel (3.0.1). However, 2.6.32.28 works with no problems, and it doesn't show 
those "eth0: link up" messages. I haven't tried kernels between .32 and .37.

lspci says:

13:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI 
Express Gigabit Ethernet controller (rev 03)

The workaround with 2.6.37 and above: use the r8168 module from the realtek 
website. I have tested it with >30 GB at rates of 112 MB/s and no longer 
experienced any freezes.

If you need any other information or help, please let me know.

cheers,
  Michael


-- 
Sambodha: The Return of True Self-Knowledge



* Re: r8169 hard-freezes the system on big network loads
  2011-08-14 11:08 r8169 hard-freezes the system on big network loads Kjun Chen
@ 2011-08-21 12:33 ` Francois Romieu
  2011-08-21 13:20   ` Michael Brade
  0 siblings, 1 reply; 10+ messages in thread
From: Francois Romieu @ 2011-08-21 12:33 UTC (permalink / raw)
  To: Kjun Chen; +Cc: netdev, nic_swsd, Michael Brade

(Michael, please don't use the e-mail address of your Solar meditation teacher)

Kjun Chen <kjun-chen@sambodha.org> :
[...]
> If you need any other information or help, please let me know.

The XID line included in any recent kernel dmesg by the r8169 driver would
be welcome.

Thanks.

-- 
Ueimor


* Re: r8169 hard-freezes the system on big network loads
  2011-08-21 12:33 ` Francois Romieu
@ 2011-08-21 13:20   ` Michael Brade
  2011-08-21 22:11     ` Francois Romieu
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Brade @ 2011-08-21 13:20 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, nic_swsd

On Sunday 21 August 2011 14:33:11 you wrote:
> (Michael, please don't use the e-mail address of your Solar meditation
> teacher)

Gee...?! That was an accident (and I thought I knew what I was doing...)

> > If you need any other information or help, please let me know.
> 
> The XID line included in any recent kernel dmesg by the r8169 driver would
> be welcome.

r8169 0000:13:00.0: eth0: RTL8168d/8111d at 0xffffc90000c72000, 
f0:4d:a2:b8:ce:62, XID 083000c0 IRQ 52

hope that helps,
   Michael


* Re: r8169 hard-freezes the system on big network loads
  2011-08-21 13:20   ` Michael Brade
@ 2011-08-21 22:11     ` Francois Romieu
  2011-08-23 13:17       ` Francois Romieu
  0 siblings, 1 reply; 10+ messages in thread
From: Francois Romieu @ 2011-08-21 22:11 UTC (permalink / raw)
  To: Michael Brade; +Cc: netdev, nic_swsd

Michael Brade <brade@informatik.uni-muenchen.de> :
[...]
> r8169 0000:13:00.0: eth0: RTL8168d/8111d at 0xffffc90000c72000, 
> f0:4d:a2:b8:ce:62, XID 083000c0 IRQ 52

RTL_GIGA_MAC_VER_26

> hope that helps,

Yes. There is enough data for me to reproduce the bug with the
exact same chipset.

-- 
Ueimor


* Re: r8169 hard-freezes the system on big network loads
  2011-08-21 22:11     ` Francois Romieu
@ 2011-08-23 13:17       ` Francois Romieu
  2011-09-11 20:16         ` Michael Brade
  0 siblings, 1 reply; 10+ messages in thread
From: Francois Romieu @ 2011-08-23 13:17 UTC (permalink / raw)
  To: Michael Brade; +Cc: netdev, nic_swsd

Francois Romieu <romieu@fr.zoreil.com> :
[...]
> Yes. There is enough data for me to reproduce the bug with the
> exact same chipset.

I can not generate a single rx error and the driver refuses to crash :o/ 

Can you apply the patch below on top of 3.1.0-rc3 and see if it makes
a difference ?

Thanks.

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 02339b3..c54ed17 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -5326,10 +5326,6 @@ static int rtl8169_rx_interrupt(struct net_device *dev,
 				dev->stats.rx_length_errors++;
 			if (status & RxCRC)
 				dev->stats.rx_crc_errors++;
-			if (status & RxFOVF) {
-				rtl8169_schedule_work(dev, rtl8169_reset_task);
-				dev->stats.rx_fifo_errors++;
-			}
 			rtl8169_mark_to_asic(desc, rx_buf_sz);
 		} else {
 			struct sk_buff *skb;


* Re: r8169 hard-freezes the system on big network loads
  2011-08-23 13:17       ` Francois Romieu
@ 2011-09-11 20:16         ` Michael Brade
  2011-09-13  8:11           ` Francois Romieu
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Brade @ 2011-09-11 20:16 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, nic_swsd

On Tuesday 23 August 2011 15:17:26 Francois Romieu wrote:
> Francois Romieu <romieu@fr.zoreil.com> :
> [...]
> 
> > Yes. There is enough data for me to reproduce the bug with the
> > exact same chipset.
> 
> I can not generate a single rx error and the driver refuses to crash :o/
> 
> Can you apply the patch below on top of 3.1.0-rc3 and see if it makes
> a difference ?

Sorry for the delay, I have had only two days for email in the last few weeks 
and additionally kernel.org was and still is down.

Does it have to be 3.1.0-rc3 or is 3.0.1 ok as well? If so, I have more bad 
news: 3.0.1 still crashes with this patch. It took me a lot longer to crash it 
but eventually it did happen. I am not sure why it took longer; I guess I 
didn't generate enough throughput.

If you want me to use 3.1.0 then we'll have to wait until git.kernel.org is 
back...

thanks,
  Michael


 
> Thanks.
> 
> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
> index 02339b3..c54ed17 100644
> --- a/drivers/net/r8169.c
> +++ b/drivers/net/r8169.c
> @@ -5326,10 +5326,6 @@ static int rtl8169_rx_interrupt(struct net_device *dev,
>  				dev->stats.rx_length_errors++;
>  			if (status & RxCRC)
>  				dev->stats.rx_crc_errors++;
> -			if (status & RxFOVF) {
> -				rtl8169_schedule_work(dev, rtl8169_reset_task);
> -				dev->stats.rx_fifo_errors++;
> -			}
>  			rtl8169_mark_to_asic(desc, rx_buf_sz);
>  		} else {
>  			struct sk_buff *skb;


* Re: r8169 hard-freezes the system on big network loads
  2011-09-11 20:16         ` Michael Brade
@ 2011-09-13  8:11           ` Francois Romieu
  2011-09-14 21:36             ` Michael Brade
  0 siblings, 1 reply; 10+ messages in thread
From: Francois Romieu @ 2011-09-13  8:11 UTC (permalink / raw)
  To: Michael Brade; +Cc: netdev, nic_swsd, Hayes

[-- Attachment #1: Type: text/plain, Size: 3204 bytes --]

Michael Brade <brade@informatik.uni-muenchen.de> :
[...]
> Does it have to be 3.1.0-rc3 or is 3.0.1 ok as well ?

:o(

Almost any release may exhibit the bug. The attached patch (#0003)
should be a better candidate as an official fix though.

> If so, I have another bad news: 3.0.1 still crashes with this patch.
> It took me a lot longer to crash it but eventually it did happen.
> Not sure why it took longer, I guess I didn't generate enough throughput.

It sure sucks from a user experience viewpoint but it is not _that_ bad.

Are the symptoms in any way different or do you still notice more-or-less
periodic link-up messages and no real network traffic ?

> If you want me to use 3.1.0 then we'll have to wait until git.kernel.org is 
> back...

https://github.com/torvalds/linux.git is available in the meantime.

You will want the patch below as well if you try 3.1-rc6.

[PATCH] r8169: don't reset software ring indexes after disabling hardware Rx.

Bad things happen when the driver resets ring indexes after disabling
hardware Rx (and Tx) in the RxFIFOOver event recovery path of the irq
handler while it races with the NAPI Rx processing method.

Ring indexes init is now done before enabling hardware Rx / Tx.

NB: this is not a straight candidate for -stable since it is coupled
with commit 92fc43b4159b518f5baae57301f26d770b0834c9 (July 11).

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Hayes <hayeswang@realtek.com>
---
 drivers/net/r8169.c |   14 ++++++++------
 1 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 05566b1..22b9c7a 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -717,7 +717,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 				      struct net_device *dev);
 static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance);
 static int rtl8169_init_ring(struct net_device *dev);
-static void rtl_hw_start(struct net_device *dev);
+static void rtl_start(struct net_device *dev);
 static int rtl8169_close(struct net_device *dev);
 static void rtl_set_rx_mode(struct net_device *dev);
 static void rtl8169_tx_timeout(struct net_device *dev);
@@ -3589,8 +3589,6 @@ static void rtl_hw_reset(struct rtl8169_private *tp)
 			break;
 		udelay(100);
 	}
-
-	rtl8169_init_ring_indexes(tp);
 }
 
 static int __devinit
@@ -3948,7 +3946,7 @@ static int rtl8169_open(struct net_device *dev)
 
 	rtl_pll_power_up(tp);
 
-	rtl_hw_start(dev);
+	rtl_start(dev);
 
 	tp->saved_wolopts = 0;
 	pm_runtime_put_noidle(&pdev->dev);
@@ -4014,10 +4012,14 @@ static void rtl_set_rx_tx_config_registers(struct rtl8169_private *tp)
 		(InterFrameGap << TxInterFrameGapShift));
 }
 
-static void rtl_hw_start(struct net_device *dev)
+static void rtl_start(struct net_device *dev)
 {
 	struct rtl8169_private *tp = netdev_priv(dev);
 
+	rtl8169_init_ring_indexes(tp);
+
+	smp_mb();
+
 	tp->hw_start(dev);
 
 	netif_start_queue(dev);
@@ -4997,7 +4999,7 @@ static void rtl8169_reset_task(struct work_struct *work)
 	rtl8169_tx_clear(tp);
 
 	rtl8169_hw_reset(tp);
-	rtl_hw_start(dev);
+	rtl_start(dev);
 	netif_wake_queue(dev);
 	rtl8169_check_link_status(dev, tp, tp->mmio_addr);
 
-- 
1.7.6

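The ordering that the patch above enforces can be illustrated outside the kernel. What follows is only a minimal user-space sketch, not driver code: `struct toy_ring`, `toy_init_ring_indexes()`, `toy_hw_start()` and `toy_start()` are hypothetical names, the assumption that the ring-index init simply zeroes the four indexes is mine, and the C11 fence is a rough stand-in for `smp_mb()` rather than an exact equivalent.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the driver's private state. */
struct toy_ring {
	unsigned int cur_rx, dirty_rx, cur_tx, dirty_tx;
	int hw_enabled;	/* non-zero once "hardware" Rx/Tx is enabled */
};

/* Assumption: index init just zeroes the software ring indexes. */
static void toy_init_ring_indexes(struct toy_ring *r)
{
	r->cur_rx = r->dirty_rx = r->cur_tx = r->dirty_tx = 0;
}

/* Stand-in for tp->hw_start(dev). */
static void toy_hw_start(struct toy_ring *r)
{
	r->hw_enabled = 1;
}

/* Mirrors the order used by rtl_start() in the patch:
 * indexes first, then a full barrier, then enable the hardware. */
static void toy_start(struct toy_ring *r)
{
	toy_init_ring_indexes(r);
	atomic_thread_fence(memory_order_seq_cst);	/* user-space analogue of smp_mb() */
	toy_hw_start(r);
}
```

The point of the order is that nothing processing the ring can observe enabled hardware together with stale indexes; the single-threaded sketch only shows the sequence, not the race itself.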

[-- Attachment #2: 0003-r8169-remove-erroneous-processing-of-always-set-bit.patch --]
[-- Type: text/plain, Size: 1589 bytes --]

>From 44071c614418d9cae2faab8307307578d104065b Mon Sep 17 00:00:00 2001
From: Francois Romieu <romieu@fr.zoreil.com>
Date: Thu, 25 Aug 2011 18:47:24 +0200
Subject: [PATCH 3/3] r8169: remove erroneous processing of always set bit.

When set, RxFOVF (resp. RxBOVF) is always 1 (resp. 0).

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Hayes <hayeswang@realtek.com>
---
 drivers/net/r8169.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 22b9c7a..19b91a8 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -407,6 +407,7 @@ enum rtl_register_content {
 	RxOK		= 0x0001,
 
 	/* RxStatusDesc */
+	RxBOVF	= (1 << 24),
 	RxFOVF	= (1 << 23),
 	RxRWT	= (1 << 22),
 	RxRES	= (1 << 21),
@@ -682,6 +683,7 @@ struct rtl8169_private {
 	struct mii_if_info mii;
 	struct rtl8169_counters counters;
 	u32 saved_wolopts;
+	u32 opts1_mask;
 
 	struct rtl_fw {
 		const struct firmware *fw;
@@ -3782,6 +3784,9 @@ rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	tp->intr_event = cfg->intr_event;
 	tp->napi_event = cfg->napi_event;
 
+	tp->opts1_mask = (tp->mac_version != RTL_GIGA_MAC_VER_01) ?
+		~(RxBOVF | RxFOVF) : ~0;
+
 	init_timer(&tp->timer);
 	tp->timer.data = (unsigned long) dev;
 	tp->timer.function = rtl8169_phy_timer;
@@ -5323,7 +5328,7 @@ static int rtl8169_rx_interrupt(struct net_device *dev,
 		u32 status;
 
 		rmb();
-		status = le32_to_cpu(desc->opts1);
+		status = le32_to_cpu(desc->opts1) & tp->opts1_mask;
 
 		if (status & DescOwn)
 			break;
-- 
1.7.6

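The effect of the status mask in the patch above can be checked in user space. This is a minimal sketch under stated assumptions: the bit positions are copied from the patch, while `opts1_mask_for()` and `masked_status()` are hypothetical helpers that only imitate the two lines added to `rtl8169_init_one()` and `rtl8169_rx_interrupt()`.

```c
#include <stdint.h>

/* Bit positions as they appear in the patch (RxStatusDesc). */
#define RX_BOVF (UINT32_C(1) << 24)
#define RX_FOVF (UINT32_C(1) << 23)
#define RX_RES  (UINT32_C(1) << 21)

/* Hypothetical helper: is_mac_ver_01 stands in for
 * tp->mac_version == RTL_GIGA_MAC_VER_01. Only that version keeps
 * the overflow bits; every other version has them masked out. */
static uint32_t opts1_mask_for(int is_mac_ver_01)
{
	return is_mac_ver_01 ? ~UINT32_C(0) : ~(RX_BOVF | RX_FOVF);
}

/* Hypothetical helper: the status word the Rx path tests after masking. */
static uint32_t masked_status(uint32_t opts1, uint32_t mask)
{
	return opts1 & mask;
}
```

With the mask applied, a later `status & RxFOVF` test can never be true on chips other than RTL_GIGA_MAC_VER_01, so the reset task is no longer scheduled from that branch even if the branch itself stays in the code.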


* Re: r8169 hard-freezes the system on big network loads
  2011-09-13  8:11           ` Francois Romieu
@ 2011-09-14 21:36             ` Michael Brade
  2011-09-15  0:03               ` Francois Romieu
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Brade @ 2011-09-14 21:36 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, nic_swsd, Hayes

On Tuesday 13 September 2011 10:11:26 you wrote:
> Michael Brade <brade@informatik.uni-muenchen.de> :
> [...]
> 
> > Does it have to be 3.1.0-rc3 or is 3.0.1 ok as well ?
> 
> :o(
> 
> Almost any release may exhibit the bug. The attached patch (#0003)
> should be a better candidate as an official fix though.

ok, good news: I no longer experienced any freeze even though I 
transferred 60 GB. I applied both of your patches and also removed this block:

-                   if (status & RxFOVF) {
-                           rtl8169_schedule_work(dev, rtl8169_reset_task);
-                           dev->stats.rx_fifo_errors++;
-                   }

 
> > If so, I have another bad news: 3.0.1 still crashes with this patch.
> > It took me a lot longer to crash it but eventually it did happen.
> > Not sure why it took longer, I guess I didn't generate enough throughput.
> 
> It sure sucks from a user experience viewpoint but it is not _that_ bad.

I disagree - I actually lose data because I mount my data and backups over 
iSCSI, and that is exactly when it crashes.

> Are the symptoms in any way different or do you still notice more-or-less
> periodic link-up messages and no real network traffic ?

dmesg looks like this:

[ 1611.380420] r8169 0000:13:00.0: eth0: link up
[ 1611.995417] r8169 0000:13:00.0: eth0: link up
[ 1612.323050] r8169 0000:13:00.0: eth0: link up
[ 1612.574016] r8169 0000:13:00.0: eth0: link up
[ 1613.450630] r8169 0000:13:00.0: eth0: link up
[ 1613.929383] r8169 0000:13:00.0: eth0: link up
[ 1614.950939] r8169 0000:13:00.0: eth0: link up
[ 1615.699660] r8169 0000:13:00.0: eth0: link up
[ 1616.005507] r8169 0000:13:00.0: eth0: link up
[ 1616.746199] r8169 0000:13:00.0: eth0: link up
[ 1617.879670] r8169 0000:13:00.0: eth0: link up
[ 1618.461433] r8169 0000:13:00.0: eth0: link up

so yes, but what do you mean by "no real network traffic"? I still get
100 MB/s.

cheers,
  Michael


* Re: r8169 hard-freezes the system on big network loads
  2011-09-14 21:36             ` Michael Brade
@ 2011-09-15  0:03               ` Francois Romieu
  2011-09-15 10:26                 ` Michael Brade
  0 siblings, 1 reply; 10+ messages in thread
From: Francois Romieu @ 2011-09-15  0:03 UTC (permalink / raw)
  To: Michael Brade; +Cc: netdev, nic_swsd, Hayes

Michael Brade <brade@informatik.uni-muenchen.de> :
[...]
> ok, good news: I did not experience any freeze anymore even though I 
> transferred 60 GB. And I applied both of your patches and 
> 
> -                   if (status & RxFOVF) {
> -                           rtl8169_schedule_work(dev, rtl8169_reset_task);
> -                           dev->stats.rx_fifo_errors++;
> -                   }

It should not be necessary to remove this part : the status mask is
supposed to take care of it. One of my patches is wrong if this part
needs to go away.

[...]
> > Are the symptoms in any way different or do you still notice more-or-less
> > periodic link-up messages and no real network traffic ?
> 
> dmesg looks like this:
> 
> [ 1611.380420] r8169 0000:13:00.0: eth0: link up
> [ 1611.995417] r8169 0000:13:00.0: eth0: link up
> [ 1612.323050] r8169 0000:13:00.0: eth0: link up
> [ 1612.574016] r8169 0000:13:00.0: eth0: link up

I will have to figure out why there are so many of these messages.

[...]
> so yes but what do you mean with "no real network traffic"? I still get
> 100 MB/s.

100 MB/s as 100 Mbyte/s on a gigabit link or 100 Mbit/s on a {gigabit / fast}
ethernet link ?

Thanks.

-- 
Ueimor


* Re: r8169 hard-freezes the system on big network loads
  2011-09-15  0:03               ` Francois Romieu
@ 2011-09-15 10:26                 ` Michael Brade
  0 siblings, 0 replies; 10+ messages in thread
From: Michael Brade @ 2011-09-15 10:26 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, nic_swsd, Hayes

On Thursday 15 September 2011 02:03:32 Francois Romieu wrote:
> Michael Brade <brade@informatik.uni-muenchen.de> :
> [...]
>
> > ok, good news: I did not experience any freeze anymore even though I
> > transferred 60 GB. And I applied both of your patches and
> >
> > -                   if (status & RxFOVF) {
> > -                           rtl8169_schedule_work(dev, rtl8169_reset_task);
> > -                           dev->stats.rx_fifo_errors++;
> > -                   }
>
> It should not be necessary to remove this part : the status mask is
> supposed to take care of it. One of my patches is wrong if this part
> needs to go away.

ok, I only removed it because you told me to the first time.

> [...]
>
> > so yes but what do you mean with "no real network traffic"? I still get
> > 100 MB/s.
>
> 100 MB/s as 100 Mbyte/s on a gigabit link or 100 Mbit/s on a {gigabit /
> fast} ethernet link ?

100 Mbyte/s on a gigabit link, so almost 100% usage (with ups and downs, of
course; maybe between 90 MB/s and 112 MB/s).

thanks,
  Michael
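
The Mbyte/s versus Mbit/s question comes down to a factor of eight. A minimal sketch of the arithmetic follows, assuming 1 MB = 10^6 bytes and a raw line rate of 1000 Mbit/s; `mbyte_to_mbit()` and `link_utilization()` are hypothetical helper names.

```c
/* Convert megabytes per second to megabits per second (1 byte = 8 bits). */
static double mbyte_to_mbit(double mbyte_per_s)
{
	return mbyte_per_s * 8.0;
}

/* Fraction of the raw line rate in use; the link rate is given in Mbit/s. */
static double link_utilization(double mbyte_per_s, double link_mbit_per_s)
{
	return mbyte_to_mbit(mbyte_per_s) / link_mbit_per_s;
}
```

By this measure 100 Mbyte/s is 800 Mbit/s, or 80% of the raw gigabit rate, and 112 Mbyte/s is 896 Mbit/s. Framing and protocol overhead keep the achievable payload rate below the raw 1000 Mbit/s, which is why those figures feel close to saturation in practice.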

