* atl1c issues on 3.8.2
@ 2013-03-12 15:17 Michael Büsch
  2013-03-12 15:45 ` Eric Dumazet
From: Michael Büsch @ 2013-03-12 15:17 UTC
  To: Eric Dumazet; +Cc: linux-netdev

Hi,

Starting with 3.8.x, scp stalls the atl1c-based interface on my Asus Eee PC 1011PX.
iperf (for example) does not do that. But once scp has stalled the interface,
iperf transfers fail, too.


0mb@milhouse:~$ iperf -c 192.168.4.2 -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 192.168.4.2, TCP port 5001
TCP window size: 96.0 KByte (default)
------------------------------------------------------------
[  5] local 192.168.4.1 port 41558 connected with 192.168.4.2 port 5001
[  4] local 192.168.4.1 port 5001 connected with 192.168.4.2 port 58296
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.0 sec   111 MBytes  93.0 Mbits/sec
[  4]  0.0-10.1 sec   105 MBytes  87.2 Mbits/sec
0mb@milhouse:~$ scp testfile mb.marge:
Enter passphrase for key '/home/mb/.ssh/key': 
testfile                                                  12% 6912KB   1.8MB/s - stalled -^
testfile                                                  12% 6912KB   1.6MB/s - stalled -
1mb@milhouse:~$ ^C
130mb@milhouse:~$ iperf -c 192.168.4.2 -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
connect failed: No route to host



dmesg is spammed with these messages:


> [51069.954315] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
> [51069.954409] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> [51155.933162] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Down
> [51157.441946] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
> [51276.049211] atl1c 0000:01:00.0: irq 46 for MSI/MSI-X
> [51276.049371] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
> [51290.233447] atl1c 0000:01:00.0: irq 46 for MSI/MSI-X
> [51290.233641] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
> [51305.025257] atl1c 0000:01:00.0: irq 46 for MSI/MSI-X
> [51305.025419] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
> [51323.305245] atl1c 0000:01:00.0: irq 46 for MSI/MSI-X
> [51323.305405] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
> [51338.393216] atl1c 0000:01:00.0: irq 46 for MSI/MSI-X
> [51338.393375] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
> [51350.739196] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Down
> [51353.810485] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
> [51376.817238] atl1c 0000:01:00.0: irq 46 for MSI/MSI-X
> [51376.817399] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
> [51391.425209] atl1c 0000:01:00.0: irq 46 for MSI/MSI-X
> [51391.425371] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>


This did not happen with earlier kernels. (But 3.7 has other issues as well; see my other mail.)

Any ideas what's so special about scp?

-- 
Michael


* Re: atl1c issues on 3.8.2
  2013-03-12 15:17 atl1c issues on 3.8.2 Michael Büsch
@ 2013-03-12 15:45 ` Eric Dumazet
       [not found]   ` <20130312180942.4198e88e@milhouse>
From: Eric Dumazet @ 2013-03-12 15:45 UTC
  To: Michael Büsch; +Cc: Eric Dumazet, linux-netdev

On Tue, 2013-03-12 at 16:17 +0100, Michael Büsch wrote:
> Hi,
> 
> Starting with 3.8.x, scp stalls the atl1c-based interface on my Asus Eee PC 1011PX.
> iperf (for example) does not do that. But once scp has stalled the interface,
> iperf transfers fail, too.

I am pretty sure David's stable list contains the needed fix:

http://patchwork.ozlabs.org/bundle/davem/stable/?state=*

It should be included in 3.8.3.

Details: http://patchwork.ozlabs.org/patch/221737/


* Re: atl1c issues on 3.8.2
       [not found]   ` <20130312180942.4198e88e@milhouse>
@ 2013-03-13  5:57     ` Eric Dumazet
  2013-03-14 14:31       ` Eric Dumazet
From: Eric Dumazet @ 2013-03-13  5:57 UTC
  To: Michael Büsch; +Cc: Eric Dumazet, linux-netdev, David S.Miller

On Tue, 2013-03-12 at 18:09 +0100, Michael Büsch wrote:
> On Tue, 12 Mar 2013 16:45:44 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > On Tue, 2013-03-12 at 16:17 +0100, Michael Büsch wrote:
> > > Hi,
> > > 
> > > Starting with 3.8.x, scp stalls the atl1c-based interface on my Asus Eee PC 1011PX.
> > > iperf (for example) does not do that. But once scp has stalled the interface,
> > > iperf transfers fail, too.
> > 
> > I am pretty sure David's stable list contains the needed fix:
> > 
> > http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
> 
> No, this didn't fix it.
> 
> However, I tried to revert 69b08f62e17439ee3d436faf0b9a7ca6fffb78db again,
> which already caused trouble for me in 3.7
> and this fixed the issue.
> 
> So it seems that this still is the same or a related issue that I reported
> for 3.7. I just wrongly stated that the problem was fixed in 3.8, because my
> simple ping test doesn't catch it on 3.8.
> 



kmalloc(2000) never had the guarantee that the result would not span two
4K pages.

Apparently the NIC doesn't allow a rx descriptor spanning two 4K pages
or has a particular hardware bug that I can not possibly find myself.
(I don't have atl1c nor any documentation)

atl1c driver authors will need to find the bug and fix the driver.

Drivers that deal with this kind of hardware limitation allocate pages
themselves and provide skbs with a fragment to the upper stack, or use
build_skb() once the frame is received.

drivers/net/ethernet/intel/igb/igb_main.c is an example.

Could you try (on the net-next tree) different values for the
NETDEV_FRAG_PAGE_MAX_ORDER constant, as that might give Atheros some
hints?

(8192 & 16384)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 821c7f4..769fdac 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1844,7 +1844,7 @@ static inline void __skb_queue_purge(struct sk_buff_head *list)
 		kfree_skb(skb);
 }
 
-#define NETDEV_FRAG_PAGE_MAX_ORDER get_order(32768)
+#define NETDEV_FRAG_PAGE_MAX_ORDER get_order(8192)
 #define NETDEV_FRAG_PAGE_MAX_SIZE  (PAGE_SIZE << NETDEV_FRAG_PAGE_MAX_ORDER)
 #define NETDEV_PAGECNT_MAX_BIAS	   NETDEV_FRAG_PAGE_MAX_SIZE
 


* Re: atl1c issues on 3.8.2
  2013-03-13  5:57     ` Eric Dumazet
@ 2013-03-14 14:31       ` Eric Dumazet
  2013-03-14 22:17         ` Michael Büsch
From: Eric Dumazet @ 2013-03-14 14:31 UTC
  To: Michael Büsch, Pavel Emelyanov
  Cc: Eric Dumazet, linux-netdev, David S.Miller, Mel Gorman

On Wed, 2013-03-13 at 06:57 +0100, Eric Dumazet wrote:
> On Tue, 2013-03-12 at 18:09 +0100, Michael Büsch wrote:
> > On Tue, 12 Mar 2013 16:45:44 +0100
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > 
> > > On Tue, 2013-03-12 at 16:17 +0100, Michael Büsch wrote:
> > > > Hi,
> > > > 
> > > > Starting with 3.8.x, scp stalls the atl1c-based interface on my Asus Eee PC 1011PX.
> > > > iperf (for example) does not do that. But once scp has stalled the interface,
> > > > iperf transfers fail, too.
> > > 
> > > I am pretty sure David's stable list contains the needed fix:
> > > 
> > > http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
> > 
> > No, this didn't fix it.
> > 
> > However, I tried to revert 69b08f62e17439ee3d436faf0b9a7ca6fffb78db again,
> > which already caused trouble for me in 3.7
> > and this fixed the issue.
> > 
> > So it seems that this still is the same or a related issue that I reported
> > for 3.7. I just wrongly stated that the problem was fixed in 3.8, because my
> > simple ping test doesn't catch it on 3.8.
> > 
> 
> 


And it seems a possible fix is here:

http://patchwork.ozlabs.org/patch/227666/


* Re: atl1c issues on 3.8.2
  2013-03-14 14:31       ` Eric Dumazet
@ 2013-03-14 22:17         ` Michael Büsch
  2013-03-14 23:06           ` Eric Dumazet
From: Michael Büsch @ 2013-03-14 22:17 UTC
  To: Eric Dumazet
  Cc: Pavel Emelyanov, Eric Dumazet, linux-netdev, David S.Miller, Mel Gorman

On Thu, 14 Mar 2013 15:31:00 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> And it seems a possible fix is here:
> 
> http://patchwork.ozlabs.org/patch/227666/

I can still reproduce with this fix applied.

However, I noticed that I cannot reproduce it if the wireless interface (ath9k) of
the netbook is down while testing the ethernet. The wireless does not carry any
test traffic. It's just idle.
I do not know if this has always been the case, because wireless was always up (and mostly
idle) in my previous ethernet tests.

-- 
Michael


* Re: atl1c issues on 3.8.2
  2013-03-14 22:17         ` Michael Büsch
@ 2013-03-14 23:06           ` Eric Dumazet
  2013-03-15 19:44             ` Michael Büsch
From: Eric Dumazet @ 2013-03-14 23:06 UTC
  To: Michael Büsch
  Cc: Pavel Emelyanov, Eric Dumazet, linux-netdev, David S.Miller, Mel Gorman

On Thu, 2013-03-14 at 23:17 +0100, Michael Büsch wrote:

> I can still reproduce with this fix applied.
> 
> However, I noticed that I cannot reproduce it if the wireless interface (ath9k) of
> the netbook is down while testing the ethernet. The wireless does not carry any
> test traffic. It's just idle.
> I do not know if this has always been the case, because wireless was always up (and mostly
> idle) in my previous ethernet tests.
> 

OK, then it must be some kind of corruption issue in ath9k, or whatever?

You could try various DEBUGing stuff, like CONFIG_DEBUG_PAGEALLOC and
CONFIG_SLUB_DEBUG_ON


* Re: atl1c issues on 3.8.2
  2013-03-14 23:06           ` Eric Dumazet
@ 2013-03-15 19:44             ` Michael Büsch
  2013-03-22 11:28               ` Michael Büsch
From: Michael Büsch @ 2013-03-15 19:44 UTC
  To: Eric Dumazet
  Cc: Pavel Emelyanov, Eric Dumazet, linux-netdev, David S.Miller, Mel Gorman

On Fri, 15 Mar 2013 00:06:02 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> You could try various DEBUGing stuff, like CONFIG_DEBUG_PAGEALLOC and
> CONFIG_SLUB_DEBUG_ON

This bug is so weird that I did some double-checking,
just to minimize the mistakes on my side.
I compiled a kernel without the revert of the original commit
and without the skb fix you suggested.
It turns out that I am only able to reproduce the issue if the ath9k interface is
up while testing the atl1c ethernet.
And I also double-checked that reverting the original commit fixes the issue.
No stalls with up or down ath9k then.
So that confirms my previous results.

I tried to enable pagealloc debug and slub debug on a kernel with the suggested skb
fix, but without the revert of the commit. Nothing special appeared
in the logs. I'm currently building a kernel with almost all debugging options
turned on. I will test that tomorrow.

Thanks for your help.

-- 
Michael


* Re: atl1c issues on 3.8.2
  2013-03-15 19:44             ` Michael Büsch
@ 2013-03-22 11:28               ` Michael Büsch
  2013-05-27 16:43                 ` Michael Büsch
From: Michael Büsch @ 2013-03-22 11:28 UTC
  To: Michael Büsch
  Cc: Eric Dumazet, Pavel Emelyanov, Eric Dumazet, linux-netdev,
	David S.Miller, Mel Gorman

On Fri, 15 Mar 2013 20:44:57 +0100
Michael Büsch <m@bues.ch> wrote:

> I'm currently building a kernel with almost all debugging options
> turned on. I will test that tomorrow.

It took me a little bit longer than expected, but running the tests on
a kernel with almost all debugging options enabled shows no additional kernel messages. :/

-- 
Michael


* Re: atl1c issues on 3.8.2
  2013-03-22 11:28               ` Michael Büsch
@ 2013-05-27 16:43                 ` Michael Büsch
From: Michael Büsch @ 2013-05-27 16:43 UTC
  To: Eric Dumazet
  Cc: Pavel Emelyanov, Eric Dumazet, linux-netdev, David S.Miller, Mel Gorman

Any news on this?

Am I still the only one with this issue?
It's still 100% reproducible and I can work around it by reverting
69b08f62e17439ee3d436faf0b9a7ca6fffb78db.

It can't possibly be that I'm the only one on this planet seeing this...

-- 
Michael
