* Re: BQL support in gianfar causes network hiccup
@ 2013-09-02 13:21 Per Dalén
  2013-09-02 13:53 ` Claudiu Manoil
  0 siblings, 1 reply; 10+ messages in thread
From: Per Dalén @ 2013-09-02 13:21 UTC (permalink / raw)
  To: netdev

 > On Mo, 2013-04-29 at 15:20 +0200, Tino Keitel wrote:
 > > On Mo, 2013-04-29 at 15:14 +0200, Claudiu Manoil wrote:
 > >
 > > > The proposed fix (passes this test):
 > > > http://patchwork.ozlabs.org/patch/240365/
 > > >
 > > > Does it work for you?

Hi,

This patch does not work for me. I have the same problem as Tino on my
card. My card is a Freescale P2020 and I use "iperf -c 10.10.51.36 -n
100M -P 50" to trigger the error.
The network interface is set to 100M half duplex.
I have tested this using the 3.4.35 and 3.4.60 kernels.
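
The reproduction steps described above could be scripted roughly as follows. This is only a sketch: the helper name `repro_commands` is hypothetical, the report does not say how the link was put into 100M half duplex (forcing it with `autoneg off` is an assumption), and the function only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Sketch only: repro_commands is a hypothetical helper, not part of any tool.
# It prints the commands for one reproduction run instead of executing them.
#   $1: interface name (e.g. eth0)
#   $2: address of the machine running "iperf -s"
repro_commands() {
    iface=$1
    server=$2
    # Force the link to 100 Mb/s half duplex (assumption: autoneg disabled).
    echo "ethtool -s $iface speed 100 duplex half autoneg off"
    # 50 parallel TCP streams of 100 MBytes each, as in the report.
    echo "iperf -c $server -n 100M -P 50"
    # Afterwards, look for the watchdog message in the kernel log.
    echo "dmesg | grep 'NETDEV WATCHDOG'"
}

# Example: repro_commands eth0 10.10.51.36
```

Piping the printed lines to `sh` on the target would run them in order; the interface name and server address have to match the local setup.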

NETDEV WATCHDOG: eth0 (fsl-gianfar): transmit queue 0 timed out
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:256
Modules linked in:
NIP: c039bb94 LR: c039bb94 CTR: c02ee500
REGS: dffe7e60 TRAP: 0700   Not tainted  (3.4.60+)
MSR: 00029000 <CE,EE,ME>  CR: 40000084  XER: 20000000
TASK = df843200[0] 'swapper/1' THREAD: df864000 CPU: 1
GPR00: c039bb94 dffe7f10 df843200 00000046 00021000 ffffffff c02eecf0 00029000
GPR08: 00000001 00008000 00463000 00003fff 80000028 00000000 00000000 00000004
GPR16: 00000001 df858e14 c051f1f4 00000001 df858c14 df858a14 df858814 ffffffff
GPR24: 00000001 df813234 00000004 df8f9a00 c05e0000 c05e0000 df813000 00000000
NIP [c039bb94] dev_watchdog+0x2d4/0x2e4
LR [c039bb94] dev_watchdog+0x2d4/0x2e4
Call Trace:
[dffe7f10] [c039bb94] dev_watchdog+0x2d4/0x2e4 (unreliable)
[dffe7f40] [c003e944] run_timer_softirq+0x11c/0x20c
[dffe7fa0] [c00377dc] __do_softirq+0xe4/0x16c
[dffe7ff0] [c000c408] call_do_softirq+0x14/0x24
[df865e80] [c00043c4] do_softirq+0xb4/0xe4
[df865ea0] [c0037b4c] irq_exit+0xb0/0xcc
[df865eb0] [c0008958] timer_interrupt+0x188/0x1a4
[df865ee0] [c000db68] ret_from_except+0x0/0x18
--- Exception: 901 at cpu_idle+0x90/0xf4
     LR = cpu_idle+0x90/0xf4
[df865fa0] [c0007b30] cpu_idle+0x5c/0xf4 (unreliable)
[df865fc0] [c049d68c] start_secondary+0x2b0/0x2b4
[df865ff0] [c0001cf8] __secondary_start+0x30/0x84
Instruction dump:
4e800421 80fe0244 4bffff40 7fc3f378 4bfeab6d 7fc4f378 7c651b78 3c60c055
7fe6fb78 38633108 4cc63182 480fc889 <0fe00000> 39200001 993cea1a 4bffffb4
---[ end trace 4be011773b23dae3 ]---


When I revert "gianfar: Add support for byte queue limits.", commit
d8a0f1b0af67679bba886784de10d8c21acc4e0e, the network test works fine
without any "hiccup".
If the network interface is set to 1000M full duplex, there is also no
"hiccup".

 >
 > Hi,
 >
 > I tested the patch over the weekend and it looks fine. I then re-checked
 > without the patch and got the bug immediately. So the patch seems to
 > work fine.
 >

Thanks,
Per Dalén


* Re: BQL support in gianfar causes network hiccup
  2013-09-02 13:21 BQL support in gianfar causes network hiccup Per Dalén
@ 2013-09-02 13:53 ` Claudiu Manoil
  2013-09-02 14:37   ` Per Dalén
  0 siblings, 1 reply; 10+ messages in thread
From: Claudiu Manoil @ 2013-09-02 13:53 UTC (permalink / raw)
  To: Per Dalén; +Cc: netdev

On 9/2/2013 4:21 PM, Per Dalén wrote:
>  > On Mo, 2013-04-29 at 15:20 +0200, Tino Keitel wrote:
>  > > On Mo, 2013-04-29 at 15:14 +0200, Claudiu Manoil wrote:
>  > >
>  > > > The proposed fix (passes this test):
>  > > > http://patchwork.ozlabs.org/patch/240365/
>  > > >
>  > > > Does it work for you?
>
> Hi,
>
> This patch does not work for me. I have the same problem as Tino on my
> card. My card is a Freescale P2020 and I use "iperf -c 10.10.51.36 -n
> 100M -P 50" to trigger the error.
> The network interface is set to 100M half duplex.
> I have tested this using the 3.4.35 and 3.4.60 kernels.
>
> NETDEV WATCHDOG: eth0 (fsl-gianfar): transmit queue 0 timed out
> ------------[ cut here ]------------
> WARNING: at net/sched/sch_generic.c:256
> Modules linked in:
> NIP: c039bb94 LR: c039bb94 CTR: c02ee500
> REGS: dffe7e60 TRAP: 0700   Not tainted  (3.4.60+)
> MSR: 00029000 <CE,EE,ME>  CR: 40000084  XER: 20000000
> TASK = df843200[0] 'swapper/1' THREAD: df864000 CPU: 1
> GPR00: c039bb94 dffe7f10 df843200 00000046 00021000 ffffffff c02eecf0 00029000
> GPR08: 00000001 00008000 00463000 00003fff 80000028 00000000 00000000 00000004
> GPR16: 00000001 df858e14 c051f1f4 00000001 df858c14 df858a14 df858814 ffffffff
> GPR24: 00000001 df813234 00000004 df8f9a00 c05e0000 c05e0000 df813000 00000000
> NIP [c039bb94] dev_watchdog+0x2d4/0x2e4
> LR [c039bb94] dev_watchdog+0x2d4/0x2e4
> Call Trace:
> [dffe7f10] [c039bb94] dev_watchdog+0x2d4/0x2e4 (unreliable)
> [dffe7f40] [c003e944] run_timer_softirq+0x11c/0x20c
> [dffe7fa0] [c00377dc] __do_softirq+0xe4/0x16c
> [dffe7ff0] [c000c408] call_do_softirq+0x14/0x24
> [df865e80] [c00043c4] do_softirq+0xb4/0xe4
> [df865ea0] [c0037b4c] irq_exit+0xb0/0xcc
> [df865eb0] [c0008958] timer_interrupt+0x188/0x1a4
> [df865ee0] [c000db68] ret_from_except+0x0/0x18
> --- Exception: 901 at cpu_idle+0x90/0xf4
>      LR = cpu_idle+0x90/0xf4
> [df865fa0] [c0007b30] cpu_idle+0x5c/0xf4 (unreliable)
> [df865fc0] [c049d68c] start_secondary+0x2b0/0x2b4
> [df865ff0] [c0001cf8] __secondary_start+0x30/0x84
> Instruction dump:
> 4e800421 80fe0244 4bffff40 7fc3f378 4bfeab6d 7fc4f378 7c651b78 3c60c055
> 7fe6fb78 38633108 4cc63182 480fc889 <0fe00000> 39200001 993cea1a 4bffffb4
> ---[ end trace 4be011773b23dae3 ]---
>
>
> When I revert "gianfar: Add support for byte queue limits.", commit
> d8a0f1b0af67679bba886784de10d8c21acc4e0e, the network test works fine
> without any "hiccup".
> If the network interface is set to 1000M full duplex, there is also no
> "hiccup".
>

Hi,

That patch wasn't approved (for good reasons).
The proposed fix is currently under review:
http://patchwork.ozlabs.org/patch/271242/
"gianfar: Fix reported number of sent bytes to BQL"

Does this one work for you? You might need to pull
one recent gianfar clean-up patch from net-next in order
to apply this one without incidents.
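
Since the fix under review concerns the byte counts reported to BQL, it may help to look at the per-queue BQL state the kernel exposes in sysfs while the test runs. The following is a minimal sketch, not something taken from the patch: `bql_show` is a hypothetical helper, and its optional third argument exists only so the function can be pointed at a directory other than /sys.

```shell
#!/bin/sh
# Sketch only: bql_show is a hypothetical helper.
# Print the BQL files of one transmit queue as name=value lines.
#   $1: interface name, $2: TX queue index, $3: sysfs root (default /sys)
bql_show() {
    dir="${3:-/sys}/class/net/$1/queues/tx-$2/byte_queue_limits"
    for f in limit limit_min limit_max inflight hold_time; do
        # Skip files that are missing or unreadable on this kernel.
        if [ -r "$dir/$f" ]; then
            printf '%s=%s\n' "$f" "$(cat "$dir/$f")"
        fi
    done
}

# Example: bql_show eth0 0
```

If `inflight` stayed above zero after traffic had stopped, that would hint at bytes that were queued but never reported as completed; this reading is an assumption and has not been confirmed in this thread.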

Thanks,
Claudiu


* Re: BQL support in gianfar causes network hiccup
  2013-09-02 13:53 ` Claudiu Manoil
@ 2013-09-02 14:37   ` Per Dalén
  2013-09-02 15:35     ` Claudiu Manoil
  0 siblings, 1 reply; 10+ messages in thread
From: Per Dalén @ 2013-09-02 14:37 UTC (permalink / raw)
  To: Claudiu Manoil; +Cc: netdev

On 09/02/2013 03:53 PM, Claudiu Manoil wrote:
> On 9/2/2013 4:21 PM, Per Dalén wrote:
>>  > On Mo, 2013-04-29 at 15:20 +0200, Tino Keitel wrote:
>>  > > On Mo, 2013-04-29 at 15:14 +0200, Claudiu Manoil wrote:
>>  > >
>>  > > > The proposed fix (passes this test):
>>  > > > http://patchwork.ozlabs.org/patch/240365/
>>  > > >
>>  > > > Does it work for you?
>>
>> Hi,
>>
>> This patch does not work for me. I have the same problem as Tino on my
>> card. My card is a Freescale P2020 and I use "iperf -c 10.10.51.36 -n
>> 100M -P 50" to trigger the error.
>> The network interface is set to 100M half duplex.
>> I have tested this using the 3.4.35 and 3.4.60 kernels.
>>
>> NETDEV WATCHDOG: eth0 (fsl-gianfar): transmit queue 0 timed out
>> ------------[ cut here ]------------
>> WARNING: at net/sched/sch_generic.c:256
>> Modules linked in:
>> NIP: c039bb94 LR: c039bb94 CTR: c02ee500
>> REGS: dffe7e60 TRAP: 0700   Not tainted  (3.4.60+)
>> MSR: 00029000 <CE,EE,ME>  CR: 40000084  XER: 20000000
>> TASK = df843200[0] 'swapper/1' THREAD: df864000 CPU: 1
>> GPR00: c039bb94 dffe7f10 df843200 00000046 00021000 ffffffff c02eecf0 00029000
>> GPR08: 00000001 00008000 00463000 00003fff 80000028 00000000 00000000 00000004
>> GPR16: 00000001 df858e14 c051f1f4 00000001 df858c14 df858a14 df858814 ffffffff
>> GPR24: 00000001 df813234 00000004 df8f9a00 c05e0000 c05e0000 df813000 00000000
>> NIP [c039bb94] dev_watchdog+0x2d4/0x2e4
>> LR [c039bb94] dev_watchdog+0x2d4/0x2e4
>> Call Trace:
>> [dffe7f10] [c039bb94] dev_watchdog+0x2d4/0x2e4 (unreliable)
>> [dffe7f40] [c003e944] run_timer_softirq+0x11c/0x20c
>> [dffe7fa0] [c00377dc] __do_softirq+0xe4/0x16c
>> [dffe7ff0] [c000c408] call_do_softirq+0x14/0x24
>> [df865e80] [c00043c4] do_softirq+0xb4/0xe4
>> [df865ea0] [c0037b4c] irq_exit+0xb0/0xcc
>> [df865eb0] [c0008958] timer_interrupt+0x188/0x1a4
>> [df865ee0] [c000db68] ret_from_except+0x0/0x18
>> --- Exception: 901 at cpu_idle+0x90/0xf4
>>      LR = cpu_idle+0x90/0xf4
>> [df865fa0] [c0007b30] cpu_idle+0x5c/0xf4 (unreliable)
>> [df865fc0] [c049d68c] start_secondary+0x2b0/0x2b4
>> [df865ff0] [c0001cf8] __secondary_start+0x30/0x84
>> Instruction dump:
>> 4e800421 80fe0244 4bffff40 7fc3f378 4bfeab6d 7fc4f378 7c651b78 3c60c055
>> 7fe6fb78 38633108 4cc63182 480fc889 <0fe00000> 39200001 993cea1a 4bffffb4
>> ---[ end trace 4be011773b23dae3 ]---
>>
>>
>> When I revert "gianfar: Add support for byte queue limits.", commit
>> d8a0f1b0af67679bba886784de10d8c21acc4e0e, the network test works fine
>> without any "hiccup".
>> If the network interface is set to 1000M full duplex, there is also no
>> "hiccup".
>>
>
> Hi,
>
> That patch wasn't approved (for good reasons).
> The proposed fix is currently under review:
> http://patchwork.ozlabs.org/patch/271242/
> "gianfar: Fix reported number of sent bytes to BQL"
>
> Does this one work for you? You might need to pull
> one recent gianfar clean-up patch from net-next in order
> to apply this one without incidents.
>

No, still the same error:

NETDEV WATCHDOG: eth0 (fsl-gianfar): transmit queue 0 timed out
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:256
Modules linked in:
NIP: c039ce0c LR: c039ce0c CTR: c02ef578
...
4e800421 80fe0244 4bffff40 7fc3f378 4bfea9d1 7fc4f378 7c651b78 3c60c055
7fe6fb78 38635304 4cc63182 480fcddd <0fe00000> 39200001 993c0a3c 4bffffb4
---[ end trace 5f5e1e3c30024010 ]---


> Thanks,
> Claudiu
>
>
>

Thanks,
Per


* Re: BQL support in gianfar causes network hiccup
  2013-09-02 14:37   ` Per Dalén
@ 2013-09-02 15:35     ` Claudiu Manoil
  2013-09-02 16:50       ` Per Dalén
  0 siblings, 1 reply; 10+ messages in thread
From: Claudiu Manoil @ 2013-09-02 15:35 UTC (permalink / raw)
  To: Per Dalén; +Cc: netdev

On 9/2/2013 5:37 PM, Per Dalén wrote:
>> The proposed fix is currently under review:
>> http://patchwork.ozlabs.org/patch/271242/
>> "gianfar: Fix reported number of sent bytes to BQL"
>>
>> Does this one work for you? You might need to pull
>> one recent gianfar clean-up patch from net-next in order
>> to apply this one without incidents.
>>
>
> No, still the same error:
>
> NETDEV WATCHDOG: eth0 (fsl-gianfar): transmit queue 0 timed out
> ------------[ cut here ]------------
> WARNING: at net/sched/sch_generic.c:256
> Modules linked in:
> NIP: c039ce0c LR: c039ce0c CTR: c02ef578
> ...
> 4e800421 80fe0244 4bffff40 7fc3f378 4bfea9d1 7fc4f378 7c651b78 3c60c055
> 7fe6fb78 38635304 4cc63182 480fcddd <0fe00000> 39200001 993c0a3c 4bffffb4
> ---[ end trace 5f5e1e3c30024010 ]---
>
>

I tried to reproduce the issue with a recent net-next kernel (Linux
p2020rdb 3.11.0-rc6) + the BQL fix patch
(http://patchwork.ozlabs.org/patch/271242/), but the iperf test finished
without incidents (see log below).
I will check whether the problem appears without the fix patch, on the
same net-next kernel (3.11.0-rc6).
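
To compare runs with and without the patch, the kernel log can be checked for the watchdog message after each iperf run. A small sketch, assuming the message format seen earlier in this thread; `count_tx_timeouts` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: count_tx_timeouts is a hypothetical helper.
# Count "transmit queue N timed out" watchdog lines in log text read from stdin.
count_tx_timeouts() {
    # grep -c prints 0 and exits non-zero when nothing matches; mask that status.
    grep -c -E 'NETDEV WATCHDOG: .*transmit queue [0-9]+ timed out' || true
}

# Example: dmesg | count_tx_timeouts
```

A count of 0 after a full run would correspond to "finished without incidents" for that kernel and patch combination.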

Claudiu
--
root@p2020rdb:~# ifconfig eth2
eth2      Link encap:Ethernet  HWaddr 00:04:9f:01:1e:64
           inet addr:192.168.1.100  Bcast:192.168.1.255  Mask:255.255.255.0
           inet6 addr: fe80::204:9fff:fe01:1e64/64 Scope:Link
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:1744045 errors:0 dropped:0 overruns:0 frame:0
           TX packets:7242116 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:143012380 (136.3 MiB)  TX bytes:2373808232 (2.2 GiB)
           Base address:0x6000

root@p2020rdb:~# ethtool eth2
Settings for eth2:
         Supported ports: [ MII ]
         Supported link modes:   10baseT/Half 10baseT/Full
                                 100baseT/Half 100baseT/Full
                                 1000baseT/Full
         Supports auto-negotiation: Yes
         Advertised link modes:  10baseT/Half 10baseT/Full
                                 100baseT/Half 100baseT/Full
                                 1000baseT/Full
         Advertised pause frame use: No
         Advertised auto-negotiation: No
         Speed: 100Mb/s
         Duplex: Half
         Port: MII
         PHYAD: 1
         Transceiver: external
         Auto-negotiation: off
         Supports Wake-on: g
         Wake-on: d
         Current message level: 0x0000003f (63)
                                drv probe link timer ifdown ifup
         Link detected: yes
root@p2020rdb:~# time iperf -c 192.168.1.4 -n 100M -P 50
------------------------------------------------------------
Client connecting to 192.168.1.4, TCP port 5001
TCP window size: 20.7 KByte (default)
------------------------------------------------------------
[ 52] local 192.168.1.100 port 54653 connected with 192.168.1.4 port 5001
[  3] local 192.168.1.100 port 54604 connected with 192.168.1.4 port 5001
[  4] local 192.168.1.100 port 54605 connected with 192.168.1.4 port 5001
[  7] local 192.168.1.100 port 54608 connected with 192.168.1.4 port 5001
[  5] local 192.168.1.100 port 54606 connected with 192.168.1.4 port 5001
[  8] local 192.168.1.100 port 54609 connected with 192.168.1.4 port 5001
[ 10] local 192.168.1.100 port 54610 connected with 192.168.1.4 port 5001
[ 11] local 192.168.1.100 port 54611 connected with 192.168.1.4 port 5001
[  9] local 192.168.1.100 port 54612 connected with 192.168.1.4 port 5001
[ 13] local 192.168.1.100 port 54614 connected with 192.168.1.4 port 5001
[ 12] local 192.168.1.100 port 54613 connected with 192.168.1.4 port 5001
[ 14] local 192.168.1.100 port 54615 connected with 192.168.1.4 port 5001
[ 15] local 192.168.1.100 port 54616 connected with 192.168.1.4 port 5001
[ 16] local 192.168.1.100 port 54617 connected with 192.168.1.4 port 5001
[ 17] local 192.168.1.100 port 54618 connected with 192.168.1.4 port 5001
[ 18] local 192.168.1.100 port 54619 connected with 192.168.1.4 port 5001
[ 20] local 192.168.1.100 port 54621 connected with 192.168.1.4 port 5001
[ 19] local 192.168.1.100 port 54620 connected with 192.168.1.4 port 5001
[  6] local 192.168.1.100 port 54607 connected with 192.168.1.4 port 5001
[ 22] local 192.168.1.100 port 54623 connected with 192.168.1.4 port 5001
[ 21] local 192.168.1.100 port 54622 connected with 192.168.1.4 port 5001
[ 24] local 192.168.1.100 port 54624 connected with 192.168.1.4 port 5001
[ 26] local 192.168.1.100 port 54626 connected with 192.168.1.4 port 5001
[ 28] local 192.168.1.100 port 54628 connected with 192.168.1.4 port 5001
[ 25] local 192.168.1.100 port 54625 connected with 192.168.1.4 port 5001
[ 30] local 192.168.1.100 port 54631 connected with 192.168.1.4 port 5001
[ 27] local 192.168.1.100 port 54627 connected with 192.168.1.4 port 5001
[ 23] local 192.168.1.100 port 54629 connected with 192.168.1.4 port 5001
[ 31] local 192.168.1.100 port 54632 connected with 192.168.1.4 port 5001
[ 32] local 192.168.1.100 port 54633 connected with 192.168.1.4 port 5001
[ 33] local 192.168.1.100 port 54634 connected with 192.168.1.4 port 5001
[ 35] local 192.168.1.100 port 54636 connected with 192.168.1.4 port 5001
[ 34] local 192.168.1.100 port 54635 connected with 192.168.1.4 port 5001
[ 36] local 192.168.1.100 port 54637 connected with 192.168.1.4 port 5001
[ 29] local 192.168.1.100 port 54630 connected with 192.168.1.4 port 5001
[ 40] local 192.168.1.100 port 54640 connected with 192.168.1.4 port 5001
[ 38] local 192.168.1.100 port 54639 connected with 192.168.1.4 port 5001
[ 37] local 192.168.1.100 port 54638 connected with 192.168.1.4 port 5001
[ 39] local 192.168.1.100 port 54642 connected with 192.168.1.4 port 5001
[ 41] local 192.168.1.100 port 54641 connected with 192.168.1.4 port 5001
[ 42] local 192.168.1.100 port 54643 connected with 192.168.1.4 port 5001
[ 43] local 192.168.1.100 port 54644 connected with 192.168.1.4 port 5001
[ 45] local 192.168.1.100 port 54645 connected with 192.168.1.4 port 5001
[ 44] local 192.168.1.100 port 54648 connected with 192.168.1.4 port 5001
[ 46] local 192.168.1.100 port 54646 connected with 192.168.1.4 port 5001
[ 47] local 192.168.1.100 port 54647 connected with 192.168.1.4 port 5001
[ 48] local 192.168.1.100 port 54649 connected with 192.168.1.4 port 5001
[ 49] local 192.168.1.100 port 54650 connected with 192.168.1.4 port 5001
[ 50] local 192.168.1.100 port 54651 connected with 192.168.1.4 port 5001
[ 51] local 192.168.1.100 port 54652 connected with 192.168.1.4 port 5001
[ ID] Interval       Transfer     Bandwidth
[ 27]  0.0-337.3 sec    100 MBytes  2.49 Mbits/sec
[ 28]  0.0-340.6 sec    100 MBytes  2.46 Mbits/sec
[ 51]  0.0-383.1 sec    100 MBytes  2.19 Mbits/sec
[ 47]  0.0-384.5 sec    100 MBytes  2.18 Mbits/sec
[ 11]  0.0-386.1 sec    100 MBytes  2.17 Mbits/sec
[ 24]  0.0-388.4 sec    100 MBytes  2.16 Mbits/sec
[ 38]  0.0-397.5 sec    100 MBytes  2.11 Mbits/sec
[ 32]  0.0-402.9 sec    100 MBytes  2.08 Mbits/sec
[ 39]  0.0-412.0 sec    100 MBytes  2.04 Mbits/sec
[  6]  0.0-414.5 sec    100 MBytes  2.02 Mbits/sec
[ 50]  0.0-416.3 sec    100 MBytes  2.02 Mbits/sec
[ 31]  0.0-421.3 sec    100 MBytes  1.99 Mbits/sec
[ 25]  0.0-424.8 sec    100 MBytes  1.97 Mbits/sec
[ 17]  0.0-426.4 sec    100 MBytes  1.97 Mbits/sec
[ 52]  0.0-427.7 sec    100 MBytes  1.96 Mbits/sec
[ 20]  0.0-436.2 sec    100 MBytes  1.92 Mbits/sec
[ 29]  0.0-437.5 sec    100 MBytes  1.92 Mbits/sec
[ 33]  0.0-438.6 sec    100 MBytes  1.91 Mbits/sec
[ 37]  0.0-438.6 sec    100 MBytes  1.91 Mbits/sec
[ 40]  0.0-438.7 sec    100 MBytes  1.91 Mbits/sec
[ 14]  0.0-440.1 sec    100 MBytes  1.91 Mbits/sec
[ 15]  0.0-441.5 sec    100 MBytes  1.90 Mbits/sec
[ 22]  0.0-441.7 sec    100 MBytes  1.90 Mbits/sec
[ 35]  0.0-442.7 sec    100 MBytes  1.89 Mbits/sec
[ 48]  0.0-443.2 sec    100 MBytes  1.89 Mbits/sec
[ 26]  0.0-444.8 sec    100 MBytes  1.89 Mbits/sec
[ 16]  0.0-447.0 sec    100 MBytes  1.88 Mbits/sec
[ 49]  0.0-447.6 sec    100 MBytes  1.87 Mbits/sec
[ 12]  0.0-449.5 sec    100 MBytes  1.87 Mbits/sec
[ 10]  0.0-450.3 sec    100 MBytes  1.86 Mbits/sec
[ 46]  0.0-451.0 sec    100 MBytes  1.86 Mbits/sec
[ 18]  0.0-452.9 sec    100 MBytes  1.85 Mbits/sec
[ 19]  0.0-454.2 sec    100 MBytes  1.85 Mbits/sec
[ 30]  0.0-454.5 sec    100 MBytes  1.85 Mbits/sec
[ 21]  0.0-456.2 sec    100 MBytes  1.84 Mbits/sec
[  7]  0.0-456.4 sec    100 MBytes  1.84 Mbits/sec
[ 44]  0.0-456.9 sec    100 MBytes  1.84 Mbits/sec
[ 42]  0.0-458.9 sec    100 MBytes  1.83 Mbits/sec
[ 45]  0.0-458.9 sec    100 MBytes  1.83 Mbits/sec
[  4]  0.0-459.3 sec    100 MBytes  1.83 Mbits/sec
[  3]  0.0-461.5 sec    100 MBytes  1.82 Mbits/sec
[ 36]  0.0-462.2 sec    100 MBytes  1.81 Mbits/sec
[ 13]  0.0-462.7 sec    100 MBytes  1.81 Mbits/sec
[ 34]  0.0-463.0 sec    100 MBytes  1.81 Mbits/sec
[ 43]  0.0-463.1 sec    100 MBytes  1.81 Mbits/sec
[  9]  0.0-463.5 sec    100 MBytes  1.81 Mbits/sec
[  8]  0.0-463.9 sec    100 MBytes  1.81 Mbits/sec
[  5]  0.0-464.2 sec    100 MBytes  1.81 Mbits/sec
[ 41]  0.0-465.1 sec    100 MBytes  1.80 Mbits/sec
[ 23]  0.0-465.3 sec    100 MBytes  1.80 Mbits/sec
[SUM]  0.0-465.3 sec  4.88 GBytes  90.2 Mbits/sec

real    7m45.296s
user    0m0.780s
sys     0m9.636s


* Re: BQL support in gianfar causes network hiccup
  2013-09-02 15:35     ` Claudiu Manoil
@ 2013-09-02 16:50       ` Per Dalén
  2013-09-03  7:06         ` Claudiu Manoil
  0 siblings, 1 reply; 10+ messages in thread
From: Per Dalén @ 2013-09-02 16:50 UTC (permalink / raw)
  To: Claudiu Manoil; +Cc: netdev

On 09/02/2013 05:35 PM, Claudiu Manoil wrote:
> On 9/2/2013 5:37 PM, Per Dalén wrote:
>>> The proposed fix is currently under review:
>>> http://patchwork.ozlabs.org/patch/271242/
>>> "gianfar: Fix reported number of sent bytes to BQL"
>>>
>>> Does this one work for you? You might need to pull
>>> one recent gianfar clean-up patch from net-next in order
>>> to apply this one without incidents.
>>>
>>
>> No, still the same error:
>>
>> NETDEV WATCHDOG: eth0 (fsl-gianfar): transmit queue 0 timed out
>> ------------[ cut here ]------------
>> WARNING: at net/sched/sch_generic.c:256
>> Modules linked in:
>> NIP: c039ce0c LR: c039ce0c CTR: c02ef578
>> ...
>> 4e800421 80fe0244 4bffff40 7fc3f378 4bfea9d1 7fc4f378 7c651b78 3c60c055
>> 7fe6fb78 38635304 4cc63182 480fcddd <0fe00000> 39200001 993c0a3c 4bffffb4
>> ---[ end trace 5f5e1e3c30024010 ]---
>>
>>
>
> I tried to reproduce the issue with a recent net-next kernel (Linux
> p2020rdb 3.11.0-rc6) + the BQL fix patch
> (http://patchwork.ozlabs.org/patch/271242/), but the iperf test finished
> without incidents (see log below).
> I will check whether the problem appears without the fix patch, on the
> same net-next kernel (3.11.0-rc6).
>

I was able to reproduce it on our card and the P2020RDB using 3.11.0-rc7 
+ David Miller's -next networking tree and your patch 
(http://patchwork.ozlabs.org/patch/271242/).

root@p2020rdb:~# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 10.10.51.36 port 5001 connected with 10.10.49.116 port 35530
[  5] local 10.10.51.36 port 5001 connected with 10.10.49.116 port 35531
[  6] local 10.10.51.36 port 5001 connected with 10.10.49.116 port 35532
...
[ 51] local 10.10.51.36 port 5001 connected with 10.10.49.116 port 35578
[ 52] local 10.10.51.36 port 5001 connected with 10.10.49.116 port 35576
[ 53] local 10.10.51.36 port 5001 connected with 10.10.49.116 port 35579

NETDEV WATCHDOG: eth2 (fsl-gianfar): transmit queue 0 timed out
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:264
Modules linked in:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.11.0-rc7+ #19
task: ef84f340 ti: ef86a000 task.ti: ef86a000
NIP: c03e9a88 LR: c03e9a88 CTR: c031658c
REGS: ef86bcd0 TRAP: 0700   Not tainted  (3.11.0-rc7+)
MSR: 00029000 <CE,EE,ME>  CR: 22000044  XER: 20000000

GPR00: c03e9a88 ef86bd80 ef84f340 0000003f c0e9e3fc c0e9e9d0 c0316d40 00021000
GPR08: 00000007 00000800 00000000 000000ee 000000ee 00000000 00000101 00000004
GPR16: 00000001 00200040 ffff00ee c0657040 00000000 ef880e18 ef880c18 ffffffff
GPR24: 00000001 ef98c1f4 00000004 efaf31c0 c0650000 c0650000 ef98c000 00000000
NIP [c03e9a88] dev_watchdog+0x2d4/0x2e4
LR [c03e9a88] dev_watchdog+0x2d4/0x2e4
Call Trace:
[ef86bd80] [c03e9a88] dev_watchdog+0x2d4/0x2e4 (unreliable)
[ef86bdb0] [c0042cf8] call_timer_fn.isra.25+0x28/0x84
[ef86bdd0] [c0042ee4] run_timer_softirq+0x190/0x208
[ef86be20] [c003b1b8] __do_softirq+0xf4/0x1bc
[ef86be70] [c003b430] irq_exit+0xbc/0xc8
[ef86be80] [c0009490] timer_interrupt+0x1b4/0x1ec
[ef86beb0] [c000ead0] ret_from_except+0x0/0x18
--- Exception: 901 at arch_cpu_idle+0x24/0x5c
     LR = arch_cpu_idle+0x24/0x5c
[ef86bf70] [c009bd9c] rcu_idle_enter+0xa4/0xe4 (unreliable)
[ef86bf80] [c007744c] cpu_startup_entry+0xc4/0x15c
[ef86bfb0] [c000fdd8] start_secondary+0x2f0/0x310
[ef86bff0] [c0001ff8] __secondary_start+0x30/0x84
Instruction dump:
4e800421 80fe0208 4bffff40 7fc3f378 4bfe7409 7fc4f378 7c651b78 3c60c05c
7fe6fb78 3863e0b4 4cc63182 4810da7d <0fe00000> 39200001 993c1341 4bffffb4
---[ end trace 9e08f5e256de5040 ]---

root@p2020rdb:~# ifconfig eth2
eth2      Link encap:Ethernet  HWaddr 00:11:22:33:44:aa
           inet addr:10.10.51.36  Bcast:10.10.51.255  Mask:255.255.252.0
           inet6 addr: fe80::211:22ff:fe33:44aa/64 Scope:Link
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:336843 errors:0 dropped:0 overruns:0 frame:0
           TX packets:209019 errors:126 dropped:0 overruns:0 carrier:125
           collisions:0 txqueuelen:1000
           RX bytes:505655782 (482.2 MiB)  TX bytes:15060182 (14.3 MiB)
           Base address:0xc000

root@p2020rdb:~# ethtool eth2
Settings for eth2:
         Supported ports: [ MII ]
         Supported link modes:   10baseT/Half 10baseT/Full
                                 100baseT/Half 100baseT/Full
                                 1000baseT/Full
         Supported pause frame use: No
         Supports auto-negotiation: Yes
         Advertised link modes:  10baseT/Half 10baseT/Full
                                 100baseT/Half 100baseT/Full
                                 1000baseT/Full
         Advertised pause frame use: No
         Advertised auto-negotiation: Yes
         Speed: 100Mb/s
         Duplex: Half
         Port: MII
         PHYAD: 1
         Transceiver: external
         Auto-negotiation: on
         Supports Wake-on: g
         Wake-on: d
         Current message level: 0x0000003f (63)
                                drv probe link timer ifdown ifup
         Link detected: yes
root@p2020rdb:~# dmesg
...
NETDEV WATCHDOG: eth2 (fsl-gianfar): transmit queue 0 timed out
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:264
Modules linked in:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.11.0-rc7+ #19
task: ef84f340 ti: ef86a000 task.ti: ef86a000
NIP: c03e9a88 LR: c03e9a88 CTR: c031658c
REGS: ef86bcd0 TRAP: 0700   Not tainted  (3.11.0-rc7+)
MSR: 00029000 <CE,EE,ME>  CR: 22000044  XER: 20000000

GPR00: c03e9a88 ef86bd80 ef84f340 0000003f c0e9e3fc c0e9e9d0 c0316d40 00021000
GPR08: 00000007 00000800 00000000 000000ee 000000ee 00000000 00000101 00000004
GPR16: 00000001 00200040 ffff00ee c0657040 00000000 ef880e18 ef880c18 ffffffff
GPR24: 00000001 ef98c1f4 00000004 efaf31c0 c0650000 c0650000 ef98c000 00000000
NIP [c03e9a88] dev_watchdog+0x2d4/0x2e4
LR [c03e9a88] dev_watchdog+0x2d4/0x2e4
Call Trace:
[ef86bd80] [c03e9a88] dev_watchdog+0x2d4/0x2e4 (unreliable)
[ef86bdb0] [c0042cf8] call_timer_fn.isra.25+0x28/0x84
[ef86bdd0] [c0042ee4] run_timer_softirq+0x190/0x208
[ef86be20] [c003b1b8] __do_softirq+0xf4/0x1bc
[ef86be70] [c003b430] irq_exit+0xbc/0xc8
[ef86be80] [c0009490] timer_interrupt+0x1b4/0x1ec
[ef86beb0] [c000ead0] ret_from_except+0x0/0x18
--- Exception: 901 at arch_cpu_idle+0x24/0x5c
     LR = arch_cpu_idle+0x24/0x5c
[ef86bf70] [c009bd9c] rcu_idle_enter+0xa4/0xe4 (unreliable)
[ef86bf80] [c007744c] cpu_startup_entry+0xc4/0x15c
[ef86bfb0] [c000fdd8] start_secondary+0x2f0/0x310
[ef86bff0] [c0001ff8] __secondary_start+0x30/0x84
Instruction dump:
4e800421 80fe0208 4bffff40 7fc3f378 4bfe7409 7fc4f378 7c651b78 3c60c05c
7fe6fb78 3863e0b4 4cc63182 4810da7d <0fe00000> 39200001 993c1341 4bffffb4
---[ end trace 9e08f5e256de5040 ]---


Thanks,
Per

> Claudiu
> --
> root@p2020rdb:~# ifconfig eth2
> eth2      Link encap:Ethernet  HWaddr 00:04:9f:01:1e:64
>            inet addr:192.168.1.100  Bcast:192.168.1.255  Mask:255.255.255.0
>            inet6 addr: fe80::204:9fff:fe01:1e64/64 Scope:Link
>            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>            RX packets:1744045 errors:0 dropped:0 overruns:0 frame:0
>            TX packets:7242116 errors:0 dropped:0 overruns:0 carrier:0
>            collisions:0 txqueuelen:1000
>            RX bytes:143012380 (136.3 MiB)  TX bytes:2373808232 (2.2 GiB)
>            Base address:0x6000
>
> root@p2020rdb:~# ethtool eth2
> Settings for eth2:
>          Supported ports: [ MII ]
>          Supported link modes:   10baseT/Half 10baseT/Full
>                                  100baseT/Half 100baseT/Full
>                                  1000baseT/Full
>          Supports auto-negotiation: Yes
>          Advertised link modes:  10baseT/Half 10baseT/Full
>                                  100baseT/Half 100baseT/Full
>                                  1000baseT/Full
>          Advertised pause frame use: No
>          Advertised auto-negotiation: No
>          Speed: 100Mb/s
>          Duplex: Half
>          Port: MII
>          PHYAD: 1
>          Transceiver: external
>          Auto-negotiation: off
>          Supports Wake-on: g
>          Wake-on: d
>          Current message level: 0x0000003f (63)
>                                 drv probe link timer ifdown ifup
>          Link detected: yes
> root@p2020rdb:~# time iperf -c 192.168.1.4 -n 100M -P 50
> ------------------------------------------------------------
> Client connecting to 192.168.1.4, TCP port 5001
> TCP window size: 20.7 KByte (default)
> ------------------------------------------------------------
> [ 52] local 192.168.1.100 port 54653 connected with 192.168.1.4 port 5001
> [  3] local 192.168.1.100 port 54604 connected with 192.168.1.4 port 5001
> [  4] local 192.168.1.100 port 54605 connected with 192.168.1.4 port 5001
> [  7] local 192.168.1.100 port 54608 connected with 192.168.1.4 port 5001
> [  5] local 192.168.1.100 port 54606 connected with 192.168.1.4 port 5001
> [  8] local 192.168.1.100 port 54609 connected with 192.168.1.4 port 5001
> [ 10] local 192.168.1.100 port 54610 connected with 192.168.1.4 port 5001
> [ 11] local 192.168.1.100 port 54611 connected with 192.168.1.4 port 5001
> [  9] local 192.168.1.100 port 54612 connected with 192.168.1.4 port 5001
> [ 13] local 192.168.1.100 port 54614 connected with 192.168.1.4 port 5001
> [ 12] local 192.168.1.100 port 54613 connected with 192.168.1.4 port 5001
> [ 14] local 192.168.1.100 port 54615 connected with 192.168.1.4 port 5001
> [ 15] local 192.168.1.100 port 54616 connected with 192.168.1.4 port 5001
> [ 16] local 192.168.1.100 port 54617 connected with 192.168.1.4 port 5001
> [ 17] local 192.168.1.100 port 54618 connected with 192.168.1.4 port 5001
> [ 18] local 192.168.1.100 port 54619 connected with 192.168.1.4 port 5001
> [ 20] local 192.168.1.100 port 54621 connected with 192.168.1.4 port 5001
> [ 19] local 192.168.1.100 port 54620 connected with 192.168.1.4 port 5001
> [  6] local 192.168.1.100 port 54607 connected with 192.168.1.4 port 5001
> [ 22] local 192.168.1.100 port 54623 connected with 192.168.1.4 port 5001
> [ 21] local 192.168.1.100 port 54622 connected with 192.168.1.4 port 5001
> [ 24] local 192.168.1.100 port 54624 connected with 192.168.1.4 port 5001
> [ 26] local 192.168.1.100 port 54626 connected with 192.168.1.4 port 5001
> [ 28] local 192.168.1.100 port 54628 connected with 192.168.1.4 port 5001
> [ 25] local 192.168.1.100 port 54625 connected with 192.168.1.4 port 5001
> [ 30] local 192.168.1.100 port 54631 connected with 192.168.1.4 port 5001
> [ 27] local 192.168.1.100 port 54627 connected with 192.168.1.4 port 5001
> [ 23] local 192.168.1.100 port 54629 connected with 192.168.1.4 port 5001
> [ 31] local 192.168.1.100 port 54632 connected with 192.168.1.4 port 5001
> [ 32] local 192.168.1.100 port 54633 connected with 192.168.1.4 port 5001
> [ 33] local 192.168.1.100 port 54634 connected with 192.168.1.4 port 5001
> [ 35] local 192.168.1.100 port 54636 connected with 192.168.1.4 port 5001
> [ 34] local 192.168.1.100 port 54635 connected with 192.168.1.4 port 5001
> [ 36] local 192.168.1.100 port 54637 connected with 192.168.1.4 port 5001
> [ 29] local 192.168.1.100 port 54630 connected with 192.168.1.4 port 5001
> [ 40] local 192.168.1.100 port 54640 connected with 192.168.1.4 port 5001
> [ 38] local 192.168.1.100 port 54639 connected with 192.168.1.4 port 5001
> [ 37] local 192.168.1.100 port 54638 connected with 192.168.1.4 port 5001
> [ 39] local 192.168.1.100 port 54642 connected with 192.168.1.4 port 5001
> [ 41] local 192.168.1.100 port 54641 connected with 192.168.1.4 port 5001
> [ 42] local 192.168.1.100 port 54643 connected with 192.168.1.4 port 5001
> [ 43] local 192.168.1.100 port 54644 connected with 192.168.1.4 port 5001
> [ 45] local 192.168.1.100 port 54645 connected with 192.168.1.4 port 5001
> [ 44] local 192.168.1.100 port 54648 connected with 192.168.1.4 port 5001
> [ 46] local 192.168.1.100 port 54646 connected with 192.168.1.4 port 5001
> [ 47] local 192.168.1.100 port 54647 connected with 192.168.1.4 port 5001
> [ 48] local 192.168.1.100 port 54649 connected with 192.168.1.4 port 5001
> [ 49] local 192.168.1.100 port 54650 connected with 192.168.1.4 port 5001
> [ 50] local 192.168.1.100 port 54651 connected with 192.168.1.4 port 5001
> [ 51] local 192.168.1.100 port 54652 connected with 192.168.1.4 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [ 27]  0.0-337.3 sec    100 MBytes  2.49 Mbits/sec
> [ 28]  0.0-340.6 sec    100 MBytes  2.46 Mbits/sec
> [ 51]  0.0-383.1 sec    100 MBytes  2.19 Mbits/sec
> [ 47]  0.0-384.5 sec    100 MBytes  2.18 Mbits/sec
> [ 11]  0.0-386.1 sec    100 MBytes  2.17 Mbits/sec
> [ 24]  0.0-388.4 sec    100 MBytes  2.16 Mbits/sec
> [ 38]  0.0-397.5 sec    100 MBytes  2.11 Mbits/sec
> [ 32]  0.0-402.9 sec    100 MBytes  2.08 Mbits/sec
> [ 39]  0.0-412.0 sec    100 MBytes  2.04 Mbits/sec
> [  6]  0.0-414.5 sec    100 MBytes  2.02 Mbits/sec
> [ 50]  0.0-416.3 sec    100 MBytes  2.02 Mbits/sec
> [ 31]  0.0-421.3 sec    100 MBytes  1.99 Mbits/sec
> [ 25]  0.0-424.8 sec    100 MBytes  1.97 Mbits/sec
> [ 17]  0.0-426.4 sec    100 MBytes  1.97 Mbits/sec
> [ 52]  0.0-427.7 sec    100 MBytes  1.96 Mbits/sec
> [ 20]  0.0-436.2 sec    100 MBytes  1.92 Mbits/sec
> [ 29]  0.0-437.5 sec    100 MBytes  1.92 Mbits/sec
> [ 33]  0.0-438.6 sec    100 MBytes  1.91 Mbits/sec
> [ 37]  0.0-438.6 sec    100 MBytes  1.91 Mbits/sec
> [ 40]  0.0-438.7 sec    100 MBytes  1.91 Mbits/sec
> [ 14]  0.0-440.1 sec    100 MBytes  1.91 Mbits/sec
> [ 15]  0.0-441.5 sec    100 MBytes  1.90 Mbits/sec
> [ 22]  0.0-441.7 sec    100 MBytes  1.90 Mbits/sec
> [ 35]  0.0-442.7 sec    100 MBytes  1.89 Mbits/sec
> [ 48]  0.0-443.2 sec    100 MBytes  1.89 Mbits/sec
> [ 26]  0.0-444.8 sec    100 MBytes  1.89 Mbits/sec
> [ 16]  0.0-447.0 sec    100 MBytes  1.88 Mbits/sec
> [ 49]  0.0-447.6 sec    100 MBytes  1.87 Mbits/sec
> [ 12]  0.0-449.5 sec    100 MBytes  1.87 Mbits/sec
> [ 10]  0.0-450.3 sec    100 MBytes  1.86 Mbits/sec
> [ 46]  0.0-451.0 sec    100 MBytes  1.86 Mbits/sec
> [ 18]  0.0-452.9 sec    100 MBytes  1.85 Mbits/sec
> [ 19]  0.0-454.2 sec    100 MBytes  1.85 Mbits/sec
> [ 30]  0.0-454.5 sec    100 MBytes  1.85 Mbits/sec
> [ 21]  0.0-456.2 sec    100 MBytes  1.84 Mbits/sec
> [  7]  0.0-456.4 sec    100 MBytes  1.84 Mbits/sec
> [ 44]  0.0-456.9 sec    100 MBytes  1.84 Mbits/sec
> [ 42]  0.0-458.9 sec    100 MBytes  1.83 Mbits/sec
> [ 45]  0.0-458.9 sec    100 MBytes  1.83 Mbits/sec
> [  4]  0.0-459.3 sec    100 MBytes  1.83 Mbits/sec
> [  3]  0.0-461.5 sec    100 MBytes  1.82 Mbits/sec
> [ 36]  0.0-462.2 sec    100 MBytes  1.81 Mbits/sec
> [ 13]  0.0-462.7 sec    100 MBytes  1.81 Mbits/sec
> [ 34]  0.0-463.0 sec    100 MBytes  1.81 Mbits/sec
> [ 43]  0.0-463.1 sec    100 MBytes  1.81 Mbits/sec
> [  9]  0.0-463.5 sec    100 MBytes  1.81 Mbits/sec
> [  8]  0.0-463.9 sec    100 MBytes  1.81 Mbits/sec
> [  5]  0.0-464.2 sec    100 MBytes  1.81 Mbits/sec
> [ 41]  0.0-465.1 sec    100 MBytes  1.80 Mbits/sec
> [ 23]  0.0-465.3 sec    100 MBytes  1.80 Mbits/sec
> [SUM]  0.0-465.3 sec  4.88 GBytes  90.2 Mbits/sec
>
> real    7m45.296s
> user    0m0.780s
> sys     0m9.636s

* Re: BQL support in gianfar causes network hiccup
  2013-09-02 16:50       ` Per Dalén
@ 2013-09-03  7:06         ` Claudiu Manoil
  2013-09-03  7:55           ` Per Dalén
  0 siblings, 1 reply; 10+ messages in thread
From: Claudiu Manoil @ 2013-09-03  7:06 UTC (permalink / raw)
  To: Per Dalén; +Cc: netdev

On 9/2/2013 7:50 PM, Per Dalén wrote:
> On 09/02/2013 05:35 PM, Claudiu Manoil wrote:
>> On 9/2/2013 5:37 PM, Per Dalén wrote:
>>>> The proposed fix is currently under review:
>>>> http://patchwork.ozlabs.org/patch/271242/
>>>> "gianfar: Fix reported number of sent bytes to BQL"
>>>>
>>>> Does this one work for you? You might need to pull
>>>> one recent gianfar clean-up patch from net-next in order
>>>> to apply this one without incidents.
>>>>
>>>
>>> No, still the same error:
>>>
>>> NETDEV WATCHDOG: eth0 (fsl-gianfar): transmit queue 0 timed out
>>> ------------[ cut here ]------------
>>> WARNING: at net/sched/sch_generic.c:256
>>> Modules linked in:
>>> NIP: c039ce0c LR: c039ce0c CTR: c02ef578
>>> ...
>>> 4e800421 80fe0244 4bffff40 7fc3f378 4bfea9d1 7fc4f378 7c651b78 3c60c055
>>> 7fe6fb78 38635304 4cc63182 480fcddd <0fe00000> 39200001 993c0a3c
>>> 4bffffb4
>>> ---[ end trace 5f5e1e3c30024010 ]---
>>>
>>>
>>
>> Tried to reproduce the issue with a recent net-next kernel (Linux
>> p2020rdb 3.11.0-rc6) + BQL fix patch
>> (http://patchwork.ozlabs.org/patch/271242/), but the iperf test finished
>> without incidents (see log below).
>> Will try if the problem is apparent without the fix patch, on the same
>> net-next kernel (3.11.0-rc6).
>>
>
> I was able to reproduce it on our card and the P2020RDB using 3.11.0-rc7
> + David Miller's -next networking tree and your patch
> (http://patchwork.ozlabs.org/patch/271242/).
>
> root@p2020rdb:~# iperf -s

Ok, I see, iperf -s on P2020. This way I was able to get the tx timeout
too.  With iperf -c on P2020 it doesn't come up.  Now it'll be
interesting to find out what BQL/ BQL integration in gianfar has to do
with this.

Thanks.
claudiu

* Re: BQL support in gianfar causes network hiccup
  2013-09-03  7:06         ` Claudiu Manoil
@ 2013-09-03  7:55           ` Per Dalén
  2013-09-03 15:42             ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Per Dalén @ 2013-09-03  7:55 UTC (permalink / raw)
  To: Claudiu Manoil; +Cc: netdev

On 09/03/2013 09:06 AM, Claudiu Manoil wrote:
> On 9/2/2013 7:50 PM, Per Dalén wrote:
>> On 09/02/2013 05:35 PM, Claudiu Manoil wrote:
>>> On 9/2/2013 5:37 PM, Per Dalén wrote:
>>>>> The proposed fix is currently under review:
>>>>> http://patchwork.ozlabs.org/patch/271242/
>>>>> "gianfar: Fix reported number of sent bytes to BQL"
>>>>>
>>>>> Does this one work for you? You might need to pull
>>>>> one recent gianfar clean-up patch from net-next in order
>>>>> to apply this one without incidents.
>>>>>
>>>>
>>>> No, still the same error:
>>>>
>>>> NETDEV WATCHDOG: eth0 (fsl-gianfar): transmit queue 0 timed out
>>>> ------------[ cut here ]------------
>>>> WARNING: at net/sched/sch_generic.c:256
>>>> Modules linked in:
>>>> NIP: c039ce0c LR: c039ce0c CTR: c02ef578
>>>> ...
>>>> 4e800421 80fe0244 4bffff40 7fc3f378 4bfea9d1 7fc4f378 7c651b78 3c60c055
>>>> 7fe6fb78 38635304 4cc63182 480fcddd <0fe00000> 39200001 993c0a3c
>>>> 4bffffb4
>>>> ---[ end trace 5f5e1e3c30024010 ]---
>>>>
>>>>
>>>
>>> Tried to reproduce the issue with a recent net-next kernel (Linux
>>> p2020rdb 3.11.0-rc6) + BQL fix patch
>>> (http://patchwork.ozlabs.org/patch/271242/), but the iperf test finished
>>> without incidents (see log below).
>>> Will try if the problem is apparent without the fix patch, on the same
>>> net-next kernel (3.11.0-rc6).
>>>
>>
>> I was able to reproduce it on our card and the P2020RDB using 3.11.0-rc7
>> + David Miller's -next networking tree and your patch
>> (http://patchwork.ozlabs.org/patch/271242/).
>>
>> root@p2020rdb:~# iperf -s
>
> Ok, I see, iperf -s on P2020. This way I was able to get the tx timeout
> too.  With iperf -c on P2020 it doesn't come up.  Now it'll be
> interesting to find out what BQL/ BQL integration in gianfar has to do
> with this.

Yes, it's weird. The only reason I removed the BQL commit 
(d8a0f1b0af67679bba886784de10d8c21acc4e0e) was because the error Tino 
Keitel had was similar to mine.

>
> Thanks.
> claudiu
>
>

Thanks,
Per

* Re: BQL support in gianfar causes network hiccup
  2013-09-03  7:55           ` Per Dalén
@ 2013-09-03 15:42             ` Eric Dumazet
  2013-09-03 16:09               ` Claudiu Manoil
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2013-09-03 15:42 UTC (permalink / raw)
  To: Per Dalén; +Cc: Claudiu Manoil, netdev

On Tue, 2013-09-03 at 09:55 +0200, Per Dalén wrote:

> Yes, it's weird. The only reason I removed the BQL commit 
> (d8a0f1b0af67679bba886784de10d8c21acc4e0e) was because the error Tino 
> Keitel had was similar to mine.


I suspect a genuine race in this driver. BQL only makes this race happen
more often.

gfar_poll_sq() has the following :

/* run Tx cleanup to completion */
if (tx_queue->tx_skbuff[tx_queue->skb_dirtytx])
        gfar_clean_tx_ring(tx_queue);

While gfar_poll() has a different method :

if (tx_queue->tx_skbuff[tx_queue->skb_dirtytx]) { 
    gfar_clean_tx_ring(tx_queue);
    has_tx_work = 1;
}

Note that has_tx_work is used only in gfar_poll().

Note that memory barriers seem to be missing.

1) In your case, is it gfar_poll_sq() or gfar_poll() that is used?

2) Does the bug happen if only one CPU is used?
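
For readers following the memory-barrier remark above, here is a minimal
userspace sketch of the ordering a Tx ring of this shape needs. It is not
gianfar code: struct tx_ring, ring_init(), ring_xmit() and ring_clean() are
hypothetical names, desc_len[] merely stands in for the buffer descriptor
and slot[] for tx_skbuff[]. It assumes the suspected race is between the
xmit path publishing an skb pointer and the cleanup path testing that
pointer, which nothing in this thread confirms. C11 release/acquire atomics
play the role that a paired smp_wmb()/smp_rmb() would play in kernel code.

```c
/* Userspace model only; all names are hypothetical, not from gianfar.c. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 4u

struct pkt {
	unsigned int len;
};

struct tx_ring {
	unsigned int desc_len[RING_SIZE];      /* stands in for the descriptor */
	_Atomic(struct pkt *) slot[RING_SIZE]; /* stands in for tx_skbuff[] */
	unsigned int cur;                      /* next slot to fill (xmit side) */
	unsigned int dirty;                    /* next slot to clean (poll side) */
};

static void ring_init(struct tx_ring *r)
{
	for (unsigned int i = 0; i < RING_SIZE; i++) {
		r->desc_len[i] = 0;
		atomic_init(&r->slot[i], NULL);
	}
	r->cur = 0;
	r->dirty = 0;
}

/* Xmit side: fill the descriptor first, then publish the pointer.
 * The release store keeps the descriptor write from being reordered
 * after the pointer becomes visible to the cleanup side. */
static bool ring_xmit(struct tx_ring *r, struct pkt *p)
{
	unsigned int i = r->cur;

	assert(i < RING_SIZE);
	if (atomic_load_explicit(&r->slot[i], memory_order_acquire) != NULL)
		return false; /* ring full */

	r->desc_len[i] = p->len;
	atomic_store_explicit(&r->slot[i], p, memory_order_release);
	r->cur = (i + 1) % RING_SIZE;
	return true;
}

/* Cleanup side: the acquire load pairs with the release store above, so
 * a non-NULL pointer implies the descriptor contents are visible too.
 * Returns the number of bytes reclaimed. */
static unsigned int ring_clean(struct tx_ring *r)
{
	unsigned int bytes = 0;

	for (;;) {
		unsigned int i = r->dirty;
		struct pkt *p = atomic_load_explicit(&r->slot[i],
						     memory_order_acquire);

		if (p == NULL)
			break; /* nothing (more) to clean */

		bytes += r->desc_len[i];
		atomic_store_explicit(&r->slot[i], NULL, memory_order_release);
		r->dirty = (i + 1) % RING_SIZE;
	}
	return bytes;
}
```

A caller would run ring_xmit() from one thread and ring_clean() from
another. The byte count returned by ring_clean() is the kind of value a
driver would hand to BQL on completion, so a cleanup pass that wrongly sees
an empty slot would also leave bytes unaccounted.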

* Re: BQL support in gianfar causes network hiccup
  2013-09-03 15:42             ` Eric Dumazet
@ 2013-09-03 16:09               ` Claudiu Manoil
  2013-09-03 19:33                 ` Per Dalén
  0 siblings, 1 reply; 10+ messages in thread
From: Claudiu Manoil @ 2013-09-03 16:09 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Per Dalén, netdev



On 9/3/2013 6:42 PM, Eric Dumazet wrote:
> On Tue, 2013-09-03 at 09:55 +0200, Per Dalén wrote:
>
>> Yes, it's weird. The only reason I removed the BQL commit
>> (d8a0f1b0af67679bba886784de10d8c21acc4e0e) was because the error Tino
>> Keitel had was similar to mine.
>
>
> I suspect a genuine race in this driver. BQL only makes this race happen
> more often.
>
> gfar_poll_sq() has the following :
>
> /* run Tx cleanup to completion */
> if (tx_queue->tx_skbuff[tx_queue->skb_dirtytx])
>          gfar_clean_tx_ring(tx_queue);
>
> While gfar_poll() has a different method :
>
> if (tx_queue->tx_skbuff[tx_queue->skb_dirtytx]) {
>      gfar_clean_tx_ring(tx_queue);
>      has_tx_work = 1;
> }
>
> Note the has_tx_work use in gfar_poll() only.
>
> Note that memory barriers seem to be missing.
>
> 1) In your cases, is it gfar_poll_sq() or gfar_poll() that is used ?
>

It's gfar_poll_sq(). P2020 single Tx/Rx queues.

I'm also seeing carrier errors and packet collisions in this case
(100/Half link).

> 2) Is the bug happening if only one CPU is used ?
>

I didn't try this. Maybe Per did?

* Re: BQL support in gianfar causes network hiccup
  2013-09-03 16:09               ` Claudiu Manoil
@ 2013-09-03 19:33                 ` Per Dalén
  0 siblings, 0 replies; 10+ messages in thread
From: Per Dalén @ 2013-09-03 19:33 UTC (permalink / raw)
  To: Claudiu Manoil; +Cc: Eric Dumazet, Per Dalén, netdev

>
>
> On 9/3/2013 6:42 PM, Eric Dumazet wrote:
>> On Tue, 2013-09-03 at 09:55 +0200, Per Dalén wrote:
>>
>>> Yes, it's weird. The only reason I removed the BQL commit
>>> (d8a0f1b0af67679bba886784de10d8c21acc4e0e) was because the error Tino
>>> Keitel had was similar to mine.
>>
>>
>> I suspect a genuine race in this driver. BQL only makes this race happen
>> more often.
>>
>> gfar_poll_sq() has the following :
>>
>> /* run Tx cleanup to completion */
>> if (tx_queue->tx_skbuff[tx_queue->skb_dirtytx])
>>          gfar_clean_tx_ring(tx_queue);
>>
>> While gfar_poll() has a different method :
>>
>> if (tx_queue->tx_skbuff[tx_queue->skb_dirtytx]) {
>>      gfar_clean_tx_ring(tx_queue);
>>      has_tx_work = 1;
>> }
>>
>> Note the has_tx_work use in gfar_poll() only.
>>
>> Note that memory barriers seem to be missing.
>>
>> 1) In your cases, is it gfar_poll_sq() or gfar_poll() that is used ?
>>
>
> It's gfar_poll_sq(). P2020 single Tx/Rx queues.
>
> I'm also seeing carrier errors and packet collisions in this case
> (100/Half link).
>
>> 2) Is the bug happening if only one CPU is used ?
>>
>
> I didn't try this. Maybe Per did?
>

I use both CPUs.

end of thread, other threads:[~2013-09-03 19:33 UTC | newest]

Thread overview: 10+ messages
2013-09-02 13:21 BQL support in gianfar causes network hiccup Per Dalén
2013-09-02 13:53 ` Claudiu Manoil
2013-09-02 14:37   ` Per Dalén
2013-09-02 15:35     ` Claudiu Manoil
2013-09-02 16:50       ` Per Dalén
2013-09-03  7:06         ` Claudiu Manoil
2013-09-03  7:55           ` Per Dalén
2013-09-03 15:42             ` Eric Dumazet
2013-09-03 16:09               ` Claudiu Manoil
2013-09-03 19:33                 ` Per Dalén
